UltraCM-13B-GGUF / README.md

	---
	license: mit
	language:
	- en
	datasets:
	- openbmb/UltraFeedback
	model_creator: OpenBMB
	model_name: UltraCM-13b
	model_type: llama
	base_model: openbmb/UltraCM-13b
	library_name: transformers
	pipeline_tag: text-generation
	inference: false
	tags:
	- dpo
	- rlaif
	- preference
	- ultrafeedback
	quantized_by: alvarobartt
	---

	## Model Card for UltraCM-13b-GGUF

	[UltraCM-13B](https://huggingface.co/openbmb/UltraCM-13b) is an LLM fine-tuned to critique completions: it evaluates
	LLM outputs on helpfulness, truthfulness, honesty, and the extent to which the answer follows the given instructions.

	UltraCM-13B is a 13B-parameter LLM released by [OpenBMB](https://huggingface.co/openbmb) as part of their paper
	[UltraFeedback: Boosting Language Models with High-quality Feedback](https://arxiv.org/abs/2310.01377).

	This repository contains quantized variants of the model in the GGUF format, introduced by the [llama.cpp](https://github.com/ggerganov/llama.cpp) team.
	The quantization is heavily inspired by [TheBloke](https://huggingface.co/TheBloke)'s work on quantizing most of the LLMs out there.

	### Model Details

	#### Model Description

	- Model type: Llama
	- Fine-tuned from model: [Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf)
	- Created by: [Meta AI](https://huggingface.co/meta-llama)
	- Fine-tuned by: [OpenBMB](https://huggingface.co/openbmb)
	- Quantized by: [alvarobartt](https://huggingface.co/alvarobartt)
	- Language(s) (NLP): English
	- License: MIT

	### Model Files

	#### Provided files

	| Name | Quant method | Bits | Size | Max RAM required | Use case |
	| ---- | ---- | ---- | ---- | ---- | ----- |
	| [UltraCM-13b.q4_0.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q4_0.gguf) | Q4_0 | 4 | 7.37 GB | 9.87 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
	| [UltraCM-13b.q4_k_s.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q4_k_s.gguf) | Q4_K_S | 4 | 7.41 GB | 9.91 GB | small, greater quality loss |
	| [UltraCM-13b.q4_k_m.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q4_k_m.gguf) | Q4_K_M | 4 | 7.87 GB | 10.37 GB | medium, balanced quality - recommended |
	| [UltraCM-13b.q5_0.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q5_0.gguf) | Q5_0 | 5 | 8.97 GB | 11.47 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
	| [UltraCM-13b.q5_k_s.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q5_k_s.gguf) | Q5_K_S | 5 | 8.97 GB | 11.47 GB | large, low quality loss - recommended |
	| [UltraCM-13b.q5_k_m.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q5_k_m.gguf) | Q5_K_M | 5 | 9.23 GB | 11.73 GB | large, very low quality loss - recommended |

	Note: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
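
In every row of the table, the "Max RAM required" figure is the file size plus 2.50 GB. The sketch below copies the table into Python and picks a file for a given RAM budget. The names `Quant`, `QUANTS` and `pick_quant` are hypothetical helpers, not part of any library. The selection rule is an assumption for illustration: take the variant with the largest RAM requirement that still fits, and prefer a K-quant over a legacy quant on a tie.

```python
from typing import NamedTuple, Optional


class Quant(NamedTuple):
    filename: str
    method: str
    size_gb: float
    max_ram_gb: float  # CPU-only figure from the table (no GPU offloading)
    legacy: bool


# Values copied from the "Provided files" table.
QUANTS = [
    Quant("UltraCM-13b.q4_0.gguf", "Q4_0", 7.37, 9.87, True),
    Quant("UltraCM-13b.q4_k_s.gguf", "Q4_K_S", 7.41, 9.91, False),
    Quant("UltraCM-13b.q4_k_m.gguf", "Q4_K_M", 7.87, 10.37, False),
    Quant("UltraCM-13b.q5_0.gguf", "Q5_0", 8.97, 11.47, True),
    Quant("UltraCM-13b.q5_k_s.gguf", "Q5_K_S", 8.97, 11.47, False),
    Quant("UltraCM-13b.q5_k_m.gguf", "Q5_K_M", 9.23, 11.73, False),
]


def pick_quant(ram_budget_gb: float) -> Optional[Quant]:
    """Return the variant with the largest RAM requirement that fits the budget.

    Assumption: a larger file means lower quality loss. On a tie, a
    non-legacy (K-quant) variant is preferred. Returns None if nothing fits.
    """
    fitting = [q for q in QUANTS if q.max_ram_gb <= ram_budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: (q.max_ram_gb, not q.legacy))
```

For example, `pick_quant(12.0)` selects the Q5_K_M file, while a budget below 9.87 GB returns `None`. With GPU offloading the real RAM need is lower than the table suggests, so this check is conservative.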

	For more information on quantization, I highly recommend checking out [TheBloke](https://huggingface.co/TheBloke)'s work and joining [their
	Discord server](https://discord.gg/Jq4vkcDakD).

	### Uses

	#### Direct Use

	<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

	[More Information Needed]
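
As a minimal sketch, and not an official usage guide, one of the GGUF files could be downloaded and loaded with the `llama-cpp-python` bindings roughly as follows. The helpers `gguf_filename`, `gguf_url` and `load_model` are hypothetical names. The URL assumes the Hub's usual `resolve/main` download path, and the context size and GPU layer count are placeholder values.

```python
REPO_ID = "alvarobartt/UltraCM-13b-GGUF"


def gguf_filename(quant: str) -> str:
    """Map a quant method such as "Q4_K_M" to the file name used in this repo."""
    return f"UltraCM-13b.{quant.lower()}.gguf"


def gguf_url(quant: str) -> str:
    """Build a direct download URL (assumes the Hub's `resolve/main` path)."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{gguf_filename(quant)}"


def load_model(model_path: str, n_gpu_layers: int = 0, n_ctx: int = 2048):
    """Load a downloaded GGUF file with llama-cpp-python.

    n_gpu_layers=0 keeps every layer on the CPU, which matches the RAM
    figures in the table; a positive value offloads that many layers to VRAM.
    """
    # Imported lazily so the helpers above work without llama-cpp-python installed.
    from llama_cpp import Llama

    return Llama(model_path=model_path, n_gpu_layers=n_gpu_layers, n_ctx=n_ctx)
```

After downloading a file from `gguf_url("Q4_K_M")`, calling `llm = load_model("UltraCM-13b.q4_k_m.gguf")` and then `llm(prompt, max_tokens=256)["choices"][0]["text"]` would return the generated critique. The critique prompt template is defined in the original [UltraCM-13B](https://huggingface.co/openbmb/UltraCM-13b) model card and is not reproduced here.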

	### Citation

	Since this is only a GGUF quantization of the original weights, please refer to and cite the original authors instead.

	```bibtex
	@misc{cui2023ultrafeedback,
	title={UltraFeedback: Boosting Language Models with High-quality Feedback},
	author={Ganqu Cui and Lifan Yuan and Ning Ding and Guanming Yao and Wei Zhu and Yuan Ni and Guotong Xie and Zhiyuan Liu and Maosong Sun},
	year={2023},
	eprint={2310.01377},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```