flashvenom/Airoboros-13B-SuperHOT-8K-4bit-GPTQ


flashvenom committed on Jun 25, 2023

Commit 0145612 • 1 Parent(s): f392262

Update README.md

Files changed (1)

README.md +1 -1

@@ -1,6 +1,6 @@
 Model upload of Airoboros-13B-SuperHOT in 4-bit GPTQ version, converted using GPTQ-for-LLaMa; Source model from https://huggingface.co/Peeepy/Airoboros-13b-SuperHOT-8k.
-## This uses the Airoboros-13B(v1.2) model (https://huggingface.co/jondurbin/airoboros-13b-gpt4-1.2) and applies the SuperHOT 8K LoRA (https://huggingface.co/kaiokendev/superhot-13b-8k-no-rlhf-test) on top, allowing for improved coherence at larger context lenghts, as well as improving output quality of Airoboros to be more verbose.
+## This uses the [Airoboros-13B(v1.2)](https://huggingface.co/jondurbin/airoboros-13b-gpt4-1.2) model and applies the [SuperHOT 8K LoRA](https://huggingface.co/kaiokendev/superhot-13b-8k-no-rlhf-test) on top, allowing for improved coherence at larger context lenghts, as well as improving output quality of Airoboros to be more verbose.
 You will need a monkey-patch at inference to use the 8k context, please see patch file present, if you are using a different inference engine (like llama.cpp / exllama) you will need to add the monkey patch there.
 ### Note: If you are using exllama the monkey-patch is built into the engine, please use -cpe to set the scaling factor, ie. if you are running it at 4k context, pass `-cpe 2 -l 4096`
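
Note on the `-cpe 2 -l 4096` example in the README text: the scaling factor appears to be the target context length divided by the base model's native context. The sketch below works through that arithmetic. It assumes a native context of 2048 tokens for the LLaMA-13B base, which the README does not state, and the helper names `compress_factor` and `exllama_args` are hypothetical, not part of exllama.

```python
# Assumption: the base LLaMA-13B model was trained with a 2048-token context.
BASE_CONTEXT = 2048


def compress_factor(target_context: int, base_context: int = BASE_CONTEXT) -> float:
    """Return the position-scaling factor for a desired context length.

    Hypothetical helper: factor = target / base, never below 1 (no scaling
    is needed when the target fits inside the native context).
    """
    if target_context <= 0 or base_context <= 0:
        raise ValueError("context lengths must be positive")
    return max(1.0, target_context / base_context)


def exllama_args(target_context: int) -> list[str]:
    """Build the `-cpe` / `-l` arguments named in the README note."""
    factor = compress_factor(target_context)
    # ":g" drops a trailing ".0", so 2.0 is rendered as "2".
    return ["-cpe", f"{factor:g}", "-l", str(target_context)]


if __name__ == "__main__":
    print(" ".join(exllama_args(4096)))  # matches the README's 4k example
    print(" ".join(exllama_args(8192)))  # full 8k context
```

Under this assumption, a 4k context reproduces the README's `-cpe 2 -l 4096`, and the full 8k context would use a factor of 4.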