TheBloke committed on
Commit
87abc1c
1 Parent(s): ad586a9

Update README.md

Files changed (1)
  1. README.md +11 -3
README.md CHANGED
@@ -15,7 +15,7 @@ inference: false
15
  # Koala: A Dialogue Model for Academic Research
16
  This repo contains the weights of the Koala 13B model produced at Berkeley. It is the result of combining the diffs from https://huggingface.co/young-geng/koala with the original Llama 13B model.
17
 
18
- This version has then been quantized to 4-bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) and then converted to GGML for use with [llama.cpp](https://github.com/ggerganov/llama.cpp).
19
 
20
  ## My Koala repos
21
  I have the following Koala model repositories available:
@@ -23,13 +23,21 @@ I have the following Koala model repositories available:
23
  **13B models:**
24
  * [Unquantized 13B model in HF format](https://huggingface.co/TheBloke/koala-13B-HF)
25
  * [GPTQ quantized 4bit 13B model in `pt` and `safetensors` formats](https://huggingface.co/TheBloke/koala-13B-GPTQ-4bit-128g)
26
- * [GPTQ quantized 4bit 13B model in GGML format for `llama.cpp`](https://huggingface.co/TheBloke/koala-13B-GPTQ-4bit-128g-GGML)
27
 
28
  **7B models:**
29
  * [Unquantized 7B model in HF format](https://huggingface.co/TheBloke/koala-7B-HF)
30
  * [Unquantized 7B model in GGML format for llama.cpp](https://huggingface.co/TheBloke/koala-7b-ggml-unquantized)
31
  * [GPTQ quantized 4bit 7B model in `pt` and `safetensors` formats](https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g)
32
- * [GPTQ quantized 4bit 7B model in GGML format for `llama.cpp`](https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g-GGML)
33
 
34
  ## How to run in `llama.cpp`
35
 
 
15
  # Koala: A Dialogue Model for Academic Research
16
  This repo contains the weights of the Koala 13B model produced at Berkeley. It is the result of combining the diffs from https://huggingface.co/young-geng/koala with the original Llama 13B model.
17
 
18
+ This version has then been quantized to 4-bit and 5-bit GGML for use with [llama.cpp](https://github.com/ggerganov/llama.cpp).
19
 
20
  ## My Koala repos
21
  I have the following Koala model repositories available:
 
23
  **13B models:**
24
  * [Unquantized 13B model in HF format](https://huggingface.co/TheBloke/koala-13B-HF)
25
  * [GPTQ quantized 4bit 13B model in `pt` and `safetensors` formats](https://huggingface.co/TheBloke/koala-13B-GPTQ-4bit-128g)
26
+ * [4bit and 5bit models in GGML format for `llama.cpp`](https://huggingface.co/TheBloke/koala-13B-GGML)
27
 
28
  **7B models:**
29
  * [Unquantized 7B model in HF format](https://huggingface.co/TheBloke/koala-7B-HF)
30
  * [Unquantized 7B model in GGML format for llama.cpp](https://huggingface.co/TheBloke/koala-7b-ggml-unquantized)
31
  * [GPTQ quantized 4bit 7B model in `pt` and `safetensors` formats](https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g)
32
+ * [4bit and 5bit models in GGML format for `llama.cpp`](https://huggingface.co/TheBloke/koala-7B-GGML)
33
+
34
+ ## REQUIRES LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!
35
+
36
+ llama.cpp recently made a breaking change to its quantisation methods.
37
+
38
+ I have re-quantised the GGML files in this repo. Therefore you will require llama.cpp compiled on May 12th or later (commit `b9fd7ee` or later) to use them.
39
+
40
+ The previous files, which will still work in older versions of llama.cpp, can be found in branch `previous_llama`.
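
As a rough sketch of how one might check the requirement above, the shell function below asks git whether a local llama.cpp checkout already contains commit `b9fd7ee`. The function name `llama_has_commit` and the `./llama.cpp` path in the usage comment are assumptions for illustration and are not part of this repo or of llama.cpp; `git merge-base --is-ancestor` is a standard git command.

```shell
# Hypothetical helper (not part of llama.cpp): succeeds when the git checkout
# in directory $1 has commit $2 in the history of its current HEAD.
llama_has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" HEAD
}

# Example usage, assuming llama.cpp was cloned into ./llama.cpp:
#   if llama_has_commit ./llama.cpp b9fd7ee; then
#       echo "checkout is new enough for the re-quantised GGML files"
#   else
#       echo "update and rebuild llama.cpp, or use the previous_llama branch"
#   fi
```

A shallow clone may not contain the commit object at all, in which case the check fails even on a recent checkout. The check also only covers the source tree, so the binaries still need to be rebuilt after updating.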
41
 
42
  ## How to run in `llama.cpp`
43