elinas committed
Commit 41a2029
1 Parent(s): 4d870a9

Update README.md

Files changed (1)
  1. README.md +23 -0
README.md CHANGED
@@ -10,6 +10,10 @@ https://github.com/qwopqwop200/GPTQ-for-LLaMa
  LoRA credit to https://huggingface.co/baseten/alpaca-30b

  # Usage
+ 1. Run manually through GPTQ
+ 2. (More setup but better UI) - Use the [text-generation-webui](https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode)
+ 2a. Note that a recent code change in GPTQ broke functionality so please follow [these instructions](https://huggingface.co/elinas/alpaca-30b-lora-int4/discussions/2#641a38d5f1ad1c1173d8f192) to fix the issue
+
  Since this is instruction tuned, for best results, use the following format for inference:
  ```
  ### Instruction:
@@ -17,6 +21,25 @@ Since this is instruction tuned, for best results, use the following format for
  ### Response:
  ```

+ If you want deterministic results, turn off sampling. You can turn it off in the webui by unchecking `do_sample`.
+
+ For cai-chat mode, you won't want to use instruction prompting, rather create a character and set sampler settings. Here is an example of settings that work well for me:
+ ```
+ do_sample=True
+ temperature=0.95
+ top_p=1
+ typical_p=1
+ repetition_penalty=1.1
+ top_k=40
+ num_beams=1
+ penalty_alpha=0
+ min_length=0
+ length_penalty=1
+ no_repeat_ngram_size=0
+ early_stopping=False
+ ```
+ You can then save this as a `.txt` file in the `presets` folder.
+
  --
  license: other
  ---
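
For step 1 of the added usage list ("Run manually through GPTQ"), here is a minimal sketch of what a manual run could look like, assuming the GPTQ-for-LLaMa repo is cloned locally and this model's int4 checkpoint has been downloaded. The script and flag names (`llama_inference.py`, `--wbits`, `--load`, `--text`) reflect that repo at around this time and may differ by revision, so treat them as assumptions and check its README; the checkpoint filename below is hypothetical.

```python
# Sketch: invoke GPTQ-for-LLaMa's inference script from Python.
# Assumptions: the repo is cloned and you run this from its root; the script
# name and flags below must match the revision you actually checked out.
import subprocess

MODEL_DIR = "elinas/alpaca-30b-lora-int4"       # HF repo id or local dir with config/tokenizer
CHECKPOINT = "alpaca-30b-4bit.safetensors"      # hypothetical filename; use the actual int4 file
PROMPT = "### Instruction:\nWrite a haiku about llamas.\n### Response:\n"

subprocess.run(
    [
        "python", "llama_inference.py", MODEL_DIR,
        "--wbits", "4",
        "--load", CHECKPOINT,
        "--text", PROMPT,
    ],
    check=True,  # raise if the inference script exits with an error
)
```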
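
The instruction format shown in the diff can also be assembled programmatically. A small sketch follows, covering only the prompt string; how the model itself is loaded (GPTQ-for-LLaMa or the webui) is out of scope here, and the `build_prompt` helper is hypothetical rather than part of this repo.

```python
# Build an Alpaca-style prompt matching the recommended inference format.
# `build_prompt` is a hypothetical helper, not something shipped with this model.
def build_prompt(instruction: str) -> str:
    return (
        "### Instruction:\n"
        f"{instruction}\n"
        "### Response:\n"
    )

prompt = build_prompt("Explain what 4-bit GPTQ quantization does in one paragraph.")
print(prompt)  # feed this string to whichever loader or UI you use for generation
```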
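
On the deterministic-results note: unchecking `do_sample` in the webui amounts to greedy decoding. A sketch of the equivalent call with the Hugging Face `generate` API, assuming `model` and `tokenizer` are already loaded by your 4-bit loader of choice (plain `transformers` alone cannot load this int4 checkpoint):

```python
# Greedy (deterministic) decoding: the equivalent of unchecking `do_sample` in the webui.
# Assumes `model` and `tokenizer` were already loaded elsewhere.
prompt = "### Instruction:\nSummarize GPTQ quantization in two sentences.\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    do_sample=False,      # greedy decoding: same output for the same prompt
    max_new_tokens=256,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```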
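
The preset added in the diff is a webui sampler file, but the keys map onto `transformers` generation parameters of the same names. A sketch of passing the same values straight to `generate`, again assuming `model`, `tokenizer`, and `prompt` exist as in the previous sketch:

```python
# The sampler preset from the diff, expressed as `generate` keyword arguments.
# Assumes `model`, `tokenizer`, and `prompt` are defined as in the earlier sketch.
sampler_settings = dict(
    do_sample=True,
    temperature=0.95,
    top_p=1.0,
    typical_p=1.0,
    repetition_penalty=1.1,
    top_k=40,
    num_beams=1,
    penalty_alpha=0.0,
    min_length=0,
    length_penalty=1.0,
    no_repeat_ngram_size=0,
    early_stopping=False,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, **sampler_settings)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For the webui itself, the plain `key=value` text block from the diff is the format to save as a `.txt` preset; the Python mapping above is only for running the same settings outside the UI.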