TheBloke/LongChat-7B-GGML

TheBloke committed on Jul 1, 2023

Commit 5dbfa0e • 1 Parent(s): 96f51ce

Update README.md

Files changed (1)

README.md +3 -1

@@ -89,9 +89,11 @@ Refer to the Provided Files table below to see what files use which methods, and
 On Linux I use the following command line to launch the KoboldCpp UI with CUDA (cuBLAS) acceleration and a context size of 4096:
 ```
-python ./koboldcpp.py --stream --unbantokens --threads 8 --usecublas --gpulayers 100 longchat-7b-16k.ggmlv3.q4_K_M.bin
+python ./koboldcpp.py --contextsize 4096 --stream --unbantokens --threads 8 --usecublas --gpulayers 100 longchat-7b-16k.ggmlv3.q4_K_M.bin
 ```
+Change `--contextsize` to the context size you want. **It must be higher than 2048, or the model will produce gibberish.**
 Change `--gpulayers 100` to the number of layers you want/are able to offload to the GPU. Remove it if you don't have GPU acceleration.
 For OpenCL acceleration, change `--usecublas` to `--useclblast 0 0`. You may need to change the second `0` to `1` if you have both an iGPU and a discrete GPU.
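
As an illustration of the notes above, here is a minimal sketch of a wrapper that refuses context sizes of 2048 or lower and prints the OpenCL variant of the launch command. The function name `koboldcpp_cmd` is hypothetical and not part of KoboldCpp. It only echoes the command (a dry run) rather than starting the UI, and it reuses the thread count, layer count, and model filename from the README unchanged.

```shell
# Hypothetical helper (not part of KoboldCpp): prints the launch command
# for a given context size instead of running it.
koboldcpp_cmd() {
  ctx="$1"
  # The README warns that the context size must be higher than 2048.
  if [ "$ctx" -le 2048 ]; then
    echo "context size must be higher than 2048" >&2
    return 1
  fi
  # OpenCL variant: --usecublas replaced with --useclblast 0 0.
  echo "python ./koboldcpp.py --contextsize $ctx --stream --unbantokens --threads 8 --useclblast 0 0 --gpulayers 100 longchat-7b-16k.ggmlv3.q4_K_M.bin"
}

koboldcpp_cmd 4096
```

To launch for real, run the printed command from the KoboldCpp directory, adjusting `--gpulayers` and the second `0` of `--useclblast` as described above.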