Can you please provide the command to change the context size?

#1
by yehiaserag - opened

Can you please provide the command to change the context size to avoid redownloading the whole model for those with limited/slow bandwidth?

@yehiaserag

I'm confused. Is there something special about how this model handles context size? Usually you set that at runtime, based on your inference engine. You can't do much about whatever context was used for training the model.

For example, in llama.cpp you would do:

Context Size

    -c N, --ctx-size N: Set the size of the prompt context (default: 0, 0 = loaded from model). The LLaMA models were built with a context of 2048-8192, which will yield the best results on longer input/inference.
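
So at runtime you just pick whatever context you want; for instance, with a recent llama.cpp build (where the binary is `llama-cli`, and using a placeholder model path), something like `./llama-cli -m /models/model.gguf -c 32768` would request a 32k context.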

There is the manual setting and there is the default value.
What's wrong here is the default value: in a UI like LM Studio it is read automatically from the metadata inside the GGUF.
This is why you'll notice the model was reuploaded yesterday (the Q4_K_M, as an example).
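
(If you want to see what's baked into a file, llama.cpp ships a dump script, e.g. `python3 llama.cpp/gguf-py/scripts/gguf_dump.py <model.gguf>`, which should print all the metadata keys including the context length.)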

Ahh I understand. Is there a way to edit the metadata of an existing GGUF to fix the bug?

Good question... Here is what I've found:

  1. GGUF File Format - including:
     [llm].context_length: uint64: Also known as n_ctx. length of the context (in tokens) that the model was trained on. For most architectures, this is the hard limit on the length of the input. Architectures, like RWKV, that are not reliant on transformer-style attention may be able to handle larger inputs, but this is not guaranteed.
  2. GGUF CLI Tool - doesn't seem to have an "update" method.
  3. I asked the model itself how to do it. I used this "system" and "user" prompt before copy/pasting the file format above for context.
You are an experienced software developer with many years of experience writing programs and scripts in bash, Python, and Linux. Assist the user by generating high-quality, professional, well-commented code.

How can I edit a GGUF-formatted file and change the n_ctx to increase the default context length? Is there an existing program or must I use a hex editor?
<copy/paste GGUF spec>
<repeat user prompt>

Editing a GGUF (GGML Universal Format) file to change the context length (`n_ctx`) or any other metadata typically requires a tool that understands the format. Using a hex editor is not recommended because it can easily corrupt the file if not done correctly.

As of my last update in October 2023, there isn't a widely known or standardized tool specifically for editing GGUF files. However, you can use the following general approach to modify the metadata:

1. **Understand the Format**: Familiarize yourself with the GGUF format specification. This will help you locate the metadata section where `n_ctx` is stored.

2. **Use a Script or Tool**: If you have access to the original conversion script or tool that created the GGUF file, you can modify the script to change the `n_ctx` value and then re-run the conversion.

3. **Custom Script**: Write a custom script in a language like Python to read the GGUF file, modify the `n_ctx` value, and then write the modified file back. This requires understanding the binary format and how the metadata is stored.

Here's a high-level outline of what a custom script might look like in Python:
<some generic struct packing JSON decoding python code>
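
For what it's worth, the first step of any such script — parsing the fixed GGUF header the spec describes — is simple enough. A minimal sketch (placeholder path; header layout per GGUF v2+, which uses little-endian uint64 counts):

```python
# Sketch: read the fixed GGUF header (magic, version, tensor count, KV count).
# A real n_ctx patcher would then walk the metadata key/value pairs to find
# [llm].context_length. The path below is a placeholder.
import struct

with open("/models/model.gguf", "rb") as f:
    magic, version = struct.unpack("<4sI", f.read(8))
    assert magic == b"GGUF", "not a GGUF file"
    # In GGUF v2+ these counts are uint64 little-endian (uint32 in v1).
    tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    print(f"GGUF v{version}: {tensor_count} tensors, {kv_count} metadata keys")
```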

I'm curious if you find an answer better than backing up the file and fussing with a hex editor lol...

Ah sorry I missed this!

you'll need llama.cpp locally

Run:

```
python3 llama.cpp/gguf-py/scripts/gguf_set_metadata.py /models/Mistral-Large-Instruct-2407-Q4_K_S-00001-of-00002.gguf llama.context_length 131072 --force
```

If you have a model that's split, target the first part as I did above. If it's not split, just target the full model file.
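
For anyone who wants to sanity-check the edit afterwards: gguf_set_metadata.py opens the file as a writable memory map via gguf-py's GGUFReader and patches the value in place (no re-quantizing or re-downloading), so you can read the field back the same way. A minimal sketch, assuming the gguf Python package from llama.cpp/gguf-py is importable:

```python
# Sketch: read llama.context_length back from the patched file.
# Requires the gguf package that ships in llama.cpp/gguf-py (pip install gguf).
from gguf import GGUFReader

reader = GGUFReader("/models/Mistral-Large-Instruct-2407-Q4_K_S-00001-of-00002.gguf")
field = reader.get_field("llama.context_length")
# Scalar metadata lives in field.parts; field.data[0] indexes the value slot.
print(int(field.parts[field.data[0]][0]))  # should now print 131072
```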

Thanks a lot @bartowski !
