LC-Edward commited on
Commit
ebaaa8b
1 Parent(s): d564d77

Update Xwin-Math-7B-V1.1

Browse files
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ . filter=lfs diff=lfs merge=lfs -text
37
+ /openrlhf/RLHF/examples/scripts/ckpt/Xwin-Math-7B-V1.1 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,206 @@
1
  ---
2
  license: llama2
3
  ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  license: llama2
3
  ---
4
+ ---
5
+ license: llama2
6
+ ---
7
+ # Xwin-Math
8
+
9
+ <p align="center">
10
+ <a href="https://github.com/Xwin-LM/Xwin-LM/tree/main/Xwin-Math"><img src="https://img.shields.io/badge/GitHub-yellow.svg?style=social&logo=github"></a>
11
+ <a href="https://huggingface.co/Xwin-LM"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue"></a>
12
+ </p>
13
+
14
+ [Paper Link](https://arxiv.org/pdf/2403.04706) Xwin-Math is a series of powerful SFT LLMs for math problems based on LLaMA-2.
15
+
16
+
17
+ ## 🔥 News
18
+ - 💥 [May, 2024] The [Xwin-Math-70B-V1.1](https://huggingface.co/Xwin-LM/Xwin-Math-70B-V1.1) model achieves **51.9 pass@1 on the MATH benchmark** and **90.6 pass@1 on the GSM8K benchmark**. This is a new SoTA model based on LLaMA-2-70B!
19
+ - 💥 [May, 2024] The [Xwin-Math-7B-V1.1](https://huggingface.co/Xwin-LM/Xwin-Math-7B-V1.1) model achieves **44.7 pass@1 on the MATH benchmark** and **84.4 pass@1 on the GSM8K benchmark**. This is a new SoTA model based on LLaMA-2-7B!
20
+ - 💥 [Nov, 2023] The [Xwin-Math-70B-V1.0](https://huggingface.co/Xwin-LM/Xwin-Math-70B-V1.0) model achieves **31.8 pass@1 on the MATH benchmark** and **87.0 pass@1 on the GSM8K benchmark**. This performance places it first amongst all open-source models!
21
+ - 💥 [Nov, 2023] The [Xwin-Math-7B-V1.0](https://huggingface.co/Xwin-LM/Xwin-Math-7B-V1.0) and [Xwin-Math-13B-V1.0](https://huggingface.co/Xwin-LM/Xwin-Math-13B-V1.0) models achieve **66.6 and 76.2 pass@1 on the GSM8K benchmark**, ranking as top-1 among all LLaMA-2 based 7B and 13B open-source models respectively!
22
+
23
+
24
+ ## ✨ Model Card
25
+ | Model | GSM8K | MATH | Checkpoint | License |
26
+ |:-:|:-:|:-:|:-:|:-:|
27
+ |Xwin-Math-7B-V1.0 | 66.6 | 17.4 | 🤗 <a href="https://huggingface.co/Xwin-LM/Xwin-Math-7B-V1.0" target="_blank">HF Link</a> | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License|
28
+ |Xwin-Math-7B-V1.1 | 84.4 | 44.7 | 🤗 <a href="https://huggingface.co/Xwin-LM/Xwin-Math-7B-V1.1" target="_blank">HF Link</a> | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License|
29
+ |Xwin-Math-13B-V1.0| 76.2 | 21.7 | 🤗 <a href="https://huggingface.co/Xwin-LM/Xwin-Math-13B-V1.0" target="_blank">HF Link</a> | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License|
30
+ |Xwin-Math-70B-V1.0| 87.0 | 31.8 | 🤗 <a href="https://huggingface.co/Xwin-LM/Xwin-Math-70B-V1.0" target="_blank">HF Link</a> | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License|
31
+ |Xwin-Math-70B-V1.1| 90.6 | 51.9 | 🤗 <a href="https://huggingface.co/Xwin-LM/Xwin-Math-70B-V1.1" target="_blank">HF Link</a> | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License|
32
+
33
+ ## 🚀 Benchmarks
34
+ ### Xwin-Math performance on [MATH](https://github.com/hendrycks/math) and [GSM8K](https://github.com/openai/grade-school-math).
35
+
36
+ Xwin-Math-70B-V1.0 has achieved **31.8% on MATH** and **87.0% on GSM8K**. These scores are **5.3** and **3.1** points higher, respectively, than the previous state-of-the-art open-source MetaMath and LEMAv1 model.
37
+
38
+
39
+ | **Model** |**MATH (Our test)** | **GSM8K (Our test)** |
40
+ |:-:|:-:|:-:|
41
+ | GPT-4 (zero-shot) | 52.4 | 94.8 |
42
+ | GPT-35-Turbo (8-shot)| 37.1 | 81.0 |
43
+ | |
44
+ | WizardMath-70B | 23.9 | 81.1 |
45
+ | MAmmoTH-70B | 20.8 | 72.6 |
46
+ | MetaMath-70B | 26.5 | 82.0 |
47
+ | LEMAv1-70B | 25.9 | 83.9 |
48
+ |**Xwin-Math-70B-V1.0** |**31.8**|**87.0**|
49
+ |**Xwin-Math-70B-V1.1** |**51.9**|**90.6**|
50
+ | |
51
+ | WizardMath-13B | 15.0 | 63.7 |
52
+ | MAmmoTH-13B | 12.3 | 56.2 |
53
+ | MetaMath-13B | 22.7 | 70.9 |
54
+ | LEMAv1-13B | 13.6 | 65.0 |
55
+ |**Xwin-Math-13B-V1.0** | 21.7 | 76.2 |
56
+ | |
57
+ | WizardMath-7B | 10.9 | 55.0 |
58
+ | MAmmoTH-7B | 9.6 | 50.2 |
59
+ | MetaMath-7B | 20.1 | 66.6 |
60
+ | LEMAv1-7B | 10.0 | 54.7 |
61
+ |**Xwin-Math-7B-V1.0** | 17.4 | 66.6 |
62
+ |**Xwin-Math-7B-V1.1** | 44.7 | 84.4 |
63
+
64
+ We obtain these results using our flexible evaluation strategy. Due to differences in environment and hardware, the test results may be slightly different from the report, but we ensure that the evaluation is as accurate and fair as possible.
65
+
66
+ ### Xwin-Math performance on other math benchmarks.
67
+
68
+ Our 70B model shows strong mathematical reasoning capabilities among all open-sourced models. Also note that our model even approaches or surpasses the performance of GPT-35-Turbo on some benchmarks.
69
+
70
+ | **Model** | SVAMP | ASDiv | NumGlue | Algebra | MAWPS | **Average** |
71
+ |:-:|:-:|:-:|:-:|:-:|:-:|:-:|
72
+ | GPT-35-Turbo (8-shot)| 80.6 | 84.1 | 81.8 | 90.5 | 91.7 | 85.7 |
73
+ | |
74
+ | WizardMath-70B | 80.2 | 75.8 | 71.4 | 64.0 | 74.9 | 73.3 |
75
+ | MAmmoTH-70B | 71.2 | 73.9 | 62.7 | 58.1 | 72.2 | 67.6 |
76
+ | MetaMath-70B | 85.8 | 81.1 | 77.5 | 79.7 | 81.4 | 81.1 |
77
+ | LEMAv1-70B-MATH * | 81.6 | 77.1 | 72.1 | 69.4 | 81.8 | 76.5 |
78
+ |**Xwin-Math-70B-V1.0** | 84.0 | 84.1 | 81.3 | 78.4 | 90.8 | 83.7 |
79
+
80
+ \* LEMAv1 has two models, and we report the better LEMAv1-70B-MATH model in these benchmarks.
81
+
82
+ ## 🔨 Evaluation
83
+ In order to evaluate a model's mathematical capabilities more flexibly and ensure a fair comparison of results, particularly for the MATH benchmark, we have developed a new evaluation tool. We have also assessed the pass@1 results of recent models on MATH and GSM8K benchmarks, which provides more accurate results.
84
+
85
+ We hope this toolkit can benefit open-source community by providing more accurate insights and conclusions. For a deeper understanding of our evaluation tool and methods, please visit [here](https://github.com/Xwin-LM/Xwin-LM/tree/main/Xwin-Math/eval)
86
+
87
+ * "Report" refers to the accuracy stated in the original papers.
88
+ * "Repro" indicates the results is reproduced by generating responses and evaluating them using the respective open-source models and scripts.
89
+ * "Strict" and "Flex" denote the results we achieved by employing our two strategies to extract answer and evaluate the same responses as "Repro".
90
+
91
+ | Model | MATH <br> (Report) <br/> |MATH <br> (Repro) <br/> | MATH <br> (Strict) <br/> |MATH <br> (Flex) <br/> | GSM8K <br> (Report) <br/> |GSM8K <br> (Repro) <br/>| GSM8K <br> (Strict) <br/> | GSM8K <br> (Report) <br/> |
92
+ |:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
93
+ | GPT-35-Turbo (8-shot)| 34.1 | - | 23.8 | 37.1 | 80.8 | - | 77.9 | 81.0 |
94
+ | |
95
+ | WizardMath-70B | 22.7 | 23.0 | 23.9 | 23.9 | 81.6 | 81.4 | 81.1 | 81.1 |
96
+ | MAmmoTH-70B | 21.1 | 18.0 | 20.0 | 20.8 | 72.4 | 72.6 | 72.6 | 72.6 |
97
+ | MetaMath-70B | 26.6 | 25.9 | 26.3 | 26.5 | 82.3 | 82.3 | 82.0 | 82.0 |
98
+ |**Xwin-Math-70B-V1.0** | - | - |**31.8**|**31.8**| - | - |**87.0**|**87.0**|
99
+ | |
100
+ | WizardMath-13B | 14.0 | 14.2 | 14.9 | 15.0 | 63.9 | 63.9 | 63.7 | 63.7 |
101
+ | MAmmoTH-13B | 12.9 | 10.8 | 11.8 | 12.3 | 56.3 | 56.2 | 56.1 | 56.2 |
102
+ | MetaMath-13B | 22.4 | 22.5 | 22.6 | 22.7 | 72.3 | 71.0 | 70.9 | 70.9 |
103
+ |**Xwin-Math-13B-V1.0** | - | - | 21.6 | 21.7 | - | - | 76.2 | 76.2 |
104
+ | |
105
+ | WizardMath-7B | 10.7 | 10.3 | 10.9 | 10.9 | 54.9 | 55.2 | 55.0 | 55.0 |
106
+ | MAmmoTH-7B | 10.4 | 8.6 | 9.1 | 9.6 | 50.5 | 50.2 | 50.2 | 50.2 |
107
+ | MetaMath-7B | 19.8 | 19.6 | 19.9 | 20.1 | 66.5 | 66.6 | 66.6 | 66.6 |
108
+ |**Xwin-Math-7B-V1.0** | - | - | 17.3 | 17.4 | - | - | 66.6 | 66.6 |
109
+
110
+ ### Installation
111
+
112
+ Before you start, please install the requirements.
113
+
114
+ ```bash
115
+ pip install -r requirements.txt
116
+ ```
117
+
118
+ We tested our result using `python 3.8` and `cuda 11.8`. We recommend you use docker.
119
+ ```bash
120
+ docker run --gpus all -it --rm --ipc=host superbench/dev:cuda11.8
121
+ ```
122
+
123
+ ### Generate
124
+
125
+ To generate the model's responses, you can use the `generate.py` script. Please be aware that generating responses is separate from verifying their correctness. After that, we will then check for their correctness.
126
+
127
+ For the generation process, we use the Vicuna-v1.1 system prompt with chain-of-thought and format instruction. We also employ a greedy decoding strategy and set the maximum sequence length to 2048.
128
+ ```
129
+ "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {instruction} Give your solution in detail. In the end, write your final answer in the format of 'The answer is: <ANSWER>.'. ASSISTANT: "
130
+ ```
131
+
132
+ Here is an simple example to generate using [vLLM](https://docs.vllm.ai/en/latest/).
133
+ ```bash
134
+ cd eval
135
+
136
+ python generate.py --dataset_path dataset/gsm8k.json --model_path path/to/your/model --tensor_parallel_size 4
137
+ ```
138
+ By default the results will be output to the `eval/response`, using the prompt `eval/prompt/xwin_math.json`. If you wish to change the output path or use a different prompt
139
+ ```bash
140
+ python generate.py --dataset_path dataset/gsm8k.json --model_path path/to/your/model --tensor_parallel_size 4 --output_path /your/path --prompt_path /your/path
141
+ ```
142
+
143
+
144
+ We provide some datasets (in `eval/dataset`):
145
+ - `gsm8k.json`: GSM8K.
146
+ - `math.json`: MATH.
147
+ - `combination.json`: A combination of many benchmarks, can evaluate the OOD capability of the model.
148
+
149
+ If you wan't to use your own datasets, please format your dataset like this.
150
+
151
+ ```jsonc
152
+ [
153
+ {
154
+ "question": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
155
+ "answer": "18",
156
+ "type": "GSM8K",
157
+ "subtype": "",
158
+ "level": 0,
159
+ },
160
+ // ... more data items
161
+ ]
162
+ ```
163
+
164
+
165
+ ### Evaluate
166
+
167
+ To verify the accuracy of the answers after generation, you can use the `check.py script.
168
+
169
+ Here is an simple example
170
+ ```bash
171
+ cd eval
172
+
173
+ python eval.py /path/to/model/response
174
+ ```
175
+ The result will be saved in `eval/evaluation`
176
+
177
+ If you do not want to save the results or want to change the save path
178
+ ```bash
179
+ python eval.py --data_path /path/to/model/response --save_path /path/to/save --save_result True
180
+ ```
181
+
182
+ Once you run the script, the terminal will display the output as a table. This table will show the number of instances for each benchmark and the corresponding accuracy. Here is a hypothetical example of what the output might look like:
183
+
184
+ ||Type|Subtype|Level|Correct|Incorrect|Total|Accuracy|
185
+ |---|---|---|---|---|---|---|---|
186
+ |0|MAWPS|addsub|0|359|33|392|0.915816|
187
+ |1|MAWPS|multiarith|0|586|14|600|0.976667|
188
+ |...|
189
+
190
+
191
+ ## Citation
192
+ Please consider citing our work if you use the data or code in this repo.
193
+ ```
194
+ @software{xwin-math,
195
+ title = {Xwin-Math},
196
+ author = {Xwin-Math Team},
197
+ url = {https://github.com/Xwin-LM/Xwin-LM/Xwin-Math},
198
+ version = {pre-release},
199
+ year = {2023},
200
+ month = {11},
201
+ }
202
+ ```
203
+
204
+ ## Acknowledgements
205
+
206
+ Thanks to [Llama 2](https://ai.meta.com/llama/), [FastChat](https://github.com/lm-sys/FastChat), and [vLLM](https://github.com/vllm-project/vllm).
added_tokens.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ {
2
+ "[PAD]": 32000
3
+ }
config.json ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "/data/mnt/chen_llm/Llama-2-7b-hf",
3
+ "architectures": [
4
+ "LlamaForCausalLM"
5
+ ],
6
+ "bos_token_id": 1,
7
+ "eos_token_id": 2,
8
+ "hidden_act": "silu",
9
+ "hidden_size": 4096,
10
+ "initializer_range": 0.02,
11
+ "intermediate_size": 11008,
12
+ "max_position_embeddings": 4096,
13
+ "model_type": "llama",
14
+ "num_attention_heads": 32,
15
+ "num_hidden_layers": 32,
16
+ "num_key_value_heads": 32,
17
+ "pad_token_id": 0,
18
+ "pretraining_tp": 1,
19
+ "rms_norm_eps": 1e-05,
20
+ "rope_scaling": null,
21
+ "tie_word_embeddings": false,
22
+ "torch_dtype": "float32",
23
+ "transformers_version": "4.28.1",
24
+ "use_cache": false,
25
+ "vocab_size": 32001
26
+ }
generation_config.json ADDED
@@ -0,0 +1,10 @@
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "bos_token_id": 1,
3
+ "do_sample": true,
4
+ "eos_token_id": 2,
5
+ "max_length": 4096,
6
+ "pad_token_id": 0,
7
+ "temperature": 0.6,
8
+ "top_p": 0.9,
9
+ "transformers_version": "4.28.1"
10
+ }
pytorch_model-00001-of-00003.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:502146db64c298dc68049b3e42a462b9206bf577d8d5afeff0e601ddbd621346
3
+ size 9878008210
pytorch_model-00002-of-00003.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e1f9ec3ea2c915fef21810d153a42552c1193f025322b793879b045dde583470
3
+ size 9894803382
pytorch_model-00003-of-00003.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:45d00394ce58724f7f1293d82fc591a854492f94298e62d31064a8bdccdc2613
3
+ size 7181008761
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,330 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "metadata": {
3
+ "total_size": 26953703424
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "pytorch_model-00003-of-00003.bin",
7
+ "model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
8
+ "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
9
+ "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
10
+ "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
11
+ "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
12
+ "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
13
+ "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
14
+ "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
15
+ "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
16
+ "model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
17
+ "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
18
+ "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
19
+ "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
20
+ "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
21
+ "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
22
+ "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
23
+ "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
24
+ "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
25
+ "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
26
+ "model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
27
+ "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
28
+ "model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
29
+ "model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
30
+ "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
31
+ "model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
32
+ "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
33
+ "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
34
+ "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
35
+ "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
36
+ "model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
37
+ "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
38
+ "model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
39
+ "model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
40
+ "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
41
+ "model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
42
+ "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
43
+ "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
44
+ "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
45
+ "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
46
+ "model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
47
+ "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
48
+ "model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
49
+ "model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
50
+ "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
51
+ "model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
52
+ "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
53
+ "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
54
+ "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
55
+ "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
56
+ "model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
57
+ "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
58
+ "model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
59
+ "model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
60
+ "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
61
+ "model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
62
+ "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
63
+ "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
64
+ "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
65
+ "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
66
+ "model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
67
+ "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
68
+ "model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
69
+ "model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
70
+ "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
71
+ "model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
72
+ "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
73
+ "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
74
+ "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
75
+ "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
76
+ "model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
77
+ "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
78
+ "model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
79
+ "model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
80
+ "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
81
+ "model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
82
+ "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
83
+ "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
84
+ "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
85
+ "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
86
+ "model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
87
+ "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
88
+ "model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
89
+ "model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
90
+ "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
91
+ "model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
92
+ "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
93
+ "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
94
+ "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
95
+ "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
96
+ "model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
97
+ "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
98
+ "model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
99
+ "model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
100
+ "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
101
+ "model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
102
+ "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
103
+ "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
104
+ "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
105
+ "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
106
+ "model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
107
+ "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
108
+ "model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
109
+ "model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
110
+ "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
111
+ "model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
112
+ "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
113
+ "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
114
+ "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
115
+ "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
116
+ "model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
117
+ "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
118
+ "model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
119
+ "model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
120
+ "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
121
+ "model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
122
+ "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
123
+ "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
124
+ "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
125
+ "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
126
+ "model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
127
+ "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
128
+ "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
129
+ "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
130
+ "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
131
+ "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
132
+ "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
133
+ "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
134
+ "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
135
+ "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
136
+ "model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
137
+ "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
138
+ "model.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
139
+ "model.layers.20.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
140
+ "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
141
+ "model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
142
+ "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
143
+ "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
144
+ "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
145
+ "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
146
+ "model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
147
+ "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
148
+ "model.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
149
+ "model.layers.21.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
150
+ "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
151
+ "model.layers.21.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
152
+ "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
153
+ "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
154
+ "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
155
+ "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
156
+ "model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
157
+ "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
158
+ "model.layers.22.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
159
+ "model.layers.22.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
160
+ "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
161
+ "model.layers.22.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
162
+ "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
163
+ "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
164
+ "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
165
+ "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
166
+ "model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
167
+ "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
168
+ "model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
169
+ "model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
170
+ "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
171
+ "model.layers.23.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
172
+ "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
173
+ "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
174
+ "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
175
+ "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
176
+ "model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
177
+ "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
178
+ "model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
179
+ "model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
180
+ "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
181
+ "model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
182
+ "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
183
+ "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
184
+ "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
185
+ "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
186
+ "model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
187
+ "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
188
+ "model.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
189
+ "model.layers.25.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
190
+ "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
191
+ "model.layers.25.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
192
+ "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
193
+ "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
194
+ "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
195
+ "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
196
+ "model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
197
+ "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
198
+ "model.layers.26.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
199
+ "model.layers.26.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
200
+ "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
201
+ "model.layers.26.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
202
+ "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
203
+ "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
204
+ "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
205
+ "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
206
+ "model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
207
+ "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
208
+ "model.layers.27.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
209
+ "model.layers.27.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
210
+ "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
211
+ "model.layers.27.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
212
+ "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
213
+ "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
214
+ "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
215
+ "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
216
+ "model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
217
+ "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
218
+ "model.layers.28.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
219
+ "model.layers.28.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
220
+ "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
221
+ "model.layers.28.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
222
+ "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
223
+ "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
224
+ "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
225
+ "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
226
+ "model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
227
+ "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
228
+ "model.layers.29.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
229
+ "model.layers.29.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
230
+ "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
231
+ "model.layers.29.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
232
+ "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
233
+ "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
234
+ "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
235
+ "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
236
+ "model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
237
+ "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
238
+ "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
239
+ "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
240
+ "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
241
+ "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
242
+ "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
243
+ "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
244
+ "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
245
+ "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
246
+ "model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
247
+ "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
248
+ "model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
249
+ "model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
250
+ "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
251
+ "model.layers.30.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
252
+ "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
253
+ "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
254
+ "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
255
+ "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
256
+ "model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
257
+ "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
258
+ "model.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
259
+ "model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
260
+ "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
261
+ "model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
262
+ "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
263
+ "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
264
+ "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
265
+ "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
266
+ "model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
267
+ "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
268
+ "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
269
+ "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
270
+ "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
271
+ "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
272
+ "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
273
+ "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
274
+ "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
275
+ "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
276
+ "model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
277
+ "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
278
+ "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
279
+ "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
280
+ "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
281
+ "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
282
+ "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
283
+ "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
284
+ "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
285
+ "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
286
+ "model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
287
+ "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
288
+ "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
289
+ "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
290
+ "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
291
+ "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
292
+ "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
293
+ "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
294
+ "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
295
+ "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
296
+ "model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
297
+ "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
298
+ "model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
299
+ "model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
300
+ "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
301
+ "model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
302
+ "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
303
+ "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
304
+ "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
305
+ "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
306
+ "model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
307
+ "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
308
+ "model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
309
+ "model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
310
+ "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
311
+ "model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
312
+ "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
313
+ "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
314
+ "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
315
+ "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
316
+ "model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
317
+ "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
318
+ "model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
319
+ "model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
320
+ "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
321
+ "model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
322
+ "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
323
+ "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
324
+ "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
325
+ "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
326
+ "model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
327
+ "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
328
+ "model.norm.weight": "pytorch_model-00003-of-00003.bin"
329
+ }
330
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "[PAD]",
17
+ "unk_token": {
18
+ "content": "<unk>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
tokenizer.model ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
3
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "bos_token": {
5
+ "__type": "AddedToken",
6
+ "content": "<s>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false
11
+ },
12
+ "clean_up_tokenization_spaces": false,
13
+ "eos_token": {
14
+ "__type": "AddedToken",
15
+ "content": "</s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false
20
+ },
21
+ "legacy": false,
22
+ "model_max_length": 2048,
23
+ "pad_token": null,
24
+ "padding_side": "right",
25
+ "sp_model_kwargs": {},
26
+ "tokenizer_class": "LlamaTokenizer",
27
+ "unk_token": {
28
+ "__type": "AddedToken",
29
+ "content": "<unk>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false
34
+ }
35
+ }