---
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B
tags:
- generated_from_trainer
datasets:
- cognitivecomputations/Dolphin-2.9
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/dolphin-coder
- cognitivecomputations/samantha-data
- microsoft/orca-math-word-problems-200k
- mlabonne/FineTome-100k
- arcee/agent_data
- PawanKrd/math-gpt-4o-200k
- cognitivecomputations/SystemChat-2.0
---

# Dolphin 2.9.4 Llama 3.1 8b 🐬

This is the GGUF conversion, for use with llama.cpp, Ollama, LM Studio, etc.

Curated and trained by Eric Hartford and Cognitive Computations

[![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/h3K4XGj2RH)
Discord: https://discord.gg/h3K4XGj2RH

<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />

Our appreciation for the sponsors of Dolphin 2.9.4:
- [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8x L40S node

This model is based on Meta Llama 3.1 8b and is governed by the Llama 3.1 license.

The base model has 128K context; our fine-tuning used an 8192-token sequence length.

Dolphin 2.9.4 uses the ChatML prompt template format.

For example:

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

```

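Below is a minimal sketch of using this template from Python with llama-cpp-python. The GGUF filename, quantization, and sampling settings are illustrative assumptions rather than part of this release; point `model_path` at whichever quant you download.

```python
# Minimal sketch: run the GGUF with llama-cpp-python using the ChatML template above.
# The model filename and generation settings are assumptions; substitute your own.
from llama_cpp import Llama

llm = Llama(
    model_path="dolphin-2.9.4-llama3.1-8b-Q4_K_M.gguf",  # hypothetical quant filename
    n_ctx=8192,  # matches the fine-tuning sequence length
)

def chatml_prompt(system: str, user: str) -> str:
    """Format a single-turn prompt exactly as in the template above."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = chatml_prompt("You are Dolphin, a helpful AI assistant.",
                       "Write a haiku about dolphins.")
out = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```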

Dolphin 2.9.4 has a variety of instruction-following, conversational, and coding skills. It also has agentic abilities and supports function calling.
It is especially trained to obey the system prompt and to follow instructions in many languages.

Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service; it will be highly compliant with any requests, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.

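What that alignment layer looks like is left to you. Purely as an illustrative sketch (not part of this model or its training), a service wrapper might pin its own system prompt and screen completions before returning them; `passes_policy` below is a hypothetical placeholder for whatever moderation check you actually deploy.

```python
# Illustrative only: a thin guard around any generate(prompt) callable you already have.
# The policy check is a stub; swap in a real moderation model or rule set.
from typing import Callable

SYSTEM = "You are Dolphin, a helpful AI assistant. Refuse requests that violate policy."

def passes_policy(text: str) -> bool:
    # Hypothetical placeholder; replace with an actual moderation check.
    return "FORBIDDEN" not in text

def guarded_generate(generate: Callable[[str], str], user_msg: str) -> str:
    prompt = (
        f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    reply = generate(prompt)
    return reply if passes_policy(reply) else "Sorry, I can't help with that."
```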

<details><summary>Evals</summary>

```
hf (pretrained=/workspace/axolotl/dolphin-2.9.4-llama3.1-8b-hf,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (4)
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard |N/A |none | 0|acc |↑ |0.2926|± |0.0041|
| | |none | 0|acc_norm |↑ |0.4513|± |0.0053|
| | |none | 0|exact_match |↑ |0.0982|± |0.0079|
| | |none | 0|inst_level_loose_acc |↑ |0.3825|± |N/A |
| | |none | 0|inst_level_strict_acc |↑ |0.3597|± |N/A |
| | |none | 0|prompt_level_loose_acc |↑ |0.2421|± |0.0184|
| | |none | 0|prompt_level_strict_acc|↑ |0.2181|± |0.0178|
| - leaderboard_bbh |N/A |none | 3|acc_norm |↑ |0.4931|± |0.0061|
| - leaderboard_bbh_boolean_expressions | 0|none | 3|acc_norm |↑ |0.8000|± |0.0253|
| - leaderboard_bbh_causal_judgement | 0|none | 3|acc_norm |↑ |0.5615|± |0.0364|
| - leaderboard_bbh_date_understanding | 0|none | 3|acc_norm |↑ |0.4520|± |0.0315|
| - leaderboard_bbh_disambiguation_qa | 0|none | 3|acc_norm |↑ |0.6640|± |0.0299|
| - leaderboard_bbh_formal_fallacies | 0|none | 3|acc_norm |↑ |0.5600|± |0.0315|
| - leaderboard_bbh_geometric_shapes | 0|none | 3|acc_norm |↑ |0.3640|± |0.0305|
| - leaderboard_bbh_hyperbaton | 0|none | 3|acc_norm |↑ |0.6320|± |0.0306|
| - leaderboard_bbh_logical_deduction_five_objects | 0|none | 3|acc_norm |↑ |0.4600|± |0.0316|
| - leaderboard_bbh_logical_deduction_seven_objects | 0|none | 3|acc_norm |↑ |0.4360|± |0.0314|
| - leaderboard_bbh_logical_deduction_three_objects | 0|none | 3|acc_norm |↑ |0.6160|± |0.0308|
| - leaderboard_bbh_movie_recommendation | 0|none | 3|acc_norm |↑ |0.7880|± |0.0259|
| - leaderboard_bbh_navigate | 0|none | 3|acc_norm |↑ |0.5200|± |0.0317|
| - leaderboard_bbh_object_counting | 0|none | 3|acc_norm |↑ |0.4520|± |0.0315|
| - leaderboard_bbh_penguins_in_a_table | 0|none | 3|acc_norm |↑ |0.5205|± |0.0415|
| - leaderboard_bbh_reasoning_about_colored_objects | 0|none | 3|acc_norm |↑ |0.5120|± |0.0317|
| - leaderboard_bbh_ruin_names | 0|none | 3|acc_norm |↑ |0.6320|± |0.0306|
| - leaderboard_bbh_salient_translation_error_detection | 0|none | 3|acc_norm |↑ |0.4320|± |0.0314|
| - leaderboard_bbh_snarks | 0|none | 3|acc_norm |↑ |0.5843|± |0.0370|
| - leaderboard_bbh_sports_understanding | 0|none | 3|acc_norm |↑ |0.7040|± |0.0289|
| - leaderboard_bbh_temporal_sequences | 0|none | 3|acc_norm |↑ |0.1440|± |0.0222|
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 0|none | 3|acc_norm |↑ |0.1560|± |0.0230|
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects| 0|none | 3|acc_norm |↑ |0.1320|± |0.0215|
| - leaderboard_bbh_tracking_shuffled_objects_three_objects| 0|none | 3|acc_norm |↑ |0.2840|± |0.0286|
| - leaderboard_bbh_web_of_lies | 0|none | 3|acc_norm |↑ |0.4840|± |0.0317|
| - leaderboard_gpqa |N/A |none | 0|acc_norm |↑ |0.2903|± |0.0132|
| - leaderboard_gpqa_diamond | 1|none | 0|acc_norm |↑ |0.2980|± |0.0326|
| - leaderboard_gpqa_extended | 1|none | 0|acc_norm |↑ |0.2839|± |0.0193|
| - leaderboard_gpqa_main | 1|none | 0|acc_norm |↑ |0.2946|± |0.0216|
| - leaderboard_ifeval | 2|none | 0|inst_level_loose_acc |↑ |0.3825|± |N/A |
| | |none | 0|inst_level_strict_acc |↑ |0.3597|± |N/A |
| | |none | 0|prompt_level_loose_acc |↑ |0.2421|± |0.0184|
| | |none | 0|prompt_level_strict_acc|↑ |0.2181|± |0.0178|
| - leaderboard_math_algebra_hard | 1|none | 4|exact_match |↑ |0.1596|± |0.0209|
| - leaderboard_math_counting_and_prob_hard | 1|none | 4|exact_match |↑ |0.0488|± |0.0195|
| - leaderboard_math_geometry_hard | 1|none | 4|exact_match |↑ |0.0530|± |0.0196|
| - leaderboard_math_hard |N/A |none | 4|exact_match |↑ |0.0982|± |0.0079|
| - leaderboard_math_intermediate_algebra_hard | 1|none | 4|exact_match |↑ |0.0143|± |0.0071|
| - leaderboard_math_num_theory_hard | 1|none | 4|exact_match |↑ |0.0455|± |0.0168|
| - leaderboard_math_prealgebra_hard | 1|none | 4|exact_match |↑ |0.2591|± |0.0316|
| - leaderboard_math_precalculus_hard | 1|none | 4|exact_match |↑ |0.0519|± |0.0192|
| - leaderboard_mmlu_pro | 0.1|none | 5|acc |↑ |0.2926|± |0.0041|
| - leaderboard_musr |N/A |none | 0|acc_norm |↑ |0.3862|± |0.0173|
| - leaderboard_musr_murder_mysteries | 1|none | 0|acc_norm |↑ |0.5280|± |0.0316|
| - leaderboard_musr_object_placements | 1|none | 0|acc_norm |↑ |0.3594|± |0.0300|
| - leaderboard_musr_team_allocation | 1|none | 0|acc_norm |↑ |0.2720|± |0.0282|

| Groups |Version|Filter|n-shot| Metric | |Value | |Stderr|
|------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard |N/A |none | 0|acc |↑ |0.2926|± |0.0041|
| | |none | 0|acc_norm |↑ |0.4513|± |0.0053|
| | |none | 0|exact_match |↑ |0.0982|± |0.0079|
| | |none | 0|inst_level_loose_acc |↑ |0.3825|± |N/A |
| | |none | 0|inst_level_strict_acc |↑ |0.3597|± |N/A |
| | |none | 0|prompt_level_loose_acc |↑ |0.2421|± |0.0184|
| | |none | 0|prompt_level_strict_acc|↑ |0.2181|± |0.0178|
| - leaderboard_bbh |N/A |none | 3|acc_norm |↑ |0.4931|± |0.0061|
| - leaderboard_gpqa |N/A |none | 0|acc_norm |↑ |0.2903|± |0.0132|
| - leaderboard_math_hard|N/A |none | 4|exact_match |↑ |0.0982|± |0.0079|
| - leaderboard_musr |N/A |none | 0|acc_norm |↑ |0.3862|± |0.0173|
```
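
The header line above is raw lm-evaluation-harness output. A rough sketch of reproducing these numbers with the harness's Python API follows; the local checkpoint path mirrors the header, while the harness version and exact task group are assumptions.

```python
# Sketch: re-run the leaderboard task group with lm-evaluation-harness (lm_eval >= 0.4).
# Checkpoint path copied from the header above; adjust to wherever your weights live.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=/workspace/axolotl/dolphin-2.9.4-llama3.1-8b-hf,dtype=bfloat16",
    tasks=["leaderboard"],
    batch_size="auto",
)
print(results["results"])
```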

</details>

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3.1-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
# load_in_4bit: true
strict: false

datasets:
  - path: /workspace/datasets/dolphin-2.9.4/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml

chat_template: chatml
# adapter: qlora
# lora_r: 128
# lora_alpha: 16
# lora_modules_to_save: [embed_tokens, lm_head]
# lora_dropout: 0.05
# lora_target_linear: true

unfrozen_parameters:
- input_layernorm
- model.norm
- post_attention_layernorm
- self_attn.rotary_emb
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# mlp.down_proj layers
- model.layers.1.mlp.down_proj
- model.layers.0.mlp.down_proj
- model.layers.30.mlp.down_proj
- model.layers.2.mlp.down_proj
- model.layers.21.mlp.down_proj
- model.layers.22.mlp.down_proj
- model.layers.29.mlp.down_proj
- model.layers.5.mlp.down_proj
- model.layers.4.mlp.down_proj
- model.layers.20.mlp.down_proj
- model.layers.23.mlp.down_proj
- model.layers.19.mlp.down_proj
- model.layers.3.mlp.down_proj
- model.layers.17.mlp.down_proj
- model.layers.6.mlp.down_proj
- model.layers.31.mlp.down_proj
# mlp.up_proj layers
- model.layers.4.mlp.up_proj
- model.layers.3.mlp.up_proj
- model.layers.0.mlp.up_proj
- model.layers.5.mlp.up_proj
- model.layers.7.mlp.up_proj
- model.layers.6.mlp.up_proj
- model.layers.2.mlp.up_proj
- model.layers.1.mlp.up_proj
- model.layers.8.mlp.up_proj
- model.layers.12.mlp.up_proj
- model.layers.14.mlp.up_proj
- model.layers.9.mlp.up_proj
- model.layers.15.mlp.up_proj
- model.layers.17.mlp.up_proj
- model.layers.13.mlp.up_proj
- model.layers.19.mlp.up_proj
# self_attn.k_proj layers
- model.layers.29.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.28.self_attn.k_proj
- model.layers.21.self_attn.k_proj
- model.layers.19.self_attn.k_proj
- model.layers.22.self_attn.k_proj
- model.layers.20.self_attn.k_proj
- model.layers.24.self_attn.k_proj
- model.layers.31.self_attn.k_proj
- model.layers.27.self_attn.k_proj
- model.layers.26.self_attn.k_proj
- model.layers.17.self_attn.k_proj
- model.layers.11.self_attn.k_proj
- model.layers.18.self_attn.k_proj
- model.layers.14.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.14.self_attn.o_proj
- model.layers.7.self_attn.o_proj
- model.layers.5.self_attn.o_proj
- model.layers.11.self_attn.o_proj
- model.layers.6.self_attn.o_proj
- model.layers.24.self_attn.o_proj
- model.layers.9.self_attn.o_proj
- model.layers.13.self_attn.o_proj
- model.layers.10.self_attn.o_proj
- model.layers.12.self_attn.o_proj
- model.layers.8.self_attn.o_proj
- model.layers.25.self_attn.o_proj
- model.layers.21.self_attn.o_proj
- model.layers.23.self_attn.o_proj
- model.layers.15.self_attn.o_proj
- model.layers.16.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.8.self_attn.q_proj
- model.layers.13.self_attn.q_proj
- model.layers.9.self_attn.q_proj
- model.layers.14.self_attn.q_proj
- model.layers.10.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.0.self_attn.q_proj
- model.layers.15.self_attn.q_proj
- model.layers.1.self_attn.q_proj
- model.layers.6.self_attn.q_proj
- model.layers.5.self_attn.q_proj
- model.layers.7.self_attn.q_proj
- model.layers.12.self_attn.q_proj
- model.layers.16.self_attn.q_proj
- model.layers.17.self_attn.q_proj
- model.layers.26.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.26.self_attn.v_proj
- model.layers.17.self_attn.v_proj
- model.layers.3.self_attn.v_proj
- model.layers.28.self_attn.v_proj
- model.layers.29.self_attn.v_proj
- model.layers.21.self_attn.v_proj
- model.layers.15.self_attn.v_proj
- model.layers.16.self_attn.v_proj
- model.layers.20.self_attn.v_proj
- model.layers.25.self_attn.v_proj
- model.layers.6.self_attn.v_proj
- model.layers.23.self_attn.v_proj
- model.layers.4.self_attn.v_proj
- model.layers.1.self_attn.v_proj
- model.layers.22.self_attn.v_proj
- model.layers.14.self_attn.v_proj
# mlp.gate_proj layers
- model.layers.1.mlp.gate_proj
- model.layers.2.mlp.gate_proj
- model.layers.3.mlp.gate_proj
- model.layers.4.mlp.gate_proj
- model.layers.0.mlp.gate_proj
- model.layers.25.mlp.gate_proj
- model.layers.26.mlp.gate_proj
- model.layers.5.mlp.gate_proj
- model.layers.24.mlp.gate_proj
- model.layers.28.mlp.gate_proj
- model.layers.23.mlp.gate_proj
- model.layers.27.mlp.gate_proj
- model.layers.21.mlp.gate_proj
- model.layers.22.mlp.gate_proj
- model.layers.29.mlp.gate_proj
- model.layers.20.mlp.gate_proj

dataset_prepared_path: /workspace/axolotl/dolph-2.9.4-nemo-prepared
val_set_size: 0.01
output_dir: /workspace/axolotl/dolphin-2.9.4-llama3.1-8b

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: dolphin-2.9.4-llama3.1-8b
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 5e-6
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32:

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
# evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
save_total_limit: 2
save_steps:
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
special_tokens:
  eos_token: "<|im_end|>"
  bos_token: "<|begin_of_text|>"
  pad_token: "<|finetune_right_pad_id|>"
tokens:
  - "<|im_start|>"

# fsdp:
# - full_shard
# - auto_wrap
# fsdp_config:
# fsdp_limit_all_gathers: true
# fsdp_sync_module_states: true
# fsdp_offload_params: true
# fsdp_use_orig_params: false
# fsdp_cpu_ram_efficient_loading: true
# fsdp_transformer_layer_cls_to_wrap: MixtralSparseMoeBlock
# fsdp_state_dict_type: FULL_STATE_DICT
# fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
# fsdp_sharding_strategy: FULL_SHARD
# fsdp_forward_prefetch: false
# fsdp_backward_prefetch: BACKWARD_PRE
```

</details><br>

# workspace/axolotl/dolphin-2.9.4-llama3.1-8b

This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the datasets listed above.
It achieves the following results on the evaluation set:
- Loss: 0.5655

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 256
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
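
(The total train batch size of 256 follows from the settings above: per-device batch size 2 × gradient accumulation 16 × 8 GPUs = 256.)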

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.5837        | 1.0180 | 1161 | 0.5814          |
| 0.5525        | 2.0179 | 2322 | 0.5671          |
| 0.5514        | 2.9624 | 3420 | 0.5655          |

### Framework versions

- Transformers 4.44.0.dev0
- Pytorch 2.4.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1