
lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_1e-4_lora2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on the tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 dataset. It achieves the following results on the evaluation set:

  • Loss: 2.7443
  • Accuracy: 0.6446
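
For reference, if this loss is the mean token-level cross-entropy that the Hugging Face Trainer reports by default (an assumption; the card does not say how it was computed), it converts to perplexity via exp(loss):

```python
# Hedged: convert the reported evaluation loss to perplexity,
# assuming it is a mean token-level cross-entropy.
import math

eval_loss = 2.7443
print(math.exp(eval_loss))  # ~15.6
```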

Model description

More information needed. Going by the name alone, the `_lora2` suffix suggests this repository holds a LoRA (PEFT) adapter for meta-llama/Llama-2-7b-hf rather than full model weights, and the `1e-4` in the name matches the learning rate in the training configuration below.
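
If that reading is right, the adapter should load with the PEFT library. A minimal, unverified sketch (the repository contents are not confirmed, and access to the gated Llama-2 base model is required):

```python
# Hedged sketch: load this repo as a PEFT/LoRA adapter on top of Llama-2-7b.
# Assumes the repo contains adapter_config.json and adapter weights.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_1e-4_lora2"

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.float16,  # half precision to fit a single GPU
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# The prompt format is a guess; the card does not document how QA pairs
# were formatted during training.
prompt = "Question: who wrote the play Romeo and Juliet?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```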

Intended uses & limitations

More information needed

Training and evaluation data

More information needed. Judging from the dataset identifier, training used 6,000 examples and evaluation used 6,489 examples of a recite-only QA variant of Natural Questions (tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3).
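
A quick way to inspect the data, assuming the dataset is public on the Hub (split names and fields are not documented here):

```python
# Hedged sketch: load the dataset named in this card and print one example.
from datasets import load_dataset

ds = load_dataset("tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3")
print(ds)                         # available splits and row counts
first_split = list(ds.keys())[0]
print(ds[first_split][0])         # one raw example
```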

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 50.0
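
A hedged reconstruction of the corresponding TrainingArguments under Transformers 4.34.0. The output path is hypothetical, and the LoRA-specific settings (rank, alpha, target modules) are not documented in this card, so only the Trainer-level options are shown:

```python
# Hedged reconstruction of the training configuration from the list above.
# Effective batch size: 2 per device x 4 GPUs x 4 accumulation steps = 32.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lora2_output",  # hypothetical; the real path is not given
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=50.0,
    lr_scheduler_type="constant",
    warmup_ratio=0.05,  # listed in the card; ignored by the plain "constant" scheduler
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```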

Training results

| Training Loss | Epoch | Step | Accuracy | Validation Loss |
|:-------------:|:-----:|:----:|:--------:|:---------------:|
| 1.3346 | 1.0 | 187 | 0.6671 | 1.2008 |
| 1.1638 | 2.0 | 375 | 0.6681 | 1.1940 |
| 1.0624 | 3.0 | 562 | 0.6676 | 1.2021 |
| 0.9457 | 4.0 | 750 | 0.6663 | 1.2319 |
| 0.8373 | 5.0 | 937 | 0.6639 | 1.2842 |
| 0.7159 | 6.0 | 1125 | 0.6607 | 1.3518 |
| 0.5964 | 7.0 | 1312 | 0.6582 | 1.4532 |
| 0.4861 | 8.0 | 1500 | 0.6549 | 1.5512 |
| 0.3754 | 9.0 | 1687 | 0.6529 | 1.6544 |
| 0.2938 | 10.0 | 1875 | 0.6505 | 1.7852 |
| 0.2268 | 11.0 | 2062 | 0.6490 | 1.9338 |
| 0.1792 | 12.0 | 2250 | 0.6479 | 2.0116 |
| 0.1418 | 13.0 | 2437 | 0.6470 | 2.1431 |
| 0.1171 | 14.0 | 2625 | 0.6447 | 2.2358 |
| 0.1038 | 15.0 | 2812 | 0.6461 | 2.3164 |
| 0.0958 | 16.0 | 3000 | 0.6452 | 2.3597 |
| 0.0848 | 17.0 | 3187 | 0.6453 | 2.4430 |
| 0.0804 | 18.0 | 3375 | 0.6441 | 2.4833 |
| 0.0786 | 19.0 | 3562 | 0.6439 | 2.4723 |
| 0.0786 | 20.0 | 3750 | 0.6437 | 2.5403 |
| 0.0792 | 21.0 | 3937 | 0.6441 | 2.4761 |
| 0.0792 | 22.0 | 4125 | 0.6447 | 2.5409 |
| 0.0781 | 23.0 | 4312 | 0.6449 | 2.5628 |
| 0.0766 | 24.0 | 4500 | 0.6446 | 2.5601 |
| 0.0709 | 25.0 | 4687 | 0.6453 | 2.5480 |
| 0.07 | 26.0 | 4875 | 0.6455 | 2.6145 |
| 0.0704 | 27.0 | 5062 | 0.6437 | 2.6258 |
| 0.073 | 28.0 | 5250 | 0.6449 | 2.5735 |
| 0.0738 | 29.0 | 5437 | 0.6441 | 2.6097 |
| 0.0727 | 30.0 | 5625 | 0.6427 | 2.5475 |
| 0.0727 | 31.0 | 5812 | 0.6435 | 2.6130 |
| 0.0715 | 32.0 | 6000 | 0.6441 | 2.6316 |
| 0.0679 | 33.0 | 6187 | 0.6442 | 2.5900 |
| 0.0684 | 34.0 | 6375 | 0.6445 | 2.6209 |
| 0.0676 | 35.0 | 6562 | 0.6452 | 2.6090 |
| 0.068 | 36.0 | 6750 | 0.6451 | 2.6729 |
| 0.0682 | 37.0 | 6937 | 0.6456 | 2.6381 |
| 0.0695 | 38.0 | 7125 | 0.6441 | 2.7113 |
| 0.07 | 39.0 | 7312 | 0.6438 | 2.6791 |
| 0.0709 | 40.0 | 7500 | 0.6444 | 2.6901 |
| 0.0662 | 41.0 | 7687 | 0.6455 | 2.6341 |
| 0.0664 | 42.0 | 7875 | 0.6451 | 2.7369 |
| 0.0658 | 43.0 | 8062 | 0.6452 | 2.6964 |
| 0.0677 | 44.0 | 8250 | 0.6442 | 2.6634 |
| 0.0668 | 45.0 | 8437 | 0.6436 | 2.7614 |
| 0.0657 | 46.0 | 8625 | 0.6446 | 2.7360 |
| 0.0656 | 47.0 | 8812 | 0.6441 | 2.7653 |
| 0.0658 | 48.0 | 9000 | 0.6453 | 2.7756 |
| 0.0626 | 49.0 | 9187 | 0.6464 | 2.7578 |
| 0.0666 | 49.87 | 9350 | 0.6446 | 2.7443 |

Framework versions

  • Transformers 4.34.0
  • PyTorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.14.1

Model tree for tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_1e-4_lora2

Finetuned from meta-llama/Llama-2-7b-hf

Dataset used to train tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_1e-4_lora2

  • tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3

Evaluation results

  • Accuracy on tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 (self-reported): 0.645