---
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
  - generated_from_trainer
datasets:
  - tyzhu/lmind_hotpot_train8000_eval7405_v1_qa
metrics:
  - accuracy
model-index:
  - name: lmind_hotpot_train8000_eval7405_v1_qa_5e-5_lora2
    results:
      - task:
          name: Causal Language Modeling
          type: text-generation
        dataset:
          name: tyzhu/lmind_hotpot_train8000_eval7405_v1_qa
          type: tyzhu/lmind_hotpot_train8000_eval7405_v1_qa
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.584886075949367
---

lmind_hotpot_train8000_eval7405_v1_qa_5e-5_lora2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on the tyzhu/lmind_hotpot_train8000_eval7405_v1_qa dataset. It achieves the following results on the evaluation set:

  • Loss: 3.6692
  • Accuracy: 0.5849
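
The "_lora2" suffix in the model name suggests this checkpoint is a LoRA adapter on top of the base model rather than a full set of fine-tuned weights. Under that assumption, a minimal loading and inference sketch looks like the following; the adapter repo id and the "Question: ... Answer:" prompt format are guesses taken from the model and dataset names, not confirmed by the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel  # assumes the checkpoint is a PEFT/LoRA adapter

base_id = "meta-llama/Llama-2-7b-hf"
# Assumed adapter repo id, taken from the model name on this card.
adapter_id = "tyzhu/lmind_hotpot_train8000_eval7405_v1_qa_5e-5_lora2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# The training data is HotpotQA-style question answering, so a plain
# "Question: ... Answer:" prompt is a reasonable guess at the input format.
prompt = (
    "Question: Which magazine was started first, "
    "Arthur's Magazine or First for Women?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the checkpoint instead contains full model weights rather than an adapter, loading it directly with AutoModelForCausalLM.from_pretrained(adapter_id) would be the way to go.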

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 50.0
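
For orientation, these settings map onto the standard transformers TrainingArguments roughly as sketched below. This is an illustration reconstructed from the list above, not the original training script; argument names not covered by the list (such as `output_dir` and `evaluation_strategy`) are assumptions.

```python
from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters above (Transformers 4.34).
training_args = TrainingArguments(
    output_dir="lmind_hotpot_train8000_eval7405_v1_qa_5e-5_lora2",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=2,  # 2 x 4 GPUs x 4 accumulation steps = 32 total
    per_device_eval_batch_size=2,   # 2 x 4 GPUs = 8 total
    gradient_accumulation_steps=4,
    num_train_epochs=50.0,
    lr_scheduler_type="constant",
    warmup_ratio=0.05,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",    # assumed: the results table reports one eval per epoch
)
```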

Training results

| Training Loss | Epoch | Step  | Accuracy | Validation Loss |
|---------------|-------|-------|----------|-----------------|
| 1.798  | 1.0  | 250   | 0.6067 | 1.8213 |
| 1.7    | 2.0  | 500   | 0.6077 | 1.8046 |
| 1.5869 | 3.0  | 750   | 0.6071 | 1.8293 |
| 1.4349 | 4.0  | 1000  | 0.6043 | 1.8974 |
| 1.3111 | 5.0  | 1250  | 0.6015 | 1.9769 |
| 1.197  | 6.0  | 1500  | 0.5992 | 2.0635 |
| 1.0729 | 7.0  | 1750  | 0.5975 | 2.1523 |
| 0.9833 | 8.0  | 2000  | 0.5947 | 2.2640 |
| 0.8672 | 9.0  | 2250  | 0.5924 | 2.3643 |
| 0.7883 | 10.0 | 2500  | 0.5908 | 2.4598 |
| 0.6879 | 11.0 | 2750  | 0.5890 | 2.5669 |
| 0.6295 | 12.0 | 3000  | 0.5885 | 2.7000 |
| 0.5545 | 13.0 | 3250  | 0.5851 | 2.8281 |
| 0.5208 | 14.0 | 3500  | 0.5853 | 2.8794 |
| 0.4679 | 15.0 | 3750  | 0.5863 | 2.9184 |
| 0.4464 | 16.0 | 4000  | 0.5852 | 3.0791 |
| 0.4136 | 17.0 | 4250  | 0.5856 | 3.0832 |
| 0.4021 | 18.0 | 4500  | 0.5847 | 3.0944 |
| 0.3776 | 19.0 | 4750  | 0.5828 | 3.2120 |
| 0.373  | 20.0 | 5000  | 0.5839 | 3.2298 |
| 0.3572 | 21.0 | 5250  | 0.5841 | 3.2434 |
| 0.3517 | 22.0 | 5500  | 0.5847 | 3.2606 |
| 0.3374 | 23.0 | 5750  | 0.5845 | 3.3392 |
| 0.3338 | 24.0 | 6000  | 0.5841 | 3.3489 |
| 0.3286 | 25.0 | 6250  | 0.5846 | 3.4036 |
| 0.3259 | 26.0 | 6500  | 0.5849 | 3.3878 |
| 0.3175 | 27.0 | 6750  | 0.5853 | 3.4960 |
| 0.3185 | 28.0 | 7000  | 0.5852 | 3.4873 |
| 0.3117 | 29.0 | 7250  | 0.5840 | 3.4780 |
| 0.3125 | 30.0 | 7500  | 0.5836 | 3.5383 |
| 0.3041 | 31.0 | 7750  | 0.5841 | 3.5253 |
| 0.3047 | 32.0 | 8000  | 0.5853 | 3.5283 |
| 0.2982 | 33.0 | 8250  | 0.5833 | 3.5511 |
| 0.3013 | 34.0 | 8500  | 0.5852 | 3.5445 |
| 0.295  | 35.0 | 8750  | 0.5841 | 3.5891 |
| 0.2988 | 36.0 | 9000  | 0.5833 | 3.6198 |
| 0.2939 | 37.0 | 9250  | 0.5842 | 3.5708 |
| 0.2952 | 38.0 | 9500  | 0.5833 | 3.6124 |
| 0.2927 | 39.0 | 9750  | 0.5840 | 3.6413 |
| 0.2931 | 40.0 | 10000 | 0.5828 | 3.6555 |
| 0.2891 | 41.0 | 10250 | 0.5841 | 3.6471 |
| 0.291  | 42.0 | 10500 | 0.5846 | 3.7233 |
| 0.2886 | 43.0 | 10750 | 0.5850 | 3.6348 |
| 0.289  | 44.0 | 11000 | 0.5839 | 3.6786 |
| 0.2846 | 45.0 | 11250 | 0.5845 | 3.6846 |
| 0.2858 | 46.0 | 11500 | 0.5855 | 3.7088 |
| 0.283  | 47.0 | 11750 | 0.5842 | 3.6938 |
| 0.2863 | 48.0 | 12000 | 0.5830 | 3.6793 |
| 0.2782 | 49.0 | 12250 | 0.5839 | 3.6805 |
| 0.2834 | 50.0 | 12500 | 0.5849 | 3.6692 |
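
As a side note on reading these numbers: if the reported loss is the mean per-token cross-entropy in nats (the usual convention for transformers causal-LM evaluation, assumed here rather than stated by the card), the final validation loss of 3.6692 corresponds to a perplexity of roughly exp(3.6692) ≈ 39.

```python
import math

final_eval_loss = 3.6692  # final-epoch validation loss from the table above

# Assuming the loss is the mean per-token cross-entropy in nats,
# perplexity = exp(loss).
print(f"approx. eval perplexity: {math.exp(final_eval_loss):.1f}")  # ~39.2
```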

Framework versions

  • Transformers 4.34.0
  • Pytorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.14.1