---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: gemma-2-2b_hs2_iter1_sftsd2
  results: []
---

# gemma-2-2b_hs2_iter1_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 1.1002
- Num Input Tokens Seen: 14387280
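
A minimal usage sketch in Python, assuming the checkpoint is published under the repo id `jkazdan/gemma-2-2b_hs2_iter1_sftsd2` (inferred from this card, not confirmed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; replace with the actual path if it differs.
model_id = "jkazdan/gemma-2-2b_hs2_iter1_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy generation of a short continuation.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```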

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
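
A sketch of these settings as Hugging Face `TrainingArguments` (the `trl`/`sft` tags suggest TRL's `SFTTrainer` was used, which accepts the same fields; the `output_dir` is an assumption):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gemma-2-2b_hs2_iter1_sftsd2",  # assumed output path
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```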

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.577         | 0.0197 | 5    | 1.3562          | 276808            |
| 1.4016        | 0.0393 | 10   | 1.2393          | 559240            |
| 1.3349        | 0.0590 | 15   | 1.1712          | 839888            |
| 1.1987        | 0.0787 | 20   | 1.1441          | 1127136           |
| 1.1874        | 0.0984 | 25   | 1.1206          | 1413720           |
| 1.1064        | 0.1180 | 30   | 1.1225          | 1699736           |
| 1.0992        | 0.1377 | 35   | 1.1192          | 1984048           |
| 1.088         | 0.1574 | 40   | 1.1313          | 2268240           |
| 1.0345        | 0.1770 | 45   | 1.1271          | 2551264           |
| 0.9586        | 0.1967 | 50   | 1.1418          | 2825616           |
| 0.8898        | 0.2164 | 55   | 1.1422          | 3103504           |
| 0.8212        | 0.2360 | 60   | 1.1598          | 3386816           |
| 0.895         | 0.2557 | 65   | 1.1488          | 3670168           |
| 0.8268        | 0.2754 | 70   | 1.1537          | 3955840           |
| 0.7428        | 0.2951 | 75   | 1.1512          | 4235752           |
| 0.9565        | 0.3147 | 80   | 1.1497          | 4519608           |
| 0.7561        | 0.3344 | 85   | 1.1456          | 4805928           |
| 0.7523        | 0.3541 | 90   | 1.1405          | 5094240           |
| 0.6549        | 0.3737 | 95   | 1.1477          | 5379072           |
| 0.6742        | 0.3934 | 100  | 1.1458          | 5664776           |
| 0.6537        | 0.4131 | 105  | 1.1440          | 5946192           |
| 0.6865        | 0.4328 | 110  | 1.1431          | 6228232           |
| 0.7215        | 0.4524 | 115  | 1.1393          | 6512048           |
| 0.634         | 0.4721 | 120  | 1.1381          | 6797648           |
| 0.6158        | 0.4918 | 125  | 1.1367          | 7080376           |
| 0.7484        | 0.5114 | 130  | 1.1402          | 7360240           |
| 0.5453        | 0.5311 | 135  | 1.1329          | 7642072           |
| 0.688         | 0.5508 | 140  | 1.1310          | 7924024           |
| 0.7336        | 0.5704 | 145  | 1.1333          | 8210024           |
| 0.6394        | 0.5901 | 150  | 1.1276          | 8496496           |
| 0.7081        | 0.6098 | 155  | 1.1240          | 8781736           |
| 0.6234        | 0.6295 | 160  | 1.1242          | 9061576           |
| 0.5486        | 0.6491 | 165  | 1.1228          | 9347960           |
| 0.5489        | 0.6688 | 170  | 1.1227          | 9631776           |
| 0.4972        | 0.6885 | 175  | 1.1197          | 9914096           |
| 0.5114        | 0.7081 | 180  | 1.1195          | 10197376          |
| 0.531         | 0.7278 | 185  | 1.1164          | 10475760          |
| 0.4653        | 0.7475 | 190  | 1.1152          | 10758120          |
| 0.5525        | 0.7672 | 195  | 1.1123          | 11038032          |
| 0.5382        | 0.7868 | 200  | 1.1127          | 11320808          |
| 0.5825        | 0.8065 | 205  | 1.1093          | 11603064          |
| 0.5529        | 0.8262 | 210  | 1.1100          | 11890568          |
| 0.4708        | 0.8458 | 215  | 1.1083          | 12169248          |
| 0.4272        | 0.8655 | 220  | 1.1071          | 12450528          |
| 0.5019        | 0.8852 | 225  | 1.1053          | 12739456          |
| 0.5628        | 0.9048 | 230  | 1.1033          | 13021752          |
| 0.6113        | 0.9245 | 235  | 1.1038          | 13309360          |
| 0.4898        | 0.9442 | 240  | 1.1024          | 13594384          |
| 0.5342        | 0.9639 | 245  | 1.1010          | 13874576          |
| 0.5051        | 0.9835 | 250  | 1.1015          | 14161392          |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
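
A quick check that a local environment matches these versions (a convenience sketch, not part of the original card):

```python
import datasets
import tokenizers
import torch
import transformers

# Expected versions per the list above.
print("Transformers:", transformers.__version__)  # 4.44.0
print("PyTorch:", torch.__version__)              # 2.4.0+cu121
print("Datasets:", datasets.__version__)          # 2.20.0
print("Tokenizers:", tokenizers.__version__)      # 0.19.1
```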