
collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1017
  • Num Input Tokens Seen: 30391200
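
A minimal loading sketch for this checkpoint, assuming the weights are hosted under the repo id shown below and that a recent `transformers` release with Gemma-2 support (>= 4.44.0) is installed; the prompt and generation settings are placeholders, not part of the original card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2"

# Load tokenizer and model; bfloat16 is an assumption about the stored weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Placeholder prompt to sanity-check generation.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```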

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
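
A minimal sketch of how these hyperparameters could be expressed with the Hugging Face `Trainer`; the output directory, logging/evaluation cadence, and bf16 flag are assumptions for illustration, not taken from the original training script:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2",  # placeholder path
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective train batch size
    seed=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,            # assumption: matches the bfloat16 checkpoint
    eval_strategy="steps",
    eval_steps=5,         # matches the 5-step evaluation interval in the results table
    logging_steps=5,
)
```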

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
No log 0 0 1.3956 0
1.6925 0.0091 5 1.3858 274360
1.4659 0.0183 10 1.3191 554728
1.4901 0.0274 15 1.2524 829128
1.2538 0.0365 20 1.1937 1108016
1.2819 0.0457 25 1.1684 1390936
1.0879 0.0548 30 1.1576 1671768
1.0761 0.0640 35 1.1720 1947856
0.9364 0.0731 40 1.1667 2228328
0.8285 0.0822 45 1.2083 2515248
0.7714 0.0914 50 1.2007 2796696
0.7316 0.1005 55 1.2211 3077936
0.5592 0.1096 60 1.2119 3353176
0.5585 0.1188 65 1.2018 3626024
0.4803 0.1279 70 1.2017 3898832
0.5021 0.1370 75 1.1963 4175912
0.4514 0.1462 80 1.2121 4455072
0.3612 0.1553 85 1.1931 4727720
0.4515 0.1645 90 1.1881 5009488
0.4461 0.1736 95 1.1880 5282608
0.5034 0.1827 100 1.1860 5553496
0.5685 0.1919 105 1.1842 5836064
0.4516 0.2010 110 1.1854 6114952
0.2958 0.2101 115 1.1750 6392272
0.3735 0.2193 120 1.1766 6663208
0.3907 0.2284 125 1.1676 6944456
0.4901 0.2376 130 1.1709 7221960
0.3111 0.2467 135 1.1608 7500464
0.3151 0.2558 140 1.1681 7786536
0.3311 0.2650 145 1.1629 8061032
0.3119 0.2741 150 1.1624 8339776
0.425 0.2832 155 1.1626 8614064
0.3599 0.2924 160 1.1609 8885704
0.3478 0.3015 165 1.1554 9166584
0.4074 0.3106 170 1.1529 9453272
0.24 0.3198 175 1.1585 9734480
0.3161 0.3289 180 1.1508 10011232
0.3567 0.3381 185 1.1568 10284712
0.3651 0.3472 190 1.1469 10565320
0.2963 0.3563 195 1.1513 10834768
0.3133 0.3655 200 1.1498 11114320
0.4982 0.3746 205 1.1447 11395816
0.3136 0.3837 210 1.1435 11676048
0.2945 0.3929 215 1.1452 11957056
0.2632 0.4020 220 1.1417 12225504
0.2754 0.4111 225 1.1421 12506816
0.2892 0.4203 230 1.1411 12778688
0.3303 0.4294 235 1.1351 13052448
0.3272 0.4386 240 1.1422 13325752
0.2219 0.4477 245 1.1361 13612800
0.3318 0.4568 250 1.1347 13888688
0.3058 0.4660 255 1.1358 14167640
0.3574 0.4751 260 1.1317 14443576
0.3944 0.4842 265 1.1296 14722000
0.3048 0.4934 270 1.1306 14994688
0.2954 0.5025 275 1.1313 15271576
0.3244 0.5116 280 1.1269 15548760
0.371 0.5208 285 1.1297 15821744
0.3526 0.5299 290 1.1274 16091768
0.2937 0.5391 295 1.1271 16364464
0.3097 0.5482 300 1.1230 16641960
0.3057 0.5573 305 1.1273 16918448
0.3099 0.5665 310 1.1251 17193440
0.283 0.5756 315 1.1235 17470240
0.3392 0.5847 320 1.1248 17749104
0.3276 0.5939 325 1.1205 18032184
0.2521 0.6030 330 1.1216 18317360
0.2278 0.6122 335 1.1183 18588736
0.2214 0.6213 340 1.1208 18864160
0.3554 0.6304 345 1.1189 19143568
0.2126 0.6396 350 1.1188 19430928
0.3241 0.6487 355 1.1182 19712432
0.2468 0.6578 360 1.1167 19992936
0.302 0.6670 365 1.1179 20275360
0.225 0.6761 370 1.1145 20554416
0.2699 0.6852 375 1.1150 20833584
0.2959 0.6944 380 1.1127 21116288
0.3684 0.7035 385 1.1135 21393272
0.2894 0.7127 390 1.1132 21664504
0.3468 0.7218 395 1.1104 21945840
0.3365 0.7309 400 1.1112 22224640
0.2756 0.7401 405 1.1138 22492512
0.2134 0.7492 410 1.1097 22774128
0.273 0.7583 415 1.1099 23054632
0.248 0.7675 420 1.1095 23332744
0.4175 0.7766 425 1.1101 23610928
0.2982 0.7857 430 1.1105 23886096
0.2497 0.7949 435 1.1085 24164752
0.2912 0.8040 440 1.1079 24441944
0.3517 0.8132 445 1.1078 24716256
0.3852 0.8223 450 1.1070 24992216
0.3735 0.8314 455 1.1088 25271800
0.3185 0.8406 460 1.1092 25558096
0.2549 0.8497 465 1.1083 25837144
0.1872 0.8588 470 1.1066 26120576
0.2247 0.8680 475 1.1073 26393552
0.2985 0.8771 480 1.1055 26672072
0.27 0.8862 485 1.1037 26957208
0.2618 0.8954 490 1.1059 27236264
0.2642 0.9045 495 1.1053 27515256
0.2234 0.9137 500 1.1039 27791360
0.3124 0.9228 505 1.1068 28070688
0.3348 0.9319 510 1.1028 28340240
0.3423 0.9411 515 1.1021 28613928
0.24 0.9502 520 1.1043 28889472
0.2406 0.9593 525 1.1058 29170016
0.2347 0.9685 530 1.1031 29451680
0.2342 0.9776 535 1.1043 29728536
0.3459 0.9868 540 1.1039 30007456
0.2486 0.9959 545 1.1014 30279832

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1