---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd1
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd1
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1132
- Num Input Tokens Seen: 71607360
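
The reported loss can be restated as a perplexity. The sketch below assumes the loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training with the Trainer but is not stated in this card. The function name `loss_to_perplexity` is illustrative only.

```python
import math


def loss_to_perplexity(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)


# Final evaluation loss reported above.
final_eval_loss = 1.1132
print(f"perplexity ~ {loss_to_perplexity(final_eval_loss):.3f}")
```

Under that assumption, the evaluation loss of 1.1132 corresponds to a perplexity of roughly 3.04.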
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
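
The relationship between the batch-size settings and the shape of the `constant_with_warmup` schedule can be sketched in plain Python. This is an illustration under stated assumptions, not the Trainer's own code: the device count, the total number of optimizer steps (estimated from the results table, where step 1330 falls at epoch 0.9989), and the rounding of the warmup step count are all assumptions, and `constant_with_warmup_lr` is a hypothetical helper.

```python
import math

train_batch_size = 8
gradient_accumulation_steps = 16
# Assumption: a single device. The card does not state the device count,
# but 8 * 16 * 1 matches the listed total_train_batch_size of 128.
num_devices = 1

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def constant_with_warmup_lr(step: int, base_lr: float, num_warmup_steps: int) -> float:
    """Linear ramp from 0 to base_lr over the warmup steps, then constant."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    return base_lr


# Assumption: about 1331 optimizer steps in the single epoch; the exact
# count is not given in the card.
assumed_total_steps = 1331
num_warmup_steps = math.ceil(assumed_total_steps * 0.05)

print("effective batch size:", total_train_batch_size)
for step in (0, num_warmup_steps // 2, num_warmup_steps, 1000):
    print(step, constant_with_warmup_lr(step, 8e-06, num_warmup_steps))
```

With these assumptions the learning rate ramps up over roughly the first 67 optimizer steps and then stays at 8e-06 for the rest of the epoch.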
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6689 | 0.0038 | 5 | 1.3944 | 264288 |
| 1.6934 | 0.0075 | 10 | 1.3821 | 536352 |
| 1.6749 | 0.0113 | 15 | 1.3496 | 808112 |
| 1.488 | 0.0150 | 20 | 1.2995 | 1078040 |
| 1.4069 | 0.0188 | 25 | 1.2552 | 1351760 |
| 1.4309 | 0.0225 | 30 | 1.2207 | 1626680 |
| 1.3386 | 0.0263 | 35 | 1.1859 | 1896400 |
| 1.2656 | 0.0300 | 40 | 1.1759 | 2162520 |
| 1.0868 | 0.0338 | 45 | 1.1760 | 2433008 |
| 1.0979 | 0.0376 | 50 | 1.1751 | 2708136 |
| 1.0354 | 0.0413 | 55 | 1.1888 | 2982432 |
| 0.9017 | 0.0451 | 60 | 1.2083 | 3254176 |
| 0.8046 | 0.0488 | 65 | 1.2372 | 3535928 |
| 0.7463 | 0.0526 | 70 | 1.2604 | 3806568 |
| 0.6937 | 0.0563 | 75 | 1.2611 | 4072080 |
| 0.5712 | 0.0601 | 80 | 1.2635 | 4342880 |
| 0.5063 | 0.0638 | 85 | 1.2658 | 4611016 |
| 0.6028 | 0.0676 | 90 | 1.2683 | 4876144 |
| 0.3775 | 0.0713 | 95 | 1.2525 | 5135720 |
| 0.4801 | 0.0751 | 100 | 1.2398 | 5406048 |
| 0.4635 | 0.0789 | 105 | 1.2440 | 5679256 |
| 0.3597 | 0.0826 | 110 | 1.2498 | 5952640 |
| 0.3238 | 0.0864 | 115 | 1.2332 | 6222336 |
| 0.3471 | 0.0901 | 120 | 1.2356 | 6496144 |
| 0.2307 | 0.0939 | 125 | 1.2284 | 6765056 |
| 0.2651 | 0.0976 | 130 | 1.2336 | 7031048 |
| 0.307 | 0.1014 | 135 | 1.2210 | 7304224 |
| 0.3672 | 0.1051 | 140 | 1.2382 | 7572808 |
| 0.2384 | 0.1089 | 145 | 1.2123 | 7836440 |
| 0.2206 | 0.1127 | 150 | 1.2281 | 8110064 |
| 0.3669 | 0.1164 | 155 | 1.2171 | 8383472 |
| 0.2905 | 0.1202 | 160 | 1.2165 | 8651664 |
| 0.3079 | 0.1239 | 165 | 1.2169 | 8917720 |
| 0.3 | 0.1277 | 170 | 1.2103 | 9192488 |
| 0.2806 | 0.1314 | 175 | 1.2130 | 9460088 |
| 0.2482 | 0.1352 | 180 | 1.2173 | 9728624 |
| 0.23 | 0.1389 | 185 | 1.2113 | 9990456 |
| 0.3042 | 0.1427 | 190 | 1.2119 | 10257952 |
| 0.2295 | 0.1465 | 195 | 1.2036 | 10534584 |
| 0.2748 | 0.1502 | 200 | 1.2010 | 10802928 |
| 0.1908 | 0.1540 | 205 | 1.2043 | 11079768 |
| 0.2832 | 0.1577 | 210 | 1.2108 | 11350976 |
| 0.2761 | 0.1615 | 215 | 1.2065 | 11612968 |
| 0.225 | 0.1652 | 220 | 1.1983 | 11885392 |
| 0.2191 | 0.1690 | 225 | 1.2087 | 12159712 |
| 0.2642 | 0.1727 | 230 | 1.2047 | 12433264 |
| 0.1318 | 0.1765 | 235 | 1.1998 | 12701168 |
| 0.2471 | 0.1802 | 240 | 1.2091 | 12966968 |
| 0.2115 | 0.1840 | 245 | 1.1981 | 13236320 |
| 0.202 | 0.1878 | 250 | 1.2029 | 13500792 |
| 0.2243 | 0.1915 | 255 | 1.1945 | 13768512 |
| 0.1554 | 0.1953 | 260 | 1.1980 | 14039760 |
| 0.2011 | 0.1990 | 265 | 1.1956 | 14307320 |
| 0.2448 | 0.2028 | 270 | 1.1948 | 14574744 |
| 0.18 | 0.2065 | 275 | 1.1991 | 14846392 |
| 0.1918 | 0.2103 | 280 | 1.1975 | 15124552 |
| 0.1706 | 0.2140 | 285 | 1.1979 | 15396200 |
| 0.1992 | 0.2178 | 290 | 1.1973 | 15666280 |
| 0.1747 | 0.2216 | 295 | 1.1913 | 15926208 |
| 0.277 | 0.2253 | 300 | 1.2005 | 16198504 |
| 0.1883 | 0.2291 | 305 | 1.1907 | 16473664 |
| 0.1765 | 0.2328 | 310 | 1.1939 | 16741520 |
| 0.2066 | 0.2366 | 315 | 1.1891 | 17010016 |
| 0.1443 | 0.2403 | 320 | 1.1919 | 17270320 |
| 0.1589 | 0.2441 | 325 | 1.1851 | 17537088 |
| 0.2572 | 0.2478 | 330 | 1.1853 | 17802288 |
| 0.1297 | 0.2516 | 335 | 1.1810 | 18067848 |
| 0.2256 | 0.2554 | 340 | 1.1817 | 18334464 |
| 0.1643 | 0.2591 | 345 | 1.1806 | 18597696 |
| 0.1802 | 0.2629 | 350 | 1.1791 | 18864984 |
| 0.2121 | 0.2666 | 355 | 1.1798 | 19137144 |
| 0.185 | 0.2704 | 360 | 1.1814 | 19407664 |
| 0.1939 | 0.2741 | 365 | 1.1810 | 19671744 |
| 0.1154 | 0.2779 | 370 | 1.1790 | 19937136 |
| 0.1722 | 0.2816 | 375 | 1.1815 | 20212776 |
| 0.24 | 0.2854 | 380 | 1.1747 | 20485528 |
| 0.1656 | 0.2891 | 385 | 1.1760 | 20744688 |
| 0.1353 | 0.2929 | 390 | 1.1733 | 21014696 |
| 0.1783 | 0.2967 | 395 | 1.1714 | 21283960 |
| 0.1773 | 0.3004 | 400 | 1.1716 | 21549752 |
| 0.1816 | 0.3042 | 405 | 1.1689 | 21823264 |
| 0.2094 | 0.3079 | 410 | 1.1722 | 22091488 |
| 0.2169 | 0.3117 | 415 | 1.1678 | 22362144 |
| 0.0832 | 0.3154 | 420 | 1.1752 | 22630688 |
| 0.1348 | 0.3192 | 425 | 1.1743 | 22897384 |
| 0.1198 | 0.3229 | 430 | 1.1667 | 23163104 |
| 0.195 | 0.3267 | 435 | 1.1698 | 23429176 |
| 0.1961 | 0.3305 | 440 | 1.1675 | 23692680 |
| 0.1643 | 0.3342 | 445 | 1.1616 | 23968624 |
| 0.1294 | 0.3380 | 450 | 1.1658 | 24235352 |
| 0.1459 | 0.3417 | 455 | 1.1668 | 24506592 |
| 0.1742 | 0.3455 | 460 | 1.1635 | 24778464 |
| 0.1323 | 0.3492 | 465 | 1.1627 | 25046232 |
| 0.2131 | 0.3530 | 470 | 1.1630 | 25316736 |
| 0.2034 | 0.3567 | 475 | 1.1622 | 25590624 |
| 0.2843 | 0.3605 | 480 | 1.1608 | 25861488 |
| 0.1817 | 0.3643 | 485 | 1.1585 | 26128792 |
| 0.1271 | 0.3680 | 490 | 1.1588 | 26396592 |
| 0.1022 | 0.3718 | 495 | 1.1603 | 26667880 |
| 0.142 | 0.3755 | 500 | 1.1603 | 26928216 |
| 0.1467 | 0.3793 | 505 | 1.1559 | 27198592 |
| 0.1832 | 0.3830 | 510 | 1.1559 | 27473144 |
| 0.2049 | 0.3868 | 515 | 1.1563 | 27738112 |
| 0.1298 | 0.3905 | 520 | 1.1586 | 28008840 |
| 0.2416 | 0.3943 | 525 | 1.1576 | 28275208 |
| 0.1191 | 0.3980 | 530 | 1.1531 | 28545208 |
| 0.173 | 0.4018 | 535 | 1.1576 | 28810664 |
| 0.0835 | 0.4056 | 540 | 1.1515 | 29079408 |
| 0.1381 | 0.4093 | 545 | 1.1517 | 29350304 |
| 0.1549 | 0.4131 | 550 | 1.1530 | 29623160 |
| 0.1277 | 0.4168 | 555 | 1.1523 | 29891904 |
| 0.1541 | 0.4206 | 560 | 1.1491 | 30167440 |
| 0.1365 | 0.4243 | 565 | 1.1512 | 30434736 |
| 0.248 | 0.4281 | 570 | 1.1536 | 30710520 |
| 0.1152 | 0.4318 | 575 | 1.1500 | 30980872 |
| 0.1919 | 0.4356 | 580 | 1.1481 | 31249880 |
| 0.1231 | 0.4394 | 585 | 1.1505 | 31519496 |
| 0.1135 | 0.4431 | 590 | 1.1489 | 31790184 |
| 0.1939 | 0.4469 | 595 | 1.1505 | 32067360 |
| 0.1121 | 0.4506 | 600 | 1.1467 | 32340136 |
| 0.1569 | 0.4544 | 605 | 1.1488 | 32613024 |
| 0.1592 | 0.4581 | 610 | 1.1475 | 32884400 |
| 0.1161 | 0.4619 | 615 | 1.1454 | 33155112 |
| 0.0919 | 0.4656 | 620 | 1.1441 | 33425512 |
| 0.201 | 0.4694 | 625 | 1.1492 | 33701256 |
| 0.2044 | 0.4732 | 630 | 1.1416 | 33972240 |
| 0.1223 | 0.4769 | 635 | 1.1436 | 34243808 |
| 0.1468 | 0.4807 | 640 | 1.1486 | 34520456 |
| 0.1553 | 0.4844 | 645 | 1.1404 | 34790544 |
| 0.1583 | 0.4882 | 650 | 1.1383 | 35066080 |
| 0.1896 | 0.4919 | 655 | 1.1446 | 35336248 |
| 0.1366 | 0.4957 | 660 | 1.1427 | 35609152 |
| 0.1858 | 0.4994 | 665 | 1.1400 | 35877200 |
| 0.1262 | 0.5032 | 670 | 1.1378 | 36144592 |
| 0.1302 | 0.5069 | 675 | 1.1424 | 36413456 |
| 0.0964 | 0.5107 | 680 | 1.1428 | 36681760 |
| 0.1263 | 0.5145 | 685 | 1.1401 | 36957320 |
| 0.1875 | 0.5182 | 690 | 1.1388 | 37219960 |
| 0.1174 | 0.5220 | 695 | 1.1407 | 37488992 |
| 0.1742 | 0.5257 | 700 | 1.1408 | 37762920 |
| 0.1519 | 0.5295 | 705 | 1.1377 | 38026320 |
| 0.2115 | 0.5332 | 710 | 1.1397 | 38296904 |
| 0.1436 | 0.5370 | 715 | 1.1387 | 38564392 |
| 0.1447 | 0.5407 | 720 | 1.1406 | 38837944 |
| 0.1507 | 0.5445 | 725 | 1.1373 | 39101480 |
| 0.1163 | 0.5483 | 730 | 1.1351 | 39376288 |
| 0.129 | 0.5520 | 735 | 1.1388 | 39648288 |
| 0.2292 | 0.5558 | 740 | 1.1380 | 39921536 |
| 0.1607 | 0.5595 | 745 | 1.1366 | 40191808 |
| 0.1814 | 0.5633 | 750 | 1.1370 | 40463888 |
| 0.1778 | 0.5670 | 755 | 1.1359 | 40731848 |
| 0.1903 | 0.5708 | 760 | 1.1343 | 41007512 |
| 0.1437 | 0.5745 | 765 | 1.1356 | 41272744 |
| 0.1851 | 0.5783 | 770 | 1.1328 | 41539920 |
| 0.1951 | 0.5821 | 775 | 1.1326 | 41803880 |
| 0.1262 | 0.5858 | 780 | 1.1337 | 42073712 |
| 0.2232 | 0.5896 | 785 | 1.1325 | 42346496 |
| 0.1937 | 0.5933 | 790 | 1.1320 | 42611904 |
| 0.2337 | 0.5971 | 795 | 1.1318 | 42881608 |
| 0.1333 | 0.6008 | 800 | 1.1316 | 43156312 |
| 0.1644 | 0.6046 | 805 | 1.1329 | 43430512 |
| 0.2155 | 0.6083 | 810 | 1.1317 | 43702768 |
| 0.1681 | 0.6121 | 815 | 1.1340 | 43971344 |
| 0.1201 | 0.6158 | 820 | 1.1320 | 44242816 |
| 0.1561 | 0.6196 | 825 | 1.1290 | 44514280 |
| 0.103 | 0.6234 | 830 | 1.1314 | 44775808 |
| 0.1167 | 0.6271 | 835 | 1.1318 | 45047144 |
| 0.1668 | 0.6309 | 840 | 1.1295 | 45324128 |
| 0.2126 | 0.6346 | 845 | 1.1276 | 45596672 |
| 0.176 | 0.6384 | 850 | 1.1294 | 45863960 |
| 0.1267 | 0.6421 | 855 | 1.1306 | 46129040 |
| 0.1979 | 0.6459 | 860 | 1.1277 | 46398088 |
| 0.1447 | 0.6496 | 865 | 1.1278 | 46672416 |
| 0.1757 | 0.6534 | 870 | 1.1301 | 46937184 |
| 0.191 | 0.6572 | 875 | 1.1272 | 47200984 |
| 0.1306 | 0.6609 | 880 | 1.1267 | 47468184 |
| 0.125 | 0.6647 | 885 | 1.1287 | 47737752 |
| 0.1341 | 0.6684 | 890 | 1.1315 | 47999712 |
| 0.1247 | 0.6722 | 895 | 1.1270 | 48264000 |
| 0.169 | 0.6759 | 900 | 1.1249 | 48532608 |
| 0.1177 | 0.6797 | 905 | 1.1280 | 48808296 |
| 0.0712 | 0.6834 | 910 | 1.1290 | 49073296 |
| 0.1168 | 0.6872 | 915 | 1.1284 | 49334576 |
| 0.1286 | 0.6910 | 920 | 1.1299 | 49599840 |
| 0.1317 | 0.6947 | 925 | 1.1278 | 49861136 |
| 0.147 | 0.6985 | 930 | 1.1289 | 50125936 |
| 0.1164 | 0.7022 | 935 | 1.1280 | 50395928 |
| 0.1193 | 0.7060 | 940 | 1.1289 | 50661104 |
| 0.1654 | 0.7097 | 945 | 1.1284 | 50935048 |
| 0.1353 | 0.7135 | 950 | 1.1241 | 51196400 |
| 0.1219 | 0.7172 | 955 | 1.1267 | 51472624 |
| 0.1166 | 0.7210 | 960 | 1.1274 | 51740600 |
| 0.1323 | 0.7247 | 965 | 1.1248 | 52013632 |
| 0.1577 | 0.7285 | 970 | 1.1259 | 52285776 |
| 0.1541 | 0.7323 | 975 | 1.1245 | 52553520 |
| 0.1187 | 0.7360 | 980 | 1.1223 | 52823736 |
| 0.1106 | 0.7398 | 985 | 1.1242 | 53095120 |
| 0.1275 | 0.7435 | 990 | 1.1270 | 53362224 |
| 0.1362 | 0.7473 | 995 | 1.1221 | 53633376 |
| 0.1207 | 0.7510 | 1000 | 1.1238 | 53909920 |
| 0.1537 | 0.7548 | 1005 | 1.1232 | 54179904 |
| 0.1309 | 0.7585 | 1010 | 1.1249 | 54439216 |
| 0.2223 | 0.7623 | 1015 | 1.1246 | 54705664 |
| 0.1478 | 0.7661 | 1020 | 1.1220 | 54977032 |
| 0.1868 | 0.7698 | 1025 | 1.1218 | 55250832 |
| 0.1014 | 0.7736 | 1030 | 1.1238 | 55517776 |
| 0.1415 | 0.7773 | 1035 | 1.1243 | 55787456 |
| 0.1376 | 0.7811 | 1040 | 1.1216 | 56050584 |
| 0.1316 | 0.7848 | 1045 | 1.1230 | 56314520 |
| 0.1462 | 0.7886 | 1050 | 1.1255 | 56578464 |
| 0.1411 | 0.7923 | 1055 | 1.1242 | 56845744 |
| 0.1464 | 0.7961 | 1060 | 1.1214 | 57110088 |
| 0.1964 | 0.7998 | 1065 | 1.1217 | 57373800 |
| 0.1278 | 0.8036 | 1070 | 1.1227 | 57642744 |
| 0.117 | 0.8074 | 1075 | 1.1217 | 57915200 |
| 0.106 | 0.8111 | 1080 | 1.1236 | 58178104 |
| 0.1668 | 0.8149 | 1085 | 1.1226 | 58451608 |
| 0.0766 | 0.8186 | 1090 | 1.1219 | 58724352 |
| 0.1183 | 0.8224 | 1095 | 1.1231 | 58992672 |
| 0.1453 | 0.8261 | 1100 | 1.1239 | 59257520 |
| 0.1311 | 0.8299 | 1105 | 1.1237 | 59529064 |
| 0.1174 | 0.8336 | 1110 | 1.1200 | 59793896 |
| 0.1889 | 0.8374 | 1115 | 1.1214 | 60066544 |
| 0.1597 | 0.8412 | 1120 | 1.1217 | 60336744 |
| 0.1343 | 0.8449 | 1125 | 1.1192 | 60598856 |
| 0.1629 | 0.8487 | 1130 | 1.1211 | 60862632 |
| 0.1116 | 0.8524 | 1135 | 1.1207 | 61128536 |
| 0.1171 | 0.8562 | 1140 | 1.1177 | 61396504 |
| 0.1267 | 0.8599 | 1145 | 1.1205 | 61670128 |
| 0.1695 | 0.8637 | 1150 | 1.1247 | 61927712 |
| 0.118 | 0.8674 | 1155 | 1.1217 | 62198216 |
| 0.1302 | 0.8712 | 1160 | 1.1184 | 62467968 |
| 0.139 | 0.8750 | 1165 | 1.1193 | 62728520 |
| 0.1311 | 0.8787 | 1170 | 1.1224 | 62994464 |
| 0.1646 | 0.8825 | 1175 | 1.1190 | 63263200 |
| 0.1948 | 0.8862 | 1180 | 1.1170 | 63532624 |
| 0.1435 | 0.8900 | 1185 | 1.1180 | 63796048 |
| 0.1222 | 0.8937 | 1190 | 1.1178 | 64067136 |
| 0.1527 | 0.8975 | 1195 | 1.1173 | 64339824 |
| 0.1857 | 0.9012 | 1200 | 1.1182 | 64604792 |
| 0.162 | 0.9050 | 1205 | 1.1178 | 64877464 |
| 0.1664 | 0.9087 | 1210 | 1.1168 | 65143192 |
| 0.1023 | 0.9125 | 1215 | 1.1178 | 65419592 |
| 0.1743 | 0.9163 | 1220 | 1.1190 | 65677928 |
| 0.0952 | 0.9200 | 1225 | 1.1169 | 65936144 |
| 0.1412 | 0.9238 | 1230 | 1.1147 | 66199144 |
| 0.17 | 0.9275 | 1235 | 1.1163 | 66458456 |
| 0.1457 | 0.9313 | 1240 | 1.1192 | 66733408 |
| 0.1909 | 0.9350 | 1245 | 1.1165 | 67005920 |
| 0.1616 | 0.9388 | 1250 | 1.1148 | 67271224 |
| 0.1314 | 0.9425 | 1255 | 1.1178 | 67535688 |
| 0.1707 | 0.9463 | 1260 | 1.1182 | 67810304 |
| 0.1656 | 0.9501 | 1265 | 1.1162 | 68082744 |
| 0.1116 | 0.9538 | 1270 | 1.1167 | 68345952 |
| 0.1369 | 0.9576 | 1275 | 1.1175 | 68611952 |
| 0.0926 | 0.9613 | 1280 | 1.1157 | 68875760 |
| 0.1441 | 0.9651 | 1285 | 1.1151 | 69144464 |
| 0.1654 | 0.9688 | 1290 | 1.1144 | 69414824 |
| 0.1936 | 0.9726 | 1295 | 1.1159 | 69681432 |
| 0.1919 | 0.9763 | 1300 | 1.1176 | 69952520 |
| 0.1458 | 0.9801 | 1305 | 1.1161 | 70214600 |
| 0.1564 | 0.9839 | 1310 | 1.1146 | 70478632 |
| 0.1862 | 0.9876 | 1315 | 1.1148 | 70746696 |
| 0.1156 | 0.9914 | 1320 | 1.1153 | 71012176 |
| 0.1483 | 0.9951 | 1325 | 1.1141 | 71281328 |
| 0.1799 | 0.9989 | 1330 | 1.1128 | 71551704 |
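
The table above can be parsed for quick analysis. The following sketch is an illustration rather than part of the training code: it reads a four-row excerpt copied from the table, picks the row with the lowest validation loss, and estimates the average number of input tokens per optimizer step. The helper name `parse_results_rows` and the dictionary keys are hypothetical.

```python
def parse_results_rows(markdown_rows: str) -> list:
    """Parse data rows of the results table into a list of dictionaries."""
    records = []
    for row in markdown_rows.strip().splitlines():
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        train_loss, epoch, step, val_loss, tokens = cells
        records.append({
            # The first row has no training loss logged yet.
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens": int(tokens),
        })
    return records


# Excerpt copied from the table above (not the full log).
excerpt = """
| No log | 0 | 0 | 1.3956 | 0 |
| 1.0979 | 0.0376 | 50 | 1.1751 | 2708136 |
| 0.6028 | 0.0676 | 90 | 1.2683 | 4876144 |
| 0.1799 | 0.9989 | 1330 | 1.1128 | 71551704 |
"""

records = parse_results_rows(excerpt)
best = min(records, key=lambda r: r["val_loss"])
last = records[-1]
tokens_per_step = last["tokens"] / last["step"]

print("lowest validation loss in excerpt:", best["val_loss"], "at step", best["step"])
print("average input tokens per optimizer step:", round(tokens_per_step))
```

The excerpt rows mark the points where the validation loss first bottoms out (step 50), peaks again (step 90), and ends the epoch (step 1330), while the training loss keeps falling well below the validation loss throughout.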
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
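
To check a local environment against the versions listed above, a small script along these lines may help. It is a sketch: the mapping from the names in this card to PyPI distribution names (for example, "Pytorch" to `torch`) is an assumption, and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this card. The distribution names are assumptions
# about how each framework is published on PyPI.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected: dict) -> dict:
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```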