---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: gemma-2-2b_hs2_iter1_sftsd2
results: []
---
# gemma-2-2b_hs2_iter1_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.2172
- Num Input Tokens Seen: 18470688
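
A minimal inference sketch, assuming the standard `transformers` text-generation API for Gemma-2 checkpoints; the repository id below is inferred from this card's name and should be adjusted to wherever the weights are hosted:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for this checkpoint; replace with the actual hub path if it differs.
model_id = "jkazdan/gemma-2-2b_hs2_iter1_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 weights are commonly run in bf16
    device_map="auto",           # requires `accelerate`
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```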
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hedged reproduction sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
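
A sketch of how these hyperparameters map onto a TRL `SFTTrainer` run. The training dataset is not documented in this card, so the dataset loading below is a placeholder; the Adam betas and epsilon listed above match the `Trainer` defaults and need no explicit setting:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder: this card does not name the training data. Any dataset with a
# "text" column would slot in here.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")

config = SFTConfig(
    output_dir="gemma-2-2b_hs2_iter1_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 effective batch size
    seed=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",  # base model; TRL accepts a hub id string
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```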
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.8375 | 0.0151 | 5 | 1.3784 | 277040 |
| 1.5937 | 0.0301 | 10 | 1.2687 | 554320 |
| 1.5082 | 0.0452 | 15 | 1.1925 | 832568 |
| 1.3528 | 0.0602 | 20 | 1.1570 | 1104816 |
| 1.29 | 0.0753 | 25 | 1.1362 | 1377136 |
| 1.2141 | 0.0903 | 30 | 1.1415 | 1648280 |
| 1.0916 | 0.1054 | 35 | 1.1517 | 1928952 |
| 1.0637 | 0.1205 | 40 | 1.1848 | 2205568 |
| 0.997 | 0.1355 | 45 | 1.2021 | 2486744 |
| 0.8411 | 0.1506 | 50 | 1.2454 | 2759672 |
| 0.819 | 0.1656 | 55 | 1.2625 | 3034344 |
| 0.8372 | 0.1807 | 60 | 1.2813 | 3310160 |
| 0.7501 | 0.1957 | 65 | 1.3245 | 3591528 |
| 0.701 | 0.2108 | 70 | 1.3285 | 3867064 |
| 0.6381 | 0.2259 | 75 | 1.3442 | 4136080 |
| 0.5853 | 0.2409 | 80 | 1.3674 | 4413080 |
| 0.5914 | 0.2560 | 85 | 1.3762 | 4697248 |
| 0.539 | 0.2710 | 90 | 1.3602 | 4976440 |
| 0.5163 | 0.2861 | 95 | 1.3418 | 5258848 |
| 0.3974 | 0.3011 | 100 | 1.3244 | 5530232 |
| 0.415 | 0.3162 | 105 | 1.3646 | 5806632 |
| 0.3812 | 0.3313 | 110 | 1.3175 | 6085304 |
| 0.3926 | 0.3463 | 115 | 1.3466 | 6366392 |
| 0.3356 | 0.3614 | 120 | 1.3194 | 6645272 |
| 0.3933 | 0.3764 | 125 | 1.3229 | 6933352 |
| 0.3463 | 0.3915 | 130 | 1.3271 | 7209752 |
| 0.3245 | 0.4065 | 135 | 1.3134 | 7487224 |
| 0.3898 | 0.4216 | 140 | 1.3007 | 7763992 |
| 0.238 | 0.4367 | 145 | 1.3160 | 8052304 |
| 0.3031 | 0.4517 | 150 | 1.3038 | 8323880 |
| 0.363 | 0.4668 | 155 | 1.3004 | 8594840 |
| 0.3207 | 0.4818 | 160 | 1.2812 | 8877704 |
| 0.2837 | 0.4969 | 165 | 1.2827 | 9158496 |
| 0.1469 | 0.5120 | 170 | 1.2875 | 9437080 |
| 0.2441 | 0.5270 | 175 | 1.2807 | 9715752 |
| 0.2553 | 0.5421 | 180 | 1.2806 | 9997688 |
| 0.2823 | 0.5571 | 185 | 1.2647 | 10279272 |
| 0.2381 | 0.5722 | 190 | 1.2680 | 10555816 |
| 0.2152 | 0.5872 | 195 | 1.2607 | 10829488 |
| 0.2018 | 0.6023 | 200 | 1.2581 | 11107824 |
| 0.2278 | 0.6174 | 205 | 1.2819 | 11388528 |
| 0.2623 | 0.6324 | 210 | 1.2529 | 11675728 |
| 0.2305 | 0.6475 | 215 | 1.2584 | 11954704 |
| 0.1346 | 0.6625 | 220 | 1.2531 | 12227408 |
| 0.2306 | 0.6776 | 225 | 1.2524 | 12509728 |
| 0.2329 | 0.6926 | 230 | 1.2434 | 12789144 |
| 0.1821 | 0.7077 | 235 | 1.2447 | 13064784 |
| 0.238 | 0.7228 | 240 | 1.2315 | 13335048 |
| 0.2227 | 0.7378 | 245 | 1.2391 | 13612832 |
| 0.2414 | 0.7529 | 250 | 1.2377 | 13892512 |
| 0.1753 | 0.7679 | 255 | 1.2327 | 14174312 |
| 0.2232 | 0.7830 | 260 | 1.2354 | 14454112 |
| 0.209 | 0.7980 | 265 | 1.2343 | 14724840 |
| 0.1725 | 0.8131 | 270 | 1.2314 | 15000280 |
| 0.1442 | 0.8282 | 275 | 1.2273 | 15282784 |
| 0.2197 | 0.8432 | 280 | 1.2237 | 15556416 |
| 0.2327 | 0.8583 | 285 | 1.2239 | 15842432 |
| 0.233 | 0.8733 | 290 | 1.2274 | 16119456 |
| 0.2136 | 0.8884 | 295 | 1.2228 | 16398960 |
| 0.1161 | 0.9034 | 300 | 1.2295 | 16675056 |
| 0.1408 | 0.9185 | 305 | 1.2214 | 16956240 |
| 0.2016 | 0.9336 | 310 | 1.2247 | 17235632 |
| 0.2294 | 0.9486 | 315 | 1.2298 | 17515584 |
| 0.1335 | 0.9637 | 320 | 1.2145 | 17798760 |
| 0.1811 | 0.9787 | 325 | 1.2251 | 18075960 |
| 0.2033 | 0.9938 | 330 | 1.2213 | 18358176 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1