collapse_gemma-2-2b_hs2_replace_iter8_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6211
  • Num Input Tokens Seen: 7818184
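For a causal language model, the evaluation loss is usually the mean per-token cross-entropy in nats, in which case its exponential is the perplexity. The sketch below makes that conversion for the loss reported above. The card does not say how the loss was computed, so treat the interpretation as an assumption.

```python
import math

# Final evaluation loss reported in this card. Assumption: it is the mean
# per-token cross-entropy in nats (the card does not confirm this).
eval_loss = 2.6211

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")
```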

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
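The batch-size and warmup settings above can be cross-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the card does not state: that the total batch size is the per-device batch size times the accumulation steps times the device count, and that the run had about 159 optimizer steps (estimated from the results table, where step 155 corresponds to epoch 0.9760). The `lr_at` helper is hypothetical and mimics a linear warmup followed by a constant rate.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = 128
warmup_ratio = 0.05

# Assumption: total batch = per-device batch * accumulation steps * devices.
num_devices = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)

# Assumption: roughly 159 optimizer steps in the single epoch, estimated
# from the results table; the card does not give the exact count.
total_steps = 159
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup from 0, then a constant rate."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(num_devices, warmup_steps, lr_at(4), lr_at(100))
```

Under these assumptions the numbers are consistent with a single device and a warmup that lasts only a handful of steps, after which the learning rate stays at 8e-06.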

Training results

Training Loss  Epoch   Step  Validation Loss  Input Tokens Seen
No log         0       0     1.3956           0
1.6108         0.0315  5     1.3097           239488
1.2048         0.0630  10    1.2514           488880
0.7736         0.0945  15    1.3428           739832
0.4942         0.1259  20    1.5487           988640
0.3684         0.1574  25    1.6597           1234208
0.2257         0.1889  30    1.8226           1477784
0.104          0.2204  35    2.0198           1730776
0.079          0.2519  40    2.1574           1971328
0.0504         0.2834  45    2.3647           2217856
0.0368         0.3148  50    2.4414           2465200
0.0362         0.3463  55    2.5177           2715224
0.0347         0.3778  60    2.5495           2963688
0.0318         0.4093  65    2.5692           3204352
0.0298         0.4408  70    2.5663           3455912
0.026          0.4723  75    2.5764           3694848
0.0277         0.5037  80    2.5583           3950488
0.0251         0.5352  85    2.5831           4197448
0.03           0.5667  90    2.6005           4438720
0.0247         0.5982  95    2.5882           4687496
0.024          0.6297  100   2.5853           4937840
0.0245         0.6612  105   2.6122           5185648
0.0259         0.6926  110   2.6367           5428648
0.0261         0.7241  115   2.6511           5673016
0.0276         0.7556  120   2.6375           5923456
0.0257         0.7871  125   2.6391           6177184
0.0255         0.8186  130   2.6434           6421672
0.025          0.8501  135   2.6282           6667984
0.0265         0.8815  140   2.6097           6917840
0.0258         0.9130  145   2.6087           7163648
0.0243         0.9445  150   2.6101           7416408
0.0237         0.9760  155   2.6211           7665640
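In the results above, validation loss reaches its lowest value early and then rises while training loss keeps falling, a pattern commonly associated with overfitting. The card does not comment on this. The sketch below uses a handful of rows copied from the results to locate the step with the lowest validation loss and to measure the final gap between training and validation loss. The variable names are illustrative.

```python
# (step, training_loss, validation_loss) for selected rows of the results.
# The training loss at step 0 is reported as "No log", hence None.
rows = [
    (0, None, 1.3956),
    (5, 1.6108, 1.3097),
    (10, 1.2048, 1.2514),
    (15, 0.7736, 1.3428),
    (20, 0.4942, 1.5487),
    (50, 0.0368, 2.4414),
    (100, 0.024, 2.5853),
    (155, 0.0237, 2.6211),
]

# Step with the lowest validation loss among the listed rows.
best_step, _, best_val = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at the last logged step.
final_step, final_train, final_val = rows[-1]
gap = final_val - final_train

print(f"best validation loss {best_val} at step {best_step}")
print(f"final train/validation gap at step {final_step}: {gap:.4f}")
```

If the goal were the lowest held-out loss, an early checkpoint would look preferable to the final one. Whether such checkpoints were saved is not stated in the card.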

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
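
With the Transformers and PyTorch versions listed above, loading the checkpoint might look like the sketch below. It assumes the repository jkazdan/collapse_gemma-2-2b_hs2_replace_iter8_sftsd0 ships tokenizer files and loads through `AutoModelForCausalLM`, which the card does not confirm. `load_model` is a hypothetical helper name, and calling it downloads the weights from the Hub.

```python
def load_model(
    repo_id: str = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter8_sftsd0",
):
    """Load the tokenizer and model; downloads weights on first call."""
    # Imported lazily so that defining this function has no heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repository includes tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # BF16 matches the tensor type reported for the stored weights.
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, torch_dtype=torch.bfloat16
    )
    return tokenizer, model
```

A typical call would be `tokenizer, model = load_model()`, followed by `model.generate(**tokenizer("Hello", return_tensors="pt"), max_new_tokens=20)`.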

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for jkazdan/collapse_gemma-2-2b_hs2_replace_iter8_sftsd0

  • Base model: google/gemma-2-2b