collapse_gemma-2-2b_hs2_replace_iter9_sftsd2

This model is a fine-tuned version of google/gemma-2-2b; the fine-tuning dataset is not documented. It achieves the following results on the evaluation set:

  • Loss: 2.6695
  • Num Input Tokens Seen: 7754872
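To put the reported loss in more familiar terms, it can be converted to a perplexity. The sketch below assumes the loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card; the variable names are illustrative.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 2.6695

# Validation loss logged at step 0, before any fine-tuning updates.
initial_eval_loss = 1.3956

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)
initial_perplexity = math.exp(initial_eval_loss)

print(f"initial perplexity: {initial_perplexity:.2f}")
print(f"final perplexity:   {perplexity:.2f}")
```

Under that assumption, the final perplexity on the evaluation set is several times higher than at step 0, which matches the rising validation loss in the training log.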

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
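
The batch-size values above are related by simple arithmetic, sketched below. The device count is not reported in this card and is inferred here, and the total step count is an estimate taken from the last logged row of the training results (step 155 at epoch 0.9764). The warmup rounding (ceiling of ratio times total steps) is an assumption about how the trainer applies lr_scheduler_warmup_ratio.

```python
import math

# Values copied from the hyperparameter list.
train_batch_size = 8              # per-device batch size
gradient_accumulation_steps = 16
total_train_batch_size = 128
warmup_ratio = 0.05

# Inferred, not reported: number of devices implied by the totals.
num_devices = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)

# Estimated, not reported: optimizer steps in the single epoch,
# extrapolated from the last logged row (step 155 at epoch 0.9764).
estimated_total_steps = round(155 / 0.9764)

# Assumption: warmup steps are the ceiling of ratio * total steps.
estimated_warmup_steps = math.ceil(warmup_ratio * estimated_total_steps)

print(num_devices, estimated_total_steps, estimated_warmup_steps)
```

If these estimates hold, the learning rate reaches its constant value of 8e-06 within the first few percent of training, so almost all logged steps run at the full rate.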

Training results

Training Loss  Epoch   Step  Validation Loss  Input Tokens Seen
No log         0       0     1.3956           0
1.5959         0.0315  5     1.3066           254632
1.0717         0.0630  10    1.2465           502128
0.7125         0.0945  15    1.3592           744976
0.504          0.1260  20    1.5120           986472
0.2848         0.1575  25    1.6652           1237336
0.2452         0.1890  30    1.8288           1482344
0.1578         0.2205  35    1.9980           1732136
0.0569         0.2520  40    2.1960           1978848
0.0667         0.2835  45    2.3046           2223360
0.0341         0.3150  50    2.4331           2460800
0.0289         0.3465  55    2.4497           2702840
0.027          0.3780  60    2.5245           2953304
0.0265         0.4094  65    2.5800           3203880
0.0271         0.4409  70    2.5911           3452328
0.0265         0.4724  75    2.6014           3694936
0.0237         0.5039  80    2.6018           3940776
0.0253         0.5354  85    2.5984           4186160
0.0254         0.5669  90    2.6081           4427280
0.026          0.5984  95    2.6275           4674224
0.0249         0.6299  100   2.6499           4922464
0.0263         0.6614  105   2.6559           5169512
0.0295         0.6929  110   2.6640           5411768
0.0241         0.7244  115   2.6679           5655504
0.0259         0.7559  120   2.6763           5901264
0.0255         0.7874  125   2.6777           6144528
0.024          0.8189  130   2.6766           6387936
0.0228         0.8504  135   2.6707           6633736
0.0258         0.8819  140   2.6821           6868528
0.024          0.9134  145   2.6846           7115712
0.0257         0.9449  150   2.6769           7363728
0.0263         0.9764  155   2.6716           7603744
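
The log above shows training loss falling steadily while validation loss bottoms out early and then climbs. A minimal sketch of how one might locate that turning point is given below, using a hand-copied subset of rows from the table; the `rows` structure and variable names are illustrative and not part of the training code.

```python
# (step, training_loss, validation_loss) for a subset of logged rows.
# Step 0 has no training loss ("No log"), represented here as None.
rows = [
    (0, None, 1.3956),
    (5, 1.5959, 1.3066),
    (10, 1.0717, 1.2465),
    (15, 0.7125, 1.3592),
    (20, 0.504, 1.5120),
    (40, 0.0569, 2.1960),
    (80, 0.0237, 2.6018),
    (155, 0.0263, 2.6716),
]

# Step with the lowest validation loss among the rows above.
best_step, _, best_val_loss = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at the last logged step.
last_step, last_train_loss, last_val_loss = rows[-1]
final_gap = last_val_loss - last_train_loss

print(f"lowest validation loss {best_val_loss} at step {best_step}")
print(f"validation minus training loss at step {last_step}: {final_gap:.4f}")
```

On these rows the minimum falls at step 10, after which validation loss rises while training loss keeps dropping toward roughly 0.025. That pattern is consistent with overfitting to the training data.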

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 2.61B params (Safetensors, tensor type BF16)

Model tree for jkazdan/collapse_gemma-2-2b_hs2_replace_iter9_sftsd2

Base model: google/gemma-2-2b (this model is a fine-tune of it)