gemma-2-2b_hs2_iter1_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0894
  • Num Input Tokens Seen: 6685536
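
Since intended usage is not documented, the following is only a minimal loading sketch with the transformers library, assuming the checkpoint is published on the Hub as jkazdan/gemma-2-2b_hs2_iter1_sftsd2 and, per the repository metadata, stored in BF16; the prompt is illustrative.

```python
# Minimal inference sketch (assumptions: Hub id "jkazdan/gemma-2-2b_hs2_iter1_sftsd2",
# BF16 weights, accelerate installed for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/gemma-2-2b_hs2_iter1_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint tensors are BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```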

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
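
A hedged sketch of how these values map onto Hugging Face TrainingArguments; the actual training script, dataset, and any options not listed above are not documented.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# This is a sketch, not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-2b_hs2_iter1_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                        # assumption: training precision is not stated in the card
)
```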

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.4494        | 0.0427 | 5    | 1.2764          | 283664            |
| 1.2474        | 0.0853 | 10   | 1.1812          | 576880            |
| 1.2268        | 0.128  | 15   | 1.1447          | 858360            |
| 1.0463        | 0.1707 | 20   | 1.1207          | 1150400           |
| 1.091         | 0.2133 | 25   | 1.1145          | 1435208           |
| 1.0393        | 0.256  | 30   | 1.1119          | 1716288           |
| 0.996         | 0.2987 | 35   | 1.1091          | 2002968           |
| 0.9853        | 0.3413 | 40   | 1.1115          | 2284960           |
| 0.8797        | 0.384  | 45   | 1.1093          | 2574696           |
| 1.0232        | 0.4267 | 50   | 1.1144          | 2861120           |
| 0.9278        | 0.4693 | 55   | 1.1065          | 3142784           |
| 0.8712        | 0.512  | 60   | 1.1112          | 3431816           |
| 0.8836        | 0.5547 | 65   | 1.1035          | 3720448           |
| 0.9139        | 0.5973 | 70   | 1.1034          | 4007136           |
| 0.8125        | 0.64   | 75   | 1.1018          | 4294416           |
| 0.8507        | 0.6827 | 80   | 1.1010          | 4576968           |
| 0.8093        | 0.7253 | 85   | 1.0978          | 4861272           |
| 0.8551        | 0.768  | 90   | 1.0976          | 5150768           |
| 0.7879        | 0.8107 | 95   | 1.0955          | 5441720           |
| 0.844         | 0.8533 | 100  | 1.0929          | 5720656           |
| 0.7869        | 0.896  | 105  | 1.0932          | 6007136           |
| 0.8237        | 0.9387 | 110  | 1.0916          | 6286960           |
| 0.768         | 0.9813 | 115  | 1.0900          | 6573592           |
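
Assuming the reported loss is the usual per-token cross-entropy in nats, the final evaluation loss of 1.0894 corresponds to a token-level perplexity of roughly 2.97:

```python
import math

# Perplexity implied by the reported final evaluation loss (cross-entropy in nats).
print(math.exp(1.0894))  # ~2.97
```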

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
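
A quick, illustrative way to check that a local environment matches these versions:

```python
# Print installed versions to compare against the ones listed above.
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected 4.44.0
print(torch.__version__)         # expected 2.4.0+cu121
print(datasets.__version__)      # expected 2.20.0
print(tokenizers.__version__)    # expected 0.19.1
```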