Model training schedule and parameters

#3 opened by AntonioMartini

Hello,

Really nice model! Is there any information on the training schedule used, or any additional training parameters that would help replicate the results?

Thanks,
Antonio

Hi... thanks!

Here are the hyperparameters:
lr = 5e-4
lr_schedule = constant
wd = 0.1
adam_beta1 = 0.9, adam_beta2 = 0.95
context length = 512
batch size = 80
gradient accumulation steps = 16

I think that's about it...
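
For reference, here is a minimal sketch of how these hyperparameters might map onto Hugging Face `TrainingArguments`. The author's actual training framework is not stated, the `output_dir` is hypothetical, and whether the batch size of 80 is per device or global is an assumption.

```python
# Sketch only: mapping the reported hyperparameters to TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                # hypothetical output path
    learning_rate=5e-4,              # lr = 5e-4
    lr_scheduler_type="constant",    # constant LR schedule
    weight_decay=0.1,                # wd = 0.1
    adam_beta1=0.9,
    adam_beta2=0.95,
    per_device_train_batch_size=80,  # assuming 80 is the per-device batch size
    gradient_accumulation_steps=16,
)

# The context length of 512 is applied at tokenization time rather than in
# TrainingArguments, e.g. tokenizer(text, truncation=True, max_length=512).
```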

That's very helpful, thanks!

AntonioMartini changed discussion status to closed
