qwerrwe / configs / cerebras_1_3B_alpaca.yml

Commit History

swap batch size for gradient accumulation steps to decouple from num gpu
c2a0792

winglian committed on

Update wandb_log_model on cerebras_1_3B_alpaca.yml
b6a539b
unverified

Viktorius Suwandi committed on

4bit quantized support (wip)
77fca25

winglian committed on

deepspeed doesn't work with flash-attn, and the gpu savings w flash attn are better than the deepspeed headaches
d1aed4c

winglian committed on

more logging, wandb fixes
05fffb5

winglian committed on

improve prepared dataset loading, fix inference
b164725

winglian committed on

config chooser, update readme instructions, device config, llama flash attention, debug out the labels, fix config key checks, other bugfixes
f2a2029

winglian committed on