qwerrwe / configs / llama_65B_alpaca.yml

Commit History

swap batch size for gradient accumulation steps to decouple from num gpu
c2a0792

winglian committed
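
The commit above replaces a fixed training batch size with per-device micro-batches plus gradient accumulation, so the effective batch no longer changes with the number of GPUs. A minimal sketch of what that stanza in llama_65B_alpaca.yml might look like; the key names micro_batch_size and gradient_accumulation_steps follow axolotl-style configs and are assumptions, not taken from the diff:

```yaml
# Hypothetical excerpt of llama_65B_alpaca.yml after this change.
# Key names are assumed axolotl-style config keys, not confirmed by the commit.
micro_batch_size: 1             # samples per GPU per forward/backward pass
gradient_accumulation_steps: 8  # optimizer step every 8 micro-batches
# effective batch size = micro_batch_size * gradient_accumulation_steps * num_gpus,
# so accumulation (rather than a global batch_size) controls scaling across GPU counts
```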

Update wandb_log_model on llama_65B_alpaca.yml
232b931

Viktorius Suwandi committed
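
For context, a hedged sketch of the Weights & Biases logging keys such a config typically carries; only wandb_log_model is named in the commit title above, the surrounding keys and values are assumptions:

```yaml
# Hypothetical W&B section of llama_65B_alpaca.yml; only wandb_log_model is
# named in the commit, the other keys are assumed axolotl-style settings.
wandb_project: llama-65b-alpaca  # hypothetical project name
wandb_watch:                     # left empty to disable gradient watching
wandb_log_model:                 # the commit updates this; empty disables checkpoint upload
```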

fix sharegpt handling from hf, don't worry about loading llama if using earlier transformers release
8d43785

winglian committed

fix lora target module, require explicit flash attention, fix min logging steps, don't use adam8bit for int4, hash prepared datasets, support hf hub datasets
87e073d

winglian committed
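
A sketch of the LoRA and attention settings the commit above touches; the module list and key names below are assumptions based on common LLaMA LoRA setups, not read from the diff:

```yaml
# Hypothetical LoRA/attention excerpt; values are illustrative, not from the commit.
adapter: lora
lora_target_modules:    # LLaMA attention projections commonly targeted by LoRA
  - q_proj
  - v_proj
flash_attention: true   # the commit requires flash attention to be enabled explicitly
```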

4bit quantized support (wip)
77fca25

winglian committed

deepspeed doesn't work with flash-attn, and the GPU savings with flash-attn are better than the deepspeed headaches
d1aed4c

winglian committed

add llama 7b config and fix lora_fan_in_fan_out for llama (copy-paste bug)
d060c80

winglian committed

more logging, wandb fixes
05fffb5

winglian committed

improve prepared dataset loading, fix inference
b164725

winglian committed

helpful info output
937f44f

winglian committed

various bugfixes
80b2ed2

winglian committed

more fixes and prep for llama training
949a27b

winglian committed