ales/whisper-small-belarusian

ales committed on Dec 22, 2022

Commit ed29fed • 1 Parent(s): e6570a6

upd run_3/readme.md

Files changed (1)

run_3/readme.md +4 -0

@@ -17,6 +17,10 @@ Checkpoint used: checkpoint-12000
 ## Advice
 * I guess we need to use warmup when resuming training with an LR higher than the last LR of the previous run
 * need to set the number of steps > 6000, because the model's WER improved very slowly
+* probably need to load `optimizer.pt` and `scaler.pt` from the checkpoint before resuming training.
+  otherwise, I guess, we
+  * reinitialize the optimizer and lose the history of parameter momentum (exponentially weighted average)
+  * scale the loss incorrectly
 * can use the original Mozilla Common Voice dataset instead of the Hugging Face one.<br>
   the reason is that the original contains multiple recordings of the same sentence,
   so there is at least twice as much data.<br>
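
The point about restoring optimizer state can be illustrated with a minimal, pure-Python sketch. It is a toy, not the Trainer or PyTorch API: `MomentumSGD`, `grad`, `train`, and the quadratic loss are hypothetical names and assumptions made up for illustration. It only shows that resuming with the saved momentum reproduces the uninterrupted run, while resuming with a fresh optimizer does not.

```python
import copy


class MomentumSGD:
    """Toy optimizer (hypothetical): SGD driven by an exponentially
    weighted average of past gradients."""

    def __init__(self, lr=0.1, beta=0.9):
        self.lr = lr
        self.beta = beta
        self.momentum = 0.0  # running average of gradients seen so far

    def step(self, w, g):
        # Update the running average, then move the weight along it.
        self.momentum = self.beta * self.momentum + (1 - self.beta) * g
        return w - self.lr * self.momentum

    def state_dict(self):
        return {"momentum": self.momentum}

    def load_state_dict(self, state):
        self.momentum = state["momentum"]


def grad(w):
    # Gradient of the assumed toy loss (w - 3)^2.
    return 2.0 * (w - 3.0)


def train(w, opt, steps):
    for _ in range(steps):
        w = opt.step(w, grad(w))
    return w


# Reference: one uninterrupted run of 20 steps.
w_full = train(0.0, MomentumSGD(), 20)

# Interrupted run: 10 steps, then save weight and optimizer state.
opt_first = MomentumSGD()
w_ckpt = train(0.0, opt_first, 10)
saved_state = copy.deepcopy(opt_first.state_dict())

# Resume A: restore the optimizer state before continuing.
opt_restored = MomentumSGD()
opt_restored.load_state_dict(saved_state)
w_restored = train(w_ckpt, opt_restored, 10)

# Resume B: fresh optimizer, so the momentum history is lost.
w_reset = train(w_ckpt, MomentumSGD(), 10)

print("uninterrupted: ", w_full)
print("state restored:", w_restored)
print("state reset:   ", w_reset)
```

In this toy setup the restored run matches the uninterrupted one exactly, and the reset run follows a different trajectory. The loss-scaling state in `scaler.pt` is not modeled here.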