ales committed on
Commit
ed29fed
1 Parent(s): e6570a6

upd run_3/readme.md

  1. run_3/readme.md +4 -0
run_3/readme.md CHANGED
@@ -17,6 +17,10 @@ Checkpoint used: checkpoint-12000
  ## Advice
  * I guess we need to use warmup when resuming training with an LR higher than the last LR of the previous run
  * need to set the number of steps > 6000, because the model's WER improved very slowly
+ * probably need to load `optimizer.pt` and `scaler.pt` from the checkpoint before resuming training.
+ otherwise, I guess, we:
+ * reinitialize the optimizer and lose the history of parameter momentum (exponentially weighted average)
+ * scale the loss incorrectly
  * can use the original Mozilla Common Voice dataset instead of the HuggingFace one.<br>
  the reason is that the original contains multiple recordings of the same sentence -
  so there is at least twice as much data.<br>
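
The point about losing momentum history when the optimizer is reinitialized can be illustrated with a small sketch. This is a toy, pure-Python stand-in and not the training code from this run: `ToyMomentumSGD`, `run_steps`, the learning rate, `beta`, and the constant gradient are all hypothetical and chosen only for illustration. It compares the first update after resuming with the saved optimizer state against the first update from a freshly created optimizer.

```python
class ToyMomentumSGD:
    """Hypothetical single-parameter SGD that keeps an exponentially
    weighted average of past gradients (the "momentum history")."""

    def __init__(self, lr=0.1, beta=0.9):
        self.lr = lr
        self.beta = beta
        self.velocity = 0.0  # running average of gradients, starts empty

    def step(self, param, grad):
        # Blend the new gradient into the running average, then update.
        self.velocity = self.beta * self.velocity + (1.0 - self.beta) * grad
        return param - self.lr * self.velocity

    def state_dict(self):
        # Everything that must be saved to continue training seamlessly.
        return {"velocity": self.velocity}

    def load_state_dict(self, state):
        self.velocity = state["velocity"]


def run_steps(optimizer, param, grad, n_steps):
    """Apply n_steps updates with a constant gradient (illustration only)."""
    for _ in range(n_steps):
        param = optimizer.step(param, grad)
    return param


# "Previous run": train for a few steps, then save the optimizer state.
first_run = ToyMomentumSGD()
param = run_steps(first_run, param=1.0, grad=1.0, n_steps=5)
saved_state = first_run.state_dict()

# Resume A: restore the saved state before taking the next step.
resumed = ToyMomentumSGD()
resumed.load_state_dict(saved_state)
resumed_param = resumed.step(param, grad=1.0)

# Resume B: start from a fresh optimizer, so the momentum history is gone.
fresh = ToyMomentumSGD()
fresh_param = fresh.step(param, grad=1.0)

print("velocity with restored state:", resumed.velocity)
print("velocity with fresh optimizer:", fresh.velocity)
```

With the state restored, the running average continues from where the previous run stopped, while the fresh optimizer takes a much smaller first step because its average starts from zero. In PyTorch the analogous calls would be `optimizer.load_state_dict(...)` and, for mixed precision, the gradient scaler's `load_state_dict(...)`, applied to the contents of `optimizer.pt` and `scaler.pt`; whether the training script used here already does this when resuming is not confirmed by the notes above.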