nvidia
/

stt_be_conformer_ctc_large

Automatic Speech Recognition

hf-asr-leaderboard

Model card Files Files and versions Community

smajumdar94 commited on Oct 5, 2022

Commit

13f33b0

•

1 Parent(s): e502137

Update README.md

Files changed (1) hide show

README.md +9 -7

README.md CHANGED Viewed

@@ -1,7 +1,4 @@
 ---
 language:
 - be
 library_name: nemo
@@ -78,8 +75,9 @@ asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained("nvidia/stt_be_con
 ```
 ### Transcribing using Python
-```
 Simply do:
 ```
 asr_model.transcribe(['sample.wav'])
 ```
@@ -88,7 +86,7 @@ asr_model.transcribe(['sample.wav'])
 ```shell
 python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py
- pretrained_name="nvidia/stt_en_conformer_ctc_large"
  audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
 ```
@@ -120,11 +118,15 @@ All the models in this collection are trained on a composite dataset (NeMo ASRSE
 ## Performance
-Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding. WER on dev is 4.8%
 ## Limitations
-Since all models are trained on just MCV-10 dataset, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
 ## Deployment with NVIDIA Riva

 ---
 language:
 - be
 library_name: nemo
 ```
 ### Transcribing using Python
 Simply do:
 ```
 asr_model.transcribe(['sample.wav'])
 ```
 ```shell
 python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py
+ pretrained_name="nvidia/stt_be_conformer_ctc_large"
  audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
 ```
 ## Performance
+Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
+| Version | Tokenizer            | Vocabulary Size | MCV 10 Test | Train Dataset |
+|---------|----------------------|-----------------|-------------|---------------|
+| 1.12.0  | Google Sentencepiece | 1024            | 4.8         | MCV 10        |
 ## Limitations
+Since all models are trained on just academic datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
 ## Deployment with NVIDIA Riva