---
datasets:
- AMI
language:
- en
license: apache-2.0
metrics:
- name: "IHM test WER"
  type: wer
  value: 18.06
- name: "SDM test WER"
  type: wer
  value: 32.61
- name: "GSS test WER"
  type: wer
  value: 23.03
tags:
- k2
- icefall
---

# AMI

This is an ASR recipe for the AMI corpus. AMI provides recordings from each speaker's headset and lapel microphones, as well as 2 microphone arrays with 8 channels each. We pool the data in the following 4 ways and train a single model on the pooled data:

(i) individual headset microphone (IHM)
(ii) IHM with simulated reverb
(iii) single distant microphone (SDM)
(iv) GSS-enhanced array microphones

Speed perturbation and MUSAN noise augmentation are additionally applied to the pooled data. Here are the statistics of the combined training data:

```python
>>> cuts_train.describe()
Cuts count: 1222053
Total duration (hh:mm:ss): 905:00:28
Speech duration (hh:mm:ss): 905:00:28 (99.9%)
Duration statistics (seconds):
mean    2.7
std     2.8
min     0.0
25%     0.6
50%     1.6
75%     3.8
99%     12.3
99.5%   13.9
99.9%   18.4
max     36.8
```

**Note:** This recipe additionally uses [GSS](https://github.com/desh2608/gss) to enhance the far-field array microphone recordings, but this step is optional (see `prepare.sh` for details).

## Performance Record

### pruned_transducer_stateless7

The following results are decoded using `modified_beam_search`:

| Evaluation set     | dev WER (%) | test WER (%) |
|--------------------|-------------|--------------|
| IHM                | 19.23       | 18.06        |
| SDM                | 31.16       | 32.61        |
| MDM (GSS-enhanced) | 22.08       | 23.03        |

See [RESULTS](/egs/ami/ASR/RESULTS.md) for details.
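
For reference, below is a minimal sketch of how the four training sources could be pooled and speed-perturbed with lhotse. The manifest paths are assumptions for illustration only; the actual manifests are produced by `prepare.sh`, and MUSAN noise augmentation is typically mixed in on the fly during training rather than at this stage.

```python
# Sketch of pooling the four AMI training sources with lhotse.
# The manifest paths below are hypothetical; see prepare.sh for the
# actual file names produced by the recipe.
from lhotse import CutSet

cuts_ihm = CutSet.from_file("data/manifests/cuts_train_ihm.jsonl.gz")
cuts_ihm_rvb = CutSet.from_file("data/manifests/cuts_train_ihm_rvb.jsonl.gz")
cuts_sdm = CutSet.from_file("data/manifests/cuts_train_sdm.jsonl.gz")
cuts_gss = CutSet.from_file("data/manifests/cuts_train_gss.jsonl.gz")

# Pool all four sources into a single training CutSet.
cuts_train = cuts_ihm + cuts_ihm_rvb + cuts_sdm + cuts_gss

# Speed perturbation at 0.9x and 1.1x triples the amount of training data;
# MUSAN noise is usually added later by the training dataloader.
cuts_train = (
    cuts_train
    + cuts_train.perturb_speed(0.9)
    + cuts_train.perturb_speed(1.1)
)

# Print duration statistics of the combined data (as shown above).
cuts_train.describe()
```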