datasets:
- AMI
language:
- en
license: apache-2.0
metrics:
- name: IHM test WER
  type: wer
  value: 18.06
- name: SDM test WER
  type: wer
  value: 32.61
- name: GSS test WER
  type: wer
  value: 23.03
tags:
- k2
- icefall
AMI
This is an ASR recipe for the AMI corpus. AMI provides recordings from each speaker's headset and lapel microphones, as well as from 2 array microphones with 8 channels each. We pool the data in the following 4 ways and train a single model on the pooled data:
(i) individual headset microphone (IHM)
(ii) IHM with simulated reverb
(iii) single distant microphone (SDM)
(iv) GSS-enhanced array microphones
Speed perturbation and MUSAN noise augmentation are additionally performed on the pooled data. Here are the statistics of the combined training data:
>>> cuts_train.describe()
Cuts count: 1222053
Total duration (hh:mm:ss): 905:00:28
Speech duration (hh:mm:ss): 905:00:28 (99.9%)
Duration statistics (seconds):
mean 2.7
std 2.8
min 0.0
25% 0.6
50% 1.6
75% 3.8
99% 12.3
99.5% 13.9
99.9% 18.4
max 36.8
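As a quick sanity check on the summary above, the reported mean can be recovered from the cut count and the total duration. This is a minimal sketch that uses only the numbers printed by `describe()`; the helper name `hms_to_seconds` is hypothetical and is not part of lhotse.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the describe() output above.
num_cuts = 1222053
total_seconds = hms_to_seconds("905:00:28")

# Mean cut duration = total duration / number of cuts.
mean_duration = total_seconds / num_cuts
print(f"mean cut duration: {mean_duration:.1f} s")
```

The result rounds to 2.7 seconds, which agrees with the `mean` row of the statistics.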
Note: This recipe additionally uses GSS to enhance the far-field array microphone recordings, but this step is optional (see prepare.sh for details).
Performance Record
pruned_transducer_stateless7
The following results were decoded using modified_beam_search:
| Evaluation set | dev WER (%) | test WER (%) |
|---|---|---|
| IHM | 19.23 | 18.06 |
| SDM | 31.16 | 32.61 |
| MDM (GSS-enhanced) | 22.08 | 23.03 |
See RESULTS for details.
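For readers unfamiliar with the metric, the WER values above are word error rates in percent: the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal sketch of that definition using a word-level edit distance. It is not the scoring code used by this recipe, and the function name `word_error_rate` and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Toy example (not AMI data): one substitution in a four-word reference.
print(word_error_rate("the meeting starts now", "the meeting started now"))  # 25.0
```

Corpus-level WER is normally computed by summing the errors and the reference word counts over all utterances before dividing, rather than by averaging per-utterance values.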