# AMI

This is an ASR recipe for the AMI corpus. AMI provides recordings from each
speaker's headset and lapel microphones, as well as 2 array microphones with
8 channels each. We pool data in the following 4 ways and train a single model
on the pooled data:

(i) individual headset microphone (IHM)
(ii) IHM with simulated reverb
(iii) single distant microphone (SDM)
(iv) GSS-enhanced array microphones

Speed perturbation and MUSAN noise augmentation are additionally performed on
the pooled data. Here are the statistics of the combined training data:
```python
>>> cuts_train.describe()
Cuts count: 1222053
Total duration (hh:mm:ss): 905:00:28
Speech duration (hh:mm:ss): 905:00:28 (99.9%)
Duration statistics (seconds):
mean     2.7
std      2.8
min      0.0
25%      0.6
50%      1.6
75%      3.8
99%     12.3
99.5%   13.9
99.9%   18.4
max     36.8
```
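Summary statistics like those above can be reproduced from a plain list of cut
durations with a short stdlib sketch. The durations below are made-up examples,
not drawn from the actual corpus:

```python
import math

def duration_stats(durations):
    """Compute describe()-style summary statistics for a list of durations (seconds)."""
    xs = sorted(durations)
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

    def percentile(p):
        # Linear interpolation between closest ranks.
        k = (n - 1) * p / 100
        lo, hi = math.floor(k), math.ceil(k)
        return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

    return {
        "mean": mean,
        "std": std,
        "min": xs[0],
        "25%": percentile(25),
        "50%": percentile(50),
        "75%": percentile(75),
        "max": xs[-1],
    }

# Toy example with hypothetical durations:
stats = duration_stats([0.6, 1.6, 2.7, 3.8, 12.3])
print({k: round(v, 2) for k, v in stats.items()})
```

In practice these numbers come straight from lhotse's `CutSet.describe()`, as in
the snippet above; the sketch only shows what that summary computes.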

**Note:** This recipe additionally uses [GSS](https://github.com/desh2608/gss) for enhancement
of far-field array microphones, but this is optional (see `prepare.sh` for details).

## Performance Record

### pruned_transducer_stateless7

The following are decoded using `modified_beam_search`:
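As a rough sketch, decoding with `modified_beam_search` in icefall recipes
typically looks like the following; the epoch/avg/beam values here are
placeholders, so check the recipe's RESULTS.md for the exact command:

```shell
# Hypothetical invocation; exact flags and values depend on the icefall version.
./pruned_transducer_stateless7/decode.py \
  --epoch 14 \
  --avg 8 \
  --exp-dir pruned_transducer_stateless7/exp \
  --decoding-method modified_beam_search \
  --beam-size 4
```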
| Evaluation set     | dev WER | test WER |
|--------------------|---------|----------|
| IHM                | 19.23   | 18.06    |
| SDM                | 31.16   | 32.61    |
| MDM (GSS-enhanced) | 22.08   | 23.03    |

See [RESULTS](/egs/ami/ASR/RESULTS.md) for details.