arxiv:2210.00077

E-Branchformer: Branchformer with Enhanced merging for speech recognition

Published on Sep 30, 2022

Abstract

Conformer, which combines convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state of the art for automatic speech recognition (ASR). Several other studies have explored integrating convolution and self-attention, but they have not managed to match Conformer's performance. The recently introduced Branchformer achieves performance comparable to Conformer by using dedicated convolution and self-attention branches and merging the local and global context from each branch. In this paper, we propose E-Branchformer, which enhances Branchformer by applying an effective merging method and stacking additional point-wise modules. E-Branchformer sets new state-of-the-art word error rates (WERs) of 1.81% and 3.65% on the LibriSpeech test-clean and test-other sets without using any external training data.
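The abstract names an "effective merging method" for the two branches but does not spell out its internals. As a minimal sketch, assuming the merge concatenates the global (self-attention) and local (convolutional) branch outputs along the feature axis, applies a depth-wise 1-D convolution over time with a residual connection, and then projects back to the model dimension with a point-wise linear layer, the operation can be written in plain Python as below. The function names (`concat_features`, `depthwise_conv1d`, `linear`, `enhanced_merge`), the list-of-lists `(time, features)` layout, and all weights are hypothetical and chosen only for illustration; in a trained model the kernels and projection would be learned.

```python
from typing import List

# Hypothetical layout: a sequence is a list of frames, each frame a list of features.
Matrix = List[List[float]]


def concat_features(a: Matrix, b: Matrix) -> Matrix:
    """Concatenate two (T, D_a) and (T, D_b) sequences along the feature axis."""
    assert len(a) == len(b), "branches must have the same number of frames"
    return [ra + rb for ra, rb in zip(a, b)]


def depthwise_conv1d(x: Matrix, kernels: Matrix) -> Matrix:
    """Depth-wise 1-D convolution over time with 'same' zero padding.

    kernels[c] is the odd-length filter for channel c; each channel is filtered
    independently (computed as cross-correlation, as in most DL frameworks).
    """
    num_frames, num_channels = len(x), len(x[0])
    pad = len(kernels[0]) // 2
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            acc = 0.0
            for k, w in enumerate(kernels[c]):
                src = t + k - pad
                if 0 <= src < num_frames:  # zero padding outside the sequence
                    acc += w * x[src][c]
            out[t][c] = acc
    return out


def linear(x: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Point-wise projection applied to every frame; weight is (in, out)."""
    return [
        [sum(row[i] * weight[i][j] for i in range(len(row))) + bias[j]
         for j in range(len(bias))]
        for row in x
    ]


def enhanced_merge(global_out: Matrix, local_out: Matrix,
                   dw_kernels: Matrix, proj_w: Matrix, proj_b: List[float]) -> Matrix:
    """Assumed merge: concat -> depth-wise conv + residual -> linear projection."""
    y = concat_features(global_out, local_out)
    conv = depthwise_conv1d(y, dw_kernels)
    y = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(y, conv)]  # residual
    return linear(y, proj_w, proj_b)


# Toy example: 3 frames, 1 feature per branch, kernel size 3.
merged = enhanced_merge(
    global_out=[[1.0], [2.0], [3.0]],
    local_out=[[0.0], [1.0], [0.0]],
    dw_kernels=[[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]],
    proj_w=[[1.0], [1.0]],
    proj_b=[0.0],
)
print(merged)  # [[3.0], [5.0], [7.0]]
```

Under this assumption, the depth-wise convolution lets each concatenated channel mix in information from neighboring frames before the projection, adding local context to the merge at low cost, while the residual path keeps the plain concatenation available when the convolution contributes little.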

