---
license: cc-by-4.0
language:
- en
library_name: nemo
pipeline_tag: token-classification
tags:
- G2P
- Grapheme-to-Phoneme
---

# English G2P token classification model

This is a non-autoregressive model for English grapheme-to-phoneme (G2P) conversion based on the BERT architecture. It predicts phonemes in CMU format. The initial data was built using CMUdict v0.07.

## Intended uses & limitations

The input is expected to contain English words consisting of Latin letters and apostrophes, with all letters separated by spaces.

### How to use

Install NeMo.

Download `en_g2p.nemo` (this model):

```bash
git lfs install
git clone https://huggingface.co/bene-ges/en_g2p_cmu_bert_large
```

Run inference, where `${NEMO_ROOT}` is the root of your NeMo source checkout:

```bash
python ${NEMO_ROOT}/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py \
    pretrained_model=en_g2p_cmu_bert_large/en_g2p.nemo \
    inference.from_file=input.txt \
    inference.out_file=output.txt \
    model.max_sequence_len=64 \
    inference.batch_size=128 \
    lang=en
```

Example of input file (one word per line):

```
g e f f e r t
p r o s c r i b e d
p r o m i n e n t l y
j o c e l y n
m a r c e c a ' s
s t a n k o w s k i
m u f f l e
```

Example of output file:

```
G EH1 F ER0 T	g e f f e r t	G EH1 F ER0 T	G EH1 F ER0 T	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
P R OW0 S K R AY1 B D	p r o s c r i b e d	P R OW0 S K R AY1 B D	P R OW0 S K R AY1 B D	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
P R AA1 M AH0 N AH0 N T L IY0	p r o m i n e n t l y	P R AA1 M AH0 N AH0 N T L IY0	P R AA1 M AH0 N AH0 N T L IY0	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
JH AO1 S L IH0 N	j o c e l y n	JH AO1 S L IH0 N	JH AO1 S L IH0 N	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
M AA0 R S EH1 K AH0 Z	m a r c e c a ' s	M AA0 R S EH1 K AH0 Z	M AA0 R S EH1 K AH0 Z	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
S T AH0 NG K AO1 F S K IY0	s t a n k o w s k i	S T AH0 NG K AO1 F S K IY0	S T AH0 NG K AO1 F S K IY0	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
M AH1 F AH0L	m u f f l e	M AH1 F AH0_L	M AH1 F AH0_L	PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
```

Note that the correct output tags are in the **third** column; the input is in the second column. Tags correspond one-to-one to input letters. If you remove the tags assigned to letters that produce no phoneme and replace `_` with a space, you should get a CMU-like transcription.

### How to use for TTS

See this [script](https://github.com/bene-ges/nemo_compatible/blob/main/scripts/tts/tts_en_infer_from_cmu_phonemes.py) to run TTS directly from CMU phonemes.
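Preparing `input.txt` and post-processing the third column of `output.txt`, as described under *How to use*, are simple string transformations. The Python sketch below shows one way to do both. It is not part of NeMo, and the helper names are hypothetical. It makes three assumptions: input words are lowercased (as in the example input file), output columns are tab-separated, and letters that produce no phoneme are tagged `<DELETE>`. That tag name is an assumption, so check it against your own `output.txt` and adjust `SILENT_TAG` if it differs.

```python
import re

# Assumption: name of the tag emitted for letters that produce no phoneme.
# Verify against your own output.txt and change it if needed.
SILENT_TAG = "<DELETE>"

# Words made only of Latin letters and apostrophes (lowercased first).
_WORD_RE = re.compile(r"[a-z']+")


def word_to_input_line(word: str) -> str:
    """Lowercase a word and separate its characters with single spaces."""
    w = word.lower()
    if not _WORD_RE.fullmatch(w):
        raise ValueError(f"unsupported characters in {word!r}")
    return " ".join(w)


def tags_to_cmu(tag_column: str, silent_tag: str = SILENT_TAG) -> str:
    """Turn one third-column tag string into a CMU-like transcription.

    Silent-letter tags are dropped; multi-phoneme tags such as "AH0_L"
    are expanded by replacing "_" with a space.
    """
    phonemes = []
    for tag in tag_column.split():
        if tag == silent_tag:
            continue
        phonemes.append(tag.replace("_", " "))
    return " ".join(phonemes)


def read_transcriptions(lines):
    """Yield (word, transcription) pairs from tab-separated output lines."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # skip malformed or empty lines
        word = cols[1].replace(" ", "")  # input letters back into a word
        yield word, tags_to_cmu(cols[2])


if __name__ == "__main__":
    # Write one spaced-out word per line to build input.txt.
    print(word_to_input_line("Muffle"))  # m u f f l e
    # Hypothetical tag sequence, for illustration only.
    print(tags_to_cmu("M AH1 F <DELETE> AH0_L <DELETE>"))  # M AH1 F AH0 L
```

To process a whole file, pass an open `output.txt` handle to `read_transcriptions`. The function only reads the second and third columns, so the remaining columns are ignored.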