diarizers-community

AI & ML interests

Speaker diarization

diarizers-community aims to promote speaker diarization on the Hugging Face Hub. It contains:

  • Datasets: CallHome (Japanese, Chinese, German, Spanish, English), AMI Corpus (English), VoxConverse (English) and Simsamu (French). We aim to add more datasets in the future to better support speaker diarization on the Hub.

  • Fine-tuned models: each model has been fine-tuned on a specific CallHome language subset. On every CallHome test set, the fine-tuned model achieves a lower diarization error rate (DER) than pyannote's pre-trained segmentation-3.0 model (see the Benchmark section for details).

Together with diarizers-community, we release:

  • diarizers, a library for fine-tuning pyannote speaker diarization models using the Hugging Face ecosystem.

  • A Google Colab notebook with a step-by-step guide to using diarizers.

Benchmark

CallHome test set   Model        DER     False alarm   Missed detection   Confusion
Japanese            Pretrained   25.44   2.30          17.45              5.69
Japanese            Fine-tuned   18.23   6.31          6.91               5.01
Spanish             Pretrained   33.44   2.59          25.19              5.66
Spanish             Fine-tuned   25.72   6.87          12.73              6.12
English             Pretrained   22.16   6.29          10.97              4.90
English             Fine-tuned   18.40   7.10          6.98               4.32
German              Pretrained   21.90   3.10          14.25              4.55
German              Fine-tuned   16.75   5.00          7.75               4.00
Chinese             Pretrained   19.73   4.81          9.82               5.11
Chinese             Fine-tuned   15.95   5.04          7.24               3.68

All values are percentages; DER is the diarization error rate. Results were obtained with the test script from diarizers.
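
To make the benchmark easier to read, here is a minimal sketch that checks each reported DER against the sum of its three components (false alarm, missed detection, confusion) and computes the relative DER reduction from fine-tuning. It assumes the usual decomposition of DER into those three terms. The names `RESULTS`, `component_sum` and `relative_der_reduction` are illustrative only and are not part of diarizers or pyannote; the numbers are copied from the table above.

```python
# Benchmark rows copied from the table above.
# Each tuple is (DER, false alarm, missed detection, confusion), all in %.
RESULTS = {
    "Japanese": {"Pretrained": (25.44, 2.30, 17.45, 5.69),
                 "Fine-tuned": (18.23, 6.31, 6.91, 5.01)},
    "Spanish":  {"Pretrained": (33.44, 2.59, 25.19, 5.66),
                 "Fine-tuned": (25.72, 6.87, 12.73, 6.12)},
    "English":  {"Pretrained": (22.16, 6.29, 10.97, 4.90),
                 "Fine-tuned": (18.40, 7.10, 6.98, 4.32)},
    "German":   {"Pretrained": (21.90, 3.10, 14.25, 4.55),
                 "Fine-tuned": (16.75, 5.00, 7.75, 4.00)},
    "Chinese":  {"Pretrained": (19.73, 4.81, 9.82, 5.11),
                 "Fine-tuned": (15.95, 5.04, 7.24, 3.68)},
}


def component_sum(row):
    """Return false alarm + missed detection + confusion, rounded to 2 decimals."""
    _der, false_alarm, missed, confusion = row
    return round(false_alarm + missed + confusion, 2)


def relative_der_reduction(language):
    """Relative DER reduction (in %) of the fine-tuned model over the pretrained one."""
    pretrained_der = RESULTS[language]["Pretrained"][0]
    fine_tuned_der = RESULTS[language]["Fine-tuned"][0]
    return 100.0 * (pretrained_der - fine_tuned_der) / pretrained_der


for language, models in RESULTS.items():
    for model_name, row in models.items():
        # The published values are rounded, so allow a small tolerance.
        assert abs(row[0] - component_sum(row)) < 0.015, (language, model_name)
    print(f"{language}: relative DER reduction {relative_der_reduction(language):.1f}%")
```

Read this way, the table shows the same pattern for every language: false alarm rises after fine-tuning, while the larger drop in missed detection accounts for most of the DER improvement.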