metadata

language: fr
license: mit
library_name: transformers
tags:
  - audio
  - audio-to-audio
  - speech
datasets:
  - Cnam-LMSSC/vibravox
model-index:
  - name: EBEN(M=4,P=4,Q=4)
    results:
      - task:
          name: Bandwidth Extension
          type: speech-enhancement
        dataset:
          name: Vibravox["throat_microphone"]
          type: Cnam-LMSSC/vibravox
          args: fr
        metrics:
          - name: Test STOI, in-domain training
            type: stoi
            value: 0.8338
          - name: Test Noresqa-MOS, in-domain training
            type: n-mos
            value: 3.862

Model Card

Developed by: Cnam-LMSSC
Model type: EBEN (see publication)
Language: French
License: MIT
Finetuned dataset: speech_clean subset of Cnam-LMSSC/vibravox
Sample rate for usage: 16 kHz

Overview

This bandwidth extension model is trained on data from one specific body-conduction sensor in the Vibravox dataset. The model is designed to enhance the audio quality of body-conducted speech by denoising it and regenerating mid and high frequencies from low-frequency content only.

Disclaimer

This model has been trained for a specific non-conventional speech sensor and is intended to be used with in-domain data. Using it on data from other sensors may result in suboptimal performance.

Training procedure

Detailed instructions for reproducing the experiments are available in the jhauret/vibravox GitHub repository.

Inference script:

import torch
import torchaudio
from datasets import load_dataset
from vibravox import EBENGenerator

# Load the pretrained generator and stream the test split of the dataset
model = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_throat_microphone")
test_dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean", split="test", streaming=True)

# Take the first throat microphone recording and resample it from 48 kHz to 16 kHz
audio_48kHz = torch.Tensor(next(iter(test_dataset))["audio.throat_microphone"]["array"])
audio_16kHz = torchaudio.functional.resample(audio_48kHz, orig_freq=48_000, new_freq=16_000)

# Trim the signal to a length the model accepts, then run the enhancement
cut_audio_16kHz = model.cut_to_valid_length(audio_16kHz)
enhanced_audio_16kHz = model(cut_audio_16kHz)
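
The script above changes the number of samples twice: once when resampling from 48 kHz to 16 kHz, and once when trimming to a valid length. The sketch below only illustrates that bookkeeping in plain Python. It assumes the resampled length is rounded up, and it assumes trimming means truncating to a multiple of some factor. The rule actually applied by `cut_to_valid_length` is not documented in this card, so `trim_to_multiple` and the factor `64` are hypothetical stand-ins, not the library's behavior.

```python
def resampled_length(num_samples: int, orig_freq: int, new_freq: int) -> int:
    """Expected sample count after resampling, assuming the result is rounded up."""
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-num_samples * new_freq // orig_freq)


def trim_to_multiple(num_samples: int, multiple: int) -> int:
    """Largest length <= num_samples divisible by `multiple` (hypothetical rule)."""
    return num_samples - (num_samples % multiple)


# Hypothetical clip: slightly more than 3 seconds recorded at 48 kHz.
n_48k = 3 * 48_000 + 100
n_16k = resampled_length(n_48k, orig_freq=48_000, new_freq=16_000)

# The factor 64 is illustrative only; the real constraint depends on the model.
n_valid = trim_to_multiple(n_16k, multiple=64)

print(n_48k, n_16k, n_valid)
```

In practice the enhanced output can be slightly shorter than the resampled input, which matters when aligning it with a reference signal for metrics such as STOI.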

Links to BWE models trained on other body-conduction sensors:

An entry point to all audio bandwidth extension (BWE) models trained on different sensor data from the Vibravox dataset is available at https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_bwe_models.