
Speech2IPA Noise Robustness #12

@SanderGi

Description


As pointed out in #11, transcription is very sensitive to background noise. We have run some preliminary experiments:

  1. Using webrtcvad to filter out non-speech segments: does not handle speech that overlaps with other sounds
  2. Training models on speech augmented with various kinds of noise: does not generalize well to unseen types of noise
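VAD-based filtering (experiment 1) works at the level of fixed-size frames: each frame is classified as speech or non-speech and is kept or dropped whole. The sketch below shows that mechanism and why it cannot help with overlapping sounds. It does not use webrtcvad. A simple RMS-energy threshold (`energy_is_speech`, hypothetical, with an illustrative threshold) stands in for the detector so the example runs without extra dependencies, and samples are assumed to be mono floats in [-1, 1].

```python
import math
from typing import Callable, List, Sequence


def energy_is_speech(frame: Sequence[float], threshold: float = 0.02) -> bool:
    """Stand-in detector (not webrtcvad): speech if the frame's RMS exceeds `threshold`.

    Samples are assumed to be floats in [-1, 1]; the threshold is illustrative only.
    """
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return rms > threshold


def filter_non_speech(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    is_speech: Callable[[Sequence[float]], bool] = energy_is_speech,
) -> List[float]:
    """Keep only the fixed-size frames that `is_speech` classifies as speech.

    Frames are kept or dropped whole, so a frame containing speech *and*
    overlapping noise is kept with the noise still in it.
    """
    frame_len = sample_rate * frame_ms // 1000
    kept: List[float] = []
    # A trailing partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if is_speech(frame):
            kept.extend(frame)
    return kept
```

To use webrtcvad instead, `is_speech` would wrap `webrtcvad.Vad(mode).is_speech(frame_bytes, sample_rate)`, which expects 16-bit mono PCM frames of 10, 20 or 30 ms at 8, 16, 32 or 48 kHz, so the float frames would first need converting to bytes. Either way, a frame where speech and noise overlap passes through unchanged.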
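Noise augmentation (experiment 2) is commonly done by adding a noise recording to clean speech at a chosen signal-to-noise ratio, where SNR in dB is `10 * log10(P_speech / P_noise)`. The following is a minimal sketch of that common recipe, not necessarily the exact procedure used in the experiment above; `mix_at_snr` and `measure_snr_db` are hypothetical helpers, and inputs are assumed to be mono float samples at the same sample rate.

```python
import math
from typing import List, Sequence


def _power(x: Sequence[float]) -> float:
    """Mean power (mean of squared samples)."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Return speech + scaled noise with SNR 10 * log10(P_speech / P_noise) == snr_db.

    The noise is tiled or truncated to the speech length before scaling.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_noise = _power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Solve P_speech / (scale**2 * P_noise) = 10 ** (snr_db / 10) for scale.
    scale = math.sqrt(_power(speech) / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def measure_snr_db(clean: Sequence[float], mixture: Sequence[float]) -> float:
    """SNR of `mixture` relative to the clean reference, treating the difference as noise."""
    residual = [m - c for c, m in zip(clean, mixture)]
    return 10 * math.log10(_power(clean) / _power(residual))
```

In a training pipeline, `snr_db` and the noise clip would typically be drawn at random per example. As the experiment above found, models trained this way can still fail on noise types absent from the augmentation set, so the sketch only illustrates the mechanics.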

We need a low-latency noise removal approach that generalizes well to different types of noise. Some ideas that separate PRs can explore:

  1. Enable the browser's built-in noise suppression via Web APIs (e.g. the `noiseSuppression` audio constraint of `getUserMedia`) on supported devices/browsers
  2. Evaluate various open-source noise suppression models to determine which can run on-device in the browser and which would need to be hosted on a server with a GPU
  3. Look into more advanced noise suppression based on binaural audio and/or speaker-specific embeddings
  4. Look into more noise-robust Speech2IPA architectures, training objectives/regularization, and data augmentation approaches
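Comparing models for idea 2 means weighing latency against how much noise each one removes. Below is a minimal sketch of such a harness. It assumes each candidate model is wrapped as a hypothetical `denoise(samples, sample_rate)` function and that paired clean/noisy audio is available. It reports the real-time factor (processing time divided by audio duration; below 1 means faster than real time) and the SNR improvement over the noisy input. `moving_average` is a hypothetical placeholder, a crude low-pass filter standing in for a real model.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical model wrapper signature: (samples, sample_rate) -> enhanced samples.
Denoiser = Callable[[Sequence[float], int], List[float]]


@dataclass
class EvalResult:
    real_time_factor: float    # processing time / audio duration (< 1: faster than real time)
    snr_improvement_db: float  # output SNR minus input SNR, relative to the clean reference


def _snr_db(clean: Sequence[float], estimate: Sequence[float]) -> float:
    err = sum((e - c) ** 2 for c, e in zip(clean, estimate))
    sig = sum(c * c for c in clean)
    return 10 * math.log10(sig / err) if err > 0 else math.inf


def evaluate_denoiser(denoise: Denoiser, clean: Sequence[float],
                      noisy: Sequence[float], sample_rate: int) -> EvalResult:
    start = time.perf_counter()
    enhanced = denoise(noisy, sample_rate)
    elapsed = time.perf_counter() - start
    duration = len(noisy) / sample_rate
    return EvalResult(elapsed / duration, _snr_db(clean, enhanced) - _snr_db(clean, noisy))


def moving_average(samples: Sequence[float], sample_rate: int, width: int = 5) -> List[float]:
    """Placeholder 'model': causal moving average over the last `width` samples."""
    out: List[float] = []
    acc = 0.0
    for i, x in enumerate(samples):
        acc += x
        if i >= width:
            acc -= samples[i - width]
        out.append(acc / min(i + 1, width))
    return out
```

Timings from Python on a workstation say little about latency in the browser, so real-time factor should be measured on the target device and runtime. SNR is also only a cheap proxy; for Speech2IPA the metric that ultimately matters is transcription accuracy on the denoised audio.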

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
