Complementary: FunASR/SenseVoice — ASR for voice cloning pipelines #482

@LauraGPT

Summary

OpenVoice does voice cloning/synthesis. FunASR provides the complementary ASR front-end:

  • 170x faster transcription — RTF 0.006-0.007
  • 50+ languages (SenseVoice)
  • Speaker diarization (CAM++) — useful for multi-speaker voice cloning
  • OpenAI-compatible API — easy to chain ASR → voice cloning
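
For readers unfamiliar with RTF (real-time factor, processing time divided by audio duration), here is a minimal sketch of how an RTF converts to a speed-up over real time. It assumes the 170x figure is measured against real-time playback, which the post does not state; `speedup_from_rtf` is a hypothetical helper name, not part of FunASR.

```python
def speedup_from_rtf(rtf: float) -> float:
    """Convert a real-time factor into a speed-up relative to real time.

    RTF = processing_time / audio_duration, so the speed-up is 1 / RTF.
    """
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


# The two RTF values quoted in the list above.
for rtf in (0.006, 0.007):
    print(f"RTF {rtf}: about {speedup_from_rtf(rtf):.0f}x real time")
```

Under that reading, the quoted RTF range works out to roughly 143x to 167x real time, so "170x" would be a rounded upper bound.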

In a voice pipeline: FunASR transcribes → OpenVoice clones voice → TTS generates. Both are Apache 2.0.

pip install funasr
funasr speaker_audio.wav --spk -f json  # Identify speakers
# → feed speaker audio to OpenVoice for cloning
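
The hand-off from diarization to cloning could be sketched as below. This is only an illustration: the segment schema (`spk`, `start`, `end` in milliseconds) is an assumption about what the JSON output might contain, and the helper names are hypothetical. Neither is taken from the FunASR or OpenVoice documentation.

```python
from collections import defaultdict

# ASSUMPTION: each diarized segment looks like
# {"spk": <speaker id>, "start": <ms>, "end": <ms>, "text": <str>}.
# The real funasr JSON output may use different field names.


def group_by_speaker(segments):
    """Collect (start_ms, end_ms) spans for each speaker label."""
    spans = defaultdict(list)
    for seg in segments:
        spans[seg["spk"]].append((seg["start"], seg["end"]))
    return dict(spans)


def pick_reference_speaker(segments):
    """Return the speaker with the most total speech.

    More audio generally gives a cloning model more material to work with.
    """
    grouped = group_by_speaker(segments)
    return max(
        grouped,
        key=lambda spk: sum(end - start for start, end in grouped[spk]),
    )


# Made-up example data, for illustration only.
segments = [
    {"spk": 0, "start": 0, "end": 4200, "text": "..."},
    {"spk": 1, "start": 4300, "end": 6000, "text": "..."},
    {"spk": 0, "start": 6100, "end": 9000, "text": "..."},
]

speaker = pick_reference_speaker(segments)
spans = group_by_speaker(segments)[speaker]
print(speaker, spans)
```

The selected spans would then be cut from the source audio and passed to OpenVoice as the reference clip. That step is left out here because it depends on the audio tooling in use.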

GitHub: https://github.com/modelscope/FunASR (17.8K+ stars)