Clone a voice in 5 seconds to generate arbitrary speech in real-time
Updated Mar 9, 2026 - Python
One minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Generate audiobooks from e-books, voice cloning & 1158+ languages!
Netflix-level subtitle cutting, translation, alignment, and even dubbing - a one-click, fully automated AI video subtitle team
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
A simple, high-quality voice conversion tool focused on ease of use and performance.
MARS5 speech model (TTS) from CAMB.AI
GPT-SoVITS ONNX Inference Engine & Model Converter
A Python/PyTorch app for easily synthesising human voices
A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.
A web UI for various audio-related neural networks
AI podcast generator for bilingual episodes with realistic human-style conversation, multiple languages, and multiple voices; an alternative to NotebookLM
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU.