Releases: Daniel-McLarty/Python-Autodub
Python Autodub v2.2.1
We finally added support for running Python Autodub without an NVIDIA GPU! But before you get too excited, let's talk about expectations.
The Good News:
- The entire pipeline (Demucs, WhisperX, and F5-TTS) is no longer hardcoded to `cuda`. If you don't have an NVIDIA card, the system will gracefully fall back to processing everything on your CPU instead of instantly crashing.
- Linux AMD Users: You win today. If you manually install the PyTorch ROCm wheels, the code will automatically pick them up (ROCm presents itself to PyTorch as a CUDA device) and run at full GPU speed.
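For the curious, the fallback boils down to a single device probe. Here is a minimal sketch; the function name is ours, not the project's actual code, and the `torch` import is guarded so the snippet also runs where PyTorch isn't installed:

```python
def pick_device() -> str:
    """Return the best available torch device string.

    ROCm builds of PyTorch expose the GPU through the same
    torch.cuda API, so this single check covers both NVIDIA
    and (Linux) AMD cards.
    """
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # torch not installed; fall back to CPU
    return "cpu"
```

Each stage (Demucs, WhisperX, F5-TTS) would then receive this string instead of a hardcoded `"cuda"`.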
The Bad News (Windows/Mac/Intel GPU users):
- You are officially on the CPU fallback path. AI transformer models are incredibly memory-bandwidth hungry; without a GPU's VRAM, your processing speed drops off a cliff.
- You will think this program is a crypto miner. You will think it is stuck calculating pi to ten trillion digits. It is not. You just REALLY should not run this on the CPU unless your idea of fun is watching paint dry or grass grow. Expect a 10-minute video to take several hours to process.
Developer Note: I am officially marking CPU mode as Untested. Why? Because I am not a masochist, and I refuse to sit here for 4 hours watching a test file finish rendering.
The math is identical, so it should work. But if it breaks for you on hour 3, please send a log file so I can fix it without having to run it myself!
Python Autodub v2.1.0
Please check the changelog.
TL;DR: Lots of memory‑usage fixes. Longer movies should process without blowing up now. Adjusted some Demucs behavior because 4.0 apparently hoards RAM like a dragon unless you smack it with constraints. Fingers crossed that’s sorted. Also tweaked some UI bits so the GC can actually do its job.
P.S. Just download this version. It behaves exactly the same for you, it just contains the last six hours of my life dedicated to making sure it no longer demands 40GB of RAM for no reason.
Python Autodub v2.0.0
Python Autodub v2.0.0: The "Next-Gen Audio" Overhaul
Welcome to Python Autodub 2.0.0! This release is a massive milestone that transforms the project from a collection of developer scripts into a polished, standalone, and lightning-fast desktop application.
We’ve completely rebuilt the audio processing engine from the ground up to deliver flawless lip-syncing, introduced state-of-the-art voice generation, and expanded our language capabilities globally.
Here is what’s new in v2.0.0:
Universal Translation & The F5-TTS Engine
We have officially retired the older Coqui XTTSv2 model and upgraded to the transformer-based F5-TTS engine.
- Any-to-Any Language Support: Python Autodub is no longer restricted to static language pairs. The pipeline dynamically translates between 16 different languages, including English, Spanish, French, German, Japanese, Korean, Arabic, and more.
- Better Prosody & Emotion: The new F5-TTS engine dramatically improves the emotional delivery and natural pacing of generated lines.
- Hallucination Defenses: We've implemented strict context-window filtering and forced terminal punctuation to stop the AI from generating phantom audio or "rambling".
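The forced-terminal-punctuation defense is simple enough to sketch. The names below are illustrative, not the project's actual code:

```python
# Sentence-final punctuation across the supported languages
# (list is illustrative, not exhaustive).
TERMINALS = (".", "!", "?", "\u2026", "\u3002", "\uff01", "\uff1f")

def force_terminal_punctuation(line: str) -> str:
    """Append a period if the line lacks sentence-final punctuation.

    TTS models tend to keep 'rambling' past the end of unpunctuated
    text, so a hard stop reins the generation in.
    """
    line = line.rstrip()
    if line and not line.endswith(TERMINALS):
        line += "."
    return line
```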
Perfect Sync: Introducing "The Guillotine"
The most common issue with AI dubbing is dialogue bleeding over its designated subtitle window. v2.0.0 completely solves this with a brand-new, frame-accurate audio backend built purely on numpy and librosa.
- The Shrink-Only Guillotine: If the AI generates a line that is too long, the pipeline intelligently time-stretches the audio (up to 2x speed). If it still exceeds the window, it safely hard-truncates the tail, guaranteeing it never overlaps with the next line.
- Zero Latency Starts: The system now aggressively trims dead air and "warm-up" latency from the exact millisecond the AI begins speaking, ensuring frame-perfect start times for lip-syncing.
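A rough numpy-only sketch of the shrink-only logic, using naive `np.interp` resampling as a stand-in for the pipeline's real librosa time-stretch (which preserves pitch; this stand-in does not):

```python
import numpy as np

def guillotine(audio: np.ndarray, sr: int, window_s: float,
               max_speed: float = 2.0) -> np.ndarray:
    """Shrink-only fit: time-stretch up to max_speed, then hard-truncate.

    Audio that already fits is returned untouched; nothing is ever
    stretched to be longer than it was.
    """
    limit = int(window_s * sr)
    if len(audio) <= limit:
        return audio
    speed = min(max_speed, len(audio) / limit)
    target = int(len(audio) / speed)
    stretched = np.interp(
        np.linspace(0, len(audio) - 1, target),
        np.arange(len(audio)), audio)
    return stretched[:limit]  # the guillotine: drop the tail
```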
Massive Performance Leaps
- Smart Diarization Bypass: Only dubbing a single speaker? The pipeline now automatically bypasses the Pyannote Diarization model entirely, saving you ~3GB of VRAM and slicing processing times drastically.
- No More Out-Of-Memory Crashes: We've introduced aggressive VRAM clearing and garbage collection between major processing steps to ensure stability on massive, feature-length video files.
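The between-stage cleanup amounts to something like this sketch (helper name is ours; the torch import is guarded so it also runs on CPU-only installs):

```python
import gc

def release_memory() -> None:
    """Free Python- and GPU-side memory between pipeline stages."""
    gc.collect()                      # drop dead Python objects first
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached VRAM to the driver
    except ImportError:
        pass                          # CPU-only install: nothing to clear
```

Calling this after each heavyweight model (Demucs, WhisperX, Pyannote, F5-TTS) finishes keeps the peak memory footprint close to the largest single stage rather than the sum of all of them.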
The Standalone App Experience
Python Autodub is now a true desktop application. You no longer need to mess with terminal commands or manual python environments.
- Native OS Launchers: Simply double-click `Launch_UI.ps1` (or `.exe`) on Windows, or `install_linux_shortcut.sh` on Linux. The app will automatically sync its own isolated environment using `uv`, check for C++ compilers, and boot the UI.
- Cleaner Terminals: We've suppressed the messy inference logs from F5-TTS to keep your console clean, and routed Hugging Face download progress bars directly into the Tkinter GUI.
Breaking Changes for Legacy Users
Because this is a major architectural rebuild, the following legacy workflows are no longer supported:
- The `pydub` library has been completely purged from the project.
- Headless developer scripts (`run_dub.py` and `test_env.py`) have been removed. All interaction must now be done through the native Graphical User Interface.
Ready to upgrade? Download the latest release and simply run the launcher for your OS!
Python Autodub v1.2.1
🚀 Release: v1.2.1
This patch focuses heavily on quality-of-life improvements, making the initial setup completely frictionless for non-developers and drastically improving the quality of the final audio mix.
If you've experienced silent dialogue tracks or background audio that was way too loud, this update fixes those issues entirely!
✨ Key Highlights
- Zero-Touch Setup: You no longer need to manually install Microsoft Build Tools. If your system is missing the required C++ compilers for our audio stack, the launcher will automatically fetch and install them for you. Just click run and let it do the heavy lifting!
- Studio-Quality Audio Mixing: We've re-calibrated the master mixing targets. Background music and noise are now properly pushed down to a `-26 LUFS` floor, ensuring the newly generated dialogue tracks sit clearly and comfortably on top of the mix without fighting for volume.
- Bulletproof Audio Assembly: We've upgraded our custom FFmpeg binary to natively support time-stretching, and added a smart failsafe. If a generated voice line ever fails to time-stretch, the pipeline will safely fall back to the original audio instead of crashing or leaving a patch of dead air.
📝 Full Changelog
Project Management & Deployment
- Automated MSVC Build Tools Installer: The PowerShell launcher now uses `vswhere.exe` to check for C++ build environments. If missing, it automatically downloads and passively installs the required Visual Studio C++ workloads, preventing `uv` from crashing when compiling C-extensions from source.
Audio Pipeline Improvements
- Better Audio Mixing (LUFS Calibration): Fixed an issue where the background music/noise was aggressively loud. Lowered the background normalization target from `-14 LUFS` to `-26 LUFS`, ensuring the background track sits comfortably underneath the `-12 LUFS` focal dialogue track.
- Custom FFmpeg Upgrade: Updated the minimal MSYS2 FFmpeg build configuration to explicitly include the `atempo` filter (`--enable-filter=...,atempo`), enabling native WSOLA time-stretching while keeping the binary ultra-lightweight.
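For reference, moving a track between LUFS targets is just a dB offset applied as linear gain. The sketch below assumes the loudness measurement itself has already happened (e.g. via a loudness meter library); it only shows the gain math:

```python
def lufs_gain(measured_lufs: float, target_lufs: float) -> float:
    """Linear gain factor that shifts a track from measured to target loudness."""
    return 10 ** ((target_lufs - measured_lufs) / 20)

# Background measured at -14 LUFS, pushed down to the new -26 LUFS floor:
gain = lufs_gain(-14.0, -26.0)  # 12 dB of attenuation, roughly 0.25x amplitude
```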
Bug Fixes
- Resilient Audio Assembly (Step 6): Fixed a critical bug where the pipeline would silently fail to merge dialogue if FFmpeg couldn't time-stretch a line (resulting in a completely blank dialogue track). Switched `subprocess.run` to `run_and_log` to catch errors, and added a safe fallback that inserts the original, unadjusted TTS audio if the `_adj.wav` file fails to generate.
Python Autodub v1.2.0
[v1.2.0] — 2026-03-09
📦 Dependency & Project Management
- Unified Environment with `uv`: The project has fully migrated to `uv`. A new `pyproject.toml` now acts as the single source of truth for dependencies, build systems, and project metadata, ensuring lightning-fast installs and reproducible environments.
[v1.1.0] — 2026-03-08
Developer Note: This is the "Quality of Life & Architecture" update. We’ve completed a massive refactor to separate the GUI from the AI logic, alongside highly requested features like dark mode, smart resuming, and full state saving.
🎨 UI & User Experience
- Native OS Theming: Integrated `sv_ttk` and `darkdetect` to automatically apply a modern Windows 11 Light/Dark theme based on your system preferences.
- Full UI State Persistence: The app now remembers everything. Expanded `ui_config.json` functionality ensures your workers, max speakers, and output paths are saved between sessions.
- Custom Output Controls: You can now specify the exact output directory and filename for the final dubbed MKV directly in the UI.
- Force Clean Build: Added a "Clean Build" checkbox to manually purge the `temp/` directory (while safely preserving logs) for a guaranteed fresh pipeline run.
🧠 Pipeline Intelligence & Logging
- Smart Resume via File Hashing: Implemented MD5 chunk-hashing for video and SRT inputs. If your files haven't changed, the pipeline resumes using existing stems and cloned voices, saving massive amounts of processing time.
- Enhanced Worker Logging: Logs now explicitly print the text of the line being generated (e.g., `[Worker] Successfully generated Line 42: "Hello world"`) for better transparency during long runs.
🏗️ Architecture & Refactoring
- Separation of Concerns: Broke apart the UI "God Class." All backend logic (FFmpeg, Demucs, Pyannote, and TTS) is now housed in a dedicated `DubbingPipeline` class to prevent UI freezing.
- Strict Configuration Typing: Replaced generic dictionaries with a `PipelineConfig` dataclass, providing better type-checking and developer IDE support.
- Pathlib Migration: Refactored internal file management to use `pathlib`, eliminating cross-platform "slash" issues, particularly for Demucs on Windows.
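A minimal sketch of what a config dataclass like `PipelineConfig` might look like; the field names here are illustrative, not the project's actual schema:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class PipelineConfig:
    """Typed pipeline settings, replacing a loose settings dict."""
    video_path: Path
    output_path: Path
    max_speakers: int = 22
    workers: int = 2
    clean_build: bool = False
```

Unlike a dict, a typo'd field name fails immediately at construction time, and IDEs can autocomplete and type-check every setting.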
Python Autodub v1.0.5
Python Autodub v1.0.5 — Initial Release Notes
Welcome to the first official release of Python Autodub. This release encompasses all development from the initial commit up to v1.0.4, delivering a comprehensive, fully portable, and automated AI-powered video dubbing pipeline.
Below is an overview of the core features, architectural improvements, and portability enhancements included in this milestone.
Core Features
- Automated Dubbing Pipeline: The application extracts audio, separates vocals, diarizes speakers, generates translated voice clones, and muxes the re-assembled audio back into a final MKV video file.
- Advanced Vocal Separation: Isolates background noise and music from dialogue using 4-stem Demucs.
- Speaker Diarization: Identifies and separates up to 22 different speakers in the audio using Pyannote.
- Voice Cloning & Translation: Automatically extracts clean audio samples for each identified speaker and uses Coqui XTTSv2 to generate translated English lines.
- Hybrid Emotional Cloning: An optional setting allows users to blend the original emotional cadence of the source vocals with the base voice clone.
- Graphical User Interface (GUI): Features an intuitive Tkinter-based GUI for easy configuration of thresholds, worker limits, and file paths.
Distribution & Portability
- Standalone Executables: Users can now launch the application directly using `Python-Autodub.exe` without manually installing Python, Git, or managing system paths.
- Auto-Bootstrapping Engine: Integrated a high-performance PowerShell launcher that automatically handles `uv` installation, Python 3.10 fetching, and virtual environment management on the first run.
- Hermetic Environment: The application builds its own isolated Python environment (`dub_env`) locally within the project root, ensuring zero interference with existing system-wide Python installations.
- Fully Portable AI Architecture: Overrode variables like `HF_HOME`, `TORCH_HOME`, `XDG_CACHE_HOME`, and `TTS_HOME` to force all multi-gigabyte models to download into a local `models/` directory, keeping hidden system folders clean.
- Zero-Config FFmpeg (Windows): A custom-built, optimized, LGPL-compliant FFmpeg binary is included in the `bin/` folder and is dynamically injected into the runtime `PATH`.
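The cache redirection is just environment variables set before any heavy import happens. A simplified sketch (the real launcher resolves the project root; here the working directory stands in for it):

```python
import os
from pathlib import Path

# Everything lands in <project>/models instead of hidden per-user caches.
MODELS_DIR = Path.cwd() / "models"

# These must be set BEFORE torch, pyannote, or TTS are imported,
# because those libraries read the variables once at import time.
for var in ("HF_HOME", "TORCH_HOME", "XDG_CACHE_HOME", "TTS_HOME"):
    os.environ[var] = str(MODELS_DIR)
```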
Setup & Stability Improvements
- Deterministic Environment Locking: Replaced the loose `requirements.txt` with a strict lock-file system using `uv`, making setups 100% reproducible and eliminating dependency conflicts.
- Fixed Download Race Conditions: Resolved a critical bug where parallel multiprocessing workers would simultaneously attempt to download the XTTSv2 model and corrupt the files.
- Smart Pre-Fetching: The main thread now single-handedly pre-fetches the TTS model (or verifies it on disk) before spinning up background workers.
- Strict Import Ordering: Refactored top-level scripts so that environment variables and path overrides are strictly defined before heavy AI libraries (`torch`, `pyannote`) are imported, preventing initialization lock-in.
User Experience (UX) Enhancements
- Persistent UI Memory: Implemented `ui_config.json` to save user state, allowing the application to remember settings like your Hugging Face token between sessions.
- Integrated TOS Agreement: Fixed a headless worker crash by adding a formal Coqui CPML Terms of Service checkbox to the UI and a CLI prompt.
- TOS State Saving: License agreements are written to a hidden `.tos_agreed` file to permanently suppress the prompt for future runs.
How To Use
- Linux Users: RTFM.
- Windows Users: Download the zip, extract it, run the exe, tell SmartScreen to buzz off (press "More info" -> "Run anyway"), and wait; the first load will take a while as it builds your environment.