Releases: Daniel-McLarty/Python-Autodub
Python Autodub v2.2.1
We finally added support for running Python Autodub without an NVIDIA GPU! But before you get too excited, let's talk about expectations.
The Good News:
- The entire pipeline (Demucs, WhisperX, and F5-TTS) is no longer hardcoded to `cuda`. If you don't have an NVIDIA card, the system will gracefully fall back to processing everything on your CPU instead of instantly crashing.
- Linux AMD Users: You win today. If you manually install the PyTorch ROCm wheels, the code will automatically pick them up (ROCm presents itself to PyTorch as a CUDA device) and run at full GPU speed.
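For the curious, the fallback boils down to a single device probe. Here is a minimal sketch; the function name is ours, not the project's actual code, and the `torch` import is guarded so the snippet also runs where PyTorch isn't installed:

```python
def pick_device() -> str:
    """Return the best available torch device string.

    ROCm builds of PyTorch expose the GPU through the same
    torch.cuda API, so this single check covers both NVIDIA
    and (Linux) AMD cards.
    """
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # torch not installed; fall back to CPU
    return "cpu"
```

Each stage (Demucs, WhisperX, F5-TTS) would then receive this string instead of a hardcoded `"cuda"`.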
The Bad News (Windows/Mac/Intel GPU users):
- You are officially on the CPU fallback path. AI transformer models are incredibly memory-bandwidth hungry; without a GPU's VRAM, your processing speed drops off a cliff.
- You will think this program is a crypto miner. You will think it is stuck calculating pi to ten trillion digits. It is not. You just REALLY should not run this on the CPU unless your idea of fun is watching paint dry or grass grow. Expect a 10-minute video to take several hours to process.
Developer Note: I am officially marking CPU mode as Untested. Why? Because I am not a masochist, and I refuse to sit here for 4 hours watching a test file finish rendering.
The math is identical, so it should work. But if it breaks for you on hour 3, please send a log file so I can fix it without having to run it myself!
Python Autodub v2.1.0
Please check the changelog.
TL;DR: Lots of memory‑usage fixes. Longer movies should process without blowing up now. Adjusted some Demucs behavior because 4.0 apparently hoards RAM like a dragon unless you smack it with constraints. Fingers crossed that’s sorted. Also tweaked some UI bits so the GC can actually do its job.
P.S. Just download this version. It behaves exactly the same for you, it just contains the last six hours of my life dedicated to making sure it no longer demands 40GB of RAM for no reason.
Python Autodub v2.0.0
Python Autodub v2.0.0: The "Next-Gen Audio" Overhaul
Welcome to Python Autodub 2.0.0! This release is a massive milestone that transforms the project from a collection of developer scripts into a polished, standalone, and lightning-fast desktop application.
We’ve completely rebuilt the audio processing engine from the ground up to deliver flawless lip-syncing, introduced state-of-the-art voice generation, and expanded our language capabilities globally.
Here is what’s new in v2.0.0:
Universal Translation & The F5-TTS Engine
We have officially retired the older Coqui XTTSv2 model and upgraded to the transformer-based F5-TTS engine.
- Any-to-Any Language Support: Python Autodub is no longer restricted to static language pairs. The pipeline dynamically translates between 16 different languages, including English, Spanish, French, German, Japanese, Korean, Arabic, and more.
- Better Prosody & Emotion: The new F5-TTS engine dramatically improves the emotional delivery and natural pacing of generated lines.
- Hallucination Defenses: We've implemented strict context-window filtering and forced terminal punctuation to stop the AI from generating phantom audio or "rambling".
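The forced-terminal-punctuation defense is simple enough to sketch. The names below are illustrative, not the project's actual code:

```python
# Sentence-final punctuation across the supported languages
# (list is illustrative, not exhaustive).
TERMINALS = (".", "!", "?", "\u2026", "\u3002", "\uff01", "\uff1f")

def force_terminal_punctuation(line: str) -> str:
    """Append a period if the line lacks sentence-final punctuation.

    TTS models tend to keep 'rambling' past the end of unpunctuated
    text, so a hard stop reins the generation in.
    """
    line = line.rstrip()
    if line and not line.endswith(TERMINALS):
        line += "."
    return line
```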
Perfect Sync: Introducing "The Guillotine"
The most common issue with AI dubbing is dialogue bleeding over its designated subtitle window. v2.0.0 completely solves this with a brand-new, frame-accurate audio backend built purely on numpy and librosa.
- The Shrink-Only Guillotine: If the AI generates a line that is too long, the pipeline intelligently time-stretches the audio (up to 2x speed). If it still exceeds the window, it safely hard-truncates the tail, guaranteeing it never overlaps with the next line.
- Zero Latency Starts: The system now aggressively trims dead air and "warm-up" latency from the exact millisecond the AI begins speaking, ensuring frame-perfect start times for lip-syncing.
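A rough numpy-only sketch of the shrink-only logic, using naive `np.interp` resampling as a stand-in for the pipeline's real librosa time-stretch (which preserves pitch; this stand-in does not):

```python
import numpy as np

def guillotine(audio: np.ndarray, sr: int, window_s: float,
               max_speed: float = 2.0) -> np.ndarray:
    """Shrink-only fit: time-stretch up to max_speed, then hard-truncate.

    Audio that already fits is returned untouched; nothing is ever
    stretched to be longer than it was.
    """
    limit = int(window_s * sr)
    if len(audio) <= limit:
        return audio
    speed = min(max_speed, len(audio) / limit)
    target = int(len(audio) / speed)
    stretched = np.interp(
        np.linspace(0, len(audio) - 1, target),
        np.arange(len(audio)), audio)
    return stretched[:limit]  # the guillotine: drop the tail
```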
Massive Performance Leaps
- Smart Diarization Bypass: Only dubbing a single speaker? The pipeline now automatically bypasses the Pyannote Diarization model entirely, saving you ~3GB of VRAM and slicing processing times drastically.
- No More Out-Of-Memory Crashes: We've introduced aggressive VRAM clearing and garbage collection between major processing steps to ensure stability on massive, feature-length video files.
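The between-stage cleanup amounts to something like this sketch (helper name is ours; the torch import is guarded so it also runs on CPU-only installs):

```python
import gc

def release_memory() -> None:
    """Free Python- and GPU-side memory between pipeline stages."""
    gc.collect()                      # drop dead Python objects first
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached VRAM to the driver
    except ImportError:
        pass                          # CPU-only install: nothing to clear
```

Calling this after each heavyweight model (Demucs, WhisperX, Pyannote, F5-TTS) finishes keeps the peak memory footprint close to the largest single stage rather than the sum of all of them.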
The Standalone App Experience
Python Autodub is now a true desktop application. You no longer need to mess with terminal commands or manual python environments.
- Native OS Launchers: Simply double-click `Launch_UI.ps1` (or `.exe`) on Windows, or `install_linux_shortcut.sh` on Linux. The app will automatically sync its own isolated environment using `uv`, check for C++ compilers, and boot the UI.
- Cleaner Terminals: We've suppressed the messy inference logs from F5-TTS to keep your console clean, and routed Hugging Face download progress bars directly into the Tkinter GUI.
Breaking Changes for Legacy Users
Because this is a major architectural rebuild, the following legacy workflows are no longer supported:
- The `pydub` library has been completely purged from the project.
- Headless developer scripts (`run_dub.py` and `test_env.py`) have been removed. All interaction must now be done through the native Graphical User Interface.
Ready to upgrade? Download the latest release and simply run the launcher for your OS!
Python Autodub v1.2.1
🚀 Release: v1.2.1
This patch focuses heavily on quality-of-life improvements, making the initial setup completely frictionless for non-developers and drastically improving the quality of the final audio mix.
If you've experienced silent dialogue tracks or background audio that was way too loud, this update fixes those issues entirely!
✨ Key Highlights
- Zero-Touch Setup: You no longer need to manually install Microsoft Build Tools. If your system is missing the required C++ compilers for our audio stack, the launcher will automatically fetch and install them for you. Just click run and let it do the heavy lifting!
- Studio-Quality Audio Mixing: We've re-calibrated the master mixing targets. Background music and noise are now properly pushed down to a `-26 LUFS` floor, ensuring the newly generated dialogue tracks sit clearly and comfortably on top of the mix without fighting for volume.
- Bulletproof Audio Assembly: We've upgraded our custom FFmpeg binary to natively support time-stretching, and added a smart failsafe. If a generated voice line ever fails to time-stretch, the pipeline will safely fall back to the original audio instead of crashing or leaving a patch of dead air.
📝 Full Changelog
Project Management & Deployment
- Automated MSVC Build Tools Installer: The PowerShell launcher now uses `vswhere.exe` to check for C++ build environments. If missing, it automatically downloads and passively installs the required Visual Studio C++ workloads, preventing `uv` from crashing when compiling C-extensions from source.
Audio Pipeline Improvements
- Better Audio Mixing (LUFS Calibration): Fixed an issue where the background music/noise was aggressively loud. Lowered the background normalization target from `-14 LUFS` to `-26 LUFS`, ensuring the background track sits comfortably underneath the `-12 LUFS` focal dialogue track.
- Custom FFmpeg Upgrade: Updated the minimal MSYS2 FFmpeg build configuration to explicitly include the `atempo` filter (`--enable-filter=...,atempo`), enabling native WSOLA time-stretching while keeping the binary ultra-lightweight.
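For reference, moving a track between LUFS targets is just a dB offset applied as linear gain. The sketch below assumes the loudness measurement itself has already happened (e.g. via a loudness meter library); it only shows the gain math:

```python
def lufs_gain(measured_lufs: float, target_lufs: float) -> float:
    """Linear gain factor that shifts a track from measured to target loudness."""
    return 10 ** ((target_lufs - measured_lufs) / 20)

# Background measured at -14 LUFS, pushed down to the new -26 LUFS floor:
gain = lufs_gain(-14.0, -26.0)  # 12 dB of attenuation, roughly 0.25x amplitude
```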
Bug Fixes
- Resilient Audio Assembly (Step 6): Fixed a critical bug where the pipeline would silently fail to merge dialogue if FFmpeg couldn't time-stretch a line (resulting in a completely blank dialogue track). Switched `subprocess.run` to `run_and_log` to catch errors, and added a safe fallback that inserts the original, unadjusted TTS audio if the `_adj.wav` file fails to generate.
Python Autodub v1.2.0
[v1.2.0] — 2026-03-09
📦 Dependency & Project Management
- Unified Environment with `uv`: The project has fully migrated to `uv`. A new `pyproject.toml` now acts as the single source of truth for dependencies, build systems, and project metadata, ensuring lightning-fast installs and reproducible environments.
[v1.1.0] — 2026-03-08
Developer Note: This is the "Quality of Life & Architecture" update. We’ve completed a massive refactor to separate the GUI from the AI logic, alongside highly requested features like dark mode, smart resuming, and full state saving.
🎨 UI & User Experience
- Native OS Theming: Integrated `sv_ttk` and `darkdetect` to automatically apply a modern Windows 11 Light/Dark theme based on your system preferences.
- Full UI State Persistence: The app now remembers everything. Expanded `ui_config.json` functionality ensures your workers, max speakers, and output paths are saved between sessions.
- Custom Output Controls: You can now specify the exact output directory and filename for the final dubbed MKV directly in the UI.
- Force Clean Build: Added a "Clean Build" checkbox to manually purge the `temp/` directory (while safely preserving logs) for a guaranteed fresh pipeline run.
🧠 Pipeline Intelligence & Logging
- Smart Resume via File Hashing: Implemented MD5 chunk-hashing for video and SRT inputs. If your files haven't changed, the pipeline resumes using existing stems and cloned voices, saving massive amounts of processing time.
- Enhanced Worker Logging: Logs now explicitly print the text of the line being generated (e.g., `[Worker] Successfully generated Line 42: "Hello world"`) for better transparency during long runs.
🏗️ Architecture & Refactoring
- Separation of Concerns: Broke apart the UI "God Class." All backend logic (FFmpeg, Demucs, Pyannote, and TTS) is now housed in a dedicated `DubbingPipeline` class to prevent UI freezing.
- Strict Configuration Typing: Replaced generic dictionaries with a `PipelineConfig` dataclass, providing better type-checking and developer IDE support.
- Pathlib Migration: Refactored internal file management to use `pathlib`, eliminating cross-platform "slash" issues, particularly for Demucs on Windows.
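A minimal sketch of what a config dataclass like `PipelineConfig` might look like; the field names here are illustrative, not the project's actual schema:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class PipelineConfig:
    """Typed pipeline settings, replacing a loose settings dict."""
    video_path: Path
    output_path: Path
    max_speakers: int = 22
    workers: int = 2
    clean_build: bool = False
```

Unlike a dict, a typo'd field name fails immediately at construction time, and IDEs can autocomplete and type-check every setting.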
Python Autodub v1.0.5
Python Autodub v1.0.5 — Initial Release Notes
Welcome to the first official release of Python Autodub. This release encompasses all development from the initial commit up to v1.0.4, delivering a comprehensive, fully portable, and automated AI-powered video dubbing pipeline.
Below is an overview of the core features, architectural improvements, and portability enhancements included in this milestone.
Core Features
- Automated Dubbing Pipeline: The application extracts audio, separates vocals, diarizes speakers, generates translated voice clones, and muxes the re-assembled audio back into a final MKV video file.
- Advanced Vocal Separation: Isolates background noise and music from dialogue using 4-stem Demucs.
- Speaker Diarization: Identifies and separates up to 22 different speakers in the audio using Pyannote.
- Voice Cloning & Translation: Automatically extracts clean audio samples for each identified speaker and uses Coqui XTTSv2 to generate translated English lines.
- Hybrid Emotional Cloning: An optional setting allows users to blend the original emotional cadence of the source vocals with the base voice clone.
- Graphical User Interface (GUI): Features an intuitive Tkinter-based GUI for easy configuration of thresholds, worker limits, and file paths.
Distribution & Portability
- Standalone Executables: Users can now launch the application directly using `Python-Autodub.exe` without manually installing Python, Git, or managing system paths.
- Auto-Bootstrapping Engine: Integrated a high-performance PowerShell launcher that automatically handles `uv` installation, Python 3.10 fetching, and virtual environment management on the first run.
- Hermetic Environment: The application builds its own isolated Python environment (`dub_env`) locally within the project root, ensuring zero interference with existing system-wide Python installations.
- Fully Portable AI Architecture: Overrode variables like `HF_HOME`, `TORCH_HOME`, `XDG_CACHE_HOME`, and `TTS_HOME` to force all multi-gigabyte models to download into a local `models/` directory, keeping hidden system folders clean.
- Zero-Config FFmpeg (Windows): A custom-built, optimized, LGPL-compliant FFmpeg binary is included in the `bin/` folder and is dynamically injected into the runtime `PATH`.
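The cache redirection is just environment variables set before any heavy import happens. A simplified sketch (the real launcher resolves the project root; here the working directory stands in for it):

```python
import os
from pathlib import Path

# Everything lands in <project>/models instead of hidden per-user caches.
MODELS_DIR = Path.cwd() / "models"

# These must be set BEFORE torch, pyannote, or TTS are imported,
# because those libraries read the variables once at import time.
for var in ("HF_HOME", "TORCH_HOME", "XDG_CACHE_HOME", "TTS_HOME"):
    os.environ[var] = str(MODELS_DIR)
```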
Setup & Stability Improvements
- Deterministic Environment Locking: Replaced the loose `requirements.txt` with a strict lock-file system using `uv`, making setups 100% reproducible and eliminating dependency conflicts.
- Fixed Download Race Conditions: Resolved a critical bug where parallel multiprocessing workers would simultaneously attempt to download the XTTSv2 model and corrupt the files.
- Smart Pre-Fetching: The main thread now single-handedly pre-fetches the TTS model (or verifies it on disk) before spinning up background workers.
- Strict Import Ordering: Refactored top-level scripts so that environment variables and path overrides are strictly defined before heavy AI libraries (`torch`, `pyannote`) are imported, preventing initialization lock-in.
User Experience (UX) Enhancements
- Persistent UI Memory: Implemented `ui_config.json` to save user state, allowing the application to remember settings like your Hugging Face token between sessions.
- Integrated TOS Agreement: Fixed a headless worker crash by adding a formal Coqui CPML Terms of Service checkbox to the UI and a CLI prompt.
- TOS State Saving: License agreements are written to a hidden `.tos_agreed` file to permanently suppress the prompt for future runs.
How To Use
- Linux Users: RTFM.
- Windows Users: Download the zip, extract it, run the exe, tell SmartScreen to buzz off (press "More info" -> "Run anyway"), and wait; the first load will take a while as it builds your environment.