Subdub is a command-line tool for:
- transcribing media with WhisperX,
- correcting/translating subtitles with LLMs through LiteLLM (or DeepL),
- generating dubbed speech with XTTS,
- and syncing dubbed audio back into video.
It was created to support the dubbing workflow in Pandrator, but it also works standalone.
Dubbing sample (Russian -> translated dub):
pandrator_example_dubbing.mp4
The app has been refactored from a monolithic file into modular pipeline stages.
- `subdub.cli` + `subdub.cli_args`: entrypoint and argument parsing.
- `subdub.app`: runtime orchestration and stage ordering.
- `subdub.tasks.*`: stage modules (`input`, `correction`, `translation`, `speech`, `transcribe`).
- `subdub.ai.client`: centralized LiteLLM request layer, callbacks, structured output handling, cost tracking.
- `subdub.media.*` and `subdub.workflows.*`: FFmpeg, TTS, sync, boundary-correction workflows.
- Compatibility re-exports remain in `subdub.ai.translate` and `subdub.models`.
- Python 3.10+
- FFmpeg available in `PATH`
- WhisperX available either:
  - as a direct `whisperx` command, or
  - via Pandrator Pixi fallback (`WHISPERX_PIXI_EXE` and `WHISPERX_PIXI_MANIFEST`)
- XTTS API server running at `http://localhost:8020` (for dubbing/TTS)
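The local prerequisites listed above can be checked before a run with a short script. This is only a sketch and not part of Subdub: the function name `check_prerequisites` is hypothetical, and it checks just the Python version and whether the `ffmpeg` and `whisperx` executables are on `PATH`. It does not probe the XTTS server, and a missing `whisperx` is not necessarily fatal if the Pandrator Pixi fallback is configured.

```python
import shutil
import sys


def check_prerequisites(tools=("ffmpeg", "whisperx"), min_python=(3, 10)):
    """Return a dict mapping each prerequisite name to True (found) or False.

    Hypothetical helper for illustration; Subdub does not ship this function.
    """
    report = {"python": sys.version_info >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```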
Required external services depend on your model/task:
- LLM provider key(s): `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY`, or `OPENROUTER_API_KEY`
- `DEEPL_API_KEY` when using `--use-deepl`
- `HF_TOKEN` when using `-diarize`
Environment behavior notes:
- `OPENROUTER_API` is accepted as a legacy alias and copied to `OPENROUTER_API_KEY` if needed.
- For localhost-style `-api_base` endpoints, Subdub auto-fills `OPENAI_API_KEY=lm-studio` when no key is set.
- Pandrator WhisperX fallback defaults to `../bin/pixi.exe` with `../envs/whisperx_installer/pixi.toml`.
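The first two environment rules above can be pictured roughly as follows. This is a minimal sketch, not Subdub's actual code: the function name `normalize_env` and the `LOCAL_HOSTS` set are assumptions, and the real definition of a "localhost-style" endpoint may differ.

```python
from urllib.parse import urlparse

# Assumption: which hostnames count as "localhost-style" is a guess.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0"}


def normalize_env(env, api_base=None):
    """Return a copy of env with the legacy alias and local-key rules applied.

    Hypothetical helper; it only mirrors the behavior described in the notes.
    """
    env = dict(env)

    # Legacy alias: copy OPENROUTER_API only when the new name is unset.
    if env.get("OPENROUTER_API") and not env.get("OPENROUTER_API_KEY"):
        env["OPENROUTER_API_KEY"] = env["OPENROUTER_API"]

    # Local endpoints usually ignore the key, so fill in a placeholder.
    if api_base and not env.get("OPENAI_API_KEY"):
        # hostname is None when the URL has no scheme (e.g. "localhost:1234").
        if urlparse(api_base).hostname in LOCAL_HOSTS:
            env["OPENAI_API_KEY"] = "lm-studio"

    return env
```

Working on a copy of the environment mapping keeps the sketch easy to test; an existing `OPENAI_API_KEY` is never overwritten.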
git clone https://github.com/lukaszliniewicz/Subdub.git
cd Subdub
pip install -e .

Optional extras:
- `pip install -e .[dev]` for tests/lint/type checks
- `pip install -e .[gui]` for manual correction GUI features
# Full pipeline: transcribe -> translate -> speech blocks -> TTS -> sync/mix
subdub -task full -i video.mp4 -sl English -tl Spanish -tts_voice voice.wav
# Transcription only
subdub -task transcribe -i video.mp4 -sl English
# Translate existing subtitles
subdub -task translate -i subtitles.srt -sl English -tl French
# Correct subtitles in the source language (no translation)
subdub -task correct -i subtitles.srt -sl English
# Zoom transcript correction (.vtt input)
subdub -task zoom-transcript -i meeting.vtt

Alternative invocation:
python -m subdub -task full -i video.mp4 -sl English -tl Spanish -tts_voice voice.wav

| Task | Purpose | Typical Input | Main Output |
|---|---|---|---|
| `full` | End-to-end pipeline (STT -> translation -> TTS -> sync/mix) | media file / URL / subtitle file | translated SRT, speech blocks, aligned audio, final dubbed media |
| `transcribe` | WhisperX transcription (+ optional diarization/correction) | media file / URL | source SRT or JSON-derived corrected SRT |
| `translate` | Translate subtitles (LLM or DeepL) | media / `.srt` / WhisperX `.json` | translated SRT + block JSON |
| `correct` | Correct subtitle text in source language | media / `.srt` / WhisperX `.json` | corrected SRT |
| `speech_blocks` | Build speech segmentation JSON for dubbing | media / `.srt` / translated SRT | `*_speech_blocks.json` |
| `sync` | Align existing generated speech with video | existing `-session` (+ optional `-v`) | final mixed dubbed video/audio |
| `equalize` | Standalone subtitle line equalization | `.srt` | `_equalized.srt` |
| `zoom-transcript` | Correct grouped Zoom VTT transcript chunks | `.vtt` | corrected transcript `.txt` |
| `tts` | Legacy parser option (see notes below) | n/a | n/a |
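To run the same task over several inputs, the command line can be assembled from Python. The sketch below uses only flags that appear in this README (`-task`, `-i`, `-sl`, `-tl`, `-session`); the helper name `build_command` and the file names are made up for illustration.

```python
import shlex


def build_command(task, input_path, source_language="English",
                  target_language=None, session=None):
    """Build a subdub argument list from flags documented in this README.

    Hypothetical helper; it only assembles arguments and runs nothing.
    """
    cmd = ["subdub", "-task", task, "-i", str(input_path), "-sl", source_language]
    if target_language:
        cmd += ["-tl", target_language]
    if session:
        cmd += ["-session", str(session)]
    return cmd


if __name__ == "__main__":
    # Example file names are placeholders.
    for srt in ["episode01.srt", "episode02.srt"]:
        cmd = build_command("translate", srt, target_language="French")
        print(shlex.join(cmd))
        # To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with paths that contain spaces.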
Run subdub -h for the full list. Most relevant flags are below.
- `-i`, `--input`: input path or URL (required for most tasks)
- `-task`: `tts|full|transcribe|translate|speech_blocks|sync|equalize|correct|zoom-transcript` (default: `full`)
- `-session`: custom session folder (otherwise auto-generated)
- `-log`: also write detailed logs to `subtitle_app.log`
- `-sl`, `--source_language`: source language (default: `English`)
- `-tl`, `--target_language`: target language for translation/dubbing

- `-model`: LiteLLM model string (default: `anthropic/claude-3-5-sonnet-20241022`)
- `-ant_api`, `-openai_api`, `-gemini_api`, `-api_deepl`: API keys
- `--use-deepl`: use DeepL instead of LLM translation
- `-api_base`: custom/local OpenAI-compatible endpoint base URL
- `-llm-char`: max characters per translation/correction block (default: `4000`)
- `-max_tokens`: max output tokens for provider call
- `-reasoning_effort`: `minimal|low|medium|high`
- `-evaluate`: second-pass evaluation/improvement stage
- `-translation_memory`: glossary memory file (`translation_glossary.json`)
- `-context`: pass prior response context between blocks
- `--no-remove-subtitles`: prohibit LLM `[REMOVE]` behavior
- `-translate_prompt`: extra translation instructions appended to prompt
OpenRouter-specific:
- `-provider`: prioritized provider(s), comma-separated
- `-sort`: `price|throughput|latency`
- `-fallbacks` / `-no-fallbacks`: enable/disable provider fallback
- `-ignore`: provider(s) to ignore
- `-data-collection`: `allow|deny`
- `-require-parameters`: require provider parameter support
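The `-llm-char` limit in the LLM flags above caps how much subtitle text goes into one translation/correction request. One plausible way to form such blocks is greedy packing, sketched below. This is an assumption about the general technique, not Subdub's actual splitting logic, and `group_into_blocks` is a hypothetical name.

```python
def group_into_blocks(lines, max_chars=4000):
    """Greedily pack subtitle lines into blocks of at most max_chars characters.

    Illustrative only: a single line longer than max_chars still becomes
    its own (oversized) block rather than being split.
    """
    blocks, current, size = [], [], 0
    for line in lines:
        # +1 accounts for the newline that joins lines inside a block.
        extra = len(line) + (1 if current else 0)
        if current and size + extra > max_chars:
            # Adding this line would overflow, so close the current block.
            blocks.append("\n".join(current))
            current, size = [], 0
            extra = len(line)
        current.append(line)
        size += extra
    if current:
        blocks.append("\n".join(current))
    return blocks
```

Smaller limits produce more requests with less context each; larger limits do the opposite, which is the trade-off this flag controls.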
- `-whisper_model`: Whisper model (default: `large-v3`)
- `-align_model`: custom WhisperX alignment model
- `-whisper_prompt`: custom initial prompt for WhisperX
- `-chunk_size`: WhisperX chunk size (default: `15`)
- `-diarize`: enable speaker diarization
- `--hf_token`: Hugging Face token for diarization
- `--no-boundary-correction`: disable automatic boundary correction
- `-manual_correction`: open manual correction GUI (PyQt6 extra required)
- `--save_txt`: save transcript as `.txt` during WhisperX run
- `-correct`: run correction stage before translation
- `-correct_prompt`: additional correction instructions
- `-resegment`: use word-level re-segmentation pipeline

- `-tts_voice`: 6-12s voice `.wav` for XTTS (for `full`; falls back to first `.wav` in `tts-voices/` if omitted)
- `-merge_threshold`: subtitle merge threshold in ms (default: `250`)
- `--delay_start`: initial per-block delay cap in ms (default: `2000`)
- `--speed_up`: max speed-up percentage during alignment (default: `115`)
- `-v`, `--video`: optional input video override for `sync`
- `-equalize`: also equalize final SRT in task output
- `-max_line_length`: line length for `-equalize` (default: `42`)
- `-characters`: line length for standalone `-task equalize` (default: `60`)
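To get a feel for what a `--speed_up` cap of `115` means, the arithmetic below computes how much a generated speech block would need to be sped up to fit its subtitle slot, and how much it still overruns once the cap is hit. This is a back-of-the-envelope sketch under the assumption that the percentage is a simple playback-rate ceiling; the actual alignment logic in Subdub may work differently, and `fit_block` is a hypothetical name.

```python
def fit_block(audio_ms, slot_ms, max_speed_up_pct=115):
    """Return (speed_factor, overflow_ms) for one speech block.

    Assumption: speed-up is a plain rate multiplier capped at
    max_speed_up_pct / 100; audio is never slowed down.
    """
    max_factor = max_speed_up_pct / 100
    needed = audio_ms / slot_ms            # rate required to fit exactly
    factor = min(max(needed, 1.0), max_factor)
    # Whatever still does not fit after speeding up spills past the slot.
    overflow_ms = max(0.0, audio_ms / factor - slot_ms)
    return factor, overflow_ms


if __name__ == "__main__":
    # A 6000 ms clip in a 4000 ms slot would need 150%; the cap leaves
    # roughly 1.2 s of overrun to absorb elsewhere.
    print(fit_block(6000, 4000))
```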
- `-t_prompt`: full custom translation prompt template
- `-eval_prompt`: full custom evaluation prompt template
- `-gloss_prompt`: full custom glossary prompt template
- `-sys_prompt`: custom system prompt
Each run writes artifacts into the session folder (-session or auto-generated). Depending on task/settings, typical files include:
- extracted audio (`<video>.wav`)
- transcription output (`<video>.json` and/or corrected `.srt`)
- translated/corrected subtitle outputs (`*.srt`, `*_final_blocks.json`)
- speech segmentation (`*_speech_blocks.json`)
- per-block XTTS wavs (`Sentence_wavs/*.wav`)
- aligned dubbed audio (`aligned_audio.wav`)
- final mixed video (`final_output.mp4`) or dubbed audio only
- optional session log (`subtitle_app.log` when `-log` is enabled)
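A quick way to see which of the artifacts listed above a given run produced is to glob the session folder. The sketch below relies only on the file names and patterns from that list; the helper name `summarize_session`, the labels, and the example folder name are illustrative, and the `<video>`-named files are left out because their names depend on the input.

```python
from pathlib import Path

# Patterns taken from the artifact list above; labels are illustrative.
ARTIFACT_PATTERNS = {
    "subtitles": "*.srt",
    "final blocks": "*_final_blocks.json",
    "speech blocks": "*_speech_blocks.json",
    "per-block wavs": "Sentence_wavs/*.wav",
    "aligned audio": "aligned_audio.wav",
    "final video": "final_output.mp4",
    "session log": "subtitle_app.log",
}


def summarize_session(session_dir):
    """Map each artifact label to the sorted file names found for it."""
    root = Path(session_dir)
    return {
        label: sorted(p.name for p in root.glob(pattern))
        for label, pattern in ARTIFACT_PATTERNS.items()
    }


if __name__ == "__main__":
    # "my_session" is a placeholder for a real -session folder.
    for label, files in summarize_session("my_session").items():
        print(f"{label}: {', '.join(files) if files else '(none)'}")
```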
- All LLM operations route through `subdub.ai.client.llm_api_request`.
- LiteLLM callbacks are configured once per process and log input/success/failure details.
- Structured output schemas are used for correction/re-segmentation flows.
- Session-level estimated API cost is accumulated and logged at the end.

- `-task tts` is still present in parser choices for compatibility, but the current refactored pipeline does not execute a standalone TTS-only stage. Use `full` (or `speech_blocks` + existing audio workflow) instead.
- API key validation is task-aware: LLM provider keys are checked only when the selected task path actually uses LLM calls.
- `-resegment` is designed for media input or WhisperX JSON word timestamps; plain `.srt` input is not a good fit for that mode.
Python package dependencies are defined in pyproject.toml.
External runtime tools/services:
- FFmpeg
- WhisperX
- XTTS API Server
- LLM provider endpoint(s) via LiteLLM (or DeepL when selected)