Redub Text-to-Speech API

POST /redub_tts

About this tool

`/redub_tts` merges synthetic speech with video so product and education teams can localize without booking voice talent for every wording tweak. Submit JSON that ties a `video_url` to a `script` block plus `voice_id` metadata understood by our TTS backends; FFmpeg then muxes the rendered audio into the video while preserving caption-friendly timing cues. Jobs run asynchronously behind the same polling contract as other routes, and multipart uploads are not accepted: reference finished assets by pointing at stable, deterministic HTTPS URLs instead.
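
A minimal sketch of building the request body in Python, assuming `script` is sent as a single newline-separated string and the three fields sit at the top level of the JSON object (the exact nesting of the `script` block is an assumption; check the route reference). The helper name and the sample `voice_id` are hypothetical.

```python
import json


def build_redub_tts_payload(video_url: str, script: str, voice_id: str) -> str:
    """Serialize a /redub_tts request body (hypothetical helper; flat shape assumed)."""
    # The route only accepts hosted sources, so insist on HTTPS up front.
    if not video_url.startswith("https://"):
        raise ValueError("video_url must be an HTTPS URL")
    # Encoding fails (UnicodeEncodeError, a ValueError) on lone surrogates,
    # which catches scripts that are not clean UTF-8 text.
    script.encode("utf-8")
    payload = {"video_url": video_url, "script": script, "voice_id": voice_id}
    # ensure_ascii=False keeps localized text readable in logs and diffs.
    return json.dumps(payload, ensure_ascii=False)


print(build_redub_tts_payload("https://cdn.example.com/demo.mp4", "Hola\nmundo", "es-narrator-1"))
```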

Compared to dumping MP3s into Slack, orchestrating synthesis through the API centralizes SSML quirks, pronunciation lexicons, and retry-friendly task IDs. Tie each job to a subscription key; its usage charts show bursts during marketing pushes. Localization engineers often diff scripts in git, then automate hundreds of renders overnight, using exponential backoff to stay within rate limits. When multilingual campaigns overlap, segregate workspaces per API key so finance can attribute Serbian versus Spanish spend without guesswork.
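
The exponential backoff mentioned above can be sketched as a capped, jittered retry loop. This assumes your HTTP wrapper raises a hypothetical `RateLimited` exception when the API signals a rate limit (for example, an HTTP 429); the base delay and cap are illustrative defaults, not documented values.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error your HTTP wrapper raises when the API rate-limits a call."""


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Upper bound for each retry's sleep: base * 2**n, clipped at cap."""
    return [min(cap, base * 2 ** n) for n in range(retries)]


def call_with_backoff(fn: Callable[[], T], retries: int = 5, base: float = 1.0,
                      cap: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimited with capped exponential backoff and full jitter."""
    for bound in backoff_delays(retries, base, cap):
        try:
            return fn()
        except RateLimited:
            # Full jitter: sleep a random fraction of the bound so overnight
            # batches do not all retry at the same instant.
            sleep(random.uniform(0, bound))
    return fn()  # last attempt; a RateLimited raised here propagates to the caller
```

Injecting `sleep` keeps the loop testable and lets a batch runner swap in its own scheduler.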

Expect minor drift between generated speech and embedded subtitles unless you rerun `/captions/auto` afterward. FFmpeg applies loudness leveling according to documented presets, yet creative teams should still spot-check for consonants clipped by overly aggressive normalization. Teams shopping for gimmicky “instant voiceover generators” miss the governance story: ACLs around API keys, audit logs, and predictable spend caps.

Architecturally, synthesis plus muxing can take longer than simple cuts because the CPU phases run sequentially. Horizontal scaling depends on your tier; bursting beyond contractual peaks queues jobs rather than starving neighbors. Observability dashboards mirror other routes, so Grafana hooks you already built for `/cut` can be reused unchanged.

Security posture demands HTTPS everywhere: never embed raw credentials in payloads, and sanitize scripts to reject unexpected binary content. FFmpeg ignores HTML in strings, but upstream validators should still constrain script length to limit queue abuse.

Growth teams exporting CMS CSVs batch hundreds of renders by templating identical JSON scaffolding and swapping in the localized `script` field per row. Tie spreadsheet row IDs to the returned task payloads so localization QA can reconcile failures quickly. Put burst retries behind client-side leaky buckets so shared inference clusters are not overwhelmed.
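
A sketch of the CSV batching pattern described above, assuming a hypothetical export with `row_id`, `video_url`, `script`, and `voice_id` columns. It builds one JSON body per row from a shared template, keyed by row ID so returned task IDs can be mapped back, and paces submissions with a simple leaky bucket (meter variant). The rate and capacity values are placeholders, not documented limits.

```python
import csv
import io
import json
import time
from typing import Callable, Dict

# Identical scaffold for every row; only the localized fields change.
TEMPLATE = {"video_url": "", "script": "", "voice_id": ""}


def payloads_from_csv(csv_text: str) -> Dict[str, str]:
    """Map each CSV row_id (hypothetical column) to a /redub_tts JSON body."""
    jobs = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        body = dict(TEMPLATE)
        body.update(video_url=row["video_url"], script=row["script"], voice_id=row["voice_id"])
        jobs[row["row_id"]] = json.dumps(body, ensure_ascii=False)
    return jobs


class LeakyBucket:
    """Client-side pacer: the level drains at `rate` units/second; each send adds 1."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.level = 0.0
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drain whatever leaked out since the last check, never below empty.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False  # caller should wait and try again
```

A batch runner would loop over `payloads_from_csv(...)`, wait until `try_acquire()` returns `True` before each POST, and store the returned task ID next to its `row_id`.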

How it works

  1. Author the screenplay JSON

    Combine `video_url`, a multi-line `script`, and your chosen `voice_id`. Keep lines UTF-8 clean and reference only publicly reachable video URLs.

  2. POST with API authentication

    Send the JSON to `/redub_tts` with your subscription API key in the request headers. Responses omit large media blobs; they expose only task bookkeeping metadata.

  3. Monitor async FFmpeg + TTS stages

    Poll `status_url` for granular states that distinguish synthesis from the final mux, so product owners know which phase is slow.

  4. Download the voiced cut

    Grab the downloadable asset and route it through QA, caption regeneration, or `/merge_urls` packaging as needed.
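
The four steps above can be sketched end to end with the Python standard library. Several names here are assumptions, not documented API: the `BASE_URL` host, the `X-API-Key` header name, the `status`/`stage` values, and the `result_url` field are all hypothetical placeholders; only `/redub_tts` and `status_url` come from this page. The transport is injectable so the flow can be exercised without network access.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com"   # hypothetical host
API_KEY_HEADER = "X-API-Key"           # hypothetical header name; check your subscription docs
TERMINAL = {"succeeded", "failed"}     # hypothetical terminal status values


def http_json(method, url, api_key, body=None):
    """Send a JSON request and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={API_KEY_HEADER: api_key,
                                          "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def redub(payload, api_key, transport=http_json, poll_every=5.0, sleep=time.sleep):
    # Step 2: submit; the response carries task bookkeeping, not media.
    task = transport("POST", f"{BASE_URL}/redub_tts", api_key, payload)
    status_url = task["status_url"]
    # Step 3: poll until a terminal state, reporting the current stage.
    while True:
        status = transport("GET", status_url, api_key)
        if status["status"] in TERMINAL:
            break
        print("stage:", status.get("stage", status["status"]))
        sleep(poll_every)
    if status["status"] == "failed":
        raise RuntimeError(f"redub failed at stage {status.get('stage')}: {status.get('error')}")
    # Step 4: return the downloadable asset's URL for QA or captioning.
    return status["result_url"]  # hypothetical field name
```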

Frequently asked questions

How is `/redub_tts` different from `/redub_vc`?

TTS consumes text plus a `voice_id` from the voice catalog, while `/redub_vc` targets voice conversion models and takes a different payload. Pick the route that matches your creative pipeline.

Can I pause mid-script?

Use punctuation and newline hints within `script`; the exact prosody controls are described in the docs. Extremely long manuscripts may exceed tier timeouts, so split them at logical boundaries.
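
One way to split a long manuscript is to group whole lines into chunks under a character budget. This is a sketch: `max_chars` is a hypothetical budget you would derive from your tier's timeout, and each chunk would then be submitted as its own job (for example, against the matching video segment).

```python
from typing import List


def split_script(script: str, max_chars: int) -> List[str]:
    """Group newline-separated lines into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for line in script.splitlines():
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) <= max_chars:
            current = candidate  # line still fits in the current chunk
            continue
        if current:
            chunks.append(current)
        if len(line) > max_chars:
            # Never cut mid-line; the author should break it at a sentence boundary.
            raise ValueError("a single line exceeds max_chars")
        current = line
    if current:
        chunks.append(current)
    return chunks
```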

What if synthesis succeeds but FFmpeg fails?

Task JSON surfaces stage-specific errors, so you can retry only the mux or adjust parameters, possibly without paying for synthesis again if the audio was cached internally. Check the docs for what is guaranteed.
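
A sketch of branching on the failed stage. The field names (`status`, `stage`) and stage values (`synthesis`, `mux`) are hypothetical stand-ins for whatever the task JSON actually reports; whether a resubmission reuses cached audio is not guaranteed here.

```python
def retry_plan(task: dict) -> str:
    """Decide what to do with a task's JSON (hypothetical field names and values)."""
    if task.get("status") != "failed":
        return "none"
    stage = task.get("stage")
    if stage == "mux":
        # Synthesis finished; resubmitting may reuse cached audio, but confirm in the docs.
        return "resubmit_for_mux"
    if stage == "synthesis":
        # Likely a script or voice problem; fix the input before paying for another run.
        return "fix_script_or_voice"
    return "inspect_manually"
```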

Why no multipart?

Consistency across Droid Apps tools matters more than convenience. Hosting sources yourself and referencing them by URL simplifies retries and CDN caching.

Do I need captions afterwards?

Optional, but wise for accessibility compliance. Chain `/captions/auto` once the voiced edit is locked.

How do subscriptions meter TTS?

Both characters synthesized and FFmpeg processing seconds may count toward usage. Read the tier tables before launching large campaigns.
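
A back-of-the-envelope estimate, assuming both meters apply additively. `per_char` and `per_ffmpeg_second` are placeholders to fill in from your tier table, and `len()` counts Unicode code points, which may differ from how the service counts characters (for example, with SSML markup).

```python
from typing import Iterable, Tuple


def estimate_usage(script: str, video_seconds: float,
                   per_char: float, per_ffmpeg_second: float) -> float:
    """Rough usage for one job: characters synthesized plus FFmpeg seconds."""
    return len(script) * per_char + video_seconds * per_ffmpeg_second


def estimate_campaign(jobs: Iterable[Tuple[str, float]],
                      per_char: float, per_ffmpeg_second: float) -> float:
    """Sum the per-job estimates for a batch of (script, video_seconds) pairs."""
    return sum(estimate_usage(s, secs, per_char, per_ffmpeg_second) for s, secs in jobs)
```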