Voice Conversion API

POST /redub_vc

About this tool

`/redub_vc` applies voice-conversion (VC) models so that the speakers in a video sound like approved talent profiles while the picture stays untouched. The request is JSON with three fields: `video_url`, a `script` map for timing or phoneme alignment (see the docs), and `voice_id`, which points at a licensed conversion profile. FFmpeg pipelines merge the converted waveform asynchronously. Your subscription key unlocks the processing queue, and because media is referenced by public HTTPS URLs, uploads stay off the API's hot path entirely. That split keeps clients such as browser extensions thin while still satisfying auditors who demand a clear boundary between client-side demos and privileged keys.
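A minimal sketch of building the request with Python's standard library. Only the `/redub_vc` path, the `X-API-Key` header, and the three field names come from this page; the base URL `https://api.example.com` and the `script` contents are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Union

# Hypothetical host: replace with the base URL from your account documentation.
BASE_URL = "https://api.example.com"


def build_redub_vc_request(
    api_key: str, video_url: str, script: Union[dict, str], voice_id: str
) -> urllib.request.Request:
    """Build, but do not send, a JSON POST to /redub_vc."""
    # Workers fetch media themselves, so the source must be a public HTTPS URL.
    if not video_url.startswith("https://"):
        raise ValueError("video_url must be a public HTTPS URL")
    payload = {"video_url": video_url, "script": script, "voice_id": voice_id}
    # json.dumps escapes quotes, backslashes and control characters in script text.
    body = json.dumps(payload).encode("utf-8")
    # urllib stores header names capitalized ("X-api-key"); HTTP header
    # names are case-insensitive, so the server still receives the key.
    return urllib.request.Request(
        f"{BASE_URL}/redub_vc",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )
```

Passing the result to `urllib.request.urlopen` sends it; check the documented response shape before parsing the body.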

Because each job combines machine-learning inference with traditional muxing, its runtime splits into two observable phases. Track the transitions reported by `status_url` to see whether GPU slot availability or disk I/O throttled progress. Legal teams should archive consent paperwork alongside each `voice_id` choice, because synthetic-likeness rules vary widely by jurisdiction; the API surfaces technical telemetry but cannot interpret talent agreements for you.
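One way to turn `status_url` polling into per-phase timings is sketched below. The status strings in `STAGE_TO_PHASE` are hypothetical placeholders, not documented values; the sketch only assumes you record a timestamp with every status you observe.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical status strings mapped to queueing plus the two processing phases.
# Replace the keys with the values your status_url responses actually return.
STAGE_TO_PHASE = {
    "queued": "queue",
    "converting": "inference",
    "muxing": "muxing",
}


def phase_durations(observations: List[Tuple[float, str]]) -> Dict[str, float]:
    """Sum seconds spent per phase from (timestamp, status) samples in poll order.

    Each status is assumed to hold until the next sample, so accuracy is
    bounded by the polling interval. The final sample only closes the
    previous interval.
    """
    totals: Dict[str, float] = defaultdict(float)
    for (t0, status), (t1, _) in zip(observations, observations[1:]):
        totals[STAGE_TO_PHASE.get(status, "other")] += t1 - t0
    return dict(totals)
```

Under these assumptions, an unusually long `inference` total points toward GPU contention, while a long `muxing` total points toward disk I/O.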

Unlike drag-and-drop toy sites, this endpoint expects integration discipline: correlation IDs, structured logging, and retries that respect the idempotency guidelines. Ship golden JSON fixtures per environment so QA catches schema drift quickly. Pair it with `/redub` when you need hybrid mixes; some productions run VC for lead lines and keep natural room tone under the beds. Designers have to judge intelligibility subjectively, because objective MOS (mean opinion score) ratings rarely ship in API JSON.
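A sketch of that retry discipline: the correlation ID is generated once per job and reused, and an idempotency key is derived from the canonical payload so a retried request is recognizably the same request. The header names `Idempotency-Key` and `X-Correlation-ID` are hypothetical; this page does not specify them.

```python
import hashlib
import json
import uuid
from typing import Dict


def idempotency_key(payload: dict) -> str:
    """Stable SHA-256 over canonical JSON: same payload, same key, any key order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def new_correlation_id() -> str:
    """Generate once per logical job, then reuse on every retry and log line."""
    return str(uuid.uuid4())


def retry_headers(api_key: str, payload: dict, correlation_id: str) -> Dict[str, str]:
    # "Idempotency-Key" and "X-Correlation-ID" are hypothetical header names;
    # use whatever names the service's idempotency guidelines specify.
    return {
        "Content-Type": "application/json",
        "X-API-Key": api_key,
        "Idempotency-Key": idempotency_key(payload),
        "X-Correlation-ID": correlation_id,
    }


def log_event(event: str, correlation_id: str, **fields) -> str:
    """Render one structured JSON log line that carries the correlation ID."""
    return json.dumps({"event": event, "correlation_id": correlation_id, **fields}, sort_keys=True)
```

Because the key depends only on payload content, the same `idempotency_key` helper can also fingerprint the golden JSON fixtures you keep per environment.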

Quality hinges on clean reference audio upstream. Feeding brittle conference recordings into VC yields metallic artifacts that FFmpeg cannot magically fix. Normalize inputs and remove HVAC rumble externally before calling `/redub_vc`. Throughput limits follow your subscription tier, so route bursts, such as those during trailer launches, through your own queue with exponential backoff.
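A client-side backoff queue could look like this sketch, which uses "full jitter" exponential backoff. Treating HTTP 429 and 503 as retryable is an assumption, and `send` is a placeholder for any callable that performs one submission and returns its HTTP status code.

```python
import random
import time
from typing import Callable, List, Optional

# Assumption: 429 and 503 mean "queue saturated, retry later". Confirm the
# status codes your tier actually returns before relying on this set.
RETRYABLE = {429, 503}


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter exponential backoff: delay n is uniform(0, min(cap, base * 2**n))."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]


def submit_with_backoff(send: Callable[[], int], max_attempts: int = 6,
                        sleep: Callable[[float], None] = time.sleep,
                        rng: Optional[random.Random] = None) -> int:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(max_attempts - 1, rng=rng)  # no wait after the last try
    for attempt in range(max_attempts):
        status = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status
        sleep(delays[attempt])
    raise AssertionError("unreachable: the loop always returns")
```

Randomizing each delay spreads retries from many clients over time, which is what prevents a thundering herd when the queue frees up.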

Because scripts may drive alignment models, escaping special characters matters; consult the anchoring docs before pasting Markdown into `script`. FFmpeg outputs deterministic media bytes for identical payloads, which simplifies regression tests. Product and legal reviewers should watermark preview renders fetched after polling, not because the result URLs are handled carelessly, but because VC output may resemble protected likenesses before approvals are complete. Some organizations mirror JSON requests into immutable audit trails that record which `voice_id` shipped in each release window.
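For the audit trail, one tamper-evident option is a hash chain: each record stores the hash of the previous one, so rewriting history invalidates every later hash. This is a sketch of that general technique, not a feature of the API; the record layout is hypothetical, and true immutability still depends on write-once storage.

```python
import hashlib
import json
import time
from typing import List, Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(record: dict) -> str:
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


def append_audit_entry(log: List[dict], request_payload: dict,
                       released_at: Optional[float] = None) -> dict:
    """Append a record whose hash also covers the previous record's hash."""
    record = {
        "voice_id": request_payload["voice_id"],
        # Deep copy via JSON so later edits to the caller's dict cannot alter history.
        "request": json.loads(json.dumps(request_payload)),
        "released_at": time.time() if released_at is None else released_at,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    record["hash"] = _digest(record)
    log.append(record)
    return record


def verify_chain(log: List[dict]) -> bool:
    """Recompute every hash; editing, reordering, or deleting a middle record breaks the chain."""
    prev = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True
```

Deleting only the newest record still verifies, so anchor the latest hash somewhere external if truncation matters.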


How it works

  1. Prepare URLs and persona metadata

    Host the source video at a public HTTPS URL that requires no authentication. Draft the `script` payload and pick an allowed `voice_id` string from your catalog export.

  2. Send an authenticated JSON POST to `/redub_vc`

    Include your key in the `X-API-Key` header. The response carries task metadata rather than binary media; all media is read from and written to remote storage.

  3. Monitor async inference and FFmpeg muxing

    Poll the task for stage-specific statuses so support teams can tell whether inference or muxing stalled.

  4. Finalize audio QA

    Download the mastered file, run listening tests, and optionally route it through `/captions/auto` and loudness tooling before release.
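Step 3 can be sketched as a polling loop. `fetch_status` reuses the `X-API-Key` header from step 2; the `status` field and the terminal values `completed` and `failed` are assumptions, so confirm them against real responses before relying on this loop.

```python
import json
import time
import urllib.request
from typing import Callable

# Hypothetical: the "status" field and these terminal values are assumptions;
# check real status_url responses for the actual names.
TERMINAL = {"completed", "failed"}


def fetch_status(status_url: str, api_key: str) -> dict:
    """GET the task's status_url with the API key and decode the JSON body."""
    req = urllib.request.Request(status_url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_task(fetch: Callable[[], dict], interval: float = 10.0, timeout: float = 3600.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll fetch() until a terminal status appears; log each stage transition."""
    deadline = clock() + timeout
    last = None
    while True:
        body = fetch()
        status = body.get("status")
        if status != last:
            print(f"stage -> {status}")  # shows support whether inference or muxing stalled
            last = status
        if status in TERMINAL:
            return body
        if clock() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} s")
        sleep(interval)
```

In production you would call `wait_for_task(lambda: fetch_status(status_url, api_key))`; the injectable `sleep` and `clock` parameters let tests run without real waiting.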

Frequently asked questions

Is this the same as `/redub_tts`?

No. VC reshapes the timbre of an existing voice within the conversion model's constraints, whereas TTS generates speech from scratch. The payloads overlap, but the backends differ sharply.

Does `script` always mean plain dialogue?

Not necessarily. Some integrations embed phoneme or alignment hints as described in the documentation. Plain strings work when the models can infer timing directly from soundtrack analysis.

Why might jobs queue longer?

GPU inference pools can saturate during global usage peaks. Paid tiers are prioritized within contractual bounds; retrying with exponential backoff avoids thundering-herd effects.

Are multipart uploads coming?

No. Declare URLs in the JSON body and let workers fetch the media; this pattern is uniform across the FFmpeg tools.

What compliance hooks exist?

API keys are tied to audited accounts, so enterprise security teams can rotate credentials quickly after staff departures.