
PODCAST · MAY 8, 2026 · UPDATED JUNE 7, 2026 · 5 MIN READ

Caption your podcast in 5 minutes (Whisper + auto vertical export).

Turn long-form podcasts into vertical captioned shorts for TikTok, Reels, and Shorts. Whisper transcription, in/out points, 9:16 export — no editor needed.

getvivix Team
getvivix Journal

To caption a podcast in 5 minutes, upload the audio to a tool that transcribes with Whisper, lets you drag in/out points around the best moment, then burns captions and auto-crops to 9:16 in one pass. A 60-minute podcast usually has 5-10 moments worth clipping for social. The problem: finding them, transcribing them, captioning them, and exporting at 9:16 used to take an hour per clip in Premiere or DaVinci.

getvivix Caption Studio does the whole pipeline in ~5 minutes per clip. It pairs well with our caption maker and Shorts maker when you want to repurpose the same episode across platforms. Here's the workflow.

What Caption Studio does

  • Transcribes the entire file with Whisper — 95-98% accurate, every word timestamped to the millisecond
  • Lets you scrub the timeline — click any line in the transcript to jump to that timestamp
  • Drag in/out handles to set clip boundaries
  • Burn captions into the video with viral-tested templates (Bold TikTok, Karaoke, Subtitle)
  • Auto-crop to 9:16 with subject detection so the speaker stays centered
  • Export ready-to-upload MP4 — drop into TikTok, Reels, Shorts

The 5-minute workflow

Step 1: Upload (30 seconds)

Drop an MP3, MP4, M4A, or WAV. Long-form is fine — Caption Studio transcribes the whole thing once, even a 2-hour episode.

Step 2: Whisper transcribes (1-2 minutes)

Whisper runs server-side. For a 60-minute file, it finishes in ~90 seconds. Every word gets a timestamp, so you can click any line in the transcript to scrub the playhead there.
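To make the click-to-seek idea concrete, here is a minimal sketch of how word-level timestamps could be grouped into clickable transcript lines. The `Word` and `Line` types, the pause threshold, and the line-length cap are all assumptions for illustration; Caption Studio's internal data model is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the file
    end: float    # seconds from the start of the file


@dataclass
class Line:
    text: str
    start: float
    end: float


def _to_line(words: list[Word]) -> Line:
    # A line starts with its first word and ends with its last one.
    return Line(" ".join(w.text for w in words), words[0].start, words[-1].end)


def group_into_lines(words: list[Word], max_gap: float = 0.6, max_words: int = 8) -> list[Line]:
    """Group words into lines; a long pause or a full line starts a new one."""
    lines: list[Line] = []
    current: list[Word] = []
    for word in words:
        paused = bool(current) and word.start - current[-1].end > max_gap
        full = len(current) >= max_words
        if current and (paused or full):
            lines.append(_to_line(current))
            current = []
        current.append(word)
    if current:
        lines.append(_to_line(current))
    return lines


def seek_time_for_line(lines: list[Line], index: int) -> float:
    """Clicking a transcript line moves the playhead to that line's first word."""
    return lines[index].start
```

Because every line keeps the start time of its first word, a click handler only needs the line index to know where to put the playhead.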

Step 3: Pick the moment (1 minute)

Skim the transcript. When you find a clip-worthy moment (a strong quote, a controversial take, a laugh), click the line to jump there. Drag the in/out handles to set 30-90 second boundaries — that's the Goldilocks length for TikTok and Reels.
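If you script your own clipping, the 30-90 second guideline is easy to enforce. The sketch below is hypothetical (the function names and the snapping behavior are ours, not a Caption Studio API): it checks a clip's length and nudges the out-point into range without running past the end of the file.

```python
MIN_CLIP_S = 30.0  # shortest clip suggested above
MAX_CLIP_S = 90.0  # longest clip suggested above


def clip_length_ok(in_s: float, out_s: float) -> bool:
    """True when the clip length falls inside the 30-90 second window."""
    return MIN_CLIP_S <= (out_s - in_s) <= MAX_CLIP_S


def snap_out_point(in_s: float, out_s: float, media_duration_s: float) -> float:
    """Move the out-point so the clip is 30-90 s long, never past the end of the file."""
    length = min(max(out_s - in_s, MIN_CLIP_S), MAX_CLIP_S)
    return min(in_s + length, media_duration_s)
```

For example, an in-point at 100 s with an out-point at 110 s snaps to 130 s, and an out-point at 250 s snaps back to 190 s.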

Step 4: Caption + crop (30 seconds)

Pick a caption template:

  • Bold TikTok — large white text, black outline, bottom-third placement. Best for podcasts.
  • Karaoke — word-by-word highlight on the active word. Works for fast-paced clips.
  • Subtitle — small clean text at the bottom. Best for talking-head clips.
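For the curious, a Karaoke-style highlight boils down to one lookup: which word is being spoken at the current playback time? Here is a small sketch of that lookup, assuming sorted word start and end times in seconds. The function name is hypothetical and this is not necessarily how Caption Studio renders the template.

```python
from bisect import bisect_right
from typing import Optional


def active_word_index(starts: list[float], ends: list[float], t: float) -> Optional[int]:
    """Return the index of the word spoken at time t, or None during silence."""
    i = bisect_right(starts, t) - 1  # last word that started at or before t
    if i < 0:
        return None  # t is before the first word
    return i if t < ends[i] else None  # None when t falls in a gap between words
```

Binary search keeps the lookup fast even when it runs once per rendered frame over a long transcript.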

Caption Studio auto-crops to 9:16 vertical with subject detection. If your podcast has video of two speakers, the crop follows whichever one is talking.
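The geometry behind a 9:16 crop is simple once you know where the speaker is. The sketch below assumes a detector has already given you the subject's horizontal center in pixels (`subject_x` is a hypothetical input; the detection step itself is out of scope) and returns a crop window in the `w:h:x:y` order that ffmpeg's `crop` filter expects.

```python
def vertical_crop(frame_w: int, frame_h: int, subject_x: float) -> tuple[int, int, int, int]:
    """Return (w, h, x, y) for a full-height 9:16 crop centered on subject_x."""
    crop_w = int(frame_h * 9 / 16)
    crop_w -= crop_w % 2            # keep the width even, which common H.264 settings require
    crop_w = min(crop_w, frame_w)   # source may already be narrower than 9:16
    x = int(round(subject_x - crop_w / 2))
    x = max(0, min(x, frame_w - crop_w))  # keep the window inside the frame
    return crop_w, frame_h, x, 0


def crop_filter(w: int, h: int, x: int, y: int) -> str:
    """Format the window as an ffmpeg crop filter string."""
    return f"crop={w}:{h}:{x}:{y}"
```

On a 1920x1080 frame with the speaker centered at x=960, this gives a 606x1080 window starting at x=657. A speaker near the edge of the frame gets a window clamped to that edge instead of one that runs off the frame.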

Step 5: Export (30 seconds)

Click export. Caption Studio renders the captioned 9:16 MP4 with ffmpeg server-side and gives you a download link. Costs 1 clip credit per export.
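We don't publish the exact render command, but a comparable export can be approximated with stock ffmpeg. The sketch below only builds the argument list. It assumes an ffmpeg build with libx264 and the `subtitles` filter (libass), a caption file whose timestamps are relative to the clip start, and simple file names that need no filter escaping. The function and file names are placeholders.

```python
def build_export_command(src: str, srt: str, out: str,
                         in_s: float, out_s: float, crop: str) -> list[str]:
    """Build an ffmpeg argument list that trims, crops, and burns in captions."""
    duration = out_s - in_s
    if duration <= 0:
        raise ValueError("out-point must be after in-point")
    return [
        "ffmpeg", "-y",
        "-ss", f"{in_s:.3f}",      # seek to the in-point before decoding
        "-t", f"{duration:.3f}",   # read only the clip's duration
        "-i", src,
        "-vf", f"{crop},subtitles={srt}",  # crop first, then draw captions on the cropped frame
        "-c:v", "libx264",
        "-c:a", "aac",
        out,
    ]
```

You could pass the result to `subprocess.run(cmd, check=True)`. Cropping before the `subtitles` filter matters: captions are then laid out against the vertical frame rather than the original wide one.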

Pricing

  • Free signup: 1 clip credit (1 captioned export)
  • Standard ($10/mo): 80 clip credits/mo
  • Pro ($25/mo): 120 clip credits/mo
  • Ultimate ($70/mo): 400 clip credits/mo

Clip credits are separate from the regular generation credits. Each plan's allowance is added monthly, and unused credits roll over.
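To compare plans, divide the monthly price by the number of clip credits. This quick calculation uses only the prices listed above.

```python
# Monthly price in USD and clip credits per month, as listed above.
PLANS = {
    "Standard": (10, 80),
    "Pro": (25, 120),
    "Ultimate": (70, 400),
}


def cost_per_credit(price_usd: float, credits: int) -> float:
    """Price of one captioned export on a given plan."""
    return price_usd / credits


for name, (price, credits) in PLANS.items():
    print(f"{name}: ${cost_per_credit(price, credits):.3f} per clip")
```

With the prices as listed, that works out to $0.125 per clip on Standard, about $0.208 on Pro, and $0.175 on Ultimate.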

Real numbers from a real podcast

We tested with a 65-minute interview podcast. Workflow:

  • Upload: 45 seconds (180 MB file)
  • Whisper transcription: 2 minutes 10 seconds
  • Picking 5 moments: 4 minutes (skimming the transcript)
  • Captioning + exporting all 5 clips: 12 minutes
  • Total: ~19 minutes for 5 publishable shorts

Same job in Premiere: ~5 hours.
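If you want to check the arithmetic, here are the numbers above added up, along with the per-clip average and the rough speedup against the ~5 hour Premiere estimate.

```python
# Step timings from the test above, in seconds.
STEP_SECONDS = {
    "upload": 45,
    "transcription": 2 * 60 + 10,
    "picking 5 moments": 4 * 60,
    "captioning + exporting": 12 * 60,
}
CLIPS = 5
PREMIERE_SECONDS = 5 * 3600  # the ~5 hour estimate


def fmt(seconds: float) -> str:
    """Format a duration as minutes and seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes} min {secs} s"


total = sum(STEP_SECONDS.values())
per_clip = total / CLIPS
speedup = PREMIERE_SECONDS / total

print(f"Total: {fmt(total)}")
print(f"Per clip: {fmt(per_clip)}")
print(f"Speedup vs Premiere: about {speedup:.0f}x")
```

The total comes to 18 min 55 s, or about 3 min 47 s per clip, since transcription is paid once and shared across all five clips.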

Tips that earned us views

1. Pull-quotes beat play-by-play

The clips that go viral are strong claims, not slow build-ups. Look for moments where the host or guest says something opinionated — that's the hook. If you want a deeper rundown on the format itself, our guide to making TikTok shorts with AI covers what tends to land.

2. Add a 1-second pause at the end

Drag the out-point 1 second past the end of the line. The pause lets the viewer process before scrolling. Watch-time goes up.

3. Edit the captions for clarity

Whisper gets 95-98% of words right, but it sometimes misses uncommon names or jargon. Click any caption to fix the text; the timing stays locked to the audio.
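Why does the timing survive a text fix? Because a caption is just text attached to a start and end time. The sketch below models that with a hypothetical `Caption` type and writes the result in the standard SRT subtitle format; it illustrates the idea, not Caption Studio's internals.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Caption:
    start: float  # seconds
    end: float    # seconds
    text: str


def fix_text(caption: Caption, new_text: str) -> Caption:
    """Correct the wording; start and end are carried over unchanged."""
    return replace(caption, text=new_text)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp that SRT uses."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions: list[Caption]) -> str:
    """Serialize captions as numbered SRT blocks separated by blank lines."""
    blocks = [
        f"{i}\n{srt_timestamp(c.start)} --> {srt_timestamp(c.end)}\n{c.text}"
        for i, c in enumerate(captions, start=1)
    ]
    return "\n\n".join(blocks) + "\n"
```

Fixing a misheard name with `fix_text` returns a new caption with the same `start` and `end`, so the corrected line appears on screen at exactly the same moment.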

4. Use the same template across all clips from one episode

Visual consistency makes your clips recognizable at a glance. Pick Bold TikTok once and use it for every clip from that episode.

Multi-language

Whisper supports 90+ languages. The transcription works in whatever language your podcast is in, including Arabic, Mandarin, Japanese, and Spanish, though accuracy varies by language. Captions burn in with the right script and direction (RTL for Arabic, etc.).
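Deciding whether a caption line should be laid out right-to-left can be done with Python's standard library alone. This sketch uses the "first strong character" heuristic from the Unicode bidirectional algorithm. It is one common approach and an assumption on our part, not a description of how Caption Studio picks direction.

```python
import unicodedata


def is_rtl(text: str) -> bool:
    """True if the first strongly directional character is right-to-left."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            return True
        if bidi == "L":          # Latin, CJK, and other left-to-right letters
            return False
    return False  # digits, punctuation, or empty text: default to left-to-right
```

Digits and spaces are skipped because they carry no strong direction, so a line that opens with a number still follows the script of its first letter.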

FAQ

Can I edit the in/out points after seeing the export?

Yes — your transcript stays in your account. Open the same project, drag new boundaries, export again (costs another clip credit).

Can I do voice-over for the clip?

Yes. Generate a voice with the getvivix voice generator (ElevenLabs Flash, MiniMax Speech, xAI TTS) and layer it in Caption Studio.

What about B-roll behind the audio?

Generate B-roll with the text-to-video tool and use it as the visual track with the podcast audio overlaid. If you want to go fully visual-free, the faceless video tools roundup is a good next read.

How accurate is Whisper on heavy accents?

~92% on heavy accents (vs. 95-98% on neutral). Errors are easy to fix — click, retype, save.
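To get a feel for what those percentages mean per clip, here is a rough estimate of how many words you might need to fix. It treats the accuracy figures as word-level and assumes a speaking rate of 150 words per minute; both are assumptions for illustration, and real clips will vary.

```python
ASSUMED_WORDS_PER_MINUTE = 150  # illustrative speaking rate, not a measured value


def expected_fixes(clip_seconds: float, accuracy: float,
                   words_per_minute: float = ASSUMED_WORDS_PER_MINUTE) -> float:
    """Estimate how many words in a clip are likely to need a manual correction."""
    words = clip_seconds / 60 * words_per_minute
    return words * (1 - accuracy)


for label, accuracy in [("heavy accent", 0.92), ("neutral, low end", 0.95), ("neutral, high end", 0.98)]:
    print(f"60 s clip, {label}: ~{expected_fixes(60, accuracy):.1f} words to fix")
```

Under these assumptions a 60-second clip needs roughly 12 fixes at 92%, compared with about 3 to 8 at 95-98%.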

Sign up free — 1 clip credit on signup, no card, your first captioned short is free.
