Add Auto-Generated Subtitles to Your Videos

Subtitles play an essential role in making your video content accessible to a broader audience, especially viewers who may not understand the spoken language or who prefer to read along. FastPix makes it easy to add subtitles through its auto-generation feature in two scenarios: when you upload new media, and for audio tracks in existing media.

How auto-generated subtitles work

FastPix leverages the OpenAI Whisper model to automatically generate captions for your on-demand media files. This process is designed to be efficient and accurate, converting spoken words into synchronized subtitles.

Key considerations

Audio quality: Auto-generated captions perform best with clear audio. Performance may vary with media that includes excessive non-speech audio, such as music, background noise, or long silences.

Language compatibility: The feature generates subtitles in the same language as the audio. It is not intended for generating translated captions in other languages.

We recommend testing this feature with your typical content to evaluate its effectiveness.

Steps to add auto-generated subtitles for new media

Here’s a step-by-step guide to enable auto-generated subtitles for your FastPix videos:

Step 1: Prepare your video

Ensure clear audio: Remove unwanted sounds, reduce background noise, and avoid overlapping audio so that subtitles are generated more accurately.

Adjust volume levels: Ensure that voices are clear and loud enough for precise transcription.

Step 2: Upload your video to FastPix

FastPix accepts either direct video uploads from your local storage or public URLs from cloud storage services for subtitle generation.

Step 3: Enable auto-generation in request

To activate auto-generated subtitles, include the subtitles JSON object in the request body when you create the media.

This object consists of three key-value pairs:

languageName: Specify the language of the audio (e.g., "english").
metadata: Optionally, you can add a metadata object if you want to tag specific information with the subtitles.
languageCode: Enter the language code that corresponds to the spoken language in the video (e.g., en for English).

Example JSON object for enabling subtitles

{ 
  "inputs": [ 
    { 
      "type": "video", 
      "url": "https://example.com/sample.mp4"
    } 
  ],
  "subtitles": { 
    "languageName": "english", 
    "metadata": { 
      "key1": "value1" 
    }, 
    "languageCode": "en" 
  }, 
  "accessPolicy": "public" 
}
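The same request body can be assembled programmatically. The sketch below is a minimal Python example, assuming the field names shown in the JSON above; `build_create_media_body` is a hypothetical helper, not part of a FastPix SDK. The endpoint for creating media is not covered in this section, so the sketch only builds and serializes the body rather than sending it.

```python
import json
from typing import Optional


def build_create_media_body(video_url: str,
                            language_name: str,
                            language_code: str,
                            metadata: Optional[dict] = None,
                            access_policy: str = "public") -> dict:
    """Build a create-media request body with auto-generated subtitles enabled.

    Hypothetical helper: field names mirror the example JSON in this guide.
    """
    subtitles = {"languageName": language_name, "languageCode": language_code}
    if metadata:
        # metadata is optional; include it only when tags are supplied
        subtitles["metadata"] = metadata
    return {
        "inputs": [{"type": "video", "url": video_url}],
        "subtitles": subtitles,
        "accessPolicy": access_policy,
    }


body = build_create_media_body(
    "https://example.com/sample.mp4", "english", "en", {"key1": "value1"}
)
print(json.dumps(body, indent=2))
```

Key order inside the serialized object may differ from the example above; JSON objects are unordered, so this does not change the meaning of the request.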

IMPORTANT
Double-check that languageCode matches the spoken language in your video; the subtitle model uses this setting for transcription.

Step 4: Process the video

Once the video is uploaded, FastPix processes it with the Whisper model to generate subtitles automatically. The model transcribes the spoken content into text and synchronizes it with the video.

Supported languages for auto-generated subtitles

FastPix supports the following languages and language codes for auto-generated subtitles in Video on Demand (VOD) content:

Language	Language Code	Status
English	en	Supported
Spanish	es	Supported
Italian	it	Supported
Portuguese	pt	Supported
German	de	Supported
French	fr	Supported
Polish	pl	Beta
Russian	ru	Beta
Dutch	nl	Beta
Catalan	ca	Beta
Turkish	tr	Beta
Swedish	sv	Beta
Ukrainian	uk	Beta
Norwegian	no	Beta
Finnish	fi	Beta
Slovak	sk	Beta
Greek	el	Beta
Czech	cs	Beta
Croatian	hr	Beta
Danish	da	Beta
Romanian	ro	Beta
Bulgarian	bg	Beta
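Before submitting a request, you might validate languageCode against this table on the client side. Below is a minimal Python sketch, assuming you keep a local copy of the table (the list can change, so treat it as a snapshot); `language_status` is a hypothetical helper, not part of a FastPix SDK.

```python
# Snapshot of the supported-languages table above (auto-generated VOD subtitles).
LANGUAGE_STATUS = {
    **{code: "Supported" for code in ("en", "es", "it", "pt", "de", "fr")},
    **{code: "Beta" for code in ("pl", "ru", "nl", "ca", "tr", "sv", "uk", "no",
                                  "fi", "sk", "el", "cs", "hr", "da", "ro", "bg")},
}


def language_status(language_code: str) -> str:
    """Return "Supported" or "Beta" for a listed code; raise ValueError otherwise."""
    status = LANGUAGE_STATUS.get(language_code)
    if status is None:
        raise ValueError(f"{language_code!r} is not in the supported-languages table")
    return status
```

Returning the status rather than a boolean lets callers decide whether Beta languages are acceptable for their content.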

PLEASE NOTE
Subtitles are generated only in the same language as the audio input. Support for other languages may be added in the future, but currently each subtitle track matches the spoken language.

Generate subtitles for audio tracks in existing media

FastPix can also generate subtitles for audio tracks after the media is ready, both for the default audio track and for any additional audio tracks you have added. Call the Generate track subtitles API with the audio track's trackId; it generates a subtitle track for that audio. The request body must include the language name and language code. The endpoint details are below:

Endpoint: PATCH https://api.fastpix.io/v1/on-demand/{mediaId}/tracks/{trackId}/generate-subtitles

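Request headers:

Content-Type: application/json

Authorization: Basic BASE64(YOUR_ACCESS_TOKEN:YOUR_SECRET_KEY) (HTTP Basic authentication, with your access token as the username and your secret key as the password)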

Request body:

{
  "languageCode": "de",
  "languageName": "german"
}

PLEASE NOTE

Use the trackId of an audio track.

Make sure languageCode is a valid BCP 47 language tag (for example, de).

Example response:

{
  "success": true,
  "data": {
    "id": "455d0ab6-853d-469c-b07d-d16b7d5f0966",
    "type": "subtitle",
    "languageCode": "de",
    "languageName": "german"
  }
}
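The request above can be assembled with Python's standard library. This is a minimal sketch, assuming HTTP Basic authentication with the access token as the username and the secret key as the password, and assuming `data.id` in the response is the ID of the new subtitle track; the helper names are hypothetical. The code builds the request and parses a response but leaves actually sending it to you.

```python
import base64
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the endpoint above.
API_BASE = "https://api.fastpix.io/v1/on-demand"


def build_generate_subtitles_request(media_id: str, track_id: str,
                                     language_code: str, language_name: str,
                                     access_token: str, secret_key: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that generates subtitles for an audio track."""
    # HTTP Basic auth: base64-encode "ACCESS_TOKEN:SECRET_KEY".
    credentials = base64.b64encode(f"{access_token}:{secret_key}".encode("utf-8")).decode("ascii")
    body = json.dumps({"languageCode": language_code,
                       "languageName": language_name}).encode("utf-8")
    url = f"{API_BASE}/{quote(media_id, safe='')}/tracks/{quote(track_id, safe='')}/generate-subtitles"
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {credentials}"},
    )


def subtitle_track_id(response_body: str) -> str:
    """Return the new subtitle track ID from a response shaped like the example above."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError(f"subtitle generation request failed: {payload}")
    return payload["data"]["id"]


# Usage (sends a real request, so it is left commented out):
# req = build_generate_subtitles_request("MEDIA_ID", "AUDIO_TRACK_ID", "de", "german",
#                                        "YOUR_ACCESS_TOKEN", "YOUR_SECRET_KEY")
# with urllib.request.urlopen(req) as resp:
#     print(subtitle_track_id(resp.read().decode("utf-8")))
```

Note that `urllib.request.urlopen` raises `HTTPError` for non-2xx responses, so the `success` check only covers responses that return 2xx with `"success": false`.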

Retrieve a transcript

If your media has an auto-generated captions track, you can extract a plain text transcript of the recognized speech. This is useful for content moderation, sentiment analysis, summarization, or further processing in other systems.

To retrieve the transcript, use the playback ID of the media and the track ID of the generated subtitles.

Plain text transcript (TXT format)

A plain text transcript provides a raw, unformatted version of the speech content without timestamps. This is ideal for processing in natural language applications or integrating with search systems.

To fetch the transcript in plain text format, use:

https://stream.fastpix.io/{PLAYBACK_ID}/text/{TRACK_ID}.txt

PLEASE NOTE

This transcript contains only the spoken words from the video, without timecodes or additional metadata.
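Fetching the plain-text transcript is a single GET request. A minimal Python sketch, assuming the media is public (no token) and the response body is UTF-8 text; `transcript_url` and `fetch_transcript` are hypothetical helper names.

```python
import urllib.request
from urllib.parse import quote

STREAM_BASE = "https://stream.fastpix.io"


def transcript_url(playback_id: str, track_id: str) -> str:
    """Plain-text transcript URL: {base}/{PLAYBACK_ID}/text/{TRACK_ID}.txt"""
    return f"{STREAM_BASE}/{quote(playback_id, safe='')}/text/{quote(track_id, safe='')}.txt"


def fetch_transcript(playback_id: str, track_id: str, timeout: float = 10.0) -> str:
    """Download the transcript, assuming the response body is UTF-8 text."""
    with urllib.request.urlopen(transcript_url(playback_id, track_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

The returned string can be passed directly to a search indexer or NLP pipeline.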

WebVTT subtitle file (VTT format)

A WebVTT file provides subtitles in a structured format with timestamps, allowing for easy synchronization with video players. This is useful if you want to edit, refine, or repurpose subtitles for other platforms.

To fetch the WebVTT subtitles, modify the URL by replacing .txt with .vtt:

https://stream.fastpix.io/{PLAYBACK_ID}/text/{TRACK_ID}.vtt

INFORMATION

WebVTT files are widely supported in video players and can be manually edited using any text or subtitle editor.
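To repurpose the WebVTT file programmatically, you first need its cues as structured data. Below is a minimal Python parser sketch, assuming a simple file with a WEBVTT header and cues separated by blank lines; it skips NOTE and STYLE blocks and ignores cue settings and inline markup, so it is not a full implementation of the WebVTT specification. `Cue` and `parse_vtt` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _to_seconds(ts: str) -> float:
    # WebVTT timestamps are hh:mm:ss.ttt or mm:ss.ttt
    parts = ts.split(":")
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_vtt(vtt: str) -> List[Cue]:
    cues = []
    # Blocks are separated by blank lines; normalize line endings first.
    blocks = re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        # Locate the timing line; the header, NOTE and STYLE blocks have none and are skipped.
        timing_idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_idx is None:
            continue
        start_str, rest = lines[timing_idx].split("-->", 1)
        end_str = rest.split()[0]  # drop any cue settings after the end time
        cues.append(Cue(_to_seconds(start_str.strip()), _to_seconds(end_str),
                        "\n".join(lines[timing_idx + 1:])))
    return cues
```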

Retrieving transcripts for secured content (Signed media)

If your video is protected and requires authentication, include a JWT (JSON Web Token) in the token query parameter when requesting the transcript. This ensures that only authorized users can access the content.

Use the following URL format:

https://stream.fastpix.io/{PLAYBACK_ID}/text/{TRACK_ID}.txt?token={JWT}

Similarly, for WebVTT subtitles of secured media, use:

https://stream.fastpix.io/{PLAYBACK_ID}/text/{TRACK_ID}.vtt?token={JWT}
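Both signed URL formats differ only in the file extension. A minimal Python sketch, assuming you already have a valid JWT for the playback ID (generating one is outside the scope of this section); `signed_text_track_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

STREAM_BASE = "https://stream.fastpix.io"


def signed_text_track_url(playback_id: str, track_id: str, jwt: str, fmt: str = "txt") -> str:
    """Build a transcript (txt) or WebVTT (vtt) URL for signed media."""
    if fmt not in ("txt", "vtt"):
        raise ValueError("fmt must be 'txt' or 'vtt'")
    path = f"/{quote(playback_id, safe='')}/text/{quote(track_id, safe='')}.{fmt}"
    # urlencode percent-encodes the token if needed; compact JWTs are
    # base64url segments joined by dots, which pass through unchanged.
    return f"{STREAM_BASE}{path}?{urlencode({'token': jwt})}"
```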

By retrieving these transcripts, you can enhance accessibility, repurpose content, or integrate subtitles into external workflows with ease.

Use cases for transcripts

Beyond accessibility, retrieving transcripts enables various workflow enhancements:

Automated content review – Run transcripts through AI tools to detect key topics or compliance issues.
SEO optimization – Transcripts make video content indexable, improving searchability.
Podcast and blog conversion – Convert video speech into written formats for repurposing.
Educational materials – Provide readable transcripts alongside instructional videos.

Editing and replacing auto-generated subtitles

Auto-generated captions rely on AI transcription, which may occasionally misinterpret speech, especially with strong accents, background noise, or fast dialogue. If you find errors, you can edit the captions and replace the track.

To correct errors, follow these steps:

1. Download the existing WebVTT file:

https://stream.fastpix.io/{PLAYBACK_ID}/text/{TRACK_ID}.vtt

2. Edit the file using a text editor or subtitle editor (such as Aegisub or Subtitle Edit).
3. Replace the auto-generated subtitles in one of two ways:
   a. Update the existing track with the edited subtitles using the Update track API, or
   b. Remove the auto-generated track using the Delete track API, then upload the edited subtitles as a new track via the Add track API.

This process keeps your subtitles as accurate as possible and improves the viewing experience.
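For small, repeated mistakes (a misheard product name, for instance), you could script the editing step instead of doing it by hand. Below is a minimal Python sketch, assuming simple whole-word find-and-replace is enough; `correct_vtt` is a hypothetical helper. It leaves the WEBVTT header and timing lines untouched so the cues stay in sync. Uploading the result through the Update track or Add track API is not shown.

```python
import re
from typing import Dict

# Matches a cue timing line such as "00:00:01.000 --> 00:00:03.000".
_TIMING = re.compile(r"^\s*(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->")


def correct_vtt(vtt: str, corrections: Dict[str, str]) -> str:
    """Apply whole-word corrections to cue text; header and timing lines are kept as-is.

    Note: cue identifier and NOTE lines are treated as text and may also be changed.
    """
    out = []
    for line in vtt.splitlines():
        if line.startswith("WEBVTT") or _TIMING.match(line):
            out.append(line)
            continue
        for wrong, right in corrections.items():
            # A function replacement avoids backslash escapes being interpreted in `right`.
            line = re.sub(rf"\b{re.escape(wrong)}\b", lambda _m, r=right: r, line)
        out.append(line)
    return "\n".join(out) + "\n"
```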

Best practices for accurate subtitles

Audio quality: Ensure clear, high-quality audio. Minimize background sounds, echo, and interruptions to get the best results from auto-generation.
Consistent speech: Maintain a steady speaking pace and clear pronunciation. Avoid using multiple languages in one segment, as the subtitle feature may not accurately differentiate between them.
Language consistency: Keep the entire video in a single language where possible. If there are multiple languages, consider post-editing or manual subtitle creation for multilingual parts.