Add auto-generated subtitles
Subtitles play an essential role in making your video content accessible to a broader audience, especially those who may not understand the spoken language or prefer reading along. FastPix makes it easy to add subtitles through its auto-generation feature, allowing you to enhance your videos with minimal effort.
How auto-generated subtitles work
FastPix leverages the OpenAI Whisper model to automatically generate captions for your on-demand media files. This process is designed to be efficient and accurate, converting spoken words into synchronized subtitles.
Key considerations
Audio quality: Auto-generated captions perform best with clear audio. Performance may vary with media that includes excessive non-speech audio, such as music, background noise, or long silences.
Language compatibility: The feature generates subtitles in the same language as the audio. It is not intended for generating translated captions in other languages.
We recommend testing this feature with your typical content to evaluate its effectiveness.
Steps to add auto-generated subtitles
Here’s a step-by-step guide to enable auto-generated subtitles for your FastPix videos:
Step 1: Prepare your video
Ensure clear audio: Remove any unwanted sounds, reduce background noise, and avoid overlapping audio to achieve a cleaner subtitle generation.
Adjust volume levels: Ensure that voices are clear and loud enough for precise transcription.
Step 2: Upload your video to FastPix
FastPix accepts either direct video uploads from your local storage or public URLs from cloud storage services for subtitle generation.
Step 3: Enable auto-generation in the request
To activate auto-generated subtitles, include the subtitles JSON object in your video settings. This object consists of three key-value pairs:
- languageName: Specify the language of the audio (e.g., "english").
- metadata: Optionally, add a metadata object to tag specific information with the subtitles.
- languageCode: Enter the language code that corresponds to the spoken language in the video (e.g., en for English).
Example JSON object for enabling subtitles
{
  "inputs": [
    {
      "type": "video",
      "url": "https://example.com/sample.mp4",
      "startTime": 0,
      "endTime": 60
    }
  ],
  "metadata": {
    "key1": "value1"
  },
  "subtitles": {
    "languageName": "english",
    "metadata": {
      "key1": "value1"
    },
    "languageCode": "en"
  },
  "accessPolicy": "public",
  "maxResolution": "1080p"
}
IMPORTANT
Double-check that languageCode matches the spoken language in your video, because the subtitle model follows this setting for transcription.
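If you assemble the request body in code, a small helper can keep the subtitle settings consistent. The following is a minimal Python sketch that mirrors the example JSON object above. The function name `build_media_request` and its parameters are hypothetical, and the sketch only builds the body; it does not send it to any endpoint.

```python
import json


def build_media_request(video_url, language_name, language_code, subtitle_metadata=None):
    """Build a request body shaped like the example JSON object (hypothetical helper)."""
    if not language_code:
        raise ValueError("language_code must match the spoken language, e.g. 'en'")

    subtitles = {"languageName": language_name, "languageCode": language_code}
    if subtitle_metadata:
        # metadata is optional, so it is only included when provided
        subtitles["metadata"] = subtitle_metadata

    return {
        "inputs": [{"type": "video", "url": video_url}],
        "subtitles": subtitles,
        "accessPolicy": "public",
        "maxResolution": "1080p",
    }


body = build_media_request("https://example.com/sample.mp4", "english", "en")
print(json.dumps(body, indent=2))
```

Keeping the language name and code as explicit arguments makes it harder to submit a request whose languageCode does not match the audio.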
Step 4: Process the video
Once uploaded, FastPix will process your video using the Whisper model to automatically generate subtitles. The model transcribes spoken content into text and synchronizes it with the video for optimal viewing.
Supported languages for auto-generated subtitles
FastPix supports the following languages and language codes for auto-generated subtitles in Video on Demand (VOD) content:
Language | Language Code | Status |
---|---|---|
English | en | Supported |
Spanish | es | Supported |
Italian | it | Supported |
Portuguese | pt | Supported |
German | de | Supported |
French | fr | Supported |
Polish | pl | Beta |
Russian | ru | Beta |
Dutch | nl | Beta |
Catalan | ca | Beta |
Turkish | tr | Beta |
Swedish | sv | Beta |
Ukrainian | uk | Beta |
Norwegian | no | Beta |
Finnish | fi | Beta |
Slovak | sk | Beta |
Greek | el | Beta |
Czech | cs | Beta |
Croatian | hr | Beta |
Danish | da | Beta |
Romanian | ro | Beta |
Bulgarian | bg | Beta |
PLEASE NOTE
Subtitles are only available in the same language as the audio input. Additional language support may be added in the future, but currently, each subtitle matches the spoken language directly.
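To catch an unsupported language before submitting a request, you could check the code against the table above on the client side. This is a minimal Python sketch; the sets are copied from the table, and the function name `subtitle_support` is hypothetical rather than part of any FastPix SDK.

```python
# Language codes copied from the table above; update these if the table changes.
SUPPORTED = {"en", "es", "it", "pt", "de", "fr"}
BETA = {
    "pl", "ru", "nl", "ca", "tr", "sv", "uk", "no",
    "fi", "sk", "el", "cs", "hr", "da", "ro", "bg",
}


def subtitle_support(language_code):
    """Return 'Supported', 'Beta', or None for a language code (hypothetical helper)."""
    code = language_code.strip().lower()
    if code in SUPPORTED:
        return "Supported"
    if code in BETA:
        return "Beta"
    return None


print(subtitle_support("en"))
print(subtitle_support("pl"))
print(subtitle_support("ja"))
```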
Retrieve a transcript
If your media has an auto-generated captions track, you can extract a plain text transcript of the recognized speech. This is useful for content moderation, sentiment analysis, summarization, or further processing in other systems.
To retrieve the transcript, use the playback ID of the media and the track ID of the generated subtitles.
Plain text transcript (TXT format)
A plain text transcript provides a raw, unformatted version of the speech content without timestamps. This is ideal for processing in natural language applications or integrating with search systems.
To fetch the transcript in plain text format, use:
https://stream.fastpix.io/{PLAYBACK_ID}/text/{TRACK_ID}.txt
PLEASE NOTE
This transcript contains only the spoken words from the video, without timecodes or additional metadata.
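As a rough illustration, the plain text URL can be built and fetched with the Python standard library. The helper names below are hypothetical, the IDs are placeholders, and decoding the response as UTF-8 is an assumption rather than documented behavior.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://stream.fastpix.io"


def transcript_url(playback_id, track_id):
    """Build the plain text transcript URL from the pattern shown above."""
    return f"{BASE_URL}/{quote(playback_id, safe='')}/text/{quote(track_id, safe='')}.txt"


def fetch_transcript(playback_id, track_id, timeout=10.0):
    """Download the transcript; assumes the response body is UTF-8 text."""
    with urlopen(transcript_url(playback_id, track_id), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder IDs; replace them with your own playback ID and track ID.
print(transcript_url("my-playback-id", "my-track-id"))
```

Calling `fetch_transcript` performs a network request, so it is defined here but not invoked.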
WebVTT subtitle file (VTT format)
A WebVTT file provides subtitles in a structured format with timestamps, allowing for easy synchronization with video players. This is useful if you want to edit, refine, or repurpose subtitles for other platforms.
To fetch the WebVTT subtitles, replace .txt with .vtt in the URL:
https://stream.fastpix.io/{PLAYBACK_ID}/text/{TRACK_ID}.vtt
INFORMATION
WebVTT files are widely supported in video players and can be manually edited using any text or subtitle editor.
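If you want to process a downloaded WebVTT file programmatically, a small parser is often enough. The sketch below is a simplified reader, not a full implementation of the WebVTT specification: it assumes cues are separated by blank lines and ignores notes, styles, and regions. The sample content is made up for illustration.

```python
import re

# Matches a cue timing line such as "00:00:02.500 --> 00:00:05.000" (hours optional).
TIMING = re.compile(r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}")


def parse_vtt(vtt_text):
    """Return a list of (timing_line, cue_text) tuples from WebVTT content."""
    cues = []
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        for index, line in enumerate(lines):
            if TIMING.match(line.strip()):
                # Everything after the timing line is the cue text.
                text = " ".join(part.strip() for part in lines[index + 1:] if part.strip())
                cues.append((line.strip(), text))
                break
    return cues


sample = """WEBVTT

00:00:00.000 --> 00:00:02.500
Hello and welcome.

00:00:02.500 --> 00:00:05.000
Today we cover
auto-generated subtitles.
"""

cues = parse_vtt(sample)
for timing, text in cues:
    print(timing, "|", text)
```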
Retrieving transcripts for secured content (Signed media)
If your video is protected and requires authentication, you need to include a JWT (JSON Web Token) as a parameter when requesting the transcript. This ensures that only authorized users can access the content.
Use the following URL format:
https://stream.fastpix.io/{PLAYBACK_ID}/text/{TRACK_ID}.txt?token={JWT}
Similarly, for WebVTT subtitles of secured media, use:
https://stream.fastpix.io/{PLAYBACK_ID}/text/{TRACK_ID}.vtt?token={JWT}
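The signed URLs above can be assembled as follows. This is a minimal Python sketch that assumes you already have a valid JWT; creating and signing the token is out of scope here. The function name `signed_text_url` and the placeholder values are hypothetical.

```python
from urllib.parse import quote, urlencode


def signed_text_url(playback_id, track_id, token, extension="txt"):
    """Build a transcript (.txt) or subtitle (.vtt) URL with a token query parameter."""
    if extension not in ("txt", "vtt"):
        raise ValueError("extension must be 'txt' or 'vtt'")
    path = (
        f"https://stream.fastpix.io/{quote(playback_id, safe='')}"
        f"/text/{quote(track_id, safe='')}.{extension}"
    )
    # urlencode escapes any characters in the token that are unsafe in a query string.
    return f"{path}?{urlencode({'token': token})}"


# Placeholder values; the token must be a real JWT issued for your media.
url = signed_text_url("my-playback-id", "my-track-id", "header.payload.signature", "vtt")
print(url)
```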
By retrieving these transcripts, you can enhance accessibility, repurpose content, or integrate subtitles into external workflows with ease.
Use cases for transcripts
Beyond accessibility, retrieving transcripts enables various workflow enhancements:
- Automated content review – Run transcripts through AI tools to detect key topics or compliance issues.
- SEO optimization – Transcripts make video content indexable, improving searchability.
- Podcast and blog conversion – Convert video speech into written formats for repurposing.
- Educational materials – Provide readable transcripts alongside instructional videos.
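As a simple illustration of automated content review, the sketch below counts how often selected terms appear in a plain text transcript. It is a deliberately basic, hypothetical example that handles single-word terms only; a real review pipeline would likely use more capable tooling. The transcript string is made up.

```python
import re
from collections import Counter


def count_terms(transcript, terms):
    """Count case-insensitive occurrences of single-word terms in a transcript."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    counts = Counter(words)
    # Only report terms that actually appear.
    return {term: counts[term.lower()] for term in terms if counts[term.lower()] > 0}


transcript = "Welcome to the demo. This demo covers pricing and refunds."
print(count_terms(transcript, ["demo", "refunds", "warranty"]))
```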
Editing and replacing auto-generated subtitles
Auto-generated captions rely on AI transcription, which may occasionally misinterpret speech, especially with strong accents, background noise, or fast dialogue. If you find errors in your auto-generated captions, you can edit and replace them.
To correct errors, follow these steps:
1. Download the existing WebVTT file: https://stream.fastpix.io/{PLAYBACK_ID}/text/{TRACK_ID}.vtt
2. Edit the file using a text editor or subtitle editor (such as Aegisub or Subtitle Edit).
3. Remove the auto-generated track using the Delete track API. Alternatively, apply the edited subtitles directly with the Update track API instead of deleting the track and following the next step.
4. Upload the edited subtitles as a new track via the Add track API.
This process keeps your subtitles accurate and improves the viewing experience.
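For recurring mistakes, such as a brand name that is consistently misheard, the editing step can be scripted. The sketch below applies text replacements to cue text while leaving the header and timing lines untouched. It is a hypothetical helper that covers only the local edit; deleting, updating, or adding tracks still goes through the track APIs mentioned above. The sample subtitle content is made up.

```python
def correct_vtt(vtt_text, corrections):
    """Apply text replacements to cue text, leaving header and timing lines unchanged."""
    fixed_lines = []
    for line in vtt_text.splitlines():
        is_header = line.startswith("WEBVTT")
        is_timing = "-->" in line
        if not is_header and not is_timing:
            for wrong, right in corrections.items():
                line = line.replace(wrong, right)
        fixed_lines.append(line)
    return "\n".join(fixed_lines) + "\n"


original = "WEBVTT\n\n00:00:00.000 --> 00:00:02.000\nWelcome to fast picks.\n"
edited = correct_vtt(original, {"fast picks": "FastPix"})
print(edited)
```

Skipping timing lines avoids accidentally changing timestamps when a replacement string happens to contain digits or punctuation.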
Best practices for accurate subtitles
- Audio quality: Ensure clear, high-quality audio. Minimize background sounds, echo, and interruptions to get the best results from auto-generation.
- Consistent speech: Maintain a steady speaking pace and clear pronunciation. Avoid using multiple languages in one segment, as the subtitle feature may not accurately differentiate between them.
- Language consistency: Keep the entire video in a single language where possible. If there are multiple languages, consider post-editing or manual subtitle creation for multilingual parts.