Translate your transcriptions & subtitles
This feature is in beta. We're looking for feedback to improve this feature; share yours here.
The Translation model generates translations of your transcriptions into one or more target languages. If subtitles and/or sentences are enabled, the translations will also include translated results for them. You can translate your transcription into multiple languages in a single API call.
The languages covered by the Translation feature are listed in the API Reference (see `translation_config`).
Two translation models are available:

- `base`: fast, covers most use cases
- `enhanced`: slower, but higher quality and with context awareness

To enable translation, simply set the `translation` parameter to `true`.
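As an illustrative sketch, a request body enabling translation could look like the following. The `audio_url` value is a placeholder; the `translation` flag and the `translation_config` fields are the parameters described on this page.

```python
# Sketch of a transcription request body with translation enabled.
# "audio_url" is a placeholder input; "translation" and
# "translation_config" are the parameters documented on this page.
payload = {
    "audio_url": "https://example.com/audio.mp3",  # placeholder input file
    "translation": True,
    "translation_config": {
        "target_languages": ["fr", "es"],  # French and Spanish
        "model": "base",                   # or "enhanced"
    },
}
```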
translation_config options

The translation feature can be further customized using the `translation_config` object. When `translation: true` is set, you can also provide a `translation_config` object to specify more details. Here are the available options:
- `target_languages`: the languages to translate into, e.g. `["fr", "es"]` for French and Spanish.
- `model`:
  - `"base"`: fast and covers most use cases.
  - `"enhanced"`: slower, but offers higher quality and context awareness.

match_original_utterances
(Default: true)

- When set to `true`, the system attempts to match the translated segments (utterances, sentences) to the timing and structure of the original detected speech segments.
- When set to `false`, the translation might be more fluid or natural-sounding in the target language but could deviate from the original utterance segmentation.
- Keep `true` for most subtitling or dubbing use cases where alignment with original speech is crucial. Set to `false` if you prioritize a more natural flow in the translated text over strict temporal alignment.

lipsync
(Default: true)

This option controls the behavior of the translation’s alignment with visual cues, specifically lip movements.
How it works: When `lipsync` is set to `true` (the default value), the translation process uses an advanced lip-synchronization matching algorithm. This algorithm is designed to align the translated audio or subtitles with the speaker’s lip movements by leveraging timestamps derived from lip activity.
Advantages: The primary benefit is improved synchronization between the translated output and the visuals of the speaker. This can significantly enhance the viewing experience, especially for dubbed content or when precise visual timing with speech is important.
Potential Trade-off: Due to its focus on matching lip movements, the algorithm might occasionally aggregate two distinct spoken words into a single “word” object within the translated output. This means that while the timing aligns well with the lips, the direct one-to-one correspondence between source words and translated words might sometimes be altered to achieve better visual sync.
When to disable: If a strict, word-for-word translation format is an absolute requirement, and minor deviations for the sake of lip synchronization are not acceptable, you should set `lipsync` to `false`. This will instruct the system to prioritize literal word mapping over visual timing synchronization.
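For instance, a `translation_config` sketch that prioritizes literal word mapping (field names as documented above; the surrounding request body is omitted):

```python
# translation_config favoring strict word-for-word output:
# disabling lipsync turns off the lip-synchronization matching
# algorithm, so words are not grouped to fit visual timing.
translation_config = {
    "target_languages": ["fr"],
    "model": "enhanced",
    "lipsync": False,  # prioritize literal word mapping over visual sync
}
```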
context_adaptation
(Default: true)

- When set to `true`, the translation model can utilize contextual information and formality preferences to produce more accurate and appropriate translations.
- When set to `false`, the translation will be performed without context adaptation, using only the source content for translation decisions.
- Keep `true` for most use cases to benefit from enhanced translation quality. Set to `false` only if you need purely literal translations without any contextual adjustments.

When `context_adaptation` is enabled, you can use the following additional parameters:
context
(Default: `""`, empty string)

Additional context about the audio content that the model can use to improve translation accuracy, e.g. `context: "Medical consultation between doctor and patient discussing cardiology"`.

informal
(Default: false)

- When set to `true`, the translation will use informal pronouns, verb conjugations, and speech patterns appropriate for casual conversation.
- When set to `false`, the translation will default to formal or neutral language forms.

Sample code
In the following examples, we’re using the `base` model with additional context and informal settings.
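A sketch of such a request body (the `audio_url` value is a placeholder; the parameter names and defaults are those documented above):

```python
# Request body using the "base" model with context adaptation,
# a custom context string, and informal language enabled.
payload = {
    "audio_url": "https://example.com/audio.mp3",  # placeholder input file
    "translation": True,
    "translation_config": {
        "target_languages": ["fr", "es"],
        "model": "base",
        "context_adaptation": True,
        "context": "Medical consultation between doctor and patient discussing cardiology",
        "informal": True,  # use informal pronouns and phrasing
    },
}
```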
The transcription result will contain a `"translation"` key with the output of the model:
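The response layout is not reproduced here; as a minimal sketch, the key can be accessed like this (the nested structure below is a placeholder, not the actual schema):

```python
# "result" stands for the parsed transcription response.
# Only the "translation" key name is documented on this page;
# its nested contents are placeholders.
result = {"translation": {}}  # placeholder for the model's output
translation = result.get("translation")  # None if translation was not enabled
if translation is not None:
    print("translation present in the result")
```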
If you enabled the `subtitles` generation, those will also benefit from the translation model.