Translate your transcriptions & subtitles
This feature is in beta. We're looking for feedback to improve this feature; share yours here.
The Translation model generates translations of your transcriptions into one or more target languages. If subtitles and/or sentences are enabled, the translations will also include translated results for them. You can translate your transcription into multiple languages in a single API call.
The languages covered by the Translation feature are listed in the API Reference (see `translation_config`).
Two translation models are available:

- `base`: fast, covers most use cases
- `enhanced`: slower, but higher quality and with context awareness

To enable translation, simply set the `translation` parameter to `true`.
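As an illustrative sketch, a request body enabling translation could look like the following. The `audio_url` value is a placeholder; the `translation` flag and the `translation_config` fields are the parameters described on this page.

```python
# Sketch of a transcription request body with translation enabled.
# "audio_url" is a placeholder input; "translation" and
# "translation_config" are the parameters documented on this page.
payload = {
    "audio_url": "https://example.com/audio.mp3",  # placeholder input file
    "translation": True,
    "translation_config": {
        "target_languages": ["fr", "es"],  # French and Spanish
        "model": "base",                   # or "enhanced"
    },
}
```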
translation_config options

The translation feature can be further customized using the `translation_config` object. When `translation: true` is set, you can also provide a `translation_config` object to specify more details. Here are the available options:
- `target_languages`: the languages to translate into, e.g. `["fr", "es"]` for French and Spanish.
- `model`:
  - `"base"`: fast and covers most use cases.
  - `"enhanced"`: slower, but offers higher quality and context awareness.

match_original_utterances
(Default: true)

- When set to `true`, the system attempts to match the translated segments (utterances, sentences) to the timing and structure of the original detected speech segments.
- When set to `false`, the translation might be more fluid or natural-sounding in the target language but could deviate from the original utterance segmentation.
- Keep `true` for most subtitling or dubbing use cases where alignment with original speech is crucial. Set to `false` if you prioritize a more natural flow in the translated text over strict temporal alignment.

lipsync
(Default: true)

This option controls the behavior of the translation’s alignment with visual cues, specifically lip movements.
How it works: When `lipsync` is set to `true` (the default value), the translation process uses an advanced lip-synchronization matching algorithm. This algorithm is designed to align the translated audio or subtitles with the speaker’s lip movements by leveraging timestamps derived from lip activity.
Advantages: The primary benefit is improved synchronization between the translated output and the visuals of the speaker. This can significantly enhance the viewing experience, especially for dubbed content or when precise visual timing with speech is important.
Potential Trade-off: Due to its focus on matching lip movements, the algorithm might occasionally aggregate two distinct spoken words into a single “word” object within the translated output. This means that while the timing aligns well with the lips, the direct one-to-one correspondence between source words and translated words might sometimes be altered to achieve better visual sync.
When to disable: If a strict, word-for-word translation format is an absolute requirement, and minor deviations for the sake of lip synchronization are not acceptable, you should set `lipsync` to `false`. This will instruct the system to prioritize literal word mapping over visual timing synchronization.
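For instance, a `translation_config` sketch that prioritizes literal word mapping (field names as documented above; the surrounding request body is omitted):

```python
# translation_config favoring strict word-for-word output:
# disabling lipsync turns off the lip-synchronization matching
# algorithm, so words are not grouped to fit visual timing.
translation_config = {
    "target_languages": ["fr"],
    "model": "enhanced",
    "lipsync": False,  # prioritize literal word mapping over visual sync
}
```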
context_adaptation
(Default: true)

- When set to `true`, the translation model can utilize contextual information and formality preferences to produce more accurate and appropriate translations.
- When set to `false`, the translation will be performed without context adaptation, using only the source content for translation decisions.
- Keep `true` for most use cases to benefit from enhanced translation quality. Set to `false` only if you need purely literal translations without any contextual adjustments.

When `context_adaptation` is enabled, you can use the following additional parameters:
context
(Default: `""`, empty string)

Additional context about the audio content that the model can use to improve translation accuracy, e.g. `context: "Medical consultation between doctor and patient discussing cardiology"`.

informal
(Default: false)

- When set to `true`, the translation will use informal pronouns, verb conjugations, and speech patterns appropriate for casual conversation.
- When set to `false`, the translation will default to formal or neutral language forms.

Sample code
In the following examples, we’re using the `base` model with additional context and informal settings.
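A sketch of such a request body (the `audio_url` value is a placeholder; the parameter names and defaults are those documented above):

```python
# Request body using the "base" model with context adaptation,
# a custom context string, and informal language enabled.
payload = {
    "audio_url": "https://example.com/audio.mp3",  # placeholder input file
    "translation": True,
    "translation_config": {
        "target_languages": ["fr", "es"],
        "model": "base",
        "context_adaptation": True,
        "context": "Medical consultation between doctor and patient discussing cardiology",
        "informal": True,  # use informal pronouns and phrasing
    },
}
```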
The transcription result will contain a `"translation"` key with the output of the model:
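The response layout is not reproduced here; as a minimal sketch, the key can be accessed like this (the nested structure below is a placeholder, not the actual schema):

```python
# "result" stands for the parsed transcription response.
# Only the "translation" key name is documented on this page;
# its nested contents are placeholders.
result = {"translation": {}}  # placeholder for the model's output
translation = result.get("translation")  # None if translation was not enabled
if translation is not None:
    print("translation present in the result")
```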
If you enabled the `subtitles` generation, those will also benefit from the translation model.