Real-time V2 is the latest real-time speech-to-text API from Gladia. It offers more features and significantly lower latency than V1. This guide explains how to migrate to V2 so you can start benefiting from these improvements.
Please migrate sooner rather than later, as we plan to remove support for V1 in the future. Before we do, however, we will reach out to everyone who is still using V1.
In V1, you always connect to the same WebSocket URL (wss://api.gladia.io/audio/text/audio-transcription) and send your configuration through the WebSocket connection.
In V2, you first generate a unique WebSocket URL with a call to our POST /v2/live endpoint, and then connect to it. This URL contains a token that is unique to your live session. You’ll be able to resume your session in case of a lost connection, or give the URL to a web client without exposing your Gladia API key.
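A minimal sketch of the first V2 step (opening a session) in Python, using only the standard library. It assumes the REST host is api.gladia.io (the host of the V1 WebSocket URL), that the API key is sent in an x-gladia-key header, and that the response JSON carries the session id and WebSocket URL as "id" and "url"; check all three against the POST /v2/live API reference. Connecting to the returned URL is left to whichever WebSocket client you already use.

```python
import json
import urllib.request

# Assumption: the REST host matches the api.gladia.io host of the V1 WebSocket URL.
LIVE_ENDPOINT = "https://api.gladia.io/v2/live"


def build_init_request(api_key: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v2/live request that opens a session."""
    return urllib.request.Request(
        LIVE_ENDPOINT,
        data=json.dumps(config).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key travels in the x-gladia-key header.
            "x-gladia-key": api_key,
        },
        method="POST",
    )


def init_live_session(api_key: str, config: dict) -> tuple[str, str]:
    """Open a live session and return (session_id, websocket_url)."""
    with urllib.request.urlopen(build_init_request(api_key, config)) as response:
        body = json.load(response)
    # Assumption: the response exposes the session id and its WebSocket URL
    # under the "id" and "url" keys.
    return body["id"], body["url"]
```

Because the returned URL already carries the session token, a backend can call init_live_session and hand only the URL to a browser client, so the API key never leaves the server.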
Because V2 offers more features, its configuration has changed, and it is now sent when you call the POST /v2/live endpoint rather than through the WebSocket connection. You’ll find the full configuration definition on the POST /v2/live API reference page.
Here, we’ll show you how to migrate your V1 configuration object to the V2 one.
For example, the language settings are grouped under language_config. In V2, you can also pass the languages you expect in the languages array, as guidance to improve accuracy and latency:
{
  "language_config": {
    "languages": [],
    "code_switching": true
  }
}
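If your client assembles the configuration in code, a small helper can keep the V2 language_config shape in one place. This is only a sketch: the helper name is hypothetical, and the short language codes in the test are an assumption about the accepted format.

```python
def language_config(expected_languages=None, code_switching: bool = True) -> dict:
    """Return the V2 language_config block for the POST /v2/live request body.

    expected_languages is an optional hint; an empty list, as in the JSON above,
    gives no language guidance.
    """
    return {
        "language_config": {
            "languages": list(expected_languages or []),
            "code_switching": code_switching,
        }
    }
```

The result can be merged with the rest of your session configuration, for example with {**other_settings, **language_config(["en"])}.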
If you were sending audio chunks as raw bytes, nothing has changed.
If you were sending them base64-encoded inside JSON messages, note that the format of those messages has changed in V2. See the API reference for the full format.
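For the base64 path, here is a sketch of wrapping a raw chunk in a V2 JSON message. The "audio_chunk" type and the data.chunk field are assumptions about the V2 format, so confirm them in the API reference before relying on this; raw bytes, by contrast, go over the socket unchanged.

```python
import base64
import json


def v2_audio_message(chunk: bytes) -> str:
    """Wrap a raw audio chunk in a base64 JSON text message for the V2 WebSocket."""
    # Assumption: V2 expects {"type": "audio_chunk", "data": {"chunk": "<base64>"}}.
    return json.dumps(
        {
            "type": "audio_chunk",
            "data": {"chunk": base64.b64encode(chunk).decode("ascii")},
        }
    )
```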
In V1, we send only two kinds of messages over the WebSocket:
the “connected” message
the “transcript” messages
In V2, we send more kinds, including:
lifecycle event messages
acknowledgment messages
add-on messages
post-processing messages
…
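Because a V2 session can deliver message kinds a V1 client never saw, it helps to route on the type field and skip anything you don’t handle instead of failing on it. A minimal sketch: MessageRouter is a hypothetical helper, and "transcript" is the only type value this guide confirms.

```python
import json
from typing import Callable

Handler = Callable[[dict], None]


class MessageRouter:
    """Dispatch V2 WebSocket text messages to handlers keyed by their "type"."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}
        self.ignored: list[str] = []  # types received with no registered handler

    def on(self, message_type: str, handler: Handler) -> None:
        self._handlers[message_type] = handler

    def dispatch(self, raw: str) -> None:
        message = json.loads(raw)
        message_type = message.get("type", "")
        handler = self._handlers.get(message_type)
        if handler is None:
            # Lifecycle, acknowledgment, add-on, post-processing, ...: skip if unhandled.
            self.ignored.append(message_type)
            return
        handler(message)
```

In your receive loop, call router.dispatch(raw) for every text frame; registering only a "transcript" handler keeps V1-like behavior while the other message kinds are skipped.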
To identify a transcription message in V1, you check that the type field is "final" and/or that the transcription field is not empty.
In V2, check that the type field is "transcript" and that data.is_final is true.
The format of transcript messages differs between V1 and V2. See the API reference for the full format.
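The two checks for final transcripts described above can be written as a pair of predicates. The V1 version requires both conditions (the stricter reading of "and/or"), and transcript_text assumes the V2 text lives under data.utterance.text, which you should confirm in the API reference.

```python
def is_final_v1(message: dict) -> bool:
    """V1: a final transcript has type "final" and a non-empty transcription."""
    return message.get("type") == "final" and bool(message.get("transcription"))


def is_final_v2(message: dict) -> bool:
    """V2: a final transcript has type "transcript" and data.is_final set to true."""
    data = message.get("data") or {}
    return message.get("type") == "transcript" and data.get("is_final") is True


def transcript_text(message: dict) -> str:
    """Extract the text of a V2 transcript message."""
    # Assumption: V2 nests the text under data.utterance.text.
    return message["data"]["utterance"]["text"]
```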
If you don’t need the new messages and only want the ones the V1 API sent, you can choose which kinds of messages you receive when you call the POST /v2/live endpoint to initiate the session.
For example, you can configure the session so that you only receive final transcript messages.
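A hedged sketch of such a configuration, assuming the switches live in a messages_config object of boolean receive_* flags. The flag names below are assumptions to verify against the POST /v2/live API reference, and final_transcripts_only is a hypothetical helper that merges them into the rest of your session configuration.

```python
def final_transcripts_only(config: dict) -> dict:
    """Return a shallow copy of a V2 session config that requests final transcripts only."""
    # Assumption: these flag names match the messages_config object in the
    # POST /v2/live API reference; any flag not listed keeps its default.
    messages_config = {
        "receive_final_transcripts": True,
        "receive_partial_transcripts": False,
        "receive_speech_events": False,
        "receive_pre_processing_events": False,
        "receive_realtime_processing_events": False,
        "receive_post_processing_events": False,
        "receive_acknowledgments": False,
        "receive_lifecycle_events": False,
    }
    # Any messages_config already present in config is replaced, not merged.
    return {**config, "messages_config": messages_config}
```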