Lilith

Google announced Gemini 3.5 Live Translate, an audio model for live speech-to-speech translation. According to the announcement, it automatically detects more than 70 languages, generates translated speech continuously and stays only a few seconds behind the speaker.

Translation moves from waiting for the end of a sentence to a live speech stream

Google contrasts this with turn-by-turn systems, which wait for the speaker to finish before translating. Gemini 3.5 Live Translate is designed to translate continuously, balancing the need for enough context to translate accurately against the need to stay in sync with the speaker.

The rollout is split by channel. Developers get a public preview through the Gemini Live API and Google AI Studio, enterprises get a private preview in Google Meet starting this month, and consumers get a rollout in Google Translate on Android and iOS.

For Meet, the jump from five languages to thousands of pairs matters

In Google Meet, the new model is expected to expand speech translation from the previous five languages to more than 70 languages and over 2,000 language combinations in one meeting. Google also says the feature will be easier to reach in the interface.
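
A quick sanity check on those numbers, as a sketch rather than Google's own accounting: the announcement does not say how combinations are counted, so the snippet below assumes exactly 70 languages (the stated lower bound) and shows both readings. The function names are illustrative only.

```python
from math import comb


def unordered_pairs(languages: int) -> int:
    """Language pairs where A<->B counts once (assumed reading)."""
    return comb(languages, 2)


def directed_pairs(languages: int) -> int:
    """Language pairs where A->B and B->A count separately (alternative reading)."""
    return languages * (languages - 1)


if __name__ == "__main__":
    for n in (5, 70):
        print(n, "languages:", unordered_pairs(n), "unordered,", directed_pairs(n), "directed")
```

Under the unordered reading, 70 languages give 2,415 pairs, which fits "over 2,000". The previous five languages give only 10 pairs by the same count, which is why the jump matters more than the raw language total suggests.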

That matters more in practice than the demo itself. Live translation in a meeting is not just a travel convenience. It is infrastructure for support, sales, education, and internal communication, where language often decides who can fully participate.

Voice translation carries different risks than text

Text translation can be paused, read and corrected. Live voice errors spread immediately and sound more authoritative because they arrive as conversation. Google points to SynthID watermarking for generated audio output.

That is a useful safety detail, but it does not solve everything. Companies will need logs, permission controls, clear notices for participants and rules for cases where a bad translation changes legal or medical meaning.

A noisy room will decide more than a studio demo

The next test is real environments where people talk over each other, switch languages, use slang, and have poor microphones. Google claims noise robustness, but production trust will be earned in real calls.

Rollout pace also matters. A public preview for developers and a private preview for Workspace customers are not the same as a stable enterprise default for every meeting.

Lilith's verdict

This is an interpreter seated in the middle of the meeting, and people may trust it before they learn when it gets things wrong.
