Lilith

Google announced Gemini 3.5 Live Translate for near real-time voice-to-voice translation across more than 70 languages. The practical question is not just translation quality, but latency, voice stability, Meet availability and who carries the risk when a live call is mistranslated.

Google treats translation as a continuous audio stream, not a sentence-by-sentence task

Google introduced Gemini 3.5 Live Translate on June 9, 2026, as an audio model for near real-time speech-to-speech translation across more than 70 languages. The model is designed to detect the spoken language automatically and generate translated speech while preserving the speaker's intonation, pacing and pitch.

The difference from traditional systems is continuity. Google says the model does not wait for a full sentence to finish. It processes streaming speech and stays only a few seconds behind the speaker.
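The gap between the two approaches can be illustrated with a toy calculation. This is a minimal sketch, not Google's method: the word timings, the 3-second lag and the function names are all assumptions made up for illustration, and model processing time is ignored. It compares how long each spoken word waits before translation can begin when the system buffers a whole sentence versus when it trails the speaker by a fixed lag.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    end_time: float  # seconds since the speaker started (made-up timings)


def sentence_buffered_delays(words):
    """Wait per word when translation starts only at sentence-final punctuation."""
    delays, pending = [], []
    for word in words:
        pending.append(word)
        if word.text.endswith((".", "?", "!")):
            # Every buffered word is released at the moment the sentence ends.
            delays.extend(word.end_time - p.end_time for p in pending)
            pending = []
    return delays  # words in an unfinished sentence are still waiting


def streaming_delays(words, lag_seconds):
    """Wait per word when output trails the speaker by a fixed lag."""
    return [lag_seconds for _ in words]


# One long sentence: a word every 0.5 s, ending at 12 s.
words = [Word(f"w{i}", 0.5 * (i + 1)) for i in range(23)] + [Word("done.", 12.0)]
print(max(sentence_buffered_delays(words)))  # the first word waits 11.5 s
print(max(streaming_delays(words, 3.0)))     # bounded by the assumed 3 s lag
```

In this toy model the sentence-buffered wait grows with sentence length, while the streaming wait stays at whatever lag the system holds, which is why continuity matters most for long, unbroken speech.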

Availability is layered. Developers get a public preview through the Gemini Live API and Google AI Studio; Google Translate supports the model on Android and iOS; and Google Meet starts with a private preview for selected Workspace customers. The model card lists an audio context of up to 128K tokens and output of up to 64K tokens.

Voice translation is turning from a helper into meeting infrastructure

For companies, this is more important than another Translate feature. If live translation enters Meet, call centers, classrooms, travel and broadcast workflows, it starts changing who can participate without a human interpreter.

Google points to Grab as a partner. The announcement says Grab users make over 10 million voice calls per month and that the company is testing translation between drivers and travelers. That is exactly the kind of environment where a few seconds of latency and a badly handled accent are not cosmetic issues.

For product teams, the UI will matter. Real-time translation has to show uncertainty, language switches, speaker identity and whether audio was generated by a model. Otherwise convenience becomes a source of mistakes.
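What such a UI needs from each translated utterance can be sketched as a small data structure. This is a hypothetical shape, not anything the Gemini Live API is known to return: the field names, the confidence score and the 0.6 threshold are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TranslatedSegment:
    speaker: str           # who was talking
    source_lang: str       # detected language of the original speech
    target_lang: str       # language the listener hears
    text: str              # translated text shown as a caption
    confidence: float      # 0.0 to 1.0, hypothetical score
    synthetic_audio: bool  # True if the voice was generated by a model


def render_caption(seg, previous=None, low_confidence=0.6):
    """Format one caption line, surfacing the signals a listener should see."""
    flags = []
    if seg.synthetic_audio:
        flags.append("AI voice")
    if (previous is not None and previous.speaker == seg.speaker
            and previous.source_lang != seg.source_lang):
        flags.append(f"language switched {previous.source_lang}->{seg.source_lang}")
    if seg.confidence < low_confidence:
        flags.append("low confidence")
    suffix = f" ({', '.join(flags)})" if flags else ""
    return f"{seg.speaker} [{seg.source_lang}->{seg.target_lang}]: {seg.text}{suffix}"


first = TranslatedSegment("Ana", "pt", "en", "See you at the gate.", 0.92, True)
second = TranslatedSegment("Ana", "es", "en", "Wait, which gate?", 0.41, True)
print(render_caption(first))
print(render_caption(second, previous=first))
```

The second caption carries all three warnings at once, which is the case where a silent, confident-sounding translation would be most misleading.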

The model card admits problems that demos can hide

Google’s model card lists limitations worth reading. Voices may drift after long pauses, change gender or get stuck on one voice during rapid multi-speaker sessions. Language detection can struggle with non-native accents, similar languages and rapid language switching.

These are production edges, not academic footnotes. In a business negotiation, medical consultation or classroom, a mistranslation can sound confident while shifting the meaning.

Noise, accents and accountability will decide whether it travels well

The next signals will come from outside the demo: a noisy taxi, a five-person meeting, a bad microphone, code-switching and legally sensitive conversations. If the system handles those cases with visible uncertainty and good logging, it becomes real infrastructure.
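As a rough idea of what "good logging" could mean here, the sketch below writes one JSON line per translated utterance. The schema, the field names, the 0.6 threshold and the sample values are all hypothetical; nothing in Google's announcement describes such a log.

```python
import hashlib
import json


def audit_record(call_id, speaker, source_lang, target_lang,
                 source_audio, translated_text, confidence, started_at):
    """Build one JSON line describing a translated utterance (hypothetical schema)."""
    return json.dumps({
        "call_id": call_id,
        "speaker": speaker,
        "source_lang": source_lang,
        "target_lang": target_lang,
        # Store a digest rather than the audio so the entry can later be
        # checked against a recording without keeping the voice in the log.
        "source_audio_sha256": hashlib.sha256(source_audio).hexdigest(),
        "translated_text": translated_text,
        "confidence": confidence,
        "low_confidence": confidence < 0.6,  # arbitrary threshold for this sketch
        "started_at": started_at,
    }, sort_keys=True)


# Placeholder values; the audio bytes stand in for a real recording.
line = audit_record("call-001", "driver", "id", "en",
                    b"\x00\x01fake-pcm-bytes", "I am at the north entrance.",
                    0.48, "2026-06-09T10:15:00Z")
print(line)
```

A record like this ties what the listener heard to what was actually said, which is the piece that matters when a disputed call has to be reconstructed.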

SynthID watermarking for generated audio and API data rules also matter. Voice translation is not only text. It touches identity, consent and the record of what someone supposedly said.

Lilith's verdict

Live Translate puts an invisible interpreter in the room, speaking a few seconds after you. Beautiful, until noise makes it grab the wrong voice, language or sentence, and someone makes a decision based on it.
