Forget phrase books or even Google Translate. New translation devices are getting closer to replicating the fantasy of the Babel fish, which in the “Hitchhiker’s Guide to the Galaxy” sits in one’s ear and instantly translates any foreign language into the user’s own.

The WT2 Plus Ear to Ear AI Translator Earbuds from Timekettle are already available, while the over-the-ear “Ambassador” from Waverly Labs is scheduled for release this year. Both products are wireless and come with two earpieces that must be synced to a single smartphone connected to Wi-Fi or cellular data.

These devices “bring us a bit closer to being able to travel to places in the world where people speak different languages and communicate smoothly with those who are living there,” said Graham Neubig, an assistant professor at the Language Technologies Institute of Carnegie Mellon University and an expert in machine learning and natural language processing.


Whether the technology is in the ear, hand-held or in an app, speech-to-speech translation has mostly occurred in the same three-step process since 2016, when neural networks were assigned to the task. First, automatic speech-recognition software transcribes the spoken words into text. Next, the text is converted using neural machine translation into the text of the other language, and finally text-to-speech voice modulation articulates the other language.
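
For readers who think in code, the three-step chain described above can be sketched in a few lines of Python. This is only an illustration, not how any of the devices in this article actually work: the function names, the one-entry phrasebook and the trick of treating text bytes as “audio” are all assumptions made so the example runs on its own. A real product would put a trained model behind each step.

```python
# Toy sketch of the three-step speech-to-speech chain.
# Every function here is a hypothetical stand-in, not a real product API.

# Assumption: a one-entry phrasebook stands in for neural machine translation.
PHRASEBOOK = {
    "no hay mal que por bien no venga": "every cloud has a silver lining",
}


def recognize_speech(audio: bytes) -> str:
    """Step 1: speech recognition (stub: the 'audio' is just UTF-8 text)."""
    return audio.decode("utf-8")


def translate_text(text: str) -> str:
    """Step 2: translation (stub: phrasebook lookup, unknown text passes through)."""
    return PHRASEBOOK.get(text.lower().strip(), text)


def synthesize_speech(text: str) -> bytes:
    """Step 3: text-to-speech (stub: encode the text back into bytes)."""
    return text.encode("utf-8")


def speech_to_speech(audio: bytes) -> bytes:
    """Run the three steps in order; each must finish before the next starts."""
    transcript = recognize_speech(audio)
    translated = translate_text(transcript)
    return synthesize_speech(translated)


if __name__ == "__main__":
    spoken = "No hay mal que por bien no venga".encode("utf-8")
    print(speech_to_speech(spoken).decode("utf-8"))
```

Because each step waits for the one before it, any time spent in one stage is added to the total, which is one way to picture where a lag between speaking and hearing the translation can come from.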

That conversion process causes a slight delay, while the imaginary yellow fish in Douglas Adams’ comedy science fiction series translated instantaneously. Still, the new devices do allow a person to continue speaking even as the translation is occurring, and that allows for a more natural flow to the conversation.

“This is important, because otherwise the conversation will become twice as long, where one person speaks, the system translates, then the other person speaks, the system translates. This is ponderous and can test people’s patience,” Neubig said.

The WT2 Plus consists of two earbuds that look similar to large AirPods, and in any of its three modes, users can talk in any two of 36 languages and 84 accents. (The modes, Simul Mode, Touch Mode and Speaker Mode, let users adjust for ambient noise and choose whether to lend an earbud to the other person or to use the phone’s microphone and speaker instead.)

The Ambassador, which supports 20 languages, allows people to chat when they are each wearing one of the clip-on earpieces, which look like small headphones. Or, a single user in “Listen Mode” can use microphones embedded in the earpiece to hear a translated version of what others are saying while standing a few feet away. In addition to the Converse and Listen modes, the Ambassador has a “Lecture Mode” to stream your words through your phone or pair the earpiece with an audio system.


To see how advanced the earpieces are, we compared them to two translation tools on the market, Google Translate’s conversation mode and the hand-held CM Translator ($117 retail) from Cheetah Mobile. A preproduction model of the Ambassador ($150 retail) was tested at company headquarters in Brooklyn, while the WT2 Plus earbuds ($230 retail) were used by two multilingual students at the University of Colorado Boulder.

The upshot: Google Translate and the CM Translator would be fine for ordering a beer or asking the location of a museum, but both would fall short for anyone trying to engage with the person sitting in the next seat on the train.

“I thought it was really cool that you could talk in one language and a few seconds later it would come out in a different language,” Maya Singh, a freshman who speaks English, Russian and Spanish, said of the WT2 Plus earbuds.

The WT2 Plus and the Ambassador each offer unique advantages. In its conversation mode, the Ambassador allows one user to interrupt another, as is done in real life, and translates simultaneously to both. The WT2 Plus requires the speakers to take turns, but simultaneously transcribes the conversation, and later this year it should be able to translate English, Chinese, Japanese, Korean and Russian while offline, said Kazaf Ye, head of marketing for Timekettle, in an interview from company headquarters in Shenzhen, China.


“Efficiency is a key element in deciding whether one person wants to continue talking to the other person,” Ye said. “If it is too much trouble or if I have to wait too long then I will not want to talk with him, I’d rather just speak to someone in my language.”

Andrew Ochoa, chief executive officer of Waverly Labs, said the ultimate goal in translation devices would be an earpiece that works offline, in real time, and can translate everything you hear.

If that device is ever developed, “I can drop you off in the middle of Tokyo … and it will translate everything in your proximity,” Ochoa said.

While we’re not there yet, translation has taken a quantum leap forward in the past few years because neural machine translation can process phrases, not just words.

“It went from something that was barely intelligible and barely useful to something that was syntactically and grammatically very useful, at least for some of the major languages,” said Florian Faes, managing director of Slator, a Zurich-based provider of news and analysis on the global language industry.

So although today’s translators can’t seem to differentiate “phat” from “fat” in a sentence, all the ones we compared were sophisticated enough to translate the Spanish phrase “No hay mal que por bien no venga,” which literally means, “There is no bad from which good doesn’t come,” into the more relatable English expression, “Every cloud has a silver lining.”


As for the future, translation will likely be faster and more accurate, and may even mimic your voice, tone and emotion. Google is already experimenting with an entirely new approach to translation, called “Translatotron.”

“Translatotron is the first end-to-end model that can directly translate speech from one language into speech in another language” without first converting to text, said Justin Burr, a spokesman for Google AI & Machine Learning.

He cautioned that so far it’s just research, and Google has no plans to develop it into a stand-alone translation device. Still, that doesn’t mean that someone else won’t. And if it happens, it might blow the Babel fish right out of the water.