Rohit Prasad, the Amazon scientist in charge of developing the artificial-intelligence software’s abilities to grasp human language, says Alexa is still in the early stage of its journey.
LAS VEGAS — Alexa, Amazon.com’s artificial-intelligence software, is still in the early stages of a journey to understand the human voice and respond accordingly, says Rohit Prasad, the head scientist in charge of endowing it with the gift of language.
One day, the researcher predicts, it will grasp the meaning of every human utterance to make life easier for its users.
Prasad spoke about the development of Alexa on the sidelines of Amazon Web Services’ re:Invent conference, a huge gathering of developers and cloud-computing enthusiasts, where the tech giant is unveiling some of its latest technology.
Prasad’s ward is one of the stars of the show: AWS is incorporating into its platform some of the voice recognition and text-to-speech technology that has made Alexa a hit with customers, who have mostly interacted with it through the Echo, a speaker that plays music, reads the news and controls light switches around the house.
Now developers should be able to rely on those building blocks to make voice-activated, cloud-based applications of their own — a path experts say leads to the future of computing.
For the Alexa development team, the arduous path began with the choice of a name that would wake Alexa from its slumber. “It’s primarily a tribute to the library of Alexandria,” said Prasad, who works out of Amazon’s operations in Boston. But the moniker was also short and easy to pronounce. “It’s super important for artificial intelligence to be mainstream, very easy to relate to,” he said.
However, that posed problems. The word is common, and its sounds can be found in many words. That could have been solved by inventing an artificial-sounding long word, by adding a prefix like “OK,” or by having the user say the word twice. “To solve that problem, we really had to invent a lot of different techniques,” Prasad said.
Prasad is the first to agree Alexa isn’t perfect. But his team is working on it. While he would not disclose what percentage of voice commands from users end up stumping Alexa, he said that number has been cut in half since its 2014 launch. “It’s huge,” especially considering there are many more customers out there using more functions, Prasad said.
The Alexa service also has been the subject of a flurry of news during the conference: Among other things, it’s rolling out a library of capabilities related to natural language it says will help developers more easily create their own voice interactions.
Ariel Kelman, AWS’ vice president for global marketing, said in an interview that the tools could be used to build apps in contexts where conversational interaction with a computer could improve the experience, such as with vending machines and cars, or in making a medical appointment. It “makes it really easy to build interfaces that ask follow-up questions,” Kelman said.
Another announcement: Alexa is working with Intel on a reference design for a smart speaker that will set standards for other manufacturers to make their own takes on the popular Echo.
The move speaks to the difficulty of the hoops Amazon jumped through to design a speaker that could discern voice commands in noisy environments and at a certain distance, send those commands up to the cloud, and speak back quickly.
It also shows how Amazon sees the technology surrounding Alexa as a sort of open platform. Some view it as akin to the tack taken by the IBM PC in the early days of personal computing. At the time, IBM allowed other manufacturers to make compatible machines; that led Microsoft’s operating system, MS-DOS, and later Windows, to claim a bigger share of the market than Apple, which kept its products in an enclosed orchard.
Prasad won’t go as far as saying it’s exactly MS-DOS-like, although he calls it a “good analogy.”
“Our efforts are all around making Alexa easily accessible for developers,” he said, so that customers can reap the benefits in a world where there are many different sources of devices and software. “This is not going to be a homogeneous world.”
When asked how Amazon will make money off Alexa, which is freely available to developers, Prasad says that’s beyond his scope as a scientist. In any case, it’s still the very early days of the technology and it’s too soon to tell how it might flourish.
“Right now we really want to simplify the daily lives of our customers,” he said. “Everything you say to Alexa, I want it to understand.”
Another steep challenge: taking Alexa, so far confined to the U.S., the U.K. and, on an experimental basis, Germany, where Prasad has a team, to the rest of the world’s languages.
The hurdle is not only collecting thousands of samples of voices to make Alexa cope with every aspect of a language. It’s also about having enough local data to make the artificial-intelligence assistant useful.
“If you have Spanish in Spain, you need to know a lot about Spain,” Prasad said. “It’s not the language, it’s the other context that makes it super-hard. The ultimate thing you need is data: You need to have a lot of content providers.”
In the U.S., Amazon had a very significant footprint and gobs of data when the Echo speaker launched. The trick is to build that in new geographies before launching. Prasad declined to say where Alexa would be released next.
“The key is to have a good baseline. We don’t want to frustrate our customers,” he said.