Microsoft says its researchers are one step closer to building software that understands speech as well as humans do.
The researchers achieved a word error rate of 6.3 percent, bringing the technology closer to what they describe as the next generation of interaction with machines.
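Word error rate, the metric cited throughout the article, is conventionally computed as the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the recognizer's output, divided by the length of the reference. A minimal sketch (not the researchers' own evaluation code) looks like this:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edits to turn first i ref words into first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

By this measure, a 6.3 percent rate means roughly 6 errors for every 100 words of transcribed speech.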
IBM recently touted an error rate of 6.6 percent, Microsoft said. Just a few years ago, the technology industry couldn’t do better than a 10 percent error rate.
Software that can fully understand human speech, some technologists say, will enable a next generation of interaction with machines, one that doesn’t require a keyboard, mouse, or touch input.
Early examples of that are already visible in the limited tasks digital assistants can perform, such as searching the web with Google Now, asking Microsoft’s Cortana to make a calendar appointment, or prompting Amazon.com’s Alexa to play music.
Microsoft says its progress was aided by the use of deep neural networks, software inspired by the brain’s wiring that is better able to detect patterns in speech. Another component, the company says, is the use of powerful graphics processing units, originally designed to render video-game graphics and other high-performance visuals, to speed up the algorithms that underlie speech recognition.
“This new milestone benefited from a wide range of new technologies developed by the (artificial intelligence) community from many different organizations over the past 20 years,” Xuedong Huang, Microsoft’s chief speech scientist, said in a blog post.
The research, by Huang and seven other authors, was published on Tuesday.