The researchers achieved a word error rate of 6.3 percent, bringing them closer to what they say is the next generation of interaction with machines.

Microsoft says its researchers are one step closer to building software that understands speech as well as humans do.

A group of researchers at the Redmond company says its conversational speech recognition system achieved a word error rate of just 6.3 percent on an industry benchmark test.

IBM recently touted an error rate of 6.6 percent, Microsoft said. Just a few years ago, the technology industry couldn’t do better than a 10 percent error rate.
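Word error rate is conventionally computed as the word-level edit distance between a system's transcript and a human reference transcript (counting substitutions, deletions, and insertions) divided by the number of words in the reference. A minimal sketch of that calculation, with illustrative example sentences:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table: d[i][j] is the edit distance between
    # the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words -> WER of about 16.7 percent
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

A 6.3 percent rate means roughly one such error for every 16 words of reference speech.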

Software that can fully understand human speech, some technologists say, will enable a next generation of interaction with machines, one that doesn’t require a keyboard, mouse, or touch input.

Early examples of that are already visible in the limited tasks people can ask digital assistants to perform, like searching the web with Google Now, asking Microsoft's Cortana to make a calendar appointment, or prompting Amazon.com's Alexa to play music.

Microsoft says its progress was aided by the use of deep neural networks, or software inspired by the brain's wiring that is better able to detect patterns in speech. Another component, the company says, is the use of powerful graphics processing units, originally designed for high-performance computer graphics in video games and other applications, to speed up the algorithms that underlie speech recognition.
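At their core, the neural networks described here pass acoustic features for each slice of audio through layers of learned weights to score candidate speech sounds. A minimal sketch of one such forward pass, using NumPy; the layer sizes, feature dimensions, and random weights are illustrative assumptions, not Microsoft's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, b1, W2, b2):
    """One forward pass: features -> hidden layer (ReLU) -> class probabilities."""
    h = np.maximum(0.0, x @ W1 + b1)                   # hidden layer with ReLU
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())                  # numerically stable softmax
    return e / e.sum()

# Hypothetical sizes: 40 acoustic features -> 128 hidden units -> 10 sound classes
W1 = rng.standard_normal((40, 128)) * 0.1
b1 = np.zeros(128)
W2 = rng.standard_normal((128, 10)) * 0.1
b2 = np.zeros(10)

frame = rng.standard_normal(40)      # one frame of dummy acoustic features
probs = forward(frame, W1, b1, W2, b2)  # probability for each candidate sound
```

Real systems stack many such layers and train the weights on thousands of hours of transcribed speech, which is where the GPUs mentioned above earn their keep: the matrix multiplications in each layer map naturally onto graphics hardware.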

“This new milestone benefited from a wide range of new technologies developed by the (artificial intelligence) community from many different organizations over the past 20 years,” Xuedong Huang, Microsoft’s chief speech scientist, said in a blog post.

The research, by Huang and seven other authors, was published on Tuesday.