Microsoft’s Natural Language Group is in an ongoing race to keep up with the evolution of the dozens of languages for which they produce spell-checkers and other writing tools.
Here’s how the group selects words to add:
The first step is finding possible candidates for inclusion in the spell-checker lexicon. When Mike Calcagno started at Microsoft in 1998, that was done ad hoc, with candidate words or changes sent to someone high enough on the corporate ladder to get attention.
“The number of issues that we would see at that time was so small that we could keep track of it on a single Excel spreadsheet,” he said.
Now, the company uses software to monitor actual language usage across its vast properties.
“When you add a word to your custom dictionary, either in Word itself or in Hotmail, that word comes to us,” Calcagno said. When a word is added hundreds of times, it becomes part of the candidate list. Words still come in on an ad hoc basis, too.
The lists are filtered with software to eliminate words the team has already considered.
Then the words are sorted by frequency and sent to outside editors who evaluate each one against a set of guidelines Microsoft has created, such as whether a new word has appeared in a major dictionary.
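The counting, filtering and sorting steps described above can be sketched in a few lines of Python. This is only an illustration, not Microsoft's software: the function name, the cutoff of 200 additions (standing in for "hundreds of times") and the sample words are all assumptions.

```python
from collections import Counter


def build_candidate_list(additions, already_considered, min_count=200):
    """Return (word, count) pairs for editors to review, most frequent first.

    additions: iterable of words users added to their custom dictionaries
    already_considered: set of words the team has reviewed before
    min_count: hypothetical cutoff standing in for "hundreds of times"
    """
    # Tally how many times each word was added.
    counts = Counter(additions)

    # Keep words added often enough that have not been reviewed already.
    candidates = [
        (word, n)
        for word, n in counts.items()
        if n >= min_count and word not in already_considered
    ]

    # Sort by frequency, highest first; break ties alphabetically.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample = ["blog"] * 500 + ["selfie"] * 300 + ["podcast"] * 250 + ["xyzzy"] * 3
    for word, n in build_candidate_list(sample, already_considered={"podcast"}):
        print(word, n)
```

In this sketch, a word added only a handful of times never reaches the editors, and a word the team has already ruled on is dropped no matter how often it appears.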
In rare cases, editors can’t decide whether a word should be added, and it’s sent back to the Natural Language Group for debate. The team’s roughly 50 software engineers, computational linguists, machine learning experts and other specialists hail from around the world.
With occasional exceptions, the words to be added — often tens of thousands of new ones — are shipped out to users in the next release of Office, used by hundreds of millions of people around the world.
“Everybody’s speller gets updated and few people notice,” Calcagno said.
— Benjamin J. Romano