SAN FRANCISCO — Four years ago, more than 700 computer scientists competed in a contest to build artificial intelligence that could pass an eighth-grade science test. There was $80,000 in prize money on the line.
They all flunked. Even the most sophisticated system couldn’t do better than 60% on the test. AI couldn’t match the language and logic skills that students are expected to have when they enter high school.
But Wednesday, the Allen Institute for Artificial Intelligence, a prominent lab in Seattle, unveiled a new system that passed the test with room to spare. It correctly answered more than 90% of the questions on an eighth-grade science test and more than 80% on a 12th-grade exam.
The system, called Aristo, is an indication that in just the past several months researchers have made significant progress in developing AI that can understand language and mimic the logic and decision-making of humans.
The world’s top research labs are rapidly improving a machine’s ability to understand and respond to natural language. Machines are getting better at analyzing documents, finding information, answering questions and even generating language of their own.
Aristo was built solely for multiple-choice tests. It took standard exams written for students in New York, though the Allen Institute removed all questions that included pictures and diagrams. Answering questions like that would have required additional skills that combine language understanding and logic with so-called computer vision.
Some test questions, like this one from the eighth-grade exam, required little more than information retrieval:
A group of tissues that work together to perform a specific function is called:
(1) an organ
(2) an organism
(3) a system
(4) a cell
But others, like this question from the same exam, required logic:
Which change would most likely cause a decrease in the number of squirrels living in an area?
(1) a decrease in the number of predators
(2) a decrease in competition between the squirrels
(3) an increase in available food
(4) an increase in the number of forest fires
Researchers at the Allen Institute started work on Aristo — they wanted to build a “digital Aristotle” — in 2013, just after the lab was founded by Seattle billionaire and Microsoft co-founder Paul Allen. They saw standardized science tests as a more meaningful alternative to typical AI benchmarks, which relied on games like chess and backgammon or tasks created solely for machines.
A science test isn’t something that can be mastered just by learning rules. It requires making connections using logic. An increase in forest fires, for example, could kill squirrels or decrease the food supply needed for them to thrive and reproduce.
Enthusiasm for the progress made by Aristo is still tempered among scientists who believe machines are a long way from completely mastering natural language — and even further from duplicating true intelligence.
“We can’t compare this technology to real human students and their ability to reason,” said Jingjing Liu, a Microsoft researcher who has been working on many of the same technologies as the Allen Institute.
But Aristo’s advances could spread to a range of products and services, from internet search engines to record-keeping systems at hospitals.
“This has significant business consequences,” said Oren Etzioni, the former University of Washington professor who oversees the Allen Institute. “What I can say — with complete confidence — is you are going to see a whole new generation of products, some from startups, some from the big companies.”
The new research could lead to systems that can carry on a decent conversation. But it could also encourage the spread of false information.
“We are at the very early stage of this,” said Jeremy Howard, who oversees Fast.ai, another influential lab, in San Francisco. “We are so far away from the potential that I cannot say where it will end up.”
In 2016, when a London lab built a system that could beat the world’s best players at the ancient game of Go, it was widely hailed as a turning point for artificial intelligence.
Etzioni’s excitement, however, was muted. Artificial intelligence was not nearly as advanced as it might seem, he said, pointing to the Allen Institute’s earlier competition, which had stumped the AI systems with an eighth-grade science test.
The Allen Institute improved on that earlier effort much more quickly than many experts, including Etzioni, expected.
Its work was largely driven by neural networks, complex mathematical systems that can learn tasks by analyzing vast amounts of data. By pinpointing patterns in thousands of dog photos, for example, a neural network can learn to recognize a dog.
In recent months, the world’s leading AI labs have built elaborate neural networks that can learn the vagaries of language by analyzing articles and books written by humans.
At Google, researchers built a system called Bert that combed through thousands of Wikipedia articles and a vast digital library of romance novels, science fiction and other self-published books.
Through analyzing all that text, Bert learned how to guess the missing word in a sentence. By learning that one skill, Bert soaked up enormous amounts of information about the fundamental ways language is constructed. And researchers could apply that knowledge to other tasks.
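To make the missing-word idea concrete, here is a minimal sketch in Python. It is not how Bert works internally: Bert is a large neural network, while this toy only counts which word tends to appear between two neighbors. The tiny corpus, the function names and the "[MASK]" placeholder are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus, made up for illustration; Bert was trained on far more text.
CORPUS = [
    "the dog chased the cat",
    "the dog chased the ball",
    "the cat chased the mouse",
    "a dog chased the cat",
]

def train(sentences):
    """Count which word appears between each (left, right) pair of neighbors."""
    table = defaultdict(Counter)
    for sentence in sentences:
        # Pad with start and end markers so the first and last words have neighbors.
        words = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(words) - 1):
            table[(words[i - 1], words[i + 1])][words[i]] += 1
    return table

def guess_missing(table, sentence, mask="[MASK]"):
    """Return the most frequent word seen in the masked slot's context, or None."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    i = words.index(mask)
    candidates = table.get((words[i - 1], words[i + 1]))
    if not candidates:
        return None  # this pair of neighbors never appeared in the corpus
    return candidates.most_common(1)[0][0]

MODEL = train(CORPUS)
print(guess_missing(MODEL, "the dog [MASK] the cat"))
```

A counting table like this can only repeat contexts it has already seen. A neural network generalizes to sentences it has never encountered, which is why the skill transfers to other tasks.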
The Allen Institute built its Aristo system on top of Bert technology. Researchers fed Bert a wide range of questions and answers. In time, it learned to answer similar questions on its own.
Not long ago, researchers at the lab defined the behavior of their test-taking system one line of software code at a time. Sometimes they still do that painstaking coding. But now that the system can learn from digital data on its own, it can improve at a much faster rate.
Systems like Bert, called “language models,” now drive a wide range of research projects, including conversational systems and tools designed to identify false news. With more data and more computing power, researchers believe, the technology will continue to improve.
But Etzioni stressed that the future of these systems was hard to predict and that language was only one piece of the puzzle.
Liu and her fellow Microsoft researchers have tried to build a system that can pass the Graduate Record Examinations, or GRE, a test widely required for admission to graduate school.
The language section was doable, she said, but building the reasoning skills required for the math section was another matter. “It was far too challenging.”