The challenge from Paul Allen’s Institute for Artificial Intelligence involves a 2,500-question science quiz aimed at an eighth-grade level. About 100 teams have already entered since the challenge started last week.
Paul Allen’s Institute for Artificial Intelligence wants to find out how well IBM’s “Jeopardy”-playing computer, “Watson,” holds up in a classroom.
The Allen Institute for AI, or AI2, has issued a challenge to IBM and researchers everywhere to see if their smart computers can pass a multiple-choice, eighth-grade science quiz.
AI2 has been building a machine-learning program in which a computer gains knowledge over time, then uses that information to gather context and answer science questions.
The challenge comes with a big payoff. The team, company or individual whose computer achieves the highest test score will receive $50,000, and the second- and third-place finishers will get $20,000 and $10,000, respectively. Entrants have until February to complete the 2,500-question test.
“Kind of like a kid, we give them practice questions,” AI2 CEO Oren Etzioni said. “And the test questions are hidden.”
Since the challenge started last week, nearly 100 teams have entered, and about 1,000 teams have downloaded the practice questions provided to train their programs for the quiz, Etzioni said. But AI2 has not heard from the one entrant it is waiting for: IBM. The computing giant did not have an immediate comment Monday.
The competition, hosted on the data-science website Kaggle, uses questions supplied by companies that specialize in writing tests. It is designed to prevent cheating: the 2,500 real questions are embedded within tens of thousands of similar questions to discourage people from simply taking the test themselves.
AI2’s science-answering program, the Aristo Project, is just a steppingstone for what the institute hopes to do — make machines that help scientists gather and reason through data quickly.
An early iteration of this idea, called Semantic Scholar, is launching next month. The program helps computer scientists find references to a wide variety of terms and topics within the enormous body of published research in their field.
“There’s so many challenging problems, from climate change, AIDS, cancer, you name it,” Etzioni said. “And there’s so many millions of papers being published every year. We are building a program that will make scientists’ jobs easier.”
Semantic Scholar is starting with computer science, which AI2 engineers know well, but the goal is to eventually move to other disciplines, including medicine.
The institute’s walls, most of which are covered in whiteboard ink for writing formulas, display statistics on how well the program is performing and goals for the end of the year. The computer is scoring above 70 percent on some versions of a fourth-grade science test. It’s a big accomplishment, but the program still has a long way to go, Etzioni said.
A team of nine researchers within the 43-employee institute is working to get computers to understand the diagrams that accompany science quizzes. That means teaching a computer to recognize everything from the arrows in a food chain to the sun in a picture of precipitation. Human minds grasp such images almost instantly, but computers must be taught to recognize each element and its context.
Etzioni downplayed fears, fueled by movies such as the “Terminator” series, that machines will take over the world.
“One of the things about giving these tests to the machine is we can put it on a level playing field and we can assess how much it understands,” he said. “And if it’s doing 40 percent or 60 percent on the eighth-grade science test, I don’t think it’s ready to take over.”