To detect and police content across YouTube’s sprawling library, and ensure ads don’t run against questionable content, Google must solve an AI problem no one has cracked yet: automatically understanding everything that’s going on in videos.


An advertiser boycott of YouTube is testing a critical and much-hyped part of Google’s future: its prowess in artificial intelligence (AI). Some experts in the field say the technology isn’t up to scratch yet, but that if any company can solve the problem at hand, it’s the online search giant. 

Some of the world’s biggest marketers halted spending on Google’s YouTube site last month after ads from large brands were found running alongside hateful and extremist videos.

Google parent Alphabet risks losing $750 million in revenue this year from the debacle, analysts at Nomura Instinet estimated last week. That’s less than 1 percent of projected sales this year, so the company can weather the financial storm. But the boycott is likely an incentive for Alphabet to redirect AI investments and accelerate research efforts already underway.

To detect and police content across YouTube’s sprawling library and ensure ads don’t run against questionable content, Google must solve an AI problem no one has cracked yet: automatically understanding everything that’s going on in videos, including gesticulations and other human nuances.

A potential solution lies in machine learning, a powerful AI technique for automatically recognizing patterns across reams of data — a Google specialty. Chief Executive Officer Sundar Pichai has pledged to infuse the technology across all its products, and the company touts its abilities in the field to software developers, cloud-computing clients, advertisers and shareholders.  

Computer scientists doubt technology alone can expunge offensive videos. “We’re not there yet where we can, say, find all extremist content,” said Hany Farid, a Dartmouth professor and senior adviser to the Counter Extremism Project, which has repeatedly called on YouTube to tackle this problem.

He recommends companies like Google and Facebook deploy more human editors to filter content. “Machine learning, AI, is nowhere near that yet,” he said. “Don’t believe the hype.” 

The AI hype machine seems to be running at full speed in the tech industry right now as startups and technology giants like Google, Amazon and Microsoft compete to recruit engineers and scientists skilled in the field. 

Google’s AI advances sometimes match the hype, but they are not perfect. The company’s cloud division recently released a tool (unrelated to YouTube) that breaks videos into their constituent parts, rendering them “searchable and discoverable.” A group of academics published research last week that showed how to deceive this system by injecting images into videos.

Google has used machine learning and other AI tools to master speech, text and image recognition. In 2012, researchers famously got a network of 16,000 computers to teach itself to recognize cats by scanning millions of still images culled from YouTube videos. Understanding entire videos is a lot more difficult. Cats meow, stretch and jump through more than a thousand video frames each minute.

“A video is three-dimensional, with two dimensions in space and an additional dimension in time,” said Jiebo Luo, an AI expert at the University of Rochester, who has created text-based hate-speech filters for social media like Twitter. “It’s a challenge.”

Particularly so for something as big as YouTube. It said in 2015 that people uploaded 300 hours of content per minute to the service, making it impossible to screen all videos as they appear. 

Two years have passed, so could software now evolve to a point where it dissects every video uploaded online, separating the vile from the rest? “It’s possible,” said Luo, “especially given the massive computational and people resources that Google has. No other company is in a better position to do that.”

Still, it’s not cheap. Video demands advanced algorithms and computing horsepower. Today the industry relies on specialized semiconductors, called graphics processing units (GPUs), to drive the software. Processing an hour of video typically occupies half the resources of a GPU, said Reza Zadeh, founder of Matroid, an AI startup working on video. A graphics card featuring a high-end chip from market leader Nvidia sells for about $500. “The best models are expensive right now,” Zadeh said.

Google spends billions of dollars a year on data centers, computer servers and the chips that run them. It doesn’t say how much of this goes toward YouTube, but the video site is known to be expensive to run.

In AI, Google has even developed its own hardware, called Tensor Processing Units (TPUs). It recently offered a $30,000 prize for researchers to use its cloud and TPUs, or similar AI tools, to accurately label YouTube videos.

Google researchers have applied machine-learning software to classify images and audio inside videos for years (is that video tagged as a Prince song really Prince?), while improving recommendations and ad performance. Another part of Alphabet — a group called Jigsaw — is using AI tools in other ways to curb hate speech online.

In a memo to aggrieved YouTube advertisers last month, the company said its machine-learning algorithms will improve the precision and classification of videos. However, it also warned that with the volume of content involved, this can never be 100 percent guaranteed. 

Classifying what we hear and watch online, together at once, “is a problem that is essentially open,” said Zadeh, of Matroid. “Probably, Google is doing it right now.” 

He’s probably right. At a conference Google hosted in February, Jeff Dean, head of Google’s “Brain” AI research unit, spoke about his team’s advances. “The next big domain is video,” he said.