Internet publishers are increasingly relying on automated systems to tag phrases of interest and, in some cases, to provide links to other sites. But it can be difficult for publishers to keep a tight rein on their sites in this age of user-generated content.
NEW YORK — It wasn’t what anyone expected to see while perusing a news article.
But there, in the final paragraph of an online story about the call girl involved in the Eliot Spitzer scandal, Yahoo’s automated system was inviting readers to browse through photos of underage girls.
Yahoo Shortcuts, which more frequently offer to help readers search for news and Web sites on topics like “California” or “President Bush,” had in this case highlighted the words “underage girls.” Readers who passed their mouse over the phrase in The Associated Press story were shown a pop-up window filled with images from Flickr, Yahoo’s photo-sharing Web site.
Some of the pictures showed nothing untoward, while several captions claimed that attached photos showed underage drinking. Clicking on the pop-up window yielded more-disturbing results: hundreds of images, including some of a girl or woman in pigtails, knee socks and lingerie. One photo showed a faceless female body, naked.
The misstep, which happened in early July, was noted on a technology blog. Editors at the AP contacted Yahoo, where a spokeswoman said the company quickly removed the link. Several of the more provocative photos were apparently taken off Flickr.
The phrase “underage girls,” now added to a list of thousands of previously blocked terms, will never again generate a Yahoo Shortcut, the company said. But the incident highlights how difficult it can be for publishers to keep a tight rein on their sites in this age of user-generated content.
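The blocklist step Yahoo describes can be pictured with a minimal Python sketch. It assumes, purely for illustration, that candidate phrases are compared against the blocked terms after lowercasing and whitespace cleanup; the names `BLOCKED_TERMS`, `normalize` and `allowed_shortcuts` are hypothetical, and the article does not say how Yahoo actually stores or matches its list.

```python
import re

# Hypothetical blocklist. The article says only that Yahoo keeps a list of
# thousands of blocked terms, not what is on it or how it is matched.
BLOCKED_TERMS = {"underage girls"}


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so simple variants compare equal."""
    return re.sub(r"\s+", " ", phrase.strip().lower())


def allowed_shortcuts(candidates, blocked=BLOCKED_TERMS):
    """Return the candidate phrases that are not on the blocklist, in order."""
    blocked_normalized = {normalize(term) for term in blocked}
    return [c for c in candidates if normalize(c) not in blocked_normalized]


if __name__ == "__main__":
    print(allowed_shortcuts(["California", "Underage  Girls", "President Bush"]))
```

An exact-match filter like this catches only phrases someone has already thought to add, which is the limitation the incident illustrates.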
Internet publishers are increasingly relying on automated systems to tag phrases of interest and, in some cases, to provide links to other sites. With legions of YouTube users, Flickr photographers and anonymous bloggers posting floods of their own, largely unsupervised material, it’s impossible for publishers using automation to exercise total control.
“No matter how sophisticated you make these automated systems, you’re not going to make them perfect, and all you can really strive for is to tune them as you go along,” said Lauren Weinstein, co-founder of People for Internet Responsibility. Still, he said, in this case “it’s pretty clear there was a lapse in terms of the quality control of Yahoo’s keyword list.”
It’s unclear how, exactly, “underage girls” was selected as a useful link. Yahoo Shortcuts “leverages a combination of algorithmic and editorial processes to identify current, relevant and popular terms,” said spokeswoman Meagan Busath. Among the factors the system considers: terms entered into Yahoo’s search engine.
That raises the unsavory prospect that “underage girls” could be among the most popular searches on Yahoo, said Chris Sherman, executive editor of the industry Web site Search Engine Land.
But he said it’s more likely that a combination of factors was at play. Perhaps a similar phrase is a popular search term, or perhaps the exploitation of young women has become a hot news topic.
The selection of the phrase could also have been driven by its relevance to the story at hand — after all, the AP article was about how Ashley Dupre had dropped a lawsuit that claimed she was underage when she appeared in a “Girls Gone Wild” video.
If the system was merely checking whether Flickr had a sufficient number of relevant results, the answer apparently would have been yes.
Although Busath notes that Flickr users and employees monitor the site’s content and report problematic images, a search of the site for the words “underage girls” turned up 428 photos.
Any technology has its hiccups as engineers refine it, and over the years automated content has occasionally caused offense. In one recent flub, a Yahoo photo collection about Osama bin Laden began with a picture of Sen. Barack Obama. There was nothing wrong with the programming (the senator had been at a hearing about the al-Qaida leader, and his photo was the most recent in the collection), but Yahoo rewrote its programming code to prevent the same thing from happening again, a spokesman said.
In a widely publicized incident in the early days of Google’s AdSense system, the service placed an ad for luggage next to a news story about a murder victim whose body was stuffed into a suitcase.
Google has since enhanced its technology to detect when sites contain “sensitive content,” said spokesman Daniel Rubin. Those pages often receive public service announcements in place of ads, he said.
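As a rough illustration of that fallback, the Python sketch below swaps an ad for a public service announcement when a page contains a flagged word. The word list and the names `SENSITIVE_WORDS` and `choose_placement` are assumptions made for the example; Google has not described its detection method here, and a real system would almost certainly be more sophisticated than a keyword check.

```python
import re

# Hypothetical word list, for illustration only.
SENSITIVE_WORDS = {"murder", "murdered", "killed", "victim"}


def choose_placement(page_text: str, ad: str,
                     psa: str = "public service announcement") -> str:
    """Return the PSA if the page contains a flagged word, else the ad."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    if words & SENSITIVE_WORDS:
        return psa
    return ad


if __name__ == "__main__":
    story = "Police said the murder victim was found in a suitcase."
    print(choose_placement(story, ad="luggage ad"))
```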
“We are really only in the infancy of this kind of automated analysis,” Weinstein said. “I’m sure it’s going to be expanding greatly, not just in volume but in sophistication.”
Already, Yahoo displays its Shortcuts on stories hosted by Yahoo News from U.S. sources, including Time magazine and E! Online. Since 2006, The New York Times’ site has used an automated system to tag key words within its stories, directing readers to archived stories about the topics. It also uses automated technology to link to carefully vetted blogs from within its site, said Chief Technology Officer Marc Frons.
Further expanding that practice to automatically link out to additional sites is something the Times might consider in the future, Frons said. He noted, however, that the links would have to be carefully selected.
“The quality of the content and the information is paramount,” he said. “You want to make sure you’re striking the right balance between giving your readers everything the Web has to offer with making sure they’re getting the right information and the relevant information.”
Perhaps the biggest short-term goal for automated tagging is to create a richer browsing experience for Web users, while offering publishers an opportunity for profit, as the technology is used to link to commercial sites. For example, the word “laptop” in an article could be linked to Best Buy’s Web page.
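The “laptop” example can be sketched in a few lines of Python. This is an assumption-laden illustration, not any publisher's actual system: the function name `link_keywords`, the choice to link only the first occurrence of each keyword, and the placeholder URL are all hypothetical.

```python
import re
from html import escape


def link_keywords(text: str, links: dict) -> str:
    """Return HTML with the first occurrence of each keyword linked.

    `links` maps a keyword to a URL. Matching is case-insensitive and
    limited to whole words; all other text is HTML-escaped.
    """
    if not links:
        return escape(text)
    lookup = {keyword.lower(): url for keyword, url in links.items()}
    # Try longer keywords first so overlapping phrases resolve predictably.
    alternatives = "|".join(
        re.escape(k) for k in sorted(lookup, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    seen = set()
    parts = []
    last = 0
    for match in pattern.finditer(text):
        key = match.group(0).lower()
        parts.append(escape(text[last:match.start()]))
        if key in seen:
            # Later occurrences stay as plain text.
            parts.append(escape(match.group(0)))
        else:
            seen.add(key)
            url = escape(lookup[key], quote=True)
            parts.append(f'<a href="{url}">{escape(match.group(0))}</a>')
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    # The URL is a placeholder, not a real retailer page.
    print(link_keywords("A new laptop beats an old laptop.",
                        {"laptop": "https://www.example.com/laptops"}))
```

Linking only the first occurrence is a design choice made here to keep the page readable; nothing in the article says how often a term would be linked in practice.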