SAN JOSE, Calif. — A consortium backed by Yahoo! has launched an ambitious effort to digitize classic books and technical papers and make them freely available on the Web.

One of the Open Content Alliance’s first projects will be to digitize the approximately 18,000-title collection of classic fiction and nonfiction American books owned by the University of California, the group said. That could be completed by the end of next year.

The consortium includes Adobe Systems, Hewlett-Packard Labs, the National Archives of the U.K., O’Reilly Media, the Prelinger Archives, the University of California and the University of Toronto.

The announcement of the consortium comes amid furious debate about a similar project called Google Library, in which the Mountain View, Calif., tech giant is scanning and digitizing millions of books at select libraries.

Google’s effort differs, though, because it intends to digitize material regardless of its copyright status. The members of the Open Content Alliance say they will scan copyrighted material only if they have the permission of the rights holders.

Complete texts

Also, while Google allows people to view only excerpts of copyrighted material, the aim of the alliance is to offer complete texts for viewing and downloading.

“Our goal is to help with the expansion of human knowledge,” said Dave Mandelbrot, Yahoo!’s vice president of search content. “What we’d like to see in two or three years is a major collaborative effort where libraries are contributing material and publishers are providing permission” to digitize their content.

Each of the consortium’s partners is providing different areas of expertise. The Internet Archive, a San Francisco nonprofit that collects copies of Web pages and other material, is helping with the scanning of materials.

Yahoo! will index the content, make it searchable and pay for the scanning, about 10 cents a page. And Adobe will help convert some of it into its PDF format so that it can be downloaded from the Web.

The group is launching a Web site at www.opencontentalliance.org, where people will be able to access the content. But the digitized materials also will be available through a special page on Yahoo!’s Web site and through the Internet Archive. In fact, Mandelbrot said, the group’s goal is to publish the content in a way that lets any search engine index it and make it available.

Digitizing the world’s cultural archives, from television shows to classic books, has long been a burning ambition of Internet Archive founder Brewster Kahle.

In fact, Kahle’s Internet Archive already has launched an effort to digitize books called the Million Book Project, a collaboration with Indian and Chinese agencies and Carnegie Mellon University.

“The real crime is that we have all these people using the Internet for research, but we don’t have some of the best content on it,” Kahle said.

Amazon.com, the Internet retailer, also operates a massive book-digitizing project. But its goal is to make the material available to customers to help spur sales.

“At some point, we want to meet in the middle so end-users win,” Kahle said. “So they can have access to great works either for free or pay.”

In addition to working with libraries to scan older content whose copyright has expired, the consortium will collaborate with publishers and authors who want to make their works available on the Web.

In some instances, the Open Content Alliance will give copyright holders the option of releasing their material under a Creative Commons license, an alternative licensing scheme that encourages reuse and distribution of content.

Suit against Google

The announcement of the consortium comes just days after the 8,000-member Authors Guild sued Google in federal court seeking to stop its Google Library project. The group claims the project violates copyright law because authors have not first given permission for Google to digitize their works.

Google has defended its project by noting that it will offer only short excerpts, not full texts, of copyrighted material on its Web site. That type of use falls under “fair use” laws, the company contends. Also, Google is allowing authors to choose not to have their work digitized. The Electronic Frontier Foundation and others have defended Google in the debate.

A less-controversial companion project, Google Publisher, allows publishers to tell Google which books they want included in the search engine’s index.

Google did not respond to a request for comment about the Open Content Alliance project.

The Association of American Publishers has been one of the critics of the Google Library program. Although an association spokeswoman said it had little information about the alliance project, she said it sounded “encouraging.”

“At the very least they are approaching it from the standpoint that it’s the author who says what can be done with his work,” said Judith Platt, the association’s director of communications. “There are ways the use of copyrighted material can be approached without violating the rights of the copyright holder.”