RealTime IT News

Open Content Alliance: The World's Books For All

SAN FRANCISCO -- Search is on a mission these days.

It's no longer enough to be able to index and point to everything that's loaded on a Web server somewhere. Search has moved into a new era in which content owners and search providers are hustling to digitize information moldering on the shelf.

"The World Wide Web gives us access to more information, but almost everything on the net has been written since 1996," said Brewster Kahle, founder of the Internet Archive. "I think folks before 1996 also had something to say."

Kahle spoke to an audience of librarians and journalists for the kick-off of the Open Content Alliance (OCA), a group with a plan to scan as many out-of-print books as possible, then work up the chain toward books under copyright. The OCA was announced on October 3.

The digitized books will be made openly available for search on the Internet Archive Web site or through other search services.

"Having an open library allows different projects to build new and different interfaces without having to ask permission," Kahle said.

The archive created an elegant reading interface that uses a page-turning metaphor. Entering a term in the search query box produces a yellow tab on each page on which the term is found. Clicking on a tab takes the user to that page, where the term is highlighted in yellow.

At the event, held in San Francisco's Presidio, Kahle's staff demonstrated the "scribe station," a system for scanning books that he said would cost around ten cents a page. The system uses a 16-megapixel digital camera that produces images at 500 DPI. Software color-corrects the images and provides thumbnails so the operator can make sure all of the pages have been scanned.

The OCA also has software to help determine whether a particular book might be under copyright and, if so, to connect with another database, created in partnership with libraries, to find the copyright holder. "Copyright issues are tricky, but they're doable," Kahle said.

Rather than shipping books to a central location for scanning by a single company, OCA members will for the most part handle their own scanning, then upload the digitized documents to the archive.

OCA membership includes prestigious libraries and research institutions that have pledged to digitize priceless collections and make them available for search.

For example, the Smithsonian Institution will contribute its current digital collection and work to digitize materials with a focus on history, culture and biodiversity. The Missouri Botanical Garden will scan rare botanical prints and books kept under lock and key in its archives. The Natural History Museum of London, the New York Botanical Garden and the Royal Botanical Garden of London will contribute materials, as will the libraries of Columbia, Emory and Johns Hopkins Universities.

While Yahoo was a founding member of the OCA, and MSN announced its membership at the event, Google was conspicuously absent. Google is being sued by the Association of American Publishers and the Authors Guild for scanning library books without the consent of their copyright holders. (Google says its activities fall under fair use principles.)

Founding OCA members are the Internet Archive, Yahoo! Inc., Adobe Systems Inc., the European Archive, HP Labs, the National Archives (UK), O'Reilly Media Inc., Prelinger Archives, the University of California, and the University of Toronto. Fourteen new members were announced at the event.

Several new OCA members said they wanted to make sure that these public troves of knowledge remained owned by the public.

Daniel Greenstein, executive director of the University of California's California Digital Library, said, "We want to make sure these works don't become commodified."

Doron Weber, director of the Sloan Foundation's programs for public understanding of science and technology and for the history of science and technology, said, "We cannot risk having world knowledge privatized. We believe an open, non-proprietary approach is better. To private companies, we say, 'Rein in your impulses.'"