Calais: Putting Wings on the Semantic Web?

You can find anything using Google. That may be a common perception, but it’s not accurate. Words and phrases often have multiple meanings so, despite the best efforts of Google (NASDAQ: GOOG) and other sophisticated search engines, you don’t always get the answer you’re looking for, certainly not on the first try.

“If you ask for ‘high blood pressure’ on Google, you won’t get ‘hypertension,’” IDC analyst Sue Feldman said. “There’s often a mismatch between those writing the documents and the language experts use.”

Enter OpenCalais, a free service funded by media giant Thomson Reuters (NYSE: TRI). “The core of OpenCalais is a Web service we’ve released for commercial or noncommercial use,” explained Tom Tague, Calais evangelist and project lead. “We want to make all the world’s content more accessible, interoperable and valuable.”

The Calais Web service is designed to automatically create rich Semantic metadata from any content you submit (a blog post, a novel, a news story) in well under a second, using natural language processing, machine learning and other methods. Tague said Calais “goes well beyond classic entity identification and returns the facts and events hidden within your text as well.”

Feldman said she’s very bullish on Calais because of its potential. “Classic search, keywords, only get you so far down the road.” She noted that Calais’ process stands to improve search results because it removes ambiguity. “It would know, for example, that IBM is also Big Blue,” she said.
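Feldman’s point about ambiguity can be illustrated with a toy sketch. It is not how Calais works: the alias table, `tag_entities` and `search` below are hypothetical stand-ins, and a real service would derive the mappings through natural language processing rather than a hand-written dictionary.

```python
# Hypothetical alias table mapping surface forms to one canonical entity.
# Assumption for illustration only; not drawn from Calais.
ALIASES = {
    "ibm": "IBM",
    "big blue": "IBM",
    "hypertension": "hypertension",
    "high blood pressure": "hypertension",
}


def tag_entities(text):
    """Return the canonical entities whose aliases appear in the text.

    Uses naive case-insensitive substring matching, which is enough
    for a toy example but would misfire on real text.
    """
    lowered = text.lower()
    return {canonical for alias, canonical in ALIASES.items() if alias in lowered}


def search(query, documents):
    """Return documents sharing at least one canonical entity with the query."""
    wanted = tag_entities(query)
    return [doc for doc in documents if wanted & tag_entities(doc)]


documents = [
    "Big Blue reported earnings.",
    "Hypertension affects many adults.",
    "The weather is nice.",
]
matches = search("high blood pressure", documents)
```

Because both the query and the documents are reduced to the same canonical tags, a search for “high blood pressure” can match a document that only says “hypertension,” which plain keyword matching would miss.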

Rather than treating Calais as a challenger, Feldman said, Google and other search companies would be wise to take advantage of its service. Calais is itself reaching out. Last week, Calais announced a plug-in for Yahoo’s SearchMonkey called Marmoset. Calais also announced a plug-in for the open source WordPress blogging platform.

“We love SearchMonkey and think it will incent a lot of people to put Semantic data on their pages,” Tague said. “We created PHP code you can add to your site that 99 percent of the time does nothing, but when Yahoo comes by it sends it to Calais and back to SearchMonkey” so that people, companies and events are tagged.
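The gating Tague describes can be sketched roughly as follows. This is an illustration in Python rather than the PHP he mentions, and it is not the Marmoset code: `render_page` and `annotate` are hypothetical names, and spotting the crawler by the “Yahoo! Slurp” user-agent string is an assumption about how such a check might be made.

```python
# Assumed marker for Yahoo's crawler in the User-Agent header.
CRAWLER_MARKER = "Yahoo! Slurp"


def render_page(html, user_agent, annotate):
    """Serve a page, adding semantic tags only for the search crawler.

    `annotate` is a hypothetical callable standing in for the round trip
    to a tagging service; it takes HTML and returns annotated HTML.
    """
    if CRAWLER_MARKER not in user_agent:
        # Ordinary visitors: the added code does nothing.
        return html
    # Crawler visit: hand the page to the tagging step.
    return annotate(html)
```

Ordinary visitors never pay for the tagging step, which matches the “99 percent of the time does nothing” behavior described in the quote.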

In an interview earlier this month, Amit Kumar, director of product management for search at Yahoo (NASDAQ: YHOO), said search engines haven’t taken advantage of all the structured data that’s on the Web. He called SearchMonkey “the beginning of the next generation of search.”

Calais Tagaroo is the name of the WordPress plug-in. Once installed, Tagaroo takes text you type in a blog, fires it off to Calais and returns suggested tags for the blog post. “A lot of bloggers told us finding photographs online, resizing them and adding copyrights is a hassle,” Tague said.

With Tagaroo, WordPress users can automatically tag people, places, facts and events in their posts; Tagaroo also finds and resizes relevant copyrighted photos on Flickr. Calais Tagaroo starts working immediately, suggesting tags for more than 100 types of things (people, companies, geographies and organizations) and events (natural disasters, management changes, stock offerings and so on).

Calais said Tagaroo’s “vocabulary” of facts and events continues to grow every month, with plans to cover additional knowledge domains such as sports and entertainment.

Challenging Google?

IDC’s Feldman sees Calais augmenting search efforts by the likes of Google (whose stated aim is to “organize the world’s information”), Yahoo and others, rather than competing with them.

“If Reuters can get companies to do plug-ins to Calais, like they already have for WordPress, Calais has a chance to become a standard for tagging documents and we won’t have the confusing proliferation of terminology we have today,” Feldman said.

Tague admits the effort is at an early stage, but he’s optimistic. “We’re using a massive natural language processing engine (NLP) that’s been built up over the last 10 years, and it’s extraordinarily accurate,” he said. “Like any NLP it occasionally makes mistakes, but it’s more accurate than a human and when it does make mistakes it’s in a consistent manner we can correct.”
