Revisiting Semantic Web’s Pros and Cons





Mention the Semantic Web, and some people think it’s going to be a little slice of heaven, while others see it as the end of the world.

The first camp, which mainly includes scientists and researchers, wants to use computers to link up data from different sources to create a holistic view of the world.

The second, which is more concerned with the social impact of technology, counters that this would result in a massive invasion of privacy. It also argues that the Semantic Web would produce useless results, in part because it misses the implicit or ambiguous communications of the real world, where a wink is often as good as a nod. (And how would you wink if you were a computer?)

Of course, they both could be right. Ultimately, their debate is really about how technological advances may affect society.

Scientists tend to like their worlds clear-cut and devoid of extraneous loose bits of matter, and typical of those who espouse the pro-Semantic Web view is Prof. James Hendler of Rensselaer Polytechnic Institute.

“My work is trying to integrate heterogeneous data using appropriate amounts of metadata and doing things with metadata that you can’t do with language or specific data,” Hendler told InternetNews.com.

For example, he said, searching for a video on YouTube would likely be fruitless “unless you know the name of the artist and what you want to see.” Having brief descriptions of a video’s contents would be helpful but people submitting videos to the site “don’t want to write that many words when they send in their videos.”

However, if the files include a small amount of metadata, people searching for, say, the James Bond movie “Goldfinger” may be able to find the video even without knowing its name.

Users “would, for example, be able to say ‘I want that video where the guy takes off his hat and throws it at a statue and the head falls off and the hat comes back to you’ and it comes back with the title of the James Bond movie and a spoof.”
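As a rough illustration of the kind of lookup Hendler describes, the sketch below ranks videos by how many descriptive tags they share with a query. The catalog entries, tag names and the `search` function are all hypothetical, invented for this example; a real Semantic Web system would rely on far richer metadata than a flat set of keywords.

```python
# Hypothetical catalog: titles and tags are illustrative, not real site data.
CATALOG = [
    {"title": "Goldfinger", "tags": {"hat", "throw", "statue", "head", "spy"}},
    {"title": "Goldfinger spoof", "tags": {"hat", "throw", "statue", "parody"}},
    {"title": "Cooking pasta", "tags": {"kitchen", "pasta", "boil"}},
]


def search(query_terms, catalog=CATALOG):
    """Return titles ranked by how many query terms appear in their tags."""
    terms = set(query_terms)
    scored = [(len(terms & video["tags"]), video["title"]) for video in catalog]
    # Sort by overlap, highest first, and keep only videos with a match.
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [title for score, title in ranked if score > 0]


# The searcher remembers the scene, not the title.
print(search(["hat", "statue", "head"]))
```

Even a few tags per video let the query surface both the film and a spoof without the searcher knowing either name, which is the point Hendler makes about "a small amount of metadata."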

The vision, as articulated by Sir Tim Berners-Lee back in 1999, was of a Web in which computers “become capable of analyzing all the data on the Web — the content, links, and transactions between people and computers” and this would result in “the day-to-day mechanisms of trade, bureaucracy and our daily lives” being handled by machines talking to machines.

A tidy vision, but critics warn the fallout from this could be a massive invasion of privacy. After all, to deal effectively with these day-to-day tasks, computers would have to know as much as possible about everyone and everything.

Drawing connections

That’s because even beginning to approach this sort of vision is a steep task indeed.

For example, much of what a salesman knows about his contacts is contained not in his Rolodex or contacts management software, but in his head.

When assessing a potential lead, he correlates bits of discrete data, including knowledge about the product he is trying to sell, his prospective target’s habits and preferences, indications about the general financial condition of his target and background information about the condition of the economy.

Consequently, he knows you don’t try to sell a Rolls-Royce to someone who can’t afford to dress really well.

Those are the kinds of data linkages the Semantic Web is meant to facilitate. One difference is that while the salesman’s linkages are in his mind, which affords him a measure of privacy, the Semantic Web plan requires linkages to be open, shared and potentially visible to the public.


That public availability of information has some critics concerned.

“We had a meeting with a venture capitalist in Palo Alto six months ago and I put up a PowerPoint presentation that included a slide which had information about his mortgage — the deed, his signature, the whole nine yards,” said Pat Dane, CEO of MyPublicInfo, which specializes in protecting consumers’ personal information for a fee.

“It’s all a matter of public record,” he told InternetNews.com. “With the Semantic Web, it’s going to be a matter of time before more detailed data — whether you ever had a DUI conviction, if you ever did anything wrong — will be available to anybody for free.”

Likewise, the Liberty Alliance, an industry consortium of online companies working to develop Web services identity management tools and policies, is very conscious of this possibility.

“The issue we’re all facing is the balance between creating a ubiquitous Web-based identity layer in cyberspace and establishing privacy protections,” Roger Sullivan, the Alliance’s president and vice president of identity management at Oracle, told InternetNews.com.

One way that supporters are aiming to satisfy privacy concerns is with technology like Security Assertion Markup Language, or SAML, an XML-based framework for exchanging authentication and authorization information between parties. The Liberty Alliance is one of the industry groups promoting SAML, and is involved in its development.

“There’s the concept of user-centric identity, which is designed to deal with how much personal information you make available to social interaction sites and who those sites know to trust,” Scott Crawford, research director at Enterprise Management Associates, told InternetNews.com. “That’s … implicit in SAML 2.0.”

Yet it’s unclear whether those protections will fail in practice or whether “they are realistic enough to deliver the goods,” Crawford said.

Semantic Web’s proponents are well aware of the need for privacy. “I would love my doctor to be able to find all my medical records when he needs them, but at the same time, I don’t want other people to have it,” Hendler said.

“We’re working on both how to integrate data and how to protect it,” he said.

Another argument raised by Semantic Web critics is that indiscriminate or poorly reasoned linking could give rise to useless results.

Long-time Semantic Web opponent Clay Shirky has cited syllogisms (chains of inference) as an excellent example of how bad results could follow from reducing interactions to algorithms, to “a world where language is merely math done with words,” as he put it in an earlier online essay.

Shirky described one example of the potentially inane results that the Semantic Web may produce:

“Consider the following assertions: Count Dracula is a vampire. Count Dracula lives in Transylvania. Transylvania is a region of Romania. Vampires are not real,” he wrote. “You can draw only one non-clashing conclusion from such a set of assertions — Romania isn’t real.”

Syllogistic fallacies can be corrected among people, over time and with education. Semantic Web critics, however, fear that a computerized syllogistic approach cannot be easily corrected.
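The critics' worry can be sketched in a few lines of code. The example below is only an illustration: the triple format, the rule set and the `propagate` function are assumptions made here, not anything Shirky or any Semantic Web standard specifies. It applies two mechanical rules to Shirky's assertions, the second of which is deliberately naive.

```python
# Shirky's assertions as (subject, predicate, object) triples.
FACTS = {
    ("Dracula", "is_a", "vampire"),
    ("Dracula", "lives_in", "Transylvania"),
    ("Transylvania", "region_of", "Romania"),
}
NOT_REAL = {"vampire"}  # "Vampires are not real."


def propagate(facts, not_real):
    """Apply the rules repeatedly until nothing new can be derived."""
    derived = set(not_real)
    changed = True
    while changed:
        changed = False
        for subj, pred, obj in facts:
            # Rule 1: a member of an unreal class is itself unreal.
            if pred == "is_a" and obj in derived and subj not in derived:
                derived.add(subj)
                changed = True
            # Rule 2 (deliberately naive): an unreal thing makes its
            # home, and whatever contains that home, unreal too.
            if pred in ("lives_in", "region_of") and subj in derived and obj not in derived:
                derived.add(obj)
                changed = True
    return derived


print(sorted(propagate(FACTS, NOT_REAL)))
```

A person would reject the second rule on sight. The program applies it without complaint and carries "not real" from Dracula to Transylvania to Romania.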

More “human factors”

And that’s just the start of the problems caused by the ambiguities of the real world. Nonverbal communication, such as body language and facial expressions, has long been held to convey a vast amount of information between two people speaking face-to-face. (Or as Emerson famously put it, “What you do speaks so loudly that I cannot hear what you say.”)


It’s not surprising, however, that factoring that human element into meaningful Semantic Web data may prove a major hurdle. That data is expressed in Resource Description Framework, or RDF, a standard for describing machine-readable metadata.

“Most of the important information we have is implicit and ambiguous and loose-edged and messy, and that stuff escapes the net of RDF,” said author and technology commentator David Weinberger, a fellow at the Berkman Center for Internet and Society at Harvard University.

Some of that inherent looseness may be captured in narrow categories, or ontologies, but “there’s a balance between the complexity of the ontology and its utility across ontology boundaries,” said Weinberger, who most recently wrote about information classification in his 2007 book, Everything Is Miscellaneous.

“The more specific, detailed and complex you make your ontology, the more semantic value it has,” Weinberger said. “But then, it will make it harder to integrate with other Semantic Web ontologies” because the syntaxes of different ontologies are not likely to be the same.
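The trade-off Weinberger describes can be illustrated with a small sketch. Everything in it is hypothetical: the property names, the `MAPPING` table and the `translate` function are invented for this example and do not come from any real ontology. It translates a record from a detailed vocabulary into a simpler one and reports which properties have no counterpart.

```python
# Hypothetical mapping from a detailed vocabulary to a simpler one.
MAPPING = {"hasAuthor": "creator", "hasTitle": "title"}


def translate(record, mapping):
    """Rename mapped properties and list the ones that cannot be carried over."""
    translated, dropped = {}, []
    for prop, value in record.items():
        if prop in mapping:
            translated[mapping[prop]] = value
        else:
            dropped.append(prop)  # no equivalent in the target vocabulary
    return translated, dropped


record = {
    "hasAuthor": "David Weinberger",
    "hasTitle": "Everything Is Miscellaneous",
    "hasShelfMood": "playful",  # a detail only the richer vocabulary expresses
}

translated, dropped = translate(record, MAPPING)
print(translated)
print(dropped)
```

The more properties the detailed vocabulary adds, the more of them land in the dropped list when the data crosses into another vocabulary.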

Critics charge that metadata cannot solve either problem: ending up with inane results or dealing with the ambiguities of the real world. Among them is writer and blogger Cory Doctorow, who blasted the concept in his 2001 essay “Metacrap: Putting the torch to seven straw-men of the meta-utopia.”

In his essay, Doctorow described seven insurmountable obstacles to getting reliable metadata: People lie; people are lazy; people are stupid; people can’t accurately observe themselves; schemas aren’t neutral; metrics influence results; and there’s more than one way to describe something.

All true; but John Wilbanks, vice president for science at Creative Commons, a non-profit that develops licenses for creative works as an alternative to traditional copyright terms, sees no conflict between the two camps’ take on the Semantic Web.

“There’s got to be a little of both approaches,” Wilbanks told InternetNews.com. “Clay Shirky is right that the problems facing the regular Web are not sufficient to justify going to the Semantic Web today; but industries like the pharmaceutical industry have a lot of problems and it’s justified for them.”

The Semantic Web is “just about making the Web work for databases by creating a context for links and, in many cases, creating the links,” he added. “I call it the one database per child view.”

That approach is “the opposite of the Web in many ways,” he said. That’s because it aims to take data sources in which structure is important, like databases of genes and proteins, and then “find a way to wire that into the folksonomies of the fringes where the interesting science is happening, where the argument is not what the chemicals are but what they do.”

With any luck, both sides of the Semantic Web debate may be mollified by such an approach. And just in time, too: the first applications are already being built on the emerging technology, Hendler said.

For example, several new products were announced at the Semantic Technology Conference, held in San Jose, Calif., last week.

Additionally, the Semantic Web camp received further votes of confidence in the form of an enhanced effort from business information powerhouse Thomson Reuters, which updated its Calais toolkit, designed to encourage broader deployment of Semantic Web technologies.

Despite such tentative steps, if the heated rhetoric from supporters and detractors indicates anything, it’s that the Semantic Web still has a good deal of debate to come.
