RealTime IT News
Berners-Lee Talks Up Semantic Web
By Thor Olavsrud
September 23, 2003

What if the World Wide Web were one giant database, linking both human-readable documents and machine-readable data in a way useful to both mankind and machine?

It would be the future of the Web espoused by Tim Berners-Lee, father of the World Wide Web and director of the World Wide Web Consortium (W3C). Since Berners-Lee and a few other leaders at the W3C first mentioned it in May 2001, that vision has increasingly become a leading focus of the W3C's work. They call it the Semantic Web.

Speaking before Britain's scientifically minded Royal Society Monday, Berners-Lee attempted to explain the vision of the Semantic Web, and why he believes it will reinvent the existing Web for both end users and businesses.

"It's so difficult to explain to people who are used to the Web why, before the Web, it was so difficult to explain to people what the Web was all about," Berners-Lee said.

He explained that the words and concepts needed to explain the little documentation system he began creating in the late 1980s, during his tenure at CERN, the European particle physics lab, just hadn't existed. But once people saw the Web and what it could do, it seemed so simple, he said.

The Semantic Web faces much the same conundrum.

Simply put, the idea behind the Semantic Web is to give data more meaning through the use of metadata, the data about data, which describes how, when and by whom a particular set of data was collected, and how the data is formatted. By adding metadata to the existing Web, the Semantic Web will allow both humans and machines to find and make use of data in ways that haven't previously been possible.

"The Semantic Web is just mechanical data," Berners-Lee said. "It's like a great big database."

For instance, he explained, consider an event listing on the Web for a lecture. It would include data like the location, start time, end time, the speaker, a phone number to call for more information and so on. But the data is fairly static. It can be read by humans, but not by machines. However, metadata could be attached to those datapoints to tell machines what each one is. Then an interested party could click to attend the event, and whatever calendaring application that person uses could immediately schedule the event in the planner, denoting where it is, what time it will start and end, and who will be speaking. It could provide a map to get the person to that event, and supply information about the speaker.
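To make the lecture example concrete, here is a minimal Python sketch of the idea, not anything Berners-Lee showed. It assumes the listing's metadata is available as simple (subject, property, value) statements; the property names, the `event:lecture1` identifier and all of the values are invented for illustration and do not come from any real vocabulary.

```python
from datetime import datetime

# Hypothetical metadata for one lecture listing. Every identifier, property
# name and value here is made up for illustration.
EVENT_STATEMENTS = [
    ("event:lecture1", "type", "Event"),
    ("event:lecture1", "location", "Lecture Hall A"),
    ("event:lecture1", "startTime", "2003-01-15T18:00:00"),
    ("event:lecture1", "endTime", "2003-01-15T19:30:00"),
    ("event:lecture1", "speaker", "Example Speaker"),
    ("event:lecture1", "phone", "555-0100"),
]


def to_calendar_entry(statements, subject):
    """Collect the statements about one event into a calendar-style record."""
    props = {prop: value for subj, prop, value in statements if subj == subject}
    return {
        "where": props["location"],
        "start": datetime.fromisoformat(props["startTime"]),
        "end": datetime.fromisoformat(props["endTime"]),
        "speaker": props["speaker"],
    }


entry = to_calendar_entry(EVENT_STATEMENTS, "event:lecture1")
print(entry["where"], entry["start"], entry["end"], entry["speaker"])
```

Because each value is labeled with what it is, the program never has to guess which string on the page is the start time and which is the speaker.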

It is certainly possible to do this sort of thing without the Semantic Web, Berners-Lee said. A person could just cut and paste from the Web site listing to his or her calendaring application. That person could then click on multiple other links to get the rest of the data. But that, Berners-Lee said, is not making use of the Web to its fullest extent.

"When it comes to the data in our lives, we are pre-Web," he said. "It's silly for us to do things which the computer could do for us."

And of course, he noted, the possibilities extend far beyond calendaring functions. An important focus is Enterprise Application Integration (EAI).

"Wherever there is a connection of common concepts between different applications, then it becomes interesting to connect those applications together, to break them out of their boxes," Berners-Lee said. "The Semantic Web starts to connect them together."

One of the numerous foundational specifications for the Semantic Web is the Resource Description Framework. RDF, built on top of XML, is a general framework for describing metadata. It provides interoperability between applications that exchange machine-understandable information on the Web.

By implementing products based on RDF as an EAI "hub," companies can link together documents and data stored in disparate databases, and pull related concepts together when analyzing the information. That sort of thing can be done with XML Web services today, but it can be a laborious task, Berners-Lee explained. For instance, you might have information in three XML documents that you want to merge. But each document uses its own schema, which defines the tags used to express the data. To merge the data in the three documents, Berners-Lee said a person might have to interview the people who created the schemas and then write a new schema that can take the data in the documents and express it as a new document.
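RDF takes a different route: data is expressed as subject-property-value statements, so data from several sources can be pooled without designing a new schema first. The Python sketch below is a simplification under stated assumptions. It models each source as a plain set of tuples, and the three sources, their identifiers (`emp:42`, `dept:7`, `bldg:3`) and their property names are all hypothetical. Real RDF relies on shared URIs to identify things and has extra rules for anonymous nodes, which this sketch ignores.

```python
# Three hypothetical sources, each a set of (subject, property, value)
# statements. All identifiers and property names are invented.
hr = {
    ("emp:42", "name", "A. Example"),
    ("emp:42", "worksIn", "dept:7"),
}
facilities = {
    ("dept:7", "locatedIn", "bldg:3"),
}
directory = {
    ("emp:42", "phone", "555-0100"),
    ("emp:42", "name", "A. Example"),  # duplicate of a statement in hr
}


def merge(*sources):
    """Pool statements from every source; duplicates collapse automatically."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def values(statements, subject, prop):
    """Return every value recorded for a subject's property."""
    return {v for s, p, v in statements if s == subject and p == prop}


combined = merge(hr, facilities, directory)

# Follow a link that spans two sources: employee -> department -> building.
dept = next(iter(values(combined, "emp:42", "worksIn")))
building = values(combined, dept, "locatedIn")
print(len(combined), building)
```

The merge step is a set union; the useful part is that a question such as "which building does this employee work in?" can then be answered from statements that originally lived in separate systems.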

"RDF just concatenates the documents," Berners-Lee said.

It could also have tremendous applications in scientific fields, he noted. For instance, researchers studying weather phenomena would be able to identify which weather balloons supplied particular datapoints, who manufactured the balloons and where the materials came from. In the case of corrupted data, that could allow them to identify faulty balloons and even discard the particular data supplied by those balloons.
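The weather balloon scenario can be sketched the same way. The Python below is an illustration only, assuming each reading carries a reference to the balloon that produced it and that a separate record names each balloon's manufacturer. The balloon and manufacturer identifiers and the temperature values are invented.

```python
# Hypothetical readings, each tagged with the balloon that supplied it.
readings = [
    {"balloon": "balloon:A", "temp_c": -41.2},
    {"balloon": "balloon:B", "temp_c": 12.9},
    {"balloon": "balloon:A", "temp_c": -40.8},
]

# Hypothetical provenance records: who manufactured each balloon.
provenance = {
    "balloon:A": {"manufacturer": "maker:1"},
    "balloon:B": {"manufacturer": "maker:2"},
}


def balloons_made_by(records, manufacturer):
    """Find every balloon whose record names the given manufacturer."""
    return {
        balloon
        for balloon, info in records.items()
        if info["manufacturer"] == manufacturer
    }


def discard_from(data, faulty_balloons):
    """Keep only the readings that did not come from a faulty balloon."""
    return [r for r in data if r["balloon"] not in faulty_balloons]


# Suppose balloons from "maker:2" turn out to be faulty.
faulty = balloons_made_by(provenance, "maker:2")
clean = discard_from(readings, faulty)
print(faulty, clean)
```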

"You don't just want to look at data," he said. "You want to look at documents and see where they came from."

Of course, there are still obstacles and potential problems that the Semantic Web faces, even if you put aside the difficulties of explaining why people should be excited about it, Berners-Lee said.

There are many specifications that still need to be delivered. Web Ontology Language (which the W3C has given the acronym OWL, solely because it sounds better than WOL) is one of those. There are about 20 more potential standards to sift through. Berners-Lee said the whole thing could be a failure if the Semantic Web is not compatible with the existing Web. And companies could try to derail the whole process by claiming patents on the technology the W3C is developing.

"When it comes to the infrastructure standards, we have to keep it like HTTP: royalty-free," he said.