RealTime IT News

Q&A: Microsoft XML Architect Jean Paoli

This summer, Microsoft will release Office 2003, a core component of the company's XML strategy that will serve as the user interface for end-users' interaction with XML. Jean Paoli, Microsoft's XML Architect and one of the co-creators of the World Wide Web Consortium's XML 1.0 standard, sat down with internetnews.com to discuss XML and Microsoft's vision for Office 2003.

Paoli, one of the leaders of the SGML community before that technology evolved into XML, joined Microsoft in 1996 after serving as technical director of GRIF S.A., a leader in the creation of SGML authoring tools. Once at Microsoft, Paoli jump-started the company's XML activity by creating and managing the team that delivered the software that XML-enabled both Internet Explorer and the Windows operating system. With a specialty in building end-user markup editing tools, Paoli is now a member of Microsoft's Office team.

Paoli was instrumental in designing Office 2003's XML capabilities and creating the newest Office application, InfoPath.

Q: As one of the co-creators of the XML 1.0 standard, what is your vision for XML?

With my colleagues in the XML Community, the idea was really to change the world of information. This was really the XML dream and before that the dream of SGML. What we all wanted to do was to change dramatically how to create, access and manage information. What we have done is to build a universal data format which is platform independent and based on an open standard.

I really believe that XML unified two worlds which were classically separate: the world of documents and the world of data. When you talk about data, you forget that the biggest database in the world, the largest information repository of a customer, is the actual set of documents that they create every day. This is a database which is perpetually growing and always underused. With the documents in XML, it is easy to mine, reuse and manage the data.

Q: Do you see the increasing acceptance of XML forcing us to change the way we think about content?

When we went and talked to customers about Office 2003, the first thing that they told us was "please help me connect 'disconnected islands of information.'" That means they may have -- on the back-end -- an old IBM mainframe, a Unix machine, a new Windows machine, a new Linux machine, and all of these places hold information that is not connected. Much of this information is undocumented. I believe that XML is going to change the world of information at this level, because the information can be transformed on demand to XML or created natively in XML, and it can now be connected to Office 2003 desktop tools.

It's the core design of XML that is going to help us solve this problem. It's textual; it enables you to create your own new tags. That's something that is extremely important because those tags reflect your business needs. If I want to do my sales report in XML, I can do it. Once I start using XML for that, then I can send that information to a back-end, where it becomes a sort of textual database that can start to be searched and re-used.

XML data is semi-structured: In a relational database, when you create a new table, all the rows have the same number of columns. In XML, you don't have to have this regularity. The data format is semi-structured. You can have the tag 'potential customer' present in one record and not present in another. Because of that, the structure of an XML file can look more like the way you write a document than the way you lay out a relational database.
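To make the semi-structured point concrete, here is a minimal Python sketch using only the standard library. The sales report, its tag names (`salesReport`, `customer`, `potentialCustomer`) and the company names are invented for illustration and do not come from Office 2003. It shows two records of the same kind where an optional tag appears in one and not in the other.

```python
import xml.etree.ElementTree as ET

# Hypothetical sales report; all tag names are assumptions for illustration.
SALES_REPORT = """
<salesReport>
  <customer>
    <name>Contoso</name>
    <potentialCustomer>Fabrikam</potentialCustomer>
  </customer>
  <customer>
    <name>Northwind</name>
  </customer>
</salesReport>
"""


def summarize(xml_text):
    """Return a (name, potential customer or None) pair per <customer>."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for customer in root.findall("customer"):
        name = customer.findtext("name")
        # findtext returns None when the optional element is absent,
        # so irregular records need no special handling.
        referral = customer.findtext("potentialCustomer")
        rows.append((name, referral))
    return rows


rows = summarize(SALES_REPORT)
```

A relational table would need a nullable column to hold the second record; here the element is simply left out.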

This is a model where I have an XML file, I created the tags that I need, and I express the data, not the presentation, in it. When I express the data, I can take the information in it and send it to a foreign platform, where it can be processed -- for example, searched or aggregated. We believe in that model in Office. We implemented the support for those kinds of files across the Office family. Now the important thing is the scenario. If you want to create a document, you use Word. If you want to analyze data, you use Excel. If you want to gather information in a form, you use InfoPath. If you have a project, you can put a 'request for proposal' template out on the Web in Word, and when third parties respond to the RFP, you can take the information from their files, put it in a database and compare the responses.

Q: What is the significance of XML in Microsoft's strategy?

Four years ago, a lot of our customers told us 'we are all on the Internet now. I am the same person but I use three or four different machines.' The customer is defined by his own data. Our customers told us that they wanted to be able to access this information independent of the device and the software they are using. So .NET and XML Web Services were a very strategic announcement. It was very profound. With Office 2003, people on the desktop are really starting to see how valuable that is for our customers.

The data is the important thing. So the model becomes this: the data has to be disconnected from the presentation and from the actual software that produces it. This data is XML, because the data has to flow freely between platforms. A lot of back-ends are, for example, Linux, or Unix or Windows machines. A lot are legacy machines. People don't change their back-end every day. They spend a lot of time building their back-ends. So, we really inverted the situation. Now our focus is the data of the customer in the center. His information is created by different devices. Office was a natural fit for the overall user interface.

Q: What are schemas and how do they interact with XML in Office 2003?

A schema describes the structure of information. A schema is essentially just a formalization of the tags that I'm using. Tags really delineate, in a very small, granular way, the different regions of a document. Those different regions can then be extracted from the document, stored in a database, reused, etc. But because schemas are about tags and data modeling, these things are not for the end users. Instead, developers or power users create a schema that can be associated with a template. Everybody knows how to use a template. It's all about drag and drop.

All of our tools [Word, Excel, Access, InfoPath, Visio, FrontPage, etc.] become XML authoring tools. All these tools support inputting and outputting XML. FrontPage supports formatting XML. The IT department creates a customer-defined schema and the tools support the end user. If an annual report is using the XML capabilities of Word, inside this document you are going to find schema-defined tags like annual results, sales, etc. If someone creates the annual report using an XML schema, then somebody using Excel can go and load that annual report file and do a comparison, as long as it's using the XML syntax and a schema that defines it. If you have an existing Access database, and you want to output one specific customer's information with all the orders that he has placed, and then send that file to a third party like an accounting firm, you could do that with Access. Those documents are now reprocessable. Before, the maximum you could do was a full text search. That's not business.
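As a rough illustration of what "reprocessable" means in practice, the Python sketch below pulls the schema-defined sales figures out of two annual reports and compares them, which a full text search could not do. The `annualReport` and `sales` tags, the `region` attribute and all the numbers are hypothetical. This is not the file format that Word or Excel actually produce.

```python
import xml.etree.ElementTree as ET

# Two hypothetical annual reports that share the same customer-defined tags.
REPORT_2002 = """<annualReport year="2002">
  <summary>A steady year.</summary>
  <sales region="Europe">120</sales>
  <sales region="Asia">80</sales>
</annualReport>"""

REPORT_2003 = """<annualReport year="2003">
  <summary>Growth in most markets.</summary>
  <sales region="Europe">150</sales>
  <sales region="Asia">70</sales>
</annualReport>"""


def sales_by_region(xml_text):
    """Extract only the tagged sales figures, ignoring the narrative text."""
    root = ET.fromstring(xml_text)
    return {s.get("region"): float(s.text) for s in root.iter("sales")}


def compare(old_xml, new_xml):
    """Return the change in sales for every region present in both reports."""
    old = sales_by_region(old_xml)
    new = sales_by_region(new_xml)
    return {region: new[region] - old[region] for region in old if region in new}


changes = compare(REPORT_2002, REPORT_2003)
```

Because the figures are delimited by tags, the prose in `summary` never has to be parsed to find them.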

Our added value is bringing XML to the end user, to democratize this XML file that they don't know how to deal with. We really believe our customers now have software that is very easy to use. A few years ago, the first layer of XML that was adopted was the back-end. Now it is XML-enabled. The second layer was connectivity, which is done with SOAP and the Web Services standards. Now, finally, we are bringing in the front end, the missing piece: on the desktop we have an entire suite which can read and write this core data model that we adopted three years ago with .NET. We are bringing the front-end to all these XML Web services that all the back-ends in the world are going to output.

Q: What role does the new InfoPath application play in Office 2003?

How do you gather information today? For example, a sales report? You do it with forms. But the customers told us that classic form technology was too hard to use or too rigid. If you use form technologies today, you may have a few fields like name or address. Then you have two or three lines where you can put your customers. If by accident you have more than three customers, then you need to use another piece of paper. Forms do not grow. You do not have spell checking, you cannot add an image. The form is very static; it looks like paper and you can't add or remove anything.

So the customers said forms are too rigid and customers really love documents, like those created by a word processor. But document technology does not have the validation functionality that you have in a form. What we did is we went and created a new kind of product. InfoPath is a hybrid product that takes the best of documents and the best of forms technology. Think about a form that is growing, a hierarchy of fields inside of fields inside of fields.

You really need to think of InfoPath as a sophisticated XML authoring tool which follows a customer-defined schema. When those rules are followed, it creates a valid document. The whole thing was to create a user interface that follows the schema and lets you only create a document which is valid to the schema.
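The idea of a document being "valid to the schema" can be sketched in a few lines of Python. This is a hand-rolled check of allowed child tags and how often each may occur. It stands in for real XSD validation and does not show how InfoPath works internally. The `statusReport` structure and the `RULES` table are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a <statusReport>: child tag -> (min, max) occurrences.
# A max of None means the element may repeat without limit.
RULES = {"author": (1, 1), "date": (1, 1), "item": (1, None)}


def violations(xml_text, rules=RULES):
    """Return a list of problems; an empty list means the document passes."""
    root = ET.fromstring(xml_text)
    counts = {}
    problems = []
    for child in root:
        if child.tag not in rules:
            problems.append(f"unexpected <{child.tag}>")
        counts[child.tag] = counts.get(child.tag, 0) + 1
    for tag, (low, high) in rules.items():
        n = counts.get(tag, 0)
        if n < low:
            problems.append(f"too few <{tag}>: {n}")
        elif high is not None and n > high:
            problems.append(f"too many <{tag}>: {n}")
    return problems


GOOD = (
    "<statusReport><author>Ana</author><date>2003-06-01</date>"
    "<item>Shipped beta</item><item>Fixed bugs</item></statusReport>"
)
BAD = "<statusReport><author>Ana</author><note>hi</note></statusReport>"

good_problems = violations(GOOD)
bad_problems = violations(BAD)
```

A schema-driven editor inverts this check: rather than reporting problems afterwards, it only offers the user actions that keep the list empty, such as adding another `item` but never a second `author`.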

We really wanted to put the XML authoring in the hands of the end user, the mass market. The user interface is in terms of menus and clicks and is very familiar to the end user, while still following the schema. The end user just knows that he is clicking on a few things and sees a form being created.

Q: How do you design a "docuform" with InfoPath?

The way it works is you define your own schema by creating an XSD file [for instance, with Visual Studio .NET 2003] or connecting to an XML Web Service. Then, with InfoPath, you open the schema in design mode and with a drag-and-drop the user decides how he wants to design the actual form. InfoPath then generates an XSLT file, which basically transforms the XML. We ship 25 sample forms, or the user can also start from an empty form with drag and drop controls and we create the schema for him. It's really about business processes, both organizational business processes like human resources, sales data and inventory, and work group business processes like status reports, sales data collection and procurement.