Q&A: Microsoft XML Architect Jean Paoli
This summer, Microsoft will release Office 2003, a core component of the company's XML
Paoli, one of the leaders of the SGML
Paoli was instrumental in designing Office 2003's XML capabilities and creating the newest Office application, InfoPath.
Q: As one of the co-creators of the XML 1.0 standard, what is your vision for XML?
With my colleagues in the XML community, the idea was really to change the world of information. This was the XML dream and, before that, the dream of SGML. What we all wanted was to dramatically change how information is created, accessed and managed. What we have done is build a universal data format that is platform independent and based on an open standard.
I really believe that XML unified two worlds which were classically separate: the world of documents and the world of data. When you talk about data, you forget that the biggest database in the world, the largest information repository of a customer, is the actual set of documents that they create every day. This is a database which is perpetually growing and always underused. With the documents in XML, it is easy to mine, reuse and manage the data.
Q: Do you see the increasing acceptance of XML forcing us to change the way we think about content?
When we went and talked to customers about Office 2003, the first thing they told us was "please help me connect 'disconnected islands of information.'" That means they may have -- on the back-end -- an old IBM mainframe, a Unix, a new Windows, a new Linux, and all these places have information and this information is not connected. Because of all this undocumented information, I believe that XML is going to change the world of information at this level: the information can be transformed on demand to XML or created natively in XML, and it can now be connected to the Office 2003 desktop tools.
It's the core design of XML that is going to help us solve this problem. It's textual; it enables you to create your own new tags. That's something that is extremely important because those tags reflect your business needs. If I want to do my sales report in XML, I can do it. Once I start using XML for that, then I can send that information to a back-end, where it becomes a sort of textual database that can start to be searched and re-used.
XML data is semi-structured: In a relational database, when you create a new table, all the rows have the same number of columns. In XML, you don't have to have this regularity. The data format is semi-structured. You can have the tag 'potential customer' present in one record and not present in another. Because of that, the structure of an XML file can look more like the way you write a document than the way you lay out a relational database.
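To make the contrast with a fixed-column table concrete, here is a minimal sketch in Python using only the standard library. The element names (`customers`, `customer`, `potential-customer`) and the company names are made up for illustration; they do not come from any Office schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second record carries an extra element
# that the first one simply omits.
DOC = """<customers>
  <customer>
    <name>Contoso</name>
  </customer>
  <customer>
    <name>Fabrikam</name>
    <potential-customer>yes</potential-customer>
  </customer>
</customers>"""

def potential_customers(xml_text):
    """Return the names of records that carry the optional element."""
    root = ET.fromstring(xml_text)
    names = []
    for customer in root.findall("customer"):
        # find() returns None when the optional element is absent.
        if customer.find("potential-customer") is not None:
            names.append(customer.findtext("name"))
    return names

print(potential_customers(DOC))
```

Both records are valid members of the same file even though they do not have the same "columns", which is the irregularity a relational table would not allow.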
This is a model where I have an XML file, I create the tags that I need, and I express the data in it, not the presentation. Once the data is expressed that way, I can take the information and send it to a foreign platform, which can process it -- for example, search or aggregate it. We believe in that model in Office. We implemented support for those kinds of files across the Office family. Now the important thing is the scenario. If you want to create a document, you use Word. If you want to analyze data, you use Excel. If you want to gather information in a form, you use InfoPath. If you have a project, you can put a 'request for proposal' template out on the Web in Word, and when third parties respond to the RFP, you can take the information from their files, put it in a database and compare the responses.
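The RFP scenario can be sketched roughly as follows. This is only an illustration in Python, not how Office does it: the `rfp-response`, `vendor` and `price` tags and the sample vendors are assumptions, and a plain list stands in for the database.

```python
import xml.etree.ElementTree as ET

# Hypothetical RFP responses, as each third party might return them.
RESPONSES = [
    "<rfp-response><vendor>A. Datum</vendor><price>12000</price></rfp-response>",
    "<rfp-response><vendor>Litware</vendor><price>9500</price></rfp-response>",
]

def load_bid(xml_text):
    """Pull the vendor name and price out of one response."""
    root = ET.fromstring(xml_text)
    return root.findtext("vendor"), float(root.findtext("price"))

def cheapest(responses):
    """Compare all responses and return the lowest (vendor, price) pair."""
    bids = [load_bid(r) for r in responses]
    return min(bids, key=lambda bid: bid[1])

print(cheapest(RESPONSES))
```

Because each response carries data rather than presentation, the comparison needs no knowledge of how the vendor's document looked on screen.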
Q: What is the significance of XML in Microsoft's strategy?
Four years ago, a lot of our customers told us 'we are all on the Internet
now. I am the same person but I use three or four different machines.' The
customer is defined by his own data. Our customers told us that they
to be able to access this information independent of the device and the
software they are using. So .NET and XML Web Services
The data is the important thing. So the model becomes: the data has to be disconnected from the presentation and from the actual software that produces it. This data is XML, because the data has to flow freely between platforms. A lot of back-ends are, for example, Linux, or Unix or Windows machines. A lot are legacy machines. People don't change their back-end every day. They spend a lot of time building their back-ends. So we really inverted the situation. Now our focus is the customer's data, in the center. His information is created by different devices. Office was a natural fit for the overall user interface.
Q: What are schemas
A schema describes the structure of information. A schema is essentially
just a formalization of the tags
All of our tools [Word, Excel, Access, InfoPath, Visio, FrontPage, etc.] become XML authoring tools. All these tools support inputting and outputting XML. FrontPage supports formatting XML. The IT department creates a customer-defined schema and the tools support the end user. If an annual report is using the XML capabilities of Word, inside this document you are going to find schema-defined tags like annual results, sales, etc. If someone creates the annual report using an XML schema, then somebody using Excel can go and load that annual report file and do a comparison, as long as it's using the XML syntax and a schema that defines it. If you have an existing Access database, and you want to output one specific customer's information with all the orders he has placed, and then send that file to a third party like an accounting firm, you could do that with Access. Those documents are now reprocessable. Before, the most you could do was a full-text search. That's not business.
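As a rough picture of the Access example, the sketch below writes one customer and the customer's orders to a single XML string and then reprocesses it the way a receiving party might. It is plain Python, not Access; the function names, tag names and sample figures are all hypothetical.

```python
import xml.etree.ElementTree as ET

def export_customer(name, orders):
    """Serialize one customer and their orders as a single XML string."""
    customer = ET.Element("customer")
    ET.SubElement(customer, "name").text = name
    orders_el = ET.SubElement(customer, "orders")
    for order_id, amount in orders:
        order = ET.SubElement(orders_el, "order", id=str(order_id))
        ET.SubElement(order, "amount").text = f"{amount:.2f}"
    return ET.tostring(customer, encoding="unicode")

def total_ordered(xml_text):
    """What the receiving party might do: sum the order amounts."""
    root = ET.fromstring(xml_text)
    return sum(float(a.text) for a in root.iter("amount"))

xml_text = export_customer("Contoso", [(1, 250.0), (2, 99.5)])
print(xml_text)
print(total_ordered(xml_text))
```

The receiver works from the tags alone, which is what makes the file "reprocessable" rather than merely searchable.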
Our added value is bringing XML to the end user, to democratize this XML
file that they don't know how to deal with. We really believe our
now have software that is very easy to use. A few years ago, the first
layer of XML that was adopted was the back-end. Now it is XML-enabled. The
second layer was connectivity, which is done with SOAP
Q: What role does the new InfoPath application play in Office 2003?
How do you gather information today? For example, a sales report? You do it with forms. But the customers told us that classic form technology was too hard to use or too rigid. If you use form technologies today, you may have a few fields like name or address. Then you have two or three lines where you can put your customers. If by accident you have more than three customers, then you need to use another piece of paper. Forms do not grow. You do not have spell checking, you cannot add an image. The form is very static; it looks like paper and you can't add or remove anything.
So the customers said forms are too rigid and customers really love documents, like those created by a word processor. But document technology does not have the validation functionality that you have in a form. What we did is we went and created a new kind of product. InfoPath is a hybrid product that takes the best of documents and the best of forms technology. Think about a form that is growing, a hierarchy of fields inside of fields inside of fields.
You really need to think of InfoPath as a sophisticated XML authoring tool which follows a customer-defined schema. When the rules of the schema are followed, it creates a valid document. The whole idea was to create a user interface that follows the schema and only lets you create a document that is valid against the schema.
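The idea of a document being valid against a schema can be illustrated with a toy structural check. This sketch is not XSD validation and says nothing about how InfoPath works internally: the "schema" is just a Python dictionary of allowed child elements, and every name in it is hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a schema: each element name maps to the set of
# child element names it may contain. Hypothetical names throughout.
ALLOWED_CHILDREN = {
    "sales-report": {"author", "customer"},
    "author": set(),
    "customer": {"name", "amount"},
    "name": set(),
    "amount": set(),
}

def is_valid(element):
    """Return True if every element only has children the toy schema allows."""
    allowed = ALLOWED_CHILDREN.get(element.tag)
    if allowed is None:
        return False  # element name unknown to the toy schema
    return all(child.tag in allowed and is_valid(child) for child in element)

# A report with two customers, one of which omits the optional amount.
good = ET.fromstring(
    "<sales-report><author>Ann</author>"
    "<customer><name>Contoso</name><amount>10</amount></customer>"
    "<customer><name>Fabrikam</name></customer></sales-report>"
)
# A report containing an element the toy schema does not know.
bad = ET.fromstring("<sales-report><discount>5</discount></sales-report>")

print(is_valid(good), is_valid(bad))
```

A real schema language also constrains order, cardinality and data types; the point here is only that a tool which consults the schema before each edit can refuse to produce the second document at all.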
We really wanted to put the XML authoring in the hands of the end user, the mass market. The user interface is in terms of menus and clicks and is very familiar to the end user, while still following the schema. The end user just knows that he is clicking on a few things and sees a form being created.
Q: How do you design a "docuform" with InfoPath?
The way it works is you define your own schema by creating an XSD