Q&A: Microsoft XML Architect Jean Paoli

This summer, Microsoft will release Office 2003, a core component of the company’s XML strategy that will serve as the user interface for end-users’ interaction with XML. Jean Paoli, Microsoft’s XML Architect and one of the co-creators of the World Wide Web Consortium’s XML 1.0 standard, sat down with internetnews.com to discuss XML and Microsoft’s vision for Office 2003.

Paoli, one of the leaders of the SGML community before that technology evolved into XML, joined Microsoft in 1996 after serving as technical director of GRIF S.A., a leader in the creation of SGML authoring tools. Once at Microsoft, Paoli jump-started the company’s XML activity by creating and managing the team that delivered the software that XML-enabled both Internet Explorer and the Windows operating system. With a specialty in building end-user markup editing tools, Paoli is now a member of Microsoft’s Office team.

Paoli was instrumental in designing Office 2003’s XML capabilities and creating the newest Office application, InfoPath.

Q: As one of the co-creators of the XML 1.0 standard, what is your
vision for XML?

With my colleagues in the XML Community, the idea was really to change the
world of information. This was really the XML dream and before that the
dream of SGML. What we all wanted to do was to change dramatically how to
create, access and manage information. What we have done is to build a
universal data format which is platform independent and based on an open
standard.

I really believe that XML unified two worlds which were classically
separate: the world of documents and the world of data. When you talk
about data, you forget that the biggest database in the world, the largest
information repository of a customer, is the actual set of documents that
they create every day. This is a database which is perpetually growing and
always underused. With the documents in XML, it is easy to mine, reuse and
manage the data.

Q: Do you see the increasing acceptance of XML forcing us to change the
way we think about content?

When we went and talked to customers about Office 2003, the first thing
that they told us is “please help me connect ‘disconnected islands of
information.’” That means they may have, on the back-end, an old IBM
mainframe, a Unix machine, a new Windows machine, a new Linux machine, and
all these places have information and this information is not connected.
With all this information, which is undocumented, I believe that XML is
going to change the world of information at this level, because the
information can be transformed on demand to XML or created natively in
XML, and it can now be connected to Office 2003 desktop tools.

It’s the core design of XML that is going to help us solve this problem.
It’s textual; it enables you to create your own new tags. That’s something
that is extremely important because those tags reflect your business
needs. If I want to do my sales report in XML, I can do it. Once I start
using XML for that, then I can send that information to a back-end, where
it becomes a sort of textual database that can start to be searched and
re-used.

XML data is semi-structured: In a relational database, when you create a
new table, all the rows have the same number of columns. In XML, you don’t
have to have this regularity. The data format is semi-structured. You can
have the tag ‘potential customer’ present in one record and not present in
another. Because of that, the structure of an XML file can look more like
the way you write a document than the way you write a relational
database.
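
As a rough illustration of that irregularity, here is a minimal Python sketch. The sales report, its tag names and the company names are all invented for this example; the only point is that one record carries the optional element and the other does not, and the file is still perfectly usable.

```python
import xml.etree.ElementTree as ET

# Hypothetical sales report; every tag name here is made up for illustration.
SALES_REPORT = """
<salesReport>
  <visit>
    <company>Contoso</company>
    <potentialCustomer>yes</potentialCustomer>
  </visit>
  <visit>
    <company>Fabrikam</company>
  </visit>
</salesReport>
"""

root = ET.fromstring(SALES_REPORT)
rows = []
for visit in root.findall("visit"):
    company = visit.findtext("company")
    # findtext returns None when the optional element is absent,
    # so the missing tag does not break processing.
    potential = visit.findtext("potentialCustomer")
    rows.append((company, potential))

print(rows)
```

In a relational table the second record would need a NULL in a fixed column; here the element is simply left out.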

This is a model where I have an XML file, I created the tags that I need,
and I express the data, not the presentation, in it. When I express the
data, I can take the information in it and send it to a foreign platform
and you can process it, for example, search or aggregate it. We believe in
that model in Office. We implemented the support for those kinds of files
across the Office family. Now the important thing is the scenario. If you
want to create a document, you use Word. If you want to analyze data, you
use Excel. If you want to gather information in a form, you use InfoPath.
If you have a project, you can put a ‘request for proposal’ template out
on the Web in Word, and when third parties respond to the RFP, I can take
the information from that file and put it in a database and compare the
two.

Q: What is the significance of XML in Microsoft’s strategy?

Four years ago, a lot of our customers told us ‘we are all on the Internet
now. I am the same person but I use three or four different machines.’ The
customer is defined by his own data. Our customers told us that they
wanted to be able to access this information independent of the device and
the software they are using. So .NET and XML Web Services were a very
strategic announcement. It was very profound. With Office 2003, people on
the desktop are really starting to see how great it is for our customers.

The data is the important thing. So the model becomes: the data has to be
disconnected from the presentation and from the actual software that
produces it. This data is XML, because the data has to flow freely between
platforms. A lot of back-ends are, for example, Linux, or Unix or Windows
machines. A lot are legacy machines. People don’t change their back-end
every day. They spend a lot of time building their back-ends. So, we
really inverted the situation. Now the customer’s data is at the center of
our focus. His information is created by different devices. Office was a
natural fit for the overall user interface.

Q: What are schemas and how do they interact with XML in Office 2003?

A schema describes the structure of information. A schema is essentially
just a formalization of the tags that I’m using. Tags really
delineate, in a very small, granular way, the different regions of a
document. Those different regions can then be extracted from the document,
stored in a database, reused, etc. But because schemas are about tags and
data modeling, these things are not for the end users. Instead, developers
or power users create a schema that can be associated with a template.
Everybody knows how to use a template. It’s all about drag and drop.

All of our tools [Word, Excel, Access, InfoPath, Visio, FrontPage, etc.]
become XML authoring tools. All these tools support inputting and
outputting XML. FrontPage supports formatting XML. The IT department
creates a customer-defined schema and the tools support the end user. If
an annual report is using the XML capabilities of Word, inside this
document you are going to find schema-defined tags like annual results,
sales, etc. If someone creates the annual report using an XML schema, then
somebody using Excel can go and load that annual report file and do a
comparison, as long as it’s using the XML syntax and a schema that defines
it. If you have an existing Access database, and you want to output one
specific customer’s information with all the existing orders that he
placed, and then send that file to a third party like an accounting firm,
you could do that with Access. Those documents are now reprocessable.
Before, the maximum you could do was a full text search. That’s not
business.
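
To make "reprocessable" concrete, here is a small Python sketch, not anything Office itself does. The annual report, the `sales` tag and its `region` attribute are hypothetical. The figures are buried in ordinary prose, yet a program can pull them out by tag name rather than guessing from a full-text search.

```python
import xml.etree.ElementTree as ET

# Hypothetical annual report: prose with schema-defined <sales> tags inline.
ANNUAL_REPORT = """
<annualReport year="2003">
  <para>Revenue grew this year. Sales reached
    <sales region="Europe">1200</sales> in Europe and
    <sales region="Asia">800</sales> in Asia.</para>
</annualReport>
"""

def sales_by_region(xml_text):
    """Collect every <sales> figure in the document, keyed by region."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so tags embedded in prose are still found.
    return {s.get("region"): int(s.text) for s in root.iter("sales")}

totals = sales_by_region(ANNUAL_REPORT)
print(totals)
```

Once the numbers are in a dictionary like this, comparing them with last year's report or loading them into a spreadsheet is ordinary data processing.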

Our added value is bringing XML to the end user, to democratize this XML
file that they don’t know how to deal with. We really believe our
customers now have software that is very easy to use. A few years ago, the
first layer of XML that was adopted was the back-end. Now it is
XML-enabled. The second layer was connectivity, which is done with SOAP
and the Web Services standards. Now, finally, we are bringing in the front
end, the missing piece: on the desktop we have an entire suite which can
read and write this core data model that we adopted three years ago with
.NET. We are bringing the front-end to all these XML Web services that all
the back-ends in the world are going to output.

Q: What role does the new InfoPath application play in Office 2003?

How do you gather information today? For example, a sales report? You do
it with forms. But the customers told us that classic form technology was
too hard to use or too rigid. If you use form technologies today, you may
have a few fields like name or address. Then you have two or three lines
where you can put your customers. If by accident you have more than three
customers, then you need to use another piece of paper. Forms do not grow.
You do not have spell checking, you cannot add an image. The form is very
static; it looks like paper and you can’t add or remove anything.

So the customers said forms are too rigid and customers really love
documents, like those created by a word processor. But document technology
does not have the validation functionality that you have in a form. What
we did is we went and created a new kind of product. InfoPath is a hybrid
product that takes the best of documents and the best of forms technology.
Think about a form that is growing, a hierarchy of fields inside of fields
inside of fields.

You really need to think of InfoPath as a sophisticated XML authoring tool
which follows a customer-defined schema. When the rules of that schema are
followed, it creates a valid document. The whole thing was to create a
user interface that follows the schema and lets you only create a document
which is valid against the schema.
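
The idea of a growing form that still has to be valid can be sketched in a few lines of Python. This is only a stand-in: a real deployment would use an XSD file and a validating parser, and the `RULES` table, the tag names and the `validate` function below are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Stand-in for a schema: allowed child tags and whether each may repeat.
# (Hypothetical; a real schema would be an XSD file.)
RULES = {"name": False, "customer": True}  # tag -> repeatable?

def validate(xml_text):
    """Return a list of problems; an empty list means the form is valid."""
    root = ET.fromstring(xml_text)
    errors = []
    for child in root:
        if child.tag not in RULES:
            errors.append(f"unexpected element <{child.tag}>")
    for tag, repeatable in RULES.items():
        count = len(root.findall(tag))
        if count == 0:
            errors.append(f"missing element <{tag}>")
        elif count > 1 and not repeatable:
            errors.append(f"element <{tag}> may appear only once")
    return errors

# A form with four customers is fine: the repeating section simply grows.
grown = ("<report><name>A</name><customer>W</customer><customer>X</customer>"
         "<customer>Y</customer><customer>Z</customer></report>")
print(validate(grown))
```

An editing tool built on rules like these would never offer an action that produces a non-empty error list, which is the behavior described above.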

We really wanted to put the XML authoring in the hands of the end user,
the mass market. The user interface is in terms of menus and clicks and is
very familiar to the end user, while still following the schema. The end
user just knows that he is clicking on a few things and sees a form being
created.

Q: How do you design a “docuform” with InfoPath?

The way it works is you define your own schema by creating an XSD file
[for instance, with Visual Studio .NET 2003] or connecting to an XML Web
Service. Then, with InfoPath, you open the schema in design mode and with
a drag-and-drop the user decides how he wants to design the actual form.
InfoPath then generates an XSLT file, which basically transforms the XML.
We ship 25 sample forms, or the user can also start from an empty form
with drag and drop controls and we create the schema for him. It’s really
about business processes, both organizational business processes like
human resources, sales data and inventory, and work group business
processes like status reports, sales data collection and procurement.
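
What "transforms the XML" means can be shown with a plain Python function standing in for the generated XSLT. This is not InfoPath's output and not XSLT itself; the status report, its tag names and `render_html` are hypothetical. The sketch only shows the division of labor: the XML file holds the data, and a separate transformation produces the view.

```python
import xml.etree.ElementTree as ET

# Hypothetical status report holding data only, no presentation.
STATUS_REPORT = (
    "<statusReport><author>Ana</author>"
    "<item>Shipped beta</item><item>Fixed login bug</item></statusReport>"
)

def render_html(xml_text):
    """Build an HTML view from the data, as a stylesheet would."""
    data = ET.fromstring(xml_text)
    view = ET.Element("div")
    ET.SubElement(view, "h1").text = "Status report by " + data.findtext("author")
    items = ET.SubElement(view, "ul")
    for item in data.findall("item"):
        ET.SubElement(items, "li").text = item.text
    return ET.tostring(view, encoding="unicode")

html = render_html(STATUS_REPORT)
print(html)
```

Because the view is generated, the same data file could be given a different layout without touching the data.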
