With the beta release of Microsoft Office 2003 out the door earlier this
week, many customers got their first look at what Microsoft hopes will
rewrite the office productivity landscape: a new ecosystem of collaborative
functionality based on XML. But will organizations have to buy into an
entirely Microsoft architecture to tap it?
They will, contends Gary Edwards, a Web application design consultant and
OpenOffice.org’s representative on the OASIS Open Office XML Format
Technical Committee.
Edwards said the way the Office 2003 beta handles the XML file format means
firms will not be able to tap the rich collaborative features of Office
2003 without resorting to proprietary Microsoft file formats. And to truly
unlock its collaborative potential, he said, firms will have to standardize
on the Windows XP operating system (Office 2003 won’t run on Windows 9x), as
well as Windows Server 2003, SharePoint Server, Exchange Server and so on.
As for the file formats, he called Office 2003’s XML support “crippled”
because, he said, it strips all presentation and formatting information
from files saved in the XML file format, something it does not do when
saving files in Microsoft’s proprietary formats.
“Although it’s still early in the review process, it does look as though XP
XML has been so seriously crippled as to be useless to anyone but the big
content management and collaboration system providers,” Edwards said.
“Reports are that when saving to XML, [Office 2003] strips out the
presentation and formatting information, leaving near raw content. It
appears, at least from the non-enterprise systems user’s perspective, that
all the really cool collaborative advantages are based on saving files in
the XP proprietary format. Which means that ‘all’ the users in the
collaborative effort must be on the XP platform, using XP Office,
connecting through XP servers. What kind of universal connectivity and
exchange is that? XP users won’t even be able to collaborate equally with
the 200 million Win9x users. Not unless they upgrade.”
However, Mark McWilliams, a software engineer and Office 2003 beta tester, said he has seen nothing to indicate that Office 2003 removes formatting information from files saved as .xml. He said he had opened a heavily formatted Microsoft Word .doc file, saved it as XML, and later reopened the XML file in Word 2003.
“The opened XML document looks exactly like the original .doc file,” he said. “And if I open up the XML file in a text editor, I can see that all of the formatting is properly maintained in the XML file.”
He also noted that when saving a file, a user has the option of saving in a “data only” XML format, which does remove formatting.
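The two save options McWilliams describes can be illustrated with a toy example. The sketch below assumes a hypothetical markup in which formatting lives in attributes (`font`, `size`, `bold`, `italic`, `color`); it is not Word 2003’s actual XML vocabulary. A “data only” save is modeled as dropping those attributes while keeping the text and any non-presentation attributes such as `id`.

```python
import xml.etree.ElementTree as ET

# Hypothetical presentation attributes; NOT Word 2003's real XML vocabulary.
PRESENTATION_ATTRS = {"font", "size", "bold", "italic", "color"}

# A "full fidelity" document: content plus formatting hints.
FULL = ('<doc><p id="p1" font="Arial" size="14" bold="true">Quarterly report</p>'
        '<p italic="true">Revenue rose 4%.</p></doc>')

def data_only(xml_text: str) -> str:
    """Return the document with presentation attributes removed.

    Text content and non-presentation attributes (such as id) are kept.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Compute the intersection first so we never mutate while iterating.
        for name in PRESENTATION_ATTRS & set(elem.attrib):
            del elem.attrib[name]
    return ET.tostring(root, encoding="unicode")

print(data_only(FULL))
```

The content survives the “data only” save; only the hints about how to display it are gone.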
Still, Microsoft acknowledged that it is possible that some formatting information could be lost when saving to an XML file format. “If you save something in raw XML format, you may lose some of the really rich formatting like graphics,” a Microsoft spokesman said. “That’s inherent in the way XML works though (not just in Office 2003). It saves the data and some information about how that data is being presented/formatted in order to make the information easier to manipulate, search and reuse.”
Ronald Schmelzer, founder and senior analyst of XML research firm ZapThink,
also noted that Microsoft’s approach, if Edwards is correct, would align
with a core tenet of XML theory: the separation of data from the way that
data is processed.
“The idea is for XML not to specify how the information should be
processed, but rather leave that task to XSL
processing steps,” he said. “XML is supposed to be a presentation-neutral
format.”
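Schmelzer’s point about presentation-neutral data can be sketched as follows. The same hypothetical contact list is rendered two different ways by separate transformation functions. In practice that step would typically be an XSL stylesheet (for example, applied through `lxml.etree.XSLT`), but this sketch uses plain Python functions from the standard library as a stand-in; the element names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Presentation-neutral data: no fonts, colors or layout, just content.
DATA = """<contacts>
  <contact><name>Ada</name><phone>555-0100</phone></contact>
  <contact><name>Lin</name><phone>555-0199</phone></contact>
</contacts>"""

def to_html(xml_text: str) -> str:
    """One presentation of the data: an HTML table (stand-in for XSL)."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        f"<tr><td>{escape(c.findtext('name', ''))}</td>"
        f"<td>{escape(c.findtext('phone', ''))}</td></tr>"
        for c in root.iter("contact")
    )
    return f"<table>{rows}</table>"

def to_text(xml_text: str) -> str:
    """Another presentation of the same data: 'name: phone' lines."""
    root = ET.fromstring(xml_text)
    return "\n".join(
        f"{c.findtext('name', '')}: {c.findtext('phone', '')}"
        for c in root.iter("contact")
    )

print(to_text(DATA))
```

Because the XML carries no presentation, adding a third output format means writing a new transformation, not changing the data.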
Still, Schmelzer said the picture gets trickier when integration extends
beyond the enterprise itself.
“I think when it goes beyond intra-business integration to cross-industry
and inter-organization integration, the question will be how much of the
data they exchange do they want loaded with presentational and operational
functionality and how much do they want to leave to the individual
implementation of the company?” he said. “This is really not an answerable
question — because it depends on the scenario. The problem with standards
is that there are so many of them. The resolution here is to look at how
companies and industries will adopt XML in their verticals and then
determine which aspects of that should be embodied in standards and which
should be embodied in products. Experience shows that companies and
industries can hardly agree on the data, let alone the representation, so
erring on the side of ‘less’ in the XML body makes more sense.”
The Application-File Format Model
Edwards’ point depends on some understanding of how XML is putting pressure
on the traditional “application-file format” model. In the traditional
standalone model, the application provider sets an application programming
interface (API) through which users can customize the application. Users
can alter the application to suit their needs only through that API, and
the vendor’s “permission” policy determines who gets access to the API and
to the file format structure. Next-generation XML-enabled applications,
however, could lead to a drastic power shift by putting much of the control
traditionally reserved for the vendor into the hands of the user.
“Next generation XML-enabled desktop applications will need to march to the
beat of a different drummer,” Edwards said. “The tightly bonded
application-file format model is being replaced by a loosely-coupled three
part model comprised of the application, XML standard schema templates, and
XML standard file formats.”
At the center of that model is an XML standard file format that is natively
portable between proprietary applications. Application providers can use
the standard format to easily configure their applications to import and
export conforming compound documents.
“Today’s clumsy import/export process has great difficulty accurately
mapping content, presentation and formatting components,” Edwards said.
“And forget about anything having to do with intelligent or live document
files that might contain business logic, routing, processing, transaction
and user interface instructions.
“Perhaps the most important factor relating to standard XML file formats is
that of human readable tags and standard processing techniques. With a
proprietary file format, users had to either get special permission from
the application vendor, or reverse engineer the binary format in order to
work with the files in ways that met their specific needs (if those needs
went beyond what the app vendor offered).”
An XML standard file format, by contrast, lets users construct scripting
machines and transformation procedures without vendor permission, Edwards
said. Combine that with a community of developers creating tools, machines
and advanced transformation procedures based on a standard file format, he
said, and power would shift from the application vendor to the owner of
the information.
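As a rough illustration of working with a document without vendor tooling, the sketch below reads paragraph text straight out of a zipped XML package. OpenOffice.org’s own format stores a document as a zip archive whose body is in a member named `content.xml`; the paragraph namespace used here (`urn:example:text`) and the function name `extract_paragraphs` are hypothetical, so substitute the real namespace of whatever format you are reading.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def extract_paragraphs(package, text_ns: str) -> list[str]:
    """Pull paragraph text out of a zipped XML document package.

    Assumes (hypothetically) that the body lives in 'content.xml' and that
    paragraphs are <p> elements in the namespace `text_ns`.
    """
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    # ElementTree expands prefixed names to '{namespace}localname'.
    return ["".join(p.itertext()) for p in root.iter(f"{{{text_ns}}}p")]

# Build a tiny in-memory package to demonstrate (hypothetical namespace).
NS = "urn:example:text"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("content.xml",
                f'<body xmlns:t="{NS}"><t:p>Hello</t:p><t:p>World</t:p></body>')
buf.seek(0)
print(extract_paragraphs(buf, NS))  # ['Hello', 'World']
```

Nothing here depends on the application that wrote the file: a zip reader and an XML parser are enough.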
“Some people think that XML technologies are a gnarly swarm of human
readable standards seriously lacking in the performance efficiencies of
traditional binary files,” Edwards said. “The whole point, however, is to
empower users by giving them direct access to an open file format so that
they can mine, re-use and re-purpose information any way they can think of.
Plus, the standardization of the file formats and related XML
transformation technologies means that powerful machines can be constructed
to service advanced content management and collaboration needs without
having to beg the application vendor for permission or future
enhancements.”
Schema Templates
XML schema templates are another important piece of the puzzle, and they
take over much of what has been the domain of the API. A template carries
business process and processing intelligence instructions, and can also
tell an application how to present the user interface and where to find
network components such as Web services, data stores and Java computation.
The difference? Schema templates are created by users, such as vertical
industry consortiums, rather than by application vendors.
“Anywhere there is a defined business process, transaction process, or
collaborative effort, there will probably be a shared schema template
defining that process,” Edwards said. “In particular, vertical industries
and global business trading partners are perfecting schema templates of all
sorts, in efforts to streamline the transaction, exchange and interaction
between disparate information systems.”
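A shared template lets every partner check incoming documents against the same rules. Real consortiums publish such rules as XML Schema (XSD) files, which Python’s standard library cannot validate, so the sketch below is a hand-rolled stand-in: a hypothetical purchase-order template expressed as a dict, meant only to show the idea of checking a document against agreed structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared template for a purchase order. A real consortium
# would publish this as an XML Schema; this dict is only a stand-in.
PO_TEMPLATE = {
    "root": "purchaseOrder",
    "required": ["buyer", "seller", "item", "quantity"],
    "numeric": ["quantity"],
}

def conformance_errors(xml_text: str, template: dict) -> list[str]:
    """Return a list of ways the document breaks the template (empty if none)."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != template["root"]:
        errors.append(f"root is <{root.tag}>, expected <{template['root']}>")
    for name in template["required"]:
        if root.find(name) is None:
            errors.append(f"missing <{name}>")
    for name in template["numeric"]:
        value = root.findtext(name)
        if value is not None and not value.strip().isdigit():
            errors.append(f"<{name}> must be a whole number")
    return errors
```

Every trading partner running the same check against the same template is what makes the template “shared”.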
He added, “Using XML, businesses can describe a transaction process in
terms that are machine readable and actionable. XML information conforming
to open standards can easily be transformed or translated from a shared
business process, back and forth, into the many disparate information
systems, enabling these legacy systems and data silos to directly
participate and interact at the point of transaction.
“Prior to this evolution, the only way to effectively interact and exchange
information was to standardize on a specific platform, using specific
applications (including exact version synchronization), and specific file
formats. Literally everyone had to agree on the same proprietary stack, top
to bottom.”
But XML schemas can eliminate that need by allowing organizations to keep
legacy systems while still connecting and collaborating with anyone and
everyone.
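Translating shared XML into a legacy system’s format and back is what lets older systems stay in place. Here is a minimal sketch, assuming a hypothetical legacy system that stores purchase orders as fixed-width text records; the field layout and element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy record layout: (field name, column width).
LEGACY_LAYOUT = [("buyer", 10), ("item", 12), ("quantity", 5)]

def xml_to_legacy(xml_text: str) -> str:
    """Map a shared XML purchase order onto a fixed-width legacy record."""
    root = ET.fromstring(xml_text)
    # Pad each field to its width; values that are too long get truncated.
    return "".join(
        root.findtext(name, "").ljust(width)[:width]
        for name, width in LEGACY_LAYOUT
    )

def legacy_to_xml(record: str) -> str:
    """Map a fixed-width legacy record back into the shared XML form."""
    root = ET.Element("purchaseOrder")
    pos = 0
    for name, width in LEGACY_LAYOUT:
        ET.SubElement(root, name).text = record[pos:pos + width].rstrip()
        pos += width
    return ET.tostring(root, encoding="unicode")
```

Fields longer than their column are silently truncated by `xml_to_legacy`, the kind of lossy mapping a real integration would need to guard against.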
Pulling It Together
The essential point of all this, according to Edwards, is that the schema
template actually determines the structure of the file format. For this to
work, he said, the application must be able to pass that schema template
along intact, which it can’t do if the file is stripped of presentation and
formatting information.
Schmelzer, however, said Edwards’ concern may be a bit of an overreaction,
given that most collaboration and integration still happens inside the
enterprise.
“XML and Web services use, especially for content-driven applications, is
still very much limited to basic use of XML as a data-exchange mechanism
between systems — primarily for internal integration approaches,” he said.
“When dealing with exchanging information internally, what is most
important is not to bundle all collaborative features into making for a
huge, cumbersome XML file that only certain applications can process, but
rather to strip out all the presentation layer features and focus on just
the data to be exchanged. In this case, I don’t see how Microsoft is
violating that. You can choose to save a document with all the rich
presentation data left in, if you choose (and that data will only be
processable by Office applications), or you can choose to save the XML with
just the data in it. I don’t see how that cripples anything.”