Showcasing its continued interest in mobile computing, IBM
Monday unwrapped new tools and middleware for the
development of multimodal technology, which allows users to use multiple forms of input and output — including voice, keypads
and stylus — interchangeably in the same interaction.
IBM believes the technology could be key to the uptake of 3G wireless services because it would enable a new class of applications
that will allow users to solve business problems while also using more minutes, an important consideration for carriers. However,
Sunil Soares, director of product management for IBM’s Pervasive Computing Division, noted that the technology is not dependent on 3G networks.
A prime example of the technology is a solution for the brokerage industry, which would allow customers to call in and request
trading account balances by voice, but receive the response on their devices in text. Another example is filling in a travel form on
a mobile device. The user could use voice to request a list of “New York to Paris” flights, and then use voice or stylus
interchangeably to finish the booking.
“As computing gets embedded into our everyday lives and moves from PCs to wide varieties of devices, we’ll need new, flexible ways
to interact with technology,” said Rod Adkins, general manager of IBM Pervasive Computing. “Multimodal interaction allows us to
access enterprise applications — from databases to inventory, financial and sales information — in ways that are convenient to us,
instead of forcing us to adapt to technology. By adding multimodal capability into our middleware, we aim to make it easier for our
customers to make multimodality part of their infrastructure.”
To make that vision a reality, IBM said it will release a multimodal toolkit for developers, and add multimodal capabilities to its
WebSphere Everyplace Access (WEA).
Both offerings will be based on the X+V mark-up language, a combination of XHTML and VoiceXML, developed by IBM, Motorola and Opera and
submitted to the World Wide Web Consortium (W3C) last year. The W3C acknowledged the specification in December and has formed a
Multimodal Working Group which currently has X+V under consideration.
“The beauty of X+V is it lets you operate in a voice-only environment, it lets you operate in a visual-only environment, and if you want
you can operate in a multimodal environment,” Soares said.
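To make that combination concrete, here is a rough sketch of what an X+V page for the flight-booking example might look like. The namespace URIs follow the early XHTML and VoiceXML drafts, but the field names, event binding and structure shown here are illustrative assumptions, not a sample from IBM's toolkit.

```xml
<?xml version="1.0"?>
<!-- Illustrative X+V sketch: the XHTML body provides the visual form,
     while an embedded VoiceXML fragment in the head provides the
     spoken dialog for the same field. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Flight search</title>
    <!-- VoiceXML fragment: prompts the user for a destination by voice -->
    <vxml:form id="voice_city">
      <vxml:field name="city">
        <vxml:prompt>Which city are you flying to?</vxml:prompt>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <form action="/search">
      <!-- XML Events binding (hypothetical): focusing the text field
           activates the voice dialog, so the user can type or speak -->
      <input type="text" name="city"
             ev:event="focus" ev:handler="#voice_city"/>
      <input type="submit" value="Find flights"/>
    </form>
  </body>
</html>
```

Because the voice and visual fields refer to the same data, the idea is that a voice-only browser interprets just the VoiceXML fragment, a visual-only browser ignores it, and a multimodal browser honors both.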
The multimodal toolkit, built on IBM’s WebSphere Voice Toolkit, is scheduled for release in the fall. It will contain a multimodal
editor, allowing developers to write XHTML and VoiceXML in the same application; reusable blocks of X+V code; and a simulator to
test the applications. The toolkit will also provide Eclipse-based plug-ins for an existing WebSphere development environment.
IBM said the tools will allow developers to speech-enable existing or new visual applications with X+V speech tags.
Meanwhile, Big Blue plans to add multimodal capabilities to WEA by the first half of 2003, allowing businesses to let their users
access business applications, such as databases and customer relationship management systems, through multimodal devices.