IBM Takes Wraps Off Multimodal Toolkit

Continuing its efforts to stake a claim in extending e-business from
the desktop to mobile devices, and to get a leg up on Microsoft, IBM on
Wednesday rolled out the general availability release of its Multimodal
Toolkit for WebSphere Studio, a developer environment based on the
XHTML+Voice (X+V) specification that lets developers build multimodal user
interfaces.

IBM unveiled its plans for the toolkit last July, and delivered a beta in the fall.

Multimodal technology lets users enter information in an application with
voice commands, a keypad or a stylus, switching among them freely. The
X+V specification, submitted jointly to the World Wide Web Consortium (W3C)
by IBM, Motorola and Opera Software, combines XHTML and VoiceXML.

IBM sees multimodal technology as key to the uptake of 3G wireless services
because it would enable a new class of applications that let users solve
business problems while also using more airtime minutes, an important
consideration for carriers. However, the firm has also noted that the
technology does not depend on 3G.

A prime example of the technology is a solution for the brokerage industry,
which would allow customers to call in and request trading account balances
by voice, but receive the response on their devices in text. Another
example is filling in a travel form on a mobile device. The user could use
voice to request a list of “New York to Paris” flights, and then use voice
or stylus interchangeably to finish the booking.

“As computing continues to extend from PCs onto devices, we’re expecting
more from our devices,” said Eugene Cox, director of mobile solutions at
IBM Pervasive Computing. “X+V is based on standards that voice and Web
developers are already familiar with so that enterprises can leverage
invested skills to extend existing applications. Standards form the
backbone of IBM’s multimodal strategy and are critical in ensuring that
these heterogeneous devices and communication modes work together.”


In addition to creating multimodal interfaces, Big Blue said the new
toolkit will let developers convert existing voice-only and Web-only
applications into multimodal applications, bringing new flexibility to
devices such as PDAs, phones, appliances and in-car telematics systems.


The Multimodal Toolkit is an Integrated Development Environment (IDE) based
on the open source Eclipse framework. The toolkit includes:

  • A multimodal editor for writing both XHTML and VoiceXML in the same
    application via X+V
  • Reusable blocks of X+V code
  • A simulator, based on Opera 7 for Windows, for testing
    applications

Working in conjunction with WebSphere Everyplace Multimodal Environment for
Embedix, which is based on Opera technology, the Multimodal Toolkit gives
developers the ability to bring IBM's ViaVoice speech recognition (ASR)
and text-to-speech (TTS) engines together on one device. That in turn will
let users obtain and manage information in whichever way best suits the
situation at hand, whether spoken or visual.


Embedix is a version of Linux geared toward set-top boxes, PDAs and other
small devices.

IBM is competing with fellow software development heavyweight Microsoft
for the space. Microsoft recently unveiled the third beta of its
forthcoming Speech Application Software Development Kit (SASDK), based on
the Speech Application Language Tags (SALT) specification and designed to
work with Microsoft’s Visual Studio .NET 2003 development environment.

With the general availability release of the Multimodal Toolkit, IBM has a
chance to capture a chunk of the market before Microsoft's offering emerges
from beta.
