RealTime IT News

IBM Takes Wraps Off Multimodal Toolkit

Continuing its efforts to capture the market for extending e-business from the desktop to mobile devices -- and get a leg up on Microsoft -- IBM Wednesday rolled out the general availability release of its Multimodal Toolkit for WebSphere Studio, a development environment, based on the XHTML+Voice (X+V) specification, for creating multimodal user interfaces.

IBM unveiled its plans for the toolkit last July, and delivered a beta in the fall.

Multimodal technology gives users the ability to use voice commands, keypads and a stylus interchangeably to enter information into an application. The X+V specification, submitted jointly to the World Wide Web Consortium (W3C) by IBM, Motorola and Opera Software, combines XHTML and VoiceXML.
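To give a sense of how the two markups interleave, here is a minimal sketch of an X+V page, modeled on the published drafts of the specification. The field names, grammar URI and handler wiring are illustrative assumptions, and exact spellings varied between draft versions:

    <?xml version="1.0" encoding="UTF-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:vxml="http://www.w3.org/2001/vxml"
          xmlns:ev="http://www.w3.org/2001/xml-events">
      <head>
        <title>Multimodal sketch</title>
        <!-- VoiceXML dialog: prompts the user, then copies the
             spoken result into the visual text field below -->
        <vxml:form id="voice_city">
          <vxml:field name="city">
            <vxml:prompt>Which city are you flying to?</vxml:prompt>
            <!-- hypothetical SRGS grammar file -->
            <vxml:grammar src="cities.grxml" type="application/srgs+xml"/>
            <vxml:filled>
              <vxml:assign name="document.getElementById('city').value"
                           expr="city"/>
            </vxml:filled>
          </vxml:field>
        </vxml:form>
      </head>
      <body>
        <form action="book.jsp">
          <!-- XML Events wiring: giving the field focus activates
               the voice dialog declared in the head -->
          <input type="text" id="city" name="city"
                 ev:event="focus" ev:handler="#voice_city"/>
        </form>
      </body>
    </html>

Because the voice dialog and the visual form live in one document and share one result, either modality can fill the field.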

IBM sees multimodal technology as key to the uptake of 3G wireless services because it would enable a new class of applications that let users solve business problems while also consuming more minutes -- an important consideration for carriers. However, the firm has also noted that the technology is not dependent on 3G.

A prime example of the technology is a solution for the brokerage industry that would allow customers to call in and request trading account balances by voice, but receive the response on their devices as text. Another example is filling in a travel form on a mobile device: the user could ask by voice for a list of "New York to Paris" flights, then switch between voice and stylus to finish the booking.
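That interchangeability is what X+V's synchronization mechanism is for. The drafts of the specification define a sync element in the X+V namespace that keeps a VoiceXML field and an XHTML control in step. The fragment below is a sketch with hypothetical IDs, and the attribute spellings should be read as a reconstruction from the draft rather than a definitive reference:

    <!-- assumes xmlns:xv="http://www.voicexml.org/2002/xhtml+voice" -->
    <head>
      <vxml:form id="voice_flight">
        <vxml:field id="voice_from" name="from">
          <vxml:prompt>Where are you flying from?</vxml:prompt>
          <vxml:grammar src="cities.grxml" type="application/srgs+xml"/>
        </vxml:field>
      </vxml:form>
      <!-- whichever modality supplies the value first,
           the other is kept consistent -->
      <xv:sync xv:input="#from" xv:field="#voice_from"/>
    </head>
    <body>
      <form action="book.jsp">
        From: <input type="text" id="from" name="from"/>
      </form>
    </body>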

"As computing continues to extend from PCs onto devices, we're expecting more from our devices," said Eugene Cox, director of mobile solutions at IBM Pervasive Computing. "X+V is based on standards that voice and Web developers are already familiar with so that enterprises can leverage invested skills to extend existing applications. Standards form the backbone of IBM's multimodal strategy and are critical in ensuring that these heterogeneous devices and communication modes work together."

In addition to creating multimodal interfaces, Big Blue said the new toolkit will allow developers to convert existing voice-only and Web-only applications into multimodal applications, bringing new flexibility to PDAs, phones, appliances and in-car telematics systems.

The Multimodal Toolkit is an Integrated Development Environment (IDE) based on the open source Eclipse framework. The toolkit includes:

  • A multimodal editor for writing both XHTML and VoiceXML in the same application via X+V
  • Reusable blocks of X+V code
  • A simulator, based on Opera 7 for Windows, for testing applications

Working in conjunction with WebSphere Everyplace Multimodal Environment for Embedix, which is based on Opera technology, the Multimodal Toolkit lets developers bring IBM's ViaVoice speech recognition (ASR) and text-to-speech (TTS) engines together on one device. That, in turn, allows users to obtain and manage information in the manner best suited to the situation at hand, whether spoken or visual.

Embedix is a version of Linux geared for set-top boxes, PDAs and other small devices.

IBM is competing with fellow software development heavyweight Microsoft for the space. Microsoft recently unveiled the third beta of its forthcoming Speech Application Software Development Kit (SASDK), which is based on the Speech Application Language Tags (SALT) specification and designed to work with Microsoft's Visual Studio .NET 2003 development environment.

Still, with the release of Multimodal Toolkit, IBM has the chance to capture a chunk of the market before Microsoft's offering emerges from beta.