RealTime IT News

Big Blue: Talk with a Web Site Via Phone

IBM on Wednesday will release a voice toolkit designed to let developers with little voice-recognition experience build speech-enabled applications that can be reached by phone and other mobile devices.

Released as a beta under Big Blue's WebSphere brand, the toolkit supports the open VoiceXML standard developed by IBM, AT&T, Motorola and Lucent Technologies Inc. In its simplest form, VoiceXML provides access to the Web through the telephone or a voice-driven browser. Say, for example, that you are lost. To get back on track, you might dial into the MapQuest Web site and ask for directions over the phone. Your spoken request triggers a query to the site, and the result is converted back into a spoken message that gives you the information you asked for.
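To make that round trip more concrete, here is a minimal sketch of the kind of VoiceXML page a voice browser might fetch for the lost-driver example: it speaks a prompt, listens for a city name, and submits the answer back to a Web server. It assumes VoiceXML 2.0 element names (the version still being drafted at the time of writing) and uses Python's standard xml.etree.ElementTree to generate the markup. The function name, the city list and the submit URL are hypothetical, and nothing here reflects MapQuest's or IBM's actual services.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML 2.0 specification.
VXML_NS = "http://www.w3.org/2001/vxml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serializes as xml:lang


def build_directions_dialog(submit_url: str, cities: list[str]) -> str:
    """Build a minimal VoiceXML 2.0 page (hypothetical example): prompt the
    caller for a city, match the answer against an inline grammar, submit it."""
    # Declaring xmlns as a plain attribute keeps the element names unprefixed.
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS, XML_LANG: "en-US"})
    form = ET.SubElement(vxml, "form", {"id": "directions"})
    field = ET.SubElement(form, "field", {"name": "destination"})

    # Text the voice browser reads to the caller via speech synthesis.
    ET.SubElement(field, "prompt").text = "Where would you like to go?"

    # Inline grammar: the recognizer accepts exactly one of the listed cities.
    grammar = ET.SubElement(field, "grammar", {
        "type": "application/srgs+xml", "version": "1.0",
        "root": "city", "mode": "voice",
    })
    rule = ET.SubElement(grammar, "rule", {"id": "city"})
    one_of = ET.SubElement(rule, "one-of")
    for city in cities:
        ET.SubElement(one_of, "item").text = city

    # Once the field is filled, send the recognized value to the Web server,
    # which could reply with another VoiceXML page that speaks the directions.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "destination"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    print(build_directions_dialog("http://example.com/directions", ["Boston", "Chicago"]))
```

The page itself holds no directions; like an HTML form, it only collects input and hands it to a server, which is what lets an existing Web back end answer a phone caller.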

What the toolkit does, then, is make it easier for developers who want to work with VoiceXML to build new applications without years of experience in voice technology. The toolkit contains building blocks, such as VoiceXML program editors, grammar editors, pronunciation builders and VoiceXML Reusable Dialog Components, that help programmers build the voice applications they want without having to master details such as recognition grammars and phonetics.

Sunil Soares, product management director of IBM Voice Systems, may have been overstating it a bit when he said you "no longer need a team of rocket scientists to build a voice application," but his underlying point is that the voice front end can be integrated with existing back-end systems.

"This means, for example, that a developer can seamlessly link a phone-based, voice-enabled stock quote or transaction service with middleware that it is already running on," Soares said.

A toolkit by itself is hardly a shot at the competition, but Mark Plakias, senior vice president of Voice & Wireless Commerce at The Kelsey Group, told InternetNews.com that IBM deserves credit for producing well-formed, functional VoiceXML while other firms are talking the talk but not walking the walk.

"This IBM announcement is a wake-up call to the applications developer community, to say that they can generate industry-standard well-formed VoiceXML using familiar drag-and-drop tools that they have been using for Java and the widely-deployed WebSphere application server," Plakias said.

As for VoiceXML itself, its backers are currently hashing out a second version. Gerald Karam, department head of Innovative Services Research at AT&T Labs, explained the major differences between versions 1.0 and 2.0, noting that version 2.0 will add a speech grammar markup language and a speech synthesis markup language:

"In version 1.0, we made no commitment in terms of what we thought the right answer was because, to be honest, we didn't know what the right answer was," Karam explained in the July/August issue of Speech Technology Magazine. "They were still emerging as agreed-upon specifications. In 2.0 you have a more complete description of how to specify synthesized speech. More importantly, the inclusion of a speech recognition markup language gives a common way of representing speech and DTMF grammars. That was something that was simply left unsaid in VoiceXML 1.0, which meant that a lot of vendors could have their own proprietary ways of specifying it. Those implementations would be incompatible with one another because they were assuming somebody's proprietary specification of the grammar (e.g. Nuance, Philips, SpeechWorks). This isn't the worst place to have incompatibilities because you can convert from one to another in a relatively straightforward way but it was a weak point in what we had done. Version 2.0 remedies that by coming up with a standard way of describing speech grammars."
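Karam's point about a standard grammar format is easier to see with a concrete grammar. The sketch below assumes the XML form of the W3C Speech Recognition Grammar Specification (SRGS) 1.0, namespace http://www.w3.org/2001/06/grammar, and uses Python's standard xml.etree.ElementTree to write the same kind of one-of rule once for spoken words and once for touch-tone (DTMF) keys. The function names are hypothetical, and accepts() is a toy lookup that only illustrates what the grammar permits; it is not a speech recognizer or any vendor's API.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serializes as xml:lang


def srgs_one_of(rule_id: str, choices: list[str], mode: str) -> str:
    """Return a standalone SRGS XML grammar with a single one-of rule.
    mode is "voice" for spoken words or "dtmf" for touch-tone keys."""
    attrs = {"xmlns": SRGS_NS, "version": "1.0", "root": rule_id, "mode": mode}
    if mode == "voice":
        # Speech grammars declare a language; DTMF grammars do not need one.
        attrs[XML_LANG] = "en-US"
    grammar = ET.Element("grammar", attrs)
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for choice in choices:
        ET.SubElement(one_of, "item").text = choice
    return ET.tostring(grammar, encoding="unicode")


def accepts(grammar_xml: str, user_input: str) -> bool:
    """Toy check, not a recognizer: is the input one of the root rule's items?"""
    root = ET.fromstring(grammar_xml)
    ns = {"g": SRGS_NS}
    root_id = root.get("root")
    rule = root.find(f"g:rule[@id='{root_id}']", ns)
    items = {item.text.strip() for item in rule.iterfind(".//g:item", ns)}
    return user_input.strip() in items


if __name__ == "__main__":
    voice = srgs_one_of("answer", ["yes", "no"], "voice")
    dtmf = srgs_one_of("answer", ["1", "2"], "dtmf")
    print(voice)
    print(accepts(voice, "yes"), accepts(dtmf, "3"))
```

The same markup describes both input modes, differing only in the mode attribute and the item text, which is the kind of vendor-neutral description Karam says VoiceXML 1.0 left unspecified.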

As for how VoiceXML could help everyday consumers, one could tap into a movie-schedule site to get film start times. Just think: no waiting because the theater's phone line is logjammed. Instead, the site would recite the times back as though you were listening to a recorded voice mail message.

"The human angle [to VoiceXML] is that there is going to be some magic that occurs when what you're doing is not done on the browser but on the phone," said Plakias.

Still, Plakias said that despite these developments, the evolution of VoiceXML, much like the turbulent transition from second- to third-generation wireless communications, is a "messy process" that will require some serious tweaking.

The beta of WebSphere Voice Toolkit is available as a free download through IBM's alphaWorks Web site and will be generally available in the fall.

In summary, the kit includes the following:

  • Integrated development environment (IDE) - runs on the desktop and brings together the elements required to create a speech application
  • VoiceXML editor - provides syntax checking and content assistance
  • Grammar editor - provides syntax checking and content assistance
  • Pronunciation builder - generates a pronunciation from text and also lets developers create pronunciations manually
  • Audio recorder - records spoken text as audio files and plays back previously recorded audio files
  • VoiceXML Reusable Dialog Components - prewritten VoiceXML code to use as building blocks for application functions such as names, addresses, URLs, and credit card information

For more on the topic of VoiceXML, be sure to visit VoiceXMLPlanet.com, one of the newest sites in the internet.com family of Web sites.