Big Blue: Talk with a Web Site Via Phone

IBM on Wednesday will release a voice toolkit designed to let developers with
little voice-recognition experience build speech-enabled applications that
can be reached from phones and other mobile devices.

Released as a beta under Big Blue’s WebSphere brand, the toolkit supports
the open VoiceXML standard developed by IBM, AT&T, Motorola and Lucent
Technologies Inc. In its simplest form, VoiceXML provides access to the Web
through the telephone or a voice-driven browser. Say, for example, that you
are lost. To get back on track, you might dial into the MapQuest Web site and
ask for directions over the phone. Your spoken request triggers an XML query,
and the result is converted back into a voice message that gives you the
information you asked for.

The toolkit, then, is meant to let developers who want to work with VoiceXML
build new applications without years of experience in voice technology. It
contains building blocks, such as VoiceXML program editors, grammar editors,
pronunciation builders and VoiceXML Reusable Dialog Components, that aid the
development of voice applications and let programmers build the applications
they want without mastering details such as grammars and phonetics.

Sunil Soares, director of product management for IBM Voice Systems, may have
been overstating things a bit when he said you “no longer need a team of
rocket scientists to build a voice application,” but the voice front end can
indeed be integrated readily with existing back-end systems.

“This means, for example, that a developer can seamlessly link a
phone-based, voice-enabled stock quote or transaction service with
middleware that it is already running on,” Soares said.

A toolkit by itself is hardly a shot at the competition, but Mark Plakias,
senior vice president of voice and wireless commerce at The Kelsey Group,
said IBM deserves credit for producing well-formed, functional VoiceXML while
other firms are talking the talk but not walking the walk.

“This IBM announcement is a wake-up call to the applications developer
community, to say that they can generate industry-standard well-formed
VoiceXML using familiar drag-and-drop tools that they have been using for
Java and the widely-deployed WebSphere application server,” Plakias said.

As for VoiceXML itself, its backers are currently hashing out a second
version. Gerald Karam, department head of Innovative Services Research at
AT&T Labs, explained the major differences between versions 1.0 and 2.0,
noting that version 2.0 will include a speech grammar markup language and a
speech synthesis markup language:

“In version 1.0, we made no commitment in terms of what we thought the right
answer was because, to be honest, we didn’t know what the right answer was,”
Karam explained in the July/August issue of
Speech Technology Magazine. “They were still emerging as agreed-upon
specifications. In 2.0 you have a more complete description of how to
specify synthesized speech. More importantly, the inclusion of a speech
recognition markup language gives a common way of representing speech and
DTMF grammars. That was something that was simply left unsaid in VoiceXML
1.0, which meant that a lot of vendors could have their own proprietary ways
of specifying it. Those implementations would be incompatible with one
another because they were assuming somebody’s proprietary specification of
the grammar (e.g. Nuance, Philips, SpeechWorks). This isn’t the worst place
to have incompatibilities because you can convert from one to another in a
relatively straightforward way but it was a weak point in what we had done.
Version 2.0 remedies that by coming up with a standard way of describing
speech grammars.”
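To make Karam’s point about a common grammar format concrete, here is a minimal sketch in Python (standard library only) that generates a small VoiceXML 2.0 dialog with an inline grammar in the XML grammar format standardized alongside VoiceXML 2.0 (later published by the W3C as SRGS). It echoes the movie-schedule scenario described below. The form, field and rule names, the film titles and the submit URL are hypothetical illustrations; this is not output from IBM’s toolkit, and a real application would serve the document to a VoiceXML browser rather than print it.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C VoiceXML 2.0 Recommendation.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_showtimes_dialog(films, submit_url):
    """Build a minimal VoiceXML 2.0 dialog that asks the caller for a film,
    recognizes the answer with an inline XML-form grammar, and submits the
    recognized title to a (hypothetical) server-side script at submit_url."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "showtimes"})
    field = ET.SubElement(form, "field", {"name": "film"})

    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Which film would you like start times for?"

    # Inline grammar in the standard XML form rather than a
    # vendor-specific grammar format.
    grammar = ET.SubElement(field, "grammar", {
        "type": "application/srgs+xml",
        "version": "1.0",
        "root": "film_titles",
    })
    rule = ET.SubElement(grammar, "rule", {"id": "film_titles", "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for title in films:
        item = ET.SubElement(one_of, "item")
        item.text = title

    # Once the field is filled, send the recognized value to the back end,
    # which would reply with another VoiceXML page that speaks the times.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "film"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_showtimes_dialog(["Shrek", "Memento"],
                                 "http://example.com/showtimes"))
```

Because the grammar is written in one standard format, any conforming VoiceXML 2.0 platform should be able to load it without conversion, which is the incompatibility Karam describes version 2.0 removing.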

As for how VoiceXML could help everyday consumers, one could call a
movie-schedule site and get film start times. Just think: no waiting because
the theater’s phone line is jammed. Instead, the site would read the times
back to you as though you were listening to a recorded voicemail message.

“The human angle [to VoiceXML] is that there is going to be some magic that
occurs when what you’re doing is not done on the browser but on the phone,”
said Plakias.

Still, Plakias did say that regardless of new developments, the evolution of
VoiceXML, much like the turbulent changes of second-to-third-generation
wireless communications, is a “messy process” that will take some serious

The beta of WebSphere Voice Toolkit is available as a free download through
IBM’s alphaWorks Web site and
will be generally available in the fall.

In brief, the kit includes the following:

  • Integrated development environment (IDE) – runs on the desktop and
    provides the ability to address the elements required to create a speech
    application

  • VoiceXML editor – provides syntax checking and content assistance

  • Grammar editor – enables syntax checking and content assistance

  • Pronunciation builder – generates a pronunciation from text and also
    allows pronunciations to be created manually

  • Audio recorder – allows the creation of audio files from spoken text
    and provides the means to play a previously recorded audio file

  • VoiceXML Reusable Dialog Components – pre-written VoiceXML code to use
    as building blocks for application functions such as names, addresses,
    URLs, and credit card information

