RealTime IT News

W3C Unleashes VoiceXML 2.0

The World Wide Web Consortium (W3C) Tuesday delivered a cornerstone of its Speech Interface Framework when it published VoiceXML 2.0.

VoiceXML is intended to bring the advantages of Web-based development and content delivery to interactive voice response (IVR) applications. The specification allows developers to create audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF (touch-tone) key input, recording of spoken input, telephony, and mixed-initiative conversations.
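As a sketch of what such an audio dialog looks like in practice, the fragment below prompts a caller and collects one spoken choice; the form name, field name and prompt text are illustrative, not drawn from the specification itself:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- A form collects one piece of spoken (or DTMF) input from the caller -->
  <form id="order">
    <field name="drink">
      <prompt>Would you like coffee or tea?</prompt>
      <!-- An inline grammar lists the utterances the recognizer should expect -->
      <grammar type="application/srgs+xml" xmlns="http://www.w3.org/2001/06/grammar"
               version="1.0" root="drink">
        <rule id="drink">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
          </one-of>
        </rule>
      </grammar>
      <!-- Executed once the field has been filled by a recognized utterance -->
      <filled>
        <prompt>You asked for <value expr="drink"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

A voice browser interprets this document much as a visual browser interprets HTML, rendering the prompts as synthesized speech and listening for the responses the grammar describes.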

"VoiceXML 2.0 has the power to change the way phone-based information and customer services are developed," said Dave Raggett, W3C Voice Browser Activity Lead. "No longer will we have to press 'one' for this or 'two' for that. Instead, we will be able to make selections and provide information by speech. In addition, VoiceXML 2.0 creates opportunities for people with visual impairments or those needing Web access while keeping their hands and eyes free for other things, such as getting directions while driving."

VoiceXML's role in the Speech Interface Framework is to control how an application interacts with the user. Speech Synthesis Markup Language (SSML) handles spoken prompts, while the Speech Recognition Grammar Specification (SRGS) guides speech recognizers via grammars describing expected user responses. Voice Browser Call Control (CCXML) provides telephony call control for VoiceXML and other dialog systems, and Semantic Interpretation for Speech Recognition (SISR) defines the syntax and semantics of the contents of tag elements in SRGS grammars.
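To illustrate how these pieces fit together, the hedged fragments below show a standalone SSML prompt and an SRGS grammar whose tag elements attach semantic values in the SISR style; the rule names and values are invented for the example:

```xml
<!-- An SSML prompt: fine-grained control over synthesized speech -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  Please say the <emphasis>city</emphasis> you are calling from.
</speak>

<!-- An SRGS grammar; each tag element carries a semantic interpretation -->
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="city" tag-format="semantics/1.0">
  <rule id="city">
    <one-of>
      <item>New York <tag>out = "NYC";</tag></item>
      <item>Boston <tag>out = "BOS";</tag></item>
    </one-of>
  </rule>
</grammar>
```

In this arrangement the recognizer matches the caller's utterance against the grammar, and the tag contents map the raw words onto a value the dialog logic can act on.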

VoiceXML traces its roots to an AT&T project dubbed Phone Markup Language (PML). In 1995, AT&T created a dialog design language within PML intended to simplify the development of speech recognition applications. After AT&T's reorganization, teams at AT&T, Lucent and Motorola continued working on PML-like languages. Following a W3C conference on voice browsers in 1998, AT&T, IBM, Lucent and Motorola -- all of which were developing speech-based markup languages -- created the VoiceXML Forum to pool their efforts and define a standard dialog design language for building conversational applications.

The VoiceXML Forum released VoiceXML 1.0 to the public in 2000 and then submitted the specification to the W3C. The specification slots into the W3C's work on the Speech Interface Framework, which is intended to let people use any telephone to access appropriately designed Web-based services.

VoiceXML is similar to the recently announced Speech Application Language Tags (SALT) specification, which has also been submitted to the W3C. SALT is a set of lightweight extensions to existing markup languages, particularly HTML and XHTML, that enable multimodal and telephony access to information, applications and Web services from PCs, telephones, tablet PCs and wireless personal digital assistants (PDAs). VoiceXML, however, focuses on telephony application development, while SALT targets multimodal speech applications -- which, for example, would let a PDA user fill out a form with either voice commands or a stylus, whichever is more convenient.

While the two are different, the specifications share similar goals and may eventually converge. In fact, SALT uses key components of the Speech Interface Framework, including SRGS and SSML.