RealTime IT News

Net Voice, Speech Stamped as Standards

After years as working implementations, Voice XML 2.0 (VXML) and the Speech Recognition Grammar Specification (SRGS) won the World Wide Web Consortium's (W3C) seal of approval Tuesday.

The two new standards, called the Speech Interface Framework at the W3C, have ushered in a new era of Internet/voice applications, ranging from computer-generated information services like 555-1212 and Delta Airlines' ticketing to voice-activated dialing on Cingular Wireless telephones.

The two technologies handle different parts of the exchange between voice and the Internet: VXML scripts the dialog in which callers say "one" or "two" into the telephone, while SRGS defines the words, such as "one" and "two," that the recognizer interprets so the software application can do its work. The technologies are robust enough to cope with a person's individual accent or word variations ("yes" or "yeah").
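
That division of labor can be sketched in a few lines of Python. This is only an illustration, not VXML or SRGS markup: the names MENU_GRAMMAR, interpret and handle_menu_choice are hypothetical, and a real deployment would express the grammar and the dialog in the W3C formats.

```python
# Illustrative sketch only: plain Python, not VoiceXML or SRGS markup.
# MENU_GRAMMAR, interpret and handle_menu_choice are hypothetical names.

# "Grammar" role: the utterances the recognizer will accept,
# with spoken variants mapped onto one canonical value.
MENU_GRAMMAR = {
    "one": "1",
    "two": "2",
    "yes": "yes",
    "yeah": "yes",
}


def interpret(utterance):
    """Return the canonical value for a recognized utterance, or None."""
    return MENU_GRAMMAR.get(utterance.strip().lower())


def handle_menu_choice(utterance):
    """'Dialog' role: decide what the application does with the result."""
    choice = interpret(utterance)
    if choice is None:
        # Nothing in the grammar matched, so ask the caller again.
        return "reprompt"
    return f"selected:{choice}"
```

Keeping the word list separate from the dialog logic means the accepted vocabulary can grow ("yep," "sure") without touching the code that acts on the caller's choice.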

While work on Voice XML started back in 1994, the technology didn't get a mainstream boost until the creation of the Voice XML Forum, an industry initiative formed by IBM, AT&T, Lucent and Motorola in 1999 that today has more than 372 member companies.

Stewardship of the Voice XML technology was then passed to the W3C in 2001, and in 2002 the organization moved forward with making the technology a standard.

Despite the widespread use of VXML and SRGS, a formal standard and compatibility among vendors have always been necessary, said Brad Porter, a co-editor of VXML 2.0 and director of engineering at TellMe.

"The reason I think Voice XML is in such great shape and the reason the W3C has gone forward with a standard is because there has been so much market demand already for Voice XML, that the market has dictated that things need to be as compatible as possible," Porter told internetnews.com.

Testing against the new standards began last October, when the Voice XML Forum launched a beta trial of a certification process to ensure VXML applications were compatible throughout the industry.

The process comprises more than 700 tests. Porter said they are strenuous but that most companies that already have applications shouldn't have a problem.

The Conformance Test Suite is available as a free download at the organization's Web site.

While the certification process for VXML 2.0 has just been completed, work is already underway in the W3C's voice browser group on the next generation of the technology, which focuses on extending the power of the Speech Interface Framework.

"We're putting all the bricks together to create a wall or foundation, if you will, and the bricks are now starting to fall into place," Dave Raggett, W3C voice browser activity lead, told internetnews.com.

The first bricks were laid with the standardization of VXML and SRGS, and more will follow as other technologies make their way through the W3C process. They include:

  • Speech Synthesis Markup Language (SSML), a candidate recommendation that lets Web browsers talk back to users
  • Call Control XML, a standalone extension to VXML that gives telephony services more functionality, like connecting and disconnecting, starting conference calls and placing outgoing calls
  • Semantic Interpretation, which lets Speech Interface Framework applications take different yet similar words and resolve them to the correct one. For example, saying "Pepsi" or "Coca-Cola" assigns the word to the correct tag in the VXML document
  • Multi-modal support -- working models are still under development, but future applications will let mobile users, for example, ask their handset for directions and see a map with driving directions on the screen.
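
The Semantic Interpretation idea in the list above can be sketched roughly as follows. This is Python for illustration, not the W3C syntax: DRINK_TAGS and interpret_order are hypothetical names, and the "coke" variant is an assumption added to show two spoken forms resolving to one tag.

```python
# Illustrative sketch only; this is not the W3C Semantic Interpretation syntax.
# DRINK_TAGS and interpret_order are hypothetical names, and the "coke"
# variant is an assumption added for the example.

# Spoken form -> tag value the application expects.
DRINK_TAGS = {
    "pepsi": "pepsi",
    "coca-cola": "coca-cola",
    "coke": "coca-cola",
}


def interpret_order(utterance):
    """Return {'drink': tag} for the first known drink name in the utterance."""
    normalized = utterance.lower()
    for spoken, tag in DRINK_TAGS.items():
        if spoken in normalized:
            return {"drink": tag}
    # No known drink was mentioned, so no tag is filled in.
    return {}
```

Here a caller who says "I'd like a Coke, please" and one who says "Coca-Cola" both end up with the same value in the drink field, which is the kind of result the application would then act on.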