Net Voice, Speech Stamped as Standards


After years in use as working implementations, Voice XML 2.0 (VXML) and the
Speech Recognition Grammar Specification (SRGS) won the
World Wide Web Consortium’s (W3C) seal of approval Tuesday.

The two new standards, part of what the W3C calls its Speech Interface
Framework, have ushered in a new era of Internet/voice applications, ranging
from computer-generated information services like 555-1212 and Delta
Air Lines’ ticketing to voice-activated dialing on Cingular Wireless
telephones.

The technologies handle different parts of the exchange between voice and
the Internet: VXML lets users say “one” or “two” into the telephone, while
SRGS interprets “one” and “two” and lets the software application do its
work. The technologies are robust enough to cope with a person’s
individual accent or word variations (“yes” or “yeah”).
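
As a rough illustration of that division of labor, the sketch below separates a grammar step (the SRGS role) from a dialog step (the VXML role). It is plain Python rather than VXML or SRGS markup, and the `GRAMMAR` table, the function names and the reply strings are all assumptions made for the example, not part of either specification.

```python
# Hypothetical grammar: each canonical value lists the spoken variants
# accepted for it. The entries are illustrative only.
GRAMMAR = {
    "1": {"one"},
    "2": {"two"},
    "yes": {"yes", "yeah"},
}


def interpret(utterance):
    """Grammar role: map a recognized utterance to a canonical value, or None."""
    spoken = utterance.strip().lower()
    for value, variants in GRAMMAR.items():
        if spoken in variants:
            return value
    return None


def run_menu(utterance):
    """Dialog role: decide how the application responds to the interpreted value."""
    value = interpret(utterance)
    if value is None:
        return "Sorry, I did not understand."
    return f"You chose {value}."
```

In this sketch both “yes” and “yeah” resolve to the same value, so the application logic never has to know which variant the caller actually said.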

While work on Voice XML started back in 1994, the technology
didn’t get a mainstream boost until the creation of the Voice XML Forum, an
industry initiative formed by IBM, AT&T, Lucent and Motorola in 1999 that
is made up of more than 372 member companies today.


Stewardship of the Voice XML technology was then passed to the W3C in 2001,
and in 2002 the organization moved forward with making the technology a
standard.

Despite the widespread use of VXML and SRGS, a formal standard and
compatibility among vendors have always been necessary,
said Brad Porter, a co-editor of VXML 2.0 and director of engineering at
TellMe.

“The reason I think Voice XML is in such great shape and the reason the W3C
has gone forward with a standard is because there has been so much market
demand already for Voice XML, that the market has dictated that things need
to be as compatible as possible,” Porter told internetnews.com.

Testing for the new standards began last October, when the Voice XML Forum
launched a beta trial of a certification process to ensure VXML applications
were compatible throughout the industry.


Porter said the test suite, made up of more than 700 tests, is
strenuous but that most companies that already have applications
shouldn’t have a problem.


The Conformance Test Suite is available as a free
download at the organization’s Web site.

While the certification process has just been completed for VXML 2.0, work is
already underway in the W3C’s voice browser group on the next generation of
the technology, which focuses on extending the power of the Speech Interface
Framework.

“We’re putting all the bricks together to create a wall or foundation, if
you will, and the bricks are now starting to fall into place,” Dave Raggett,
W3C voice browser activity lead, told internetnews.com.

Those bricks, laid out with the standardization of VXML and SRGS, will
continue with existing technologies making their way through the W3C
process. They include:

  • Speech Synthesis Markup Language (SSML), a candidate recommendation
    that lets Web browsers talk back to users
  • Call Control XML, a standalone extension to VXML that gives telephony
    services more functionality, like connecting and disconnecting, starting
    conference calls and placing outgoing calls
  • Semantic Interpretation, which gives Speech Interface Framework applications
    the ability to take different yet similar words and find the correct one.
    For example, saying Pepsi or Coca-Cola will assign the word to the correct
    tag in the VXML document
  • Multi-modal support: working models are still under development, but
    future applications will let mobile users, for example, ask their handset
    for directions and have a map with driving directions displayed on the
    screen.
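
The Semantic Interpretation item above can be pictured as a lookup from what the caller said to a field and value in the application’s form. The following Python sketch is only an approximation of that idea: the `SEMANTIC_TAGS` table, the `drink` field name and the extra variants such as “coke” are assumptions for illustration and do not come from the W3C drafts.

```python
# Hypothetical table mapping spoken phrases to (field, value) pairs.
# The field name "drink" and the listed variants are illustrative only.
SEMANTIC_TAGS = {
    "pepsi": ("drink", "Pepsi"),
    "coca-cola": ("drink", "Coca-Cola"),
    "coca cola": ("drink", "Coca-Cola"),
    "coke": ("drink", "Coca-Cola"),
}


def fill_form(utterance, form=None):
    """Return a copy of the form with the field implied by the utterance filled in."""
    filled = dict(form or {})
    tag = SEMANTIC_TAGS.get(utterance.strip().lower())
    if tag is not None:
        field, value = tag
        filled[field] = value
    return filled
```

Here an unrecognized phrase simply leaves the form unchanged, which is one possible design; a real voice application would more likely prompt the caller again.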
