Aiming to help businesses extend their Web presences with speech, Intel
and Microsoft on Monday announced they are jointly developing technologies
and a reference design based on the Speech Application Language Tags (SALT)
1.0 specification submitted to the World Wide Web Consortium (W3C) in August.
The SALT specification defines a set of lightweight tags as extensions to
common Web-based programming languages, allowing developers to add speech
functionality to existing Web applications.
The joint effort will combine Intel’s telephony building blocks (Intel
Architecture servers, NetStructure communications boards and telephony call
management interface software) with Microsoft’s .NET Speech platform. The
goal is to give enterprise customers a set of tools with which to build and
deploy their own speech applications, and to give ISVs, OEMs, VARs and SIs
a toolset for building and deploying such applications on behalf of
enterprise customers.
Intel and Microsoft said their tools will support both telephony and
multimodal applications on a range of devices.
The partners believe the value proposition of such technology is clear: it
stands to reduce costs associated with call center agents. A typical
customer service call costs $5 to $10 to support, while an automated voice
recognition system can lower that to 10 cents to 30 cents per call.
Additionally, voice recognition technology can be used to give employees
access to critical information while on the move.
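The per-call figures above can be turned into a rough savings range. The
sketch below uses only the dollar amounts quoted in this article; the
function and variable names are illustrative, and it assumes, for the sake
of the arithmetic, that a call is handled entirely by the automated system.

```python
def savings_per_call(agent_cost, automated_cost):
    """Return (dollar saving, fractional saving) for a single call."""
    saving = agent_cost - automated_cost
    return saving, saving / agent_cost


# Figures quoted in the article, in dollars per call.
AGENT_COST_RANGE = (5.00, 10.00)
AUTOMATED_COST_RANGE = (0.10, 0.30)

# Least favorable pairing: cheapest agent call, most expensive automated call.
low_saving, low_fraction = savings_per_call(
    AGENT_COST_RANGE[0], AUTOMATED_COST_RANGE[1]
)

# Most favorable pairing: most expensive agent call, cheapest automated call.
high_saving, high_fraction = savings_per_call(
    AGENT_COST_RANGE[1], AUTOMATED_COST_RANGE[0]
)

print(f"Saving per call: ${low_saving:.2f} to ${high_saving:.2f}")
print(f"Cost reduction: {low_fraction:.0%} to {high_fraction:.0%}")
```

On these numbers the saving works out to between $4.70 and $9.90 per call,
or roughly 94 to 99 percent. Real deployments would likely see less, since
some calls still fall through to a live agent.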
Earlier this year, market research firm the Kelsey Group projected
worldwide spending on voice recognition will reach $41 billion by 2005.
But Intel and Microsoft are by no means alone in the space. They are likely
to face stiff competition from IBM, a pioneer in the voice
recognition space. In April, IBM announced it had assigned
about 100 speech researchers from IBM Research to an eight-year project
dubbed the Super Human Speech Recognition Initiative, intended to
revolutionize voice technologies.
Currently IBM offers solutions based on VoiceXML and Java, and has helped
develop a new specification, X+V (a combination of XHTML and VoiceXML)
for multimodal access. Among its deployments is a system for investment
management firm T. Rowe Price that lets customers access and manage
their accounts through natural conversations, using IBM WebSphere
Voice Server with Natural Language Understanding.
Smaller, specialized players, like Mountain View, Calif.-based start-up
TuVox, are also in the space. TuVox, founded by two alums of Apple Computer,
uses a combination of artificial intelligence and VoiceXML to help firms
automate their technical support call centers. It has already automated
the after-hours technical support lines for both Handspring and
Activision.
But while the ball is already rolling in the voice recognition space, IBM
says there are still significant hurdles to overcome, hurdles that spurred
it to create the Super Human Speech Recognition Initiative.
Noise, punctuation and grammar, and accents all continue to pose problems
for speech recognition.