RealTime IT News

Intel, Microsoft Dip into Speech with SALT

Aiming to help businesses extend their Web presences with speech, Intel and Microsoft Monday announced they are jointly developing technologies and a reference design based on the Speech Applications Language Tags (SALT) 1.0 specification submitted to the World Wide Web Consortium (W3C) in August.

The SALT specification defines a set of lightweight tags as extensions to common Web-based programming languages, allowing developers to add speech functionality to existing Web applications.

The joint effort will draw on Intel's telephony building blocks (Intel Architecture servers, NetStructure communications boards and telephony call management interface software) and Microsoft's .NET Speech platform. The aim is to give enterprise customers a set of tools with which to build and deploy their own speech applications, and to give ISVs, OEMs, VARs and SIs a toolset for building and deploying such applications on behalf of enterprise customers.

Intel and Microsoft said their tools will support both telephony and multimodal applications on a range of devices.

The partners believe the value proposition of such technology is clear: it stands to reduce costs associated with call center agents. A typical customer service call costs $5 to $10 to support, while an automated voice recognition system can lower that to 10 cents to 30 cents per call. Additionally, voice recognition technology can be used to give employees access to critical information while on the move.
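To put the per-call figures above in concrete terms, the following is a minimal sketch of the arithmetic, not a model used by Intel or Microsoft. It assumes every call is handled entirely by the automated system, which is a simplification. The function names and the monthly call volume are hypothetical and chosen only for illustration.

```python
# Per-call cost ranges quoted in the article, expressed in cents.
AGENT_COST_CENTS = (500, 1000)    # $5 to $10 per agent-handled call
AUTOMATED_COST_CENTS = (10, 30)   # 10 to 30 cents per automated call


def savings_range_cents(agent, automated):
    """Return (smallest, largest) possible savings per call, in cents."""
    smallest = agent[0] - automated[1]  # cheapest agent call vs. priciest automated call
    largest = agent[1] - automated[0]   # priciest agent call vs. cheapest automated call
    return smallest, largest


def monthly_savings_dollars(calls_per_month, agent, automated):
    """Scale the per-call savings range to a monthly call volume, in dollars."""
    low, high = savings_range_cents(agent, automated)
    return calls_per_month * low / 100, calls_per_month * high / 100


if __name__ == "__main__":
    low, high = savings_range_cents(AGENT_COST_CENTS, AUTOMATED_COST_CENTS)
    print(f"Savings per call: ${low / 100:.2f} to ${high / 100:.2f}")

    # Hypothetical volume for illustration only; not a figure from the article.
    hypothetical_calls = 10_000
    month_low, month_high = monthly_savings_dollars(
        hypothetical_calls, AGENT_COST_CENTS, AUTOMATED_COST_CENTS
    )
    print(f"Savings on {hypothetical_calls} calls: ${month_low:,.0f} to ${month_high:,.0f}")
```

In practice some share of calls would still fall through to a live agent, so actual savings would sit below this range.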

Earlier this year, market research firm the Kelsey Group projected that worldwide spending on voice recognition would reach $41 billion by 2005.

But Intel and Microsoft are by no means alone. They are likely to face stiff competition from IBM, a pioneer in voice recognition. In April, IBM announced it had assigned about 100 speech researchers from IBM Research to an eight-year project dubbed the Super Human Speech Recognition Initiative, intended to revolutionize voice technologies.

Currently, IBM offers solutions based on VoiceXML and Java, and has helped develop a new specification, X+V (a combination of XHTML and VoiceXML), for multimodal access. For instance, it built a system for investment management firm T. Rowe Price that lets customers access and manage their accounts through natural conversations, using IBM WebSphere Voice Server with Natural Language Understanding.

Smaller, specialized players, like Mountain View, Calif.-based start-up TuVox, are also competing. TuVox, founded by two alums of Apple Computer, uses a combination of artificial intelligence and VoiceXML to help firms automate their technical support call centers. It has already automated the after-hours technical support lines for both Handspring and Activision.

But while the ball is already rolling in voice recognition, IBM says there are still significant hurdles to overcome, hurdles that spurred it to create the Super Human Speech Recognition Initiative.

Noise, punctuation and grammar, and accents all continue to pose problems for speech recognition.