RealTime IT News

Microsoft Begins Testing Speech Server 2007

Microsoft today formally opened a public beta test of Microsoft Speech Server 2007, a significant upgrade to its voice recognition and telephony server with a host of new features and technologies.

Speech Server 2007 is the third release of Microsoft’s speech and telephony platform, but the first to feature VoiceXML, the approved industry standard markup language for voice command recognition. In previous versions, Microsoft had been championing another standard, SALT, which never received W3C approval.

Microsoft Speech Server is an interactive voice response (IVR) platform that combines speech and telephony systems in a single product. Applications can be written to take either keypad or voice input for uses like customer service or support.

The new version is fully integrated with Visual Studio 2005 and uses the Windows Workflow Foundation libraries found in WinFX. As part of this integration, Speech Server 2007 comes with a Dialogue Workflow Designer built on the Windows Workflow Foundation.

In Visual Studio, a developer can create a new voice-driven application, drag an event, such as a prompt for input, from a toolbar onto the visual design, attach an audio file to that action and define the actions to take based on user input. All of this is done in a visual representation of the call flow without writing code.

Microsoft did this because that's how its customers were designing their voice applications in the first place. "As we talk to customers, very often design of these apps was done by a committee in Visio, and then they had to go and code it by hand," said Clint Patterson, director of product management for Microsoft Speech Server.

"So when they made changes to it, there wasn't a 1-to-1 representation. Here, when you change a prompt or response, you simply go into this app and edit it from the visual layout."

Because the application is built on the Windows Workflow Foundation, it's possible to connect the main voice-driven application to other applications. It can kick off other processes, pass information or launch other applications. "So the IVR is no longer an isolated workflow in a call center but it's interacting with other workflows," said Patterson.

Another new feature in Speech Server 2007 is a natural speech recognition and processing system called Conversational Understanding. It lets callers speak more naturally and lets the system interpret that less structured speech.

One aspect is that it knows all of the different ways people can say the same thing, like how people may place an order. Rather than hand-code all the different ways people might say "I'd like to order a pizza," Speech Server generates its own wild cards.

Complementing this is the Grammar Design Advisor, which acts as a grammar checker for applications before they are deployed. It examines all of the voice input prompts programmers build into an application and corrects their bad grammar. That way, at least the caller has to speak properly.

Other features include native VoIP support, with Session Initiation Protocol (SIP) and Real-time Transport Protocol (RTP) supported out of the box, as well as Analytics Studio and Business Intelligence Tools, which provide users with customized, detailed usage reports. Analytics Studio will provide a variety of predefined reports, while Business Intelligence Tools will give managers a long-term view of caller behavior.

Daniel Hong, senior voice business analyst with Datamonitor, said the speech recognition market has lacked a big-name developer like Microsoft to really promote the idea of voice-driven applications, but Microsoft could change that.

"Speech recognition is still pretty expensive and quite an investment," he said. "A lot of organizations have a hard time justifying the cost of speech. Microsoft Speech Server is significantly cheaper than other solutions out there when looking at total cost of ownership."

Speech Server 2007 is due in the fall. It will be available globally with keypad input support, but language support will be limited to U.S. English, U.K. English and German.