RealTime IT News

Microsoft Beta to Make 'em Talk

Continuing its push into speech technology, Microsoft Wednesday unleashed the first public beta of its Microsoft Speech Server, and refreshed its Speech Application Software Development Kit (SASDK) with a beta 3 release.

"Speech technology is on the cusp of reaching its full potential, and we are committed to bringing it to the mainstream," said Kai-Fu Lee, corporate vice president of the Speech Technologies Group at Microsoft. "With the beta release of Microsoft Speech Server and the beta 3 release of the SASDK, we are making it easier for enterprise companies and their customers to access information."

Microsoft believes the value proposition of speech technology is clear: it stands to reduce costs associated with call center agents. A typical customer service call costs $5 to $10 to support, while an automated speech technology system can lower that to 10 cents to 30 cents per call. Additionally, speech technology can be used to give employees access to critical information while on the move.
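The per-call figures quoted above work out to a saving of roughly $4.70 to $9.90 for each call that is automated. The short Python sketch below shows that arithmetic; the function names and the call volume in the usage lines are illustrative assumptions, not figures or tools from Microsoft.

```python
# Per-call costs quoted in the article, expressed in whole cents
# so the arithmetic stays exact.
AGENT_COST_CENTS = (500, 1000)    # $5 to $10 for an agent-handled call
AUTOMATED_COST_CENTS = (10, 30)   # 10 to 30 cents for an automated call


def savings_per_call_cents(agent=AGENT_COST_CENTS, automated=AUTOMATED_COST_CENTS):
    """Return the (smallest, largest) saving per call, in cents."""
    smallest = agent[0] - automated[1]   # cheapest agent call vs. priciest automated call
    largest = agent[1] - automated[0]    # priciest agent call vs. cheapest automated call
    return smallest, largest


def savings_for_volume_dollars(calls):
    """Scale the per-call saving to a call volume (a hypothetical input)."""
    low, high = savings_per_call_cents()
    return low * calls / 100, high * calls / 100


if __name__ == "__main__":
    print(savings_per_call_cents())
    # 10,000 calls is an assumed volume chosen only for illustration.
    print(savings_for_volume_dollars(10_000))
```

The calculation covers only the per-call costs cited here; it leaves out licensing, hardware and development costs, which the article does not quantify.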

The new Speech Server, designed to run on the Windows Server 2003 operating system, is a platform for speech application deployments. It is built on the Speech Application Language Tags (SALT) standard, which defines a set of lightweight tags as extensions to common Web-based programming languages, allowing developers to add speech functionality to existing Web applications, as well as to add prompt functionality to telephony and multimodal applications.

Microsoft has brought partners Intel and Intervoice on board to supply a Telephony Interface Manager (TIM), which integrates Speech Server with Intel's NetStructure communications boards, the hardware used to deploy speech processing applications. Microsoft noted that multimodal applications don't need the TIM.

The key components of the new server are Speech Engine Services (SES) and Telephony Application Services (TAS).

The SES includes:

  • Speech Recognition Engine, for handling users' speech inputs
  • Prompt Engine, which takes prerecorded prompts from a database and plays them back to allow users to hear a human voice
  • Text-to-Speech Engine, which uses SpeechWorks' Speechify engine to synthesize audio output from a text string when prerecorded prompts are unavailable.
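The division of labor between the Prompt Engine and the Text-to-Speech Engine described above amounts to a lookup with a fallback: play a recording if one exists, otherwise synthesize. The Python sketch below illustrates that idea only. The dictionary "database", the `select_audio` function and the stand-in synthesizer are all hypothetical and do not reflect Speech Server's actual interfaces.

```python
from typing import Callable, Dict, Tuple


def select_audio(
    text: str,
    prompt_db: Dict[str, bytes],
    synthesize: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Return (source, audio) for the text an application wants spoken.

    A prerecorded prompt is preferred; synthesis is used only when
    no recording exists for the requested text.
    """
    recorded = prompt_db.get(text)
    if recorded is not None:
        return "prompt", recorded
    return "tts", synthesize(text)


def fake_synthesize(text: str) -> bytes:
    """Stand-in for a real text-to-speech engine (assumption for the demo)."""
    return b"TTS:" + text.encode("utf-8")


if __name__ == "__main__":
    # Hypothetical prompt database holding one human-voice recording.
    prompts = {"Welcome to the help line.": b"<recorded audio>"}
    print(select_audio("Welcome to the help line.", prompts, fake_synthesize)[0])
    print(select_audio("Your balance is 12 dollars.", prompts, fake_synthesize)[0])
```

Fixed phrases such as greetings would come from recordings, while dynamic text such as an account balance would fall through to synthesis.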

The TAS includes:

  • SALT Interpreter, which deals with all the speech interface and presentation logic, and also handles interactions between the speech application and the telephony components of the architecture
  • Media and Speech Manager, which handles requests made by SALT Interpreters to SES for speech recognition and prompt playback, and manages interfaces with the third-party TIM to deliver audio to and from the telephone user
  • SALT Interpreter Controller, which manages creation, deletion and resetting of the multiple instances of the SALT Interpreter that are managing dialogs with individual callers.
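The SALT Interpreter Controller's role, as described above, is essentially lifecycle management for one interpreter per caller. A rough Python sketch of that pattern follows. The class and method names are invented for illustration and are not Microsoft's API; the real interpreters would also run SALT pages, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Interpreter:
    """Hypothetical stand-in for one SALT Interpreter instance."""
    caller_id: str
    dialog_state: List[str] = field(default_factory=list)

    def reset(self) -> None:
        # Forget the dialog so far, keeping the instance alive.
        self.dialog_state.clear()


class InterpreterController:
    """Creates, resets and deletes interpreters, one per active caller."""

    def __init__(self) -> None:
        self._instances: Dict[str, Interpreter] = {}

    def create(self, caller_id: str) -> Interpreter:
        if caller_id in self._instances:
            raise ValueError(f"interpreter already exists for {caller_id}")
        interpreter = Interpreter(caller_id)
        self._instances[caller_id] = interpreter
        return interpreter

    def reset(self, caller_id: str) -> None:
        self._instances[caller_id].reset()

    def delete(self, caller_id: str) -> None:
        del self._instances[caller_id]

    def active_callers(self) -> List[str]:
        return sorted(self._instances)


if __name__ == "__main__":
    controller = InterpreterController()
    first = controller.create("caller-1")
    controller.create("caller-2")
    first.dialog_state.append("asked_for_pin")
    controller.reset("caller-1")     # dialog restarts for caller-1
    controller.delete("caller-2")    # caller-2 hung up
    print(controller.active_callers())
```

Keeping each caller's dialog in its own instance means a reset or a hang-up on one line does not disturb the others.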

"Microsoft Speech Server is unique to the marketplace in that it is the only speech server that supports both unified telephony and multimodal applications," said Xuedong Huang, general manager of the Speech Technologies Group at Microsoft. "By building our speech technology offerings upon the open, industry-standard SALT specification, customers can use speech to access information from standard telephones and cell phones as well as GUI-based devices like PDAs, Tablet PCs and smart phones."

SASDK beta 3
The software giant also refreshed its SASDK with a third beta Wednesday, updating the SASDK beta 2 released in October 2002. The SASDK is a developer tool based on SALT and designed to integrate with the Visual Studio .NET 2003 development environment. It allows developers to write combined speech and visual Web applications in a single code base.

The new beta includes a host of new features, including:

  • Pocket Internet Explorer Bits, allowing Pocket PC access to Speech Server applications
  • Speech Application Wizard, which allows developers to create a new project in Visual Studio .NET 2003 that contains all necessary objects
  • Telephony Application Simulator, which simulates Speech Server so developers can run telephony applications on the desktop and interact with them
  • Enhanced support for dual-tone multifrequency, or DTMF (the type of audio signals that are generated when you press the buttons on a touch-tone telephone)
  • Speech Application Controls, preset controls which manage responses containing digits and letters, like credit card numbers, expiration dates, currency amounts, ZIP codes and Social Security Numbers
  • Enhancements to Grammar Authoring, providing a flowchart view of grammars, the ability to type text for grammar phrases into grammar files, a Pronunciation Editor for unusual words, and integration into the Visual Studio .NET 2003 environment
  • Speech Controls Outline Panel, which consists of a dockable Visual Studio menu that shows users the sequence of controls in the speech application.
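For readers unfamiliar with the DTMF signals mentioned in the list above: each key on a touch-tone keypad is encoded as the sum of two sine tones, one from a low "row" group and one from a high "column" group. The Python sketch below builds the standard frequency table and generates the samples for one key. The 8,000 Hz sample rate and 0.1-second duration are assumptions chosen for the example, and nothing here represents SASDK code.

```python
import math

# Standard DTMF frequency groups in hertz.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

# Map each key to its (low, high) frequency pair.
DTMF_FREQUENCIES = {
    key: (ROW_HZ[r], COL_HZ[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}


def dtmf_samples(key, duration_s=0.1, sample_rate=8000):
    """Return float samples in [-1, 1] for one key press.

    duration_s and sample_rate are illustrative defaults.
    """
    low, high = DTMF_FREQUENCIES[key]
    count = round(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * i / sample_rate)
        for i in range(count)
    ]


if __name__ == "__main__":
    print(DTMF_FREQUENCIES["5"])
    print(len(dtmf_samples("5")))
```

A speech application that accepts keypad input has to detect these tone pairs in the incoming audio, which is why DTMF support sits alongside speech recognition in the toolkit.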

Speech Partner Program
Finally, Microsoft raised the curtain on its Speech Partner Program (SPP), which is intended to provide additional revenue and profit opportunities to partners interested in developing, deploying or reselling enterprise-grade speech technology solutions based on Microsoft's technologies.

The software giant is targeting telephony value-added resellers and distributors, systems integrators, Web developers, independent software vendors, and Microsoft-certified partners with the program. Members get access to industry- and Microsoft-specific events and to special partner collateral (like advertising templates, sales tools and targeted demand-generation materials), discounted rates for Microsoft Speech Technologies training courses, placement in the SPP Resource Directory on the Microsoft.com Web site, and promotion of their products and services through Microsoft's marketing efforts.

To qualify, Microsoft said, partners need to complete three training courses: Speech Applications: Planning, VUI Design and Maintenance; Developing Speech-Enabled Web Applications Using the Microsoft Speech Application Software Development Kit; and Deploying and Administering Microsoft Speech Server.