Continuing its push into speech technology, Microsoft
Wednesday unleashed the first public beta of its Microsoft Speech Server,
and refreshed its Speech Application Software Development Kit (SASDK) with
a beta 3 release.
“Speech technology is on the cusp of reaching its full potential, and we
are committed to bringing it to the mainstream,” said Kai-Fu Lee, corporate
vice president of the Speech Technologies Group at Microsoft. “With the
beta release of Microsoft Speech Server and the beta 3 release of the
SASDK, we are making it easier for enterprise companies and their customers
to access information.”
Microsoft believes the value proposition of speech technology is clear: it
stands to reduce costs associated with call center agents. A typical
customer service call costs $5 to $10 to support, while an automated speech
technology system can lower that to 10 cents to 30 cents per call.
Additionally, speech technology can be used to give employees access to
critical information while on the move.
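As a rough illustration of the per-call figures above, the arithmetic can be sketched in Python. The cost ranges come from the article; the call volume and the share of calls handled automatically are hypothetical inputs chosen only for the example.

```python
# Per-call cost ranges quoted in the article, in US dollars.
AGENT_COST = (5.00, 10.00)      # live agent: $5 to $10 per call
AUTOMATED_COST = (0.10, 0.30)   # automated speech system: 10 to 30 cents per call


def savings_range(calls, automated_share):
    """Return (low, high) estimated savings for a given call volume.

    `calls` and `automated_share` are hypothetical inputs, not article figures.
    """
    automated_calls = calls * automated_share
    # Conservative case: cheapest agent call replaced by priciest automated call.
    low = automated_calls * (AGENT_COST[0] - AUTOMATED_COST[1])
    # Optimistic case: priciest agent call replaced by cheapest automated call.
    high = automated_calls * (AGENT_COST[1] - AUTOMATED_COST[0])
    return low, high


low, high = savings_range(calls=100_000, automated_share=0.5)
print(f"Estimated savings: ${low:,.0f} to ${high:,.0f}")
```

The spread between the two results is wide because both quoted ranges are wide; any real estimate would depend on a call center's own costs.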
The new Speech Server, designed to run on the Windows Server 2003 operating
system, is a platform for speech application deployments. It is built on
the Speech Application Language Tags (SALT) standard, which defines a set
of lightweight tags as extensions to common Web-based programming
languages, allowing developers to add speech functionality to existing Web
applications, as well as to add prompt functionality to telephony and
multimodal applications.
Microsoft has brought partners Intel and Intervoice on
board to provide the server with a Telephony Interface Manager (TIM), which
provides integration of the Speech Server with the Intel NetStructure
communications boards, which allow for the deployment of speech processing
applications. Microsoft noted that multimodal applications don’t need TIM.
The key components of the new server are Speech Engine Services (SES) and
Telephony Application Services (TAS).
The SES includes:
- [...] plays them back to allow users to hear a human voice
- [...] synthesize audio output from a text string when prerecorded prompts are
unavailable.
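The fallback described here, where audio is synthesized from text only when no prerecorded prompt is available, can be sketched in Python. This is a minimal illustration, not Speech Server's actual interface: `render_prompt`, the `recorded` lookup table and the `fake_tts` function are all hypothetical names.

```python
def render_prompt(text, recorded_prompts, synthesize):
    """Prefer a prerecorded clip; fall back to synthesis when none exists.

    All names here are hypothetical and do not come from Speech Server.
    """
    clip = recorded_prompts.get(text)
    if clip is not None:
        # A human-voice recording exists for this exact prompt text.
        return ("recorded", clip)
    # No recording available: synthesize audio from the text string instead.
    return ("synthesized", synthesize(text))


# Hypothetical prompt table and a stand-in text-to-speech function.
recorded = {"Welcome to the help line.": "welcome.wav"}


def fake_tts(text):
    return f"<synthesized audio for: {text}>"


print(render_prompt("Welcome to the help line.", recorded, fake_tts))
print(render_prompt("Your balance is 42 dollars.", recorded, fake_tts))
```

Fixed phrases such as greetings suit recordings, while dynamic content such as account balances is the typical case for synthesis.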
The TAS includes:
- SALT Interpreter, which deals with all the speech interface and
presentation logic, and also handles interactions between the speech
application and the telephony components of the architecture
- Media and Speech Manager, which handles requests made by SALT
Interpreters to SES for speech recognition and prompt playback, and manages
interfaces with the third-party TIM to deliver audio to and from the
telephone user
- SALT Interpreter Controller, which manages creation, deletion and
resetting of the multiple instances of the SALT Interpreter that are
managing dialogs with individual callers.
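The controller's role, keeping one interpreter instance per caller and creating, resetting or deleting instances as calls come and go, can be sketched roughly in Python. The class and method names below are hypothetical and are not taken from Microsoft's software; the sketch only illustrates the bookkeeping pattern.

```python
class SaltInterpreter:
    """Hypothetical stand-in for one interpreter instance serving one caller."""

    def __init__(self, caller_id):
        self.caller_id = caller_id
        self.turns = 0  # dialog turns handled so far

    def handle_turn(self):
        self.turns += 1

    def reset(self):
        self.turns = 0


class InterpreterController:
    """Creates, resets and deletes interpreter instances, one per caller."""

    def __init__(self):
        self._instances = {}

    def create(self, caller_id):
        if caller_id in self._instances:
            raise ValueError(f"caller {caller_id!r} already has an interpreter")
        interpreter = SaltInterpreter(caller_id)
        self._instances[caller_id] = interpreter
        return interpreter

    def reset(self, caller_id):
        self._instances[caller_id].reset()

    def delete(self, caller_id):
        del self._instances[caller_id]

    def active_callers(self):
        return sorted(self._instances)


controller = InterpreterController()
first = controller.create("caller-1")
controller.create("caller-2")
first.handle_turn()
controller.reset("caller-1")   # dialog restarts for the first caller
controller.delete("caller-2")  # second caller hung up
print(controller.active_callers())
```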
“Microsoft Speech Server is unique to the marketplace in that it is the
only speech server that supports both unified telephony and multimodal
applications,” said Xuedong Huang, general manager of the Speech
Technologies Group at Microsoft. “By building our speech technology
offerings upon the open, industry-standard SALT specification, customers
can use speech to access information from standard telephones and cell
phones as well as GUI-based devices like PDAs, Tablet PCs and smart
phones.”
SASDK beta 3
The software giant also refreshed its SASDK with a third beta Wednesday,
updating the SASDK beta 2 released in October 2002. The SASDK is a developer tool based on SALT and designed
to integrate with the Visual Studio .NET 2003 development environment. It
allows developers to write combined speech and visual Web applications in a
single code base.
The new beta adds a host of features, including:
- Pocket Internet Explorer Bits, allowing Pocket PC access to Speech
Server applications
- Speech Application Wizard, which allows developers to create a new
project in Visual Studio .NET 2003 that contains all necessary objects
- Telephony Application Simulator, which simulates Speech Server to allow
developers to deploy telephony applications on the desktop and interact
with the application
- Enhanced support for dual-tone multifrequency, or DTMF (the type of
audio signals that are generated when you press the buttons on a touch-tone
telephone)
- Speech Application Controls, preset controls which manage responses
containing digits and letters, like credit card numbers, expiration dates,
currency amounts, ZIP codes and Social Security Numbers
- Enhancements to Grammar Authoring, providing a flowchart view of
grammars, the ability to type text for grammar phrases into grammar files,
a Pronunciation Editor for unusual words, and integration into the Visual
Studio .NET 2003 environment
- Speech Controls Outline Panel, which consists of a dockable Visual
Studio menu that shows users the sequence of controls in the speech
application.
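For readers unfamiliar with DTMF, each key on a touch-tone keypad is signaled by two simultaneous tones, one from a low-frequency group and one from a high-frequency group. The Python sketch below uses the standard DTMF frequency assignments; the function names, tone duration and sample rate are illustrative choices and have nothing to do with the SASDK's own interfaces.

```python
import math

# Standard DTMF frequency groups, in hertz.
ROW_HZ = (697, 770, 852, 941)        # low group, one per keypad row
COL_HZ = (1209, 1336, 1477, 1633)    # high group, one per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key):
    """Return the (low, high) frequency pair for a single keypad character."""
    if len(key) != 1:
        raise ValueError("key must be a single character")
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if c != -1:
            return ROW_HZ[r], COL_HZ[c]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration=0.05, rate=8000):
    """Generate audio samples for a key as the sum of its two sine tones."""
    low, high = dtmf_pair(key)
    count = int(duration * rate)
    return [
        0.5 * math.sin(2 * math.pi * low * t / rate)
        + 0.5 * math.sin(2 * math.pi * high * t / rate)
        for t in range(count)
    ]


print(dtmf_pair("5"))
print(len(dtmf_samples("5")))
```

A speech application that accepts keypad input has to run the reverse process, detecting which pair of frequencies is present in the incoming audio.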
Speech Partner Program
Finally, Microsoft also raised the curtain on its Speech Partner Program
(SPP), which is intended to provide additional revenue and profit
opportunities to partners interested in developing, deploying or reselling
enterprise-grade speech technology solutions based on Microsoft’s
technologies.
The software giant is targeting telephony value-added resellers and
distributors, systems integrators, Web developers, independent software
vendors, and Microsoft-certified partners with the program, giving them
access to industry and Microsoft-specific events, access to special partner
collateral (like advertising templates, sales tools and targeted
demand-generation materials), discounted rates for Microsoft Speech
Technologies training courses, placement in its SPP Resource Directory on
the Microsoft.com Web site, and promotion of their products and services
through Microsoft’s marketing efforts.
To qualify, Microsoft said partners need to complete three training
courses: Speech Applications: Planning, VUI Design and
Maintenance; Developing Speech-Enabled Web Applications Using the Microsoft
Speech Application Software Development Kit; and Deploying and
Administering Microsoft Speech Server.