Inside Microsoft’s Next Big Thing

MOUNTAIN VIEW, Calif. — Microsoft brought its brainiacs to Silicon Valley for a road show highlighting the latest cool stuff.

Scientists from Microsoft Research labs in San Francisco and Redmond joined their colleagues at the company’s Mountain View, Calif. campus to showcase speculative projects that could someday find their way into products.

Researchers are working on everything from a Web services-based model of the universe to sneaky ways to foil spammers.

Dan Ling, vice president of Microsoft Research, told an audience of academics, entrepreneurs and businesspeople that while Research accounts for only a small part of Microsoft's hefty $7 billion R&D budget, most of the company's products are influenced by its work.

For example, the San Francisco lab’s statistical analysis of the Web could find its way into the new search technology Microsoft is readying to go up against Google. Jim Gray, a Microsoft Research Distinguished Engineer, said that a yearlong project to produce a statistical characterization of the Web turned up some interesting and useful trends. Microsoft Research tracked 1 billion Web pages for a year, analyzing what had changed and looking for anomalies.
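The article does not say how the researchers measured change between crawls. One common way to compare two snapshots of the same page is to break each version into overlapping word sequences ("shingles") and compute the Jaccard similarity of the two sets. The sketch below assumes that approach; the function names, the shingle size `k = 4` and the 0.9 threshold are illustrative choices, not details of Microsoft's study.

```python
import re


def shingles(text: str, k: int = 4) -> set:
    """Return the set of contiguous k-word sequences ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short pages: treat the whole word list as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of two pages' shingle sets (1.0 means identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def changed(old: str, new: str, threshold: float = 0.9, k: int = 4) -> bool:
    """Flag a page as changed when its resemblance to the last crawl drops below threshold."""
    return resemblance(old, new, k) < threshold
```

Run across a year of weekly crawls, a check like this would separate pages that barely move from those that are rewritten constantly, which is the kind of trend such a study could surface.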

By keeping track of how many Internet names mapped to the same IP address or how many other pages linked to a single Web page, the technology seems to be able to identify what Gray called “places you don’t want a search engine to go,” such as sites identified with pornography or spam. Microsoft researchers Marc Najork, Mark Manasse and Dennis Fetterly published the research and passed the information to the MSN Search team.
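Gray's description points to two simple signals: how many host names share one IP address, and how many pages link to a given page. A minimal sketch of computing both follows, assuming the crawl has already been reduced to a host-to-IP mapping and a list of (source, target) links. The function names and the `max_names_per_ip` cutoff are hypothetical; the article does not give the thresholds the researchers used.

```python
from collections import Counter, defaultdict


def hosts_per_ip(dns: dict) -> Counter:
    """Count how many host names resolve to each IP address."""
    return Counter(dns.values())


def inlink_counts(links: list) -> Counter:
    """Count the distinct pages that link to each target page (self-links ignored)."""
    sources = defaultdict(set)
    for src, dst in links:
        if src != dst:
            sources[dst].add(src)
    return Counter({dst: len(srcs) for dst, srcs in sources.items()})


def suspicious_hosts(dns: dict, max_names_per_ip: int = 100) -> set:
    """Flag hosts whose IP address is shared by an unusually large number of names."""
    shared = hosts_per_ip(dns)
    return {host for host, ip in dns.items() if shared[ip] > max_names_per_ip}
```

Hosts flagged this way, or pages whose in-link counts look anomalous, could then be down-weighted or skipped by a crawler.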

A new algorithm for finding the shortest route could be used for Microsoft MapPoint.Net, Gray said. In tests on the road network of a large state, its author, Andrew Goldberg, found it delivered a 20-fold improvement in time and memory. That improvement could make shortest-path routing practical on PDAs. It could be used to offer users real-time advice about traffic congestion or road outages, and it could also handle larger requests, such as driving directions for the shortest cross-country route.
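The article does not describe how Goldberg's algorithm works. For context, the classic baseline that road-routing speedups are usually compared against is Dijkstra's algorithm, sketched below on a small adjacency-list graph. This is not the new algorithm; the graph format and the one-letter intersection names are assumptions for illustration.

```python
import heapq


def dijkstra(graph: dict, source: str, target: str):
    """Return (distance, path) of the shortest source->target route, or (inf, [])."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale entry: a shorter distance was already settled
        settled.add(node)
        if node == target:
            break  # the target's distance is final once it is popped
        for nbr, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in settled:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]
```

For example, with `roads = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 5)], "B": [("D", 1)]}`, `dijkstra(roads, "A", "D")` returns `(4.0, ["A", "C", "B", "D"])`. Speedup techniques generally aim to cut how many nodes this loop must settle, which is where savings in time and memory on a statewide network would come from.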

A very long-term project, Ling said, is modular data center software, codenamed Boxwood, that could make large-capacity storage and computation systems cheaper by virtualizing storage, distributing the locking and global state to unify the system, and automating provisioning, error detection and reinitializing.

“We need to get rid of the idea that with our 1,500 CPUs we’re going to have 1,500 different file systems,” Ling told internetnews.com.

One area where Microsoft Research is helping to lead the company is its effort to combat spam. “It’s of great importance to the Hotmail group, which is here in Silicon Valley,” Ling said.

The stats are alarming: 23 percent of e-mail users say spam has reduced their e-mail use, 76 percent are bothered by offensive or obscene content, and as much as 78 percent of all e-mail is spam.

“It’s something that needs to be undertaken by the community as a whole. Leading e-mail providers are starting to get together to look at common strategies,” he said.

Ling also outlined several approaches, including employing machine learning techniques to automatically identify e-mails that look like spam. With millions of Hotmail users participating in helping to train the software, Ling said, the filters can become very effective over time. Microsoft also is considering “black hole” lists and some form of “postage” that makes it more expensive to send spam, whether that’s charging money, making the computer perform a computation or giving senders a test to prove they’re human. All these could make spamming a little less economical.
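The article does not say which learning method Hotmail's filters use. A classic technique for learning a spam filter from user-labeled mail is naive Bayes classification; the sketch below assumes that technique, and the class name, tokenizer and "spam"/"ham" labels are illustrative. Each user report becomes one call to `train`, which is how a large user base could keep sharpening the filter over time.

```python
import math
import re
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text: str) -> list:
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text: str, label: str) -> None:
        """Record one message labeled 'spam' or 'ham', e.g. from a user's report."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def _log_score(self, words: list, label: str, vocab_size: int) -> float:
        if self.doc_counts[label] == 0:
            return float("-inf")  # no training data for this label yet
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # log prior
        total_words = sum(self.word_counts[label].values())
        for w in words:
            # Smoothing keeps a single unseen word from zeroing out the probability.
            score += math.log((self.word_counts[label][w] + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text: str) -> bool:
        words = self.tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen words
        return self._log_score(words, "spam", vocab_size) > self._log_score(words, "ham", vocab_size)
```

Scores are summed in log space so that long messages do not underflow to zero.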

The Silicon Valley lab is working on natural language processing, extending the language recognition capabilities that shipped in Word 97. “Our end goal is to be able to speak in English to a machine and have it understand and respond,” Ling said. While working toward that long-range goal, the group expects to find some interesting short-term applications. For example, they’ve built grammar parsers, which try to identify sentence structure; these could be used to build a grammar checker.
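Microsoft's parsers are far richer than anything that fits here, but the basic idea of recovering sentence structure can be shown with the textbook CKY algorithm over a tiny grammar. The rules, categories and vocabulary below are hypothetical, written in Chomsky normal form so every rule is either `A -> B C` or `A -> word`. A sentence that yields no parse is the kind of signal a grammar checker could surface.

```python
# Hypothetical toy grammar in Chomsky normal form.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICON = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "ball": {"N"},
    "chased": {"V"}, "saw": {"V"},
}


def cky_parse(words):
    """Return a parse tree (nested tuples) for an 'S' spanning all words, or None."""
    n = len(words)
    # chart[i][j] maps a category to one tree covering words[i:j].
    chart = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in LEXICON.get(w, ()):
            chart[i][i + 1][cat] = (cat, w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # try every split point
                for left, ltree in chart[i][k].items():
                    for right, rtree in chart[k][j].items():
                        for parent in BINARY_RULES.get((left, right), ()):
                            chart[i][j].setdefault(parent, (parent, ltree, rtree))
    return chart[0][n].get("S") if n else None
```

`cky_parse("the dog chased a ball".split())` returns the tree `("S", ("NP", ...), ("VP", ...))`, while the scrambled `"dog the chased"` returns `None`.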

Another project, MindNet, is a semantic network. “Think of it as a bunch of senses of a particular word and relationships between those words,” Ling explained. For example, “bank” in the sense of a financial institution would be linked to different words than “bank” in the sense of the edge of a river.
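The article gives only the outline of MindNet. As a rough illustration of how sense-level links can tell the two meanings of "bank" apart, the sketch below stores a hypothetical two-sense fragment of a semantic network and picks the sense whose linked words overlap most with the surrounding text, a simplified Lesk-style heuristic. The `word#sense` naming, relation names and word lists are invented for this example and are not MindNet's actual contents or format.

```python
from typing import Optional

# Hypothetical fragment: each word sense is a node with named relations to other words.
SEMANTIC_NET = {
    "bank#finance": {
        "is_a": {"institution"},
        "related_to": {"money", "loan", "deposit", "account", "teller"},
    },
    "bank#river": {
        "is_a": {"slope"},
        "related_to": {"river", "water", "shore", "mud", "fishing"},
    },
}


def senses_of(word: str) -> list:
    """All sense nodes recorded for a word."""
    return [s for s in SEMANTIC_NET if s.split("#")[0] == word]


def disambiguate(word: str, context: list) -> Optional[str]:
    """Pick the sense whose linked words overlap most with the context, or None."""
    context_set = {w.lower() for w in context}
    best, best_overlap = None, 0
    for sense in senses_of(word):
        linked = set().union(*SEMANTIC_NET[sense].values())
        overlap = len(linked & context_set)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best
```

The same links that separate the senses here are what a translation system could use to choose the right target-language word.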

Microsoft is applying this to automatic translation, to help produce its many manuals in even more languages. For its Product Support Services Knowledge Base, human editors translated the most important 5 percent of documents into Spanish and Japanese; the rest were machine-translated, with little evident loss in customer satisfaction.

In an attempt to contribute directly to science, Microsoft Research is collaborating with Jim Mullins at the University of Washington and Simon Mallal of Royal Perth Hospital in Australia on what Ling described as “doing an AIDS vaccine in a rational manner.” Computational science can contribute by calculating the probabilities of various protein sequences appearing in the rapidly mutating HIV, helping to identify a vaccine that would produce immunity to as many strains as possible.
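The article does not describe the collaborators' model. As a hypothetical illustration of the two ideas it mentions, the sketch below estimates how often each amino acid appears at each position across aligned strain sequences, then greedily picks short protein fragments (k-mers) so that as many strains as possible contain at least one chosen fragment. The fragment length, the budget and the greedy set-cover strategy are assumptions; real vaccine design involves immunological constraints this ignores.

```python
from collections import Counter


def position_frequencies(aligned: list) -> list:
    """Probability of each amino acid at each position of equal-length aligned sequences."""
    n = len(aligned)
    return [{aa: c / n for aa, c in Counter(col).items()} for col in zip(*aligned)]


def greedy_epitope_cover(strains: list, k: int, budget: int) -> list:
    """Greedily pick up to `budget` k-mers, each time taking the one that appears
    in the most strains not yet covered by an earlier pick."""
    kmers_by_strain = [{s[i:i + k] for i in range(len(s) - k + 1)} for s in strains]
    uncovered = set(range(len(strains)))
    chosen = []
    for _ in range(budget):
        counts = Counter(kmer for idx in uncovered for kmer in kmers_by_strain[idx])
        if not counts:
            break  # every strain is already covered
        best, _ = counts.most_common(1)[0]
        chosen.append(best)
        uncovered = {i for i in uncovered if best not in kmers_by_strain[i]}
    return chosen
```

Greedy selection is a standard approximation for this kind of coverage problem; it favors fragments conserved across many strains, which is the property a broadly protective vaccine would need.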

Microsoft Research was founded in 1991; it has facilities in San Francisco, Mountain View, Redmond, Beijing and Cambridge, UK.

“Our goal,” said Ling, “is advancing the state of the art, participating in the worldwide research community, then delivering that into the hands of Microsoft’s customers.”
