RealTime IT News

Jim Gray, Microsoft Scaleable Servers Research Group


Jim Gray is a distinguished engineer in Microsoft's Scaleable Servers Research Group and manager of its Bay Area Research Center. Gray is one of six Microsoft researchers based in San Francisco, three of whom work on managing personal media; the others, including Gray, concentrate on working with large data sets. He focuses on building supercomputers with commodity components, including fast networks, huge Web servers and inexpensive, very high-performance storage servers.

Gray is working with the astronomy community to create SkyServer, an effort to make the entire world's astronomy data accessible as a single distributed database on the Internet. Another project, TerraServer-USA, provides free public access to a vast data store of maps and aerial photographs of the United States. TerraServer also is available as a Web service.

Gray recently sat down with internetnews.com to discuss what's on the company's cutting edge.

Q: Your work seems to be moving toward providing data as a public utility.

Exactly. Our premise is that a lot of the excitement about grid computing is misplaced. Many people talk about outsourcing computation. Frankly, it's not very interesting to outsource computation. But people have questions they want to outsource. Most grid computing we'll see in the future will be a data grid. You'll go places for the data and not the computes.

TerraServer is an example of a Web services project, and one of our most successful subscriptions to it is MapPoint, a geo-location service. If you give MapPoint an address, it will tell you what the longitude and latitude are, and it gives you points nearby. There's a bit of enthusiasm for it, especially for cell phones. The cell phone knows where it is, so now it can ask what's the closest gas station, and how do I get there? It's already part of MSN.
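The "closest gas station" question Gray describes can be sketched in a few lines. This is only an illustration of the idea, not MapPoint's interface or implementation: the function names, the station list and its coordinates are all hypothetical, and the distance is computed with the standard haversine great-circle formula on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest(lat, lon, places):
    """Return the (name, lat, lon) tuple in places nearest to the given point."""
    return min(places, key=lambda p: haversine_km(lat, lon, p[1], p[2]))


# Hypothetical points of interest (made-up names, illustrative coordinates).
STATIONS = [
    ("Station A", 37.7793, -122.4193),
    ("Station B", 37.8044, -122.2712),
    ("Station C", 37.3382, -121.8863),
]

if __name__ == "__main__":
    # The phone "knows where it is"; here that position is simply assumed.
    name, lat, lon = closest(37.7749, -122.4194, STATIONS)
    print(name, round(haversine_km(37.7749, -122.4194, lat, lon), 2), "km")
```

A real service would also turn the street address into coordinates and compute driving directions; both steps are outside this sketch.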

Q: You've said that federating the astronomy archives presents interesting challenges for computer scientists. What are some of those challenges?

There are technical challenges and non-technical challenges. As with all things, the non-technical challenges, the people, are the biggest problems. For example, getting all the astronomers to agree there are stars and galaxies is easy. But getting them to agree to what exactly a galaxy is and what its properties are and how to measure them? Now, we're getting close to deep beliefs about the way astronomy should be done. The biggest challenge we face in all human endeavors is getting people to agree. Once people agree, it's just engineering.

Q: What are the goals of TerraServer and SkyServer? Are they to provide public access to the kinds of scientific data that used to be hidden within the scientific community?

No. It was a crass attempt to show off the scalability of Microsoft software. When I came to Microsoft, it was even truer than it is today that we were a desktop operating system and productivity tools company. We had some server software already in place -- Exchange and SQLServer were available -- but they did not get very much respect. We've been working for close to a decade to change that. One way is to dog-food your own stuff, to build large servers and experiment with them, and see what works and what doesn't work.

It's interesting to see astronomers working with PostgreSQL, MySQL, Oracle, DB2 and SQLServer. You get a pretty honest comparison. SQLServer's strength is that it's a fairly complete implementation and very easy to install and manage. We're learning a lot about what we could do better.

Q: You mentioned that the software to process SkySurvey data, to present the catalogs to the public and to federate it with other archives, often comprises more than 25 percent of the project's budget. Is that an opportunity for Microsoft?

It is more than 25 percent, and that's a lot of money, especially for the astronomers. The software is more expensive than the telescope, which came as a shock to me. But it's custom software, peculiar to what astronomers are doing with their data. There is some generic software in there, the operating system and some programming languages -- certainly, we're in that business -- but that is a tiny part of the whole software bill.

Q: Will Web services make that custom code more reusable?

My enthusiasm about this work, why I think it's good for Microsoft to be funding the astronomy community, is that it's a fairly vendor-neutral, openly collaborative way of experimenting with some new ideas like Web services. Many people go off and implement Web services, but they can't tell you much about what they did, because it's a competitive advantage. Here, people talk about how they implement these ideas, talk about how the implementation works and show people the code, so others can copy it.

People in private industry can look at what the astronomers have done and apply it to their widgets. It's not so much that we're hoping to make a buck on the astronomers, but that we're hoping to learn what works and doesn't about the technology. SkyServer is like a sandbox we're playing in. We're building prototypes of what we think will be typical of other enterprises in the future.

Q: SkyServer seemed pretty fast to me -- and the user interface was fairly friendly. How are you addressing the presentation layer issues that still bedevil grid computing? Applications remain bandwidth hogs, even when people are pooling CPUs in a grid format.

It's a challenge. The TerraServer is fundamentally limited by how much we're willing to spend on the phone bill. We design given the phone bill. We don't do fly-throughs or show movies. We insist people click for every screen they get, and that limits how much bandwidth you can soak up. It's a significant part of the cost of running the site.

Q: What got you interested in working with large bodies of data?

I'm puzzled to this day by what existence is about. How do we know anything? When I went to college, I started as a philosophy major. Philosophers were trying to understand reasoning using predicate calculus. But thought is very complicated. Computers at their base operate in a very simple way, but it was clear even in 1960 that we were not going to be able to explain human thought with predicate calculus. I'm not sure we're any closer today to understanding how thought works, but it's a noble goal to do that and a fascinating one.