SAN JOSE, Calif. — Google isn’t only crawling your Web site, it’s looking at your source code as well. That was the introduction for Chris DiBona, the open source programs manager at Google (NASDAQ: GOOG), one of several keynote speakers here at OSCON 2009 today.
While Google Code Search has been available through Google Labs for some time, DiBona revealed some telling, and potentially unexpected, findings about the state of open source.
He said Google has found some 2.5 billion lines of open source code spread across 30 million unique files identified via its code search crawl. He then played a little quiz game with the audience, asking a series of questions about Google’s discoveries.
For example: “Is there more C or C++ open source programs?” Answer: More than twice as many are based on C.
More Perl or PHP? More PHP, by more than 37 million lines of code.
Smalltalk versus Objective-C? Smalltalk wins, with nearly three times as much code.
Troff or Ruby? Troff by more than 88 million lines of code.
DiBona admitted there’s probably a certain amount of error in the calculations because of project duplication and other factors, but joked, “we make it up in volume.”
Microsoft has been making big moves into open source lately, so its appearance at OSCON wasn’t a surprise. Tony Hey, the software giant’s corporate vice president of external research, made a pitch for how open source is currently helping the scientific community, and what it can do in the future.
“I believe there is a great opportunity for Microsoft and other companies to help scientists solve problems,” he said. “Science has to move from data to information to knowledge.”
He noted that with the increase in sensor networks and myriad other sources, scientists can easily be overwhelmed by the flood of data they have access to. That makes it important that scientists are able to use the best tools for the job, whether they are proprietary or open source-based.
“We’re trying to give people choice and that means Microsoft, open source, Google, Oracle and IBM,” Hey said. “We want to help scientists spend less time on IT issues and more on discovery.”
For instance, he mentioned a Microsoft project called PhyloD, a statistical tool used to analyze DNA from large pools of HIV patients. The idea is to look for correlations in how the virus changes.
Hey said Microsoft offers PhyloD as an Azure service, part of its cloud computing initiative. “Scientists can upload data to the cloud and do their analysis without intervention,” he said.
He also noted that Microsoft now offers a number of open source extensions to Office and some of its other products that make them more accessible.