RealTime IT News

A Search Engine For Java Code

SAN FRANCISCO -- Can't find that last bit of Java code to complete your project? IBM is developing a search engine it claims will let Java developers find even the briefest code examples in a fraction of the time it now takes.

Code-named "Prospector," the engine seeks out code examples that use any or all of the J2SE 1.4, Eclipse 3.0, and Eclipse GEF (Graphical Editing Framework) APIs. IBM is working with the U.C. Berkeley Computer Science Department to fund the venture with a fraction of its $1 billion annual developer budget.

The search engine is currently in beta testing, with completion planned by the end of the year.

Project lead Rastislav "Ras" Bodik said the search engine was necessary because millions of lines of code exist across the various Java programming environments, with no central repository that developers can access. The engine is also designed to help some six million Java programmers navigate and learn about object-oriented APIs.

"Imagine that a programmer is writing some Java code, that she has a URL object pointing to a JPEG file, and that she wants to display it as an image using the java.awt.Image class," Bodik said. "Unfortunately, Image is an abstract class, and it's not very obvious how to create one at all, let alone how to create one from a URL."

Given a query that names the class the programmer has and the class she wants, Prospector searches a graph of API classes for paths from the "have" class to the "want" class, then converts those paths into legal Java source code.
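The "have" to "want" search can be sketched roughly as follows. This is a minimal illustration, not Prospector's actual implementation: the class name HaveWantSearch, the "$x" template placeholder, the hand-written edges, and the shortest-first ordering are all assumptions made for the example. The JDK calls named in the edges (Toolkit.getImage, URL.openStream, ImageIO.read, ImageIcon.getImage) are real.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: NOT Prospector's actual implementation.
final class HaveWantSearch {

    // Hypothetical edge: one API call that turns a value of one type into another.
    // "$x" in the template marks where the input expression is substituted.
    private static final class Edge {
        final String to;
        final String template;
        Edge(String to, String template) { this.to = to; this.template = template; }
    }

    // A partial path: the type reached so far and the expression that produces it.
    private static final class State {
        final String type;
        final String expr;
        final int steps;
        final Set<String> seen;
        State(String type, String expr, int steps, Set<String> seen) {
            this.type = type; this.expr = expr; this.steps = steps; this.seen = seen;
        }
    }

    private final Map<String, List<Edge>> edges = new HashMap<>();

    void addEdge(String from, String to, String template) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, template));
    }

    // Breadth-first search: returns every cycle-free conversion from `have` to
    // `want` that uses at most maxSteps edges, shortest paths first.
    List<String> findSnippets(String have, String want, String variable, int maxSteps) {
        List<String> results = new ArrayList<>();
        Deque<State> queue = new ArrayDeque<>();
        Set<String> start = new HashSet<>(Collections.singleton(have));
        queue.add(new State(have, variable, 0, start));
        while (!queue.isEmpty()) {
            State s = queue.remove();
            if (s.type.equals(want)) { results.add(s.expr); continue; }
            if (s.steps == maxSteps) continue;
            for (Edge e : edges.getOrDefault(s.type, Collections.<Edge>emptyList())) {
                if (s.seen.contains(e.to)) continue;           // avoid revisiting a type
                Set<String> seen = new HashSet<>(s.seen);
                seen.add(e.to);
                queue.add(new State(e.to, e.template.replace("$x", s.expr), s.steps + 1, seen));
            }
        }
        return results;
    }

    // A tiny hand-written graph (assumption: a real tool would derive this from API signatures).
    static HaveWantSearch demoGraph() {
        HaveWantSearch g = new HaveWantSearch();
        g.addEdge("java.net.URL", "java.awt.Image", "java.awt.Toolkit.getDefaultToolkit().getImage($x)");
        g.addEdge("java.net.URL", "java.io.InputStream", "$x.openStream()");
        g.addEdge("java.net.URL", "javax.swing.ImageIcon", "new javax.swing.ImageIcon($x)");
        g.addEdge("java.io.InputStream", "java.awt.image.BufferedImage", "javax.imageio.ImageIO.read($x)");
        g.addEdge("javax.swing.ImageIcon", "java.awt.Image", "$x.getImage()");
        // BufferedImage is a subclass of Image, so no call is needed.
        g.addEdge("java.awt.image.BufferedImage", "java.awt.Image", "$x");
        return g;
    }

    public static void main(String[] args) {
        for (String s : demoGraph().findSnippets("java.net.URL", "java.awt.Image", "url", 3)) {
            System.out.println(s);
        }
    }
}
```

For the URL-to-Image query in Bodik's example, this toy graph yields three candidate expressions, with the one-call Toolkit route listed first. Some of the generated expressions call methods that throw IOException (openStream and ImageIO.read), so a programmer pasting them in would still need to handle that.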

"Now, she can ask Prospector for a list of code examples, pick out an example, and get back to coding," he said.

Life before Prospector was less certain, Bodik said, pointing out that it used to take him as much as three weeks to search the Web for a mere three lines of code with only limited success.

"Beyond Google, which doesn't usually find this type of code, I would be resigned to asking my smarter students what they could find," Bodik told "The difficulty is that you have so much Java code to choose from -- Sun, IBM, BEA, Eclipse --- and you only want a little piece of this one and a little piece of that one. Now, I can get the code I need and install it into my application in a couple of hours instead of a week."

While the search engine currently scours the Java world, Bodik said the team has the capacity to search out other languages, but no plans are in place at this time.

"Perhaps a version of Prospector for C# is the easiest to get up and running because it has Java attributes, but languages with C and C++ would be much harder to design and we have no plans to do so at this time," he said, hinting that he might want to take the project out of the University and launch it as a viable business with his fellow coders David Mandelin, and Lin Xu.

Bodik gave his presentation during a roundtable discussion with press at an IBM-hosted event at its offices here. The Armonk, N.Y.-based firm is celebrating the fifth anniversary of its developerWorks site and has added some new features designed to tap into some of the latest trends.

IBM said its newest feature is the Power Zone (which houses a forum, blog, submissions database and community feedback) for applications based on the Power Architecture. Based on IBM's Power Everywhere program, the reference material includes the new IBM eServer OpenPower 720, a POWER5 microprocessor-based server tuned specifically for Linux. The Zone also addresses Power chips for game and embedded systems development.

In addition to Power, the developerWorks site will have technical resources around wireless, embedded, speech technologies, and applications using Reusable Dialogue Components (RDCs). RDCs are pre-built speech software components, or "building blocks" that handle basic functions such as date, time, currency and locations (major cities, states, zip codes) and are used in speech-enabled infrastructure applications.