RealTime IT News

Nelson Mattos, Distinguished Engineer, IBM

The way IBM sees it, the market for enterprise search -- where employees sift through networks to find very specific information -- is wide open.

The company has hundreds of software engineers dedicated to various research projects that blend corporate search with software integration to make life easier for employees.

But Big Blue's search and integration efforts have managed to find their way into actual products in the company's Information Integration line, which boasts 1,700 customers. Launched last year and code-named Masala, Information Integrator is one key product that embodies some of the search capabilities the programmers have whipped up.

To dovetail more effectively with its push for the integration of different software components, IBM has moved Information Integrator into its WebSphere line from the DB2 line.

Later this year, the company will issue a second version of the software, code-named Serrano, which Nelson Mattos, distinguished engineer and vice president of IBM's Information Integration division, discussed with

Q: Is IBM seeing a big uptick in the demand for integration software?

As we predicted, we have seen a dramatic increase in the interest in integration software. The Information Integration division had an extremely successful year in 2004. We have 1,700 customers using the portfolio right now. A little over 350 were basically brand-new customers that did not have previous integration technology from IBM. Forty percent of these 350 customers did not have our DB2 database, which shows that integration is a problem that is spread across the industry. It's a problem that many customers face.

Q: Why is IBM increasingly incorporating enterprise search into its Information Integration products?

Information Integration focuses on the customer challenges of easily accessing business information that is spread across the enterprise, different operating systems, vendor platforms, application packages and silos of information. One of the things we heard from our customers when we started to address information integration was that, in addition to providing interfaces that business applications can use to address content-centric problems, it was important to also have interfaces that a normal human being will be able to use to find information across the enterprise.

So, we brought to market WebSphere Information Integrator OmniFind Edition. It basically brings search technology to the enterprise, making it easy for end users to find relevant corporate information, independent of where it is stored. It could be in a DB2 database, an Oracle database, SQL Server, or a content repository from IBM, FileNet or Documentum. It could be in e-mail systems from IBM or Microsoft, in flat files in HTML or Web agents. IBM is addressing one of the biggest pain points customers have today as a consequence of the explosion of data, which is the fact that their employees are spending 30 percent of their time looking for corporate information to get their job done.

Q: What's the next leg for Information Integrator?

The next major release of Information Integrator is code-named Serrano, which stands for a very hot chili pepper that you'll find in some salsas. Serrano is going to focus on three major pain points we hear about from customers. The first one is the difficulty in understanding the information assets they have. Information Integrator allows customers to connect to all the different information repositories. But there is still a need for customers to understand what is stored in different databases and repositories and how the data that is stored there relates to data sitting in other systems.

Serrano will deliver intelligent tools. So, it will tell you that you have a customer table in an Oracle database and that you have a sales transaction table in Microsoft SQL Server and a warehouse of historical data about your business. And it will recognize that the customer table is related to the sales transaction table and that the historical data is related to both of them. This is going to increase customers' return on investment in the integration platform, because it simplifies the development of applications.
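To make the idea of recognizing related tables concrete, here is a minimal sketch of one common heuristic: comparing how much the values in two columns overlap. This is an illustration only, not a description of how Serrano works. The table names, column names, sample data and the 0.9 threshold are all hypothetical.

```python
def column_overlap(left, right):
    """Fraction of distinct values in `left` that also appear in `right`."""
    left_values, right_values = set(left), set(right)
    if not left_values:
        return 0.0
    return len(left_values & right_values) / len(left_values)


def find_related_columns(tables, threshold=0.9):
    """Return (table_a, col_a, table_b, col_b, score) for likely join keys.

    `tables` maps a table name to a dict of column name -> list of values.
    A pair is reported when nearly every value in col_a also occurs in col_b,
    which suggests col_a may reference col_b.
    """
    candidates = []
    for name_a, cols_a in tables.items():
        for name_b, cols_b in tables.items():
            if name_a == name_b:
                continue
            for col_a, values_a in cols_a.items():
                for col_b, values_b in cols_b.items():
                    score = column_overlap(values_a, values_b)
                    if score >= threshold:
                        candidates.append((name_a, col_a, name_b, col_b, score))
    return candidates


# Hypothetical samples pulled from two different database products.
tables = {
    "oracle.customer": {
        "customer_id": [1, 2, 3, 4],
        "region": ["east", "west", "east", "north"],
    },
    "sqlserver.sales": {
        "sale_id": [100, 101, 102],
        "customer_id": [1, 2, 2],
    },
}

related = find_related_columns(tables)
for table_a, col_a, table_b, col_b, score in related:
    print(f"{table_a}.{col_a} -> {table_b}.{col_b} (overlap {score:.2f})")
```

In this sample, every customer ID in the sales table also appears in the customer table, so the sketch flags that pair as a likely relationship. A real tool would also have to weigh data types, column names and sampling cost.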

Q: What will you be doing to upgrade the search capabilities in Serrano?

Serrano will have a major extension of OmniFind, which provides high-relevancy search results in sub-second response time. We're expanding OmniFind by leveraging the UIMA [Unstructured Information Management Architecture], which standardizes the application of text mining and text analytics abilities in a pluggable framework that allows ISVs, partners and customers to develop specialized analyses.

Serrano will be able to extract meaning from the stored documents independent from where or in what language they are stored. We will also be opening up the UIMA architecture for partners so they can build solutions that will extract underlying meaning.

This will help us expand our leadership in the integration of unstructured content because customers will be able to create a brand new set of applications that will be able to make conclusions from analyzing large numbers of documents. For example, you'll be able to analyze millions of patient case histories to more quickly discover how they react to different drugs and treatments. Customers can also use the technology to analyze call center transcripts for early detection of potential defects in their products.

Q: Some folks in the industry compared your efforts to what Google and Yahoo do with regard to search. What is the difference between what you are trying to provide for corporations and the search those companies provide?

They're totally different markets. Google and Yahoo do Internet search, which has a lot of things that make it easy for you to search the Web. One key difference is that, in the Internet, all of the information is public. You don't have to deal with security issues, or access-control issues with regard to search. But enterprise information is maintained in a very secure way. I use our own products internally, and very few folks in my organization have the access controls to see the performance of our business. So, how do you make sure that enterprise search offerings address security issues?
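One way to picture the access-control issue is a search function that drops any document the user is not cleared to see before it checks for a match. The sketch below is a simplified illustration under assumed names (`Document`, `allowed_groups`, the sample groups and documents are hypothetical). It does not describe OmniFind's implementation.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this document


def search(documents, query, user_groups):
    """Return IDs of documents that match every query term AND that the user may read."""
    terms = query.lower().split()
    user_groups = set(user_groups)
    hits = []
    for doc in documents:
        # Security check first: skip anything the user has no right to see,
        # so restricted documents never surface in results.
        if not (doc.allowed_groups & user_groups):
            continue
        words = doc.text.lower().split()
        if all(term in words for term in terms):
            hits.append(doc.doc_id)
    return hits


documents = [
    Document("q4-results", "quarterly revenue results for the division",
             frozenset({"executives"})),
    Document("travel-policy", "travel policy and expense results",
             frozenset({"executives", "staff"})),
]

staff_hits = search(documents, "results", {"staff"})
executive_hits = search(documents, "results", {"executives"})
print(staff_hits)
print(executive_hits)
```

The same query returns different results for different users: only members of the hypothetical `executives` group see the business-performance document.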

Second, in the Web, everything is pretty much HTML or XML documents. In the enterprise environment, you have data in databases, legacy systems, content repositories and e-mailing systems, and you also have documents in the file system -- HTML pages, etc. You are dealing with a variety of document types spread across many different platforms.

Third, the kind of algorithms that you do in the Internet to discover what is relevant for the questions that are asked are totally different than those that you would use in the enterprise. In the Internet, everything has links between pages I can use to find relevant info. Documents in a database, content repository or e-mail system don't have links to each other. I can't possibly use that to infer what is relevant.
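The point about links can be illustrated with a small link-analysis sketch in the style of PageRank, the best-known example of link-based ranking. This is a textbook simplification, not the algorithm any particular search engine uses, and the page names are hypothetical. When pages link to each other, the scores separate. When nothing links to anything, as with rows in a database or messages in a mail store, every item gets the same score and the signal disappears.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Split this page's score among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its score evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Web-like collection: two pages point at "c", so "c" ranks highest.
web_ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})

# Enterprise-like collection: no links at all, so all scores are identical.
enterprise_ranks = pagerank({"memo": [], "invoice": [], "email": []})

print(max(web_ranks, key=web_ranks.get))
print(enterprise_ranks)
```

With no link structure to exploit, relevance in the enterprise case has to come from other signals, such as the text itself or metadata about the document.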

Now, the proliferation of search engines was extremely beneficial to make search technology accessible to everyone in the world. However, the big problem is not in the Internet but on the corporate side where, again, employees are spending 30 percent of their time looking for relevant information. That is what is causing customers to lose sleep. And that is the focus area for us at IBM. We are focused on how to make it easy for corporations to find relevant information and real-time insight to their business.