Nelson Mattos, Distinguished Engineer, IBM

Jeff Hawkins
The way IBM sees it, the market for enterprise search —
where employees sift through networks to find very specific information —
is wide open.

The company has hundreds of software engineers dedicated to various research projects that blend corporate search with software
integration to make life easier for employees.

But Big Blue’s search and integration efforts have managed to find their way
into actual products in the company’s Information Integration line, which
boasts 1,700 customers. Launched last year and code-named Masala,
Information Integrator is one key product that embodies some of the search
capabilities the programmers have whipped up.

To dovetail more effectively with its push for the integration of different
software components, IBM has moved Information Integrator into its WebSphere
line from the DB2 line.

Later this year, the company will issue a
second version of the software, code-named Serrano, which Nelson Mattos,
distinguished engineer and vice president of IBM’s Information Integration
division, discussed with

Q: Is IBM seeing a big uptick in the demand for integration software?

As we predicted, we have seen a dramatic increase in interest in
integration software. The Information Integration division had an extremely
successful year in 2004. We have 1,700 customers using the portfolio right
now. A little over 350 were basically brand-new customers that did not have
previous integration technology from IBM. Forty percent of these 350
customers did not have our DB2 database, which shows that integration
is a problem that is spread across the industry. It’s a problem that
many customers face.

Q: Why is IBM increasingly incorporating enterprise search into its
Information Integration products?

Information Integration focuses on the customer challenges of easily
accessing business information that is spread across the enterprise,
different operating systems, vendor platforms, application packages and
silos of information. One of the things we heard from our customers when we
started to address information integration was that, in addition to providing
interfaces that business applications can use to address content-centric
problems, it was important to also have interfaces that a normal human being
will be able to use to find information across the enterprise.

So, we brought to market WebSphere Information Integrator OmniFind Edition.
It basically brings search technology to the enterprise, making it easy for
end users to find relevant corporate information, independent of where it is
stored. It could be in a DB2 database, an Oracle database, SQL Server, a content
repository from IBM, FileNet or Documentum. It could be in e-mail systems
from IBM or Microsoft, in flat files in HTML or Web agents. IBM is
addressing one of the biggest pain points customers have today as a
consequence of the explosion of data, which is the fact that their employees
are spending 30 percent of their time looking for corporate information to
get their job done.

Q: What’s the next leg for Information Integrator?

The next major release of Information Integrator is code-named Serrano,
which is named for a very hot chili pepper that you’ll find in some salsas.
Serrano is going to focus on three major pain points we hear about from
customers. The first one is the difficulty in understanding the information
assets they have. Information Integrator allows customers to connect to all
the different information repositories. But there is still a need for
customers to understand what is stored in different databases and
repositories and how the data that is stored there relates to data
sitting in other systems.

Serrano will deliver intelligent tools. So, it will tell you that you have a customer
table in an Oracle database and that you have a sales transaction table in
Microsoft SQL Server and a warehouse of historical data about your business.
And you recognize that the customer table is related to the sales
transaction table and that the historical data is related to both of them.
This is going to increase the return on investment of customers in the
integration platform, because it simplifies the development of applications.

Q: What will you be doing to upgrade the search capabilities in

Serrano will have a major extension of OmniFind, which provides high-relevancy search results in sub-second response time. We’re expanding
OmniFind by leveraging the UIMA [Unstructured Information Management
Architecture], which standardizes the application of text mining and text
analytics abilities in a pluggable framework that allows ISVs, partners and
customers to develop specialized analyses.

Serrano will be able to extract
meaning from the stored documents independent from where or in what language
they are stored. We will also be opening up the UIMA architecture
for partners so they can build solutions that will extract underlying

This will help us expand our leadership in the integration of unstructured
content because customers will be able to create a brand new set of
applications that will be able to make conclusions from analyzing large
numbers of documents. For example, you’ll be able to analyze millions of
patient case histories to more quickly discover how they react to different
drugs and treatments. Customers can also use the technology to analyze call
center transcripts for early detection of potential defects in their

Q: Some folks in the industry have compared your efforts to what Google and
Yahoo do with regard to search. What is the difference between what you are
trying to provide for corporations and the search those companies provide?

They’re totally different markets. Google and Yahoo do Internet search,
which has a lot of things that make it easy for you to search the Web. One
key difference is that, on the Internet, all of the information is public.
You don’t have to deal with security issues, or access-control issues with
regard to search. But enterprise information is maintained in a very secure
way. I use our own products internally, and very few folks in my organization
have the access controls to see the performance of our business. So, how do
you make sure that enterprise search offerings address security issues?
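
To make the access-control point concrete, here is a minimal Python sketch of a search that checks permissions before matching. It is an illustration only, not how OmniFind or any IBM product works: the `Document` class, the `allowed_groups` field, the `search` function and the group names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical document record: an id, its text, and the groups allowed to read it.
    doc_id: str
    text: str
    allowed_groups: set = field(default_factory=set)


def search(documents, query, user_groups):
    """Return ids of documents that match every query term and that the user may read."""
    terms = query.lower().split()
    results = []
    for doc in documents:
        # Security check first: skip anything the user's groups cannot read.
        if not (doc.allowed_groups & user_groups):
            continue
        words = doc.text.lower().split()
        if all(term in words for term in terms):
            results.append(doc.doc_id)
    return results


docs = [
    Document("handbook", "employee travel policy", {"all-staff"}),
    Document("q4-results", "quarterly business performance policy", {"executives"}),
]

# The same query returns different results depending on who is asking.
staff_hits = search(docs, "policy", {"all-staff"})
exec_hits = search(docs, "policy", {"all-staff", "executives"})
```

In this sketch a public Web search would correspond to every document being readable by everyone, so the permission check would never filter anything out.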

Second, on the Web, everything is pretty much HTML or XML documents. In the
enterprise environment, you have data in databases, legacy systems, content
repositories and e-mailing systems, and you also have documents in the file
system — HTML pages, etc. You are dealing with a variety of document
types spread across many different platforms.

Third, the kinds of algorithms that you use on the Internet to discover what
is relevant to the questions that are asked are totally different from
those that you would use in the enterprise. On the Internet, everything has
links between pages I can use to find relevant info. Documents in a
database, content repository or e-mail system don’t have links to each
other. I can’t possibly use that to infer what is relevant.
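
The point about links can be illustrated with a deliberately simple Python sketch that scores each item by how many other items link to it. This is a plain inbound-link count chosen for illustration, not the ranking algorithm of any search engine or of IBM's software; the function name and the sample data are hypothetical.

```python
def inbound_link_scores(links):
    """Score each item by the number of other items that link to it."""
    scores = {page: 0 for page in links}
    for source, targets in links.items():
        # Count each source at most once per target and ignore self-links.
        for target in set(targets):
            if target != source and target in scores:
                scores[target] += 1
    return scores


# Web pages link to one another, so the counts differ and give a ranking signal.
web_pages = {
    "home": ["products", "support"],
    "products": ["support"],
    "support": ["home"],
}
web_scores = inbound_link_scores(web_pages)

# Enterprise items (database rows, stored files, e-mail) carry no links in this
# sketch, so every score is zero and the signal disappears.
enterprise_docs = {"contract.pdf": [], "email-1041": [], "db-row-77": []}
enterprise_scores = inbound_link_scores(enterprise_docs)
```

With no links to count, relevance in the enterprise case has to come from other signals, such as the text of the documents themselves.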

Now, the proliferation of search engines was extremely beneficial in making
search technology accessible to everyone in the world. However,
the big problem is not on the Internet but on the corporate side where,
again, employees are spending 30 percent of their time looking for relevant
information. That is what is causing customers to lose sleep. And that is
the focus area for us at IBM. We are focused on how to make it easy for
corporations to find relevant information and real-time insight to their
