RealTime IT News
Enterprise Search And Destroy
By Michael Hickins
January 03, 2007

Reporter's Notebook: New government regulations often spawn whole new markets. A far-reaching reform of the Federal Rules of Civil Procedure (FRCP) is proving to be no exception.

The reform means that electronic documents in all forms, including e-mail, instant messages and even transcripts of video conference and VoIP calls, are fair game for litigants during the discovery phase of a lawsuit. This has given enterprise search application vendors significant new momentum.

However, allowing customers to find relevant documents among billions of pages is no mean feat: The CEO of one vendor said he told a prospective customer that it would take between 200 and 300 servers (in addition to the ones already in place) to index some 300 terabytes of data. He was informed that his competition had quoted somewhere in the range of 1,000 servers.
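
To put the two quotes side by side, here is a back-of-the-envelope sketch of how much data each server would have to index. It assumes the load is spread evenly across servers, which the article does not state; the function name is illustrative.

```python
DATA_TB = 300  # corpus size quoted by the vendor's CEO


def tb_per_server(servers: int, data_tb: float = DATA_TB) -> float:
    """Average terabytes per server, assuming an even split (an assumption)."""
    return data_tb / servers


quotes = [
    ("vendor, low estimate", 200),
    ("vendor, high estimate", 300),
    ("competitor", 1000),
]

for label, servers in quotes:
    print(f"{label}: {servers} servers, {tb_per_server(servers):.2f} TB each")
```

On those assumptions, the vendor is proposing that each server handle roughly three to five times as much data as the competitor's quote implies.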

According to IDC analyst Sue Feldman, the field of electronic document search and retrieval has "gone from practically zilch three years ago" to achieving 30 percent growth over the past year, in large part thanks to the changes to the FRCP.

"Text mining is growing at an enormous rate," she told me.

No wonder that established enterprise search application vendors like Autonomy, Recommind and Exalead have introduced new products or tweaked existing software to meet demand for the efficient storage and retrieval of electronic communications.

The reform has also given new life to specialized vendors, such as Toronto-based Nstein; Palo Alto, Calif.-based Attensity; Seattle-based Attenex; and Atlanta-based Nexidia.

The reform, which took effect on Dec. 1, has also spawned a rash of new partnerships and acquisitions between traditional enterprise storage companies, such as IBM, EDS, EMC and CA on the one hand, and the document retrieval specialists on the other.

For instance, Exalead CEO Alain Heurtebise told me that his Paris-based company closed a deal with EDS's Italian subsidiary in 2006, and is being considered by IBM as an OEM partner for a storage and e-discovery application.

E-mail storage vendor Zantaz also recently bolstered its feature set with the acquisition of data-classification vendor Singlecast.

Paris-based business intelligence vendor Business Objects picked Attensity to be its search partner in November, while Nstein has recently signed deals with Cognos and Computer Sciences Corp.

E-discovery application vendors promote their own special sauce for allowing corporate lawyers to sift through reams of data while de-duping and otherwise reducing irrelevant search results.
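
The vendors don't disclose their methods, but the simplest form of de-duping can be sketched as follows: fingerprint each document after normalizing case and whitespace, and keep only the first copy of each fingerprint. This is a generic exact-match technique, not any vendor's implementation, and the function names and sample messages are made up for illustration.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing runs of whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(documents):
    """Return the documents in order, keeping the first copy of each fingerprint."""
    seen = set()
    unique = []
    for doc in documents:
        key = fingerprint(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Hypothetical sample messages; the second differs only in case and spacing.
docs = [
    "Please review the attached contract.",
    "please  review the attached CONTRACT.",
    "Lunch on Friday?",
]
unique_docs = dedupe(docs)
print(len(docs), "documents in,", len(unique_docs), "out")
```

Exact-match hashing only catches copies that are identical after normalization. A forwarded e-mail with an added signature line would slip through, which is presumably where the vendors' "special sauce" comes in.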

The challenge with searches in this context is that it isn't easy to know what you're looking for. For instance, legal beagles wouldn't have known to search for terms such as "manipulate the California energy market" at Enron without the benefit of hindsight.

According to Heurtebise, Exalead addresses this problem through what he called "serendipitous search," which allows customers to refine or redirect their searches based on an initial set of results. "The initial result sets could give you intelligence and insight about the right question to put. If you're ignorant about what you're looking for, you're obliged to go by serendipity," he said.
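
Heurtebise did not describe the mechanics, but the general idea of letting a first result set suggest the next question can be sketched by counting which other terms turn up most often in the initial hits. This is a toy illustration under that assumption, not Exalead's method; the stopword list, function names and sample documents are all hypothetical.

```python
from collections import Counter

# A tiny illustrative stopword list, not a complete one.
STOPWORDS = {"the", "a", "to", "of", "and", "in", "on", "for", "is"}


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, and lowercase."""
    return [word.strip(".,?!\"'").lower() for word in text.split()]


def search(documents, term):
    """Return the documents that contain the term as a whole word."""
    term = term.lower()
    return [doc for doc in documents if term in tokenize(doc)]


def suggest_terms(results, query_terms, top_n=3):
    """Rank other terms by how many of the result documents mention them."""
    exclude = STOPWORDS | {t.lower() for t in query_terms}
    counts = Counter()
    for doc in results:
        for word in set(tokenize(doc)):  # count each word once per document
            if word and word not in exclude:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical document titles.
docs = [
    "Trading desk memo on energy prices",
    "Energy trading schedule for California",
    "California energy market trading notes",
    "Holiday party schedule",
]
hits = search(docs, "energy")
suggestions = suggest_terms(hits, ["energy"], top_n=2)
print(suggestions)
```

A reviewer who started with "energy" is nudged toward "trading" and "California", terms he or she might not have thought to search for at the outset.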

Autonomy, based in San Francisco and Cambridge, England, has taken a more holistic tack, and has married its enterprise search application with a corporate-compliance module that analyzes enterprise information repositories in real time, flags high-risk content and behaviors that may violate compliance policies, and provides an audit trail.

San Francisco-based Recommind uses a technology called probabilistic latent semantic analysis, which looks at words in relationship to each other in context. The newest version of its search platform includes the ability for customers to lock down given documents so that they cannot be edited or destroyed.

The company has also made the application more attractive to potential partners by improving its interoperability with other document-management application vendors.

"It's OEM-ready," in the words of Craig Carpenter, the company's vice president of marketing.

A more specialized application from transcription software vendor Nexidia lets users search voice files (from VoIP calls, for instance) using phonemics, which recognizes differences in regional accents (like the Boston accent that drops the "r" in car).

The market for these products is growing so quickly because of the key role played by discovery in the U.S. legal system: It forces plaintiffs and defendants to put their cards on the table, which usually leads to a settlement one way or the other. As a result of this process, fewer than 2 percent of federal cases ever go to trial, according to Cliff Shnier, vice president of the litigation group at Aon Consulting.

Companies that fail to provide these documents in a timely manner face the wrath of the courts: Morgan Stanley was saddled with a $1.45 billion jury verdict, in large part because the judge in the case instructed the jury to take the brokerage's failure to provide electronic documents as proof that the missing information was damaging to its case.

Several courts had already issued rulings regarding electronic documents, but awards of the magnitude of the Morgan Stanley case prompted Congress to amend the FRCP to give those precedents force of law.

Changing the FRCP isn't something that happens very often; "since 1938 [when the FRCP was initially passed], the number of times it has been amended can be counted on the fingers of both hands," Shnier said.

The extent to which these rules apply to a company depends largely on the industry in which it operates. For instance, brokers and other financial institutions are required to store voice transcripts, while construction companies are not.

Shnier added that companies are being encouraged to use technology in order to reduce the costs associated with discovery, and to make searching more productive.

"The courts have smiled on the idea of using search technology to cull this down," he said.

Moreover, companies don't have to maintain all their documents forever. According to Shnier, "it's highly recommended that companies have records retention policies" mandating the purging of documents after a given number of days, unless they are deemed potentially discoverable.

"When there's a potential for litigation, the duty to preserve does arise," he noted. That said, "an advisable retention policy gets rid of all inactive documents."
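
The kind of retention policy Shnier describes can be sketched as a simple rule: a document becomes a purge candidate once it has been inactive longer than the retention window, unless it has been flagged as potentially discoverable. The 90-day window, the field names and the sample records below are assumptions for illustration only; the article does not specify any of them.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Document:
    name: str
    last_active: date
    legal_hold: bool = False  # True if deemed potentially discoverable


def purge_candidates(documents, today, retention_days):
    """Return documents inactive past the window and not under legal hold."""
    cutoff = today - timedelta(days=retention_days)
    return [
        doc for doc in documents
        if doc.last_active < cutoff and not doc.legal_hold
    ]


# Hypothetical records.
records = [
    Document("old_memo", date(2006, 6, 1)),
    Document("disputed_contract", date(2006, 6, 1), legal_hold=True),
    Document("recent_draft", date(2006, 12, 15)),
]

to_purge = purge_candidates(records, today=date(2007, 1, 3), retention_days=90)
print([doc.name for doc in to_purge])
```

The legal-hold flag is the important part: the old contract is just as stale as the old memo, but the duty to preserve keeps it out of the purge list.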