RealTime IT News

Scrubbing Content Metadata

Officials at content collaboration software maker Workshare unveiled an update to the company's metadata scrubbing software Monday, promising an end to embarrassing leaks of private information.

Content metadata bears only a passing resemblance to the type of metadata most people are aware of. Web metadata consists of keywords associated with a Web site. By contrast, content metadata tracks a wide variety of information about documents -- previous revisions, hidden text, user e-mail addresses, and the names of servers and routers. In all, there are about 18 to 25 different types of metadata found in most Microsoft consumer and business products, notably its Office software suite.

According to Workshare officials, collaboration via e-mail is becoming one of the biggest new trends in the workplace. When a group effort is required on a document, many employees will e-mail the latest revisions rather than use a corporate portal.

Workshare officials pointed to figures from research firms that back up their contention. Gartner Group predicts collaboration in the enterprise will grow from 20 percent to 70 percent. IDC reported that 3.47 trillion business e-mails were sent in 2003, a figure expected to grow 40 percent yearly.

"It's a dramatic shift, where 10 years ago we sat down and wrote documents by ourselves," said Amy Millard, Workshare vice president of marketing. "Today, the main method of communication for documents is e-mail, and, over time, what is contained in those documents grows."

The problem, Millard said, arises when companies send those documents outside the workplace, into the hands of people who might not have the best interests of the company in mind.

Take, for example, the law firm Boies, Schiller & Flexner, which represents the SCO Group in its $5 billion lawsuit against IBM.

When the law firm announced that other companies were being named in a related lawsuit, Millard said, a reporter was able to look through the content metadata and find the names of other companies the firm intended to sue. That wouldn't have happened, she said, if the law firm had been using Protect 3.0, which lets network administrators set guidelines on what content metadata found in Microsoft Word, Excel and PowerPoint can leave the corporate intranet.

The same technology also removes metadata found in the Microsoft Outlook, Lotus Notes and Novell GroupWise e-mail clients. Protect 3.0 also includes a risk report, which scans a document and shows the user what metadata it contains.
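To give a rough sense of what a metadata scan involves, here is a minimal Python sketch. It is not Workshare's implementation: it assumes the later zip-based Office Open XML (.docx) format rather than the binary Office files of the time, and the function name `metadata_report` and the shape of the returned report are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML body elements (used for tracked changes).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def metadata_report(docx_bytes: bytes) -> dict:
    """Hypothetical 'risk report' for an Office Open XML (.docx) file."""
    report = {"properties": {}, "tracked_insertions": 0, "tracked_deletions": 0}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        names = set(archive.namelist())
        # docProps/core.xml holds author, last editor, revision count, etc.
        if "docProps/core.xml" in names:
            root = ET.fromstring(archive.read("docProps/core.xml"))
            for child in root:
                tag = child.tag.split("}")[-1]  # drop the XML namespace
                if child.text and child.text.strip():
                    report["properties"][tag] = child.text.strip()
        # Tracked changes are stored as <w:ins> and <w:del> elements.
        if "word/document.xml" in names:
            body = ET.fromstring(archive.read("word/document.xml"))
            report["tracked_insertions"] = len(list(body.iter(f"{{{W_NS}}}ins")))
            report["tracked_deletions"] = len(list(body.iter(f"{{{W_NS}}}del")))
    return report
```

A caller would pass the raw bytes of a file, for example `metadata_report(open("draft.docx", "rb").read())`, and review the returned fields before sending the document. The sketch covers only core properties and tracked changes, a small part of the 18 to 25 metadata types mentioned above.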

It should come as no surprise that Workshare's software is particularly popular in the legal community. According to officials, the company's software is used by 98 percent of the Top 250 law firms in the United States, as well as by many major corporations, such as Microsoft, Wells Fargo Bank, KPMG, Ingersoll-Rand and Nokia.

With pre-set policies in place governed by outgoing e-mail rules established by the administrator, all Word, Excel and PowerPoint metadata is automatically "scrubbed" from the document before it leaves the company. The administrator can also set the policies so that every outgoing document is converted to a read-only PDF file, which will remove the possibility of anyone reading the metadata in a company document.
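The policy-driven scrubbing described above might look roughly like the following Python sketch. It is an illustration under stated assumptions, not Protect 3.0's actual behavior: it assumes the later zip-based Office Open XML (.docx) format, handles only the core document properties, and uses hypothetical names (`DEFAULT_POLICY`, `scrub_core_properties`). The output has not been verified against Word itself.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical policy: core properties an administrator wants blanked.
DEFAULT_POLICY = frozenset({"creator", "lastModifiedBy", "revision"})


def scrub_core_properties(docx_bytes: bytes, policy=DEFAULT_POLICY) -> bytes:
    """Return a copy of a .docx archive with the listed core properties emptied."""
    source = zipfile.ZipFile(io.BytesIO(docx_bytes))
    output = io.BytesIO()
    with source, zipfile.ZipFile(output, "w") as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == "docProps/core.xml":
                root = ET.fromstring(data)
                for child in root:
                    # Compare on the local tag name, ignoring the namespace.
                    if child.tag.split("}")[-1] in policy:
                        child.text = ""
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # Every other archive member is copied through unchanged.
            target.writestr(item.filename, data)
    return output.getvalue()
```

Because the function runs without any action from the sender, it could be called from an outgoing-mail hook on every attachment, which matches the idea of enforcing the policy centrally rather than relying on end users.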

"It's essentially security software, and to be successful it has to not ask the end user to remember to do something," Millard said. "People have the best of intentions, but it's a burden to require your end users to enact your metadata policies."

The company also launched a resource site for companies and individuals looking for vendor-neutral information on metadata security, called Metadatarisk.org, featuring white papers, best practices and the latest news on the topic.