WASHINGTON — The debate over how much information Internet companies should be allowed to collect about their users, and how they handle it, is as old as the Web itself.
But as the Internet becomes the backbone for an ever-growing body of personal data, including sensitive information such as healthcare records, many observers are expecting Congress to consider an update of the 1974 Privacy Act to add provisions relating to online data collection.
In a panel discussion on privacy here at the Hyatt Regency on Capitol Hill, Mike Hintze, Microsoft’s (NASDAQ: MSFT) associate general counsel, expressed the company’s cautious support for reforms to privacy law.
“We’ve been an advocate of federal privacy legislation for a number of years, and we would actually like to see a real comprehensive approach to privacy legislation that creates some baseline standards across industries and across the country,” Hintze said.
But tech companies, whose revenue streams in advertising and other areas are increasingly tied to the quality of their data-collection operations, would be wary of privacy legislation that places specific, restrictive conditions on online information.
“What we are always up against is trying to make sure that we’re not stifling innovation and research capacity to be able to offer improvements that we wouldn’t have been able to foresee without the use of data,” said Anne Toth, Yahoo’s (NASDAQ: YHOO) vice president of policy and head of privacy.
“Prescriptiveness can be helpful, baselines can be helpful, but flexibility, and recognizing that technology is a very fast-moving train [is critical],” Toth added.
But privacy advocates warn that government might need to play a stronger role in ensuring that companies can’t collect sensitive data such as genetic information, as well as establishing standards for how they anonymize the data that they do collect.
“There are a lot of intriguing myths about privacy and anonymity,” said Pam Dixon, cofounder and executive director of the World Privacy Forum. “One of the myths is that if you just simply take a database — and all databases are created kind of equally — and you remove [personally identifiable information], then you’ve done your job anonymizing it. And that’s actually not true.”
Companies like Microsoft, Google (NASDAQ: GOOG) and Yahoo, which collect massive amounts of data both from their search engines and from other account-based services, make a point of touting their privacy safeguards, such as deleting their server logs and deleting parts of Internet protocol addresses, to ensure that their users remain anonymous.
Yahoo, for instance, recently made a splash when it announced that it would anonymize IP addresses after 90 days, down from its previous retention window of 13 months.
Microsoft keeps IP addresses for a longer period of time (18 months), but then deletes them entirely, rather than just erasing the final numbers in the string. Hintze also said that the company applies a technique called a one-way hash at the point where search data is collected, to keep that data separate from any personal information it might collect.
“Giving users an assurance of their privacy is beneficial to us as a company,” Hintze said.
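The two techniques described above, erasing the final numbers of an IP address and applying a one-way hash, can be sketched in a few lines of Python. This is an illustration only, not any company's actual implementation: the function names, the choice to zero the last octet of an IPv4 address, and the use of a keyed SHA-256 hash (HMAC) are all assumptions made for the example.

```python
import hashlib
import hmac
import ipaddress


def truncate_ipv4(ip: str) -> str:
    """Zero out the final octet of an IPv4 address (assumed truncation scheme)."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


def one_way_id(user_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed one-way hash (HMAC-SHA-256).

    The same input and key always produce the same token, so search records
    can still be grouped, but the token cannot simply be reversed to recover
    the original identifier. The key is hypothetical and would need to be
    stored apart from the logs.
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Example with a documentation-range address and a made-up account name.
print(truncate_ipv4("203.0.113.77"))  # prints 203.0.113.0
print(one_way_id("user@example.com", b"hypothetical-secret-key"))
```

Truncation keeps the rough network location while discarding the part of the address that points to a single machine. The hash keeps records linkable to one another without storing the identifier itself.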
But Dixon said that anonymization techniques often fail to live up to their name. In part, she said, this stems from the glut of other information today’s Web users post about themselves online, such as profile pages on social networks like Facebook and MySpace.
In spite of Internet companies’ assurances that their users’ information cannot be traced back to them, she said enough information is available about many people on the open Web to correlate back to the supposedly anonymized databases in the event of a breach.
“If you have enough background data, the research has really shown at this point — definitively — that there’s really a theoretical limit to privacy,” Dixon said.
She said that the real questions that Web companies must answer are how much sensitive information their databases contain, as well as how that slippery term is defined, and what steps they have taken to limit the correlation problem.
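The correlation problem Dixon describes can be illustrated with a small Python sketch. It assumes a hypothetical "anonymized" dataset that has no names but still carries a ZIP code, birth year and gender, and a hypothetical set of public profiles with the same attributes. All records, field names and the `link_records` function are invented for the example.

```python
def link_records(anonymized, public, keys):
    """Match anonymized rows to public profiles that share the same attributes.

    A row counts as re-identified only when exactly one public profile
    has the same combination of values for the given keys.
    """
    index = {}
    for person in public:
        combo = tuple(person[k] for k in keys)
        index.setdefault(combo, []).append(person["name"])

    matches = {}
    for row in anonymized:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:
            matches[row["record_id"]] = candidates[0]
    return matches


# Made-up data: names were removed from the first list, but not the attributes.
anonymized_rows = [
    {"record_id": "r1", "zip": "20001", "birth_year": 1975, "gender": "F"},
    {"record_id": "r2", "zip": "20002", "birth_year": 1980, "gender": "M"},
    {"record_id": "r3", "zip": "20003", "birth_year": 1990, "gender": "F"},
]
public_profiles = [
    {"name": "Alice Example", "zip": "20001", "birth_year": 1975, "gender": "F"},
    {"name": "Bob Example", "zip": "20002", "birth_year": 1980, "gender": "M"},
    {"name": "Bill Example", "zip": "20002", "birth_year": 1980, "gender": "M"},
]

print(link_records(anonymized_rows, public_profiles, ["zip", "birth_year", "gender"]))
```

In this toy data only record `r1` is linked to a name. Record `r2` matches two profiles and `r3` matches none, which suggests why the amount of background data available matters so much.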
The sensitive-information question becomes particularly acute when dealing with data such as patients’ healthcare records. President-elect Obama has included provisions to digitize the healthcare industry in his stimulus package, which Congress is currently considering, but privacy concerns loom large.
Both Microsoft and Google have launched IT healthcare initiatives, promoting them as a path to making the industry more efficient and improving the quality of care. Tomorrow, the Senate Committee on Health, Education, Labor and Pensions will hold a hearing to consider stimulus money for online healthcare.
Earlier today, the Coalition for Patient Privacy called on Congress to include privacy protections in any IT healthcare initiative.
Wary of burdensome regulation, many Internet companies have undertaken education initiatives, seeking to inform their users about what information is collected and how it is used.
The effectiveness of these education efforts is another area where businesses and privacy advocates disagree. Dixon said it was impossible to educate 300 million Americans about how their information is being used, and suggested that the government should implement standard practices for how data is stored and safeguarded.
Hintze warned that with privacy technologies still rapidly evolving, it would be counterproductive to insist that all companies adopt one standard.
“I think it’s useful right now as companies are grappling with this to have companies taking different approaches,” he said. “It’s not in anybody’s interest to take an overly simplistic approach of this where a regulator would come in and say, ‘Well, we’ll take Yahoo’s time frame but Microsoft’s method.’”
That heavy-handed approach, he said, would prevent companies from achieving “some of the essential purposes of the collection of the data in the first place.” In addition to serving more targeted, relevant ads, those purposes include preventing Internet blights such as click fraud, botnets, denial-of-service attacks and other security threats.