Google Moves to Block RSS Scraping - InternetNews.
RealTime IT News

Google Moves to Block RSS Scraping

A Web developer's attempt to create customized RSS feeds from the popular Google News portal has run afoul of the search technology powerhouse.

Google issued a cease-and-desist order against British programmer Julian Bond with a warning that the creation of a news feed from the results of Google News was against its terms of reference.

According to Bond, the company requested the removal of RSS-powered Google News headlines from his Ecademy business networking site and made it clear Webmasters are not allowed to display headlines from Google News on third-party sites.

He posted the order on a mailing list dedicated to syndication discussions.

A Google spokesman was not immediately available for comment.

At the center of the dispute is Bond's gnews2rss, a PHP script that takes a Google News search and turns it into an RSS feed. The script allows users to enter search keywords into a field and create an RSS feed that can be used by any news aggregator.
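To illustrate the general idea of turning search headlines into a feed, here is a minimal Python sketch. It is not Bond's PHP code and does no scraping: the function name `headlines_to_rss`, the `example.com` URLs, and the sample headline are all assumptions made for illustration. It only shows how a list of (title, link) pairs could be wrapped in an RSS 2.0 document that a news aggregator can read.

```python
import xml.etree.ElementTree as ET


def headlines_to_rss(query, headlines):
    """Build an RSS 2.0 document from (title, link) pairs.

    `query` is the search keyword the headlines were gathered for.
    All names and URLs here are hypothetical, for illustration only.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")

    # RSS 2.0 requires a title, link, and description on the channel.
    ET.SubElement(channel, "title").text = f"News search: {query}"
    ET.SubElement(channel, "link").text = "http://example.com/search"
    ET.SubElement(channel, "description").text = f"Headlines matching {query}"

    # One <item> per headline; ElementTree escapes special characters.
    for title, link in headlines:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = [("Sample wireless headline", "http://example.com/story1")]
    print(headlines_to_rss("Wi-Fi", sample))
```

A real script of this kind would also need a step that fetches and parses the search results page, which is the part Google objected to and is deliberately left out here.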

Bond told internetnews.com he created the script a year ago to display categorized headlines at Ecademy.com. For instance, a section of the site devoted to wireless networking displays RSS-powered headlines from a variety of blogs and one from Google News with the keywords "Wi-Fi" or "WLAN" or "80211."

Bond has since removed the Google News headlines and replaced them with those powered by Yahoo, but he said it was frustrating that the most popular search engine was not providing XML output from its search results. "It's also become pretty disappointing that their SOAP API still only covers the main search engine and hasn't been extended to support the other parts of Google."

Google's Web API service, which was released last April, lets software developers query more than 2 billion Web documents directly from their own computer programs. The API communicates via Simple Object Access Protocol (SOAP), an XML-based mechanism for exchanging typed information, but it is limited to the Web search portion of Google's services.
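For readers unfamiliar with SOAP, the following Python sketch shows what "exchanging typed information" in XML can look like. It builds a generic SOAP 1.1 request envelope in which each parameter carries an `xsi:type` annotation. The operation name `doSearch`, the namespace `urn:ExampleSearch`, and the parameter names are hypothetical and are not taken from Google's actual API.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for SOAP 1.1 and XML Schema typing.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"


def build_soap_request(operation, namespace, params):
    """Return a SOAP 1.1 envelope as a string.

    `params` is a list of (name, xsd_type, value) tuples. The operation
    and namespace are supplied by the caller; nothing here is specific
    to any real service.
    """
    ET.register_namespace("soapenv", SOAP_ENV)
    ET.register_namespace("xsi", XSI)

    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    # Declare the xsd prefix explicitly, because it appears only inside
    # attribute values, where ElementTree does not track it.
    envelope.set("xmlns:xsd", XSD)

    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{namespace}}}{operation}")

    for name, xsd_type, value in params:
        param = ET.SubElement(call, name)
        # The type annotation is what makes the payload "typed".
        param.set(f"{{{XSI}}}type", f"xsd:{xsd_type}")
        param.text = str(value)

    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_soap_request(
        "doSearch",            # hypothetical operation name
        "urn:ExampleSearch",   # hypothetical service namespace
        [("q", "string", "Wi-Fi"), ("maxResults", "int", 10)],
    ))
```

In practice a client would POST this envelope over HTTP to the service endpoint and parse a similarly structured response.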

"I find it a little strange that Google was among the first companies to use a SOAP API but they've done nothing to extend it beyond Web search. Contrast that with Yahoo, which has introduced their own RSS aggregator and provides feeds from all sections of the search. I think Google is missing a trick here," Bond said.

Although Bond has removed the Google News headlines from his Web site, he has not moved to stop others from using the code to create their own feeds. Indeed, he has released it to the open-source community, which means anyone can use it to create customized feeds for newsreaders. "The script is still up there and I'm sure people are still using it. I'm not sure how they would even know that you are scraping Google News and sending headlines to your aggregator."

He said it was disappointing that Google has not yet embraced the use of RSS on all its services. "Google is one of the greatest sources of content but that content is available in HTML. Now, they are getting upset when anyone tries to turn it around in machine-readable format. I find that rather strange."

While many in the content syndication space view Google's reluctance to embrace RSS as a strategic move to boost the competing Atom format, Bond thinks the company has simply not gotten around to adding syndication to the news portal.

He said the company sent him a reply to an e-mail query that hinted at coming changes with the Google News service. In that e-mail, Google said: "We don't currently offer an RSS or other feed, but as you may know, Google News is still in beta. We're considering a number of improvements based on feedback from our users. Given that we're still fine-tuning this service, it's too early for us to know which of the many great ideas we've received will be implemented."

The creation of Atom by developers from IBM, Google and a host of blog tools vendors has led to acrimony among software engineers. Google, through its Blogger service, has ditched RSS in favor of the Atom syndication format, but critics argue that the availability of competing formats is discouraging mainstream adoption of RSS.

In March this year, Dave Winer, the co-author of the RSS format, proposed a merger between the two formats, insisting "it's time to bury the hatchet and move on."

Winer urged developers to put their heads together in order to come up with a backwards-compatible format that would avoid confusion and bring the two competing standards together.