RealTime IT News

Yahoo Makes Search Indexing Waves

There has always been at least one universal standard that all search engines, be they Yahoo, AltaVista, Google or otherwise, have respected: the Robots Exclusion (robots.txt) file.

With robots.txt, site owners can specify only whole pages or directories to exclude. Yahoo has added a new twist, enabling site owners to exclude specific sections of a page with its new robots-nocontent tag.
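For illustration, a typical robots.txt entry (the paths shown here are hypothetical) excludes whole directories or pages for all crawlers:

    User-agent: *
    Disallow: /private/
    Disallow: /drafts/old-page.html

The robots-nocontent marker, by contrast, goes inside a page's HTML rather than in a site-level file.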

"This tag is really about our crawler focusing on the main content of your page and targeting the right pages on your site for specific search queries," Yahoo Search developer Priyank Garg wrote in a blog post.

"Since a particular source is limited to the number of times it appears in the top ten, it's important that the proper matching and targeting occur in order to increase both the traffic as well as the conversion on your site. It also improves the abstracts for your pages in results by omitting unrelated text from search result summaries."

Yahoo's goal is to improve the relevance of search results, but not everyone agrees with its approach.

"This seems like extra non-standardized code bloat," a commenter named 'Josh' wrote in response to Garg's post. "It should be the search engines' task to determine non-changing templated regions of a page."

The nocontent tag is intended to reduce the text complexity of a page by allowing a site owner to mark templated areas, such as navigation and ad placements, as non-content.
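As a rough sketch of how that looks in practice, assuming the class-attribute form Yahoo described (the element names and content below are hypothetical), a site owner would add the class to the wrappers around boilerplate regions:

    <div class="robots-nocontent">
      ... site-wide navigation links ...
    </div>
    <div class="robots-nocontent">
      ... ad placement markup ...
    </div>
    <p>Main article text, which stays eligible for indexing and result abstracts.</p>

Yahoo's crawler is then expected to disregard the marked regions when ranking the page and building search result summaries.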

"This tag is implying that Yahoo doesn't have the ability to determine what are templated regions of a page," Josh continued. "If you do have that ability, why add complexity?"

The other obvious question is whether other search engines will adopt the new nocontent tag. Garg said he would like to see them follow suit, but it is unclear whether they will.

Google spokeswoman Katie Watson said the company is reviewing the possibility of supporting the nocontent tag.

"Google currently detects boilerplate text effectively, so we'll consider the overall value for users and webmasters," she said. "We're always working to improve our user experience."

Microsoft did not respond to a request for comment.

Cooperation between search engines on standards has been rare. Beyond the core robots.txt standard, which all search engines respect, Google, Yahoo and Microsoft now also support the Sitemaps specification, which helps crawlers find content on a Web site.
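A minimal sitemap file under that specification looks like the following (the URL and date are placeholders): a site owner lists the pages it wants crawlers to find, and the engines fetch the XML file directly.

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2007-05-01</lastmod>
      </url>
    </urlset>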