Yahoo Makes Search Indexing Waves


There has always been at least one universal standard that all search
engines, be they Yahoo, AltaVista, Google or otherwise, have respected:
the robots exclusion (robots.txt) file.


With robots.txt, site owners can exclude only whole pages or directories.
Yahoo has added a new twist: its new robots-nocontent tag lets site owners
mark individual sections of a page for exclusion.
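
To make the page-or-directory granularity of robots.txt concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the crawler name "ExampleBot" and the example.com URLs are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one excluded directory and one excluded page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/old-page.html
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Rules match on URL paths, so a page is either crawlable as a whole or not.
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")
print(blocked, allowed)
```

Nothing in these rules can point at part of a page, which is the gap the new tag is meant to fill.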


“This tag is really about our crawler focusing on the main content of your
page and targeting the right pages on your site for specific search
queries,” Yahoo Search developer Priyank Garg wrote in a blog post.

“Since a particular source is limited to the number of times it appears in the top
ten, it’s important that the proper matching and targeting occur in order to
increase both the traffic as well as the conversion on your site. It also
improves the abstracts for your pages in results by omitting unrelated text
from search result summaries.”


Yahoo’s goal is to improve the relevancy of search, but not everyone
agrees with its approach.


“This seems like extra non-standardized code bloat,” a commenter named
‘Josh’ wrote in response to Garg’s post. “It should be the search engines’
task to determine non-changing templated regions of a page.”


The nocontent tag is meant to reduce the text complexity of a page by
letting a site owner mark templated areas, such as navigation and ad
placements, for exclusion.
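
As a rough illustration of the idea, the sketch below shows how a crawler might drop marked sections before indexing a page. It is not Yahoo's implementation. It assumes the marker is applied as a class attribute value spelled "robots-nocontent", and the names NoContentFilter and indexable_text, as well as the sample page, are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoContentFilter(HTMLParser):
    """Collect page text, skipping elements marked with the given class."""

    def __init__(self, marker: str = "robots-nocontent"):  # assumed spelling
        super().__init__()
        self.marker = marker
        self.skip_depth = 0  # > 0 while inside a marked element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1  # nested element inside a marked one
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.marker in classes:
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def indexable_text(html: str) -> str:
    """Return the text a crawler honoring the marker would keep."""
    parser = NoContentFilter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: a navigation block marked for exclusion, then the article.
PAGE = ('<div class="nav robots-nocontent"><a href="/">Home</a><br>'
        '<a href="/about">About</a></div><p>Main article text.</p>')
print(indexable_text(PAGE))
```

The sketch assumes well-formed markup; a real crawler would also have to cope with unclosed or mismatched tags.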


“This tag is implying that Yahoo doesn’t have the ability to determine what
are templated regions of a page,” Josh continued. “If you do have that
ability, why add complexity?”


The other obvious issue is whether other search engines will follow
suit in adopting the new nocontent tag. Garg noted that he would like
them to, but it is unclear whether they will.

Google spokeswoman Katie Watson said the company is reviewing the possibility of supporting the nocontent tag.

“Google currently detects boilerplate text effectively, so we’ll consider the overall value for users and webmasters,” she said. “We’re always working to improve our user experience.”


Microsoft did not return a request for comment.


Cooperation between search engines on search standards has generally
been rare. In addition to the core robots.txt standard, which all
search engines respect, Google, Yahoo and Microsoft now also respect the new
sitemaps specification, which helps crawlers find content on a
Web site.
