RealTime IT News

Search vendors get canonical on results

From the 'to WWW or not to WWW' files:

It's always a bit of a mystery whether or not you need to put 'www' in front of a domain name. That is, www.example.com or just example.com.

Sometimes one will redirect to the other, and in some cases both will exist, which can end up confusing search engines with duplicate content. Google, Yahoo and Microsoft have now teamed up on a new search engine standard that provides a solution for the problem, properly referred to as choosing a canonical domain (that is, deciding which hostname, with or without the part before example.com, is the preferred one). It's the new link rel="canonical" tag, which a page can use to specify which URL should be indexed.
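To make the tag concrete, here is a minimal Python sketch of how a crawler-like script might read it from a page's head. The sample page, the class name CanonicalLinkParser and the function find_canonical are made up for illustration; this is not how any of the three search engines actually parse pages.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens; compare case-insensitively
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the declared canonical URL, or None if the page declares none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical page served at both example.com/page and www.example.com/page
PAGE = """<html><head>
<title>Example</title>
<link rel="canonical" href="http://www.example.com/page">
</head><body>...</body></html>"""

print(find_canonical(PAGE))
```

Whichever hostname the page was fetched from, the declared href is the same, which is the whole point of the tag.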

"When you use the tag, you can indicate the canonical URL form for crawlers to use for each page of content, no matter how it was retrieved," Priyank Garg, Director of Product Management at Yahoo! Search, blogged. "This puts the preferred URL form with the content so that it is always available to the crawler, no matter which session id, link parameter, sort parameter, parameter order, or other source of variance is present in the URL form used to access the page."

Canonical links can also be extremely useful for dynamically generated pages that are tagged with a session ID. Those types of pages tend to be difficult to index and often get a mod_rewrite rule (that is, the web server rewrites the address into something human-readable), but that still leaves two (or more) potential addresses for the same content that a search engine could find.
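As a rough illustration of the site owner's side of this, the sketch below computes one stable address for a dynamically generated page by dropping session parameters and fixing the parameter order. The parameter names in SESSION_PARAMS and the function strip_session_params are assumptions made for the example; a real site would list whatever parameters its own software appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names; adjust to the parameters your own application uses
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def strip_session_params(url):
    """Return url without session parameters and with the rest sorted."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    kept.sort()  # a fixed order removes "parameter order" as a source of variance
    # The fragment is dropped because it never reaches the server
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(strip_session_params(
    "http://www.example.com/item?sort=price&sessionid=abc123&id=42"
))
```

The result is the kind of value a page template could place in the href of its link rel="canonical" tag, so that every session-tagged variant points at the same address.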

Google, in its discussion of the new tag, gives an example that shows yet another potential use of the link rel="canonical" tag. Google's example uses the Wikia page http://starwars.wikia.com/wiki/Nelvana_Limited, which specifies its rel="canonical" as http://starwars.wikia.com/wiki/Nelvana.

According to Google's blog post on this issue:

"The two URLs are nearly identical to each other, except that Nelvana_Limited, the first URL, contains a brief message near its heading. It's a good example of using this feature. With rel="canonical", properties of the two URLs are consolidated in our index and search results display wikia.com's intended version."
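The idea of consolidating properties can be pictured with a toy model like the one below. It is only a sketch: the inbound link counts are invented numbers, the function consolidate is hypothetical, and Google has not said that its index works this way.

```python
from collections import defaultdict


def consolidate(pages):
    """Sum a per-URL property under each page's declared canonical URL.

    pages is an iterable of (url, declared_canonical_or_None, inbound_links).
    A page that declares no canonical URL is credited to its own address.
    """
    totals = defaultdict(int)
    for url, canonical, inbound_links in pages:
        totals[canonical or url] += inbound_links
    return dict(totals)


# The two URLs come from Google's example; the link counts are made up
crawled = [
    ("http://starwars.wikia.com/wiki/Nelvana_Limited",
     "http://starwars.wikia.com/wiki/Nelvana", 3),
    ("http://starwars.wikia.com/wiki/Nelvana", None, 5),
]

print(consolidate(crawled))
```

Both entries end up credited to the Nelvana address, which mirrors the quoted description of search results displaying the site's intended version.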

From my point of view, this is a really interesting development that will add both complexity and simplicity to web developers' lives.

On the one hand, we've now got greater control than ever over search engine optimization of pages. On the other hand, this is yet another way to rewrite URLs, which makes overall site management even more complex than before. Instead of just having URLs and then maybe a few rewritten ones, now you've got to worry about natural URLs, rewritten URLs and canonical ones. Then again, a good Sitemap could really help keep it all straight.
