Sizing Up the Search Competition

Science has proven that, if you’re searching for “wainscoted rouses,” Google is the place to look.

Matthew Cheney and Mike Perry, two graduates of the University of Illinois Library Science program, devised a test for the National Center for Supercomputing Applications (NCSA) to see whose index was bigger: Google’s or Yahoo’s.

The latest round of posturing was the result of Yahoo’s Aug. 8 announcement, via a blog entry by Tim Mayer, head of Yahoo Search, and e-mails to journalists, that its index had gotten fatter.

“While we typically don’t disclose size (since we’ve always said that size is only one dimension of the quality of a search engine),” Mayer wrote, “for those who are curious this update includes just over 19.2 billion Web documents, 1.6 billion images, and over 50 million audio and video files.”

Google disagreed. Its engineers dug into Yahoo Search and didn’t feel that the searches they did supported Yahoo’s claim.

“As of now, we have not been able to verify a substantial increase to Yahoo’s Web index via their search results,” said Google spokesman Nathan Tyler. Yahoo didn’t respond to several requests for comment.

Both Cheney and Perry have friends who work at Google. One of Cheney’s friends sent him links to blogs discussing the controversy. “We wanted to try to bring an independent analytical study to help shed light on whether the Yahoo claim was true,” Cheney said.

They used a standard list of English words to automatically generate two-word queries, which they submitted to both Google and Yahoo. They discarded queries that returned more than 1,000 results, because both search engines return only that many actual results, regardless of the total they report. In all, they conducted 10,012 searches over 18 hours.
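The procedure described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' actual script: the function names, the seeded random sampling, and the `count_results` callable are all hypothetical, and the sketch assumes a query is dropped if either engine exceeds the 1,000-result cap, which the study's description does not spell out.

```python
import random

RESULT_CAP = 1000  # per the article, neither engine returned more actual results than this


def make_queries(words, n, seed=0):
    """Draw n two-word queries at random from a word list (hypothetical helper)."""
    rng = random.Random(seed)
    return [" ".join(rng.sample(words, 2)) for _ in range(n)]


def keep_countable(queries, count_results):
    """Keep only queries where every engine stays at or below the cap.

    count_results(query) is a hypothetical callable that returns a dict
    such as {"google": 12, "yahoo": 3}.
    """
    kept = {}
    for q in queries:
        counts = count_results(q)
        if all(c <= RESULT_CAP for c in counts.values()):
            kept[q] = counts
    return kept


def fake_counts(query):
    """Toy stand-in for real lookups; the numbers are invented."""
    return {"google": len(query) * 90, "yahoo": len(query) * 30}


words = ["wainscoted", "rouses", "neep", "heterolecithal", "depigmentation"]
queries = make_queries(words, 5)
kept = keep_countable(queries, fake_counts)
```

Whether the cut-off applies to one engine or to both matters, since dropping a query also drops whichever engine happened to do better on it.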

They found that Yahoo returned an average of 37.4 percent of the results that Google did, and in some cases where Google returned results, Yahoo returned none.

“What we found most surprising was that the results were so obviously in favor of Google,” Cheney said. “In 96 or 97 percent of the 10,000 searches we ran, Google returned more results. We expected it to be somewhat comparable, but [these were] blow-away numbers.”
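For readers who want to see how such headline figures could be derived, here is a small sketch. It assumes the "average" is the ratio of Yahoo's total results to Google's total results across all retained queries; the article does not say whether the researchers used that or a mean of per-query ratios. The function name and the toy data are invented for illustration and are not figures from the study.

```python
def summarize(counts):
    """counts: list of (google_results, yahoo_results) pairs, one per query."""
    google_total = sum(g for g, _ in counts)
    yahoo_total = sum(y for _, y in counts)
    # Yahoo's results as a share of Google's (ratio of totals: an assumption)
    ratio = yahoo_total / google_total
    # Fraction of queries on which Google returned strictly more results
    google_wins = sum(1 for g, y in counts if g > y) / len(counts)
    # Number of queries where Google found something and Yahoo found nothing
    yahoo_empty = sum(1 for g, y in counts if g > 0 and y == 0)
    return ratio, google_wins, yahoo_empty


# Invented toy data, not figures from the study.
toy = [(10, 4), (8, 0), (6, 6), (16, 5)]
ratio, google_wins, yahoo_empty = summarize(toy)
```

The two ways of averaging can diverge sharply when a few queries return far more results than the rest, so the choice is worth stating in any comparison of this kind.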

The words used in the test were obscure, indeed. They included the non-conjoined queries centerable’s heterolecithal; or’s depigmentation; and neep Edenization’s. For each of those queries, Yahoo returned no results. Google did return results, but they were all pages containing the very list used in the test.

Cheney didn’t think that eliminating queries that returned more than 1,000 results had skewed the comparison.

“The best way to conduct the test would have been to study all searches,” said Cheney. “But we had to turn to the obscure searches to get numbers we needed. I don’t think it’s a bad way to approach it; I think most search engines will cover the popular areas of the Web. It’s these obscure things on the far reaches of the Internet that define how good a search engine is.”

However, the test log files show that in the first 15 searches the study eliminated, Yahoo returned a higher number of results than Google in 13 of them.

The test made two assumptions: The first was that the Yahoo and Google search engines return all the results that match the particular keywords and don’t do any filtering beyond removing duplicate results.

The second was that if Yahoo’s index contained more than twice as many documents as Google’s, the NCSA test should also return more than twice as many results from Yahoo as from Google.

But those were only assumptions, said Gary Price, news editor of Search Engine Watch.

“Without knowing what specifically Google and Yahoo consider a result, and what criteria they use to determine what goes on a results page, it’s very difficult to compare apples to apples,” Price said.

Size Sort of Matters

Brad Hill, author of a number of books on search, including Internet Searching for Dummies, and editor of the Unofficial Google Weblog, said that the size of the index is irrelevant for most people.

“To the average casual retail user of search, the size distinctions are meaningless and invisible. Most people hit search engines quickly and move on fast,” he told internetnews.com. Because they don’t use sophisticated keyword strings and seldom look beyond the first page or two of results, they’d never find those obscure Web documents anyway, he added.

Nevertheless, Hill wouldn’t call any query irrelevant. “Google and Yahoo both serve a very long tail,” he said. “The idea is that no matter how obscure your query is, you’ll get a useful answer.”

Brian Bowman, vice president of marketing and product management for InfoSpace, which operates the meta search engine Dogpile.com, said that finding less obvious information matters once you look at how people actually search, using personalized, somewhat obtuse queries. “So, I do believe the depth at which people index is important.”

Bowman said that searches on Dogpile.com are extraordinarily diverse. About 50 percent in a given month are unique, while only a very small percentage of queries are duplicated in any one week. “There’s an enormous volume of queries that are new and different,” Bowman said.

While the Illinois study dealt with gross volume of search results, a study conducted by researchers at the University of Pittsburgh and Pennsylvania State University for Dogpile.com found a significant lack of overlap among search results. Dogpile.com lets users search the top four search engines at once, delivering a mix of results from Google, Yahoo, Ask Jeeves and MSN.

When the Pennsylvania researchers ran 12,570 different queries through search engines at Yahoo, Google, MSN and Ask Jeeves, they found that only 1.1 percent of the results appeared on the first results pages of all four engines, while 84.9 percent of the top results were unique to one engine.

In terms of unique results, Google had the lowest percentage, at 66.4 percent. Yahoo, MSN and Ask Jeeves all were within 3.1 percentage points of each other, with Yahoo having the highest percentage of unique results at 71.2 percent.
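Overlap figures of this kind can be computed along the following lines. This is a sketch under stated assumptions, not the Dogpile study's method: it counts distinct URLs across the four first pages for a single query, and treats an engine's "unique" share as the fraction of its own first-page results that no other engine returned. The function names and the single-letter URLs are invented.

```python
from collections import Counter


def overlap_shares(first_pages):
    """first_pages: dict mapping engine name -> set of first-page URLs for one query.

    Returns (share of distinct URLs found by every engine,
             share of distinct URLs found by exactly one engine).
    """
    seen = Counter(url for urls in first_pages.values() for url in urls)
    total = len(seen)
    in_all = sum(1 for n in seen.values() if n == len(first_pages))
    in_one = sum(1 for n in seen.values() if n == 1)
    return in_all / total, in_one / total


def unique_share(first_pages, engine):
    """Fraction of one engine's first-page results that no other engine returned."""
    seen = Counter(url for urls in first_pages.values() for url in urls)
    own = first_pages[engine]
    return sum(1 for url in own if seen[url] == 1) / len(own)


# Invented toy data: single letters stand in for result URLs.
pages = {
    "google": {"a", "b", "c", "d"},
    "yahoo": {"a", "b", "e"},
    "msn": {"a", "f", "g"},
    "ask": {"a", "h"},
}
shared, single = overlap_shares(pages)
google_unique = unique_share(pages, "google")
```

A real comparison would also have to decide when two URLs count as the same page, for example with or without a trailing slash, which the sketch ignores.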

While size confers bragging rights, it’s not the size of the index, but what you do with it that counts. And in that respect, both companies have what it takes. A study by the University of Michigan released today found that Google and Yahoo were close to equal in customer satisfaction.

According to Price, the size comparisons have more to do with public relations than with engineering.

“Total size numbers from all engines are just claims. They’ve always just been claims,” he wrote in the Search Engine Watch blog. “To move beyond this,” he wrote, “some type of agreed upon standards and methods are needed. Otherwise, this week’s headlines will likely happen over and over again.”
