Search engine indexing limits

Saturday 6 May 2006

Serge Bondar did a controlled experiment to see how much text on a page matters to the big three search engines. He created very long pages with unique made-up words inserted into them at regular intervals. Then he watched the server logs to see how much data the search engines actually pulled down. Then he waited for his unique marker words to turn up in search results.

He found that Yahoo only cares about the first 210Kb, Google gets bored after 520Kb, and MSN persists through 1.1Mb. Read about the entire experiment: Search Engine Indexing Limits: Where Do the Bots Stop?
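For the curious, here is a rough sketch of how a test page like that could be generated. This is not Serge's actual code; the filler text, the marker length, and the names `make_marker` and `build_test_page` are all made up for illustration.

```python
import random
import string


def make_marker(rng, length=12):
    """Return a made-up lowercase word that is unlikely to exist anywhere else."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def build_test_page(total_bytes, interval_bytes, seed=0):
    """Build filler text with a marker word roughly every interval_bytes.

    Returns the page text and a list of (byte_offset, marker) pairs, so you
    know how deep into the page each marker sits.
    """
    rng = random.Random(seed)
    filler = "lorem ipsum dolor sit amet "  # assumed filler, ASCII only
    parts = []
    markers = []
    size = 0
    next_marker_at = interval_bytes
    while size < total_bytes:
        if size >= next_marker_at:
            word = make_marker(rng)
            markers.append((size, word))
            chunk = word + " "
            next_marker_at += interval_bytes
        else:
            chunk = filler
        parts.append(chunk)
        size += len(chunk)  # ASCII only, so characters == bytes
    return "".join(parts), markers
```

Searching for each marker later tells you the deepest byte offset an engine indexed: the last marker that turns up in results gives a lower bound on its limit.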

When I read the results, I worried because my blog archives are organized into monthly pages, and I thought some might be long enough that part of their content wasn’t getting indexed. Turns out, no need to worry. My longest archive page is November 2003, at 119Kb. My complete archive listing is longer, right at Yahoo’s limit of 210Kb, but I prevent that page from being indexed anyway, since it is just a long list of post titles.
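The check I did by hand can be sketched in a few lines. The limits are the ones reported in the experiment; treating 1Kb as 1024 bytes is my assumption, and the function name `engines_truncating` is hypothetical.

```python
# Limits reported in the experiment, converted to bytes.
# Assumption: 1Kb = 1024 bytes and 1Mb = 1024 * 1024 bytes.
INDEX_LIMITS = {
    "Yahoo": 210 * 1024,
    "Google": 520 * 1024,
    "MSN": int(1.1 * 1024 * 1024),
}


def engines_truncating(size_bytes):
    """Return the engines whose reported limit is smaller than the page."""
    return [name for name, limit in INDEX_LIMITS.items() if size_bytes > limit]


# My longest monthly archive page, at 119Kb, is under every limit.
print(engines_truncating(119 * 1024))
```

To check real files, pass in the result of `os.path.getsize(path)` for each archive page.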
