Yahoo Blocking Bots from Spidering Delicious Bookmarks
Over the weekend, Yahoo’s Delicious (del.icio.us) social bookmarking property has been blocking spiders and bots from non-Yahoo search engines, preventing them from crawling the site and discovering new web pages, sites and bookmarks.
Colin Cochrane found this out the other day, saying that ‘This isn’t a simple robots.txt exclusion, but rather a 404 response that is now being served based on the requesting User-Agent.’
I took a look at del.icio.us’ robots.txt and found that it was disallowing Googlebot, Slurp, Teoma, and msnbot for the following:
Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss
Seeing that the robots.txt was blocking these search engine spiders, I tried accessing del.icio.us with my User-Agent switcher set to each of the disallowed User-Agents and received the same 404 response for each one.
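The two checks described above can be reproduced with a few lines of Python. This is only a rough sketch: the grouping of the four crawlers under a single set of rules is an assumption (the full robots.txt is not quoted here), and `rules_allow` and `status_for_user_agent` are hypothetical helper names, not anything Delicious or Colin used.

```python
from urllib import error, request, robotparser

# Assumed layout: one group naming the four crawlers, followed by the
# Disallow lines quoted above. The real file's grouping is not shown.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: Teoma
User-agent: msnbot
Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss
"""


def rules_allow(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if the robots.txt rules let user_agent fetch path.

    Pass the short crawler token (e.g. "Googlebot"), not a full
    browser-style User-Agent string.
    """
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, "http://del.icio.us" + path)


def status_for_user_agent(url, user_agent, timeout=10):
    """Request url with the given User-Agent header and return the HTTP status."""
    req = request.Request(url, headers={"User-Agent": user_agent})
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as exc:
        # urlopen raises on 4xx/5xx responses; the status code is on the exception.
        return exc.code
```

Calling `rules_allow("Googlebot", "/rss/tag/seo")` reports what the robots.txt rules say, while looping `status_for_user_agent` over a list of crawler User-Agent strings and comparing the results with a normal browser string shows whether the server itself answers differently per User-Agent, which is the behavior Colin describes.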
Colin also found that Delicious pages listed in Google are lacking a cache, title, description and other information.
Why would Yahoo do this?
Yahoo has a competitive advantage over Google, MSN and Ask.com because it can identify web pages and other content through human bookmarking on Delicious before search engine bots can. Yahoo can also classify web documents using human descriptions and tags, which lend external metadata to these documents and can result in more relevant web results and intent-targeted rankings.