Even robots.txt won’t keep the Googlebot away

Well, am I ever surprised! I would have thought that adding a robots.txt file that tells the Googlebot to “go away” would cause it to “not index the site.”

User-agent: *
Disallow: /
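
For anyone who wants to double-check what those two lines actually say, here is a minimal sketch using Python's standard `urllib.robotparser`. The `may_crawl` helper and the `example.com` URL are made up for illustration. Note that it only answers "may this bot fetch this URL?", which is a crawling question, not a promise about what shows up in search results.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the robots.txt above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder, not the real site.
print(may_crawl("Googlebot", "https://example.com/"))  # False
```

So the file does what it says: a well-behaved crawler is told not to fetch anything. It says nothing about whether the site's address may be listed.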

Instead, I discovered that the Googlebot may still spot the site and then put up a message saying that the site exists but is not indexed, i.e. the Googlebot still publicizes the existence of the site. It makes Google look like the good guys and us look like the bad guys for putting up a robots.txt. Yay for Google liberating all online information! Boo for us trying to keep our site un-indexed until we’re ready to make it public.

I suppose they reason that if the site is public, it’s OK to mention its existence. However, most of us did not intend for any results whatsoever to show up in Google, so having it say “the site exists but I can’t index it” is a bit of a revelation!

Beware of this if you are creating a pre-production test site: your site may still show up in Google searches. Instead, turn on some other protection, like the “Maintenance mode” plug-in for WordPress, so that not only search engines but also humans can’t use the site. Here’s roughly what the Google result looks like:

A description for this result is not available because of this site’s robots.txt — learn more
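
If you aren't on WordPress, the same "keep everyone out" idea can be sketched with a password prompt. The following is only a rough illustration using Python's built-in `http.server`, not how the Maintenance mode plug-in works. The credentials, the realm name, and the `is_authorized` and `serve` helpers are all hypothetical.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials, for illustration only.
USERNAME = "preview"
PASSWORD = "change-me"


def is_authorized(auth_header, username=USERNAME, password=PASSWORD):
    """Check an HTTP Basic 'Authorization' header value against the credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):], validate=True)
    except ValueError:
        # Malformed base64 is treated as a failed login.
        return False
    expected = f"{username}:{password}".encode("utf-8")
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(decoded, expected)


class ProtectedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not is_authorized(self.headers.get("Authorization")):
            # Bots and humans without the password both get a 401.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="Pre-production"')
            self.end_headers()
            return
        body = b"Work in progress\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the protected test server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), ProtectedHandler).serve_forever()
```

Calling `serve()` starts it on port 8000. On a real site you would want this behind HTTPS, since Basic auth sends the password in an easily decoded form.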

