About a month ago, I helped create a development site for a company website. The purpose of the “dev” site was to act as a live duplicate of the original site so that my team could test different plugins on this cloned site instead of toying with the main website or relying on a local version.
As is standard practice, I made sure to check the mystical Search Engine Visibility box in WordPress under Settings > Reading > Discourage search engines from indexing this site. There is a small disclaimer under this box that says, “It is up to search engines to honor this request.” And, that’s where the trouble began.
This disclaimer is why I originally needed to learn how to deindex pages from Google.
How to Deindex a Site from Google
I’m not a rookie when it comes to SEO. In fact, I consider myself to be very knowledgeable.
I know that clicking that little search engine visibility box in WordPress adds the robots noindex meta tag to the <head> of every page on a site.
<meta name="robots" content="noindex">
However, meta robots is more of a suggestion than a directive. Search engine bots have to crawl your pages to see the meta robots tag, and they decide whether or not to follow your suggestion at that time.
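Before trusting the checkbox, it’s worth confirming the tag is actually being served. Here’s a minimal sketch in Python (assuming the requests library is installed; the dev URL is a placeholder) that checks both the HTML and the X-Robots-Tag HTTP header, since noindex can arrive either way:

import requests

def has_noindex(url):
    """Report whether a page asks search engines not to index it."""
    response = requests.get(url, timeout=10)
    # noindex can be sent as an HTTP header...
    in_header = "noindex" in response.headers.get("X-Robots-Tag", "").lower()
    # ...or as a robots meta tag in the HTML (a crude string check;
    # an HTML parser would be more robust)
    html = response.text.lower()
    in_html = 'name="robots"' in html and "noindex" in html
    return in_header or in_html

print(has_noindex("https://dev.mydomain.com/"))  # placeholder dev URL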
I wasn’t aware that the development site was being indexed until one of our internal marketing meetings. I let the team know about the dev site, and that we were testing some new functionality.
Of all people, the owner stops me to ask, “Is it noindexed?”
Of course I said yes. I had even checked that the meta noindex was added to the pages myself. He did a brief site: search to check if the website was showing up on Google. Sure enough, 40 pages from the dev site were being indexed.
In case you’re ever in this exact situation, the first step is to put your head in your hands and moan, “Nooooooo,” while cringing in embarrassment.
The second step is to deindex the site from Google.
1. Use Robots.txt
I was concerned about disallowing all pages in the robots.txt file for the site. After all, how would Google know to crawl the pages and see our noindex request if it couldn’t crawl them at all?
I realized that Google’s spiders had already been given that chance and ignored my meta noindex tag, so that wasn’t a factor.
I immediately added the following directive to the dev site’s robots.txt:
User-agent: *
Disallow: /
This blocks all bots from crawling a site. However, this alone will not remove a page from Google.
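If you want to sanity-check the new rule, Python’s standard library ships a robots.txt parser. A quick sketch (again, the dev URL is a placeholder):

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt, then ask whether Googlebot
# is allowed to crawl an arbitrary page. Expect False after the change.
parser = RobotFileParser("https://dev.mydomain.com/robots.txt")
parser.read()
print(parser.can_fetch("Googlebot", "https://dev.mydomain.com/any-page/"))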
2. Verify Search Console
I had hoped to avoid adding yet another property to Search Console, but this is the next step. The Remove URLs tool only works on a verified property, so I had to verify the dev site in Google Search Console.
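For reference, one common verification method is uploading an HTML file that Google generates for you. The filename, token, and web root below are all hypothetical; Search Console gives you the real ones:

# Search Console's HTML-file verification: Google supplies a file whose
# presence at the site root proves ownership.
from pathlib import Path

token_file = "google1234abcd.html"   # hypothetical; use the file Google gives you
web_root = Path("/var/www/devsite")  # hypothetical web root for the dev site
(web_root / token_file).write_text(f"google-site-verification: {token_file}")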
3. URL Removal Tool in Google
Navigate to Google Index > Remove URLs, and follow the instructions.
You can only remove pages that live on the subdomain of your Google Search Console property. This may change (if it hasn’t already) in the new version of Search Console, which is said to make it easier to verify all subdomains and versions of a domain in one sweep.
With the Remove URLs tool, you can request removal of individual pages, whole folders, or your entire subdomain. I chose the whole subdomain, but you can change this based on your needs.
4. Verify that Pages Are Deindexed
The last step is simply confirming that your old pages have actually disappeared from Google’s search results.
You can use Google Advanced Search Operators to check your results.
For example, I used the site:dev.mydomain.com operator to verify that the entire development subdomain had been removed.
If you’re checking that a specific page has been removed, you can use info:mydomain.com/specific-page/. If your search results are blank, that means your page is no longer in the index, and you’ve succeeded.
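If you’d rather check programmatically (or keep checking on a schedule), Google’s Custom Search JSON API can run the same site: query. This sketch assumes the requests library plus an API key and a Programmable Search Engine ID, both placeholders here:

import requests

# Ask Google's Custom Search JSON API how many dev-site pages are still indexed.
# API_KEY and CX are placeholders for your own credentials.
API_KEY = "your-api-key"
CX = "your-search-engine-id"

params = {"key": API_KEY, "cx": CX, "q": "site:dev.mydomain.com"}
response = requests.get("https://www.googleapis.com/customsearch/v1", params=params)
# totalResults comes back as a string; "0" means nothing left in the index
total = response.json().get("searchInformation", {}).get("totalResults", "0")
print(f"Pages still indexed: {total}")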