Resolving Critical Crawl Errors With Google Webmaster Tools

If you have ever clicked on a link only to receive a 404 Not Found error page, you can imagine how detrimental crawl errors can be. Errors make for an unpleasant user experience, and when the Googlebot encounters them, they can also dilute your page rank. With the help of Google Webmaster Tools, however, you can quickly identify crawl errors and resolve critical issues. Below is a quick summary of each error category and what it probably means.

HTTP errors: When a user accesses a page or a bot attempts to crawl one, your server returns an HTTP status code. There is a long list of specific codes, but each falls into one of five classes: Informational (1XX), Successful (2XX), Redirection (3XX), Client Error (4XX), and Server Error (5XX). Webmaster Tools lists the most critical client and server errors under their own categories, so while you should eventually address the errors that remain in this one, they can wait.
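As a rough illustration, the class of a response can be read straight off the first digit of its status code. The Python sketch below uses the standard HTTP class names as labels; the function names `status_class` and `needs_attention` are made up for this example and are not part of Webmaster Tools.

```python
# Standard names for the five HTTP status classes, keyed by the first digit.
STATUS_CLASSES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Return the class name for a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


def needs_attention(code):
    """True for the 4XX and 5XX responses that can surface as crawl errors."""
    return status_class(code) in ("client error", "server error")


for code in (200, 301, 404, 503):
    print(code, status_class(code), needs_attention(code))
```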

In Sitemaps: Sitemap errors generally involve old sitemaps that you have deleted and that now return 404 (Not Found), or pages in your current sitemap that return 404. The tricky thing is that after a link starts returning 404, Google continues to crawl it to make sure it truly is dead. This does not hurt your page rank, but it can be annoying to see all those sitemap errors. Confirm that your current sitemap contains only working links that you want indexed. Make sure your old sitemaps return 404 and do not redirect. Eventually, Google will stop crawling them and these errors will vanish for good.
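One way to confirm that a sitemap contains only working links is to pull out every URL it lists and check the status each one returns. The Python sketch below parses a sitemap in the standard sitemaps.org format. To keep it runnable without a network connection, the status lookup is passed in as a function; `sitemap_urls`, `dead_links`, the sample sitemap, and the fake status table are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def dead_links(urls, get_status):
    """Return the URLs for which get_status(url) reports a 404."""
    return [url for url in urls if get_status(url) == 404]


# Hypothetical sitemap and status table, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-page</loc></url>
</urlset>"""

fake_statuses = {
    "https://example.com/": 200,
    "https://example.com/old-page": 404,
}

print(dead_links(sitemap_urls(SAMPLE), fake_statuses.get))
```

In real use, `get_status` would be replaced by a function that actually requests each URL.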

URLs not followed: This occurs when spiders are not able to completely follow a URL, and it usually involves a redirect error. These can be some of the most time-consuming crawl errors to fix, but they are also among the most important. Make sure your redirects point to valid pages that are not empty. You should also minimize the number of redirects and keep any redirect timer relatively short. Lastly, make sure that redirects do not point back to themselves or pass through the same URL more than once.
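Redirect loops and long redirect chains are easy to miss by eye, so it can help to trace them mechanically. The Python sketch below follows a URL through a table of redirects and reports a loop or an overly long chain. The table is a plain dictionary standing in for your real redirect rules, and both `trace_redirects` and the `max_hops` limit of 5 are arbitrary choices for this example, not values Google publishes.

```python
def trace_redirects(start, redirects, max_hops=5):
    """Follow `start` through `redirects` (a dict of source -> target).

    Returns (chain, problem), where problem is None, "loop",
    or "too many hops".
    """
    chain = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            # The redirect points back to a URL already visited.
            return chain, "loop"
        seen.add(current)
        if len(chain) - 1 > max_hops:
            return chain, "too many hops"
    return chain, None


# Hypothetical redirect rules, for illustration only.
rules = {
    "/old": "/newer",
    "/newer": "/newest",
    "/x": "/y",
    "/y": "/x",
}

print(trace_redirects("/old", rules))
print(trace_redirects("/x", rules))
```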

Not found: This is usually the dreaded 404 error. If you want to permanently delete pages from your site, it is fine to let them 404. Eventually they will disappear from the report, but there are some instances in which a 404 calls for a 301 redirect instead. One common occurrence is that someone links to you but makes a typo. By redirecting the misspelled URL to the correct one, you can capture the traffic from that link. Also, if you are moving content rather than deleting it, you should 301 redirect the old URL to the new one. Unintended 404s often involve a typographical error or small inconsistency somewhere, but thankfully, they are easy to fix with a 301 redirect.
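When the 404s come from mistyped links, fuzzy matching can suggest which live page each bad URL was probably meant to reach. The Python sketch below uses the standard library's `difflib.get_close_matches` for this. The function name `suggest_redirects`, the sample paths, and the similarity cutoff of 0.8 are assumptions for this example, and any suggestion should be reviewed by hand before it becomes a 301 rule.

```python
import difflib


def suggest_redirects(not_found_paths, valid_paths, cutoff=0.8):
    """Map each 404 path to its closest valid path, if one is similar enough."""
    suggestions = {}
    for path in not_found_paths:
        matches = difflib.get_close_matches(path, valid_paths, n=1, cutoff=cutoff)
        if matches:
            suggestions[path] = matches[0]
    return suggestions


# Hypothetical paths, for illustration only.
valid = ["/blog/crawl-errors", "/about", "/contact"]
broken = ["/blog/crawl-erors", "/totally-unrelated"]

for source, target in suggest_redirects(broken, valid).items():
    print(f"301 {source} -> {target}")
```

Paths with no close match are left out on purpose; those are the ones that are usually fine to leave as 404s.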

Restricted by robots.txt: This is usually not an error, but rather a restriction you have placed on the Googlebot in your robots.txt file. You should always confirm, however, that you are intentionally blocking those URLs. While it is rare, you may occasionally find a questionable URL that you will need to investigate further.
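To confirm that a block is intentional, you can test URLs against your robots.txt rules yourself. The Python sketch below uses the standard library's `urllib.robotparser`; the function name `blocked_urls` and the sample rules and URLs are assumptions for this example. Python's parser does not necessarily interpret every rule exactly as the Googlebot does, so treat the result as a first check rather than the final word.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs, for illustration only.
ROBOTS = """User-agent: *
Disallow: /private/
"""

candidates = [
    "https://example.com/private/report",
    "https://example.com/blog/",
]

print(blocked_urls(ROBOTS, candidates))
```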

Soft 404s: These should be avoided at all costs. Basically, these are pages that do not respond with a hard 404 code but that the Googlebot, because of thin or duplicate content, judges should have been 404 pages. If it sounds like a value judgment, it is. You should either 404 these pages immediately or develop them into something more substantial. Temporary landing pages often suffer this same fate, so you want to minimize their use as much as possible.
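Google does not publish the exact test it applies, but a crude check of your own can flag likely candidates before the Googlebot does. The Python sketch below treats a page as a probable soft 404 when it returns 200 yet contains an error phrase or very little text. The function name `looks_like_soft_404`, the phrase list, and the 50-word threshold are all assumptions for this example, not Google's criteria.

```python
import re

# Phrases that suggest an error page served with a 200 status (assumed list).
ERROR_PHRASES = ("page not found", "no longer available", "nothing found")


def looks_like_soft_404(status, body_text, min_words=50):
    """Heuristic: a 200 response that reads like an error page or is nearly empty."""
    if status != 200:
        return False  # A real error code is a hard error, not a soft 404.
    text = body_text.lower()
    if any(phrase in text for phrase in ERROR_PHRASES):
        return True
    return len(re.findall(r"\w+", text)) < min_words


print(looks_like_soft_404(200, "Sorry, page not found."))
print(looks_like_soft_404(200, "Coming soon"))
print(looks_like_soft_404(404, "Page not found"))
```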

URLs Timed Out: If a page takes too long to load, the Googlebot will time out and stop trying to crawl it. A DNS lookup timeout indicates a domain name server issue, which may or may not be attributable to you. A URL timeout indicates problems connecting to a specific page, not your entire domain. A robots.txt timeout means the server timed out when Google attempted to fetch your robots.txt file before crawling the rest of your site. Google fetches the robots.txt file first to ensure it does not crawl pages you have blocked from the bot. When that request times out, the crawl is postponed until the bot can access the file.
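To tell these cases apart on your own server, it helps to look at which kind of failure a request produces. The Python sketch below sorts the exceptions raised by the standard library's `urllib.request.urlopen` into rough categories. The names `classify_failure` and `probe`, the category labels, and the 10-second timeout are assumptions for this example and do not reflect how the Googlebot itself decides.

```python
import socket
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map an exception from a failed request to a rough failure category."""
    if isinstance(exc, urllib.error.HTTPError):
        return "http error"  # The server answered, but with a 4XX or 5XX code.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason  # urllib wraps the underlying socket error.
    if isinstance(exc, socket.gaierror):
        return "dns"  # The host name could not be resolved.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"  # The server did not respond in time.
    return "unreachable"


def probe(url, timeout=10.0):
    """Fetch `url` once and return "ok" or a failure category."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify_failure(exc)
```

Calling `probe` on your robots.txt URL first, and then on an individual page, mirrors the order described above.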

URL unreachable: Unreachable errors can occur for a variety of reasons. The cause could be an internal server error or a DNS issue, or the Googlebot may simply not have been able to access your robots.txt file. Whatever the cause, it is important to diagnose these issues and attend to them immediately.

Now that you are armed with some background information, hopefully you can approach crawl errors without as much trepidation. Most of these are small fixes, but important ones. With the help of Google Webmaster Tools, you will be able to keep your site running smoothly and keep crawl errors to a minimum.

Crawl errors make for an unpleasant user experience and can dilute your page rank. Google Webmaster Tools, however, helps you identify and resolve the critical issues, which is essential to good search engine optimization.

Comments on this Article: 2

  1. Very well written Rob. A great reference for the SEO audience. I’ll share it with my audience just after posting this comment.

    Best,
    Andrew

  2. Mike says:

    How long does it take before Google notices you fixed the errors? Seems like forever.
