How to deal with crawl errors in Google Search Console (Google Webmaster Tools)

Last updated on Oct 20, 2016

Has this happened to you? You check the “Crawl Errors” report in Google Search Console (formerly known as Webmaster Tools) and you see so many crawl errors that you don’t know where to start. Loads of 404s, 500s, “Soft 404s”, 400s, and many more… Here’s how I deal with large numbers of crawl errors.

If you don’t find a solution to your problem in this article, feel free to leave me a comment at the bottom of this page. I normally reply within a couple of days.

Contents

Here’s an overview of what you will find in this article:

Don’t panic!
First, mark all crawl errors as fixed
Check your crawl errors report once a week
The classic 404 crawl error
404 errors caused by faulty links from other websites
404 errors caused by faulty internal links or sitemap entries
404 errors caused by Google crawling JavaScript and messing it up 😉
Mystery 404 errors
What are “Soft 404” errors?
What to do with 500 server errors?
Other crawl errors: 400, 503, etc.
List of all crawl errors I have encountered in “real life”
Crawl error peak after a relaunch
Summary

So let’s get started. First of all:

Don’t panic!

Crawl errors are something you normally can’t avoid, and they don’t necessarily have an immediate negative effect on your SEO performance. Nevertheless, they are a problem you should tackle. Keeping the number of crawl errors in Search Console low is a positive signal for Google, as it reflects good overall website health. Also, if the Google bot encounters fewer crawl errors on your site, users are less likely to run into broken pages and server errors.

First, mark all crawl errors as fixed

This may seem like a stupid piece of advice at first, but it will actually help you tackle your crawl errors in a more structured way. When you first look at your crawl errors report, you might see hundreds or thousands of crawl errors from way back when. It will be very hard for you to find your way through these long lists of errors.

[Screenshot: lots of crawl errors in Google Search Console]

Does this screenshot make you feel better? I bet you’re better off than these guys 😉

My approach is to mark everything as fixed and then start from scratch: irrelevant crawl errors will not show up again, and the ones that really need fixing will soon be back in your report. So, after you have cleaned up your report, here is how to proceed:

Check your crawl errors report once a week

Pick a fixed day every week and go to your crawl errors report. Now you will find a manageable amount of crawl errors. As they weren’t there the week before, you will know that they have recently been encountered by the Google bot. Here’s how to deal with what you find in your crawl errors report once a week:

The classic 404 crawl error

This is probably the most common crawl error across websites and also the easiest to fix. For every 404 error the Google bot encounters, Google lets you know where the broken URL is linked from: another website, another URL on your website, or your sitemaps. Just click on a crawl error in the report and a lightbox like this will open:

[Screenshot: see where crawl errors are linked from]

Did you know that you can download a report with all of your crawl errors and where they are linked from? That way you don’t have to check every single crawl error manually. Check out this link to the Google API explorer. Most of the fields are already prefilled, so all you have to do is add your website URL (the exact URL of the Search Console property you are dealing with) and hit “Authorize and execute”. Let me know if you have any questions about this!

Now let’s see what you can do about different types of 404 errors.

404 errors caused by faulty links from other websites

If the false URL is linked to from another website, you should simply implement a 301 redirect from the false URL to a correct target. You might be able to reach out to the webmaster of the linking page to ask for an adjustment, but in most cases it will not be worth the effort.
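
On an Apache server, such a redirect often only takes one line in your .htaccess file. Here’s a minimal sketch with made-up paths (adapt it to your own URLs and server setup):

  # permanently redirect the faulty URL that the other website links to
  # to the correct target (both paths are just examples)
  Redirect 301 /old-misspelled-page /correct-page/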

404 errors caused by faulty internal links or sitemap entries

If the false URL that caused the 404 error for the Google bot is linked from one of your own pages or from a sitemap, you should fix the link or the sitemap entry. In this case it is also a good idea to 301 redirect the 404 URL to the correct destination to make it disappear from the Google index and pass on the link power it might have.

404 errors caused by Google crawling JavaScript and messing it up 😉

Sometimes you will run into weird 404 errors that, according to Google Search Console, several or all of your pages link to. When you search for the links in the source code, you will find they are actually relative URLs that are included in scripts like this one (just a random example I’ve seen in one of my Google Search Console properties):

[Screenshot: Google crawls the URLs in this script]

According to Google, this is not a problem at all and this type of 404 error can just be ignored. Read paragraph 3) of this post by Google’s John Mueller for more information (and also the rest of it, as it is very helpful):


I am currently trying to find a solution that is more satisfying than just ignoring this type of error. I will update this post if I come up with anything.

Mystery 404 errors

In some cases, the source of the link remains a mystery. I get the impression that the data that Google provides in the crawl error reports is not always 100% reliable. For example, I have often seen URLs as sources for links to 404 pages that didn’t exist any more themselves. In such cases, you can still set up a 301 redirect for the false URL.

Remember to always mark all 404 crawl errors that you have taken care of as fixed in your crawl error report. If there are 404 crawl errors that you don’t know what to do about, you can still mark them as fixed and collect them in a “mystery list”. Should they keep showing up again, you know you will have to dig deeper into the problem. If they don’t show up again, all the better.

Let’s have a look at the strange species of “Soft 404 errors” now.


What are “Soft 404” errors?

This is something Google invented, isn’t it? At least I’ve never heard of “Soft 404” errors anywhere else. A “Soft 404” error is an empty or error-like page that the Google bot encountered and that returned a 200 status code instead of a 404.

So it’s basically a page that Google THINKS should be a 404 page, but that isn’t. In 2014, webmasters started getting “Soft 404” errors for some of their actual content pages. This is Google’s way of letting us know that we have “thin content” on our pages.

Dealing with “Soft 404” errors is just as straightforward as dealing with normal 404 errors:

  • If the URL of the “Soft 404” error is not supposed to exist, 301 redirect it to an existing page. Also make sure that you fix the problem of non-existent URLs not returning a proper 404 error code (see the sketch after this list).
  • If the URL of the “Soft 404” page is one of your actual content pages, this means that Google sees it as “thin content”. In this case, make sure that you add valuable content to your website.
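
If your server runs Apache, a custom error page set up like this keeps the real 404 status code. This is just a sketch and assumes you have an error page at /404.html:

  # serve the custom “not found” page with a real 404 status code
  # (pointing ErrorDocument at a full URL would cause a redirect and a 200 instead)
  ErrorDocument 404 /404.html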

After working through your “Soft 404” errors, remember to mark them all as fixed. Next, let’s have a look at the fierce species of 500 server errors.

What to do with 500 server errors?

500 server errors are probably the only type of crawl errors you should be slightly worried about. If the Google bot encounters server errors on your page regularly, this is a very strong signal for Google that something is wrong with your page and it will eventually result in worse rankings.

This type of crawl error can show up for various reasons. Sometimes it might be a certain subdomain, directory or file extension that causes your server to give back a 500 status code instead of a page. Your website developer will be able to fix this if you send him or her a list of recent 500 server errors from Google’s Webmaster Tools.

Sometimes 500 server errors show up in Google’s Search Console due to a temporary problem. The server might have been down for a while due to maintenance, overload, or force majeure. This is normally something you will be able to find out by checking your log files and speaking to your developer and website host. In a case like this you should try to make sure that such a problem doesn’t occur again in future.

Pay attention to the server errors that show up in your Google Webmaster Tools and try to limit their occurrence as much as possible. The Google bot should always be able to access your pages without any technical barriers.

Let’s have a look at some other crawl errors you might stumble upon in your Google Webmaster Tools.

Other crawl errors: 400, 503, etc.

We have dealt with the most important and common crawl errors in this article: 404, “Soft 404” and 500. Once in a while, you might find other types of crawl errors, like 400, 503, “Access denied”, “Faulty redirects” (for smartphones), and so on.

In many cases, Google provides some explanations and ideas on how to deal with the different types of errors.

In general, it is a good idea to deal with every type of crawl error you find and to try to prevent it from showing up again in future. The fewer crawl errors the Google bot encounters, the more Google trusts your site’s health. Pages that constantly cause crawl errors are assumed to provide a poor user experience and will be ranked lower than healthy websites.

You will find more information about different types of crawl errors in the next part of this article:

List of all crawl errors I have encountered in “real life”

I thought it might be interesting to include a list of all of the types of crawl errors I have actually seen in Google Search Console properties I have worked on. I don’t have much info on all of them (except for the ones discussed above), but here we go:

Server error (500)
In this report, Google lists URLs that returned a 500 error when the Google bot attempted to crawl the page. See above for more details.

Soft 404
These are URLs that returned a 200 status code but, according to Google, should be returning a 404 error. I suggested some solutions for this above.

Access denied (403)
Here, Google lists all URLs that returned a 403 error when the Google bot attempted to crawl them. Make sure you don’t link to URLs that require authentication. You can ignore “Access denied” errors for pages that you have included in your robots.txt file because you don’t want Google to access them. It might be a good idea though to use nofollow links when you link to these pages, so that Google doesn’t attempt to crawl them again and again.
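
If there are areas of your website that require a login and that you don’t want crawled at all, you can also keep the Google bot out via robots.txt. A small example with made-up paths:

  User-agent: *
  # keep crawlers out of areas that require authentication,
  # so the Google bot doesn’t keep running into 403 errors there
  Disallow: /login/
  Disallow: /my-account/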

Not found (404 / 410)
“Not found” is the classic 404 error that has been discussed above. Read the comments for some interesting information about 404 and 410 errors.

Not followed (301)
The error “not followed” refers to URLs that redirect to another URL, but where the redirect doesn’t work for the Google bot (for example because of redirect chains or loops). Fix these redirects!
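
One typical culprit is a chain of redirects that has built up over time. Here’s a sketch with hypothetical paths of how such a chain can be flattened in an Apache .htaccess file, so that every old URL points straight at its final target:

  # instead of /old-page -> /interim-page -> /final-page/,
  # send both old URLs directly to the final destination
  Redirect 301 /old-page /final-page/
  Redirect 301 /interim-page /final-page/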

Other (400 / 405 / 406)
Here, Google groups everything it doesn’t have a name for: I have seen 400, 405 and 406 errors in this report and Google says it couldn’t crawl the URLs “due to an undetermined issue”. I suggest you treat these errors just like you would treat normal 404 errors.

Flash content (Smartphone)
This report simply lists pages with a lot of Flash content that won’t work on most smartphones. Get rid of Flash!

Blocked (Smartphone)
This error refers to pages that could be accessed by the Google bot, but were blocked for the mobile Google bot in your robots.txt file. Make sure you let all of Google’s bots access the content you want indexed!
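
In practice, this means checking your robots.txt for rules that single out Google’s mobile crawler. A simplified sketch of what the relevant part could look like (not a copy-paste solution):

  # avoid a separate, stricter group for Google’s mobile crawler
  # (for example “User-agent: Googlebot-Mobile” followed by “Disallow: /”).
  # Instead, give all crawlers the same access to the content you want indexed:
  User-agent: *
  Disallow: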

Please let me know if you have any questions or additional information about the crawl errors listed above or other types of crawl errors.

Crawl error peak after a relaunch

You can expect a peak in your crawl errors after a website relaunch. Even if you have done everything in your power to prepare your relaunch from an SEO perspective, it is very likely that the Google bot will encounter a big number of 404 errors after the relaunch.

If the number of crawl errors in your Google Webmaster Tools rises after a relaunch, there is no need to panic. Just follow the steps that have been explained above and try to fix as many crawl errors as possible in the weeks following the relaunch.

Summary

  • Mark all crawl errors as fixed.
  • Go back to your report once a week.
  • Fix 404 errors by redirecting false URLs or changing your internal links and sitemap entries.
  • Try to avoid server errors and ask your developer and server host for help.
  • Deal with the other types of errors and use Google’s resources for help.
  • Expect a peak in your crawl errors after a relaunch.

If you have any additional ideas on how to deal with crawl errors in Google Webmaster Tools, I would be grateful for your comments.


72 Comments

  1. Ali
    1. January 2017

    hello sir this my website kindly help me my search console analytical is not working what is problem can you help i cant see any error for this http://www.subkuchsell.com website

    Reply
    • Eoghan Henn
      4. January 2017

      Hello Ali,

      I am not sure if I can help you with this question. If you do not see any data in Google Search Console, it might be because you only verified your property recently. It takes a few days until data is shown.

      If you do not see any errors, it might also be related to the fact that there simply aren’t any errors.

      Make sure you verify the right website version. The URL you enter for your Search Console property should be http://www.subkuchsell.com/.

      Let me know if there is anything else I can do for you.

      Eoghan

      Reply
  2. leanin
    29. December 2016

    Hey Eoghan,

    thanks for sharing. For an e-commerce website, my friend suggest a way to deal with 400 pages.
    1.download the search crawl error-404,
    2.past the 404 url in to txt file,
    3. put the 404.txt in the ftp,
    4.submit 404.txt to Add/Test Sitemap
    google webmaster–crawl–sitemap–Add/Test Sitemap button
    http://www.xxxxx.com/404.txt

    since we are going to dele around 4k url recently,how to deal with it very important

    Reply
    • leanin
      29. December 2016

      Fix 404 errors by redirecting false URLs or changing your internal links and sitemap entries.

      for this, steps as followings, right?

      1. 301 redirect all 404 error url to the homepage,
      2. update the sitemap
      3. sumit the sitemap,

      which one is correct?

      Reply
      • Eoghan Henn
        4. January 2017

        Yes, this is how I would suggest to do it. Just think about whether there are better targets for your 301 redirects than the home page. I would not recommend to just redirect every old URL to the home page without thinking about it. For most URLs, there will probably be better targets than the home page.

    • Eoghan Henn
      4. January 2017

      Hi leanin,

      I am not sure why your friend recommends these steps, but this is not a solution I have ever heard of.

      Reply
  3. mirotic
    26. November 2016

    hi sir
    (i have bad english)

    can u help me fix this issue?

    my site has been block cause the yandex bot (i don really understand how this work)
    http://imgur.com/a/W1JKK

    i register my site at yandex , i couldnt find the crawl setting
    http://imgur.com/a/297Mu

    what should i do ?

    Reply
  4. mikc
    16. November 2016

    Hello Sir!

    I just built a website and google won’t crawl, won’t allow me to upload a site map either. Getting only 2 links showing when I enter site:acousticimagery.net, and one of these shows a 500 error. Also, when trying to crawl, Google doesn’t like my robots.txt file. I’ve tried several edits, removing it altogether, nothing helps. My Site host is worthless, been trying to get this fixed for 2 weeks. Any input you might have would be most appreciated!!

    Reply
    • Eoghan Henn
      16. November 2016

      Hello Mick! Thanks a lot for your comment.

      One important problem I have been able to identify is that your server always returns a 500 error instead of a 404 error when a page does not exist. Can you think of a way to fix this?

      If you want to get your pages indexed quickly, I recommend you go to “Crawl > Fetch as Google” in Google Search Console. Here you can fetch each page that is not in the index yet and then, after it has been fetched, click on “Submit to index”. This will speed up the indexing process.

      I could not find a robots.txt file or an XML sitemap on your server. The robots.txt should be located at http://acousticimagery.net/robots.txt. Right now, this URL returns a 500 error code, so I assume the file does not exist or is not in this location. You can decide how you want to name your XML sitemap, but I would recommend putting it here: http://acousticimagery.net/sitemap.xml.

      Mind you, you don’t really need a robots.txt file and an XML sitemap for a website with only 4 pages (but they won’t do any harm either). Just make sure you change that thing with the wrong error codes.

      Please let me know if you have any other questions.

      Best regards,

      Eoghan

      Reply
      • Mick
        16. November 2016

        Hello Eoghan,
        Thanks for the response! Google won’t let me crawl the site as I keep getting an error saying they can’t locate the robots.txt file. I removed the file contents and tried again, still no go. Also, everytime I try to upload an XML file it tells me the file is in an invalid format. I see the 500 errors but cannot fix them. Any other ideas? This all started when I updated the site using a website builder available from Fat Cow. Very sorry I ever tried to update as I’m getting no cooperation from them on this at all. I’m thinking of just pulling the site and cancelling my Fat Cow account. You mentioned submitting each page with fetch. How do you do this?

      • Eoghan Henn
        19. November 2016

        Hi Mick,

        OK, thanks for the additional information. I now have a better understanding of what is going on. The Google bot tries to access your robots.txt file at http://acousticimagery.net/robots.txt and gets a 500 server error, so it decides not to crawl the page and come back later. You can fix this by fixing the error code problem I described earlier. If http://acousticimagery.net/robots.txt returns a 404 error, everything is fine and Google crawls your page.

        I do not know how this works with Fat Cow, but maybe this page will help you: http://www.fatcow.com/knowledgebase/beta/article.bml?ArticleID=620

        Here’s how to submit each page to the Google index in Google Search Console:

        1. In the left navigation, go to Crawl > Fetch as Google:

        [Screenshot: Crawl > Fetch as Google]

        2. Enter the path of the page you want to submit and hit Fetch:

        [Screenshot: enter the page path and hit Fetch]

        3. When fetching is complete, hit “Request indexing”:

        [Screenshot: hit “Request indexing”]

        4. Complete the dialogue that pops up like this:

        [Screenshot: complete the dialogue]

        5. Repeat for every page you want to submit to the index. Here are the paths of the pages you will want to submit:
        cd-transfers
        audio-recording
        contact-us

        I hope this helps! It will take a while until the pages show up in the search results. Let me know if there is anything else I can do for you.

        Eoghan

  5. Sean
    15. November 2016

    I get a lot of page no found errors and when I checked the linked from info and click the links they clearly go to the actual page, which is not broken? It’s really annoying as the errors keep coming back.

    i.e.

    This error

    /places/white-horse-inn/

    is linked form here

    http://www.seanthecyclist.co.uk/places/white-horse-inn/

    Any idea what might be causing this?

    Thanks

    Reply
    • Eoghan Henn
      16. November 2016

      Hi Sean,

      I think I might need some more information to be able to help you with this. I will send you an e-mail now.

      Best regards,

      Eoghan

      Reply
  6. Donald
    5. November 2016

    I have been getting the same issue as Michael .

    how i do to fixed this error 500 http://imgur.com/a/qE4i3

    It made me lose every single keyword I was ranking for and the more I try to remove they keep coming up. As soon as I fetch the URL , search results pop back up to #2 positions for many keywords but just after a few hours looks like google crawls them again finding errors and sends the site back to the 10th page. Search rankings were gradually lost as soon as this 500 server error was discovered on webmaster.
    Now I have thought about blocking /wp-includes/ but I think you cant block it anymore due to css and js which might hurt rankings even more.

    Any help would be most appreciated.

    Reply
    • Eoghan Henn
      5. November 2016

      Hi Donald,

      You’re absolutely right, /wp-includes/ does contain some .js files that you might want Google to crawl. Your CSS is normally in /wp-content/ though.

      Also, Yoast does not block /wp-includes/ by default any more (Source: https://yoast.com/wordpress-robots-txt-example/)

      Nevertheless, it is probably a good idea to block all URLs that return a 500 error from the Google bot. So far, I’ve never had problems with blocking the entire /wp-includes/ directory (I still do it on this website), but it might be worthwhile to go through the directory and only block the URLs that return a 500 server error.
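
      That could look something like this in your robots.txt (the file name is a made-up example; replace it with the URLs from your crawl error report):

      User-agent: *
      # block only the specific files that return 500 errors,
      # instead of the entire /wp-includes/ directory
      Disallow: /wp-includes/some-broken-script.php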

      I hope this helps!

      Reply
  7. Michael
    29. October 2016

    how i do to fixed this error 500 http://imgur.com/a/qE4i3

    Reply
    • Eoghan Henn
      1. November 2016

      Hello Michael,

      You can block your /wp-includes/ directory from the Google bot by putting it in your robots.txt file. I recommend you install the Yoast SEO plugin for WordPress. As far as I know, it does it automatically.

      I hope this helps.

      Eoghan

      Reply
      • Eoghan Henn
        5. November 2016

        Please see my reply to Donald’s comment (above) for an update on this issue.

  8. kevin
    30. September 2016

    Henn,
    We have crawl errors in webmasters.When we remove such pages from webmasters.So within how many days that page can be removed from google Webmasters.

    Reply
    • Eoghan Henn
      6. October 2016

      Hi Kevin,

      For me, there are two scenarios in which I would remove a crawl error from the report:

      1. If I know the error won’t occur again because I’ve either fixed it or I know it was a one-time thing.
      2. If I don’t know why the error occurred (i.e. why Google crawled that URL or why the URL returned an error code) and I want to see if it happens again.

      WHEN you do this really doesn’t matter much. I hope this helps! Let me know if you have any other questions.

      Reply
  9. kevin
    29. September 2016

    Hi Eoghan Henn,
    This is kevin can u tell me after removing the page from webmasters.How many days after the page can be removed from the webmasters.

    Reply
    • Eoghan Henn
      30. September 2016

      Hello Kevin,

      Thanks a lot for your comment. I am not sure if I understand your question correctly. I will send you an e-mail so we can discuss this.

      Best regards,

      Eoghan

      Reply
  10. Saud Khan
    21. September 2016

    Please help me to fix this error.

    Screenshot: http://i.imgur.com/ydZo4Wv.jpg

    I’ve deleted the sample page and redirected the second url.

    Reply
    • Eoghan Henn
      27. September 2016

      Hi Saud,

      Unfortunately the screenshot URL is not working (any more). I will get in touch with you via email and see if I can help you.

      Best regards,

      Eoghan

      Reply
  11. Ajay Murmu
    14. September 2016

    I am getting HTTP Error: 302 error in sitemaps section. All other sitemap urls are working fine but i am getting error on main sitemap.xml. How can i resolve it?

    Reply
    • Eoghan Henn
      16. September 2016

      Hello Ajay,

      thanks a lot for your comment. I am not sure I understand your question very well. I will send you an e-mail so you can send me a screenshot if you like.

      Best regards,

      Eoghan

      Reply
      • Ray
        18. December 2016

        Hello Eoghan, I would love to know if you resolved the ‘302’ problem
        I’ve had the issue of going through the wayback machine to a website but then when I click the link I need I am greeted with: ‘Got an HTTP 302 response at crawl time’ and redirected to the current website where my information is no longer.
        Would really appreciate some help if you could email me.
        internetuser52@gmail.com

      • Eoghan Henn
        4. January 2017

        Hi Ray,

        I’ll send you an e-mail.

        Eoghan

  12. Jennifer M
    23. August 2016

    There is a nasty website that injected a redirect on our site. We found the malware and removed it, but their site is still linking to tons of URLS on our site that don’t exist–and hence creating crawler errors.

    How would you suggest we fix this?

    THANKS!
    Jennifer

    Reply
    • Eoghan Henn
      29. August 2016

      Hi Jennifer,

      This does sound nasty :/

      It is not easy to analyse this situation with the little bit of information I have, but I guess you do not have to worry about the crawl errors too much. Look at it this way: Somebody (a spammer) is sending the Googlebot to URLs on your website that don’t exist and have never existed. Google is clever enough to figure out that this is not your fault.

      If you like, you can send me more information via email so that I can have a closer look at it.

      Reply
  13. Chris
    10. August 2016

    That’s great news. Thanks for sharing Eoghan. Keep me posted!

    -Chris

    Reply
    • Eoghan Henn
      12. August 2016

      Hi Chris,

      For now, I recommend you use Google’s Search Console API explorer. If you follow this link, the fields are already pre-filled for a list of your 404 errors with additional information about the sitemaps the false URLs are included in and the pages they are linked from:

      https://developers.google.com/apis-explorer/#p/webmasters/v3/webmasters.urlcrawlerrorssamples.list

      You just need to fill in your site URL (make sure you use the exact URL of your GSC property in the right format). You can then copy and paste the output and forward it to your IT team. I want to build a little tool that will make this easier and nicer to export, but that will take a while 🙂

      Hope this helps for now! Let me know if you have any questions.

      Reply
      • Chris Smith
        13. August 2016

        Eoghan,

        That works perfectly. Thanks a ton for the detailed response and customized URL. I hope I can return the favor someday. 🙂

        Thanks again,

        Chris

  14. Chris Smith
    3. August 2016

    I like this strategy.

    Is there a way to download the “linked from” information in the 404 report? Would make it much easier to send the complete details to my IT team.

    Reply
  15. Dr Emixam
    21. July 2016

    Hi,

    Following a misconfiguration of another of my websites, google indexed a lot of non existant pages and now all of these pages appear in crawl errors.

    I tried to set them as 410 errors to tell they doesn’t existe anymore but google keeps them in crawl errors list.

    Do you know what is the best thing to do in this case ? And in a general way, for any page which is permanently deleted.

    Reply
    • Eoghan Henn
      2. August 2016

      Hello Dr Emixam,

      Thanks a lot for your comment and sorry about my late reply.

      From what you described, you did everything right. Just remember to mark the errors as fixed once you’ve made changes to your page. This way they should not show up again.

      Let me know if you have any further questions.

      Reply
      • Eoghan Henn
        20. October 2016

        Just to clarify this: Giving back a 410 code alone will not prevent the URLs from showing up in the 404 error reports – Google currently shows 410 errors as 404 errors. In order to stop the URLs from showing up in the reports, all links to the URLs need to be removed too. Otherwise, Google will keep on following the links, crawling the URLs and showing the errors in the reports. If there are external links to the URLs that cannot be removed, it might be better to use a 301 redirect to point to another URL that is relevant to the link.

  16. Steven
    19. July 2016

    Hi Eoghan-

    Thanks for the great info in the article! I have an interesting (to me) issue with some of the crawl errors on our site. The total number of 404 errors is under 200 and some of them I can match to your info above. But, there are quite a few URLS that are not resolving properly due to “Chief%20Strategy%20Officer” having been appended on to each of the URLs. For example, the URL will end with “…personal-information-augmented-reality-systems/Chief%20Strategy%20Officer” and the Linked From URL is the link on our site.

    I’m going to go ahead and mark all as “fixed” and see what happens, but I was wondering if you had any idea how this may have happened?

    Thanks y ¡Saludos! from BCN…
    Steven

    Reply
    • Eoghan Henn
      29. July 2016

      Hi Steven,

      Thanks for your comment! I found your website, crawled it, and found some interesting stuff that might help you. I will send you an email about this.

      Best regards,

      Eoghan

      Reply
  17. Vicky
    17. July 2016

    Hi Eoghan Henn,

    I have over 1000, 404 not found errors on google search console for deleted products. What i will do to fix those errors. Can you please suggest me any way to fix them.

    Thanks
    Vicky

    Reply
    • Eoghan Henn
      28. July 2016

      Hello Vicky,

      When you have to delete a product page, you have a few options:

      • Is there a replacement product or a new version of the product? If so, you can 301 redirect the URL of the deleted product to this new URL. Make sure that the new URL you redirect the old URL to is very similar to the old URL though. Do not overuse 301 redirects!
      • If you want to delete a product and send a signal to Google saying that the page has been removed intentionally, you can give back a 410 status code instead of a 404 (see the example right after this list).
      • If none of the above is possible, you can just mark the 404 errors in Search Console as fixed. Make sure you do not link to the old URLs internally any more. Google should stop crawling them then and the errors should not return. If a URL is linked from another website, you should definitely 301 redirect it to a relevant target (see first option).
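
      On an Apache server, the 410 option can be implemented with a one-line rule in your .htaccess file, for example (the product path is made up):

      # answer requests for the removed product with a 410 (Gone) status code
      Redirect gone /products/discontinued-product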

      I hope this helps!

      Reply
      • Vicky
        28. July 2016

        Hi Eoghan,

        Thanks for reply,

        I marked them fixed so google stop crawling them. Yes there is some deleted pages linked internally & externally. I will redirect those deleted product to similar product.

        Will inform you soon about update on it.

        Again thanks for reply!

      • Eoghan Henn
        2. August 2016

        Hi Vicky,

        Just to clarify: Marking the errors as fixed will not make Google stop crawling them. This can only be achieved by removing all links to the URLs.

        I’m looking forward to hearing about how it went for you!

  18. Chris
    26. June 2016

    Hi Eoghan,

    Just wanted to give you a thumbs up! Great post and super useful to me today, right now, when I discovered a bunch of 404s on a client site who just moved from an html site to a wordpress site for some of the pages they used to rank for, ie. somepage.html.

    I had used SEMRush to find as many pages as possible that they were previously ranking for and redirected them to any specific, relevant page and, when not possible, to category or broad topic pages.

    The remaining crawl errors (404s) in Search Console are some pages that didn’t show up in SEMRush and of course things like “http://myclientsite.com/swf/images.swf” Since we are sensibly no longer using Flash I guess I just don’t worry about those? Not really sure.

    Anyway, thanks for the great post!

    Reply
    • Eoghan Henn
      30. June 2016

      Hi Chris,

      Thanks for your kind words! I’m glad this article helped you.

      Yes, you can just ignore the swf file errors. If you mark them as fixed I guess they won’t show up again.

      Reply
  19. Daniel
    23. June 2016

    Hi Eoghan,

    Any thought on bigger classified ad sites handling search console?

    For instance, real estate with multiple ads that expire after a certain date, having around 30k “404” or so. What would you suggest to deal with such amount of expired content?

    Thanks in advance,

    Reply
    • Eoghan Henn
      29. June 2016

      Hi Daniel,

      Thanks a lot for your interesting question. I have no practical experience with a case like this, but let me share some thoughts:

      • One thing you can do to make it clear that the pages were removed intentionally, so there is no “error”, would be to serve a 410 status code instead of a 404 status code (see https://searchenginewatch.com/sew/how-to/2340728/matt-cutts-on-how-google-handles-404-410-status-codes)
      • Also, ask yourself: Do you really need all of these temporary pages crawled and indexed? Do you get any valuable organic traffic from them? Do they rank for queries that you could not cover with pages that are permanently available? Maybe you can build an SEO strategy for your website that takes into account the fact that you have a big number of pages that disappear after a while.
      I hope this helps!

      Reply
      • Jules
        11. September 2016

        Google treats a 410 as a 404. https://support.google.com/webmasters/answer/35120?hl=en under URL error types > Common URL errors > 404: “If permanently deleting content without intending to replace it with newer, related content, let the old URL return a 404 or 410. Currently Google treats 410s (Gone) the same as 404s (Not found).”

      • Eoghan Henn
        13. September 2016

        Hi Jules,

        Thanks for linking to this source. Still, I guess that a 410 error code is the better choice when intentionally removing content. We do not have power over how Google interprets our signals, but we should do everything we can to make them as consistent as possible.

  20. Josh
    24. April 2016

    Hey Eoghan – I see your website is made with WordPress, so I was hoping you’d be able to answer my question.

    I recently re-submitted my sitemap (since I thought it might be a good thing to do after disallowing /go/ in my robots.txt for affiliate links) and a few days after recrawling, I now see a new 500 error:

    /wp-content/themes/mytheme/

    Other notes:

    – This was not present before I resubmitted my sitemap, and it’s the only 500 error I’ve seen since I launched my website a month or two ago.
    – I also know that my webhost (Bluehost) tends to go down at times. Maybe this is because Google tried crawling when it was down?
    – I updated my theme a few days before the 500 error appeared.

    Do I need to take any action? Is there any other info I can provide?

    Thanks – appreciate it.

    Reply
    • Eoghan Henn
      25. April 2016

      Hi Josh! Thanks for your comment.

      First of all: This is not something you should worry about, but if you have time, you might as well try to fix it 😉

      Apparently, the type of URL you mentioned above always gives back a 500 server error. Check this out: I’m using a WP theme called “Hardy” and the exact same URL for my page and my theme also returns a 500 server error: http://www.rebelytics.com/wp-content/themes/hardy/. So it’s not Bluehost’s fault. (Fun fact: I will probably receive a 500 error for this now because I just placed that link here).

      Now the question is: Why did the Google bot crawl your theme URL in the first place? Are you linking to it in your new sitemap? If so, you should remove the link. Your sitemap should only contain links to URLs that you want indexed. You can check where the Googlebot found the link to the URL (as mentioned in the article above). Here’s a screenshot of that:

      [Screenshot: see internal links for crawl errors in Google Search Console]

      If you find a link to that URL anywhere, just remove it. Otherwise, I guess you can just ignore this crawl error. It would be interesting to mark it as fixed and see if it shows up again. Let me know how it goes! And just give me a shout if you have any additional questions.

      Best regards,

      Eoghan

      Reply
      • Josh
        25. April 2016

        Awesome – thanks for the reply. It’s not linked in my sitemap and clicking on the link in GWT doesn’t show where it’s linked from, but I’ll remove it. Glad to hear it’s not really a problem.

        I also had 2 other quick questions:

        In general, do I only need to worry about crawl errors/warnings for relevant webpages (webpages that I want indexed and webpages that should be redirected since they’re being clicked on)? Some warnings are shown for:
        /m
        /mobile
        /coming-soon

        No idea how these appeared, and it shows they’re linked from my homepage, even though I have no idea how that is possible.

        Also, my Amazon affiliate links (cloaked with /go/) got indexed a few weeks ago, and roughly a week ago, I put rel=”nofollow” for each link and also added “Disallow: /go/” under “User-agent: *” in my robots.txt.

        It’s been a week, and my affiliate links are still indexed when I enter “site:mysite.com”. Do you think I’m missing anything, and how can I find out if I’m still being penalized for them?

        Thanks for the help – greatly appreciated.

      • Eoghan Henn
        16. May 2016

        Hi Josh! Sorry it took me so long to reply to this one. You’re right, you should worry more about crawl errors for relevant pages that you want indexed. Nevertheless, it is always a good idea to also have a closer look at all the other crawl errors and try to avoid them in future. Sometimes, though, there’s nothing much you can do (like in the JavaScript example in the article).

        What kind of redirects are you using for your Amazon affiliate links? Make sure you use a 301 redirect so they don’t get indexed.

        I hope this helps!

  21. Dermid
    25. February 2016

    Eoghan,
    Thanks for the very good input. A related question: I’m getting three different numbers for Google indexed pages. 1) when I type site:mysite.com I get 200,000 and 2) when I look in Google Search Console index status it reports 117,000 and 3) when I look at crawled site map it reports only 67 pages indexed. Can you help me understand these varying index numbers? Thank you very much.
    Dermid

    Reply
    • Eoghan Henn
      25. February 2016

      Hello again! You get different numbers here because you are looking at three different things:

      1) site:mysite.com shows you all the pages on your domain that are currently in the index. This includes all subdomains (www, non-www, mobile subdomain) and both protocols (http and https).
      2) shows you all indexed pages within the Search Console property you are looking at. A Search Console property can only include URLs with a combination of one protocol and one subdomain, so if the name of your Search Console property is https://www.mysite.com/, only URLs that start with https://www.mysite.com/ (and that are indexed) will show here.
      3) shows you all URLs that are included in this exact sitemap and that are indexed.

      I found https, http, www, non-www, and m. (mobile subdomain) pages of your domain in the Google index. You should make sure all of your pages are only available with https and decide whether you want to use www or not (this decision is a matter of taste). You can easily set this up with two 301 redirect rules: one that redirects every http URL to its https equivalent and one that redirects all non-www URLs to their www equivalents (or vice versa). Last but not least, make sure you are using the right Search Console property (so https://www.mysite.com/ or https://mysite.com/, depending on how you decide on the www or non-www matter) and post a sitemap with all the URLs you want to be indexed.
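
      For an Apache server, the two rules could look roughly like this in the .htaccess file (assuming you settle on https with www; adapt and test it carefully before using it):

      RewriteEngine On
      # 1) redirect every http request to its https equivalent
      RewriteCond %{HTTPS} off
      RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]
      # 2) redirect non-www requests to the www host
      RewriteCond %{HTTP_HOST} !^www\. [NC]
      RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L]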

      Once you’ve followed this, you should work on closing the gap between 1), 2) and 3). If you have a healthy website and you’re in control of what Google is indexing, all three numbers should be on a similar level.

      Reply
  22. Dermid
    23. February 2016

    Eoghan,,
    Thank you. Our development team is using your advice because we have very similar issues with crawl errors. On a related note, I’m trying to understand the relationship between crawl errors and indexed URLs. When our URLs are indexed we do very well with organic search traffic. We have millions of URLs in our submitted sitemap. Within the Google Search Console our indexed URL number jumped from zero to 100K on Janurary 4th but has stayed at about that level since then. Should we expect that when we fix the crawl errors the indexed URLs will rise?
    Thank you,
    Dermid

    Reply
    • Eoghan Henn
      24. February 2016

      Hi Dermid,

      Thanks a lot for your comment and your interesting questions.

      Crawl errors and indexed URLs are not always directly related. 404 errors normally occur when the Googlebot encounters faulty URLs that are not supposed to be crawled at all, for example through broken links. Server errors, on the other hand, can occur with URLs that are supposed to be indexed, so fixing these might result in a higher number of indexed URLs.

      If you have millions of URLs in your sitemap, but only 100k of them are indexed, you should work on closing this gap. First of all, you should check whether you really want millions of URLs in your sitemap or whether lots of those pages aren’t relevant entry pages for users that search for your products or services in Google. It is better to have a smaller number of high-quality pages than a larger number of low-quality pages up for indexing.

      Next, check why a big part of the URLs you submitted in your sitemaps hasn’t been indexed by Google. Note that submitting a URL in a sitemap alone will normally not lead to indexing. Google needs more signals to decide to index a page. If a large number of pages on your domain is not indexed, this is normally due to poor internal linking of the pages or poor content on the pages. Make sure that all pages you want in the Google index are linked properly internally and that they all have content that satisfies the needs of the users searching for the keywords you want to rank for.

      I hope this helps!

      Eoghan

      Reply
  23. Arun
    14. January 2016

    Hi….Eoghan Henn

    Actually my site was getting hacked after that i got a lot of errors is search console i don’t know what have to do by your tips am going to markup the errors as fixed because those URL’s are not available in my website and i removed those URL’s but the issue is one main landing page is getting 521 error code i was googled about this but i didn’t find a good solution about that and the big issue is my home page only crawled by google another pages not able to crawled even i have submitted sitemaps and using fetch us google please help me and check my website error details below and please command me a good solution about this or mail me please help me……..

     hammer-testing-training-in-chennai.php (521, 11/2/15)
     blog/?p=37 (500, 12/27/15)
     blog/?m=201504 (500, 12/7/15)
     userfiles/zyn2593-reys-kar-1623-moskva-rodos-ros6764.xml (521, 11/2/15)
     userfiles/cez3214-aviabileti-kompanii-aer-astana-myv9933.xml (521, 11/3/15)
     userfiles/wyz5836-bileti-saratov-simferopol-tsena-gif9086.xml (521, 11/3/15)

    Reply
    • Eoghan Henn
      15. February 2016

      Hi Arun,

      First of all I would like to say that I am very sorry about my late reply. I have been very busy lately and didn’t find time to reply to any comments on here.

      You did the right thing marking the errors as fixed and waiting to see if they occur again. 5xx errors in particular are normally temporary. Did any of these show up again?

      The other problem about important pages not being indexed is probably not related to the crawl errors problem. I am not able to determine the cause of this problem without further research, but I did find one very important problem with your website that you need to solve if you want your pages to be indexed properly:

      In your main navigation, some important pages are not linked to directly, but through URLs that have a 302 redirect to the target. Example:

      /hammer-testing-training-in-chennai.php is linked in the main navigation as /index.php?id=253.

      /index.php?id=253 redirects to /hammer-testing-training-in-chennai.php with a 302 status code. I am not surprised that Google will not index either of the two in this case. You should make sure that you always link directly to the target URL, and you should absolutely avoid redirects in internal links. And, in general, there are very few cases where a 302 redirect is needed. Normally you will need a 301 redirect if you have to redirect a URL.

      I am not sure if this is going to solve all of your problems, but fixing your internal links is definitely an important item on your to-do list. Please let me know if you have any other questions.

      Reply
  24. Greg
    30. October 2015

    I have launched a new site and for some reasons I am getting an error 500 for a number of urls in webmaster tools, including the sitemap itself, When I check my logs for access to the sitemap example it shows google has accessed the sitemap and no errors where returned:

    66.249.64.210 – – [29/Oct/2015:02:19:31 +0000] “GET /sitemap.php HTTP/1.1” – 10322 “-” “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”

    Also if I access any of these urls they appear perfectly fine.

    Thanks

    Reply
    • Eoghan Henn
      1. November 2015

      Hello Greg, this looks like a temporary problem to me. What I would suggest is to mark the corresponding errors as fixed in the crawl error report and see if they show up again. If they do not show up again, everything is fine.

      Reply
  25. Artin
    18. September 2015

    Hi
    Good post! I get alo of 500 errors in GWT because of i have disabled my feeds! what should i do with that?
    I have disabled feeds because other websites steal my contents!
    Can you help me?
    Thanks

    Reply
    • Eoghan Henn
      22. September 2015

      Hello Artin, I am not quite sure if I understand your problem correctly. Which URLs return 500 errors? The URLs of your feeds? Are you still linking to them? If so, you should definitely remove the links. Also, you can check if it is possible to return a 404 instead of a 500 for your feed URLs. This would be a better signal for Google. It might even be a good idea to 301 redirect the URLs of your feeds to pages on your website, if you find a good match for every feed URL. If you explain your problem in more detail I will be happy to help.

      Reply
      • Artin
        17. October 2015

        Hey Eoghan
        Thanks for your reply! i found a plugin named ”disable feeds” and it redirects all feeds to homepage, so i got rid of those 500 errors! and plus it dosn’t let those spammy sites to steal my contents.

      • Eoghan Henn
        19. October 2015

        Hello Artin, thanks a lot for sharing the info about the plugin. Sounds useful!

  26. Ossai Precious
    14. September 2015

    Good post! I have a similar problem and just don’t know how to tackle it. Google search console shows that 69 pages have errors and I discovered that the 404 errors come up whenever a ‘/’ is added after the URL.

    Reply
    • Eoghan Henn
      15. September 2015

      Hello Ossai,

      Google only crawls URLs that are linked somewhere, so you should first of all try to find the source of this problem. In Search Console, you can find information on where the URLs with errors are linked from. It is very likely that somewhere on your page or in your sitemap you link to those URLs with a trailing slash that return 404s. You should fix those links.

      The next thing you can do is make sure that all URLs that end with a slash are 301 redirected to the same URL without a trailing slash. You should only do this if all of your URLs work without trailing slash. It only requires one line in your htaccess file.

      If you have any other questions I will be happy to help.

      Reply
      • Chris
        8. December 2016

        I exactly have this issue right now. Can you explain the correct htaccess code?

      • Eoghan Henn
        14. December 2016

        Hi Chris,

        I am really not an expert on creating rewrite rules in htaccess files, so don’t rely on this, but this one works for me:

        RewriteRule ^(.*)/$ /$1 [R=301,L]

        Make sure you only use it for the URLs you want to use it for by setting a rewrite condition.
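
        For example, a condition like this limits the rule to requests that are not existing directories (again, just a sketch, please test it before using it):

        RewriteEngine On
        # only strip the trailing slash if the request is not an existing directory
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule ^(.*)/$ /$1 [R=301,L]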

        I hope this helps!

Leave a Reply