How to deal with crawl errors in Google Search Console (Google Webmaster Tools)

Last updated on Oct 20, 2016

Has this happened to you? You check the “Crawl Errors” report in Google Search Console (formerly known as Webmaster Tools) and you see so many crawl errors that you don’t know where to start. Loads of 404s, 500s, “Soft 404s”, 400s, and many more… Here’s how I deal with large numbers of crawl errors.

If you don’t find a solution to your problem in this article, feel free to leave me a comment at the bottom of this page. I normally reply within a couple of days.

Contents

Here’s an overview of what you will find in this article:

Don’t panic!
First, mark all crawl errors as fixed
Check your crawl errors report once a week
The classic 404 crawl error
404 errors caused by faulty links from other websites
404 errors caused by faulty internal links or sitemap entries
404 errors caused by Google crawling JavaScript and messing it up 😉
Mystery 404 errors
What are “Soft 404” errors?
What to do with 500 server errors?
Other crawl errors: 400, 503, etc.
List of all crawl errors I have encountered in “real life”
Crawl error peak after a relaunch
Summary

So let’s get started. First of all:

Don’t panic!

Crawl errors are usually impossible to avoid completely, and they don’t necessarily have an immediate negative effect on your SEO performance. Nevertheless, they are a problem you should tackle. Having only a small number of crawl errors in Search Console is a positive signal for Google, as it reflects good overall website health. Also, if the Google bot encounters fewer crawl errors on your site, users are less likely to run into broken pages and server errors.

First, mark all crawl errors as fixed

This may seem like a stupid piece of advice at first, but it will actually help you tackle your crawl errors in a more structured way. When you first look at your crawl errors report, you might see hundreds or even thousands of crawl errors from way back when. It will be very hard for you to find your way through these long lists of errors.

[Screenshot: lots of crawl errors in Google Search Console]

Does this screenshot make you feel better? I bet you’re better off than these guys 😉

My approach is to mark everything as fixed and then start from scratch: irrelevant crawl errors will not show up again, and the ones that really need fixing will soon be back in your report. So, after you have cleaned up your report, here is how to proceed:

Check your crawl errors report once a week

Pick a fixed day every week and go to your crawl errors report. Now you will find a manageable amount of crawl errors. As they weren’t there the week before, you will know that they have recently been encountered by the Google bot. Here’s how to deal with what you find in your crawl errors report once a week:

The classic 404 crawl error

This is probably the most common crawl error across websites and also the easiest to fix. For every 404 error the Google bot encounters, Google lets you know where it is linked from: Another website, another URL on your website, or your sitemaps. Just click on a crawl error in the report and a lightbox like this will open:

[Screenshot: the “linked from” details for a crawl error]

Did you know that you can download a report with all of your crawl errors and where they are linked from? That way you don’t have to check every single crawl error manually. Check out this link to the Google API explorer. Most of the fields are already prefilled, so all you have to do is add your website URL (the exact URL of the Search Console property you are dealing with) and hit “Authorize and execute”. Let me know if you have any questions about this!
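
If you prefer a script over the API explorer, the same export can be done with the google-api-python-client library and the Webmasters API v3 (the API behind the explorer link above). Treat the following as a rough sketch: the OAuth setup, file names and response field names are assumptions from my own use of the API, so verify them against the documentation and the response you actually get.

```python
# Sketch: export "not found" (404) crawl error samples via the Webmasters API v3.
# Assumes an OAuth client file (client_secret.json) created in the Google API Console.
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ["https://www.googleapis.com/auth/webmasters"]
SITE_URL = "https://www.example.com/"  # exact URL of your Search Console property

flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
service = build("webmasters", "v3", credentials=flow.run_local_server(port=0))

response = service.urlcrawlerrorssamples().list(
    siteUrl=SITE_URL, category="notFound", platform="web"
).execute()

for sample in response.get("urlCrawlErrorSample", []):
    # urlDetails should contain the "linked from" URLs and sitemaps, where available
    print(sample.get("pageUrl"), sample.get("urlDetails", {}).get("linkedFromUrls", []))

# Once an error is dealt with, it can also be marked as fixed via
# service.urlcrawlerrorssamples().markAsFixed(siteUrl=..., url=..., category=..., platform=...)
```

The same call should also accept the other error categories (for example serverError or soft404), so you can reuse the script for the reports discussed below.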

Now let’s see what you can do about different types of 404 errors.

If the broken URL is linked to from another website, you should simply implement a 301 redirect from the broken URL to a correct target. You might be able to reach out to the webmaster of the linking page and ask for an adjustment, but in most cases it will not be worth the effort.

If the broken URL that caused the 404 error for the Google bot is linked from one of your own pages or from a sitemap, you should fix the link or the sitemap entry. In this case, it is also a good idea to 301 redirect the 404 URL to the correct destination, so that it disappears from the Google index and passes on any link equity it might have.
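
If you end up with a long list of broken URLs and their correct targets, writing the redirect rules by hand quickly gets tedious. Here is a minimal sketch that generates Apache “Redirect 301” lines from a CSV file; it assumes an Apache setup where you can edit your .htaccess file, and a hypothetical redirects.csv with one “old path, new URL” pair per line.

```python
# Sketch: turn a CSV of "old path,new URL" pairs into Apache "Redirect 301" directives
# that can be pasted into an .htaccess file. Both file names are placeholders.
import csv

with open("redirects.csv", newline="") as source, open("redirects.txt", "w") as output:
    for old_path, new_url in csv.reader(source):
        # e.g. "Redirect 301 /old-page/ https://www.example.com/new-page/"
        output.write(f"Redirect 301 {old_path.strip()} {new_url.strip()}\n")
```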

404 errors caused by Google crawling JavaScript and messing it up 😉

Sometimes you will run into weird 404 errors that, according to Google Search Console, several or all of your pages link to. When you search for the links in the source code, you will find they are actually relative URLs that are included in scripts like this one (just a random example I’ve seen in one of my Google Search Console properties):

[Screenshot: a script containing relative URLs that Google crawls]

According to Google, this is not a problem at all and this type of 404 error can simply be ignored. Read point 3 of this post by Google’s John Mueller for more information (the rest of it is very helpful, too).

I am currently trying to find a more satisfying solution than just ignoring this type of error. I will update this post if I come up with anything.

Mystery 404 errors

In some cases, the source of the link remains a mystery. I get the impression that the data Google provides in the crawl error reports is not always 100% reliable. For example, I have often seen URLs listed as sources of links to 404 pages that no longer existed themselves. In such cases, you can still set up a 301 redirect for the broken URL.

Remember to always mark all 404 crawl errors that you have taken care of as fixed in your crawl error report. If there are 404 crawl errors that you don’t know what to do about, you can still mark them as fixed and collect them in a “mystery list”. Should they keep showing up again, you know you will have to dig deeper into the problem. If they don’t show up again, all the better.

Let’s have a look at the strange species of “Soft 404 errors” now.

What are “Soft 404” errors?

This is something Google invented, isn’t it? At least I’ve never heard of “Soft 404” errors anywhere else. A “Soft 404” error is an empty (or error-like) page that the Google bot encountered that returned a 200 status code instead of a 404.

So it’s basically a page that Google THINKS should be a 404 page, but that isn’t. In 2014, webmasters started getting “Soft 404” errors for some of their actual content pages. This is Google’s way of letting us know that we have “thin content” on our pages.

Dealing with “Soft 404” errors is just as straightforward as dealing with normal 404 errors:

  • If the URL of the “Soft 404” error is not supposed to exist, 301 redirect it to an existing page. Also make sure you fix the underlying problem of non-existent URLs not returning a proper 404 status code (see the sketch after this list).
  • If the URL of the “Soft 404” page is one of your actual content pages, this means that Google sees it as “thin content”. In this case, make sure that you add valuable content to the page.
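
Here is the sketch mentioned in the first point: a quick way to spot-check whether removed URLs really return an error code, using Python’s requests library. The URL list is just an example; in practice you would paste in the “Soft 404” URLs exported from Search Console.

```python
# Sketch: spot-check which "removed" URLs still answer with 200 instead of a proper 404/410.
import requests

urls = [
    "https://www.example.com/deleted-page/",
    "https://www.example.com/old-product/",
]

for url in urls:
    # Don't follow redirects, so a correct 301 shows up as 301 and not as its target's status
    status = requests.get(url, allow_redirects=False, timeout=10).status_code
    if status == 200:
        print(f"Possible soft 404: {url} still returns 200")
    else:
        print(f"{url} -> {status}")
```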

After working through your “Soft 404” errors, remember to mark them all as fixed. Next, let’s have a look at the fierce species of 500 server errors.

What to do with 500 server errors?

500 server errors are probably the only type of crawl error you should be slightly worried about. If the Google bot regularly encounters server errors on your site, this is a very strong signal for Google that something is wrong with it, and it will eventually result in worse rankings.

This type of crawl error can show up for various reasons. Sometimes it might be a certain subdomain, directory or file extension that causes your server to return a 500 status code instead of a page. Your website developer will be able to fix this if you send him or her a list of recent 500 server errors from Google’s Webmaster Tools.

Sometimes 500 server errors show up in Google’s Search Console due to a temporary problem. The server might have been down for a while due to maintenance, overload, or force majeure. This is normally something you will be able to find out by checking your log files and speaking to your developer and website host. In a case like this you should try to make sure that such a problem doesn’t occur again in future.
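
If you want to check the log files yourself, a few lines of Python are enough to count the 5xx responses that were actually served to the Google bot. This is only a sketch: it assumes an Apache/nginx-style combined log format and a hypothetical access.log path, so adjust the pattern to whatever your server writes.

```python
# Sketch: count 5xx responses served to Googlebot in a combined-format access log.
# "access.log" is a placeholder path; adjust it (and the regex) to your server's log format.
import re
from collections import Counter

# combined format: ... "GET /path HTTP/1.1" 500 1234 "referer" "user agent"
pattern = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"\s*$')

errors = Counter()
with open("access.log") as log:
    for line in log:
        match = pattern.search(line)
        if match and match.group("status").startswith("5") and "Googlebot" in match.group("agent"):
            errors[(match.group("path"), match.group("status"))] += 1

for (path, status), count in errors.most_common(20):
    print(count, status, path)
```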

Pay attention to the server errors that show up in your Google Webmaster Tools and try to limit their occurrence as much as possible. The Google bot should always be able to access your pages without any technical barriers.

Let’s have a look at some other crawl errors you might stumble upon in your Google Webmaster Tools.

Other crawl errors: 400, 503, etc.

We have dealt with the most important and common crawl errors in this article: 404, “Soft 404” and 500. Once in a while, you might find other types of crawl errors, like 400, 503, “Access denied”, “Faulty redirects” (for smartphones), and so on.

In many cases, Google provides some explanations and ideas on how to deal with the different types of errors.

In general, it is a good idea to deal with every type of crawl error you find and to try to prevent it from showing up again in future. The fewer crawl errors the Google bot encounters, the more Google trusts your site’s health. Websites that constantly cause crawl errors are assumed to provide a poor user experience as well, and are likely to be ranked lower than healthy websites.

You will find more information about different types of crawl errors in the next part of this article:

List of all crawl errors I have encountered in “real life”

I thought it might be interesting to include a list of all of the types of crawl errors I have actually seen in Google Search Console properties I have worked on. I don’t have much info on all of them (except for the ones discussed above), but here we go:

Server error (500)
In this report, Google lists URLs that returned a 500 error when the Google bot attempted to crawl the page. See above for more details.

Soft 404
These are URLs that returned a 200 status code, but that, according to Google, should be returning a 404 error. I suggested some solutions for this above.

Access denied (403)
Here, Google lists all URLs that returned a 403 error when the Google bot attempted to crawl them. Make sure you don’t link to URLs that require authentication. You can ignore “Access denied” errors for pages that you have blocked in your robots.txt file because you don’t want Google to access them. It might be a good idea, though, to use nofollow links when you link to these pages, so that Google doesn’t attempt to crawl them again and again.

Not found (404 / 410)
“Not found” is the classic 404 error that has been discussed above. Read the comments for some interesting information about 404 and 410 errors.

Not followed (301)
The “not followed” error refers to URLs that redirect to another URL but whose redirect fails to work, typically because of redirect loops, redirect chains that are too long, or redirects to invalid URLs. Fix these redirects!

Other (400 / 405 / 406)
Here, Google groups everything it doesn’t have a name for: I have seen 400, 405 and 406 errors in this report and Google says it couldn’t crawl the URLs “due to an undetermined issue”. I suggest you treat these errors just like you would treat normal 404 errors.

Flash content (Smartphone)
This report simply lists pages with a lot of Flash content that won’t work on most smartphones. Get rid of Flash!

Blocked (Smartphone)
This error refers to pages that could be accessed by the Google bot, but were blocked for the mobile Google bot in your robots.txt file. Make sure you let all of Google’s bots access the content you want indexed!
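
For the “Access denied” and “Blocked” cases above, you can quickly check whether a URL is blocked for the Google bot by your robots.txt file with Python’s standard library. The URLs below are placeholders.

```python
# Sketch: check whether URLs are disallowed for Googlebot by your robots.txt.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # placeholder domain
rp.read()

for url in ["https://www.example.com/intern/", "https://www.example.com/some-page/"]:
    blocked = not rp.can_fetch("Googlebot", url)
    print(url, "-> blocked by robots.txt" if blocked else "-> allowed")
```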

Please let me know if you have any questions or additional information about the crawl errors listed above or other types of crawl errors.

Crawl error peak after a relaunch

You can expect a peak in your crawl errors after a website relaunch. Even if you have done everything in your power to prepare your relaunch from an SEO perspective, it is very likely that the Google bot will encounter a large number of 404 errors after the relaunch.

If the number of crawl errors in your Google Webmaster Tools rises after a relaunch, there is no need to panic. Just follow the steps that have been explained above and try to fix as many crawl errors as possible in the weeks following the relaunch.
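
Before marking a big batch of relaunch errors as fixed, it can help to re-check the exported error URLs in bulk and confirm that your redirects now respond correctly. Here is a small sketch using the requests library, assuming a hypothetical errors.txt file with one URL per line:

```python
# Sketch: re-check a list of error URLs and print the redirect chain plus the final status.
import requests

with open("errors.txt") as f:  # placeholder file: one URL per line
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        response = requests.get(url, timeout=10)  # follows redirects by default
    except requests.RequestException as exc:
        print(f"{url} -> request failed ({exc})")
        continue
    chain = " -> ".join(str(r.status_code) for r in response.history) or "no redirect"
    print(f"{url} [{chain}] final: {response.status_code} at {response.url}")
```

Each line of the output shows the redirect chain (if any) and the final status code, which makes it easy to spot remaining 404s and redirect loops.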

Summary

  • Mark all crawl errors as fixed.
  • Go back to your report once a week.
  • Fix 404 errors by redirecting false URLs or changing your internal links and sitemap entries.
  • Try to avoid server errors and ask your developer and server host for help.
  • Deal with the other types of errors and use Google’s resources for help.
  • Expect a peak in your crawl errors after a relaunch.

If you have any additional ideas on how to deal with crawl errors in Google Webmaster Tools, I would be grateful for your comments.

165 Comments

  1. Yatish
    19. June 2017

    Hi Eoghan (cool name, mate), thanks for this post – very insightful.

    So I am working on a website after we re-launched in April 2017. The dilemma I am having is that we set up around 1137 “Redirect 301 /old-url https://www.website.com/new-url/” rules, but I have a hunch that lots of them are now also resulting in 404s.

    What is a convenient/accurate way to see this list of redirects? I tried using http://www.redirect-checker.org/bulk-redirect-checker.php to see that the full old URLs are in fact redirecting fine – but they still result in 404s.

    Please assist/advise as you can? Would be much appreciated.

    Thanks
    Yatish

    Reply
    • Eoghan Henn
      22. June 2017

      Hello Yatish,

      Thanks a lot for your comment. I normally use Screaming Frog to check lists of URLs. It has a list mode where you can just paste all of your URLs and it then requests them one by one and gives you back all the relevant data, including status codes. The tool you linked to is probably fine to just check the status codes, even if you can only check 100 URLs at a time.

      I’m not sure if I understood your question too well. What did the result the tool gave you look like? Did the URLs you pasted result in 404 errors or not? If you send me some more detailed information (here or by e-mail), I can have a closer look at it.

      Best regards,

      Eoghan

      Reply
  2. Duncan
    18. June 2017

    I feel one reason why I see so many 404s is due to someone scraping my site and hosting it somewhere. I deleted thousands and thousands of pages over 6 years ago and Google still tells me I have a 404 for one of those URLs.

    I changed from HTML to WP, then to SSL. I screwed up and pushed over 20K pages to a wrong directory and deleted my error months ago. GWM still tells me I have all these 404s from this error. It seems I can never win with GWM tools.

    Google panda destroyed my business model, but that’s life.

    Does anyone offer paid consulting to get my domain semi error free?

    Reply
    • Eoghan Henn
      19. June 2017

      Hi Duncan,

      Thanks for sharing your experiences. I’m sorry to hear that you’ve had such bad luck with crawl errors. Hopefully another reader can help you with some consulting.

      Good luck and let me know if there’s anything I can do for you.

      Reply
  3. Tobi
    12. June 2017

    Hi Eoghan,
    One of my sites keeps piling up URL errors (hundreds a day).
    All the pages displayed have never been generated on the site.

    Unfortunately, and I don’t know why, the error origin is not displayed in Search Console, so I cannot trace where those 404s are being generated from.
    Here’s an example: https://www.screencast.com/t/EYxfGXiIny

    What do you recommend I should do, and does that affect the site’s ranking (which is gradually decreasing)?

    Reply
    • Eoghan Henn
      14. June 2017

      Hi Tobi,

      That sounds strange, especially if we are talking about hundreds of URLs every day. Don’t any of them have any info in the “linked from” tab? Has this problem just started recently or has it been like this for a long time?

      I see you set up a 301 redirect for the example URL in the screenshot. Did you do this for more URLs? Did you mark them as fixed? Did it stop the URLs from showing up again?

      Maybe, if you send me some more example URLs, I can get an idea of what might be going on. I wouldn’t worry about the site’s ranking too much right now, but this is definitely something you should try to figure out and fix. I’ll help you with it as well as I can, if you give me some more info 🙂

      Eoghan

      Reply
      • Tobi
        15. June 2017

        Hi Eoghan,
        Thanks a lot for your quick and detailed answer, and more importantly, your offer to help out. This means a lot to me,

        Yes, crawl errors keep piling up rapidly https://www.screencast.com/t/qNgKnbWWR
        It is an issue we had a few weeks back; we initially set up 301 redirects and marked them all fixed. I’m not entirely sure now that doing so was a smart move.

        Since then, errors have kept coming back, though I’m not sure these are the same URLs.
        Here are a bunch of those new URLs:

        https://www.al-ram.net/i22598
        https://www.al-ram.net/pli-dramatist-tact-nel/236/12720
        https://www.al-ram.net/complicated/lenelenari/louis+r%E3%83%9D%E3%82%A4%E3%83%B3%E3%83%88%EF%BC%91%EF%BC%90%E5%80%8D%E3%82%AD%E3%83%A3%E3%83%B3%E3%83%9A%E3%83%BC%E3%83%B3%E2%99%AA%E3%83%AB%E3%82%A4%E3%83%B4%E3%82%A3%E3%83%88%E3%83%B3

        All of which don’t show the origin of the error:
        https://www.screencast.com/t/TOhydUah6xd7

        Thanks again for your help
        Tobi

        Reply
        • Eoghan Henn
          16. June 2017

          Hello Tobi,

          Thanks for the additional information. This is indeed a mysterious case. I tried to find out where Googlebot might have found these URLs, but I couldn’t find any hints or traces anywhere. I searched for quite a while, but there’s really nothing 🙁

          The domain is quite young (about 2 or 3 years) and there has never been another website on the same domain, right? Does the company own any other old domains that it redirected to this domain?

          Setting up 301 redirects for the error URLs is probably a good idea, because this way there’s a chance that Google will stop crawling them eventually and the errors should stop showing up in your reports immediately.

          There are two possible scenarios why these URLs might be crawled right now:

          1. The URLs that are being crawled right now have been in some scheduling bucket for Google’s crawler (for whatever reason) and are being crawled right now. Setting up 301 redirects for all error URLs will help, because the errors won’t show up again and Google will probably stop crawling the URLs once it realises that the 301 redirect is there to stay.
          2. The URLs are not old, but are constantly being generated right now at a source that we haven’t been able to identify yet. In this case, we should really try to find out where the URLs are coming from and find a way to stop them.

          I suggest that for now, you continue setting up 301 redirects for all new errors that show up and keep an eye on the number of new errors. If they go down, that’s fine, and if not, let me know and we can have an even closer look at the problem to try and identify where these URLs are being generated.

          Also, try to check all of the errors for “linked from” information. This could be very valuable for solving the problem. Did you know that you can export all errors at once via the API? Here’s a link you can use to try it out: Google APIs explorer. That way you don’t have to check every link manually.

          I hope this helps for now! I’m sorry I haven’t been able to fully solve the problem yet.

          Let me know how it goes!

          Eoghan

          Reply
          • Tobi
            18. June 2017

            Thank you so much once again!
            I have exported all errors, yet none show the link’s origin (I have checked that this field is included in the report).

            Will go on and 301 those new errors, which are now over 2000, and rising at 200+ a day.

            Waiting now for the company’s answer about their domains,
            will update you later on this week

            Thanks again for taking the time to try to resolve this; your devotion is beyond any expectations I had.

            Reply
            • Eoghan Henn
              19. June 2017

              Hi Tobi,

              OK, let me know how it goes!

              Eoghan

  4. Larry Spencer
    3. June 2017

    This article cleared up a lot for me and enabled me to clean up my site errors. I had always wondered if having the errors affected my SEO and if cleaning them up would help.

    Reply
    • Eoghan Henn
      9. June 2017

      Thanks, Larry!

      Reply
  5. Dirt E. Harry
    31. May 2017

    Hi Eoghan,

    I have been working on the 404s that show up in the Google crawl errors at least once a week and I notice that several keep showing up. These are all URLs that have to do with products that have been deleted not only from the website, but also from the sitemap. I mark them as fixed and some keep coming back. About a month ago, I set up an Excel worksheet so I could keep track. One in particular has shown up 4 times in the last month – I know it is totally gone from the site and the sitemap. It also does not exist on any of the internal pages that Google says it does.

    If the URL is showing up in an external URL, I create a 301 and that usually takes care of it, although I created a 301 on a particular URL one day and it showed up again 2 days later… is there a lag time where Google finally sees the 301?

    I was told that Google doesn’t like a lot of 301 redirects… is there a limit on how many redirects a site should have? Should I go in and remove some of the old redirects or is it okay to let them ride?

    Dirt E. Harry

    Reply
    • Eoghan Henn
      9. June 2017

      Hello again 🙂

      Some 404 errors will keep showing up again and again, even if they are not linked internally or externally any more. I need to update the article to make this clear.

      When you set up a 301 redirect, the URL should not show up as a 404 error again. The only explanation I can think of for the error that showed up two days after you set up a 301 redirect is that the URL was crawled shortly before you set up the redirect but was only included in the report a few days later. Most of the Search Console reports are not updated in real time.

      There is no limit to the number of redirects a site should use, but it is important not to abuse the feature either. When you set up a redirect it should be justified, which means that the content of the old URL should be found on the new URL (or at least a similar version). It is not a good idea, for example, to redirect loads of old URLs to the home page.

      One problem that can arise with lots of redirects is that your htaccess file simply gets too big, which has a negative impact on the load time of your pages.

      If you want to delete old redirects, make sure they are not needed anymore, meaning the redirected URLs are not accessed by users or bots any more.

      I hope this helps!

      Reply
    • Dirt E. Harry
      12. June 2017

      Hi Eoghan,

      I really appreciate your answer… makes total sense.

      In regards to the htaccess file, I have about 475 redirects to date. My site is not yuuuuge nor is it a dinky site either- it weighs about 807MB

      Airguns are as bad as automobiles- the manufacturers are constantly coming up with new models and discontinuing old models. One thing I thought about doing is turning off the Availability, Featured, On-Sale switches and the Add to Cart button on discontinued items and adding the term “Unfortunately this Model is now Unavailable” to the item description.

      If there is a bookmark or an external site link to the item, the end searcher will know exactly what page to go to to find the latest/greatest heart throb (if they know anything about airguns) and the bot won’t know the difference.

      The result would be tons of time saved deleting the item(s) from the site, sitemap and creating re-directs and no more 404s- not to mention simply turning switches back on if the item is resurrected and becomes available again… what do you think?

      Dirt E. Harry
      President and CEO
      http://topairgun.com

      Reply
      • Eoghan Henn
        14. June 2017

        Yes, that sounds like a good plan! Maybe think about adding some useful links to similar products to the pages of products that are currently unavailable. Here’s an example of a product page for a discontinued product that has a link to the newer versions of the product: https://www.ecom-ex.com/products/archive/communication/ex-handy-07/

        Reply
        • Dirt E. Harry
          14. June 2017

          Thanks Eoghan!

          You have just saved me another Yuuuuge chunk of time! Love it when a plan comes together,

          Dirt E. Harry
          President and CEO
          http://topairgun.com

          Reply
  6. Indrani Sen
    26. May 2017

    I am getting 4 internal server errors (500) in my Search Console. It is with some attachment. When I click on the link, the page opens properly, but when I do Fetch as Google, it says unreachable. Please help me with this problem.

    Reply
    • Eoghan Henn
      29. May 2017

      Hello Indrani,

      Thanks a lot for your comment. I would need some more info to help you with this one. If you like you can send me an e-mail and I can have a look.

      Best regards,

      Eoghan

      Reply
      • mehrnoosh
        7. June 2017

        Hi,
        I have a problem with crawl errors in Google Webmaster Tools. I removed the errors, but 2 or 3 days after that I see a lot of errors in this list again, even though I didn’t move any pages.

        please help me

        Reply
        • Eoghan Henn
          9. June 2017

          Hi Mehrnoosh,

          I will need some more info to help you with this. I will send you an e-mail.

          Reply
  7. Shailesh Chaudhary
    21. May 2017

    Hi Henn,
    Awesome, dude… This article helped me remove all my crawl errors from Webmaster Console.
    Thanks

    Reply
    • Eoghan Henn
      24. May 2017

      Hi Shailesh,

      I’m happy to hear it was helpful to you!

      Eoghan

      Reply
  8. Richard
    20. May 2017

    Hi Eoghan,

    After reading through your informative post I wonder if you can advise me on something. I’ll put as much detail as I can.

    I recently had a major overhaul of my site, built with WordPress, and I’m getting a lot of odd 404 crawl errors. I did the following when relaunching the site:

    1) Changed permalink structure, setting up 301 redirects for each page, from old link to new.

    2) Once site was up and running, I then had SSL added.

    3) I then replaced all the 301 redirects again, to point from old links to new https versions of the pages.

    This was all done for the purpose of old backlinks dotted around the web and I also went through the site and manually updated all internal links to point to the correct https URL’s.

    I then submitted an XML sitemap to GSC. There wasn’t one previously as I had not done this with my old site.

    However now, as Google starts to crawl the site, I am getting loads of 404 errors for what look like the old links, but they have https at the start.

    1) Essentially I had old URL’s like this:
    http://www.example.com/blog/2015/05/10/postname

    2) Now I have new URL’s that looks like this:
    https://www.example.com/postname

    3) GSC is giving me 404 errors for a hybrid of the old and new URL’s, showing old linking structure but with SSL, like this:
    https://www.example.com/blog/2015/05/10/postname

    I should point out, if I manually type in one of my old URL’s it does direct to the new page and, if I do a Google search and click on an old link, it also redirects to the new page. So I know my 301 redirects are all good.

    So these third hybrid URL’s that Google can’t find in GSC, technically never existed as I never had SSL with my old site permalink structure. I also double checked my sitemap that I submitted and that only contains the correct, new URL’s, it doesn’t have any incorrect ones in there.

    My question is: should I just ignore these errors for the hybrid links, as they have never been real pages? If I shouldn’t ignore them, how do I remove them? Setting up 301 redirects seems counterintuitive, as I’d be redirecting from pages that never existed, so nobody would ever find a link to these hybrid URLs ‘out in the wild’.

    I hope that all makes sense :/

    Thanks in advance for any help!

    Reply
    • Eoghan Henn
      24. May 2017

      Hello Richard,

      Thanks a lot for sharing this interesting case. Everything you explain makes total sense, so it looks like you did a very good job with setting up the redirects.

      Still, it is slightly worrying that Googlebot is accessing your old URLs with https although there shouldn’t be a redirect pointing to them. Have you checked the “linked from” tab in the crawl error reports? You can find it by clicking on each URL in the report. The reason why I’m worried about the errors is that if Google is crawling “wrong” new URLs, they might not be crawling the right ones at all, which would be a problem. This is why it would definitely be a good idea to find out how Google found the wrong URLs.

      If you like, you can send me an e-mail with your domain name and some examples of URL errors so I can have a closer look.

      Best regards,

      Eoghan

      Reply
  9. Faniso
    17. May 2017

    Geez, thanks hey. I was really starting to lose my marbles with the returning 404 problems. Best thing really is to redirect. Google doesn’t seem to know what’s still on and what’s not, so rather than keep trying to contact webmasters, etc. – REDIRECT.

    Thanks again,

    Faniso

    Reply
    • Eoghan Henn
      18. May 2017

      I’m happy to hear it helped. And thanks for the update!

      Reply
  10. Paul Clukey
    12. May 2017

    Hi Eoghan,
    Thanks for putting this post together. Have you ever seen a bunch of 404 errors that end in an /@ sign after the permalink? I am wondering if this is an error that originated on a social media platform like twitter or instagram.

    Here is how it looks in the crawl errors: http://example.com/get-404-results/@example
    In this example, the permalink minus the @example works perfectly to a live page. I have like 400 of these.

    I did recently move the site to https from http but I don’t think that has anything to do with it.

    I’d love to hear your thoughts.
    Thanks,
    Paul

    Reply
    • Eoghan Henn
      16. May 2017

      Hi Paul,

      Thanks a lot for this interesting question. I have never seen a case like this, but I could imagine it happening if, for example, you tried to link to a Twitter profile on all of your pages and entered href=”@example” instead of href=”https://www.twitter.com/example” into the link tag. Crawlers or browsers might interpret this as a relative link and add the URL of the current page to the URL path. This would then cause lots of crawl errors just like you described.

      Is the “@example” part the same in all URLs? Have you checked the “linked from” information in the crawl error reports to see where the false URLs are linked from? If they are linked internally, have you checked your source code for “@example”?

      I hope this helps. If you like, you can send me some more info on this and I will have a closer look.

      Reply
  11. Shahab
    9. May 2017

    Hi Eoghan 🙂

    Can you please help me figure this out?
    Here we go! I had an HTML website years ago. I had added the site to Google Webmaster Tools (Search Console now). Then, several months ago, I removed all pages from my website and have now installed a WordPress site. The new XML sitemap that I have added to Search Console is generated by Jetpack (by WordPress). And yet I still get 404 URL errors for the old HTM or HTML files! How is it even possible that the newly generated XML sitemap, which only recognizes WordPress URLs, detects old deleted HTM links that haven’t been there for almost 4 months?

    Is it because Google itself still stores all those links, or because of my generated XML file? How can we delete all missing URLs from Google for good? And if a 301 is the answer, can you please tell me how we can redirect *.htm and *.html to new pages of the website?

    Sorry for the long comment!
    Thanks,

    Reply
    • Eoghan Henn
      10. May 2017

      Hello Shahab,

      your old pages are probably still being crawled because Google still has them on the list of URLs they have discovered on your website and that they have scheduled for crawling. They won’t crawl them as often as the pages that are working, but they will check every now and again to see if they still return an error or if they are back up.

      In order to get rid of the crawl errors in your GSC reports, you should indeed redirect all of the old URLs. I guess the easiest way to set this up would be using a WordPress plugin like this one (I’ve never tested it, but it looks decent). Alternatively, if you’re familiar with editing your .htaccess file, you can choose this option. The Yoast SEO plugin has a handy .htaccess file editor, but you’d have to write your rewrite rules yourself (or find another tool that generates them for you).

      If you have versions of URLs ending in .htm and .html, you can add each one of them separately and define a redirect target that contains content similar to the content on the old page.

      Please let me know if you have any further questions.

      Reply
      • Shahab
        10. May 2017

        Hi again,

        Thanks a lot for your detailed reply. Really appreciated it 🙂

        “They won’t crawl them as often as the pages that are working, but they will check every now and again to see if they still return an error or if they are back up.”

        How long would it take for Google to let go of the broken links? And how come they aren’t deleted yet? Is there a timeframe? Can’t we tell Google that these pages are removed or personally delete them from GSC?

        And if I want to do a 301 redirect could I do that for all htm files? Cause I don’t have an exact list of links. Could I use *.htm?

        And finally what would happen if I don’t do anything? Bad for SEO?

        Thanks a lot,

        Reply
        • Eoghan Henn
          11. May 2017

          Hello Shahab,

          Thanks a lot for your interesting follow-up questions.

          I don’t know when and if Google stops crawling URLs that return a 404 error. If they are linked from somewhere, they will definitely get crawled again and again. If not, you’d expect them to stop at some stage, but I’ve seen old URLs being crawled for many years after the pages stopped existing.

          Yes, you can “tell” Google that you removed the URLs intentionally: A 410 status code instead of a 404 status code would do the job. With a 410 status code, it is more likely that Google stops crawling the URLs, although it’s not guaranteed. And one thing that can be confusing about this is that the pages will still show up as 404 errors in Google Search Console.

          You cannot “delete” URLs from the crawling schedule yourself in GSC. There is a function to request removal of a URL from the index, but this is only temporary (for 90 days) and it’s not the same as preventing it from being crawled.

          You could probably block URLs via your robots.txt file, if you really don’t want them crawled, but I don’t think that’s a useful measure here.

          And yes, you can create a redirect rule in your htaccess file that redirects all URLs that end in “.htm” to a certain target, but this isn’t really advisable either. What I would recommend here is to export all 404 errors from GSC, mark them as fixed, and set up a redirect for each error URL to an equivalent target. I guess in most cases you’ll still be able to figure out what content was on the old page and find a fitting redirect target.

          If you don’t do anything, this will probably not do you much harm, but you might lose some potential. If, for example, some of your old pages have backlinks from other websites, you should definitely redirect them to targets that have similar content.

          I hope this helps! Let me know if you have any other questions.

          Reply
          • Shahab
            12. May 2017

            Thanks a lot for your great answer 🙂

            Just some final questions:
            What is the correct redirect for *.htm? Just put it there or something else?

            Maybe I have to use it, because I saw that old htm urls are about 146. And it is probably a little time consuming. I would like to do them individually of course.

            Thanks a lot again 🙂

            Reply
  12. Dirt E. Harry
    6. May 2017

    Hi Eoghan,

    I added a new product to my site yesterday (5/5/17), added it to the sitemap and published. Today (5/6/17) I embedded a video from YouTube in the same product. Do I need to go in and update the product date in the sitemap and republish because of the added HTML code? I know I have to update if I change more than a couple of words in the text or content of a given product.

    Dirt E. Harry

    Reply
    • Eoghan Henn
      8. May 2017

      Well, if you really want everything to be 100% correct in your XML sitemap, the “lastmod” value should probably accurately reflect the last time the page was changed (no matter how small the change was). But if you don’t keep the “lastmod” value up to date, it’s not a problem, because Google and other search engines don’t really pay attention to it (which also means that keeping it up to date won’t really help you in any way).

      Here’s some info on how Google deals with it: https://www.seroundtable.com/google-lastmod-xml-sitemap-20579.html

      XML sitemaps help Google discover your URLs, but whether or not and how often a page is crawled and re-crawled after it has been discovered depends on a number of factors and the “lastmod” value probably is one of the less important ones.

      I hope this helps! Please let me know if you have any further questions.

      Reply
  13. Charl
    2. May 2017

    Hi Eoghan

    I have come across your blog and it is very helpful.

    Our website has recently been hacked and our IT guys have been trying to clean it up for the last 2 months.

    I would really like to get a second opinion on what we need to do to clear the crawl errors.

    Would you be able to help?

    Thanks!

    Reply
    • Eoghan Henn
      5. May 2017

      Hi Charl,

      Sure, just share your questions here or send me an e-mail. I’ll be happy to give you my opinion.

      Eoghan

      Reply
  14. Dirt E. Harry
    1. May 2017

    Hi Eoghan,

    I must say that your Rebelytics.com comments and posts have been extremely interesting and informative – I have learned a great deal! Needless to say, your site is now bookmarked.

    The person that has been my SEO consultant for the last 7 or 8 years is discontinuing services at the end of this year (2017).

    Therefore, I am looking for a person that has the ability to:

    Write compelling content using key words and phrases that make sense to optimize my category pages.

    Monitor the sitemap crawl errors and fix them in a timely manner.

    Stay abreast with Google and make the necessary changes in the back office of the site to keep it up to date for optimum performance.

    Pro-actively suggest any additional site branding, design appearance and navigation ideas to increase customer purchasing and loyalty.

    Increase customer traffic and hit to sales ratios.

    If you or any of your commenters know of such a person please let me know,

    Dirt E. Harry

    Reply
    • Eoghan Henn
      2. May 2017

      Hi Dirt E. Harry,

      Thanks a lot for your comment. I’m glad my articles have been helpful. Right now, I don’t know of anyone who is available for SEO projects, but I’ll just leave this comment here for everyone to see. Good luck with your search! And just let me know if you have any further questions.

      Reply
    • Ben Lacey
      3. May 2017

      Hello Dirt E. Harry,
      I run a website agency where our main focus is SEO, website design and development. We are SEO certified by Google, meaning that we have passed all the exams for Google Analytics and Google AdWords with a score of over 80%.

      We are sorry to hear your SEO consultant has chosen to discontinue his services but we would be more than happy to help you out.

      Feel free to visit our website and get in contact to discuss things further – https://laceytechsolutions.co.uk

      Ben Lacey
      Managing Director

      Reply
      • Eoghan Henn
        5. May 2017

        I will just leave this here, but I want to make it very clear that there is no such thing as “SEO certified by Google”. Passing the exams for Google Analytics and Google AdWords is a nice achievement, but it does not prove any SEO skills whatsoever. Also, Google does not certify SEO practitioners in any way. Please be very careful with claims like this.

        Reply
  15. Sup
    24. April 2017

    Hi,

    My blog site is in WordPress. I am getting 404s for the links. These all come when I publish my posts on Facebook/Twitter, and the crawler shows facebook/twitter.com appended at the end.

    My site map looks clean.

    Can you please suggest how to fix it?

    Thanks,
    Sup

    Reply
    • Eoghan Henn
      26. April 2017

      Hi Sup,

      Could you send me an example of a URL with a 404 error (by mail if you like)? Do you use some kind of plugin to share your blog posts on Facebook and Twitter?

      I’ll be happy to help you, but I need some more information.

      Reply
  16. Raja
    22. April 2017

    Hi Eoghan Henn,

    Very helpful information, keep it up..

    I am getting errors in my Search Console for URLs that do not exist on my website or server. Actually, those URL patterns existed more than a year ago. I have asked my developer many times to check the server, but he says they are not there now. I have marked them as fixed several times, but those errors keep showing up again and again. Even though the website platform has changed to PHP, Google Search Console is still showing .aspx errors.

    Please give me some suggestion to solve these errors.

    Thanks in advance..

    Reply
    • Eoghan Henn
      24. April 2017

      Hi Raja,

      Thanks a lot for your comment. Have you tried to change the http status code that these old URLs give back? The best option would probably be to 301 redirect them to similar targets. If that’s not possible, you could change the 404 status code to a 410. This can help with getting the old URLs removed from the index quicker.

      Marking errors as fixed does not have any influence on Google’s crawling. It is just a useful tool for structuring your error reports better.

      Please let me know if you have any further questions.

      Reply
  17. Vadim
    19. April 2017

    Hi,

    Thank you for your article. We are a job listing website. Vacancies on our website expire after some time, so the robot finds a lot of “not found” pages due to natural expiration. In our case, how should we handle it?

    Reply
    • Eoghan Henn
      20. April 2017

      Hello Vadim,

      Thanks a lot for your comment. When you deliberately delete a page from your website, it is best to serve a 410 status code instead of a 404. The difference is that 410 actually means “content deleted” while 404 only means “content not found”. Important side note: Pages that serve a 410 status code currently still appear as 404 errors in Google Search Console. Don’t let this irritate you.

      In terms of user experience, think about what you want to display on the error page for expired job vacancies. I guess it would be more interesting to display a message about the vacancy having expired (including a call to action for further navigation on your website) rather than a standard “page not found” message.

      Here’s a good article about your options when you remove pages from your website (the very last part about collateral damage is WordPress-specific, so you can ignore it): https://yoast.com/deleting-pages-from-your-site/

      I hope this helps! Let me know if you have any further questions.

      Reply
      • Vadim
        20. April 2017

        Hi Eoghan,

        very helpful and complete information. Thank you so much!

        Reply
  18. Alessandro
    4. April 2017

    I am happy to write the 100th comment, for this very helpful page!
    I’ve tried to scroll through all 99 previous comments and perhaps I’ve missed it, but what about URLs with “not found” errors that have NO “linked from” tab? What does it mean? Where do they come from? A sitemap XML?

    Reply
    • Eoghan Henn
      10. April 2017

      Hello Alessandro! Thanks for writing the 100th comment 🙂 When a URL is included in an XML sitemap, this information also shows up in a tab next to the “linked from” tab. When there is no “linked from” tab, this simply means that there is no information available on the source of the link. This often happens with very old URLs that are re-crawled again and again. Sometimes, Google also makes up URLs itself, like (for example) m.domain.com or domain.com/mobile (just to check if there is a mobile version). These wouldn’t have a “linked from” tab either.

      Reply
  19. Annie Jones
    2. April 2017

    Hi Eoghan Henn,

    A page in the sitemap is already indexed. But now I found this for the page:
    HTTP Error: 403 – “When we tested a sample of the URLs from your Sitemap, we found that some URLs were not accessible to Googlebot due to an HTTP status error. All accessible URLs will still be submitted.”

    Could you please advise? Thank you.

    Reply
    • Eoghan Henn
      10. April 2017

      Hello Annie,

      in case of an error like this, I would recommend to crawl all URLs in your sitemap using a tool like Screaming Frog. If some of the URLs return errors, you should check if they should be removed from the sitemap (because they aren’t supposed to be in there) or if the pages need fixing (because they should be working but aren’t). If there aren’t any errors, you can resubmit the sitemap in Google Search Console and check the error report. Sometimes, this type of error is just temporary.

      Please let me know if you have any further questions.

      Reply
      • Annie Jones
        13. April 2017

        Thanks Rebel

        Actually, I tried Fetch as Google, but it returns “error” instead of “complete” as usual. I don’t know what to do next. I have tried to install Screaming Frog on my PC but it won’t run. Today this page disappeared when I search for the keyword it ranks for (previously on page 2, now I cannot see it in the top 200).

        Could you please help

        Thank you.

        Reply
        • Eoghan Henn
          18. April 2017

          Hello Annie,

          Can you tell me which page we are talking about? I can have a look to see if I can identify the problem.

          Best regards,

          Eoghan

          Reply
  20. Harry Venturi
    28. March 2017

    Hi Eoghan,

    I recently redesigned some of our site using an administrator login (we only have relatively limited access).

    Part of my redesign included removing some pages and replacing them, as well as changing a number of URLs.

    If I did not set up a 301 redirect before deleting, is it still possible to do so? Is it possible to do this through the HTML source or similar, as to my knowledge we do not have access to hidden files?

    Kind regards

    Harry

    Reply
    • Eoghan Henn
      10. April 2017

      Hi Harry,

      Sorry about my late reply! Yes, you can (and should) still implement 301 redirects, although it would have been best to implement them immediately after you changed the URLs.

      There are several ways to implement 301 redirects and your choice depends on your technical circumstances. Here’s an overview that I find quite useful: http://www.webconfs.com/154/301-redirects-how-to-redirect-your-website/

      Let me know if you have any further questions!

      Reply
  21. Anne
    27. March 2017

    Hi Eoghan,
    My site gives error 500 (internal server error) when clicking from a Google search result. If I search for autismherbs on Google and click on any of the links to my website, I get error 500 (internal server error), but if I click the address bar and hit enter, or click a link to autismherbs.com on any other site, it works just fine.

    I have checked the error log and it shows:
    [core:debug] [pid 992574] core.c(3755): AH00122: redirected from r->uri = /index.php, referer: https://www.google.com/

    I have tried deleting the .htaccess file but the problem still occurs.
    Any solution for this problem? What codes should I check?

    Reply
    • Eoghan Henn
      10. April 2017

      Hello Anne,

      First of all, sorry about my late reply. I’ve been having trouble keeping up with the comments here.

      I checked your website and this is indeed very strange behaviour. It seems like your pages return 500 error codes for the Googlebot and for visitors that click on a Google search result. The exact same URLs work fine for other user agents and referrers.

      I’ve spent some time trying to figure out why your website behaves this way, but I don’t have a good idea yet. Can you check with your hosting / developers if there is any setting that determines different response codes for different user agents or referrers?

      Please let me know if you have any further questions.

      Reply
  22. Nikkhil Shirodkar
    21. March 2017

    Hi Eoghan. Wonderful article on crawl errors. I’m getting a whole lot of “no sentences found” news errors, but when I test the article in the News tool I get a success. How does one fix this? Also, when I do a fetch and render, it only renders the bot view. The visitor view is blank.

    Reply
    • Eoghan Henn
      22. March 2017

      Hello Nikkhil,

      Thanks a lot for your comment. I’m very sorry, but I do not have a lot of experience with Google News. As far as I know, the error “no sentences found” can be triggered by an unusual formatting of articles – too few or too many sentences per paragraph.

      If Google has problems rendering your page, there might be other technical problems. You should definitely check this out. Does the problem occur with all articles or just the ones that also have a “no sentences found” error?

      I’m sorry I can’t give you a better reply. Let me know if you have any additional questions.

      Eoghan

      Reply
      • Nikkhil Shirodkar
        22. March 2017

        Hi Eoghan,

        Thank you for the response. Our site is built on a MEAN stack. We use prerender.io for the Google bot to crawl, since the site is in AngularJS. There are about 600 articles in the error list with “no sentences found”. All of them have content! E.g. http://www.spotboye.com/television/television-news/after-sunil-grover-is-navjot-sidhu-the-next-to-quit-kapil-sharma-s-show/58d11aa18720780166958dc3

        Reply
        • Eoghan Henn
          23. March 2017

          Hello Nikkhil,

          The Google Cache for the example you provided looks fine. I’m not sure if prerendering is still the preferred way of dealing with Angular JS websites though, as Google is now a lot better at rendering JavaScript. Also, I do not know if the Google News bot treats JS differently (although it shouldn’t). The fact that the visitor view in the fetch and render report is not working is something you should probably dig into deeper.

          Sorry again for not having any solid responses for you, but this might help you with your further research. Let me know if you have any other questions!

          Eoghan

          Reply
  23. mido
    20. March 2017

    I’ve got a problem and do not know what to do
    Google shows some of my website pages in search results as https, but I don’t have https on my site.
    I do not want https, just simple http.
    Please help me.

    Reply
    • Eoghan Henn
      21. March 2017

      Hello Mido,

      Thanks a lot for your comment. My first idea was to suggest that you redirect all https URLs to their http equivalents, but that would probably still cause problems for most users if you don’t have a valid SSL certificate: a warning would be displayed before the redirect is processed. I’m not sure how the Google bot would deal with a situation like this (whether it will process the redirects or not), but a missing SSL certificate will most likely cause problems in other areas.

      I think your best bet would be to switch to https completely. This is something all webmasters should be doing anyhow. You can get a free SSL certificate from Let’s Encrypt: https://letsencrypt.org/

      Here’s a great resource for making sure you get everything right from an SEO perspective when switching to https: http://www.aleydasolis.com/en/search-engine-optimization/http-https-migration-checklist-google-docs/

      Please let me know if you have any other questions.

      Eoghan

      Reply
  24. Surojit
    26. February 2017

    Hi Eoghan
    Great article! On or around Feb. 19, 2017, our Webmaster Tools account saw a spike in 500, 502 and 503 errors (“server error”), and our programmer checked, found an issue with the database and got it fixed. Accordingly, we marked all the 500/502/503 errors as fixed in Webmaster Tools. However, soon thereafter, Webmaster Tools again began showing server errors (mostly 502s, some 500s) and the number of errors keeps climbing steadily every day. We’re not sure now why we’re still getting the server error messages and I’ll be grateful if you can help out in this regard.

    PS – ever since we started getting the server error messages, our traffic got badly hit as well as overall search rank positions.

    Reply
    • Eoghan Henn
      2. March 2017

      Hello Surojit,

      Thanks a lot for your comment. If the errors keep coming back after you marked them as fixed, it looks like the issue with the database was not the only cause for the errors. There are probably more issues you need to fix.

      You can export a list of all errors through the Google Search Console API Explorer including information on where the URLs that cause the errors are linked from. This might help finding the root of the problem.

      Feel free to send me some more information so I can have a closer look.

      Best regards,

      Eoghan

      Reply
  25. Johan Watson
    17. February 2017

    Good Day,

    I need help with all my crawl errors. I will pay you in advance if you could help me to clear all my crawl errors.

    Kind Regards

    Johan Watson

    Reply
    • Eoghan Henn
      22. February 2017

      Hello Johan,

      Thanks a lot for your comment. I will help you for free if you provide me with more info.

      Best regards,

      Eoghan

      Reply
  26. Faniso
    6. February 2017

    Hi there! Thanks for this post.

    I’m not sure if this question has been asked already.

    I recently went into webmaster tools to check for crawl errors. Under the Smartphone tab, I noticed that most of them were for pages with either a m/pagename.html or mobile/pagename.html.

    We have built these pages without any sub-directories. So you will not find
    http://www.victoriafalls-guide.net/mobile/art-from-zimbabwe.html or
    http://www.victoriafalls-guide.net/m/art-from-zimbabwe.html

    Only such pages as http://www.victoriafalls-guide.net/art-from-zimbabwe.html

    What am I missing here?

    Reply
    • Eoghan Henn
      9. February 2017

      Hello Faniso,

      I have seen a similar problem in several other Google Search Console properties. Sometimes it is very difficult to understand where the Google bot initially found the URLs.

      Have you checked the “linked from” information in the detail view of each error URL? This might help you find the source of the link, but often there is no information available.

      There is also an unconfirmed but pretty credible theory that the Googlebot just checks the m/ and mobile/ directories to see if there is a mobile version of a page when it’s not mobile-friendly: https://productforums.google.com/forum/#!topic/webmasters/56CNFxZBFwE

      I recommend you mark the errors as fixed and set up 301 redirects from the non-existent URLs to the correct versions, although the redirects are probably not even necessary.

      I hope this helps!

      Reply
      • Keith
        16. March 2017

        Hi Eoghan

        I’m having an unresolved issue with the ‘linked from’ source being pages that haven’t existed for up to 10 years.

        All from recent crawls: both the link and the ‘linked from’ source are .asp URLs that haven’t existed for a decade. In that time, the site (same root URL) underwent three hosting company moves and several complete site rebuilds (no CSS, scripts, etc. were carried over).

        I can see external sites keeping these old URLs in their archives, etc., but how does Google come up with phantom internally ‘linked from’ URLs that just haven’t existed for this amount of time? Have you any thoughts on this perplexing problem? Thanks!

        Reply
        • Eoghan Henn
          20. March 2017

          Hi Keith,

          I’ve encountered the exact same problem with several different websites.

          Here’s my explanation: I’m pretty sure that the “linked from” info is normally not up-to-date. The info that is displayed here is often from previous crawls and it is not updated every time the linked pages are crawled. That would explain why, even years later, pages still show up as linked from pages that no longer exist.

          Also, I have noticed that these errors often don’t come back after you have marked them as fixed for a couple of times and made sure that the pages really aren’t linked to from anywhere any longer. These errors normally won’t harm your SEO performance in any way and thus aren’t a reason to be worried.

          I hope this helps! Please let me know if you have any other questions.

          Eoghan

          Reply
          • Keith
            20. March 2017

            Thanks very much for that, Eoghan. Very reassuring that, at least, I’m not losing my mind. I will persist with the mark ’em fixed tactic.
            Cheers! Keith

            Reply
          • Alessandro
            4. April 2017

            Eoghan,

            I have the same problem and it’s refreshing to hear what you say.
            However, I am concerned about marking unfixed pages as fixed, because I’ve found this note in Google’s official knowledge base:

            “Note that clicking This issue is fixed in the Crawl Errors report only temporarily hides the 404 error; the error will reappear the next time Google tries to crawl that URL. (Once Google has successfully crawled a URL, it can try to crawl that URL forever. Issuing a 300-level redirect will delay the recrawl attempt, possibly for a very long time.)”

            Thanks for helping us!

            Reply
            • Eoghan Henn
              10. April 2017

              Hello Alessandro,

              I would normally only recommend marking an unfixed error as fixed to check if it shows up again. In any case, I would recommend redirecting the error pages (which would mean actually fixing the errors). Often, these errors are nothing to worry about anyhow, and the behaviour of the error reports in Google Search Console does not always make sense.

              I hope this helps 🙂

  27. Andrea
    25. January 2017

    Hi. A few days ago my website (a blog) started to receive so many “calls” from Google bots, and when I asked Google why this is happening, they answered that this is normal and that I should lower the crawl frequency in my Webmaster Tools. The big question for me is: how low is low? Do you have any suggestions? Thanks!

    Reply
    • Eoghan Henn
      31. January 2017

      Hi Andrea,

      Are the requests from Google causing you any problems with your server? If not, I would not recommend you change anything.

      If your server is indeed having trouble with the number of requests from the Google bot, I would first consider the following options:

      – Check if your server is performant enough. Normally, there shouldn’t be a problem with normal crawling by Google and other bots.
      – Check if the requests are actually coming from Google, or from another bot that pretends to be Google. You can compare the numbers from your log files (or wherever else you found that you were receiving lots of hits from the Google bot) with the Crawl stats report in Google Search Console (click on Crawl > Crawl stats in the left navigation).

      All in all, I would really not recommend limiting the crawl frequency for the Google bot.

      I hope this helps! Let me know if you have any other questions.

      Reply
  28. Tomáš Karafa
    19. January 2017

    Hi there,
    A few days ago, most of my website disappeared from Google search results. At the same time, Google Analytics registered a sharp decline in organic (search engine) visitors. Total daily visits dropped from 300 to 100 within about 3 days. Upon checking with Webmaster Tools, I get hundreds of “404 not found” errors. However, what really bothers me is that those URLs DO EXIST and they DO work! I suspect that somehow the dynamic URL parameters are to blame. But so far, it has worked just fine… The website is written in several languages and (being an eshop) is denominated in several currencies. Those languages and currencies are selected by $_GET parameters. To prevent people from browsing the pages without a selected language or currency, the website automatically fills in those parameters in case they are not present. Example:

    http://www.eurocatsuits.com/index.php … redirects to: http://eurocatsuits.com/index.php/?la=en&currency=EUR

    In “fetch as google”, index.php gets a “redirected” status… of course, it redirects to index.php/?la=en&currency=EUR… but “index.php/?la=en&currency=EUR” gets a “not found” status… however, in the browser the page works just fine…

    Any ideas? … please help… thanks!

    Tomas

    Reply
    • Tomáš Karafa
      19. January 2017

      After a sleepless night I found out that .htaccess was to blame… I will make a new one later, but for now I deleted it altogether and everything works just fine…

      Reply
      • Eoghan Henn
        23. January 2017

        Hello Tomáš,

        I’m glad you managed to fix this.

        One general tip: You might want to get rid of those language and currency parameters in your URLs. They are not very search engine (or user) friendly.

        Please let me know if you have any additional questions.

        Best regards,

        Eoghan

        Reply
  29. Jacob Share
    17. January 2017

    I just received an email alert from GSC about a large increase in soft 404 errors. Turns out spammers from .cn domains are linking to searches on my WordPress blog for queries in another language (I assume Chinese), and the numbers have gone up almost linearly every day since Jan. 5th when it started. I suppose I could block search links from .cn domains but do you have a better idea?

    Reply
    • Eoghan Henn
      23. January 2017

      Hello Jacob,

      First of all, sorry about my late reply. I haven’t been able to keep up with all of the comments these last few days.

      Thanks a lot for sharing this interesting case, even if it sounds very annoying for you. Have you already set all of your search result pages to “noindex”? This is something every website should do anyhow, in order to avoid uncontrolled indexing of search result pages. You can use the Yoast SEO plugin to set this up.

      It might not stop the pages from showing up as soft 404 errors, but at least it will let Google know that you do not want these pages indexed. It should be enough to make sure that these pages don’t harm you.

      Another thing you might want to do is check the domains that are linking to you and see if they might be potentially harmful. It might be a good idea to use the disavow tool to exclude these links. Please note though that I am not an expert on link removal and cleanup and that you should do more research before deciding about this issue.

      Please let me know if you have any further questions.

      Best regards,

      Eoghan

      Reply
      • Jacob Share
        23. January 2017

        No worries, life is busy, just happy you replied at all 🙂

        Yes, my search pages are noindexed, via the AIOSEO WordPress plugin.

        I tried clicking through to one site; it’s a blog with a mass of links, mostly pointing to other similarly-formatted garbage sites. The links to my site are gone and, as far as I can tell, the site is being regenerated on the fly (or regularly) while replacing old links with new ones, spamming as many sites as possible.

        Reply
        • Eoghan Henn
          24. January 2017

          Looks like they might just be after visits from curious webmasters like you so they can generate some ad revenue off them. Similar to the ones that spam Google Analytics accounts with referral spam.

          Do any of these links show up in the “Search Traffic > Links to Your Site” report in Google Search Console?

          The links probably won’t harm you if they disappear again that quickly, but I guess you should keep an eye on it. As for the crawl errors… If you mark them as fixed they probably won’t show up again if the links disappear.

          I hope this helps and I hope that those spammers won’t bother you much longer.

          Reply
          • Kevin
            26. January 2017

            Hi Eoghan,

            I’m actually having the same issue that started right around the same date as Jacob.

            I received over 200 “soft 404 errors” from search URLs that are “linked from” a really strange search results page on my site that doesn’t exist.

            There are also a lot of very strange links from a few .cn websites.

            Hopefully this makes sense, I’m not familiar in dealing with crawl errors. Any help or guidance would be greatly appreciated.

            Thanks!

            Reply
            • Eoghan Henn
              31. January 2017

              Hi Kevin,

              First, I would recommend you mark the crawl errors as fixed in Google Search Console. You find this option in the crawl error report right above the detailed list of crawl errors.

              If the errors don’t show up again after that, you don’t have to worry about them any longer.

              If they do show up again, you’ll have to dig a bit deeper. Feel free to get back to me if you need any support with this.

              Best regards,

              Eoghan

  30. Ali
    1. January 2017

    Hello sir, this is my website. Kindly help me: my Search Console analytics are not working. What is the problem? Can you help? I can’t see any errors for this website: http://www.subkuchsell.com

    Reply
    • Eoghan Henn
      4. January 2017

      Hello Ali,

      I am not sure if I can help you with this question. If you do not see any data in Google Search Console, it might be because you only verified your property recently. It takes a few days until data is shown.

      If you do not see any errors, it might also be related to the fact that there simply aren’t any errors.

      Make sure you verify the right website version. The URL you enter for your Search Console property should be http://www.subkuchsell.com/.

      Let me know if there is anything else I can do for you.

      Eoghan

      Reply
  31. leanin
    29. December 2016

    Hey Eoghan,

    thanks for sharing. For an e-commerce website, my friend suggests a way to deal with 404 pages:
    1. download the crawl errors (404s) from Search Console,
    2. paste the 404 URLs into a txt file,
    3. put the 404.txt on the FTP server,
    4. submit 404.txt via the Add/Test Sitemap button
    (Google Webmaster Tools > Crawl > Sitemaps > Add/Test Sitemap)
    http://www.xxxxx.com/404.txt

    Since we are going to delete around 4k URLs soon, how to deal with this is very important.

    Reply
    • leanin
      29. December 2016

      Fix 404 errors by redirecting false URLs or changing your internal links and sitemap entries.

      For this, the steps are as follows, right?

      1. 301 redirect all 404 error URLs to the homepage,
      2. update the sitemap,
      3. submit the sitemap.

      Is this the correct approach?

      Reply
      • Eoghan Henn
        4. January 2017

        Yes, this is how I would suggest doing it. Just think about whether there are better targets for your 301 redirects than the home page. I would not recommend just redirecting every old URL to the home page without thinking about it. For most URLs, there will probably be better targets than the home page.
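
        As a rough example, one-to-one redirects in an .htaccess file on an Apache server could look like this (the paths are made up for illustration):

        # Redirect each deleted URL to the most relevant existing page
        Redirect 301 /old-category/removed-page-1 /new-category/similar-page
        Redirect 301 /old-category/removed-page-2 /new-category/
        # Only fall back to the home page if there really is no better target
        Redirect 301 /some-orphaned-page /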

        Reply
    • Eoghan Henn
      4. January 2017

      Hi leanin,

      I am not sure why your friend recommends these steps, but this is not a solution I have ever heard of.

      Reply
  32. mirotic
    26. November 2016

    Hi sir
    (I have bad English)

    Can you help me fix this issue?

    My site has been blocked because of the Yandex bot (I don’t really understand how this works):
    http://imgur.com/a/W1JKK

    I registered my site at Yandex, but I couldn’t find the crawl settings:
    http://imgur.com/a/297Mu

    What should I do?

    Reply
  33. mikc
    16. November 2016

    Hello Sir!

    I just built a website and Google won’t crawl it, and it won’t allow me to upload a sitemap either. I’m getting only 2 links showing when I enter site:acousticimagery.net, and one of these shows a 500 error. Also, when trying to crawl, Google doesn’t like my robots.txt file. I’ve tried several edits, removing it altogether, nothing helps. My site host is worthless; I’ve been trying to get this fixed for 2 weeks. Any input you might have would be most appreciated!!

    Reply
    • Eoghan Henn
      16. November 2016

      Hello Mick! Thanks a lot for your comment.

      One important problem I have been able to identify is that your server always returns a 500 error instead of a 404 error when a page does not exist. Can you think of a way to fix this?

      If you want to get your pages indexed quickly, I recommend you go to “Crawl > Fetch as Google” in Google Search Console. Here you can fetch each page that is not in the index yet and then, after it has been fetched, click on “Submit to index”. This will speed up the indexing process.

      I could not find a robots.txt file or an XML sitemap on your server. The robots.txt should be located at http://acousticimagery.net/robots.txt. Right now, this URL returns a 500 error code, so I assume the file does not exist or is not in this location. You can decide how you want to name your XML sitemap, but I would recommend putting it here: http://acousticimagery.net/sitemap.xml.

      Mind you, you don’t really need a robots.txt file and an XML sitemap for a website with only 4 pages (but they won’t do any harm either). Just make sure you change that thing with the wrong error codes.
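
      If you do add a robots.txt file at some point, a minimal version that allows all crawling and points to the sitemap is enough. Something along these lines (assuming the sitemap ends up at the location mentioned above):

      User-agent: *
      Disallow:

      Sitemap: http://acousticimagery.net/sitemap.xml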

      Please let me know if you have any other questions.

      Best regards,

      Eoghan

      Reply
      • Mick
        16. November 2016

        Hello Eoghan,
        Thanks for the response! Google won’t let me crawl the site as I keep getting an error saying they can’t locate the robots.txt file. I removed the file contents and tried again, still no go. Also, every time I try to upload an XML file it tells me the file is in an invalid format. I see the 500 errors but cannot fix them. Any other ideas? This all started when I updated the site using a website builder available from Fat Cow. Very sorry I ever tried to update as I’m getting no cooperation from them on this at all. I’m thinking of just pulling the site and cancelling my Fat Cow account. You mentioned submitting each page with fetch. How do you do this?

        Reply
        • Eoghan Henn
          19. November 2016

          Hi Mick,

          OK, thanks for the additional information. I now have a better understanding of what is going on. The Google bot tries to access your robots.txt file at http://acousticimagery.net/robots.txt and gets a 500 server error, so it decides not to crawl the page and come back later. You can fix this by fixing the error code problem I described earlier. If http://acousticimagery.net/robots.txt returns a 404 error, everything is fine and Google crawls your page.

          I do not know how this works with Fat Cow, but maybe this page will help you: http://www.fatcow.com/knowledgebase/beta/article.bml?ArticleID=620

          Here’s how to submit each page to the Google index in Google Search Console:

          1. In the left navigation, go to Crawl > Fetch as Google:

          Crawl, Fetch as Google

          2. Enter the path of the page you want to submit and hit Fetch:

          Enter page path and hit Fetch

          3. When fetching is complete, hit “Request indexing”:

          Hit Request indexing

          4. Complete the dialogue that pops up like this:

          Complete this dialogue

          5. Repeat for every page you want to submit to the index. Here are the paths of the pages you will want to submit:
          cd-transfers
          audio-recording
          contact-us

          I hope this helps! It will take a while until the pages show up in the search results. Let me know if there is anything else I can do for you.

          Eoghan

          Reply
  34. Sean
    15. November 2016

    I get a lot of “page not found” errors, and when I check the “linked from” info and click the links, they clearly go to the actual page, which is not broken. It’s really annoying as the errors keep coming back.

    i.e.

    This error

    /places/white-horse-inn/

    is linked from here

    http://www.seanthecyclist.co.uk/places/white-horse-inn/

    Any idea what might be causing this?

    Thanks

    Reply
    • Eoghan Henn
      16. November 2016

      Hi Sean,

      I think I might need some more information to be able to help you with this. I will send you an e-mail now.

      Best regards,

      Eoghan

      Reply
  35. Donald
    5. November 2016

    I have been getting the same issue as Michael.

    How do I fix this 500 error? http://imgur.com/a/qE4i3

    It made me lose every single keyword I was ranking for, and the more I try to remove the errors, the more they keep coming up. As soon as I fetch the URL, search results pop back up to #2 positions for many keywords, but after just a few hours it looks like Google crawls them again, finds errors and sends the site back to the 10th page. Search rankings were gradually lost as soon as this 500 server error was discovered in Webmaster Tools.
    Now I have thought about blocking /wp-includes/, but I think you can’t block it anymore due to CSS and JS, which might hurt rankings even more.

    Any help would be most appreciated.

    Reply
    • Eoghan Henn
      5. November 2016

      Hi Donald,

      You’re absolutely right, /wp-includes/ does contain some .js files that you might want Google to crawl. Your CSS is normally in /wp-content/ though.

      Also, Yoast does not block /wp-includes/ by default any more (Source: https://yoast.com/wordpress-robots-txt-example/)

      Nevertheless, it is probably a good idea to block all URLs that return a 500 error from the Google bot. So far, I’ve never had problems with blocking the entire /wp-includes/ directory (I still do it on this website), but it might be worth your while going through the directory and only blocking URLs that return a 500 server error.
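
      As a rough sketch, the two robots.txt variants would look like this (the path in the second variant is just a placeholder – you would first have to identify which paths actually return 500 errors):

      # Variant 1: block the entire directory
      User-agent: *
      Disallow: /wp-includes/

      # Variant 2: only block the paths that return 500 errors
      User-agent: *
      Disallow: /wp-includes/some-path-returning-500/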

      I hope this helps!

      Reply
  36. Michael
    29. October 2016

    How do I fix this 500 error? http://imgur.com/a/qE4i3

    Reply
    • Eoghan Henn
      1. November 2016

      Hello Michael,

      You can block your /wp-includes/ directory from the Google bot by putting it in your robots.txt file. I recommend you install the Yoast SEO plugin for WordPress. As far as I know, it does it automatically.

      I hope this helps.

      Eoghan

      Reply
      • Eoghan Henn
        5. November 2016

        Please see my reply to Donald’s comment (above) for an update on this issue.

        Reply
  37. kevin
    30. September 2016

    Henn,
    We have crawl errors in Webmaster Tools. When we remove such pages from Webmaster Tools, within how many days will the pages be removed from Google Webmaster Tools?

    Reply
    • Eoghan Henn
      6. October 2016

      Hi Kevin,

      For me, there are two scenarios in which I would remove a crawl error from the report:

      1. If I know the error won’t occur again because I’ve either fixed it or I know it was a one-time thing.
      2. If I don’t know why the error occurred (i.e. why Google crawled that URL or why the URL returned an error code) and I want to see if it happens again.

      WHEN you do this really doesn’t matter much. I hope this helps! Let me know if you have any other questions.

      Reply
  38. kevin
    29. September 2016

    Hi Eoghan Henn,
    This is Kevin. Can you tell me: after removing a page from Webmaster Tools, how many days does it take until the page is removed from Webmaster Tools?

    Reply
    • Eoghan Henn
      30. September 2016

      Hello Kevin,

      Thanks a lot for your comment. I am not sure if I understand your question correctly. I will send you an e-mail so we can discuss this.

      Best regards,

      Eoghan

      Reply
  39. Saud Khan
    21. September 2016

    Please help me to fix this error.

    Screenshot: http://i.imgur.com/ydZo4Wv.jpg

    I’ve deleted the sample page and redirected the second url.

    Reply
    • Eoghan Henn
      27. September 2016

      Hi Saud,

      Unfortunately the screenshot URL is not working (any more). I will get in touch with you via email and see if I can help you.

      Best regards,

      Eoghan

      Reply
  40. Ajay Murmu
    14. September 2016

    I am getting an “HTTP Error: 302” error in the sitemaps section. All other sitemap URLs are working fine, but I am getting this error on the main sitemap.xml. How can I resolve it?

    Reply
    • Eoghan Henn
      16. September 2016

      Hello Ajay,

      Thanks a lot for your comment. I am not sure I understand your question very well. I will send you an e-mail so you can send me a screenshot if you like.

      Best regards,

      Eoghan

      Reply
      • Ray
        18. December 2016

        Hello Eoghan, I would love to know if you resolved the ‘302’ problem.
        I’ve had the issue of going through the Wayback Machine to a website, but then when I click the link I need, I am greeted with “Got an HTTP 302 response at crawl time” and redirected to the current website, where my information is no longer available.
        I would really appreciate some help if you could email me:
        internetuser52@gmail.com

        Reply
        • Eoghan Henn
          4. January 2017

          Hi Ray,

          I’ll send you an e-mail.

          Eoghan

          Reply
          • Drake
            16. May 2017

            I am having this same issue (HTTP error 302). Do you mind sending me an email as well?

            Thanks,
            Drake

            Reply
            • Eoghan Henn
              16. May 2017

              Hi Drake,

              Yes, I’ll send you an e-mail 🙂

  41. Jennifer M
    23. August 2016

    There is a nasty website that injected a redirect on our site. We found the malware and removed it, but their site is still linking to tons of URLs on our site that don’t exist, and hence creating crawl errors.

    How would you suggest we fix this?

    THANKS!
    Jennifer

    Reply
    • Eoghan Henn
      29. August 2016

      Hi Jennifer,

      This does sound nasty :/

      It is not easy to analyse this situation with the little bit of information I have, but I guess you do not have to worry about the crawl errors too much. Look at it this way: Somebody (a spammer) is sending the Googlebot to URLs on your website that don’t exist and have never existed. Google is clever enough to figure out that this is not your fault.

      If you like, you can send me more information via email so that I can have a closer look at it.

      Reply
  42. Chris
    10. August 2016

    That’s great news. Thanks for sharing Eoghan. Keep me posted!

    -Chris

    Reply
    • Eoghan Henn
      12. August 2016

      Hi Chris,

      For now, I recommend you use Google’s Search Console API explorer. If you follow this link, the fields are already pre-filled for a list of your 404 errors with additional information about the sitemaps the false URLs are included in and the pages they are linked from:

      https://developers.google.com/apis-explorer/#p/webmasters/v3/webmasters.urlcrawlerrorssamples.list

      You just need to fill in your site URL (make sure you use the exact URL of your GSC property in the right format). You can then copy and paste the output and forward it to your IT team. I want to build a little tool that will make this easier and nicer to export, but that will take a while 🙂

      Hope this helps for now! Let me know if you have any questions.

      Reply
      • Chris Smith
        13. August 2016

        Eoghan,

        That works perfectly. Thanks a ton for the detailed response and customized URL. I hope I can return the favor someday. 🙂

        Thanks again,

        Chris

        Reply
  43. Chris Smith
    3. August 2016

    I like this strategy.

    Is there a way to download the “linked from” information in the 404 report? Would make it much easier to send the complete details to my IT team.

    Reply
  44. Dr Emixam
    21. July 2016

    Hi,

    Following a misconfiguration of another of my websites, Google indexed a lot of non-existent pages, and now all of these pages appear in crawl errors.

    I tried to set them to return 410 errors to signal that they don’t exist anymore, but Google keeps them in the crawl errors list.

    Do you know what the best thing to do is in this case? And, in general, for any page that is permanently deleted?

    Reply
    • Eoghan Henn
      2. August 2016

      Hello Dr Emixam,

      Thanks a lot for your comment and sorry about my late reply.

      From what you described, you did everything right. Just remember to mark the errors as fixed once you’ve made changes to your page. This way they should not show up again.

      Let me know if you have any further questions.

      Reply
      • Eoghan Henn
        20. October 2016

        Just to clarify this: Giving back a 410 code alone will not prevent the URLs from showing up in the 404 error reports – Google currently shows 410 errors as 404 errors. In order to stop the URLs from showing up in the reports, all links to the URLs need to be removed too. Otherwise, Google will keep on following the links, crawling the URLs and showing the errors in the reports. If there are external links to the URLs that cannot be removed, it might be better to use a 301 redirect to point to another URL that is relevant to the link.

        Reply
  45. Steven
    19. July 2016

    Hi Eoghan-

    Thanks for the great info in the article! I have an interesting (to me) issue with some of the crawl errors on our site. The total number of 404 errors is under 200 and some of them I can match to your info above. But there are quite a few URLs that are not resolving properly due to “Chief%20Strategy%20Officer” having been appended to each of the URLs. For example, the URL will end with “…personal-information-augmented-reality-systems/Chief%20Strategy%20Officer” and the “linked from” URL is the link on our site.

    I’m going to go ahead and mark all as “fixed” and see what happens, but I was wondering if you had any idea how this may have happened?

    Thanks y ¡Saludos! from BCN…
    Steven

    Reply
    • Eoghan Henn
      29. July 2016

      Hi Steven,

      Thanks for your comment! I found your website, crawled it, and found some interesting stuff that might help you. I will send you an email about this.

      Best regards,

      Eoghan

      Reply
  46. Vicky
    17. July 2016

    Hi Eoghan Henn,

    I have over 1,000 “404 not found” errors in Google Search Console for deleted products. What should I do to fix those errors? Can you please suggest a way to fix them?

    Thanks
    Vicky

    Reply
    • Eoghan Henn
      28. July 2016

      Hello Vicky,

      When you have to delete a product page, you have a few options:

      • Is there a replacement product or a new version of the product? If so, you can 301 redirect the URL of the deleted product to this new URL. Make sure that the new URL you redirect the old URL to is very similar to the old URL though. Do not overuse 301 redirects! (See the sketch after this list.)
      • If you want to delete a product and send a signal to Google saying that the page has been removed intentionally, you can return a 410 status code instead of a 404.
      • If none of the above is possible, you can just mark the 404 errors in Search Console as fixed. Make sure you do not link to the old URLs internally any more. Google should then stop crawling them and the errors should not return. If a URL is linked from another website, you should definitely 301 redirect it to a relevant target (see first option).
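
      On an Apache server, the first two options could be implemented with a few lines in the .htaccess file. This is only a sketch with placeholder URLs:

      # Option 1: 301 redirect a deleted product to its replacement
      Redirect 301 /products/old-product /products/new-product

      # Option 2: return a 410 (Gone) for products that were removed intentionally
      Redirect gone /products/discontinued-product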

      I hope this helps!

      Reply
      • Vicky
        28. July 2016

        Hi Eoghan,

        Thanks for reply,

        I marked them as fixed so Google stops crawling them. Yes, there are some deleted pages linked internally and externally. I will redirect those deleted products to similar products.

        Will inform you soon about update on it.

        Again thanks for reply!

        Reply
        • Eoghan Henn
          2. August 2016

          Hi Vicky,

          Just to clarify: Marking the errors as fixed will not make Google stop crawling them. This can only be achieved by removing all links to the URLs.

          I’m looking forward to hearing about how it went for you!

          Reply
  47. Chris
    26. June 2016

    Hi Eoghan,

    Just wanted to give you a thumbs up! Great post and super useful to me today, right now, when I discovered a bunch of 404s on a client site who just moved from an html site to a wordpress site for some of the pages they used to rank for, ie. somepage.html.

    I had used SEMRush to find as many pages as possible that they were previously ranking for and redirected them to any specific, relevant page and, when not possible, to category or broad topic pages.

    The remaining crawl errors (404s) in Search Console are some pages that didn’t show up in SEMRush and of course things like “http://myclientsite.com/swf/images.swf” Since we are sensibly no longer using Flash I guess I just don’t worry about those? Not really sure.

    Anyway, thanks for the great post!

    Reply
    • Eoghan Henn
      30. June 2016

      Hi Chris,

      Thanks for your kind words! I’m glad this article helped you.

      Yes, you can just ignore the swf file errors. If you mark them as fixed I guess they won’t show up again.

      Reply
  48. Daniel
    23. June 2016

    Hi Eoghan,

    Any thoughts on how bigger classified ad sites should handle Search Console?

    For instance, a real estate site with lots of ads that expire after a certain date, resulting in around 30k 404s or so. What would you suggest for dealing with such an amount of expired content?

    Thanks in advance,

    Reply
    • Eoghan Henn
      29. June 2016

      Hi Daniel,

      Thanks a lot for your interesting question. I have no practical experience with a case like this, but let me share some thoughts:

      • One thing you can do to make it clear that the pages were removed intentionally, so there is no “error”, would be to serve a 410 status code instead of a 404 status code (see https://searchenginewatch.com/sew/how-to/2340728/matt-cutts-on-how-google-handles-404-410-status-codes)
      • Also, ask yourself: Do you really need all of these temporary pages crawled and indexed? Do you get any valuable organic traffic from them? Do they rank for queries that you could not cover with pages that are permanently available? Maybe you can build an SEO strategy for your website that takes into account the fact that you have a big number of pages that disappear after a while.

      I hope this helps!

      Reply
      • Jules
        11. September 2016

        Google treats a 410 as a 404. https://support.google.com/webmasters/answer/35120?hl=en under URL error types > Common URL errors > 404: “If permanently deleting content without intending to replace it with newer, related content, let the old URL return a 404 or 410. Currently Google treats 410s (Gone) the same as 404s (Not found).”

        Reply
        • Eoghan Henn
          13. September 2016

          Hi Jules,

          Thanks for linking to this source. Still, I guess that a 410 error code is the better choice when intentionally removing content. We do not have power over how Google interprets our signals, but we should do everything we can to make them as consistent as possible.

          Reply
  49. Josh
    24. April 2016

    Hey Eoghan – I see your website is made with WordPress, so I was hoping you’d be able to answer my question.

    I recently re-submitted my sitemap (since I thought it might be a good thing to do after disallowing /go/ in my robots.txt for affiliate links) and a few days after recrawling, I now see a new 500 error:

    /wp-content/themes/mytheme/

    Other notes:

    – This was not present before I resubmitted my sitemap, and it’s the only 500 error I’ve seen since I launched my website a month or two ago.
    – I also know that my webhost (Bluehost) tends to go down at times. Maybe this is because Google tried crawling when it was down?
    – I updated my theme a few days before the 500 error appeared.

    Do I need to take any action? Is there any other info I can provide?

    Thanks – appreciate it.

    Reply
    • Eoghan Henn
      25. April 2016

      Hi Josh! Thanks for your comment.

      First of all: This is not something you should worry about, but if you have time, you might as well try to fix it 😉

      Apparently, the type of URL you mentioned above always gives back a 500 server error. Check this out: I’m using a WP theme called “Hardy” and the exact same URL for my page and my theme also returns a 500 server error: https://www.rebelytics.com/wp-content/themes/hardy/. So it’s not Bluehost’s fault. (Fun fact: I will probably receive a 500 error for this now because I just placed that link here).

      Now the question is: Why did the Google bot crawl your theme URL in the first place? Are you linking to it in your new sitemap? If so, you should remove the link. Your sitemap should only contain links to URLs that you want indexed. You can check where the Googlebot found the link to the URL (as mentioned in the article above). Here’s a screenshot of that:

      See internal links for crawl errors in Google Search Console

      If you find a link to that URL anywhere, just remove it. Otherwise, I guess you can just ignore this crawl error. It would be interesting to mark it as fixed and see if it shows up again. Let me know how it goes! And just give me a shout if you have any additional questions.

      Best regards,

      Eoghan

      Reply
      • Josh
        25. April 2016

        Awesome – thanks for the reply. It’s not linked in my sitemap and clicking on the link in GWT doesn’t show where it’s linked from, but I’ll remove it. Glad to hear it’s not really a problem.

        I also had 2 other quick questions:

        In general, do I only need to worry about crawl errors/warnings for relevant webpages (webpages that I want indexed and webpages that should be redirected since they’re being clicked on)? Some warnings are shown for:
        /m
        /mobile
        /coming-soon

        No idea how these appeared, and it shows they’re linked from my homepage, even though I have no idea how that is possible.

        Also, my Amazon affiliate links (cloaked with /go/) got indexed a few weeks ago, and roughly a week ago, I put rel=”nofollow” for each link and also added “Disallow: /go/” under “User-agent: *” in my robots.txt.

        It’s been a week, and my affiliate links are still indexed when I enter “site:mysite.com”. Do you think I’m missing anything, and how can I find out if I’m still being penalized for them?

        Thanks for the help – greatly appreciated.

        Reply
        • Eoghan Henn
          16. May 2016

          Hi Josh! Sorry it took me so long to reply to this one. You’re right, you should worry more about crawl errors for relevant pages that you want indexed. Nevertheless, it is always a good idea to also have a closer look at all the other crawl errors and try to avoid them in future. Sometimes, though, there’s nothing much you can do (like in the JavaScript example in the article).

          What kind of redirects are you using for your Amazon affiliate links? Make sure you use a 301 redirect so they don’t get indexed.

          I hope this helps!

          Reply
      • Jimmy Ahyari
        31. March 2017

        I have the same problem as Josh. But in my error report, there is no tab for “linked from”. This makes me confused. Why does Google try to index wp-content/themes/blabla, even though there is no “linked from” anywhere? 😀

        I think I’ll just mark it as fixed and see what happens next… Thanks Eoghan Henn

        Best regards from Indonesia

        Reply
        • Eoghan Henn
          10. April 2017

          Hello Jimmy,

          There is not always info available in the “linked from” tab. In WordPress, the theme URL shows in the source code in some context on most websites, and Google just follows these “links”. This type of error is really nothing to worry about.

          Best regards,

          Eoghan

          Reply
  50. Dermid
    25. February 2016

    Eoghan,
    Thanks for the very good input. A related question: I’m getting three different numbers for Google indexed pages. 1) when I type site:mysite.com I get 200,000 and 2) when I look in Google Search Console index status it reports 117,000 and 3) when I look at crawled site map it reports only 67 pages indexed. Can you help me understand these varying index numbers? Thank you very much.
    Dermid

    Reply
    • Eoghan Henn
      25. February 2016

      Hello again! You get different numbers here because you are looking at three different things:

      1) site:mysite.com shows you all the pages on your domain that are currently in the index. This includes all subdomains (www, non-www, mobile subdomain) and both protocols (http and https).
      2) shows you all indexed pages within the Search Console property you are looking at. A Search Console property can only include URLs with a combination of one protocol and one subdomain, so if the name of your Search Console is https://www.mysite.com/, only URLs that start with https://www.mysite.com/ (and that are indexed) will show here.
      3) shows you all URLs that are included in this exact sitemap and that are indexed.

      I found https, http, www, non-www, and m. (mobile subdomain) pages of your domain in the Google index. You should make sure all of your pages are only available with https and decide whether you want to use www or not (this decision is a matter of taste). You can easily set this up with two 301 redirect rules: one that redirects every http URL to its https equivalent and one that redirects all non-www URLs to their www equivalents (or vice versa). Last but not least, make sure you are using the right Search Console property (so https://www.mysite.com/ or https://mysite.com/, depending on how you decide on the www or non-www matter) and post a sitemap with all the URLs you want to be indexed.
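
      On an Apache server, the two redirect rules could look roughly like this in the .htaccess file. This is a sketch for the https + www variant with a placeholder domain, so adapt it before using it:

      RewriteEngine On
      # Redirect all http requests to their https equivalents
      RewriteCond %{HTTPS} off
      RewriteRule ^(.*)$ https://www.mysite.com/$1 [R=301,L]
      # Redirect non-www requests to the www version
      RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
      RewriteRule ^(.*)$ https://www.mysite.com/$1 [R=301,L]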

      Once you’ve followed this, you should work on closing the gap between 1), 2) and 3). If you have a healthy website and you’re in control of what Google is indexing, all three numbers should be on a similar level.

      Reply
  51. Dermid
    23. February 2016

    Eoghan,,
    Thank you. Our development team is using your advice because we have very similar issues with crawl errors. On a related note, I’m trying to understand the relationship between crawl errors and indexed URLs. When our URLs are indexed, we do very well with organic search traffic. We have millions of URLs in our submitted sitemap. Within Google Search Console, our indexed URL count jumped from zero to 100K on January 4th but has stayed at about that level since then. Should we expect that when we fix the crawl errors the indexed URLs will rise?
    Thank you,
    Dermid

    Reply
    • Eoghan Henn
      24. February 2016

      Hi Dermid,

      Thanks a lot for your comment and your interesting questions.

      Crawl errors and indexed URLs are not always directly related. 404 errors normally occur when the Googlebot encounters faulty URLs that are not supposed to be crawled at all, for example through broken links. Server errors, on the other hand, can occur with URLs that are supposed to be indexed, so fixing these might result in a higher number of indexed URLs.

      If you have millions of URLs in your sitemap, but only 100k of them are indexed, you should work on closing this gap. First of all you should check if you really want millions of URLs in your sitemap or if maybe lots of those pages aren’t relevant entry pages for users that search for your products or services in Google. It is better to have a lower number of high quality pages than having a higher number of low quality pages up for indexing.

      Next, check why a big part of the URLs you submitted in your sitemaps hasn’t been indexed by Google. Note that submitting a URL in a sitemap alone will normally not lead to indexing. Google needs more signals to decide to index a page. If a large number of pages on your domain is not indexed, this is normally due to poor internal linking of the pages or poor content on the pages. Make sure that all pages you want in the Google index are linked properly internally and that they all have content that satisfies the needs of the users searching for the keywords you want to rank for.

      I hope this helps!

      Eoghan

      Reply
  52. Arun
    14. January 2016

    Hi….Eoghan Henn

    Actually, my site was hacked, and after that I got a lot of errors in Search Console. I didn’t know what to do, so following your tips I am going to mark the errors as fixed, because those URLs are not available on my website and I removed them. But the issue is that one main landing page is getting a 521 error code. I googled this but I didn’t find a good solution. The other big issue is that only my home page is crawled by Google; the other pages are not crawled, even though I have submitted sitemaps and used Fetch as Google. Please help me, check my website error details below, and please recommend a good solution or mail me.

    hammer-testing-training-in-chennai.php – 521 – 11/2/15
    blog/?p=37 – 500 – 12/27/15
    blog/?m=201504 – 500 – 12/7/15
    userfiles/zyn2593-reys-kar-1623-moskva-rodos-ros6764.xml – 521 – 11/2/15
    userfiles/cez3214-aviabileti-kompanii-aer-astana-myv9933.xml – 521 – 11/3/15
    userfiles/wyz5836-bileti-saratov-simferopol-tsena-gif9086.xml – 521 – 11/3/15

    Reply
    • Eoghan Henn
      15. February 2016

      Hi Arun,

      First of all I would like to say that I am very sorry about my late reply. I have been very busy lately and didn’t find time to reply to any comments on here.

      You did the right thing marking the errors as fixed and waiting to see if they occur again. Especially 5xx errors are normally temporary. Did any of these show up again?

      The other problem about important pages not being indexed is probably not related to the crawl errors problem. I am not able to determine the cause of this problem without further research, but I did find one very important problem with your website that you need to solve if you want your pages to be indexed properly:

      In your main navigation, some important pages are not linked to directly, but through URLs that have a 302 redirect to the target. Example:

      /hammer-testing-training-in-chennai.php is linked in the main navigation as /index.php?id=253.

      /index.php?id=253 redirects to /hammer-testing-training-in-chennai.php with a 302 status code. I am not surprised that Google will not index either of the two in this case. You should make sure that you always link directly to the target URL, and you should absolutely avoid redirects in internal links. In general, there are very few cases where a 302 redirect is needed. Normally you will need a 301 redirect if you have to redirect a URL.
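
      If you need to keep a redirect from the old parameter URL for a while (for example because it is still linked externally), a 301 with a matching query string condition would be the cleaner signal. Here is a rough .htaccess sketch for an Apache server, using the ID from the example above:

      RewriteEngine On
      # 301 redirect /index.php?id=253 to the real URL and drop the query string
      RewriteCond %{QUERY_STRING} ^id=253$
      RewriteRule ^index\.php$ /hammer-testing-training-in-chennai.php? [R=301,L]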

      I am not sure if this is going to solve all of your problems, but fixing your internal links is definitively an important item on your to-do list. Please let me know if you have any other questions.

      Reply
  53. Greg
    30. October 2015

    I have launched a new site and for some reason I am getting a 500 error for a number of URLs in Webmaster Tools, including the sitemap itself. When I check my logs for access to the sitemap, for example, it shows Google has accessed the sitemap and no errors were returned:

    66.249.64.210 – – [29/Oct/2015:02:19:31 +0000] “GET /sitemap.php HTTP/1.1” – 10322 “-” “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”

    Also, if I access any of these URLs they appear perfectly fine.

    Thanks

    Reply
    • Eoghan Henn
      1. November 2015

      Hello Greg, this looks like a temporary problem to me. What I would suggest is to mark the corresponding errors as fixed in the crawl error report and see if they show up again. If they do not show up again, everything is fine.

      Reply
  54. Artin
    18. September 2015

    Hi
    Good post! I get a lot of 500 errors in GWT because I have disabled my feeds! What should I do about that?
    I have disabled the feeds because other websites steal my content!
    Can you help me?
    Thanks

    Reply
    • Eoghan Henn
      22. September 2015

      Hello Artin, I am not quite sure if I understand your problem correctly. Which URLs return 500 errors? The URLs of your feeds? Are you still linking to them? If so, you should definitely remove the links. Also, you can check if it is possible to return a 404 instead of a 500 for your feed URLs. This would be a better signal for Google. It might even be a good idea to 301 redirect the URLs of your feeds to pages on your website, if you find a good match for every feed URL. If you explain your problem in more detail I will be happy to help.
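
      If these are WordPress feed URLs, a redirect rule along the following lines might work. This is just a sketch that assumes the typical /feed/ URL pattern on an Apache server – adjust the targets to whatever pages match best:

      RewriteEngine On
      # 301 redirect feed URLs to the corresponding page (or the home page as a fallback)
      RewriteRule ^(.+)/feed/?$ /$1 [R=301,L]
      RewriteRule ^feed/?$ / [R=301,L]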

      Reply
      • Artin
        17. October 2015

        Hey Eoghan
        Thanks for your reply! I found a plugin named “Disable Feeds” and it redirects all feeds to the homepage, so I got rid of those 500 errors! Plus, it doesn’t let those spammy sites steal my content.

        Reply
        • Eoghan Henn
          19. October 2015

          Hello Artin, thanks a lot for sharing the info about the plugin. Sounds useful!

          Reply
  55. Ossai Precious
    14. September 2015

    Good post! I have a similar problem and just don’t know how to tackle it. Google search console shows that 69 pages have errors and I discovered that the 404 errors come up whenever a ‘/’ is added after the URL.

    Reply
    • Eoghan Henn
      15. September 2015

      Hello Ossai,

      Google only crawls URLs that are linked somewhere, so you should first of all try to find the source of this problem. In Search Console, you can find information on where the URLs with errors are linked from. It is very likely that somewhere on your page or in your sitemap you link to those URLs with a trailing slash that return 404s. You should fix those links.

      The next thing you can do is make sure that all URLs that end with a slash are 301 redirected to the same URL without a trailing slash. You should only do this if all of your URLs work without trailing slash. It only requires one line in your htaccess file.

      If you have any other questions I will be happy to help.

      Reply
      • Chris
        8. December 2016

        I have exactly this issue right now. Can you explain the correct htaccess code?

        Reply
        • Eoghan Henn
          14. December 2016

          Hi Chris,

          I am really not an expert on creating rewrite rules in htaccess files, so don’t rely on this, but this one works for me:

          RewriteRule ^(.*)/$ /$1 [R=301,L]

          Make sure you only use it for the URLs you want to use it for by setting a rewrite condition.
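
          For example, a condition that excludes existing directories (so that directory URLs keep their trailing slash) could look like this. Again, this is just a sketch – test it before relying on it:

          RewriteEngine On
          # Leave URLs that point to real directories untouched
          RewriteCond %{REQUEST_FILENAME} !-d
          # 301 redirect everything else to the version without the trailing slash
          RewriteRule ^(.*)/$ /$1 [R=301,L]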

          I hope this helps!

          Reply
