Understand and eliminate spam traffic in Google Analytics

Last updated on Nov 11, 2015

Spam traffic in Google Analytics has been a major issue in the web analytics community lately. Since the introduction of Universal Analytics in particular, the amount of spam traffic has increased dramatically. This is because Universal Analytics accounts are much easier to spam than classic Google Analytics accounts, a problem that will be discussed later on in this article.

A lot has been written about spam traffic in Google Analytics and there are plenty of useful resources to help you tackle the problem. Nevertheless, some of the suggested solutions seem to be misleading due to a lack of understanding of the problem. This is why this article focuses on explaining the problem in easy terms, before discussing some solutions that have been suggested to eliminate spam traffic and adding some new ideas.

Let’s start by having a look at how spammers actually get data into your Google Analytics accounts. There are two ways of spamming Google Analytics accounts that I am aware of: web crawlers that visit sites with Google Analytics tracking codes, and direct data insertion into Google Analytics accounts via the Measurement Protocol.

Spamming Google Analytics accounts with web crawlers

Web crawlers, or bots, are software programmes that visit lots of websites across the internet automatically. Most web crawlers have useful functions. The Googlebot, for example, crawls all websites it can find and helps Google index the entire web. Visits by web crawlers are normally not captured by Google Analytics, because crawlers identify themselves as crawlers, and not as real users.

Some spammers use web crawlers to manipulate the data in your Google Analytics account. They send a web crawler to your website that identifies itself as a real user and is therefore captured by Google Analytics. On top of that, the web crawler pretends to be coming from a link that points from another domain to your site. These links normally don’t exist, but the domains that the links are supposed to be on are real and belong to the spammers.

But what do spammers gain from this? By simulating visits from their own domains to thousands, or even millions, of Google Analytics accounts, they generate a significant amount of traffic to their own websites. Google Analytics users want to find out where all of their new, unexpected traffic is coming from, so they check out the domains that appear in their referral reports. I’m sure that you have done this yourself before.

The spammers are happy about all of this traffic, because they make money with the ads they show on their websites. It’s as simple as that!

Let’s now have a look at a new, easier, and more sophisticated way of spamming Google Analytics accounts.

Spamming Google Analytics accounts by inserting data via the Measurement Protocol

As I mentioned at the beginning of this article, spam traffic has become an even bigger problem since the introduction of Universal Analytics. One of the best innovations of Universal Analytics is the Measurement Protocol, an interface that allows you to insert data into your Google Analytics account from any given system, without requiring the classic tracking code we all know from website tracking.

The Measurement Protocol makes Google Analytics a lot more powerful than it was before, because it makes it easy to integrate different systems into your website tracking. One tracking feature that has become a lot smoother to implement since the introduction of the Measurement Protocol is phone call tracking. Call tracking providers can now insert data into your Google Analytics account via a simple HTTP request, and they can include all the dimension and metric data that a normal page view or event on your website would include.
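To make the “simple HTTP request” more concrete, here is a minimal sketch of how a call tracking system might build an event hit. The parameter names (v, tid, cid, t, ec, ea, el) and the /collect endpoint come from the Universal Analytics Measurement Protocol. The helper name, the tracking ID, the client ID and the event values are made-up placeholders. The sketch only builds the request body and does not send anything.

```python
from urllib.parse import urlencode

# Measurement Protocol endpoint for Universal Analytics hits.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_call_event(tracking_id, client_id, phone_line):
    """Return the form-encoded body for a 'phone call' event hit.

    Hypothetical helper: the event category, action and label are
    illustrative choices, not values prescribed by Google Analytics.
    """
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property the hit is sent to
        "cid": client_id,    # anonymous client identifier
        "t": "event",        # hit type
        "ec": "phone",       # event category (assumed)
        "ea": "call",        # event action (assumed)
        "el": phone_line,    # event label (assumed)
    }
    return urlencode(params)


# Placeholder values only; nothing is transmitted here.
body = build_call_event("UA-XXXXX-Y", "555", "sales-hotline")
print(COLLECT_URL)
print(body)
```

Note that nothing in this request proves who the sender is. The tracking ID is the only thing that ties the hit to an account.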

Maybe you already know where this is heading: While the Measurement Protocol is the most powerful innovation of Universal Analytics, it is also its biggest vulnerability. Anybody can send anything to your Google Analytics account! All they need is your tracking ID, and off we go.

So most of the spam visits you see in your Google Analytics account didn’t actually happen on your website. Somebody just sent data to your Google Analytics account via the Measurement Protocol through a simple HTTP request. By using this method, the spammers can manipulate any dimension or metric they like. This is why, with Measurement Protocol spam, you see spam domains not only in your referral report, but also in your events report or in your organic search keywords report.

The goal of the spammers that use this method is the same as the goal of the spammers that use web crawlers: They want to make you curious about their websites by making their domain names show up in all kinds of different places in your Google Analytics account. When you visit their websites, they make money with the ads they show.

How can you prevent your Google Analytics accounts from being spammed?

Now that we have discussed how spammers push data into your Google Analytics account and why they do it, let us have a look at some solutions to get rid of the unwanted traffic in your statistics. We will have a look at some advice that can be found across the web (including some of the really bad advice) and I will present a solution that I have developed myself with my colleagues at rankingCHECK.

Let’s start with some of the bad advice, so that you know what NOT to do about spam traffic right from the start.

Do NOT use the referral exclusion list to exclude referral spam

The referral exclusion list, like the Measurement Protocol, is a feature of Google Analytics that was introduced with Universal Analytics and did not exist in the classic version. Its main function is to prevent a new session from starting when users leave the tracked website to perform an action that is hosted on a different domain and are then referred back to the tracked website.

A classic application for this is payment via external providers. If you send your website visitors to paypal.com to pay for their purchases in your online shop, and PayPal then sends them back to your website, their return will show up as a new visit from paypal.com, and your referral report will look as if PayPal is sending you lots of buying customers.

To prevent this from happening, you can include paypal.com and the domains of other payment providers you work with in your referral exclusion list. Now, when a user comes to your page from paypal.com, Google Analytics will check whether this user has already started a session on your website. If so, the open session will be continued and the return of the user to your website will not be counted as a new visit and its source will not be noted as paypal.com, but as the source of the session that has already been started.

If, on the other hand, Google Analytics detects a user who comes from paypal.com but has not recently started a session on your website, the visit will be counted as a new visit and the referral “paypal.com” will be omitted. The visit will thus be counted as a direct visit.

And this is why you should never, never, ever, include spam referrals in your referral exclusion list! The spam visits will still be counted, but instead of counting them as referral visits from spam domains, Google Analytics will count them as direct visits. This actually makes the problem worse, instead of making it better. Now you won’t even be able to distinguish spam visits from real direct visits or real visits from other sources that are counted as direct visits for technical reasons (I will talk more about this problem in another article).
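The logic described above can be sketched as a small function. This is a simplified model of the behaviour as explained in this article, not Google’s actual implementation. The function name, the “(direct)” label and the example domains are assumptions made for illustration.

```python
def attributed_source(referrer, exclusion_list, open_session_source=None):
    """Simplified model of how an excluded referrer is attributed."""
    if referrer not in exclusion_list:
        # Not excluded: the hit starts a session from this referrer.
        return referrer
    if open_session_source is not None:
        # Excluded and a session is open: the session simply continues.
        return open_session_source
    # Excluded but no open session: the referrer is dropped and the
    # visit ends up as a direct visit.
    return "(direct)"


exclusions = {"paypal.com", "spam-domain.example"}

# A returning PayPal customer keeps the original source.
print(attributed_source("paypal.com", exclusions, "google"))

# A spam hit has no open session, so it turns into a direct visit.
print(attributed_source("spam-domain.example", exclusions))
```

The second call shows the problem. The spam hit is still counted, only now it hides among your direct traffic.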

So, whatever you do to fight spam traffic in Google Analytics, do NOT use your referral exclusion list to tackle the problem, even if this piece of advice can be found in the most reputable sources. It will not help you, but make your spam problem worse instead.

Let us now have a look at another piece of bad advice that can be found in resources dealing with the problem of spam traffic in Google Analytics accounts.

Do NOT use a country filter to exclude spam traffic

Just like the useless advice about the referral exclusion list, this is another very bad idea I have read about in various otherwise reputable sources. Some digital marketers seem to think that simply filtering out traffic from obscure countries will solve the problem. What they do not realise is that there are real internet users in those countries who might be interested in their websites and their services, and that a lot of spam traffic shows up as traffic from their own country.

Using a hostname filter can be risky and is not really necessary

One of the solutions suggested across most of the resources on this topic is a hostname filter for your Google Analytics data that only includes valid hostnames. This is a pretty good solution, but it is far from perfect and comes with some risks. It will help you eliminate most of the Measurement Protocol spam, because the spammers that push data into your Google Analytics account do not actually know your domain name. They generate Google Analytics tracking IDs randomly and use random hostnames in their hits, or often their own domains.

The hostname filter solution suggests that you only include hits in your Google Analytics data that have your own hostname(s), along with some other “good” hostnames, such as Google Translate. And this is where the solution becomes extremely unreliable. Nobody knows which other “good” hostnames will appear in future. Other companies may launch services similar to Google Translate, where your content is served from a different hostname, for the benefit of the user, and with no harm to your business. An include filter would silently drop those hits.
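A minimal sketch of such an include filter shows both the benefit and the risk. The hostnames here are assumptions: example.com stands in for your own domain, translate.googleusercontent.com is used as the hostname under which Google Translate serves translated pages, and reader.newservice.example is an invented future “good” service.

```python
import re

# Assumed whitelist: your own domain plus one known "good" hostname.
VALID_HOSTNAME = re.compile(
    r"^(www\.)?example\.com$"
    r"|^translate\.googleusercontent\.com$"
)


def keep_hit(hostname):
    """Return True if an include filter with this pattern keeps the hit."""
    return VALID_HOSTNAME.search(hostname) is not None


for host in [
    "www.example.com",            # your own site: kept
    "spammer-domain.example",     # spam hostname: dropped
    "reader.newservice.example",  # new legitimate service: also dropped
]:
    print(host, keep_hit(host))
```

The last hostname is the risk in a nutshell. The filter cannot tell a new legitimate service from spam until somebody updates the pattern.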

Nobody really needs this solution, as there is a much better way of eliminating Measurement Protocol spam that works very effectively and comes with no risk for the quality of your data. Let us have a look at this solution now.

Get rid of Measurement Protocol spam once and for all

At rankingCHECK, the online marketing agency I work for, we have developed a quick, clean and easy solution for eliminating Measurement Protocol spam. All you need to set it up is Google Tag Manager.

If you haven’t set up Google Tag Manager yet and if you are still placing Google Analytics tracking codes directly in the source code of your website, you should change that NOW. There are no arguments for not using Google Tag Manager with Google Analytics. You can do lots of great things to your Google Analytics configuration when you use Google Tag Manager and the quality of your data improves significantly. I will write more about the benefits of Google Tag Manager in a future article.

Setting up a Measurement Protocol spam traffic filter with Google Tag Manager is easier than it sounds. All you need to do is clearly identify all hits that you control and exclude all other hits that do not carry this identification. You can achieve this by passing a certain value in a custom dimension with every hit that happens on your website. Hits include pageviews, events, transactions and all other interactions of your tracking code with the Google Analytics servers.

In Google Tag Manager, you just add this value, which you define yourself, to the same custom dimension in all of your Google Analytics tags. Think of this value as your password, although it does not need to be cryptic or secure. It is just an identification that will help you recognise all the hits that you are in control of.

Now you set up this custom dimension in your Google Analytics account and create a filter for the data view you are working with. The filter only includes hits that carry your chosen value in that custom dimension. You will see that from now on, Measurement Protocol spam will not show up in your data anymore, because the spammers don’t know your password and Google Analytics filters out the hits they create.
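The effect of this include filter can be simulated in a few lines. This sketch only illustrates the filtering logic and does not use any Google Analytics API. The dimension index (cd1), the value “my-own-hits” and the sample hits are assumptions made up for the example.

```python
# Hypothetical identification value and custom dimension slot.
SECRET_VALUE = "my-own-hits"
DIMENSION_KEY = "cd1"


def include_filter(hits, key=DIMENSION_KEY, value=SECRET_VALUE):
    """Keep only hits whose custom dimension carries the agreed value."""
    return [hit for hit in hits if hit.get(key) == value]


hits = [
    # Sent by your own Google Tag Manager tags: carries the value.
    {"t": "pageview", "dp": "/pricing", "cd1": "my-own-hits"},
    # Measurement Protocol spam: the spammer does not know the value.
    {"t": "pageview", "dp": "/", "dr": "http://spam-domain.example"},
    # Spam that sets the dimension, but to a wrong guess.
    {"t": "event", "ec": "spam-domain.example", "cd1": "guess"},
]

clean = include_filter(hits)
print(len(clean), clean[0]["dp"])
```

Only the first hit survives, because it is the only one that carries the agreed value in the right dimension.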

If you are using the Measurement Protocol yourself to push data to Google Analytics for certain tracking features, like phone call tracking, you have to make sure you also include your identification value in those hits.
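As a rough sketch, adding the identification to your own Measurement Protocol hits means adding one more parameter. The cd&lt;index&gt; parameter is how the Measurement Protocol carries custom dimensions. The helper name, the index 1, the value “my-own-hits” and the hit values are assumptions for illustration.

```python
from urllib.parse import urlencode


def with_identification(params, index, value):
    """Return a copy of the hit with the custom dimension added.

    Hypothetical helper: 'index' must match the custom dimension
    you set up in your Google Analytics property.
    """
    tagged = dict(params)
    tagged[f"cd{index}"] = value
    return tagged


# Placeholder call tracking hit; IDs and event values are made up.
call_hit = {
    "v": "1",
    "tid": "UA-XXXXX-Y",
    "cid": "555",
    "t": "event",
    "ec": "phone",
    "ea": "call",
}

body = urlencode(with_identification(call_hit, 1, "my-own-hits"))
print(body)
```

Without the extra parameter, your own call tracking hits would be removed by the include filter just like the spam.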

As you can see, it is very easy to exclude spam traffic that is pushed into your account via the Measurement Protocol. But what about the other type of spam traffic that we have discussed? Web crawler spam traffic is much more difficult to tackle, but let’s not give up! We will now have a look at the options for tackling this type of spam traffic.

Using referral exclusion filters to eliminate crawler spam

Once you have set up the Measurement Protocol spam traffic filter solution discussed above, you will see that the amount of spam traffic in your account decreases dramatically, but some visits from obscure referrals will keep showing up. These visits actually happened on your site. They were not caused by real users, though, but by crawlers that pretend to be real users.

The solution I am using at the moment is identifying those referrals on a weekly or monthly basis (depending on the size of the account) and excluding them from the data view, custom reports and dashboards using filters. This works great, but it obviously is a pain in the arse because it takes up a lot of time, so I am looking for an automated solution.
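One small step that can be scripted is turning the list of spam referrers you have identified into a single regular expression for an exclude filter. This is a minimal sketch. The function name and the domains are invented, and the list itself still has to come from your own review of the referral report.

```python
import re


def build_exclude_pattern(domains):
    """Join spam domains into one regex, escaping the dots.

    Duplicates and empty entries are removed and the result is
    sorted, so the same list always yields the same pattern.
    """
    escaped = {
        d.strip().lower().replace(".", r"\.")
        for d in domains
        if d.strip()
    }
    return "|".join(sorted(escaped))


# Invented spam referrers collected during a monthly review.
spam_referrers = ["spam-one.example", "Spam-Two.example", "spam-one.example", ""]

pattern = build_exclude_pattern(spam_referrers)
print(pattern)

# Quick check of the pattern against a referrer value.
print(bool(re.search(pattern, "www.spam-two.example")))
```

The pattern can then be pasted into the filter pattern field of a custom exclude filter on the referral dimension.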

One tool that is worth checking out and that promises to solve the problem of having to check your referral report manually and clean it up regularly is Simo Ahava’s spam filter insertion tool. I will let you know when I have tested how well it works.

What about the Google Analytics setting “Exclude all hits from known bots and spiders”?

This standard Google Analytics feature, which can be found in the data view settings in the admin area of your Google Analytics account, is quite a useful idea, but in reality, it does not have much of an impact. You can test it yourself by creating one data view with this feature activated and another data view without this setting. You will see that the difference is marginal. Using this setting is definitely a good idea, but it leaves you far from solving the problem.

What does the solution of the future look like?

In future, I hope to find a solution that does not consist in fighting spam, but in identifying real users better. If we can identify a real user on a website by the way he or she behaves, we can look at real users only and ignore spam completely.

There are already some very nice and helpful scripts out there that measure user behaviour (and at the same time, without it being their main purpose, help us identify real users), like the brilliant Riveted by Rob Flaherty.

If we manage to develop a tool like this that works 100% reliably on all device types, we will not have to worry about spam traffic in Google Analytics anymore. We will just create segments with our real users and analyse what they are doing on our pages. I will keep you posted.
