How to Remove Referrer Spam in Google Analytics

The spam in Google Analytics (GA) is becoming a serious pain. Many users are confused and don’t know how to deal with it. Over the last couple of years we’ve seen some pretty weird things show up in our Google Analytics reports, but nothing like the spam in use now. This new spam is not only sent as a language instead of the usual referrer; it also carries a fake “secret” Google domain and even a message supporting Trump in the past election!

Until now I’ve been using plenty of filters in Google Analytics to stop this spam, and the .htaccess file on my server has grown huge just to block this crap. Today I found a better way to remove referrer spam in Google Analytics.

Lucky you! Continue reading to find out how to fix it. Just be careful: you need intermediate Google Analytics skills to get this right.

So let’s go, time to remove some crap spam from your Analytics reports!

1. Create Your Hostname Filter for Ghost Spam

Your hostname filter will prevent most of the spam from:

  • the various share-button sites
  • fake compliance cookie sites
  • site-auditor
  • spammers impersonating legit sites
  • and most of the “secret.Google.com” language spam.

This filter is based on your own hostnames, so as long as you add all of them, you won’t exclude any real traffic. The main characteristic of ghost spam is that it never visits your site. Instead, it uses the Measurement Protocol to send hits directly to your Google Analytics property. For that reason, this type of spam always leaves either a fake hostname or an undefined one, which appears as (not set) in your reports.

Find your Hostnames

To get the list of hostnames, open the Network report in your Analytics (Audience > Technology > Network) and select the tab “Hostnames” at the top of the report. Make a list of all the valid ones.

Build Your Hostname Regex

Once you have the list of all your hostnames, you have to create a regular expression (REGEX) that contains all of them. It is important that you add all your relevant hostnames, or you run the risk of losing valid data. Here is an example regex that covers a main domain, its subdomains, and a hyphenated domain:

tomrobakphotography\.com|cdn\.tomrobakphotography\.com|www\.tomrobakphotography\.com|sample\-domain\-tomrobak\.com

A few tips:

  • To separate hostnames, use a bar or pipe character |;
  • The dot . and the hyphen - are treated as special characters in REGEX, so add a backslash \ before them (strictly, the hyphen is only special inside square brackets, but escaping it is harmless);
  • Don’t leave any spaces;
  • The REGEX has a limit of 255 characters;
  • Don’t add a pipe/bar | at the beginning or the end of the expression. An empty alternative matches everything, so the filter would let all hostnames through.
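
The tips above can be applied mechanically. Here is a minimal Python sketch, assuming you have copied your hostnames into a list. The helper name `build_hostname_regex` is hypothetical and not part of Google Analytics; it only builds the text you would paste into the filter.

```python
def build_hostname_regex(hostnames, max_length=255):
    """Join hostnames into one pipe-separated pattern, escaping dots and hyphens."""
    escaped = []
    for host in hostnames:
        host = host.strip().lower()  # remove stray spaces around each entry
        if not host:
            continue  # skipping blanks avoids a leading, trailing, or double pipe
        escaped.append(host.replace(".", r"\.").replace("-", r"\-"))
    pattern = "|".join(escaped)
    if len(pattern) > max_length:
        raise ValueError(
            f"pattern is {len(pattern)} characters; the limit is {max_length}"
        )
    return pattern


# The hostnames from the example expression above.
hostnames = [
    "tomrobakphotography.com",
    "cdn.tomrobakphotography.com",
    "www.tomrobakphotography.com",
    "sample-domain-tomrobak.com",
]
print(build_hostname_regex(hostnames))
```

Run with the four hostnames shown, this should reproduce the example expression above. If the result goes over 255 characters, the function raises an error instead of silently producing a pattern that is too long.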

Create the Valid Hostname Filter

Once you are sure the expression is correct, you can create a filter to get rid of all ghost spam.

  1. Go to the Admin tab, and select the view where you want to apply the filter
  2. Select Filters under the View column, and select + Add Filter
  3. Enter “Valid Hostnames” as a name
  4. In Filter Type, select Custom
  5. Make sure you choose Include and select Hostname from the dropdown.
  6. Copy and paste the hostname expression that you’ve built into the Filter Pattern box.
  7. After making sure your filter is ok, click Save.
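
Before you click Save, it can help to test the expression offline against the hostnames you collected. The sketch below is only an approximation: it assumes the filter matches the pattern anywhere in the hostname and ignores case, and the `report` list is made-up sample data.

```python
import re

# Pattern from the example above; replace it with your own expression.
VALID_HOSTNAMES = (
    r"tomrobakphotography\.com|cdn\.tomrobakphotography\.com"
    r"|www\.tomrobakphotography\.com|sample\-domain\-tomrobak\.com"
)


def split_hostnames(pattern, hostnames):
    """Return (kept, dropped) the way an Include filter with this pattern would sort them."""
    regex = re.compile(pattern, re.IGNORECASE)
    kept = [h for h in hostnames if regex.search(h)]
    dropped = [h for h in hostnames if not regex.search(h)]
    return kept, dropped


# Made-up sample of what a hostname report might contain.
report = [
    "www.tomrobakphotography.com",
    "(not set)",
    "example-spam-host.xyz",
    "cdn.tomrobakphotography.com",
]
kept, dropped = split_hostnames(VALID_HOSTNAMES, report)
print("kept:", kept)
print("dropped:", dropped)
```

If one of your real hostnames lands in the dropped list, fix the expression before saving the filter.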

2. Creating a Filter for Crawler and Language Spam in Google Analytics

Crawler spam is much harder to detect because it uses a valid hostname, so you’ll need a different filter with an expression that matches all known crawler spam. To save you some time, the steps below include optimised expressions for crawler spam; if you prefer, you can build your own the same way as the valid hostname expression. This time, the filter works on the source (referral) name.

  1. Go to the Admin tab.
  2. Under the last column “VIEW”, select Filters and then click + Add Filter
  3. Enter “Crawler Spam Filter” as a name.
  4. Filter Type > Custom > Exclude
  5. Filter Field > Campaign Source
  6. Filter Pattern > Paste the following crawler spam expression

The following expressions are optimised to block all crawler spam detected over the last couple of years. Create one filter for each expression.

# Expression 1

(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|uptime(bot|check|\.com)

# Expression 2

datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter
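
To see what the two crawler expressions catch, you can try them against a few source names outside of Analytics. This is a rough sketch in Python: the helper `is_crawler_spam` is hypothetical, and it assumes the filter matches the pattern anywhere in the source value.

```python
import re

# Expression 1 and Expression 2, copied from above.
EXPRESSION_1 = (
    r"(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler"
    r"|^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic"
    r"|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton"
    r"|uptime(bot|check|\.com)"
)
EXPRESSION_2 = (
    r"datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video"
    r"|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank"
    r"|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter"
)


def is_crawler_spam(source):
    """True if the source matches either crawler spam expression."""
    return any(re.search(expr, source) for expr in (EXPRESSION_1, EXPRESSION_2))


for source in ["semalt.com", "buttons-for-website.com", "ɢoogle.com", "google"]:
    print(source, is_crawler_spam(source))
```

Note that the look-alike “ɢoogle.com” is caught by the special ɢ character in Expression 2, while the real “google” source is left alone.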

# Expression 3 – FOR LANGUAGE SPAM
Follow the same steps, but instead of “Campaign Source” select Language Settings as the filter field.

\s[^s]*\s|.{15,}|\.|,
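
The language expression works differently from the other two: rather than listing spam names, it flags anything that doesn’t look like a short language code. A quick sketch of that idea in Python, with made-up sample values and a hypothetical helper name:

```python
import re

# Expression 3, copied from above.
LANGUAGE_SPAM = r"\s[^s]*\s|.{15,}|\.|,"


def is_language_spam(language):
    """True if the language value contains a dot, a comma, two spaces
    matching the first alternative, or 15 or more characters."""
    return re.search(LANGUAGE_SPAM, language) is not None


for value in ["en-us", "zh-cn", "secret.google.com", "a made-up promotional message"]:
    print(repr(value), is_language_spam(value))
```

Real language codes such as “en-us” are short and contain no dots, commas, or spaces, so they pass through untouched.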

3. Enable “Exclude all hits from known bots and spiders”

There are many other crawlers around that are not spam but are not useful for your reports either, for example the ones crawling your site for indexing. These bots will leave a record in your reports if not excluded. This case is a bit easier because Google Analytics has a built-in feature to exclude this traffic: go to the Admin tab, open View Settings for your view, and tick the “Exclude all hits from known bots and spiders” checkbox under Bot Filtering.

4. Clean up Historical Spam Data in Google Analytics

The spam that is already stored in your Analytics (or any data for that matter) can’t be permanently deleted. That is why it is important to create the filters to stop receiving junk traffic. However, you can still clean your past data affected by spam by using the valid hostname expression you built previously and an advanced segment.

To eliminate the spam from your Google Analytics historical data you will have to create an advanced segment:

  1. In the Reporting section, click the box that says All Users (at the top of the graph). Next click the red button +New Segment
  2. In the segment window, almost at the bottom, click Conditions
  3. First condition:
    Filter > Sessions > Include
    Dropdown 1 > Hostname
    Dropdown 2 > matches regex
    Textbox > Paste the Hostname Expression that you previously used for the filter
  4. Click +Add Filter at the bottom to add a new condition.
  5. Second Condition:
    Filter > Sessions > Exclude
    Dropdown 1 > Source
    Dropdown 2 > matches regex
    Textbox > Paste the Crawler Spam expression (best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\.|\-gratis|semalt|forum69|7make|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|\-crew|uptime(bot|check|\.com)|datract|hacĸer|ɢoogl|responsive\-test|torrent\-to|magnet\-to|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter
  6. Click the button Or to the left of the condition you just configured
  7. Third Condition (To exclude the new language spam)
    Dropdown 1 > Language
    Dropdown 2 > matches regex
    Textbox > Paste the Anti-Language Spam expression \s[^s]*\s|.{15,}|\.|,
  8. Enter your segment name and Save.
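
The logic of this segment can be sketched in a few lines of Python: keep a session only if its hostname is valid, then drop it if either the source or the language looks like spam. This is an illustration on made-up rows, not something Analytics runs. The source pattern is a shortened stand-in for the full crawler expression, and `in_clean_segment` is a hypothetical name.

```python
import re

HOSTNAME_RE = r"tomrobakphotography\.com"      # your valid hostname expression
SOURCE_SPAM_RE = r"semalt|sharebutton|ranksonic"  # shortened stand-in, use the full expression
LANGUAGE_SPAM_RE = r"\s[^s]*\s|.{15,}|\.|,"


def in_clean_segment(row):
    """Include valid hostnames, then exclude rows whose source OR language matches spam."""
    if not re.search(HOSTNAME_RE, row["hostname"]):
        return False  # first condition: hostname must match
    if re.search(SOURCE_SPAM_RE, row["source"]):
        return False  # second condition: spam source
    if re.search(LANGUAGE_SPAM_RE, row["language"]):
        return False  # third condition, joined with Or: spam language
    return True


# Made-up sessions for illustration.
rows = [
    {"hostname": "www.tomrobakphotography.com", "source": "google", "language": "en-us"},
    {"hostname": "(not set)", "source": "google", "language": "en-us"},
    {"hostname": "www.tomrobakphotography.com", "source": "semalt.com", "language": "en-us"},
    {"hostname": "www.tomrobakphotography.com", "source": "(direct)", "language": "secret.google.com"},
]
clean = [row for row in rows if in_clean_segment(row)]
print(len(clean), "of", len(rows), "sessions kept")
```

Only the first sample row survives: the second fails the hostname check, the third has a spam source, and the fourth has a spam language.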

After saving the segment, you will be able to see spam-free reports, as long as the segment is selected. Eventually, the filters will do their work, and you won’t need to use the segment anymore. If this article helped you, please consider sharing it or leaving a comment with your experience. It may help other people!

Join our SEO-related Facebook Group for more tips and tricks and the latest updates. Source: carlseo
