How to Remove Referrer Spam in Google Analytics
Spam in Google Analytics (GA) is becoming a serious pain. Many users are confused and don’t know how to deal with it. Over the last couple of years, we’ve seen some pretty weird things show up in our Google Analytics reports, but nothing like the spam being sent now. Not only does this new spam arrive in the language field instead of as the usual referrer spam, it also uses a fake “secret” Google domain and even carries a message supporting Trump in the recent election!
Until now I have been using plenty of filters in Google Analytics to stop this spam, and the .htaccess file on my server has grown huge just to block this crap. Today I found a better way to remove referrer spam in Google Analytics.
Lucky you! Continue reading to find out how to fix it. Just be careful: you need intermediate Google Analytics skills to get things right.
So let’s go, time to remove some crap spam from your Analytics reports!
1. Create Your Hostname Filter for Ghost Spam
Your hostname filter will prevent most of the spam from:
- share-button sites
- fake compliance cookie sites
- site-auditor
- spammers impersonating legit sites
- and most of the “secret.Google.com” language spam.
This filter is built from your own hostnames, so as long as you add all of them you won’t exclude any real traffic. The main characteristic of ghost spam is that it never visits your site. Instead, it uses the Measurement Protocol to send hits directly to your Google Analytics property. For that reason, this type of spam always leaves either a fake hostname or an undefined one, which appears as (not set) in your reports.
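To make the idea concrete, here is a minimal Python sketch of what such a hit looks like from the receiving side. The payload string, property ID, and hostname are made up for illustration; only the field names (dh for hostname, ul for language) come from the Measurement Protocol. Nothing is sent anywhere, the snippet only parses the string.

```python
from urllib.parse import parse_qs

# Hypothetical Measurement Protocol payload, similar in shape to what a
# ghost spammer could send. All values here are invented.
payload = (
    "v=1&tid=UA-XXXXX-Y&cid=555&t=pageview"
    "&dh=fake-host.example&dp=%2F&ul=secret.example.com%20spam%20message"
)

def hostname_of(hit_payload: str) -> str:
    """Return the hostname a hit claims, or "(not set)" when dh is missing."""
    fields = parse_qs(hit_payload)
    return fields.get("dh", ["(not set)"])[0]

# The sender decides what goes into dh, so it is rarely your real hostname.
print(hostname_of(payload))
# A hit without dh shows up with an undefined hostname.
print(hostname_of("v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dp=%2F"))
```

Since the spammer usually doesn’t know your real hostname, filtering on it is enough to drop these hits.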
Find your Hostnames
To get to the list of hostnames you should go to the network report in your Analytics and select the tab “Hostnames” at the top of the reports. Make a list of all the valid ones.
Build Your Hostname Regex
Once you have the list of all your hostnames, create a regular expression (REGEX) that contains all of them. It is important to add every relevant hostname, or you run the risk of losing valid data. Here is an example REGEX that covers a bare domain, its subdomains, and a domain containing hyphens:
tomrobakphotography\.com|cdn\.tomrobakphotography\.com|www\.tomrobakphotography\.com|sample\-domain\-tomrobak\.com
A few tips:
- To separate each hostname, use a bar or pipe character |;
- The dot . is a special character in REGEX, so add a backslash \ before it. The hyphen - is only special inside square brackets, but escaping it as well is harmless;
- Don’t leave any spaces;
- The REGEX has a limit of 255 characters;
- Don’t add a pipe/bar | at the beginning or the end of the expression, because the empty alternative it creates matches every hostname.
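If you have many hostnames, you can let a small script follow the tips above for you. This is only a sketch: the function name build_hostname_regex and the constant GA_PATTERN_LIMIT are my own, and the 255 limit is the one mentioned in the tips.

```python
import re

GA_PATTERN_LIMIT = 255  # character limit mentioned in the tips above

def build_hostname_regex(hostnames):
    """Join hostnames with | after escaping REGEX special characters."""
    cleaned = [h.strip().lower() for h in hostnames if h.strip()]
    # dict.fromkeys drops duplicates while keeping the original order
    unique = list(dict.fromkeys(cleaned))
    # re.escape adds the backslash before dots and hyphens
    pattern = "|".join(re.escape(h) for h in unique)
    if len(pattern) > GA_PATTERN_LIMIT:
        raise ValueError(
            f"pattern is {len(pattern)} characters, limit is {GA_PATTERN_LIMIT}"
        )
    return pattern

print(build_hostname_regex([
    "tomrobakphotography.com",
    "cdn.tomrobakphotography.com",
    "www.tomrobakphotography.com",
    "sample-domain-tomrobak.com",
]))
```

Joining with "|" between items means there is never a pipe at the beginning or the end, and empty entries are skipped before joining.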
Create the Valid Hostname Filter
Once you are sure the expression is correct, you can create a filter to get rid of all ghost spam.
- Go to the Admin tab, and select the view where you want to apply the filter
- Select Filters under the View column, and select + Add Filter
- Enter “Valid Hostnames” as a name
- In Filter Type, select Custom
- Make sure you choose Include and select Hostname from the dropdown.
- Copy and paste the hostname expression that you’ve built into the Filter Pattern box.
- After making sure your filter is ok, click Save.
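Before you save, you can check the expression offline against a few hostnames from your report. The sketch below assumes the filter does a partial match (the pattern may match anywhere in the hostname), which I mimic with re.search; the sample hostnames are made up.

```python
import re

# Example expression from the "Build Your Hostname Regex" step
VALID_HOSTNAMES = (
    r"tomrobakphotography\.com|cdn\.tomrobakphotography\.com"
    r"|www\.tomrobakphotography\.com|sample\-domain\-tomrobak\.com"
)

def passes_include_filter(hostname, pattern=VALID_HOSTNAMES):
    """True if an Include filter with this pattern would keep the hit."""
    # re.search looks for the pattern anywhere in the hostname (assumed behaviour)
    return re.search(pattern, hostname) is not None

sample_hostnames = [
    "www.tomrobakphotography.com",
    "(not set)",
    "secret.google.com",
    "sample-domain-tomrobak.com",
]
kept = [h for h in sample_hostnames if passes_include_filter(h)]
print(kept)
```

Only the two real hostnames survive. If one of your valid hostnames is missing from the kept list, fix the expression before applying the filter, since filtered hits can’t be recovered.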
2. Creating a Filter for Crawler and Language Spam in Google Analytics
Crawler spam is much harder to detect since it uses a valid hostname, so you’ll need a different filter with an expression that matches all known crawler spam. To save you some time, we will use an optimised REGEX for crawler spam that you’ll find below in the instructions, or if you prefer, it can be built the same way as the valid hostname expression. This time, you will use the source (referral) name.
- Go to the Admin tab.
- Under the last column “VIEW”, select Filters and then click + Add Filter
- Enter “Crawler Spam Filter” as a name.
- Filter Type > Custom > Exclude
- Filter Field > Campaign Source
- Filter Pattern > Paste the following crawler spam expression
The following expressions are optimised to block all crawler spam detected over the last couple of years.
Create one filter for each expression, since each filter pattern is limited to 255 characters:
# Expression 1
(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|uptime(bot|check|\.com)
# Expression 2
datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter
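If you want to see what the two expressions above catch before pasting them, here is a small Python sketch. The function name is_crawler_spam and the sample sources are my own, and re.search is used on the assumption that the filter matches the pattern anywhere in the source.

```python
import re

EXPRESSION_1 = (
    r"(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler"
    r"|^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic"
    r"|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton"
    r"|uptime(bot|check|\.com)"
)
EXPRESSION_2 = (
    r"datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video"
    r"|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank"
    r"|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter"
)

def is_crawler_spam(source):
    """True if either Exclude filter would drop a hit with this source."""
    return any(re.search(expr, source) for expr in (EXPRESSION_1, EXPRESSION_2))

# Both expressions must stay within the 255-character pattern limit
for name, expr in (("Expression 1", EXPRESSION_1), ("Expression 2", EXPRESSION_2)):
    print(name, len(expr), "characters")

for source in ("semalt.com", "buttons-for-website.com", "google.com"):
    print(source, is_crawler_spam(source))
```

Note that the real google.com is not matched: the ɢoogl entry uses a look-alike letter that spammers use to imitate the Google domain.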
# Expression 3 – FOR LANGUAGE SPAM
Follow the same steps, but in Filter Field select “Language Settings” instead of “Campaign Source”
\s[^s]*\s|.{15,}|\.|,
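The language expression works on a different idea: real language values are short codes, while the spam is a long message with dots or commas in it. Here is a rough Python check of that, using the expression exactly as given above; the sample values are made up and re.search is again my assumption for how the filter matches.

```python
import re

# Language spam expression, copied as-is from above
LANGUAGE_SPAM = r"\s[^s]*\s|.{15,}|\.|,"

def is_language_spam(language):
    """True if the Exclude filter would drop a hit with this language value."""
    return re.search(LANGUAGE_SPAM, language) is not None

for value in ("en-us", "pt-br", "secret.google.com", "en,us"):
    print(repr(value), is_language_spam(value))
```

Normal codes such as en-us pass untouched, while anything containing a dot, a comma, or 15 or more characters is treated as spam.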
3. Enable “Exclude all hits from known bots and spiders”
There are many other crawlers around that are not spam but are not useful for your reports either, for example the ones crawling your site for indexing. These bots will leave a record in your reports if they are not excluded. This case is a bit easier because Google Analytics has a built-in feature to exclude this traffic: go to Admin, open View Settings, and tick the “Exclude all hits from known bots and spiders” box under Bot Filtering.
4. Clean up Historical Spam Data in Google Analytics
The spam that is already stored in your Analytics (or any data for that matter) can’t be permanently deleted. That is why it is important to create the filters to stop receiving junk traffic. However, you can still clean your past data affected by spam by using the valid hostname expression you built previously and an advanced segment.
To eliminate the spam from your Google Analytics historical data you will have to create an advanced segment:
- In the Reporting section, click the box that says All Users (at the top of the graph). Next click the red button +New Segment
- In the segment window, almost at the bottom, click Conditions
- First condition:
Filter > Sessions > Include
Dropdown 1 > Hostname
Dropdown 2 > matches regex
Text box > Paste the Hostname Expression that you previously used for the filter
- Click +Add Filter at the bottom to add a new condition.
- Second condition:
Filter > Sessions > Exclude
Dropdown 1 > Source
Dropdown 2 > matches regex
Text box > Paste the Crawler Spam expression
(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\.|\-gratis|semalt|forum69|7make|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|\-crew|uptime(bot|check|\.com)|datract|hacĸer|ɢoogl|responsive\-test|torrent\-to|magnet\-to|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter
- Click the button Or to the left of the condition you just configured
- Third condition (to exclude the new language spam):
Dropdown 1 > Language
Dropdown 2 > matches regex
Text box > Paste the Anti-Language Spam expression
\s[^s]*\s|.{15,}|\.|,
- Enter your segment name and Save.
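Put together, the segment keeps a session only if its hostname is valid and neither its source nor its language looks like spam. Here is that logic as a Python sketch, in case you want to clean an exported report the same way. The session dictionaries and the function name in_clean_segment are hypothetical, and the hostname expression is the example one from step 1.

```python
import re

HOSTNAMES = (
    r"tomrobakphotography\.com|cdn\.tomrobakphotography\.com"
    r"|www\.tomrobakphotography\.com|sample\-domain\-tomrobak\.com"
)
CRAWLERS = (
    r"(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler"
    r"|^scripted\.|\-gratis|semalt|forum69|7make|sharebutton|ranksonic"
    r"|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|\-crew"
    r"|uptime(bot|check|\.com)|datract|hacĸer|ɢoogl|responsive\-test"
    r"|torrent\-to|magnet\-to|dogsrun|tkpass|free\-video|keywords\-monitoring"
    r"|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web"
    r"|share\-buttons|99seo|3\-letter"
)
LANGUAGE = r"\s[^s]*\s|.{15,}|\.|,"

def in_clean_segment(session):
    """Include valid hostnames, then exclude spam sources OR spam languages."""
    valid_host = re.search(HOSTNAMES, session["hostname"]) is not None
    spam_source = re.search(CRAWLERS, session["source"]) is not None
    spam_language = re.search(LANGUAGE, session["language"]) is not None
    return valid_host and not (spam_source or spam_language)

# Hypothetical sessions, shaped like rows of an exported report
sessions = [
    {"hostname": "www.tomrobakphotography.com", "source": "google", "language": "en-us"},
    {"hostname": "(not set)", "source": "google", "language": "en-us"},
    {"hostname": "www.tomrobakphotography.com", "source": "semalt.com", "language": "en-us"},
    {"hostname": "www.tomrobakphotography.com", "source": "(direct)", "language": "secret.google.com"},
]
clean = [s for s in sessions if in_clean_segment(s)]
print(len(clean), "of", len(sessions), "sessions kept")
```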
—
After saving the segment, you will be able to see spam-free reports, as long as the segment is selected. Eventually, the filters will do their work, and you won’t need to use the segment anymore. If this article helped you, please consider sharing it or leaving a comment with your experience. It may help other people!
Source: carlseo