Occasionally, you may see traffic data from unusual or unrecognized hostnames showing up in your GA reports. As an example, here are some of the top hostnames appearing for the Mason core website in the last week or so:
Hostname | Sessions |
---|---|
www2.gmu.edu | 103733 |
www.youtube.com | 112 |
www2.gmu.edu.googleweblight.com | 16 |
m.youtube.com | 10 |
translate.googleusercontent.com | 6 |
georgemasonudev.prod.acquia-sites.com | 4 |
fanyi.myyoudao.com | 2 |
web.archive.org | 2 |
webcache.googleusercontent.com | 2 |
client.ql2.com | 1 |
core.itsdev.gmu.edu | 1 |
core.preprod.gmu.edu | 1 |
jijua.gmu.edu | 1 |
m.baidu.com | 1 |
www.gmu.edu | 1 |
Obviously only one, www2.gmu.edu, should be expected.
How does your website compare? You can see the same data for your website using this GA custom report link: https://analytics.google.com/analytics/web/template?uid=RmbU4c2nSxOXUk1HGIlOfw
So why are unknown websites showing up in the analytics?
There are a couple of important points to note:
1) You can use your Google Analytics ID number on any website, and/or on multiple websites.
2) GA property IDs are public, if you know where to look. Anyone can see them just by going to your website and viewing the page source. Therefore anyone can, should they so choose, send data to any GA property just by looking up the property ID, or even by using random property IDs.
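To illustrate the second point, here is a minimal Python sketch of how easily a property ID can be read out of a page's source. It assumes the `UA-XXXXXX-Y` ID format used by Universal Analytics; the sample HTML, the ID `UA-12345678-1`, and the helper name `find_ga_ids` are made up for illustration.

```python
import re

# Universal Analytics property IDs look like "UA-<account>-<property>".
# The digit-count bounds below are a loose assumption, not an official spec.
GA_ID_PATTERN = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def find_ga_ids(html: str) -> list[str]:
    """Return the distinct GA property IDs found in html, in order of first appearance."""
    seen = []
    for match in GA_ID_PATTERN.findall(html):
        if match not in seen:
            seen.append(match)
    return seen


# Hypothetical page source containing a made-up property ID.
sample_html = """
<script>
  ga('create', 'UA-12345678-1', 'auto');
  ga('send', 'pageview');
</script>
"""

print(find_ga_ids(sample_html))
```

Nothing here requires special access: the ID sits in plain text in the page that every visitor downloads.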
Why Would Your Google Analytics Property ID Number be used on Different Websites?
- YouTube: You have the ability to configure your YouTube channel to include your Google Analytics property ID. If so, you will see hits to your analytics from visits to your YouTube channel.
- Translation services: these are web services which proxy a website and modify the content before it reaches the user in order to translate it into a different language. Typically these services will not strip out your GA JavaScript code, so you will see them in your GA reports. Because these hits represent people who really are visiting your site, I consider this legitimate traffic.
- Mobile optimization services: services like Googleweblight proxy your website in order to optimize it for delivery over low-bandwidth connections. Typically these services will not strip out your GA JavaScript code, so you will see them in your GA reports. Because these hits represent people who really are visiting your site, I consider this legitimate traffic.
- Phishing/Spam/Malware: an outside party sends data to your GA property in the hope that you will be curious about the hostname and visit the website. Upon visiting the website, you may be exposed to malware or other undesirable content. Use caution (and/or countermeasures) when considering whether to visit unusual-looking URLs that you don’t recognize.
- Novelty websites: sites like Meowbify (which replaces all images with pictures of cats) or Gizoogle (which modifies website text to make it appear in a certain kind of ‘colloquial’ language) let users generate links that proxy the specified website while modifying it for comedic effect. Often, these websites don’t bother to strip out the GA JavaScript code, so data will be sent to your GA property from these other websites. While these hits do represent actual visits by a user, they are unlikely to be ‘legitimate’ traffic; if someone wanted to visit your website for its intended purpose, it is unlikely that they would do so this way.
- Code ‘borrowing’: the web being what it is (open), there is nothing stopping people from seeing a website they like and copy-pasting the HTML source to a new website to play with. Again, in many cases, people don’t think to, or don’t bother to, remove the JavaScript code that implements GA. Hits from such copies are therefore unlikely to represent ‘legitimate’ traffic.
Here are the examples from above, with more information:
Hostname | Sessions | Type | Notes |
---|---|---|---|
www2.gmu.edu | 103733 | expected | |
www.youtube.com | 112 | YouTube | Our GA property ID was added to our YouTube channel intentionally to collect analytics data. |
www2.gmu.edu.googleweblight.com | 16 | optimization service | googleweblight |
m.youtube.com | 10 | YouTube (mobile) | Our GA property ID was added to our YouTube channel intentionally to collect analytics data. |
translate.googleusercontent.com | 6 | translation service | users viewing your website through the Google translation tool |
georgemasonudev.prod.acquia-sites.com | 4 | test server | |
fanyi.myyoudao.com | 2 | translation service | |
web.archive.org | 2 | archive service | users viewing an archived version of our website |
webcache.googleusercontent.com | 2 | caching service | users viewing the cached version of your website from the Google search results |
client.ql2.com | 1 | ||
core.itsdev.gmu.edu | 1 | test server | |
core.preprod.gmu.edu | 1 | test server | |
jijua.gmu.edu | 1 | test server | |
m.baidu.com | 1 | ||
www.gmu.edu | 1 | test server | |
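To put these numbers in proportion, the following Python sketch tallies the sessions from the table above by type. The rows are copied from the table; the label `unclassified` for hostnames with an empty Type cell is an assumption made here for illustration.

```python
from collections import defaultdict

# (hostname, sessions, type) rows copied from the table above.
# An empty type means the table left the hostname unclassified.
rows = [
    ("www2.gmu.edu", 103733, "expected"),
    ("www.youtube.com", 112, "YouTube"),
    ("www2.gmu.edu.googleweblight.com", 16, "optimization service"),
    ("m.youtube.com", 10, "YouTube (mobile)"),
    ("translate.googleusercontent.com", 6, "translation service"),
    ("georgemasonudev.prod.acquia-sites.com", 4, "test server"),
    ("fanyi.myyoudao.com", 2, "translation service"),
    ("web.archive.org", 2, "archive service"),
    ("webcache.googleusercontent.com", 2, "caching services"),
    ("client.ql2.com", 1, ""),
    ("core.itsdev.gmu.edu", 1, "test server"),
    ("core.preprod.gmu.edu", 1, "test server"),
    ("jijua.gmu.edu", 1, "test server"),
    ("m.baidu.com", 1, ""),
    ("www.gmu.edu", 1, "test server"),
]

# Sum sessions per type.
totals = defaultdict(int)
for _hostname, sessions, kind in rows:
    totals[kind or "unclassified"] += sessions

total = sum(totals.values())
other = total - totals["expected"]

for kind, sessions in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{kind:22} {sessions:>7}")
print(f"{other} of {total} sessions ({other / total:.2%}) came from other hostnames")
```

In this sample, hostnames other than www2.gmu.edu account for 160 of 103893 sessions, or about 0.15%.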
How to Filter This Traffic
You can set up filters on your GA views that include only the hostnames you specifically intend to collect data from (a whitelist) and exclude everything else. This keeps data from undesirable sources out of the filtered view. Be careful when implementing this, and be sure to maintain an unfiltered view in each GA property so that you can check that you are not blocking legitimate data.
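Hostname filters of this kind are usually written as a regular expression. As a rough sketch, the Python below builds an anchored pattern from a list of allowed hostnames and checks it against a few hostnames from the table. The helper name `build_hostname_pattern` and the choice of allowed hosts are assumptions for illustration; verify the resulting pattern against your unfiltered view before relying on it.

```python
import re


def build_hostname_pattern(allowed_hosts):
    """Build a regex that matches only the exact hostnames given."""
    # Escape each hostname so that "." matches a literal dot, and anchor
    # the pattern so that proxies such as
    # "www2.gmu.edu.googleweblight.com" are not matched by accident.
    escaped = sorted(re.escape(host) for host in allowed_hosts)
    return "^(" + "|".join(escaped) + ")$"


# Only the expected hostname is allowed here; add others (for example
# the YouTube hostnames) if you consider their traffic legitimate.
allowed_hosts = ["www2.gmu.edu"]
pattern = build_hostname_pattern(allowed_hosts)
print(pattern)

# Try the pattern against a few hostnames from the table.
for hostname in ["www2.gmu.edu", "www2.gmu.edu.googleweblight.com", "m.baidu.com"]:
    verdict = "keep" if re.search(pattern, hostname) else "drop"
    print(f"{hostname}: {verdict}")
```

Escaping and anchoring matter: an unescaped, unanchored `www2.gmu.edu` would also match the googleweblight proxy hostname, since it contains the expected hostname as a substring.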