Google Analytics is one of the leading platforms news organizations use to gather data on their online content. It gives us tons of data, it’s highly configurable and, best of all, it’s free.
Google Analytics can be a powerful tool, but if you don’t pay attention to certain details, you’ll end up with misleading and inaccurate metrics.
Below is a list of some of the more common Google Analytics issues that I’ve come across, descriptions of the problems they create, and solutions for how to fix them. This is in no way an exhaustive list of the things that can go wrong with your Google Analytics data collection. But it’s a pretty good start.
A redundant hostname occurs when Google sees that you have two domains, both with the same tracking code, serving up the same content. A common example is when you have traffic coming from yoursite.com and www.yoursite.com.
The result of a redundant hostname is that your content analytics get split up between the different hostnames, so visits to www.yoursite.com/contact are potentially counted separately from visits to yoursite.com/contact. Segmentation like this lowers the numbers that you’re seeing from your content.
Another issue comes from how Google treats duplicate content. If Google’s search crawler sees www.yoursite.com and yoursite.com serving up the same pages, it may treat them as duplicate content, which can hurt your search ranking.
First, decide if you would rather be www.yoursite.com or yoursite.com. This is totally a matter of preference. If you choose www.yoursite.com, users who try to go to yoursite.com will simply be redirected, almost instantaneously, to www.yoursite.com. If you choose yoursite.com, the redirect works the other way around.
Second, figure out how to edit your website’s .htaccess file. I would suggest talking to your web administrator or your web host management company, or doing a simple Google search. Ex: “mediatemple edit .htaccess”.
Before you edit your .htaccess file, make a copy of it and store it somewhere offline. That way, if you accidentally screw something up, you can simply restore the old file.
Once you’re in the .htaccess file, make sure this line of code exists:
If it doesn’t, add it. Next, if you want your URL to INCLUDE www, add these lines:
Alternatively, if you want your URL to EXCLUDE the www, add these lines:
You must choose EITHER the include www script OR the exclude www script. Adding both will likely make your website inaccessible.
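The exact .htaccess rules depend on your server setup, so treat the following as a rough Python model of what either redirect does rather than something to paste into your server config. The function name `canonical_url` and the `use_www` flag are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str, use_www: bool) -> str:
    """Return the URL a visitor should end up on after the redirect.

    Hypothetical helper: it only models the www / non-www choice.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Strip a leading "www." so we always start from the bare hostname.
    bare = host[4:] if host.startswith("www.") else host
    target = f"www.{bare}" if use_www else bare
    return urlunsplit((parts.scheme, target, parts.path, parts.query, parts.fragment))


# INCLUDE www: the bare hostname is sent to the www version.
print(canonical_url("http://yoursite.com/contact", use_www=True))
# EXCLUDE www: the www hostname is sent to the bare version.
print(canonical_url("http://www.yoursite.com/contact", use_www=False))
```

Whichever option you pick, every request ends up on a single hostname, which is why you only ever apply one of the two rules.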
Subdomain issues occur when you have a top-level domain and one or more subdomains that use the same GA code. For example, when yoursite.com and sub.yoursite.com are both sharing the same GA tracking code.
Google Analytics content is, by default, listed by its URI; that is, everything in the URL that comes after the domain name (http://yoursite.com/this-is-the-uri). If you’re tracking both your top-level domain and your subdomain with the same GA code, then pages with different domains but the same URIs will be counted as the same content. Example: yoursite.com/about and sub.yoursite.com/about
If you wanted to look at the metrics for yoursite.com/about without looking at the metrics for sub.yoursite.com/about, you would have to change your report’s primary dimension to something other than the default.
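To see why this matters, here is a small Python sketch with made-up pageview data. It is not how Google Analytics stores anything internally; it just counts the same hits two ways, once by URI alone and once by hostname plus URI.

```python
from collections import Counter

# Made-up pageviews as (hostname, request URI) pairs.
hits = [
    ("yoursite.com", "/about"),
    ("sub.yoursite.com", "/about"),
    ("yoursite.com", "/about"),
    ("sub.yoursite.com", "/contact"),
]

# Counting by URI only: both /about pages collapse into one row.
by_uri = Counter(uri for _host, uri in hits)

# Counting by hostname + URI keeps the two sites apart.
by_full_path = Counter(host + uri for host, uri in hits)

print(by_uri["/about"])                        # 3
print(by_full_path["yoursite.com/about"])      # 2
print(by_full_path["sub.yoursite.com/about"])  # 1
```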
Setting up a filter to more precisely identify your content is an easy way to solve this subdomain issue. You should also set up a unique view for your subdomain, in case you want to be able to quickly look at metrics for just that subdomain.
Create a filter that shows the full path of your content:
These instructions assume:
- You’re using Universal Analytics code (analytics.js)
- Your top-level domain is called yoursite.com → To implement this code, make sure to replace all instances of yoursite.com with your actual top-level domain address
- Your subdomain is called sub.yoursite.com
- Add the same tracking code to your subdomain that you have on your top-level domain
- On the Google Analytics platform, go to Admin > Account > All Filters > Add Filter
- Include these filter settings
- Give your new filter a name you’ll understand later
- Filter Type: Custom > Advanced
- Field A -> Extract A:
- Select Field: Hostname
- Value: (sub.yoursite.com) *include the parentheses
- Field B -> Extract B:
- Select Field: Request URI
- Value: (.*) *include parentheses
- Output To -> Constructor
- Select Field: Request URI
- Value: $A1$B1
- Field A Required: Checked
- Field B Required: Unchecked
- Override Output Field: Checked
- Case Sensitive: Unchecked
- Apply this filter to your master view (the view that shows all data from all of your sites)
- Test this by visiting your subdomain (or having a friend visit if your IP is being excluded) and checking out Real Time analytics. You should see the full URL of the subdomain represented there
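If it helps to see what those filter settings do to a single hit, here is a minimal Python model. It assumes the Field A and Field B values behave like ordinary regular-expression searches and that `$A1` and `$B1` are the first captured groups; the function name `apply_advanced_filter` is mine, not Google’s.

```python
import re


def apply_advanced_filter(hostname: str, request_uri: str) -> str:
    """Return the Request URI after the filter runs (toy model only)."""
    # Field A: Hostname, using the value from the settings above.
    # Case Sensitive is unchecked, so match case-insensitively.
    match_a = re.search(r"(sub.yoursite.com)", hostname, re.IGNORECASE)
    if match_a is None:
        # Field A Required is checked: hits from other hosts are left alone.
        return request_uri

    # Field B: Request URI, capturing the whole thing.
    match_b = re.search(r"(.*)", request_uri, re.IGNORECASE)
    b1 = match_b.group(1) if match_b else ""

    # Output To: Request URI, built as $A1$B1.
    return match_a.group(1) + b1


print(apply_advanced_filter("sub.yoursite.com", "/about"))  # sub.yoursite.com/about
print(apply_advanced_filter("yoursite.com", "/about"))      # /about
```

Only subdomain hits get the hostname prepended, so top-level pages keep their plain URIs and the two /about pages no longer share a row.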
Now that you have your subdomain separated from your top-level domain, you’ll want to create a view for all subdomain data so that you can easily view metrics from that subdomain.
Creating a new view for your subdomain:
- On the GA platform, go to Admin > View and click the current view dropdown
- Select Create New View
- Note: you can only have 25 views for any given property
- Configure your view:
- Add a name that indicates what the view is going to display, example: sub.yoursite.com
- Choose the kind of data it should track: website or mobile app
- Set the reporting timezone. Note: make sure reporting timezones are consistent between the views you have on your property. Otherwise, you’ll end up with some funky report numbers
- Click Create View to finish creating your view
- Once you’ve created your view, you need to create a filter that tells the view to only collect information on the subdomain you’re trying to track
- With your newly created view set as the current view, go to Admin > View > Filters > Add Filter
- Choose method to apply filter to view: Create new Filter
- Filter Name: Include only traffic from sub.yoursite.com
- Filter Type: Predefined
- Select Filter Type: Include only
- Select Source or destination: Traffic to the hostname
- Select expression: that contains
- Hostname: sub.yoursite.com
- Save your filter
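As a rough picture of what this include-only filter does, the Python sketch below keeps only the hits whose hostname contains the expression. The hit data is invented, and comparing in lower case is my assumption about how the matching behaves.

```python
def include_hit(hostname: str, expression: str = "sub.yoursite.com") -> bool:
    """Toy version of 'Include only traffic to the hostname that contains ...'."""
    # Assumption: hostnames are compared without regard to case.
    return expression.lower() in hostname.lower()


# Invented hits for illustration.
hits = [
    {"hostname": "sub.yoursite.com", "uri": "/about"},
    {"hostname": "yoursite.com", "uri": "/about"},
    {"hostname": "sub.yoursite.com", "uri": "/contact"},
]

kept = [hit for hit in hits if include_hit(hit["hostname"])]
print(len(kept))  # 2
```

One thing to watch: a “contains yoursite.com” expression would also match sub.yoursite.com, so a view meant for only the top-level domain needs a stricter match than “that contains”.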
This view will now be listed under your property along with your master view. You might also want to create a view that shows only traffic to your top-level domain.
- It’s worth noting that adding filters to a view doesn’t just hide certain content; it actually prevents that data from being collected in the view. This is important because if you create a view that filters out content from Australia (for whatever reason), you cannot simply remove that filter and get all the historical data from Australia. For the period of time that the filter was in place, you will have no data from Australia. This is why Google suggests that you retain the integrity of your master view by imposing no filters or restrictions on it.
- It’s also important to note that this solution is only acceptable when you’re talking about top-level domains and subdomains. When you start talking about tracking between multiple top-level domains, i.e. yoursite.com and yourothersite.com, you need to implement something called Cross-domain tracking.
Cross-domain tracking issues occur when you have multiple top-level domains that you’re trying to track with a single GA tracking code.
Examples include sending people from your site to another site to sign up for a newsletter or make a donation, and tracking content that will be embedded on another site via iframe.
Unlike tracking content between a top-level domain and a subdomain, the security measures in place between top-level domains prevent data collection in the way that you would like. The result is that multiple records are created for single users, creating inconsistent and erroneous user data.
For example, if a single user travels from yoursite.com to yourothersite.com, both of which use the same GA tracking code, a new user session is created for the user on yourothersite.com. In addition to this skewing session and visitor numbers, it makes tracking conversions and goals much harder if not impossible.
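Here is a toy Python model of that situation. It assumes each domain keeps its own first-party cookie holding a client ID, and that cross-domain tracking works by handing the first site’s ID over to the second site. The `Browser` class and `count_users` function are hypothetical names for illustration, not part of any Google library.

```python
import uuid
from typing import Dict, Optional


class Browser:
    """Toy browser with one client-ID cookie per domain."""

    def __init__(self) -> None:
        self.cookies: Dict[str, str] = {}

    def visit(self, domain: str, linked_client_id: Optional[str] = None) -> str:
        # A new domain has no cookie yet, so an ID is created there,
        # unless an ID was passed along from the previous site.
        if domain not in self.cookies:
            self.cookies[domain] = linked_client_id or uuid.uuid4().hex
        return self.cookies[domain]


def count_users(linked: bool) -> int:
    """Count distinct client IDs for one person visiting both sites."""
    browser = Browser()
    first_id = browser.visit("yoursite.com")
    second_id = browser.visit(
        "yourothersite.com",
        linked_client_id=first_id if linked else None,
    )
    return len({first_id, second_id})


print(count_users(linked=False))  # 2: one person counted as two users
print(count_users(linked=True))   # 1: the same ID follows the person
```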
Here are the steps you’ll need to take in order to set up proper cross-domain tracking:
- Add the same tracking code to each site
- Set up cross-domain tracking by modifying each tracking code (Google’s explanation is pretty spot on)
- Create a new view
- We do this step because we want to leave the master view free of filters and other modifications
- Add one filter for each domain you’re tracking to that new view. Each filter should be configured to show the full URL of each domain.
- You can follow instructions listed under Create a filter that shows the full path of your content above for each filter
- (optional) Set up views for each individual domain you’re tracking so you can see domain specific analytics quickly
- (optional) If the domains you’re tracking traffic between are showing up as “Referral sites”, you can exclude them by going to Admin > Property > Tracking Info > Referral Exclusion List > Add Referral Exclusion. Add an exclusion record for each domain that’s showing up as a referrer
Filtering Internal Traffic
Internal traffic contamination occurs when people on staff are being counted as users.
It’s safe to say that you and other company staff are interacting with the company website very differently than normal users.
Beyond inflating metrics like time on site and sessions, internal traffic can skew patterns and trends that might otherwise help you understand how users interact with your content.
The solution to this issue can be complicated by several things:
- Your office is on a dynamic IP
- Your company works remotely
If your company is all in one place, the simplest way to exclude internal traffic is to create a filter that excludes a range of IP addresses. Google has very good documentation on how to get that up and running.
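The idea behind an IP-range exclusion can be sketched in a few lines of Python using the standard `ipaddress` module. The office range here (203.0.113.0/24) is a reserved documentation range standing in for your real one, and the hit data is invented.

```python
import ipaddress

# Hypothetical office range; swap in your own network.
OFFICE_NETWORK = ipaddress.ip_network("203.0.113.0/24")


def is_internal(ip: str) -> bool:
    """True when the visitor's IP falls inside the office range."""
    return ipaddress.ip_address(ip) in OFFICE_NETWORK


# Invented hits for illustration.
hits = [
    {"ip": "203.0.113.25", "uri": "/about"},   # someone in the office
    {"ip": "198.51.100.7", "uri": "/about"},   # outside reader
    {"ip": "203.0.113.200", "uri": "/admin"},  # someone in the office
]

external_hits = [hit for hit in hits if not is_internal(hit["ip"])]
print(len(external_hits))  # 1
```

This also shows why a dynamic office IP is a problem: once your address moves outside the range you configured, your staff traffic is counted again.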
If your company is remote or spread out, you’ll need to get a little bit fancier. LunaMetrics, a search marketing company, has a pretty extensive list of all the different ways you can try to exclude internal traffic, including:
- Automatically setting custom dimensions on staff devices and browsers when they visit a specific, staff-only web page; then adding a filter that excludes traffic with that custom dimension
- Pros: Automatic is good because you only need to ensure that staffers visit the web page you tell them to
- Cons: They will need to visit the established page on each browser and device that they use to access your site; there is a bit of technical coding involved in setting this up; if your team ever clears their browser cookies they will need to revisit the page to have their custom dimension reset
- Creating an “Opt-out” page that will allow staffers to manually set the custom dimension on their devices and browsers
- Pros and cons: This is essentially the same as the first option, except staffers are manually setting the custom dimension which undoubtedly makes it that much less likely that they will remember to do it
- Instructing staffers to use a browser plugin that will exclude them from being counted
- Pros: This gives staffers a visible indication of, and control over, whether or not they’re being included in your Google Analytics data
- Cons: These extensions exclude people not only from your own Google Analytics data, but from any Google Analytics data, so if they forget to turn the plugin off once they leave your site, it’s impacting everyone else’s website data. Also, it’s highly unlikely that anyone, even the most considerate of us, will remember to switch the plugin on and off every time they enter and leave your site.
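The custom dimension options above all come down to the same filtering step, sketched here in Python. The slot name `dimension1` and the value `staff` are placeholders I picked for illustration, and the hits are invented.

```python
from typing import Dict, List

# Hypothetical custom dimension slot and value that mark a staff device.
STAFF_DIMENSION = "dimension1"
STAFF_VALUE = "staff"


def exclude_staff(hits: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop every hit that carries the staff custom dimension."""
    return [hit for hit in hits if hit.get(STAFF_DIMENSION) != STAFF_VALUE]


# Invented hits for illustration.
hits = [
    {"page": "/about", "dimension1": "staff"},  # staffer who visited the staff-only page
    {"page": "/about"},                         # regular reader
    {"page": "/contact"},                       # staffer who cleared cookies: flag is gone
]

print(len(exclude_staff(hits)))  # 2
```

The third hit is the catch mentioned in the cons: once the flag is lost, that staffer looks like any other reader until they revisit the page.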
Again, these are not all of the pitfalls you can encounter when using Google Analytics to collect user and site data, but they do cover many of the common issues that media organizations are likely to face.
If you’re interested in other Google Analytics issues, Kissmetrics has created a list of no fewer than 29 issues that could be affecting your data.
If you’ve encountered other Google Analytics issues that are creating problems with how your data is being reported, let us know! We’d love to know what is standing in the way of our readers and their data.
Alexandra Kanik (@act_rational) is the Metrics Editor/Curator for MediaShift.