Google Analytics: Identifying 404 Pages for Mason Drupal Websites

Posted: June 21, 2017 at 4:09 pm, Last Updated: June 28, 2017 at 12:38 pm

Mason’s Drupal implementation will serve any arbitrary URL, even if such a page doesn’t exist on the website.

The resulting page presents a “Page Not Found” message and returns a 404 HTTP response code, but it appears at the URL you requested; it does not (visibly) redirect to a separate 404 page.

Example: https://www2.gmu.edu/thisisnotapage

This can make it challenging to figure out how to see all of your 404 pages in Google Analytics.

HTML Page Titles

On Mason’s Drupal websites, the HTML page titles (i.e., the title tags in the HTML that show up in the browser’s tab headings, not the page titles within the website content) are typically in the form “[Node Title] | Website Title”.

For example, the core website Academics page’s HTML page title is “Academics | George Mason”.

When you go to a page on the new site that doesn’t exist, what that really means is that no such node exists. The HTML page title therefore ends up being “| George Mason” (i.e., no node title). We can leverage this to see all 404 pages in aggregate.
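The title rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything Drupal or GA runs; the function name is_404_title is hypothetical, and it assumes titles follow the “[Node Title] | Website Title” pattern described above.

```python
def is_404_title(page_title: str, site_title: str) -> bool:
    """Return True when the title has the site suffix but an empty node title.

    Assumes the "[Node Title] | Website Title" convention described above.
    """
    # Split on the last pipe: everything before it is the node title.
    node_title, separator, suffix = page_title.rpartition("|")
    return (
        separator == "|"
        and suffix.strip() == site_title
        and node_title.strip() == ""
    )


if __name__ == "__main__":
    print(is_404_title("| George Mason", "George Mason"))
    print(is_404_title("Academics | George Mason", "George Mason"))
```

A title with no pipe at all is treated as a normal page, since it does not match the convention.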

Example: 404 pages on the S-CAR Drupal Website

Account: Mason School for Conflict Analysis and Resolution
Property: S-CAR Mason CMS
View: All Web Site Data
Date Range: June 1, 2017 – June 10, 2017 (immediately post launch of new Drupal website)
Report: Behavior -> Site Content -> All Pages
Primary Dimension: Change the primary dimension from “Page” to “Page Title”.
Filter: Use the advanced report filter option to implement the following filter: Include -> Page Title -> Exactly Matching -> “| The School for Conflict Analysis and Resolution”.
Note: if you use the quick filter box instead, you will have to search for the string “^\| The School for Conflict Analysis and Resolution$”. GA automatically interprets entries in the quick filter box as regular expressions, and an unescaped pipe character means “or” in a regular expression, so the pipe has to be escaped with a backslash to make this work.

This will show you the total number of pageviews of 404 pages.
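To see why the pipe has to be escaped in the quick filter box, here is a small sketch using Python’s re module as a stand-in for GA’s regular expression matching (an assumption; GA’s engine is not Python’s, but the pipe behaves the same way in both). The names EXACT_404, UNESCAPED, and matches_404_title are made up for this example.

```python
import re

# The escaped, anchored pattern from the note above.
EXACT_404 = re.compile(r"^\| The School for Conflict Analysis and Resolution$")

# The same text with the pipe left unescaped. The leading "|" now means
# "empty string OR the rest", and the empty alternative matches anything.
UNESCAPED = re.compile(r"| The School for Conflict Analysis and Resolution")


def matches_404_title(page_title: str) -> bool:
    """Return True only for the bare site title with no node title."""
    return EXACT_404.search(page_title) is not None


if __name__ == "__main__":
    normal = "About | The School for Conflict Analysis and Resolution"
    missing = "| The School for Conflict Analysis and Resolution"
    print(matches_404_title(missing), matches_404_title(normal))
    # The unescaped pattern matches the normal page too.
    print(UNESCAPED.search(normal) is not None)
```

With the unescaped pattern, every page title would pass the filter, which is why the quick filter appears not to work until the pipe is escaped.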

Identifying Specific 404 Pages

Add ‘Page’ as a Secondary Dimension in order to see the specific pages resulting in 404 errors.
Note that the Drupal site logs these pageviews to GA as 404.html, with parameters that indicate the original URL requested as well as the referrer.
This means you can also identify 404 pages by filtering the Page field for “404.html”.
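If you export the Page values, the original URL and referrer can be pulled out of the 404.html parameters with a short script. This is a sketch only: the parameter names “page” and “from” below are assumptions (check the actual values in your own report and adjust them), and parse_404_page is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def parse_404_page(page_path, url_param="page", referrer_param="from"):
    """Extract the requested URL and referrer from a logged 404.html path.

    url_param and referrer_param are assumed names, not confirmed ones.
    Returns None when the path is not a 404.html entry.
    """
    parts = urlsplit(page_path)
    if not parts.path.endswith("404.html"):
        return None
    query = parse_qs(parts.query)
    return {
        # parse_qs returns a list per key; take the first value if present.
        "requested": query.get(url_param, [""])[0],
        "referrer": query.get(referrer_param, [""])[0],
    }


if __name__ == "__main__":
    example = "/404.html?page=/thisisnotapage&from=https://example.org/"
    print(parse_404_page(example))
```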

Note the drop in 404 results over time. What accounts for this?

Identify Where 404 Traffic is Coming From

You can instead add the ‘Source/Medium’ field as a secondary dimension to see where traffic to the 404 pages is coming from. This shows you which sources are sending you traffic that results in 404 pages.
Select and plot the first two rows: “google / organic” and “(direct) / (none)”.
You can now see that the “(direct) / (none)” results have more or less held steady, while the 404s from Google have steadily dropped.
Moreover, the “google / organic” line tracks the overall 404 page hits very closely. This is likely due to Google identifying that those pages are missing and removing them from its search index.
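The same comparison can be done outside the GA interface on exported data. The sketch below assumes you have exported rows of (date, source/medium, pageviews) for the 404 pages; the function name daily_404s_by_source is hypothetical, and the rows in the example are made-up placeholder values, not real S-CAR figures.

```python
from collections import defaultdict


def daily_404s_by_source(rows):
    """Sum 404 pageviews per source/medium per day.

    rows: iterable of (date, source_medium, pageviews) tuples, with dates
    as ISO strings so they sort chronologically.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for date, source_medium, pageviews in rows:
        totals[source_medium][date] += pageviews
    # Return plain dicts with each source's days in date order.
    return {src: dict(sorted(days.items())) for src, days in totals.items()}


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    sample_rows = [
        ("2017-06-02", "google / organic", 5),
        ("2017-06-01", "google / organic", 9),
        ("2017-06-01", "(direct) / (none)", 3),
        ("2017-06-02", "(direct) / (none)", 3),
    ]
    for source, days in daily_404s_by_source(sample_rows).items():
        print(source, days)
```

Comparing the per-source series day by day makes it easier to tell whether a drop in total 404s comes from one source or from all of them.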