2015-10-21

Meeting Information

Date: 10/21/2015
Time: 2pm
Location: Enterprise 420a

Agenda

Filter Mason On-Campus Traffic - Jan Macario

EXAMPLES OF REGEX STRINGS FOR MATCHING IP ADDRESSES
===================================================

Regex that matches strings representing the numbers 0-255:
----------------------------------------------------------
^([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$
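
A quick way to sanity-check the octet pattern above is to run it against a wider range of integers and see which ones it accepts. This is a minimal Python sketch; the names OCTET and matches_octet are illustrative and not part of the original notes.

```python
import re

# The 0-255 pattern from the notes, copied verbatim.
OCTET = re.compile(r"^([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$")


def matches_octet(text):
    """Return True if text is accepted by the 0-255 pattern."""
    return OCTET.match(text) is not None


# Try every integer from 0 to 999 and keep the ones the pattern accepts.
accepted = [n for n in range(1000) if matches_octet(str(n))]
print(accepted[0], accepted[-1], len(accepted))
```

Values with leading zeros, such as 01 or 007, are rejected because each alternative fixes both the digit count and the allowed first digit.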

Regex for IP block with one variable value in last position:
------------------------------------------------------------
^(129\.174\.51\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5])))$

Regex for IP block with two variable values in last two positions:
------------------------------------------------------------------
^(129\.174\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5])))$

Regex for IP block with three variable values in last three positions:
----------------------------------------------------------------------
^(42\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5])))$
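
The three block patterns above all follow the same recipe: fixed leading octets, then the 0-255 sub-pattern for each remaining position. The Python sketch below builds them by composition; block_regex is a hypothetical helper name introduced here for illustration.

```python
import re

# Sub-pattern for one octet in the range 0-255 (no anchors).
OCTET = r"([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))"


def block_regex(fixed_octets):
    """Build an anchored pattern: fixed leading octets, then 0-255 for the rest."""
    variable_count = 4 - len(fixed_octets)
    parts = list(fixed_octets) + [OCTET] * variable_count
    # Join the four positions with an escaped dot and wrap in ^( ... )$.
    return "^(" + r"\.".join(parts) + ")$"


one_variable = block_regex(["129", "174", "51"])
three_variable = block_regex(["42"])

print(re.match(one_variable, "129.174.51.200") is not None)
print(re.match(three_variable, "42.0.300.1") is not None)
```

Building the pattern from one shared sub-pattern avoids copy-and-paste slips when the same octet expression is repeated several times.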


MASON ON-CAMPUS IP ADDRESS RANGES
=================================

129.174.x.x
199.26.254.1 - 199.26.254.254
192.5.215.1 - 192.5.215.254
192.168.0.1 - 192.168.3.254 *private IP address range - class C
42.0.0.1 - 42.255.255.254
10.0.0.1 - 10.255.255.254 *private IP address range- class A

TEST DATA FOR ABOVE IP RANGES REGEX 
===================================
Note that not all of these should match.

129.174.0.0
129.174.255.255
129.174.255.256

199.26.254.0
199.26.254.1
199.26.254.254
199.26.254.255

192.5.215.0
192.5.215.1
192.5.215.254
192.5.215.255

192.168.0.0
192.168.0.1
192.168.1.1
192.168.2.1
192.168.3.254
192.168.3.255
192.168.4.1

42.0.0.0
42.0.0.1
42.255.255.254
42.255.255.255

10.0.0.0
10.0.0.1
10.255.255.254
10.255.255.255
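
To know which of the test addresses should match before trusting any regex, the ranges can be checked numerically. The sketch below uses Python's standard ipaddress module. It assumes that 129.174.x.x means 129.174.0.0 through 129.174.255.255; in_campus_ranges is an illustrative name.

```python
import ipaddress

# Inclusive (first, last) pairs copied from the range list above.
# Assumption: 129.174.x.x covers 129.174.0.0 - 129.174.255.255.
RANGES = [
    ("129.174.0.0", "129.174.255.255"),
    ("199.26.254.1", "199.26.254.254"),
    ("192.5.215.1", "192.5.215.254"),
    ("192.168.0.1", "192.168.3.254"),
    ("42.0.0.1", "42.255.255.254"),
    ("10.0.0.1", "10.255.255.254"),
]


def in_campus_ranges(text):
    """Return True if text is a valid IPv4 address inside one of RANGES."""
    try:
        addr = ipaddress.IPv4Address(text)
    except ValueError:
        # Not a valid IPv4 address at all, e.g. 129.174.255.256.
        return False
    return any(
        ipaddress.IPv4Address(first) <= addr <= ipaddress.IPv4Address(last)
        for first, last in RANGES
    )


for candidate in ["129.174.255.255", "129.174.255.256", "199.26.254.0", "192.168.4.1"]:
    print(candidate, in_campus_ranges(candidate))
```

Note that a numeric comparison counts an address such as 192.168.0.255 as inside 192.168.0.1 - 192.168.3.254, whereas a pattern that limits the last octet to 1-254 would reject it.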


FULL REGEX STRING TO MATCH ABOVE IP RANGES
==========================================

This regex string will match any IP in the above ranges exactly.
----------------------------------------------------------------
^((129\.174\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5])))|(199\.26\.254\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))|(192\.5\.215\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))|(192\.168\.[0-3]\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))|(42\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))|(10\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4]))))$

Here is the same string, slightly expanded for readability.
-----------------------------------------------------------
^(
(129\.174\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5])))
|
(199\.26\.254\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))
|
(192\.5\.215\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))
|
(192\.168\.[0-3]\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))
|
(42\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))
|
(10\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))
)$

However, this regex is too long to use in a Google Analytics filter, because filter regular expressions are limited to 255 characters.
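
The length problem is easy to confirm by assembling the strict pattern from its pieces and measuring it. This Python sketch composes a pattern with the same structure as the one above; OCTET, HOST, BLOCKS and GA_LIMIT are names chosen here for illustration, and the 255 figure is the limit quoted in these notes.

```python
import re

OCTET = r"([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))"  # 0-255
HOST = r"([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4]))"   # 1-254

# One entry per campus block, in the same order as the full regex above.
BLOCKS = [
    r"129\.174\." + OCTET + r"\." + OCTET,
    r"199\.26\.254\." + HOST,
    r"192\.5\.215\." + HOST,
    r"192\.168\.[0-3]\." + HOST,
    r"42\." + OCTET + r"\." + OCTET + r"\." + HOST,
    r"10\." + OCTET + r"\." + OCTET + r"\." + HOST,
]

# Wrap each block in parentheses, join with |, and anchor the whole thing.
FULL = "^(" + "|".join("(" + block + ")" for block in BLOCKS) + ")$"

GA_LIMIT = 255  # character limit quoted in the notes

print(len(FULL), len(FULL) > GA_LIMIT)
```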

BACK TO THE DRAWING BOARD
=========================

Note that we don't have to be so specific with our regex, as the input data will consist solely of IP addresses provided by Google Analytics. As such, they are already guaranteed to be IP addresses in the correct format. Therefore, we can simplify our regular expression tremendously.

FULL REGEX STRING THAT MATCHES ALL IP BLOCKS - LESS SPECIFIC
============================================================

^((129\.174\.\d+\.\d+)|(199\.26\.254\.\d+)|(192\.5\.215\.\d+)|(192\.168\.[0-3]\.\d+)|(42\.\d+\.\d+\.\d+)|(10\.\d+\.\d+\.\d+))$

Here is the same string, slightly expanded for readability.
-----------------------------------------------------------
^(
(129\.174\.\d+\.\d+)
|
(199\.26\.254\.\d+)
|
(192\.5\.215\.\d+)
|
(192\.168\.[0-3]\.\d+)
|
(42\.\d+\.\d+\.\d+)
|
(10\.\d+\.\d+\.\d+)
)$

This regex is only 126 characters. It will work!
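
The trade-off of the shorter pattern is that it no longer validates octet values. The Python sketch below runs a few of the test addresses through it to show what changes; SIMPLE is an illustrative name.

```python
import re

# The less specific pattern from the notes, split across two lines for width.
SIMPLE = re.compile(
    r"^((129\.174\.\d+\.\d+)|(199\.26\.254\.\d+)|(192\.5\.215\.\d+)"
    r"|(192\.168\.[0-3]\.\d+)|(42\.\d+\.\d+\.\d+)|(10\.\d+\.\d+\.\d+))$"
)

samples = ["129.174.255.256", "199.26.254.0", "192.168.3.255", "192.168.4.1"]
results = {ip: SIMPLE.match(ip) is not None for ip in samples}

for ip, matched in results.items():
    print(ip, matched)

print(len(SIMPLE.pattern))
```

Strings such as 129.174.255.256 now match even though they are not valid addresses. That is acceptable here only because the input is already guaranteed to be well-formed IP addresses.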

GOING FURTHER
=============

This regex can be simplified even more by looking at the IP range data provided.
Note that two of the blocks represent private IP address ranges. 192.168.x.x represents a range of private class C IP addresses. 10.x.x.x represents a range of private class A IP addresses.

As such, these IP addresses are only valid within the Mason network. Any traffic leaving campus from an address in these ranges will have its private address translated (network address translation) to a different, public IP address.

The Google Analytics server is not located within the Mason network (obviously). Therefore, we should never expect to see traffic reaching the Google Analytics server from these addresses. This means we can leave them out of our calculations and produce an even simpler regex string.

FINAL REGEX STRING
==================

^((129\.174\.\d+\.\d+)|(199\.26\.254\.\d+)|(192\.5\.215\.\d+)|(42\.\d+\.\d+\.\d+))$

83 characters. And we're done!

Here is the final regex string, slightly expanded for readability.
------------------------------------------------------------------
^(
(129\.174\.\d+\.\d+)
|
(199\.26\.254\.\d+)
|
(192\.5\.215\.\d+)
|
(42\.\d+\.\d+\.\d+)
)$
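
To see what an exclude filter built on the final regex would do, the sketch below applies it to a short list of addresses. The hits list is made-up sample data and is_campus_ip is a hypothetical helper; nothing here calls Google Analytics itself.

```python
import re

# The final pattern from the notes, copied verbatim.
FINAL = re.compile(
    r"^((129\.174\.\d+\.\d+)|(199\.26\.254\.\d+)|(192\.5\.215\.\d+)|(42\.\d+\.\d+\.\d+))$"
)


def is_campus_ip(ip):
    """Return True if ip falls in one of the public campus blocks."""
    return FINAL.match(ip) is not None


# Hypothetical visitor addresses, for illustration only.
hits = ["129.174.10.20", "203.0.113.5", "42.7.7.7", "199.26.254.9", "8.8.8.8"]

# An exclude filter keeps only the traffic that does NOT match.
off_campus = [ip for ip in hits if not is_campus_ip(ip)]
print(off_campus)
```

Because the pattern is anchored with ^, an address such as 142.7.7.7 is not mistaken for the 42.x.x.x block.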

CLOSING THOUGHTS
================

It turns out that this was all completely unnecessary!

We have a couple of other options to do this.

1) Use the Google Analytics predefined filter: Exclude -> Traffic from the IP addresses -> that begin with -> (example: 129.174.) Note that we would have to create a separate filter for each IP address range, and all of these filters would have to be applied to the view in question.

But even that option is probably unnecessary.

2) We can create a filter for "george mason university" on the dimension called "ISP organization". In the reports this dimension is called "service provider". Although the terminology is different, this is the same dimension. Initial tests indicate that this will do what we want, but we may want to confirm this for all the IP ranges. For example, does the service provider for traffic originating from the Korea campus indicate "george mason university"?
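
The logic of option 1 can be sketched locally as a set of "begins with" checks, one per predefined filter. The prefixes below are an assumption derived from the public ranges in the final regex (only 129.174. is given as an example in the notes), and excluded_by_prefix_filters is an illustrative name.

```python
# One prefix per predefined "that begin with" filter.
# Assumption: derived from the public campus blocks listed earlier.
CAMPUS_PREFIXES = ("129.174.", "199.26.254.", "192.5.215.", "42.")


def excluded_by_prefix_filters(ip):
    """Return True if any of the prefix filters would exclude this address."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return ip.startswith(CAMPUS_PREFIXES)


for ip in ["129.174.3.4", "142.174.3.4", "42.9.9.9", "199.26.25.4"]:
    print(ip, excluded_by_prefix_filters(ip))
```

Keeping the trailing dot in each prefix matters: without it, "42" would also match addresses such as 42x.y.z.w style strings like 420.1.1.1 if they ever appeared in the data.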

Compare and Aggregate Multiple Pages in Report Using Regular Expressions (regex) - Jan Macario

WHAT IS THE ISSUE?

With the Mason website redesign, the addresses of many pages have changed. That can make it challenging to review historical data or compare year-to-date numbers.

WHY IS THIS EVEN AN ISSUE?

When doing a quick filter on the Google Analytics report, it will not do an exact match, but will instead filter for any data *containing* the specified search term.

When using an advanced filter on the Google Analytics report, we can only combine multiple filter criteria with the “and” operator, not the “or” operator.
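
Both limitations can be illustrated with plain string checks. In this Python sketch the pages list is hypothetical sample data; the "contains" and "and" behavior is modeled with Python's in operator and is only an approximation of how the report filters behave.

```python
# Hypothetical page values, for illustration only.
pages = [
    "admissions.gmu.edu/",
    "admissions.gmu.edu/apply/",
    "www2.gmu.edu/admissions-aid",
    "www2.gmu.edu/admissions-aid/visit",
]

term_a = "admissions.gmu.edu/"
term_b = "www2.gmu.edu/admissions-aid"

# Quick filter: "contains" semantics also pull in sub-pages.
contains_a = [p for p in pages if term_a in p]

# Advanced filter: criteria combined with "and" can never match both pages,
# because no single page contains both terms.
both = [p for p in pages if term_a in p and term_b in p]

# What we actually want: an exact match on either page ("or").
either = [p for p in pages if p == term_a or p == term_b]

print(contains_a)
print(both)
print(either)
```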

SOLUTION

We can use a regular expression to filter the standard Google Analytics reports to include only traffic from the page(s) in question, and no other pages, thereby leaving out any extraneous data.

EXAMPLE

The old admissions homepage address was: admissions.gmu.edu/
The new admissions landing page is: www2.gmu.edu/admissions-aid

We would like a report that shows only these two pages, and no others. We can then see the data for each of these two pages, and the Google Analytics report will automatically aggregate and display the combined data as well.

HOW

We can use the regular expression option in the advanced report filter to do this.

Google Analytics -> Reporting -> Behavior -> Site Content -> All Pages -> Advanced Filter Include -> Page -> Matching RegExp

Here are the pages we’d like to aggregate:

www2.gmu.edu/admissions-aid
admissions.gmu.edu/

Here is the regex string that will do this:
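^((admissions\.gmu\.edu/)|(www2\.gmu\.edu/admissions-aid))$

Regex string expanded for clarity:

^(
(admissions\.gmu\.edu/)
|
(www2\.gmu\.edu/admissions-aid)
)$

Note that the dots are escaped (\.) so that they match only a literal period rather than any character.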
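
The aggregation that the report performs can be mimicked locally to check the pattern. In this Python sketch the rows and their pageview counts are invented sample data, and the dots in the pattern are escaped so they match literal periods.

```python
import re

# Anchored pattern for exactly the two pages we want to aggregate.
PAGES = re.compile(r"^((admissions\.gmu\.edu/)|(www2\.gmu\.edu/admissions-aid))$")

# Hypothetical (page, pageviews) rows, for illustration only.
rows = [
    ("admissions.gmu.edu/", 120),
    ("admissions.gmu.edu/apply/", 45),
    ("www2.gmu.edu/admissions-aid", 300),
    ("www2.gmu.edu/admissions-aid/visit", 80),
]

# Keep only rows whose page matches the whole pattern, then sum them.
matched = [(page, views) for page, views in rows if PAGES.match(page)]
total = sum(views for _, views in matched)

print(matched)
print(total)
```

The ^ and $ anchors are what keep sub-pages such as admissions.gmu.edu/apply/ out of the total.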

Advanced Filter to Include Hostname in Page Path