Meeting Information
Date: 10/21/2015
Time: 2pm
Location: Enterprise 420a
Agenda
Filter Mason On-Campus Traffic - Jan Macario
EXAMPLES OF REGEX STRINGS FOR MATCHING IP ADDRESSES
===================================================

Regex that matches strings representing the numbers 0-255:
----------------------------------------------------------
^([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$

Regex for IP block with one variable value in last position:
------------------------------------------------------------
^(129\.174\.51\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5])))$

Regex for IP block with two variable values in last two positions:
------------------------------------------------------------------
^(129\.174\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5])))$

Regex for IP block with three variable values in last three positions:
----------------------------------------------------------------------
^(42\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5])))$

MASON ON-CAMPUS IP ADDRESS RANGES
=================================

129.174.x.x
199.26.254.1 - 199.26.254.254
192.5.215.1 - 192.5.215.254
192.168.0.1 - 192.168.3.254   *private IP address range - class C
42.0.0.1 - 42.255.255.254
10.0.0.1 - 10.255.255.254     *private IP address range - class A

TEST DATA FOR ABOVE IP RANGES REGEX
===================================

Note that not all of these should match.

129.174.0.0
129.174.255.255
129.174.255.256
199.26.254.0
199.26.254.1
199.26.254.254
199.26.254.255
192.5.215.0
192.5.215.1
192.5.215.254
192.5.215.255
192.168.0.0
192.168.0.1
192.168.1.1
192.168.2.1
192.168.3.254
192.168.3.255
192.168.4.1
42.0.0.0
42.0.0.1
42.255.255.254
42.255.255.255
10.0.0.0
10.0.0.1
10.255.255.254
10.255.255.255

FULL REGEX STRING TO MATCH ABOVE IP RANGES
==========================================

This regex string will match any IP in the above ranges exactly.
----------------------------------------------------------------
^((129\.174\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5])))|(199\.26\.254\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))|(192\.5\.215\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))|(192\.168\.[0-3]\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))|(42\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))|(10\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4]))))$

Here is the same string, slightly expanded for readability.
-----------------------------------------------------------
^(
(129\.174\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5])))
|
(199\.26\.254\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))
|
(192\.5\.215\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))
|
(192\.168\.[0-3]\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))
|
(42\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))
|
(10\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4])))
)$

However, this regex is too long to use in a Google Analytics filter, as regular expression strings are limited to 255 characters.

BACK TO THE DRAWING BOARD
=========================

Note that we don't have to be so specific with our regex, as the input data will be solely IP addresses provided by Google Analytics. As such, they are already guaranteed to be IP addresses in the correct format.
Therefore, we can simplify our regular expression tremendously.

FULL REGEX STRING THAT MATCHES ALL IP BLOCKS - LESS SPECIFIC
============================================================

^((129\.174\.\d+\.\d+)|(199\.26\.254\.\d+)|(192\.5\.215\.\d+)|(192\.168\.[0-3]\.\d+)|(42\.\d+\.\d+\.\d+)|(10\.\d+\.\d+\.\d+))$

Here is the same string, slightly expanded for readability.
-----------------------------------------------------------
^(
(129\.174\.\d+\.\d+)
|
(199\.26\.254\.\d+)
|
(192\.5\.215\.\d+)
|
(192\.168\.[0-3]\.\d+)
|
(42\.\d+\.\d+\.\d+)
|
(10\.\d+\.\d+\.\d+)
)$

This regex is only 126 characters. It will work!

GOING FURTHER
=============

This regex can be simplified even more by looking at the IP range data provided. Note that two of the blocks represent private IP address ranges.

192.168.x.x represents a range of private class C IP addresses.
10.x.x.x represents a range of private class A IP addresses.

As such, these IP addresses are only valid within the Mason network. Traffic originating on campus from an address in these ranges has its private source address translated (NAT) to a different, public IP address before it leaves the Mason network. The Google Analytics server is not located within the Mason network (obviously). Therefore, we should not expect to ever see any traffic to the Google Analytics server from these addresses. This means we can leave them out of our calculations and produce an even simpler regex string.

FINAL REGEX STRING
==================

^((129\.174\.\d+\.\d+)|(199\.26\.254\.\d+)|(192\.5\.215\.\d+)|(42\.\d+\.\d+\.\d+))$

83 characters. And we're done!

Here is the final regex string, slightly expanded for readability.
------------------------------------------------------------------
^(
(129\.174\.\d+\.\d+)
|
(199\.26\.254\.\d+)
|
(192\.5\.215\.\d+)
|
(42\.\d+\.\d+\.\d+)
)$

CLOSING THOUGHTS
================

It turns out that this was all completely unnecessary! We have a couple of other options to do this.

1) Using the Google Analytics predefined filter:
   Exclude -> Traffic from the IP addresses -> that begin with -> (example: 129.174.)
   Note that we would have to create separate filters for each IP address range, and all of these filters would have to be applied to the view in question. But even that option is probably unnecessary.

2) We can create a filter for "george mason university" on the dimension called "ISP organization". In the reports this dimension is called "service provider". Although the terminology is different, this is the same dimension. Initial tests indicate that this will do what we want, but we may want to confirm this for all the IP ranges. For example, does the service provider for traffic originating from the Korea campus indicate "george mason university"?
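
The regexes from these notes can be sanity-checked outside Google Analytics before they go into a filter. The Python sketch below is one way to do that, under the assumption that Python's `re` module behaves like the Google Analytics regex engine for these simple patterns (not verified here). The names `OCTET_REGEX`, `FINAL_REGEX`, `MAX_FILTER_LENGTH`, `is_octet`, and `is_on_campus` are hypothetical and used only for illustration.

```python
import re

# Strict 0-255 pattern from the notes above.
OCTET_REGEX = r"^([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$"

# Final, less specific regex string from the notes above.
FINAL_REGEX = r"^((129\.174\.\d+\.\d+)|(199\.26\.254\.\d+)|(192\.5\.215\.\d+)|(42\.\d+\.\d+\.\d+))$"

# Filter pattern length limit as stated in the notes.
MAX_FILTER_LENGTH = 255

_octet = re.compile(OCTET_REGEX)
_final = re.compile(FINAL_REGEX)


def is_octet(text):
    """Return True if text is a decimal number from 0 to 255 without leading zeros."""
    return _octet.match(text) is not None


def is_on_campus(ip):
    """Return True if ip matches one of the public Mason blocks in FINAL_REGEX."""
    return _final.match(ip) is not None


# The strict octet pattern accepts exactly 0..255.
print(all(is_octet(str(n)) for n in range(256)), is_octet("256"))

# The final pattern fits within the length limit.
print(len(FINAL_REGEX), len(FINAL_REGEX) <= MAX_FILTER_LENGTH)

# A few addresses from the test data. The private ranges are left out on
# purpose, and the loose pattern does not validate octets, so
# 129.174.255.256 is accepted as well.
for ip in ["129.174.0.0", "129.174.255.256", "199.26.254.1",
           "192.5.215.254", "42.0.0.1", "192.168.0.1", "10.0.0.1"]:
    print(ip, is_on_campus(ip))
```

Because the loose pattern relies on the input already being well-formed IP addresses, it is only suitable where that guarantee holds, as it does for the filter described above.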
Compare and Aggregate Multiple Pages in Report Using Regular Expressions (regex) - Jan Macario
WHAT IS THE ISSUE?
With the Mason website redesign, the addresses of many pages have changed. That can make it challenging to review historical data or compare year-to-date numbers.
WHY IS THIS EVEN AN ISSUE?
A quick filter on a Google Analytics report does not do an exact match; instead, it returns any data *containing* the specified search term.
An advanced filter on a Google Analytics report can only combine multiple filter criteria with the “and” operator, not the “or” operator.
SOLUTION
We can use a regular expression to filter the standard Google Analytics reports so that they include only traffic from the page(s) in question, and no other pages, thereby leaving out any extraneous data.
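
To illustrate the difference between a "containing" filter and an anchored regular expression, here is a minimal Python sketch. It does not call Google Analytics; the page list and the helper names `quick_filter` and `regex_filter` are hypothetical, and Python's `re` module stands in for the report's regex engine.

```python
import re

# Hypothetical page addresses, for illustration only.
PAGES = [
    "admissions.gmu.edu/",
    "admissions.gmu.edu/apply",
    "www2.gmu.edu/admissions-aid",
    "www2.gmu.edu/admissions-aid/tuition",
]


def quick_filter(term, pages):
    """Mimic a 'containing' filter: keep every page that contains the term."""
    return [page for page in pages if term in page]


def regex_filter(pattern, pages):
    """Keep only the pages matched by the regular expression."""
    compiled = re.compile(pattern)
    return [page for page in pages if compiled.search(page)]


# The 'containing' filter also returns the sub-page, which we do not want.
print(quick_filter("admissions.gmu.edu/", PAGES))

# Anchoring with ^ and $ restricts the result to the exact page.
print(regex_filter(r"^admissions\.gmu\.edu/$", PAGES))
```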
EXAMPLE
The old admissions homepage address was: admissions.gmu.edu/
The new admissions landing page is: www2.gmu.edu/admissions-aid
We would like a report that shows only these two pages, and no others. We can then see the data for each of these two pages, and the Google Analytics report will automatically aggregate and display the combined data as well.
HOW
We can use the regular expression option in the advanced report filter to do this.
Google Analytics -> Reporting -> Behavior -> Site Content -> All Pages -> Advanced Filter -> Include -> Page -> Matching RegExp
Here are the pages we’d like to aggregate:
www2.gmu.edu/admissions-aid
admissions.gmu.edu/
Here is the regex string that will do this:
^((admissions\.gmu\.edu/)|(www2\.gmu\.edu/admissions-aid))$
Regex string expanded for clarity:
^(
(admissions\.gmu\.edu/)
|
(www2\.gmu\.edu/admissions-aid)
)$
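
As a rough check of the aggregation idea, the Python sketch below applies the two-page pattern to a small table of pageview counts and sums the matching rows. The counts, the sub-page addresses, and the names `AGGREGATE_REGEX`, `PAGEVIEWS`, and `aggregate` are made up for illustration. The sketch uses the pattern with its dots escaped, since an unescaped `.` matches any character rather than only a literal period.

```python
import re

# Two-page pattern, with the dots escaped so they match literal periods only.
AGGREGATE_REGEX = r"^((admissions\.gmu\.edu/)|(www2\.gmu\.edu/admissions-aid))$"

# Hypothetical pageview counts, for illustration only.
PAGEVIEWS = {
    "admissions.gmu.edu/": 120,
    "admissions.gmu.edu/apply": 45,
    "www2.gmu.edu/admissions-aid": 80,
    "www2.gmu.edu/admissions-aid/tuition": 30,
}


def aggregate(pattern, rows):
    """Sum the counts of all pages matched by the pattern."""
    compiled = re.compile(pattern)
    return sum(count for page, count in rows.items() if compiled.match(page))


# Only the old homepage and the new landing page are counted: 120 + 80.
total = aggregate(AGGREGATE_REGEX, PAGEVIEWS)
print(total)  # 200

# An unescaped dot is a wildcard, so this look-alike address would match.
loose = re.match(r"^admissions.gmu.edu/$", "admissionsXgmu.edu/") is not None
strict = re.match(r"^admissions\.gmu\.edu/$", "admissionsXgmu.edu/") is not None
print(loose, strict)  # True False
```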