Google Analytics is an amazing tool. It enables you to make the data work for you, get a broader picture, and understand your visitors better.
The problem comes when you expect more from this ‘reporting’ tool. I will tell why you should consider leveraging BigQuery (or a similar tool) along with your Google Analytics data.
3 Reasons Why:
1. Handling Changes over time:
Every website gets experiential changes over time. Along with this, the way you store data will also change. Hence, for an apple to apple comparison, you will need to transform the data to a common view.
2. Data Sampling
If you are dependent on Google Analytics, you would know that GA samples its data often. As long as you are not molding the data enough or you are using the out of the box reports, GA will give you a 100% accurate data. It’s only when you start filtering your data or increase its cardinality, Google Analytics will start sampling the data proportionally to maintain its speed.
*Sampling: During Sampling, GA returns you the metrics based on a smaller sample space instead of the whole pool of data.
3. Lack of Data Manipulation and Transformation
Once the data is stored in the analytics servers, you won’t be able to modify the data using formulas or other logic. This is significant because there is always a lot of scope for data classification and transformation for a website with a huge audience and multiple content categories. Additionally, there will several scenarios where you will need to clean the data. These scenarios can be – human errors, implementation gone wrong, bots missed by Google Analytics. If you want to know more about identifying bots missed by Google Analytics, read my article on this topic.
BigQuery is a cloud based data warehousing tool that lets you store and analyze petabytes of data at lightning speed. For GA 360 users, Google provides you the option to get the daily data dump for website’s sessions into BigQuery. You can use this data to overcome the challenges discussed in the previous section.
How does it work?:
Everyday Google Analytics stores its sessions’ data in BigQuery Servers. This session data is a table where each row is dedicated to a user visit while each column represents a different dimension or metric that can be repeated and nested. For a rough idea: Column A: visitor Id or cookie id, Column B: date of the session. The point where it gets complex is when the table stores all the hits(events), page views and custom dimensions in one row. This is what makes BigQuery different from a flat table system.
2. This table acts as our input. First, we un-nest and un-stitch this complex nested data by flattening it. And then, we stitch it back according to our needs.
3. Being a proper cloud based ETL tool, it provides us great transformation features and returns un-sampled data at great speed.
Recently, I used BigQuery for a huge research and advisory based website that has multiple site sections and massive content collection covering various industries. We created a classification structure in BigQuery based on several inputs that would have been impossible via GA. Additionally, there was one scenario, in which the way of recording a form submission based event has changed from URL based to Event-based because of change in the experience of filling the form. We were able to identify the sessions with the particular event easily with BigQuery for a simple comparison. And of course, it goes without saying that you can do a lot of data cleaning with this tool.
While working with Google Analytics data in BigQuery, you will get in-depth knowledge about ‘How Google Analytics stores and calculate all the reports’. Also, after exploring the schema you will find several interesting field which can be further used to answer some questions that Google Analytics cannot (for now).
For those interested in Querying the Google Analytics data in BigQuery, take a look at this detailed guide