Content Analysis on High-Traffic Websites Using BigQuery

James Standen | Analytics, Analytics Canvas pro tips, Google BigQuery

Using Analytics Canvas and Google BigQuery, you can do detailed content analysis for sites with tens of millions of annual visitors, even with Google Analytics Standard.

In this blog post, we outline how to avoid sampling and load detailed content analysis data into BigQuery, with Analytics Canvas providing the solution.

Google Analytics Premium obviously gives you many more options and capabilities (all of which Analytics Canvas fully supports).

But even if you don’t have Premium, you can do a deep dive into even hundreds of millions of visits by using Analytics Canvas and Google BigQuery.

Analytics Canvas lets you automate Core Reporting API queries, extracting data from GA Standard at a daily level. It also lets you clean up the data and then append it to a BigQuery table. This means you can quickly and easily make millions and millions of rows of unsampled data available for analysis, all without GA Premium.

Eliminating sampling and storing huge data sets

Imagine that you are operating a content site with 50,000 visitors a day. You have millions of visits and thousands of pieces of content, written by perhaps hundreds of authors.

Fifty thousand visitors a day will cause sampling in Google Analytics almost immediately, depending on the number of unique URLs involved. Two limits are at work. The first is the number of sessions in a query: queries covering more than half a million sessions are sampled, so at around 50,000 sessions a day, any date range longer than about 10 days is affected. The second is a limit of 1 million unique rows per query, so very large queries across long time spans do not return detailed information and lump the remaining rows into “(other)”. To get around both limits when using Google Analytics Standard, Analytics Canvas automatically partitions queries. It breaks them into individual requests that avoid sampling while respecting all of Google’s API rules and quotas. The result is a detailed data set in your database, ready for analysis.
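To make the partitioning idea concrete, here is a minimal Python sketch that splits a date range into chunks. Each chunk's estimated session count stays at or below an assumed 500,000-session sampling threshold. This is not how Analytics Canvas implements it. The function name, the threshold constant, and the flat `daily_sessions` estimate are all hypothetical. Real traffic varies from day to day, which is why querying one day at a time is the safer default.

```python
from datetime import date, timedelta

# Assumption: GA Standard samples queries above roughly 500,000 sessions.
SAMPLING_THRESHOLD = 500_000


def partition_date_range(start: date, end: date, daily_sessions: int,
                         threshold: int = SAMPLING_THRESHOLD) -> list:
    """Split the inclusive range [start, end] into consecutive
    (chunk_start, chunk_end) pairs whose estimated sessions stay at or
    below `threshold`, given a flat estimate of `daily_sessions`."""
    if end < start:
        raise ValueError("end must not be before start")
    if daily_sessions <= 0:
        raise ValueError("daily_sessions must be positive")
    # Whole days per chunk, never fewer than one. If a single day already
    # exceeds the threshold, that day's query would still be sampled.
    days_per_chunk = max(1, threshold // daily_sessions)
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=days_per_chunk - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks
```

With 50,000 sessions a day, January splits into four queries: days 1-10, 11-20, 21-30 and 31.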

Setting up a GA Core Reporting query for automation

The first step is to get the large volume of data into a repository that can handle it. We’re going to use BigQuery, Google’s big data cloud platform. To use it, you need to sign up for an account. Analytics Canvas also connects to numerous databases, which are another option.
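You might load data into BigQuery yourself rather than through an Analytics Canvas export block. In that case, one common route is a newline-delimited JSON file. The sketch below assumes GA rows arrive as Python dicts with hypothetical column names, and it tags each row with its report date. You could then load the resulting file with something like `bq load --source_format=NEWLINE_DELIMITED_JSON`, using your own dataset, table and schema.

```python
import json
from datetime import date


def to_ndjson(rows: list, report_date: date) -> str:
    """Serialize GA result rows (dicts) as newline-delimited JSON,
    one object per line, adding the day the rows were extracted for."""
    lines = []
    for row in rows:
        record = dict(row)  # copy so the caller's dict is not modified
        # ISO format (YYYY-MM-DD) is what BigQuery expects for a DATE column.
        record["date"] = report_date.isoformat()
        lines.append(json.dumps(record, sort_keys=True))
    # A trailing newline on every line keeps daily files safe to concatenate.
    return "".join(line + "\n" for line in lines)
```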

BigQuery is very affordable. As an example, in one month we loaded over 50 million rows covering over 150 million sessions, and total storage and processing costs were less than $8 for the month.

The key strength of Analytics Canvas is that it loads data automatically in multiple steps, which eliminates sampling. It can also clean the data, making it ready to join to our content information from the CMS.
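Once GA data and CMS data sit side by side, the join itself is simple. Here is a minimal Python sketch that totals sessions per author. It assumes cleaned GA rows keyed by a hypothetical `landing_page` path, and CMS records with hypothetical `path` and `author` fields. In BigQuery itself, the equivalent would be a SQL `JOIN` followed by a `GROUP BY`.

```python
from collections import defaultdict


def sessions_by_author(ga_rows: list, cms_pages: list) -> dict:
    """Join cleaned GA rows to CMS page metadata on the page path and
    total sessions per author. Paths missing from the CMS are grouped
    under "(unknown)" so no traffic silently disappears."""
    author_for_path = {page["path"]: page["author"] for page in cms_pages}
    totals = defaultdict(int)
    for row in ga_rows:
        author = author_for_path.get(row["landing_page"], "(unknown)")
        totals[author] += row["sessions"]
    return dict(totals)
```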

This is done by laying graphical blocks on a canvas. From left to right, the canvas contains the Google Analytics query, a calculation block that cleans the data, a summarize block that rolls the values up to the new, cleaned-up URLs, and an export block that writes the result into a BigQuery table. When we run this for one day, sampling is avoided as long as that day has fewer than half a million sessions.

[Image: writing GA content data to Google BigQuery]
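You can approximate the calculation and summarize blocks in a few lines of Python. This sketch assumes one particular set of cleaning rules:

- drop the query string and fragment
- lowercase the path
- strip trailing slashes

It also assumes hypothetical `landing_page`, `sessions` and `bounces` fields. Your site's URL conventions may call for different rules. Lowercasing, for instance, is only safe if your CMS treats paths case-insensitively.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def clean_url(url: str) -> str:
    """Normalize a landing page: drop the query string and fragment,
    lowercase the path, and strip trailing slashes (except on the root)."""
    path = urlsplit(url).path.lower()
    if len(path) > 1:
        path = path.rstrip("/")
    return path or "/"


def summarize(rows: list, metrics: tuple = ("sessions", "bounces")) -> dict:
    """Roll metric values up to the cleaned URL, so variants of the same
    page (tracking parameters, trailing slashes) become one row."""
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for row in rows:
        key = clean_url(row["landing_page"])
        for metric in metrics:
            totals[key][metric] += row[metric]
    return dict(totals)
```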

Automate the query for millions of rows of unsampled goodness

Then we run the canvas above a few hundred times, each time for a different day. Analytics Canvas lets you automate hundreds, or even thousands, of runs.
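Outside Analytics Canvas, the same batch pattern is a loop over days. This Python sketch assumes a hypothetical `run_for_day` callable that runs the query, cleaning and export for one day. It skips days that are already loaded, so an interrupted batch can resume. It also spaces calls out by `min_interval` seconds as a crude stand-in for respecting API quotas; the right interval depends on your actual quota limits.

```python
import time
from datetime import date, timedelta


def run_daily_batch(start: date, end: date, run_for_day,
                    already_loaded=(), min_interval: float = 0.0) -> dict:
    """Call run_for_day(day) once per day in the inclusive range
    [start, end], skipping days in `already_loaded`, and return a
    mapping of day -> whatever run_for_day returned."""
    done = set(already_loaded)
    results = {}
    last_call = None
    day = start
    while day <= end:
        if day not in done:
            # Wait until at least min_interval seconds since the last call.
            if last_call is not None:
                wait = min_interval - (time.monotonic() - last_call)
                if wait > 0:
                    time.sleep(wait)
            last_call = time.monotonic()
            results[day] = run_for_day(day)
        day += timedelta(days=1)
    return results
```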

The result is a BigQuery table with all your sessions by landing page, source, medium, campaign, device type and so on, along with bounces, session duration, goal conversions and whatever other metrics you want to track. With a little more work, you can even exceed the API’s 10-metric limit, as long as you can join multiple queries together on the same dimensions (something Analytics Canvas again makes easy).
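Joining several queries on their shared dimensions works like a full outer join on the dimension columns. Here is a minimal Python sketch. It assumes each query returns a list of dicts that share the same dimension names (hypothetical here, such as `date` and `landing_page`) and that the queries use distinct metric names. The combined rows are only meaningful if both queries are unsampled and cover the same date range.

```python
def join_on_dimensions(left: list, right: list, dimensions: list) -> list:
    """Merge two GA result sets that share the same dimensions into one
    row per dimension combination, carrying metrics from both queries.
    Combinations found in only one query keep just that query's metrics;
    if both queries share a metric name, the right-hand value wins."""
    merged = {}
    for rows in (left, right):
        for row in rows:
            key = tuple(row[d] for d in dimensions)
            # setdefault creates a fresh dict, so the input rows are not mutated.
            merged.setdefault(key, {}).update(row)
    return list(merged.values())
```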

If you have the kind of traffic volume that is overwhelming your spreadsheets, take a step up to the next level.

Give Analytics Canvas a try: all of this can be done with the full-featured trial version. If you’d like a bit of help getting started, contact us. We have done this type of analysis for clients and are happy to provide a more detailed overview, and even a proof of concept, to show you what you are missing by not diving deep into your unsampled data.