Sampling in Analytics 360? Explained and Solved with Analytics Canvas

Ameet Wadhwani

Avoiding sampling, Google Analytics Data, unsampled reports

Note: This article was written for those using Google Analytics 360 accounts. For those using Analytics Standard, check out our guide: Google Analytics Sampling: Explained and Solved without Analytics360

Data sampling is a hot topic for businesses that use Google Analytics in their data analysis. Many businesses pulling data from standard Google Analytics accounts are discovering that Google limits the amount of data that can be pulled in a report query for a single date range.

In response, many online guides state that the solution for sampling is to simply subscribe to a paid Google Analytics 360 account.

However, sampling does exist for Analytics 360 accounts! The difference is that, unlike with Google Analytics standard accounts, Google provides tools to eliminate sampling from your queries.

In this post, we’re going to review the cases where you may still see sampling and what you can do to eliminate it from your data sets.

Our company uses Google Analytics 360. What do you mean there’s still sampling?

To save time and processing power, Google Analytics limits the total number of sessions that can be included in a single ad-hoc query. Because of this, sampling happens even for Analytics 360 accounts. As far as sampling is concerned, the only difference between free Google Analytics accounts and Google Analytics 360 accounts is the data threshold allowed before sampling sets in.

The basic limits are these:

  • Google Analytics Standard accounts are allowed up to 500K sessions at the property level for a single date range before sampling will be employed.
  • Google Analytics 360 accounts are allowed up to 100 million sessions at the VIEW level for a single date range before sampling will be employed when using Resource Quota, but just 1M sessions without the use of Resource Quota.

While it’s harder for most businesses to exceed the sampling threshold with a Google Analytics 360 account, the reality is that many businesses will exceed the 1M session count within the month and some will even exceed this amount within the day.

The good news for those businesses using Google Analytics 360 is that there are many more options available to deal with sampling, especially with the assistance of tools such as Analytics Canvas.  

How does Analytics Canvas Help Eliminate Sampling for Google Analytics 360 Users?

Analytics 360 users have been given a significant amount of versatility when it comes to customizing their report parameters, and there are plenty of ways to eliminate sampling from data sets. However, for a data analyst working in the flow, most of these solutions are laden with complexities that get in the way of finding answers.

The best solutions provided by Google are only available through APIs or through complex SQL, and unfortunately most data analysts do not have the required skills in their wheelhouse.

If an analyst is going to use these APIs, that often means requesting data with the help of IT, a process that can take many days or even weeks. If an analyst decides to pull the data manually through exports, the process is tedious, prone to human error, and often results in data sets too large for spreadsheets.

The reality is that, to eliminate sampling, data analysts need to learn SQL, get help from someone who can make themselves available for on-demand coding, or get access to a tool that eliminates the need for coding and makes API calls, SQL queries and data transformation a visual process that anyone can follow.

Do such tools exist? Yes! That’s exactly what Analytics Canvas was designed to do.

With the help of Analytics Canvas, all an analyst needs to know to eliminate sampling is what tools Google has made available, how they work, when to use them, and the limitations within each.

The four main tools to help eliminate sampling for Analytics 360 users are:

  1. Query Partitioning
  2. Resource Quota
  3. Unsampled Reports API
  4. BigQuery Reports Integration

Let’s outline how each works with your data, when to use them, and what limitations to expect.

How to Resolve Sampling with Query Partitioning:

What is it?

  • Query Partitioning is the process of breaking down a data query request into multiple queries to identify the point at which sampling is eliminated from the data set.

How does it work?

  • Analytics Canvas programmatically reduces the date range of the query, breaking it down into multiple queries until the total sessions for each query is below the sampling threshold (1 million sessions, or 100 million when Resource Quota is used) or the query has reached the smallest possible date range: a single day.
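To make the idea concrete, here is a minimal Python sketch of date-range partitioning. It is not how Analytics Canvas is implemented: the halving strategy, the `partition_date_range` function, and the `count_sessions` callback (a stand-in for whatever reports the session total of a date range) are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Callable, List, Tuple

DateRange = Tuple[date, date]  # inclusive start and end dates


def partition_date_range(
    start: date,
    end: date,
    count_sessions: Callable[[date, date], int],
    threshold: int,
) -> List[DateRange]:
    """Split [start, end] into sub-ranges whose session totals fit the threshold.

    A range is kept whole when it is below the threshold or is already a
    single day; otherwise it is halved and each half is checked again.
    """
    if start > end:
        raise ValueError("start must not be after end")
    if start == end or count_sessions(start, end) < threshold:
        return [(start, end)]
    midpoint = start + timedelta(days=(end - start).days // 2)
    left = partition_date_range(start, midpoint, count_sessions, threshold)
    right = partition_date_range(
        midpoint + timedelta(days=1), end, count_sessions, threshold
    )
    return left + right


def flat_traffic(range_start: date, range_end: date) -> int:
    """Hypothetical traffic profile: a flat 300,000 sessions every day."""
    return 300_000 * ((range_end - range_start).days + 1)


# Eight days at 300,000 sessions per day against a 1 million session threshold.
partitions = partition_date_range(
    date(2019, 1, 1), date(2019, 1, 8), flat_traffic, 1_000_000
)
for first_day, last_day in partitions:
    print(first_day, "to", last_day)
```

With these made-up numbers, the eight-day range ends up as four two-day queries. A day that exceeds the threshold on its own is still returned as a single-day range, which is the case where partitioning reduces sampling but cannot remove it.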

When to use it?  

  • Google Analytics 360 using Reporting API V4 → when your business sees fewer than 100 million sessions per day but exceeds 100 million sessions in the date range of your query.

  • When you simply want to use the Reporting API because you:
    • don’t have BigQuery,
    • don’t want to consume Resource Quota, or
    • want more dimensions and a faster query response time than the Unsampled Reports API offers.

What are the limitations?  

  • Partitioning works when there are fewer than 1 million sessions per day. If your business regularly exceeds a million sessions in a single day, query partitioning will reduce sampling overall but will not eliminate it from the query.

  • Query Partitioning cannot use calculated metrics (averages or rates) or unique metrics, because Canvas aggregates the results of the partial queries and these metrics become inaccurate when aggregated. The good news is that, for calculated metrics, you can easily make those calculations after the fact within Canvas. There’s an explanation here. For Users and other unique metrics, you simply cannot use partitioning to eliminate sampling unless you include Date in your query, as the aggregation will invalidate the results.
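A small worked example shows why rates have to be recalculated after the partitions are combined. The numbers and function names below are invented for illustration; the point is only that averaging per-partition rates ignores how many sessions each partition contains.

```python
from typing import Dict, List

# Hypothetical results for one landing page, returned by two partial queries.
partition_rows: List[Dict[str, int]] = [
    {"sessions": 1_000, "bounces": 500},    # partition 1: 50% bounce rate
    {"sessions": 9_000, "bounces": 1_800},  # partition 2: 20% bounce rate
]


def naive_average_rate(rows: List[Dict[str, int]]) -> float:
    """Inaccurate: averages the per-partition rates, ignoring partition size."""
    rates = [row["bounces"] / row["sessions"] for row in rows]
    return sum(rates) / len(rates)


def recomputed_rate(rows: List[Dict[str, int]]) -> float:
    """Accurate: sums the raw counts first, then divides once."""
    total_bounces = sum(row["bounces"] for row in rows)
    total_sessions = sum(row["sessions"] for row in rows)
    return total_bounces / total_sessions


print(f"average of rates:  {naive_average_rate(partition_rows):.0%}")
print(f"recomputed a rate: {recomputed_rate(partition_rows):.0%}")
```

The average of the two rates comes out at 35%, while the rate recomputed from the summed counts is 23%. Unique metrics such as Users have no equivalent fix, because a visitor who appears in two partitions would be counted twice when the partitions are added together.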

See exactly how to use Analytics Canvas to partition your data quickly and easily with our walkthrough, How Google Analytics Accounts Can Get Unsampled Data.

How to Resolve Sampling with Google Analytics 360 Resource Quota:

What is it?

  • Resource Quotas have to do with the cost of running a query through the Google Analytics API. Queries vary greatly in complexity, and the more complex queries are more computationally expensive for Google to perform. In order to provide their services to all their users without breaking the “bank”, limits were put in place to equalize the level of complexity provided for a general query, which is where sampling comes into play.

    Essentially, Google decided that queries exceeding the computational threshold would either need to be simplified with sampling, or the increased expense should be paid for by Google Analytics 360 users with a currency called “resource quota tokens”. All Google Analytics 360 accounts are given a certain number of tokens for each property on their account each day. These tokens can then be traded for higher data thresholds on specific queries that are experiencing sampling.

How does Resource Quota work?

  • When preparing an API call for data through Analytics Canvas, Google Analytics 360 account users can enable the use of Resource Quota tokens in order to increase their sampling thresholds. The cost of the query will be calculated by the API and those tokens will be removed from the total amount allowed on a daily or hourly basis for your account.

How much Resource Quota does our account have?

  • Google Analytics 360 allots 100,000 query cost units per day per property or 25,000 query cost units per hour per property.
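As a rough illustration of how those two allowances interact, here is a small Python sketch of a quota budget. It assumes that both the daily and the hourly limit must be respected at the same time, and the query costs are made up: in practice the cost of each query is calculated by the API, not by you. The `QuotaBudget` class is hypothetical and is not part of any Google or Analytics Canvas library.

```python
from dataclasses import dataclass

DAILY_LIMIT = 100_000  # query cost units per property per day (figure from above)
HOURLY_LIMIT = 25_000  # query cost units per property per hour (figure from above)


@dataclass
class QuotaBudget:
    """Tracks hypothetical query cost units spent in the current day and hour."""

    used_today: int = 0
    used_this_hour: int = 0

    def can_afford(self, query_cost: int) -> bool:
        """Assumption: a query must fit within both the daily and hourly limit."""
        return (
            self.used_today + query_cost <= DAILY_LIMIT
            and self.used_this_hour + query_cost <= HOURLY_LIMIT
        )

    def spend(self, query_cost: int) -> bool:
        """Record the cost if it fits; return whether it was recorded."""
        if not self.can_afford(query_cost):
            return False
        self.used_today += query_cost
        self.used_this_hour += query_cost
        return True

    def start_new_hour(self) -> None:
        """Reset the hourly counter while keeping the daily total."""
        self.used_this_hour = 0


budget = QuotaBudget()
# Three made-up queries costing 10,000 units each within the same hour.
results = [budget.spend(10_000), budget.spend(10_000), budget.spend(10_000)]
print(results)
```

In this sketch the third query is refused because it would push the hour past 25,000 units, even though most of the daily allowance is still unused. Spreading heavy queries across several hours is one way to stay inside both limits.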

When to use it?  

  • Analytics Canvas will use your Resource Quota only after first attempting to partition the data and eliminate sampling. If your business has exceeded the sampling threshold of 1 million sessions even when partitioned down to a single day, and you have enabled the option to use Resource Quota, then Canvas will submit the request again to increase the threshold and eliminate sampling from your data.

What are the limitations?  

  • If you have already used all your Resource Quota tokens for the day or hour in which you request the new query, Analytics Canvas will not be able to eliminate sampling at that time. Note that if a higher sampling threshold can’t be acquired for your request, the tokens won’t be removed from your account.

  • Intraday data and data older than a year may still be subject to sampling, even if you approve the use of Resource Quota units.

See how Analytics Canvas uses your Resource Quota to the greatest effect with our step-by-step guide, "How to Use Resource Quota to Get Up to 100M Sessions of Unsampled Data"

How to Eliminate Sampling Using the Unsampled Reports API:

What is it?

  • The Unsampled Reports API is another tool for which Google Analytics 360 accounts receive a free quota. It allows users to easily request a one-off unsampled report with no session limits and up to 3 million lines of data. Anything in excess of 3 million lines will be aggregated into a single row titled “other”. These reports are saved in the account and can be reviewed or exported for up to 60 days. Any exports are sent to the account’s associated Google Drive and will never be deleted.

How does it work?

  • A request is placed through the Unsampled Reports API, and the data is then prepared and delivered to your Google Drive. As this process cannot be automated directly through Google Analytics, Analytics Canvas has designed a solution that lets you request all the queries you want at once, download them, and automatically combine the data into a file or load it directly into a database.

When to use it?  

  • When you are using Google’s Core Reporting API and find that a report has been calculated using sampling, you can easily place a request for the same report, unsampled, through the Unsampled Reports API.
  • Since the introduction of Resource Quota, there is less of a need for Unsampled Reports. However, if you have exhausted your Resource Quota for the day and you can’t or don’t want to use BigQuery, Unsampled Reports will do the trick.

What are the limitations?  

  • The Unsampled Reports API is MUCH slower than partitioning or using Resource Quota.
  • You can only request up to 4 dimensions through the Unsampled Reports API. Many of our customers find this makes it hard to see the full picture of the data they are looking for.
  • Reports with long date ranges (e.g., over 1 year) can fail.
  • Unsampled data is not available for the “compare to previous period” feature, flow visualization, dashboards, or Multi-Channel Funnel reports that include more than 1 million conversion paths.

See how Analytics Canvas makes this whole process just a few simple steps with our complete walkthrough, "How to use the Unsampled Reports API."

How to Eliminate Sampling Using the BigQuery Reports Integration:

What is it?

  • The BigQuery integration allows for interactive analysis of very large data sets. Google Analytics 360 customers receive a monthly credit towards querying data and automatic access to Google Analytics data from the BigQuery interface, reducing expense and time to query data for analysis.

How does it work?

  • After setting up a link between BigQuery and the Google Analytics 360 account whose views you would like to pull unsampled data from, you can receive a continuous stream of data that will be saved in your BigQuery data warehouse. However, getting this data out requires more advanced knowledge of SQL, and the BigQuery console has limited export abilities.

  • You can write SQL to pull data from BigQuery into your Canvas. However, since the schema for Analytics 360 is known, Analytics Canvas also provides a specialized solution that requires no coding or programming knowledge. This connection allows you to easily pull large sets of data out of BigQuery by pointing and clicking your way through a visual query builder. No SQL required: Canvas will automatically generate the required SQL for you! You can then process that data in Canvas and load it back into BigQuery, into your data warehouse, or into various file types including Excel and Tableau.
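For a sense of what the hand-written route involves, here is a short Python sketch that builds a BigQuery standard SQL query against the Analytics 360 export tables (`ga_sessions_YYYYMMDD`). It is an illustration only and not the SQL that Analytics Canvas generates. The project and dataset names are hypothetical placeholders, and the function only builds the query text; running it would require your own BigQuery client and credentials.

```python
from datetime import date


def build_sessions_by_source_query(table_prefix: str, start: date, end: date) -> str:
    """Return standard SQL that counts sessions per day and traffic source.

    table_prefix is the fully qualified export location without the date
    suffix, e.g. "my-project.123456789.ga_sessions_" (hypothetical names).
    It should come from trusted configuration, never from user input.
    """
    if start > end:
        raise ValueError("start must not be after end")
    return (
        "SELECT\n"
        "  date,\n"
        "  trafficSource.source AS source,\n"
        "  SUM(totals.visits) AS sessions\n"
        f"FROM `{table_prefix}*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'\n"
        "GROUP BY date, source\n"
        "ORDER BY date, sessions DESC"
    )


query = build_sessions_by_source_query(
    "my-project.123456789.ga_sessions_", date(2019, 1, 1), date(2019, 1, 31)
)
print(query)
```

The wildcard table with `_TABLE_SUFFIX` is what lets a single query span many daily export tables. Even this simple report needs knowledge of the nested export schema (`trafficSource.source`, `totals.visits`), which is the kind of detail a visual query builder hides.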

When to use it?  

  • When there is a specific data view that you need to use on a regular basis, BigQuery can help automate the daily request and preparation of that data. Analytics Canvas can further automate the transfer of this data from the BigQuery data warehouse into the storage or visualization tools you plan to use.

  • When you want hit-level data indexed to the millisecond

  • When you want to query for an unlimited number of dimensions and metrics

  • When your queries exceed the limits available through Partitioning, Resource Quota, and the Unsampled Reports API.  Data is never sampled when extracted from BigQuery.

What are the limitations?  

  • Without the functionality of Analytics Canvas, BigQuery only offers two limited solutions for extracting the data (a REST API and file exports), both of which involve writing SQL to extract your data.

Visit our walkthrough, "How to Get Analytics 360 Data Out of BigQuery Without Writing SQL."

Conclusion

Despite the common misconceptions, Google Analytics 360 (or premium) accounts DO have limits to the data they can extract. Beyond those limits, sampling is employed. The good news is that Google has provided many powerful APIs for 360 users and, if employed correctly, these APIs can be effective at eliminating sampling from a report.

Still, the reality is that if you’re encountering sampling in your data even after increasing your thresholds with a Google Analytics 360 account, using those APIs to get your data can be more of a hindrance than a help, since few data analysts know how to write the scripts needed to pull unsampled data out of the APIs.

Google Analytics sampling issues have left businesses with little choice.

You can pay to have your developers learn the complexities of Google APIs and force your analysts to work within the limited free time of a developer’s schedule, OR you can provide your team with an out-of-the-box solution created by a Google Technology Partner that has been eliminating sampling for nearly 10 years. That partner is dedicated to staying on top of the Analytics APIs, so the connectors stay up to date and you can take advantage of new features and automated functionality right away.

The choice is yours. Your data can be fully dependent upon the flexibility of programmers, or it can be available to every member of your team who can follow their flow of analysis to find you the answers you need to run your business successfully into the future.

Don’t place unnecessary limitations on your team! Try Analytics Canvas for FREE for 30 days and see how your team performs when the power is in their hands.