How to Determine if Your Data is Sampled and Calculate the Sample Rate in Google Analytics using Analytics Canvas
Analytics Canvas will automatically detect and eliminate sampling wherever possible. For Analytics 360 accounts, there's always a way to eliminate sampling using Analytics Canvas; it's simply a matter of using the right API. With Standard accounts, you may still experience sampling if the sampling limits established by Google are exceeded.
With Analytics Canvas, you can quickly and easily determine if your data is being sampled and, if so, calculate the sample rate. In this tutorial, we're going to purposely generate a sampled dataset and calculate the sample rate.
You will need a trial license key or a subscription to use the features discussed in this tutorial. If you haven’t already done so, sign up for a free 30-day trial and instantly get your license key so you can begin.
- Open Analytics Canvas and apply your license key (you only need to do this once).
- Go to New Source > Google Analytics > Reporting API V4, and:
- generate a query that contains sampling. To work out how long a date range you need, take 500,000 and divide it by the average number of daily sessions on the site. For example, if you get 5,000 sessions per day, you will see sampling if you query for more than 100 days.
- Next, add dimensions and metrics to your query. For example, Date, Country, Source, Referral Path, Page, Sessions, and Bounces.
- Go to the Sampling tab.
- By default, the option is set to “Automatically determine partitioning”. Change this to “Do Not Partition”.
- If you have an Analytics 360 account, deselect the option to Use Resource Quota.
- Under Additional Columns, select all columns. These columns will be used to calculate the sample rate.
- Click OK to run the query.
- When the query has run and sampling has been detected, right-click on the Canvas and choose Add Data Block > Calculate.
- Connect the sampled query to the Calculation Block.
- Click the center of the Calculation Block to bring up the control.
- Click “+Calc Col” to add a calculated column.
- Paste in the following calculation:
If([Original.ContainsSampledData] = True, [Original.sampleSize]/[Original.sampleSpace], 1)
This calculation checks if there was sampling, and if there was, calculates the sample rate.
“Sample size” refers to the sessions being used to calculate the data for your report, while “sample space” refers to the full set of sessions you queried for.
This calculation determines the sample rate, or the integrity of the data in your report on a scale from 0 to 1: If there was no sampling, the calculation will yield the number “1,” meaning that 100% of your dataset was used in your report. The lower the number is below “1,” the more aggressively your data was sampled, the more data is missing from your analysis, and the less precise your standard GA reports are.
- Click OK.
- Label the field “Sample Rate”.
- Click on the output stub of the Calculation Block. You will see the new calculation as part of the dataset.
You want the Sample Rate to be 1, or 100%, indicating that you have the entire dataset. The lower the number, the more data is missing from your analysis.
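For readers who want to check the arithmetic outside of Analytics Canvas, the two calculations in this tutorial can be sketched in Python. This is only an illustration, not part of the product: the function names `max_unsampled_days` and `sample_rate` are hypothetical, the 500,000-session figure is the one quoted in the query-setup step, and the arguments stand in for the ContainsSampledData, sampleSize, and sampleSpace columns. The check for a missing or zero sample space is an added assumption that the Calculation Block expression does not include.

```python
# Threshold quoted in this tutorial; treat it as an assumption, since
# Google's actual limits can vary by account type.
SAMPLING_THRESHOLD_SESSIONS = 500_000


def max_unsampled_days(avg_daily_sessions, threshold=SAMPLING_THRESHOLD_SESSIONS):
    """Approximate date range (in days) beyond which a query is likely sampled."""
    if avg_daily_sessions <= 0:
        raise ValueError("avg_daily_sessions must be positive")
    return threshold / avg_daily_sessions


def sample_rate(contains_sampled_data, sample_size, sample_space):
    """Mirror of If([ContainsSampledData] = True, sampleSize/sampleSpace, 1)."""
    if contains_sampled_data:
        # Added guard (not in the original expression): avoid dividing by zero.
        if not sample_space:
            raise ValueError("sample_space must be a non-zero number when data is sampled")
        return sample_size / sample_space
    # No sampling: the full dataset was used.
    return 1.0


if __name__ == "__main__":
    # 5,000 sessions per day, as in the example in the query-setup step.
    print(max_unsampled_days(5_000))            # 100.0
    # Hypothetical sampled query: 250,000 of 1,000,000 sessions were read.
    print(sample_rate(True, 250_000, 1_000_000))  # 0.25
    # Unsampled query: the rate is 1 regardless of the other columns.
    print(sample_rate(False, None, None))       # 1.0
```

A result of 0.25 would mean that only a quarter of the sessions in the queried range were used to build the report.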
Getting Unsampled Data with Analytics Canvas
Want to get straight to using unsampled data? Refer to the following tutorials for detailed instructions on how to get unsampled data using Analytics Canvas: