How to Generate Large Datasets for Google Data Studio with Analytics Canvas
To use this feature, you will need Analytics Canvas V3.5 or greater.
On the Data Studio export block of any Canvas, including ones built in earlier versions, you will see a tab called "Update Options". Within that tab there are two options for configuring your export – Overwrite or Append.
Update Options: Overwrite
The Overwrite option creates a new dataset from scratch each time. This is similar to dropping and reloading a database table. Overwriting is a good option for small to mid-size datasets and for cases where any row in the table can change between runs. It will work fine on larger datasets, provided your machine has the capacity to load those sets.
To generate a dataset at once, start by loading the data into a Canvas. Next connect and configure the export with the Data Studio Update Option set to "Overwrite" and run the file. When the block turns green, the data will be available in Google Data Studio under the Analytics Canvas Partner Connector.
This is the easiest way to create a dataset but it is the most resource intensive process to maintain. If you're using Analytics Canvas Desktop/Cloud Automation service and are concerned about quota, or if the load takes a long time and slows down your machine, use the Append option.
Update Options: Append
The option to append to an existing dataset opens up a world of possibilities. By setting the Update Option to Append, you can simply load new data onto an existing dataset.
There are a couple of downsides of this method. The first is that you can inadvertently duplicate your data by running the same export twice. Having done this, the only solution is to generate the dataset from scratch.
The second issue relates to data reprocessing. Data sources like Google Analytics will reprocess data within a few days before the data is considered 'golden'. At that point, the data is frozen and will not change.
The problem is, many people simply load the last 1 day of data and append it to their dataset. Reprocessing can affect more than just the previous day, however, which means that the dataset slowly becomes out of sync with the underlying data source. This can lead to awkward conversations about data quality and data that "doesn't match" what a user sees when they run their own report.
To solve for this, Analytics Canvas can perform a date-based incremental load.
Date Based Incremental Load
With time-series data, such as sales transactions or Google Analytics sessions by day, you can perform a date-based incremental load. This means that a date field is used to identify data that is to be replaced in the dataset.
When the Canvas runs, it will first remove data from the existing dataset where the dates in the selected date field falls within the Master Date Range of the Canvas. This is the critical step! This setting allows you to run the same file multiple times without corrupting the dataset with duplicate rows.
Google Analytics data is a great example of data that should be loaded with this method. Standard accounts have their data processed within 24-48 hours. This means that the data recorded on any given day can be adjusted for up to 2 days. With a simple nightly load, stored data will slowly fall out of sync with the API.
To setup a date based incremental load, first create the initial dataset in Analytics Canvas. Next, on the Data Studio export block, change the Update Option to Append and check the box that says "Delete Data In Master Date Range Before Append". If you have more than 1 date column, choose the one that will be used to perform the incremental load.
Using this technique, you can continue appending to a dataset, dramatically increasing the row limit in Google Data Studio to 2 Billion rows.
APIs as the Data Source for Date Based Incremental Load
When using APIs in Analytics Canvas, simply link them to the Master Date Range under the Time Period tab of each query. When the file runs, the API connector will pull data matching that range. Canvas will then remove data matching that range from the Data Studio dataset before replacing it with fresh data from the API.
Databases as the Data Source for Date Based Incremental Load
When loading data from a database, the Where clause of the SQL query must reference the Master Date Range. Simply use the date parameters for the Canvas in the query to retrieve data that aligns with the Master Date Range. Canvas will then remove that data from the dataset before replacing it with fresh data from the database.
Check the box to "Enable Parameters". The Parameter Preview window will open when you add the Date parameters for referencing the Master Date Range. These are $AC_StartDate and $AC_EndDate parameters and they can be used in any expression editor.
When the date parameters are used this way, data retrieved from databases and APIs will link to the Master Date Range of the Canvas. The export to Data Studio will also be linked to this same range, ensuring that the dataset is kept up to date.
To manually load up a dataset, simply configure the Master Date Range and run the Canvas until the dataset contains all of your data.
Automated Date Based Incremental Load
Once a historic dataset has been created, it is easy to keep it up to date with one of two automation modes in Analytics Canvas: Server Automation which runs on your hardware, or Cloud Automation, which runs in our cloud.
For the process to run in automation, verify the following:
- The data sources have been linked to the Master Date Range
- The Master Date Range is set to a relative period
- The Update Options tab of the Data Studio export block has been set to Append and Delete Data In Mater Date Range before Append has been checked
Configuring the canvas this way allows it to be scheduled for routine updates using either of the two automation options.
If you need additional support getting your data into Data Studio or speeding up your dashboards, contact us.