For most people, 2 Billion rows is more data than they will ever have to process. Yet long before hitting the 1M row limit of an Excel workbook, the 5M cell limit of a Google Sheet, or the practical limits of a database, performance will seriously degrade. Then the trade-offs begin: process less data, or spend more time processing one particular set at the expense of another.
We set out to address this problem with the latest release of Analytics Canvas. We have effectively removed row limits on our Google Data Studio Partner Connector while delivering exceptional performance!
Freedom from Row Limits in Google Data Studio
Analytics Canvas was among the first 10 partners to release a connector for Google Data Studio. Serving a global market of business users and IT professionals, we've seen a wide array of use cases and creative ways that people are using Data Studio.
So far we've satisfied customers looking to dramatically increase the performance of their reports. Our Partner Connector takes the pressure off databases and allows them to seamlessly connect more of their data to Google Data Studio.
Customers are using Canvas to get data from SAP HANA, Amazon Redshift, Microsoft SQL Server, and Oracle, along with our best-in-class connectors for Google Analytics, Google AdWords, and Search Console, among other databases, files, and APIs.
But one issue kept coming up with customers large and small, both as datasets grew over time and as adoption of Data Studio increased: data size limits.
How to Generate Large Datasets for Google Data Studio with Analytics Canvas
Experienced Canvas users should have no trouble using this new feature to blow past any previous Google Data Studio row limits. For those who are new to Canvas or unfamiliar with database exports, this guide should help.
To use this new feature, first download and install the latest version of Canvas; you will need Analytics Canvas V3.5 or greater. New to Analytics Canvas? There's a full-featured, no-hassle 30-day free trial available as well. Try it! You can have all your data connected to Data Studio today.
On the Data Studio export block of any Canvas, including ones built in earlier versions, you will see a new tab called "Update Options". Within that tab there are two options for configuring your export: Overwrite or Append.
Update Options: Overwrite
The Overwrite option creates a new dataset from scratch each time. This is similar to dropping and reloading a database table. Overwriting is a good option for small to mid-size datasets and for cases where any row in the table can change between runs. It will work fine on larger datasets, provided your machine has the capacity to load those sets.
To generate a dataset in a single pass, start by loading the data into a Canvas. Next, connect and configure the export with the Data Studio Update Option set to "Overwrite" and run the file. When the block turns green, the data will be available in Google Data Studio under the Analytics Canvas Partner Connector.
This is the easiest way to create a dataset but it is the most resource intensive process to maintain. If you're using our Cloud Automation service and are concerned about quota, or if the load takes a long time and slows down your machine, use the Append option.
Update Options: Append
The option to append to an existing dataset opens up a world of possibilities. By setting the Update Option to Append, you can simply load new data onto an existing dataset.
There are a couple of downsides to this method. The first is that you can inadvertently duplicate your data by running the same export twice. If that happens, the only solution is to regenerate the dataset from scratch.
The second issue relates to data reprocessing. Data sources like Google Analytics may reprocess data for a few days before it is considered 'golden'. At that point, the data is frozen and will not change.
The problem is, many people simply load the last day of data and append it to their dataset. Reprocessing can affect more than just the previous day, however, which means the dataset slowly drifts out of sync with the underlying data source. This can lead to awkward conversations about data quality and data that "doesn't match" what a user sees when they run their own report.
To solve for this, Analytics Canvas can perform a date-based incremental load.
Date-Based Incremental Load
With time-series data, such as sales transactions or Google Analytics sessions by day, you can perform a date-based incremental load. This means that a date field is used to identify data that is to be replaced in the dataset.
When the Canvas runs, it will first remove data from the existing dataset where the date in the selected date field falls within the Master Date Range of the Canvas. This is the critical step! It allows you to run the same file multiple times without corrupting the dataset with duplicate rows.
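The steps above can be sketched as a delete-then-append. This is a minimal illustration using an in-memory SQLite table; the table, column names, and the incremental_load helper are illustrative only, not Canvas internals:

```python
import sqlite3

# Illustrative only: Canvas performs the equivalent of this against its
# BigQuery-backed dataset; the table and column names here are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (date TEXT, sessions INTEGER)")

# Existing dataset: three days of data, with stale numbers for the last two.
conn.executemany("INSERT INTO dataset VALUES (?, ?)",
                 [("2019-05-01", 100), ("2019-05-02", 90), ("2019-05-03", 40)])

def incremental_load(conn, start_date, end_date, fresh_rows):
    """Replace all rows whose date falls within the Master Date Range."""
    # Step 1: remove existing rows inside the date range...
    conn.execute("DELETE FROM dataset WHERE date BETWEEN ? AND ?",
                 (start_date, end_date))
    # Step 2: ...then append the freshly pulled rows for that same range.
    conn.executemany("INSERT INTO dataset VALUES (?, ?)", fresh_rows)

# Re-pull the last two days with corrected (reprocessed) numbers.
incremental_load(conn, "2019-05-02", "2019-05-03",
                 [("2019-05-02", 95), ("2019-05-03", 60)])

rows = conn.execute("SELECT * FROM dataset ORDER BY date").fetchall()
print(rows)
# → [('2019-05-01', 100), ('2019-05-02', 95), ('2019-05-03', 60)]
```

Because the delete step always runs first, re-running the same load with the same date range yields the same result rather than duplicate rows.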
Google Analytics data is a great example of data that should be loaded with this method. Standard accounts have their data processed within 24-48 hours, which means the data recorded on any given day can be adjusted for up to two days afterward. With a simple nightly load of only the previous day, stored data will slowly fall out of sync with the API.
To set up a date-based incremental load, first create the initial dataset in Analytics Canvas. Next, on the Data Studio export block, change the Update Option to Append and check the box that says "Delete Data In Master Date Range Before Append". If you have more than one date column, choose the one that will be used to perform the incremental load.
Using this technique, you can continue appending to a dataset, dramatically increasing the row limit in Google Data Studio to 2 Billion rows.
APIs as the Data Source for Date-Based Incremental Load
When using APIs in Analytics Canvas, simply link them to the Master Date Range under the Time Period tab of each query. When the file runs, the API connector will pull data matching that range. Canvas will then remove data matching that range from the Data Studio dataset before replacing it with fresh data from the API.
Databases as the Data Source for Date-Based Incremental Load
When loading data from a database, the WHERE clause of the SQL query must reference the Master Date Range. Simply use the Canvas date parameters in the query to retrieve data that aligns with the Master Date Range. Canvas will then remove that data from the dataset before replacing it with fresh data from the database.
Ensure you check the box to "Enable Parameters"; the Parameter Preview window will then open when you add the date parameters that reference the Master Date Range. These are $AC_StartDate and $AC_EndDate, and they can be used in any expression editor.
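For example, a query against a hypothetical sales_orders table might reference the parameters like this (the table and column names are illustrative, not part of Canvas):

```sql
-- Illustrative only: sales_orders, order_date, and order_total are made up.
-- $AC_StartDate and $AC_EndDate resolve to the Canvas Master Date Range.
SELECT order_date,
       SUM(order_total) AS revenue
FROM   sales_orders
WHERE  order_date BETWEEN $AC_StartDate AND $AC_EndDate
GROUP  BY order_date;
```

Because the WHERE clause is bound to the same range the export deletes before appending, the query and the dataset stay aligned run after run.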
When the date parameters are used this way, data retrieved from databases and APIs will link to the Master Date Range of the Canvas. The export to Data Studio will also be linked to this same range, ensuring that the dataset is kept up to date.
To manually backfill a dataset, simply adjust the Master Date Range and run the Canvas as many times as needed until the dataset contains all of your data.
Automated Date-Based Incremental Load
Once a historic dataset has been created, it is easy to keep it up to date with one of two automation modes in Analytics Canvas: Server Automation, which runs on your hardware, or Cloud Automation, which runs in our cloud.
For the process to run in automation, verify the following:
- The data sources have been linked to the Master Date Range
- The Master Date Range is set to a relative period
- The Update Options tab of the Data Studio export block has been set to Append and "Delete Data In Master Date Range Before Append" has been checked
When the Canvas has been configured this way, it can be scheduled for routine updates using either of the two automation options.
Performance in Google Data Studio
If your Google Data Studio reports are slow, chances are you're using the native Sheets connector, your own MySQL or Postgres database, or an API from one of the other Partner Connectors.
The Analytics Canvas connector is powered by BigQuery and uses Data Studio Advanced Services. This means that instead of passing data between the connector and Data Studio, we pass queries to BigQuery. The results are then returned directly to Data Studio.
Whether your dataset is 1,000 rows, 1M rows, or 1B rows, processing speed is virtually the same. In fact, the larger the dataset, the more impressive the results! But don't just take our word for it: there is an instantly available free trial. You can set up your own test very quickly, either on your own or in a guided session with one of us.
Concerns over Google Data Studio row limits are a thing of the past. The Analytics Canvas Partner Connector is now capable of holding up to 2 Billion rows per dataset! Even very large datasets perform exceptionally well, and that performance is highly scalable as reporting ramps up.
Our best-in-class connector is included in our Starter, Pro, and Premium plans available for $49/mo, $299/mo, and $499/mo respectively.