How to load more than 150K rows from SQL Server to Google Data Studio

James StandenGoogle Data Studio

It is possible to load more than 150k rows from SQL Server to Google Data Studio

In this blog post, we’ll explain how you can avoid hitting the 150,000 row limit in Google Data Studio when you have large data sets in SQL Server. 

We’ll give you a heads up on how Data Studio will actually show incorrect data if you do exceed the 150,000 row limit, which is a reported issue.

Finally, we’ll introduce an easier way to get larger SQL Server data sets into Google Data Studio and speed up your reports using Analytics Canvas and without having to use custom SQL.

Get all your data into Google Data Studio in the next 10 minutes! Sign-up for an Instant Free Trial, connect to your data, and publish it to Data Studio.

What is the 150k row limit for databases like SQL Server and MySQL in Data Studio and how do I avoid it?

Google has limited the number of rows that can be accessed using their Database connectors to 150,000 rows.

For small amounts of data, the default built-in connectors in Data Studio for SQL Server or MySQL might be all you need.  But if you have more than 150k rows in any of your tables, these connectors cannot be used directly.  Instead you're required to use custom SQL to reduce the number of rows below the 150k limit.

In this post we'll explain how to use the custom SQL feature of the Google Data Studio database connectors to avoid this limit. However, be warned, this does significantly slow down your reports.  

There is also a way to load all of your data and get exceptional performance without creating a burden on your database, which we'll cover at the end of this post. 

WARNING: Known reported issue with Data Studio Database connectors

As of this writing (we’ll update as this is addressed by Google) the Google database connectors for Data Studio do not warn you if the row limit has been exceeded.  The Data Studio report will show incomplete data, and scorecard totals will be wrong.  

You might see missing data in a line chart, for example. The data suddenly stops even though in the database there is more data for the remainder of the date range.  This might happen at the beginning or at the end of the chart- some data might be shown, but it might be incorrect.

The Analytics Canvas Data Studio Data Connector is not affected by this limitation.

Here's what happens when when the 150K row limit is reached:
reaching the 150k row limit in Data Studio will result in charts and tables with incomplete data

We have created a bug report with Google and we do a deeper dive on this issue here showing exactly whats going on to help you confirm if this is what you might be seeing.

How to use custom SQL to avoid Data Studio database row limits

To avoid the 150k row limit in Data Studio, use custom SQL to reduce the number of rows instead of connecting to the entire table.  

This has the distinct and unavoidable disadvantage of slowing down the report refresh time and putting additional load on your database.

You will also have to create multiple data sources off of any given table, one for each of the columns or combination of columns you want to show. Unfortunately, you can’t just point Data Studio at the table because the results will always be limited to 150,000 rows. 

Example of using custom SQL to avoid Data Studio's 150K database row limits

Let's see how we do this using a table with 3.6 Million rows, a demo sales data set of the Acme Fruit and Veg company.  This data has a date, store ID, product and geography information in it, with individual sales by product, by store, by day over a period from 2017 to the present.

Here we want to see sales by country and product in line and donut charts in Data Studio.  And maybe want to have the option of filtering by SKU.

What we need to do is reduce the number of rows returned to Data Studio from the 3.6M rows in the table to less than 150k. 

To do this, we need some custom SQL:

SELECT  DATEADD(dd,0,DATEDIFF(dd,0,[SaleDate])) as [SaleDate],[SKU],[Store_Country], sum([Revenue]) AS Revenue  
FROM AcmeStoreSales
WHERE [SaleDate]>DATEADD(month,-6,getdate())  
GROUP BY  DATEADD(dd,0,DATEDIFF(dd,0,[SaleDate])) ,[SKU],[Store_Country]

In this query, we aggregate away the store detail, rolling up to data at a per day level, and limiting the date range to just data from 6 months ago onwards.  This reduces the number of rows to less than the 150,000 row limit in Data Studio for SQL Server. 

Of course, it also means your users can’t see a full year of data!  But with this data set it's just not possible to get below the 150k limit without doing that for these two columns.

How to enter a custom query to create a specific data set for a chart or table in Data Studio

To do run this query from within Data Studio, we go into Google's SQL Server connector for Data Studio, enter our database information and credentials, and select Custom SQL:

custom SQL query using Microsoft SQL Server connector for Google Data Studio

As a result, it is now possible to create the graph we want.

Chart showing a dataset with 3.6M rows in Google Data Studio

But unfortunately, because now we must run this query on our database each time a user opens or refreshes this report, we find that this report is very slow to refresh.

With just two charts it can be over 15 seconds to load - with a dashboard with 5 charts it can take 30 or 40 seconds or more.

This of course depends on the performance and load on your database. For the tests in this blog post, we used a 100 DTU Azure SQL database which costs about $2300 a year and just 1 test user. Obviously if you have many users, even on a larger DB reports might be slow.

Want to avoid custom SQL? Use Analytics Canvas to load huge SQL Server data into Data Studio and Speed up reports - Loads in 3 seconds.

Analytics Canvas can load all 3.6 Million rows from SQL Server, and make them all available in Data Studio at once in one data source.  

And it's fast! The same dashboards using Analytics Canvas refreshed in 2-4 seconds, up to ten times faster than the Google SQL Server connector with custom SQL!

Google Data Studio SQL Server connector performance vs Analytics Canvas

Interested in a more technical look at WHY the Google Data Studio connector might be slow and how it can cause issues with your SQL server database? Check out this post about too many connections and errors in connecting to databases from Data Studio.

How does Analytics Canvas make Google Data Studio reports so much faster?  

To increase the performance of SQL Server data in Data Studio, Analytics Canvas buffers the data in BigQuery automatically, and you don’t even need to have your own BigQuery account.  

Because the data is in BigQuery, no matter how heavy the load, your reports will be much faster, and your database will never get overloaded.  You can control when, and how often Analytics Canvas updates the data. You can update constantly, or just once a day, whatever your data routine calls for.  Either way, it won't be every time a user interacts with a Data Studio report! 

Since Analytics Canvas can load all of your data without any row limits, you don’t have to write any SQL or have multiple data sources with different column combinations.

Next Steps

Whenever you’re ready… here are 3 ways we can help you with your Data Studio reporting challenges:

  1. Speed up Data Studio reports 
  2. Resolve connection and access issues between your DB and Data Studio 
  3. Clean and prepare data sets prior to reporting in Data Studio

Ready for the next step?

  • Start a 30 day risk-free trial. No credit card or sales call required. 
  • Schedule a demo for you and your team.
  • Contact us to discuss plans and pricing or activate your subscription 

Wondering if Canvas is right for you? Check out the related articles to learn more about our Data Studio Partner connector.