Missing data in Google Data Studio when using database connectors

James Standen Google Data Studio

When using Google Data Studio database connectors you might see data missing for a given date range, or incorrect totals in scorecards and graphs. In this post, we look at why data is missing in Google Data Studio reports when querying against SQL Server, MySQL, Postgres, and other database connectors.  We also show you two different methods to ensure all of your data is included. If interested, you can read more about all the ways you can get data from SQL to Google Data Studio.

Data Limits = Missing Data?

Google has specified in the documentation that you can query a maximum of 150K rows per query using the database connectors.

However, there is no warning in Data Studio that you’ve reached this limit.  Data will simply be missing from your charts and reports if you use too large a table. Hopefully the gap is noticeable!  If not, you and your users will have no idea that there is missing data and may come to incorrect conclusions. 

It's possible that at first, when the table is under the 150,000 row limit you see all the data. But as the table size grows (for example as new data is added to it or additional historical data is appended) the problem will set in.  This could be happening on a report that was working in the past and hasn’t had any changes. And depending on how the data is loaded, users may be completely unaware that there are holes in the data. 

Testing the limits of the Data Studio database connectors

Let's look in detail at what is happening so you can determine if you are missing data in Google Data Studio from your database due to this limit.  In the example below we use a data set of a bit more than 3.6 Million rows.  The complete data looks like this:

example of a complete data set with 3.6M rows of data in Google Data Studio

But when we use the Data Studio SQL connector, we see this fragment of the data in our line chart:

an incomplete line chart showing what happens when data is missing from Google Data Studio

Why is the data missing in the line chart?  Because the connector only loads in the first 150,000 rows in the SQL Server table!

Depending on how the data was loaded into the SQL Server table, this could even not be a continuous date range.  You might see no data at all in Data Studio, or you might see incomplete or incorrect data depending on the date range selected as we’ll see.

Example 2: Data Studio Scorecards with missing data

In this next example, the scorecards show how exceeding the data limit causes incomplete and inaccurate results.

To start, we’ll take a subset of data, so that there are not too many rows to start.  We do this with about 75,000 rows and we see the result looking at a period from August 1st to September 15th:

correct view of line chart and scorecards

 

This is right - the totals and row count match the database.  We can verify this with a query in SQL Server Management Studio (SSMS):

SQL for generating values in scorecards using SSMS

Watch what happens if we add historical data before the August 1st start date shown in the SQL above. Note that this data that will not be included in the report. It is just loaded into the table and brings the row limit above 150K:

line chart showing missing data in Data Studio

Data Studio does not show all the data!

We see missing data in the line chart and issues with both the row count and the revenue in the scorecards.

Why is Data Studio not showing all the data when using the native database connectors? Because of the row limit! If we add the row limit into the SQL query in SSMS, we’ll see why Data Studio is giving the results it is:

SQL and result of SSMS query when 150K limit applied

An exact match when the "select top 150000 *" clause is added! 

If you try this type of SQL on your data set you can confirm if it is the row limit that is causing your issues.  Adding the “select top 150000” shows us why the Data Studio SQL Server connector is not returning all the rows.

To avoid this issue, you could use Custom SQL in the Data Studio connector by following these instructions.

But the best way to avoid this issue is to not use a row limited connector.  Adding custom SQL means more maintenance effort and it slows down the reports

Buffering SQL Server data for faster Data Studio reporting with Analytics Canvas

Analytics Canvas provides SQL Server, PostGres, MySQL, Oracle, and other database connectors that let you load all your data from your database without any row limits. 

The Analytics Canvas Data Studio Partner Connector is powered by BigQuery. It is FAST, scalable, and capable of delivering ALL your data to your Data Studio users.

It is easy to use and can be configured in minutes. No technical resources, no Google Cloud Console administration, no pressure on your existing database. And if you're connecting with Canvas Desktop from behind your own firewall, no list of IPs to permit!

Best of all, no more missing data! 

Next Steps

Whenever you’re ready… here are 3 ways Analytics Canvas can help you with your Data Studio reporting challenges:

  1. Speed up Data Studio reports 
  2. Resolve connection and access issues between your DB and Data Studio 
  3. Clean and prepare data sets prior to reporting in Data Studio

Ready for the next step?

  • Start a 30 day risk-free trial. No credit card or sales call required. 
  • Schedule a demo for you and your team.
  • Contact us to discuss plans and pricing or activate your subscription 

Wondering if Canvas is right for you? Check out the related articles to learn more about our Data Studio Partner connector.