How Google Data Studio Connects to Databases and How to Resolve Connection Issues

James Standen Google Data Studio

In this post we look at how Google Data Studio connects to your database, what causes the myriad of database connection errors that users come across, and offer solutions on how to resolve them. 

To demonstrate how the connection process works and what was causing these errors, we ran a series of tests.  Later in the post we explain how you can run the same tests against your database to see if Data Studio is the cause of the performance and connectivity issues you’re experiencing. 

Errors and Issues with Data Studio SQL Server and other Database connectors

Data Studio is a distributed system capable of having many users online at once.  It generates requests for data in parallel, as users make them. Since it does this for every user using Data Studio, and for every click of every filter/date range change, your database works really hard to serve data to each user and report. 

All of these requests create a large number of connections - one or more per chart, per page, and per user - and a large number of queries to your database.  Often this results in connection issues and other error messages returned from Data Studio.  

When only a few charts are needed to query smaller data sets (less than 150,000 rows), the Data Studio SQL Server, MySQL, and other database connector can often be a good solution.  But reports that have most of their charts reading from a database, often using custom SQL to avoid row limits, are often slow to load and refresh. 

Since Data Studio does not buffer your data within the Google Cloud it has to query the database constantly, leading to connection issues and “fatal errors” about having “too many connections”.    

While it uses caching when users are requesting exactly the same data, any type of dynamic filters, date range selections, etc., will mean another hit on the database. Data Studio is more than capable of handing large data sets, but not via the database connectors. 

Common database connection errors in Google Data Studio

Assuming you have been able to connect to your database platform directly from within Data Studio, you may get any of the following errors if the database is under load:

Data Studio Connection Error
Data Studio cannot connect to your data set.
Sorry, we encountered an error and were unable to complete your request.
Request contains an invalid argument
Error ID 814d45af

Failed to create Jdbc connection with error: SQL Server returned an incomplete response. The connection has been closed.
DataSet Configuration Error Request contains an invalid argument
Can’t connect to the database. Please double check your connection parameters.
System Error Data Studio has encountered a system error. Sorry, we encountered an error and were unable to complete your request.
common database connection error in Google Data Studio

Some of these errors are a bit misleading, as they seem to often happen simply because the database is overloaded, not because there is an issue with your configuration.  

We’ve seen cases where after getting errors, simply refreshing the report without changing anything else will cause it to work again.  This appears to happen once the database is no longer overloaded.

But what is causing such a heavy load on seemingly simple dashboards and reports? 

How many queries does Google Data Studio make for a single chart?

If we look at a Data Studio dashboard, every element on the dashboard results in one or more queries against the database unless the exact data is already cached.

All of these queries happen at the same time so the database sees a very large spike in requests and connections.

If you have dashboards with many charts and tables, and multiple users viewing it while changing date ranges and filters, your database may have a hard time keeping up with the spike in requests.  Even worse, the reports will be extremely slow or have intermittent errors like those above.

Let’s look at just one chart as an example to see what Data Studio is doing to your database:

simple Data Studio line chart that makes 2 database queries

This is a simple line chart with a dimensional breakdown for Country.

This chart type results in two queries:  

  1. a SELECT for the data in the chart, and 
  2. a query which gets the values for the breakdown dimension (Country) over the time period to know which dimensions are the top 10  

This is the same for other elements, like the donut and pie charts in Data Studio that also show a top 10 legend.

A report with just 6 charts like the ones shown below results in 12 queries.  

When we tested this with SQL Server, we saw 8 separate connections to the database!

6 charts that result in 12 queries and 8 database connections from Data Studio

If you have a lot of users, and a lot of data, it's likely you don’t want Data Studio generating that many queries against your database.

Even with just a few users, if your database is being used by other systems and not exclusively setup for Data Studio reporting, you're at risk of creating headaches for other users and IT. 

How can I know how Data Studio is impacting my Database?

To see all the connections data studio is creating to your SQL Server database, you can use this query:

SELECT DB_NAME(dbid) AS DBName,
       COUNT(dbid)   AS NumberOfConnections,
       loginame      AS LoginName,
       nt_domain     AS NT_Domain,
       nt_username   AS NT_UserName,
       hostname      AS HostName
FROM   sys.sysprocesses
WHERE  dbid > 0 and HostName like ‘%google%’
GROUP BY dbid,
       hostname,
       loginame,
       nt_domain,
       nt_username
ORDER BY NumberOfConnections DESC;

When you run this, you might see some connections with domains like “prod.google.com”  or “borgtask.google.com”  (A Star Trek fan somewhere at Google?) You’ll see that the LoginName should be the credential you used when connecting with the data studio SQL Server connector.

How to reduce or eliminate database connection errors in Google Data Studio

Upgrading the DB might help to solve some of the performance issues?  But chances are it might become very expensive since buying raw processing power to overcome the number of users and reports will likely eventually fail.  

Of course optimizing indexes or perhaps creating more tables in your database is also critical. The trouble is that consumes database administration resources from the often overloaded IT department and slows down the business intelligence team.

For charts and tables that are missing data due to the 150k row limit, you can use custom SQL <LINK TO BLOG POST> to return the data and avoid the error, however as our tests illustrate, there is still a big impact on your DB and the charts will be slow to populate. 

One of the best solutions to eliminate all these connections, and to speed up your Data Studio reports, is to replicate your data to BigQuery.  This way your database will only be queried once when the data needs to be updated, not every single time any user in Data Studio clicks on a date or filter control in a report.  Data Studio will be querying BigQuery for that, and everything stays in the powerful Google Cloud.

There is, of course, an out of the box solution that is turn-key, very affordable, and can be setup in just a few minutes by any business user. 

Loading data from your database into BigQuery and Data Studio the easy way with Analytics Canvas

The Analytics Canvas Google Data Studio Partner Connector is designed to avoid these database connection and performance issues.  

As Google Marketing Platform Technology Partners, Canvas builds upon the standard Partner Connector and takes full advantage of Data Studio’s Advanced Services and Google BigQuery. It replicates the data you want to see in Data Studio, loads it into BigQuery, and delivers a fast, 3-4 second refresh on even the most complex reports with millions of rows.

All you need to do is build a simple Canvas that connects your database table to a Data Studio Export which looks like this:  

resolving database connection issues in Google Data Studio by buffering through Analytics Canvas

Canvas replicates the data and makes it all available to Data Studio for blazing fast reports.

You can clean and prepare your data before sending to Data Studio, allowing those complex joins and transforms to happen within Canvas.  

You can schedule how often you want to update and update only the most recent data incrementally.  This means you can have massive tables, tens, or even hundreds of millions of rows, without slowing down your users or bogging down your database.

With this approach, if you have 100 users, you might reduce the load on your SQL Server db by 100s of times or more.

Speed up your data studio reports by up to 10X

Analytics Canvas typically returns everything to Data Studio within about 3-4 seconds.  

In our tests, using custom SQL to avoid the row limits in the Data Studio SQL Server connector resulted in 30-40 second load times for a modest Data Studio report. 

This varies, of course ,on the performance of the database and the number of users involved.

Canvas refreshes the report below in about 3 seconds:

fast refresh time on Analytics Canvas connector in Data Studio

The Data Studio SQL Server database connector took almost 30 seconds longer to refresh the same report:

slow result using SQL Server connector for Google Data Studio

We repeatedly refreshed both reports, as a user would if they were adjusting filters or the date range. While the Canvas connected report continued to deliver, the database connected report had the same intermittent errors mentioned above and reported by Data Studio users.

We believe this is a database timeout as after waiting for the database load to drop, a refresh returns data normally, still in the 30-40 second range in terms of delay.

common system error when connecting a database to Google Data Studio

Wrapping Up

With smaller data sets and a limited number of reports/charts/users, the Data Studio database connectors can be an effective solution.  

However, due to the way that the SQL Server, MySQL, Postgres, and other database connectors within Data Studio have been designed, your database can quickly become overwhelmed when trying to serve data to Data Studio reports. 

While custom SQL may help to deal with the 150K row limit, and while you can have a DBA index tables or scale up the DB, the time and costs involved erode the value of using Data Studio pretty quickly. 

Analytics Canvas eliminates these issues by buffering the data in BigQuery, using Data Studio’s Advanced Services, caching results that are re-used across the report, and applying other optimization techniques to reduce response times and protect your database from unnecessary load. 

Give it a try today - it takes just a few minutes to turbocharge your Data Studio reporting, relieve the added pressure on your databases, and resolve database connection issues in Google Data Studio. 

Next Steps

Whenever you’re ready… here are 3 ways we can help you with your Data Studio reporting challenges:

  1. Speed up Data Studio reports 
  2. Resolve connection and access issues between your DB and Data Studio 
  3. Clean and prepare data sets prior to reporting in Data Studio

Ready for the next step?

  • Start a 30 day risk-free trial. No credit card or sales call required. 
  • Schedule a demo for you and your team.
  • Contact us to discuss plans and pricing or activate your subscription 

Wondering if Canvas is right for you? Check out the related articles to learn more about our Data Studio Partner connector.