Avoid this costly mistake when connecting GA4 BigQuery to Data Studio

James Standen GA4, Google BigQuery, Google Data Studio

Having made the BigQuery export available to all GA4 users, a lot of people will be using BigQuery for the first time. Many of them will be connecting GA4 BigQuery to Data Studio without knowing the risks and without first optimizing their workflows. 

If you have a large number of daily events in Google Analytics 4 [GA4] and a handful or more of active Data Studio users, it is critical to avoid this very costly mistake:  Whatever you do, do not query the event tables directly from within Data Studio’s BigQuery connector using a custom query and date parameters.

the kind of query that should not be made when connecting large BigQuery event tables to Data Studio

While it is easy to connect the raw data to the dashboard, when it comes to connecting GA4 data from BigQuery in Data Studio there are two ways to do it, the right way, and the wrong way.

The wrong way:  Connect directly to the BigQuery event_ table with the Data Studio BigQuery custom query.  We’ll show you how this might cost hundreds of dollars a day in BigQuery costs.

The right way:  Create a summary table and connect Data Studio to that.  We’ll show you how  you can avoid large BigQuery costs for exactly the same report without sacrificing performance.  

That’s a lot of ground to cover, so let’s dive right in.

How to calculate the cost of a query in BigQuery

Let’s look at a simple example that includes just one chart, landing page sessions by date for just one month.  The example is a fairly high traffic Analytics360 site that sees about 3 million events per day in GA4.  We’re just going to look at the relative savings, so this example applies to any level of traffic.

If we put this query into BigQuery it will tell us how many bytes it will consume in the upper right corner.:

In this case, the query will process 67.43GB when run.

The cost of this query is impacted by three main things;

  1. The number and size of columns included in the query, either in the result or in the where clause.
  2. The date range (so therefore the number of daily tables used by the query).
  3. The number of events per day which defines how many rows each table has.

Queries with longer time frames cost significantly more.  All things being equal, a query for a year is 12 times the cost of a query for a month!

There are two components to pricing in BigQuery – Storage and Analysis.  In this case, the query falls under Analysis pricing, which can either be on-demand or at a flat rate.  

Flat rate pricing can vary, but starts at $1700/mo! 

On-demand pricing charges based on the number of bytes processed by each query and is $5 per TB.  The first TB per month is free.  

Our example query is 67.43GB, or 0.06585 TB.  As such, this query would cost 0.06585TB * $5/TB = $0.33.  

Doesn’t seem so bad, right?  Wait until you see how many queries Data Studio generates…

How Data Studio queries BigQuery

The problem is that Data Studio doesn’t just do the query once. When users interact with a report, it sends a constant stream of queries to BigQuery. Each one results in BigQuery usage and potentially cost.  Imagine what would happen if you put this query into Data Studio as a custom query:

Every user who changes the date filter in a report to a new range will cause another query to run. Fortunately, Data Studio does have some caching.  If two users look at the same date range, with everything the same (all filters, all settings) the cache will be used within a given time frame.

Imagine that you add a filter for landing page, making a simple, one table report like this:

If a user tries 10 different filters, they may have just generated up to 10 queries. Caching both in Data Studio and at the BigQuery level might reduce this a bit.  

Every element on the Data Studio report creates more queries and more cost

Data Studio makes one or two queries for every element on your report.  Most reports have at least a few charts or score cards on them.  This multiplies the cost.

For example, a line chart results in two queries, one to determine the top 10 categories and another to get the detailed data for those categories.  When you are using a BigQuery connector, both these queries will incur the BigQuery cost.  If you have a report with 10 charts, 2 scorecards and a table, you might have two dozen queries hitting BigQuery.  

How to see Data Studio queries in BigQuery as they happen

To see how Data Studio interacts with BigQuery and how many queries are made,

  1. Create a new query using the BigQuery connector and note the billing project being used in your query.

    When making the data source connection, note the name of the billing project.

  2. Make a chart using the query you created

    Add a chart or table to your report that uses the BigQuery data set.

  3. Login to BigQuery

    Using the same credentials that were used in the Data Studio report, login to BigQuery

  4. Check your Personal History

    Select the same project as you as as the billing project in Data Studio, then select the Personal History tab. You will see the queries Data Studio has made on your behalf.

As an example, let’s look at the earlier example report in Data Studio.  I refreshed it, then went to the BigQuery editor, and my personal history:

There, as expected, I can see there were 5 queries run by Data Studio.  One of them is to populate the table, two for the chart and two for the donut graph.  If I click on the Job ID and look at one of them, I can see my SQL, and the Data Studio date parameters, and I can see that this query consumed 61.62 Gb, which is 0.06018 Tb and cost $0.30.  

When I look at all 5 queries, I see they are all about 60 GB. So this one refresh will result in a BigQuery charge of about $1.50.  If you have multiple users and many reports this will add up very quickly.  All those clicks could be hundreds of dollars a day.

Remember, this property has over 3 million events per day, if you have 20,000 events a day the same refresh only costs a penny.  

Does Caching in Data Studio help reduce BigQuery cost? 

Caching can definitely help and both Data Studio and BigQuery have caching.  Caching will return a previously processed query providing the underlying data hasn’t changed, and the query is exactly the same. So, even if you are using a custom query direct to your event tables, if you have a report that doesn’t have adjustable filters, or you have low cardinality dimensions so there are only so many unique filters possible, then users will often be seeing cached results in Data Studio and additional BigQuery cost won’t be incurred.

If you don’t let your users customize their reports, they will be cached and it will reduce the number of times a query runs.  But all users must see just fixed reports- no ability to explore the data.  And all queries must run at least once a day since there is new data and people will need to see it.

What about BigQuery flat rate pricing, is that unlimited?

If you really do have huge data, and huge numbers of users, you might have moved to flat rate billing, which starts at $2000 a month, although there are discounts for committing to annual plans and paying upfront.

But the flat rate is not unlimited.  When you exceed the flat rate quota, everything slows down as jobs are put on hold until capacity is available. But, if you have flat rate, you know you won’t pay more than your monthly rate.

Based on my situation, what is the GA4 BigQuery cost for using Data Studio?

This isn’t a simple question, since every case is a bit different, but we can create some example scenarios, and you can find one close to your case.

The main drivers of BigQuery cost for Data Studio are:

  • The rows in the table being queried (for GA4 this is events per day)
  • The bytes in the columns in the query (varies depending on query, we use an average)
  • The number of active users in Data Studio
  • The number of queries the users do (linked to how active they are)
  • How many of the queries are exactly the same (caching helps avoid cost)

Remember, a user that makes one selection on a report might generate 20 or 30 queries with one click.  If multiple users are active in Data Studio for an hour a day, thousands of queries are generated.

To estimate the cost for querying the event tables directly we use a number of assumptions which may or may not be applicable to your situation. However, they give an overall indication of the order of magnitude based on data size and use type. The cost estimates here are likely a bit low. More detailed queries, more active users, or more interactive reports will all increase the queries and therefore the cost.

For the cost of using the reporting tables, we apply the estimated reduction in intermediate table size- all those Data Studio queries are going towards much smaller tables.  But we do have to update those tables every day, so we do have to query the event tables once per day.  We therefore calculate the cost of loading 5 days of the event data every day, assuming you are doing a daily update of the last 5 days.  This ensures data freshness (but not wasting BigQuery cost on constantly requerying historical data that has not changed). 

The key is that relatively speaking we estimate that using intermediate tables will cost about 50 times less.

Example ScenarioCost Direct to Event tableCost using Reporting Table
High Traffic Property– 3 Million events a day, 10 Data Studio Users, 20 active reports$4800*<$100
Medium Traffic Property with 500,000 events a day, 10 Data Studio users, 10 active reports$800<$20
Smaller Traffic multi properties– 10 properties each averaging 25,000 events a day, 10 Data Studio Users, 10 active reports $400<$10
Agency Reporting– 100 properties, averaging 25,000 events a day, 25 Data Studio users, 5 active reports$1000<$20
Small Traffic single property – 10,000 events a day,  2 Data Studio users, 10 active reports $15**<$1

costs as estimated by Analytics Canvas, September 2022

* it is likely at this point you would go to a flat rate, so pay $2000 a month, and depending on how your users interact with Data Studio might see slow downs at busy times.

** Obviously, if you have very few events in your GA4 export tables, then even though it is inefficient, since it is not expensive you might decide to just query the event tables directly. But there are a lot of other reasons to avoid putting raw, complex BigQuery SQL directly into the Data Studio BigQuery connector-  read on!

Managing GA4 BigQuery data and giving Self Service access to all your Analysts 

Analytics Canvas is a tool that lets you do all the cost saving methods that we’ve been discussing quickly and easily in a graphical environment.  

It has a fast Data Studio partner connector to deliver your data into your reports letting you refresh up to 4X faster than going direct to event tables.

Even more importantly, it gives your analysts that might not be experts in BigQuery SQL the ability to work with GA4 data using a graphical user interface.  Canvas lets them preview, profile, and prepare data for analysis and visualization, while also allowing them to create automated data workflows that can be run on a schedule or on-demand.

If you are a company, it means everyone can get access to their data fast. If you are an agency, it means you don’t need data scientists to build reports for your clients.

Let’s go back to our original example query with landing page.

In Analytics Canvas, it looks like this:  you set up the same query you would have in Data Studio and connect it to a Data Studio export.  This will generate a “Smart Table” that is incrementally loaded and available for refresh on a schedule or on-demand.  In Data Studio, connect to the data set using the Analytics Canvas data source.  

Now your Data Studio report will be powered by a much smaller table that is optimized for cost and performance.  Better still, Canvas has it’s own proprietary caching system to ensure data that’s already available doesn’t reload when a user interacts with a report.

Even better – for Starter, Pro, and Premium plan customers, those queries are covered by your Analytics Canvas subscription – you will not be billed separately for them by Google. 

Using the GA4 BigQuery export without writing SQL

Don’t even want to write the SQL but still make new GA4 BigQuery Queries?  We’re working on that!!

Sure, you can write the SQL required to make optimized, high performance queries.  But your value is in delivering data and insights, not in writing SQL.  So why would you spend minutes and hours writing those queries when Canvas can generate them in seconds? 

We’re hard at work on a connector for the BigQuery GA4 export.  It offers an easy, point and click user interface that lets you dial in the query you want in seconds.  Analytics Canvas will generate the SQL, and will optimize your BigQuery costs.  This is still in development, but if you are interested, sign up here and we’ll let you know when it’s ready to take for a spin!

Wrapping Up

The BigQuery Export for GA4 is an excellent way to access your GA4 data. But the costs can add up! Particularly if you connect directly to the BigQuery export without using cost optimization techniques. Even though every query doesn’t cost very much as BigQuery is a very cost effective way to query data, making tens of thousands of them needlessly really adds up!

Analytics Canvas handles the data engineering on your behalf, providing a helpful user interface that lets you query for data, profile it, process it for reporting, and publish it to Data Studio on-demand or on a schedule.  There’s no easier way to work with GA4 data!

Sign-up for your risk-free trial today.


Related Posts