How to query event_params in the GA4 BigQuery export
In our first tutorial, we looked at a simple query that limited itself to the top, simple columns in the event table. This type of query can provide lots of information, but much analysis requires the use of the event_params column, where custom dimensions and other key values are stored.
To be able to write an SQL query that uses the event_param column in the GA4 BigQuery schema, you need to understand the repeated RECORD data type, and how use the UNNEST function.
This tutorial will give you a first introduction to UNNEST and how it can be used to flatten your Google Analytics 4 data from BigQuery. We will use the example of page views by page for our example query.
What is the RECORD data type in the Google Analytics 4 BigQuery Export?
A field used to store a date might have data type DATE, to store someone's name it would be STRING, to store a numeric value it might be INTEGER or FLOAT or NUMERIC etc.
This is where many data models stop- each row is a set of simple values, one value for each field. In some databases however, a field can hold more than just a simple value. BigQuery is one of these. BigQuery has a data type called RECORD which is itself a whole new row with fields. RECORDs can also be repeated in a field and this makes that field within the row contain a whole other table of data. Tables within tables- fun!
The RECORD data type makes writing SQL a bit more complicated and, yes, you guessed it, the GA4 BigQuery export schema has both RECORD and repeated RECORD fields in it.
A table with record structure can be looked at as JSON and when it's well formatted it gives us an idea of what's in the event table for each event. To see a row in a table you can use the BigQuery sql editor, and do a Select *. Usually, Select * is not a great idea in BigQuery as it means every column in the table is included in the query cost, but if you are looking at just one daily event table, this won’t be a problem.
A record can also be referred to as “nested” data since it contains more than a single attribute. To get data out of a record, we need to “Unnest” it. This means we need to pull specific data out of this table and include it in the result set. More on that later.
Understanding Google Analytics 4 event tables and event_params
If we look at our events_20221004 table, we see just the events on October 4th, 2022. I have filtered on the timestamp, so we see two events at the beginning of a session. When we do this, the BigQuery SQL Explorer formats the multiple records in event_params so that you can see them all nested;
What we can see is that for different events, there are different event parameters. For the first event, which is a “first_visit” event, there are 5 event parameters in the event_param column. For the second event, a “session_start” event, there are 6 event parameters, with “session_engaged” being the extra parameter.
One of the things you’ll notice is that event_params has two fields, Key and Value.
KEY - The key is the name of the parameter we are looking at, for example “page_location” or “ga_session_id”. There is only one value for each event for a given key. Custom dimensions are stored here as well.
VALUE - The value is the value for the given key for the event. Value is split into four separate columns for different data types; string, integer, double and float. Float is not used by GA4 at this time. For a given key it is possible to have different data types for different events so your SQL has to be specific to include all values. We’ll cover this in more detail in the next tutorial.
These columns are described in the BigQuery export schema docs:
Our goal is to get page views by page. The page is recorded in the “page_location” key where the value is a string. We want to count each occurrence of this event for each unique page as a page view.
So how do we write a query for that?
Getting a value out of the event_params column for a specific key
This is the query for page views by date and page which illustrates how to pick a specific value from event_params.:
SELECT PARSE_DATE('%Y%m%d',`event_date`) AS date,
key = 'page_location') AS page,
COUNT(1) AS pageViews
_table_suffix between '20210101' and '20210131'
GROUP BY date, page
Let's dig into the details step by step.
Breaking down the query
Sometimes it’s clearer to start at the bottom, and look at what rows we are filtering on in the WHERE clause for the source table.
FROM: Starting with the FROM, we can see that we’ve used events_* so it means we are including ALL the event tables unless we constrain the date.
WHERE: Then we can see in the WHERE that we use the _table_suffix BETWEEN to filter on dates, so in this case we only include the event tables between the two dates.
Finally, we have another condition in the WHERE which limits it only to events where the event_name is page_view. This means that we will only get events where there was a page view on a page.
SELECT: Let's go back to the start and look at the column definitions. As always, we start with SELECT, then we list the columns we want, and if the columns are calculated from columns in the table, we do the calculation then AS and the desired column name in the final result. We separate the columns with a comma.
The first column, date, is a calculation to convert event_date from its original string data type to a more useful date data type.
The second column requires a bit more SQL to unpack.
UNNEST: The calculation for the second column, page, is completely wrapped in brackets, which is necessary because it has the magical UNNEST in it.
The UNNEST function in SQL that converts an array into a set of rows. This is also known as ‘flattening’. UNNEST takes an array and returns a table with a single row for each element of the array.
This column is actually a little subquery that we know will only return one row because the key is unique in the event_params column. So what this query does is UNNESTs the event_params column to turn it into a table, then takes just one value. We get the value we want by having the WHERE key = ‘page_location’ clause that takes just the page location. We need to select the appropriate data type to return value from (one of the 4 we outlined above), in this case it's a string so we want value.string_value.
pageViews - The calculation for the third column is just the count of the rows, since each row is one page view because we have put the page_view constraint in the WHERE clause..
Because we used COUNT to count the rows for each page_location, we need to have a GROUP BY with all non-aggregation columns in it. So in this case, date and page.
Making a page views by page Query in GA4
While I think it is important to learn at least some SQL, I also love using tools that save time, and let me not constantly re-write the same code again and again and again.
Analytics Canvas is a tool that provides a graphical query builder for the GA4 BigQuery schema.
To do the query we just did above, you just need to:
- add a Google Analytics block to the Canvas,
- select GA4 BigQuery Export as the data source
- pick the property, then pick the dimensions and metrics.
- run the query
That's it! The SQL is automatically generated and running the query will return a data block on the canvas that you can preview, profile, or publish to your preferred destination.
You'll notice that the generated query has some additional SQL which checks for null values and replaces with 0 when found. It also uses a slightly different approach, as it is designed to allow querying multiple dimensions and metrics together. In our SQL generator the overall syntax and structure adapt automatically depending on the query.
These sorts of things are just some of the factors that must be considered when writing and modifying production code manually.
Analytics Canvas has all the tools you need to write the most complex GA4 queries in a matter of seconds, store that data in BigQuery summary tables, and process your data in a visual, no-code environment.
By building your queries and workflows with Analytics Canvas, you save significant time in generating your query and you can have full confidence in your data! If you aren't already a Canvas user, you can sign-up for a free-trial today.