When comparing data from the 4 different GA4 data sources, you may find that your GA4 data doesn't match. There are a number of possible reasons depending on what queries and reporting tools you are using, but some of those reasons are built into GA4.
Two are related to how GA4 defines and calculates sessions, and one is a common error in BigQuery SQL. In this post, we'll cover the 3 most common reasons why session counts in BigQuery don't match with Reports, the GA4 Data API, or the Explorations module.
If you want to processs your GA4 data faster, skip ahead to Analytics Canvas Online and:
- sign-up using your Google Account
- add one or more Google Analytics account authorizations
- export your data, transform it inside Analytics Canvas and export it wherever you need
Reason #1 your GA4 data doesn't match: Sessions in GA4 are an estimate in the GA4 UI and Data API
Yep. They are, by design, not the exact count of unique session IDs. Google has done this to reduce processing effort required for larger data sets. From the documentation:
So they simply won’t match exactly, particularly on very high traffic properties. But this estimation should be within perhaps 2% or better.
Reason #2 your GA4 data doesn't match: Sessions that span time periods are counted differently
The definition of Sessions in GA4 UI and Data API is different from how it used to be in Universal Analytics and the Reporting API. GA4 counts a session in a time period if there were any events from the session in the time period. UA counts a session in the time period when it started. Let’s look at the difference between these definitions:
As a result, comparing monthly or yearly sessions and sessions that are the sum of daily session counts will not match in GA4.
If your query in BigQuery uses the date of the start_session event, it will only be counted once, regardless of the period. This will tend to make BigQuery sessions lower than Reports and Explorations Sessions that are being double counted from the “unique sessions” view point. How much lower depends very much on the nature of your website traffic. If it’s mostly before midnight it will be pretty big. Longer session durations will also increase the difference since more sessions will have events on both sides of any given time period.
Reason #3 your GA4 data doesn't match: Watch out for time zones
Another reason you might see your session counts from BigQuery not agreeing with those in GA4 Reports, Explorer or Data API is if you are not being careful with time zones. The BigQuery schema has two fields that are commonly used in queries when it comes to dates;
event_date - “The date on which the event was logged (YYYYMMDD format in the registered timezone of your app).”
event_timestamp - “The time (in microseconds, UTC) at which the event was logged on the client.”
Notice that these two columns are not going to agree except for GA4 properties with a property time zone set to UTC. As a result, the number of sessions will not match because when using the event_timestamp to determine date, the date in UTC will be returned.
To convert the event_timestamp column to the date in the time zone of the property, you need to use this SQL, customized with your own time zone:
EXTRACT(DATETIME FROM TIMESTAMP_MICROS(`event_timestamp`) at TIME ZONE 'America/Toronto' )
This will then have the date cover the same events as the GA4 UI, and Data API which uses the property time zone.
Reason #4 your GA4 data doesn't match: The queries are different
Most users will create a query in the GA4 Data Explorer (the Web UI), and compare those results to what they're seeing in another data tool that uses the API or BigQuery.
When comparing the two results, there can be WILD discrepancies - double or triple the number of Users, session totals that are WAY off, and generally data that just doesn't make sense!
Most often it is an aggregation issue.
The Data Explorer forces you to make a specific query and shows you the results after making one or more queries behind the scenes. However, when in your BI tool, whether it's Analytics Canvas, Looker, or the BigQuery console, you must ensure you're comparing the following:
- the exact same dimensions and metrics
- the exact same Property
- the exact same filters
- the exact same time period
More often than not, a user will make a query to the API or to BigQuery that crosses scopes, includes unique metrics across a different time period, or includes additional fields that render the data non-aggregatable.
When aggregated, such as into a scorecard in Looker Studio, the data will be wrong as unique metrics are non-summable, and certain query results are non-aggregatable (like Sessions by eventDateHourMinute - you will sum sessions that span minutes!).
Users must be counted distinctly over the time period. In BigQuery, use the userPseudoID. Using the API, you would need a custom dimension that has your own userPseudoID and you'd perform a distinct count on that. Same applies to sessionID when including event parameters in the same query.
Other reasons for GA4 data discrepancies between sources and refreshes
- Data from Google Signals is not exported from Analytics to BigQuery. As a result, when counting events, you will see different results between Analytics and BigQuery.
- Explorations will use a sample of your data if there are more than 10 million events as part of the exploration query.
- Using the Data API, Reports, or Explorations, your data may be subject to row limits from the underlying result table. In that case, an "(other)" row appears as the last row in the results with all metrics aggregated onto that row. BigQuery is not subject to this limit.
- Are your queries really identical? When comparing results, ensure you are querying the same property, the same time period, and the EXACT SAME dimensions and metrics. For example, first_user_source_medium, session_source_medium, and source_medium are not the same!
- Data compared is beyond the data retention window. Explorations and the Data API are limited by your property's data retention settings. This means that if you create a Report with a date range outside the user and event level retention settings, data outside of the retention period won't be included in Explorations or the Data API.
- If you are comparing revenue for anything other than USD, your data will fluctuate based on the daily exchange rate. That's right, Google Analytics converts your non-USD currency into USD at the moment of processing. They convert back to your currency EACH TIME A REPORT IS GENERATED! Obviously this is going to lead to discrepancies in your data.
- Large amounts of data: For cost, scale, and performance reasons, Analytics uses different methods in Reports and Explorations. The differences in methodologies can cause differences in the results between these sources and the BigQuery export.
The first important thing to remember is that because of reason #1, your GA4 data doesn't match when compared between sources. There may always be small discrepancies between what BigQuery returns and the UI or Data API. Remember, BigQuery will always be the most consistent, detailed set of data for sophisticated analysis.
In terms of differences in counting sessions across time periods, it is my opinion that for most cases, the method of counting session starts is the best one. One important advantage is that a daily, monthly, and yearly total will always match across different queries and reports. And if you have reports that look at hourly or minute level detail, then clearly a 6 minute session should not be counted 6 times.
Comparing data mixing time zones is simply something to avoid. It will distort analysis, particularly if focused on periods of a day or less.
For my analysis, I tend to use the BigQuery export tables as BigQuery lets me write my SQL so that session counts are exact and based on the session_start event. This way hourly, daily, monthly and annual totals always match.
I use Analytics Canvas to do this because while I can write SQL from scratch when I need to, using the query builder and canvases is a faster, more efficient and less error prone way to quickly make the queries I need.
Fast, Accurate, Accessible GA4 Data
Here are 3 ways Canvas will help you with your GA4 data accuracy:
- Land data using both the GA4 Data API and the BigQuery export for a complete picture of your Property.
- Store and share data in safe, accessible, read-only tables within your account. Curate your GA4 data to ensure the same rules always apply when users are making ad-hoc queries.
- Load and process millions of rows in seconds with visual, no-code tools to aid in the analysis and documentation of your workflows.
Ready for the next step?
- Sign-up for the private beta to be the first to use the GA4BigQuery connector!
- Start an instant 30 day risk-free trial of the GA4 Data API and BigQuery connectors. No credit card or sales call required.
- Schedule a demo or discussion for you and your team.
- Contact us to discuss plans and pricing or activate your subscription
Wondering if Canvas is right for you? Check out the related articles to learn more about our GA4 capabilities.