Marketers have long used Google Analytics as a cornerstone tool. With the introduction of Google Analytics 4 (GA4), there's been a significant shift in how data is collected, processed, and presented. One of the most frequent questions we encounter with our software product, Analytics Canvas, especially from marketing agencies and developers, is: "Why can't we summarize on the Users metric in GA4?"
This blog aims to demystify this question and provide a clear understanding of the intricacies of user metrics in GA4.
The short answer: Users is a calculated metric, just like Bounce Rate, Avg. Time on Page, or Pageviews per Session. As such, it is non-summable.
Understanding Metrics for Users in GA4
A 'User' represents a unique individual who has visited a website or used an app. In the previous Universal Analytics (UA) version, users were primarily identified using cookies. However, GA4 has adopted a more advanced approach, focusing on event-based tracking and leveraging multiple identifiers like cookies, user IDs, and device IDs.
In Google Analytics 4 (GA4), there are several user metrics that help you understand the behavior and engagement of users on your website or application.
Metric | Description |
---|---|
Total Users | This metric represents the total number of unique users who logged any kind of event during the specified time period. |
Active Users | This metric indicates the number of users who were active on your website or application during a given time frame. It's a way to measure user engagement and retention. |
New Users | This metric is measured by the number of new unique user IDs that logged the first_open or first_visit event. It gives insights into how many new individuals are interacting with your site or app for the first time during a specific period. |
The Non-Additive Nature of Users in GA4
The primary reason you can't simply sum up the Users metric in GA4 is that it's non-additive. Users is a calculated metric, similar to engagementRate, averageSessionDuration, or screenPageViewsPerSession: each value is computed for the exact rows you requested and can't be rebuilt by adding those rows together.
Let's break this down:
Scenario: Imagine a user visits your website on Monday and then again on Tuesday. If you were to look at the data for each day individually, you'd see one user for Monday and one user for Tuesday. But if you were to sum these up for the week, you shouldn't get '2' users because it's the same individual visiting on both days.
Implication: Simply adding up the daily user counts would lead to double-counting and an inflated number of users. This is why GA4 doesn't allow straightforward summation of the Users metric across multiple days or other dimensions.
When you make an API call to retrieve User data for a specific granularity (e.g., daily), the response table will provide user counts for each day. If you try to aggregate this data to a higher level (e.g., weekly, monthly or yearly), you'll face the non-additivity issue.
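To make the double-counting concrete, here is a minimal sketch in Python. The visit log and user names are invented for illustration and are not GA4 output; the point is only to compare a sum of daily counts with a deduplicated count over the same period.

```python
from collections import defaultdict

# Hypothetical visit log: (day, user) pairs. "alice" visits on both days.
visits = [
    ("Mon", "alice"),
    ("Mon", "bob"),
    ("Tue", "alice"),
    ("Tue", "carol"),
]

# Distinct users per day, the way a daily report would return them.
users_by_day = defaultdict(set)
for day, user in visits:
    users_by_day[day].add(user)

daily_counts = {day: len(users) for day, users in users_by_day.items()}

# Summing the daily rows counts "alice" twice.
summed = sum(daily_counts.values())

# Deduplicating over the whole period gives the real number of users.
true_total = len({user for _, user in visits})

print(daily_counts)  # {'Mon': 2, 'Tue': 2}
print(summed)        # 4
print(true_total)    # 3
```

The daily table on its own is correct. It just doesn't carry enough information to recover the two-day total, because the user identities are gone once the counts are computed.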
Implications of including Users in GA4 Data API Queries
Since the User metric is non-additive (not summable), reporting on users by day, week, month, and year requires a strategic approach.
Here's how you can achieve accurate reporting:
Separate Queries for Different Time Granularities:
Instead of querying daily data and then trying to aggregate it to weekly, monthly, or yearly levels, make separate API calls for each granularity.
For instance:
- One query for daily users over a specific period.
- A separate query for weekly users over a period.
- Another for monthly users, and so on.
By doing this, GA4 will handle the deduplication of users for each specific granularity, ensuring accurate counts.
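As a rough sketch of what "one query per granularity" can look like, the Python below builds one request body per time grain in the JSON shape used by the Data API's runReport method. The helper name `build_users_request`, the date range, and the choice of `totalUsers` are assumptions for illustration. The dimension names (`date`, `isoYearIsoWeek`, `yearMonth`, `year`) are taken from the GA4 Data API schema as we understand it, so check them against the current schema before relying on them. Authenticating and sending the requests is left out.

```python
import json

def build_users_request(time_dimension, start_date, end_date):
    """Build a runReport request body for total users at one time grain.

    Hypothetical helper: it only assembles the JSON body and sends nothing.
    """
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensions": [{"name": time_dimension}],
        "metrics": [{"name": "totalUsers"}],
    }

# One time dimension per reporting grain (names assumed from the API schema).
GRANULARITIES = {
    "daily": "date",
    "weekly": "isoYearIsoWeek",
    "monthly": "yearMonth",
    "yearly": "year",
}

# Placeholder date range: use the period your report covers.
requests_by_granularity = {
    label: build_users_request(dimension, "2024-01-01", "2024-03-31")
    for label, dimension in GRANULARITIES.items()
}

for label, body in requests_by_granularity.items():
    print(label, json.dumps(body))
```

Each body would be sent as its own request, and each response stored as its own table. The weekly table feeds weekly charts, the monthly table feeds monthly charts, and none of them is ever rolled up into another.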
Some dimensions can cause double-counting when combined with the User metric.
Imagine you want to understand how many unique users come to your website from different traffic sources, such as organic search, paid search, social media, and direct traffic.
Scenario: A user discovers your website through a paid search ad on Monday morning and returns later in the day by searching for the website and clicking on an organic search result. In the evening, they return to the site directly by typing in the URL.
API Query Result: If you query the GA4 Data API for users by traffic source for that week:
- The user will be counted once under "organic search"
- They will be counted again under "paid search"
- The same user will be counted once more under "direct"
Implication: The same user is counted three times in the result, once for each traffic source they used. If you were to sum up the users across all traffic sources for that week, you'd get an inflated number. This is because the User metric is non-additive, and combining it with the traffic source dimension can lead to double or even triple counting.
To avoid such pitfalls, when querying for users by traffic source (or similar dimensions), it's crucial to interpret the results with the understanding that users can be counted in multiple categories. The total number of users for the period will likely be less than the sum of users across all traffic sources.
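The traffic source scenario can be sketched the same way. The session list below is made up (`user_1` is the visitor from the scenario, and `user_2` is added so the totals are less trivial); it compares the sum of the per-source rows with the deduplicated total.

```python
# Hypothetical sessions: (user, traffic source). Not real GA4 data.
sessions = [
    ("user_1", "paid search"),
    ("user_1", "organic search"),
    ("user_1", "direct"),
    ("user_2", "organic search"),
]

# Distinct users per traffic source, as a "users by source" report returns them.
users_by_source = {}
for user, source in sessions:
    users_by_source.setdefault(source, set()).add(user)

per_source = {source: len(users) for source, users in users_by_source.items()}

# Adding the rows counts user_1 once for every source they arrived from.
sum_of_rows = sum(per_source.values())

# The union of all sources is the real number of users in the period.
total_users = len(set().union(*users_by_source.values()))

print(per_source)   # {'paid search': 1, 'organic search': 2, 'direct': 1}
print(sum_of_rows)  # 4
print(total_users)  # 2
```

Every row is individually correct: two users really did arrive via organic search. Only the column total is misleading, which is why a scorecard for total users needs its own query without the traffic source dimension.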
Deduplication of Users in Live API Calls
Why worry about Users in Analytics Canvas's GA4 API connector when the web interface and Looker Studio don't?
The reason is that GA4 uses advanced methods to deduplicate users across devices and platforms based on each query you make.
In Looker Studio and the web interface, every report is a live request: you're asking Google for the deduplicated results based on the full query at the time you load the report.
When you pre-load the data, by contrast, you have asked for a report at one fixed level of granularity. It has already been deduplicated and delivered to you as a non-summable table. If you later try to summarize that data, such as in a roll-up view in a scorecard, you will get the wrong answer.
Implications of including Users in GA4 BigQuery queries
Just like in the GA4 interface, the User metric in the BigQuery export is non-additive. If you're analyzing user counts across different days, simply summing up daily counts can lead to overestimations due to the same user visiting on multiple days.
However, there is one field in the GA4 BigQuery export that is a game changer. The raw event-level export includes user_pseudo_id, effectively a user identifier, which can be included in each query. When it is included, the resulting table CAN be aggregated by counting the distinct user_pseudo_id values at any level of aggregation!
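Here is a small sketch of that idea. To keep it runnable anywhere, it uses Python's built-in sqlite3 module as a stand-in for BigQuery, with a tiny made-up events table that borrows the export's `event_date`, `event_name`, and `user_pseudo_id` column names. The real export lives in date-sharded tables and has many more columns, so treat the table name and rows as assumptions; the `COUNT(DISTINCT user_pseudo_id)` pattern is what carries over.

```python
import sqlite3

# In-memory stand-in for the event-level export (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_date TEXT, event_name TEXT, user_pseudo_id TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("20240101", "page_view", "A"),
        ("20240101", "page_view", "B"),
        ("20240102", "page_view", "A"),
        ("20240102", "scroll", "A"),
        ("20240102", "page_view", "C"),
    ],
)

# Users per day: count distinct IDs within each day.
daily = conn.execute(
    "SELECT event_date, COUNT(DISTINCT user_pseudo_id) "
    "FROM events GROUP BY event_date ORDER BY event_date"
).fetchall()

# Users for the whole period: the same events, deduplicated once overall.
overall = conn.execute(
    "SELECT COUNT(DISTINCT user_pseudo_id) FROM events"
).fetchone()[0]

print(daily)    # [('20240101', 2), ('20240102', 2)]
print(overall)  # 3
conn.close()
```

Because the identifier is still present in every row, the same table answers both questions correctly. The daily counts add up to 4, but the period query returns 3, since user A is deduplicated across days.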
Conclusion
While the inability to summarize the Users metric in GA4 might seem like a limitation, it's essential to understand that this design choice ensures more accurate and meaningful data. For marketing agencies and developers working with marketing data, it's crucial to grasp these nuances to make the most of GA4's capabilities.
Counting distinct user_pseudo_id values in BigQuery ensures accurate GA4 user counts. When using the API, you have to take a much more strategic approach and make each query at the level of aggregation your report needs.
At Analytics Canvas, we're always here to help you navigate these complexities and ensure you're extracting valuable insights from your data. If you have more questions or need assistance with GA4, feel free to reach out to our team!