Google Analytics Data Retention explained for UA and GA4

Ameet Wadhwani | Analytics

Over the past year that we've been offering our UA backup utility, a number of people have been surprised to find that certain datasets don't contain their full history of data. On the GA4 side, I routinely get inquiries from people wanting to back up their GA4 data before it expires.

This post explains the implications of Google Analytics Data Retention settings for both UA and GA4 so that you'll know what to expect when querying historical data.

⚠️

TL;DR: Only your user-level and event-level data can be subject to retention, not all of your data. There's no going back: data recorded while a data retention setting is in place is gone forever once it expires. Unlike in UA, in GA4 this only applies to the Explorations section of Google Analytics. It does NOT affect the API or the BigQuery export.

What does data retention mean and what does it apply to?

There's a misconception in the marketplace about what the Data Retention settings apply to. In this article on the Analytics Help site, Google explains (at the time of writing) that "the setting applies to user-level and event-level data". UA and GA4 differ in the retention periods involved, but fundamentally the setting ONLY affects these two types of data. All other data remains available beyond the retention limits.

Data Retention in Universal Analytics

In the admin section of a Universal Analytics property, under "Tracking Info" > "Data Retention," you can find settings that allow you to specify how long Google Analytics retains user and event data before automatically deleting it.

The options available are:

  • 14 months
  • 26 months
  • 38 months
  • 50 months
  • Do not automatically expire

Modifying the retention duration or setting it to "Do not automatically expire" will not impact previously collected data. If retention was initially set to 14 months, data collected under that setting will be deleted after 14 months, even if you later extend the retention period to 26 months.
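Because the setting in force at collection time is the one that counts, the earliest deletion date for a given day's data is simple date arithmetic. The sketch below is our own illustration in Python, not a Google API: the names `add_months` and `eligible_for_deletion` are hypothetical, and the code only computes when data becomes eligible for deletion. Google's help pages describe expired data being deleted on a monthly basis, so treat the result as the earliest date the data can disappear rather than an exact one.

```python
import calendar
from datetime import date
from typing import Optional

# Retention periods offered in UA (months); None means "Do not automatically expire".
UA_RETENTION_MONTHS = {14, 26, 38, 50, None}


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def eligible_for_deletion(collected_on: date, retention_at_collection: Optional[int]) -> Optional[date]:
    """Earliest date user/event-level data collected on `collected_on` can expire.

    Uses the setting that was in force at collection time, because later
    changes to the retention period are not applied retroactively.
    """
    if retention_at_collection not in UA_RETENTION_MONTHS:
        raise ValueError(f"unsupported UA retention period: {retention_at_collection}")
    if retention_at_collection is None:
        return None  # "Do not automatically expire"
    return add_months(collected_on, retention_at_collection)


# Collected under a 14-month setting; extending to 26 months later changes nothing.
print(eligible_for_deletion(date(2020, 1, 15), 14))  # 2021-03-15
```

The day is clamped so that, for example, data collected on January 31 under a one-month offset maps to the last day of February instead of raising an error.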

Standard aggregated Google Analytics reporting is not impacted. The user and event data governed by this setting is needed only for advanced features, such as applying custom segments to reports or creating certain custom reports.

Fields such as Event Category, Action, and Label are affected, as are user-scoped custom dimensions. Users are often surprised to see that fields like LandingPagePath are also affected; however, this makes sense, given that page paths can be used to reconstruct a user's path through the site.

Data Retention in GA4

This is widely misunderstood. We constantly have people asking about Canvas to 'retain their GA4 data'. While we specialize in GA4 data reporting and help store GA4 data in BigQuery (BQ) for more robust reporting, we just cannot make a sale on this premise alone.

In Google Analytics 4 (GA4), the data retention setting impacts data differently compared to Universal Analytics. It's important to understand that:

  • Standard aggregated reports (including those utilizing primary and secondary dimensions) within your GA4 property are not affected by the data retention setting. This means that aggregated data, trends, and insights remain accessible regardless of the data retention period you set.

  • Comparisons you create in these reports are also unaffected by data retention settings. You can still segment and compare your data in standard reports without worrying about the data retention impacting these analyses.

  • The data retention setting specifically applies to explorations and funnel reports. These are the areas where user-level and event-level data are used for detailed analysis and where the retention settings determine how long this detailed data is available within the GA4 web interface's Explorations section.

The two "reporting surfaces" outside the GA4 web interface, the GA4 Data API and the BigQuery export, are not affected by this.

Data Retention in the GA4 API

As of the publication of this post, the data retention setting does not apply to the GA4 Data API.
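To make this concrete, here is a minimal sketch, assuming the REST form of the Data API's v1beta `runReport` method, that builds a request for a full year of daily event counts, an older range than Explorations may still hold. The property ID `123456789` and the date range are placeholders, authentication is left out, and the code only builds the URL and JSON body; you would send it as a POST with an OAuth 2.0 access token. Retention aside, the API still has its own limits, such as quotas and data thresholds.

```python
import json
from datetime import date


def run_report_url(property_id: str) -> str:
    """REST endpoint for the GA4 Data API runReport method (v1beta)."""
    return f"https://analyticsdata.googleapis.com/v1beta/properties/{property_id}:runReport"


def build_run_report_body(start: date, end: date) -> dict:
    """Request body for a daily event breakdown over a fixed date range."""
    if start > end:
        raise ValueError("start must not be after end")
    return {
        "dateRanges": [{"startDate": start.isoformat(), "endDate": end.isoformat()}],
        "dimensions": [{"name": "date"}, {"name": "eventName"}],
        "metrics": [{"name": "eventCount"}, {"name": "activeUsers"}],
    }


# Hypothetical property ID; the Bearer-token header needed to call the
# endpoint is deliberately not shown in this sketch.
url = run_report_url("123456789")
body = build_run_report_body(date(2022, 1, 1), date(2022, 12, 31))
print(url)
print(json.dumps(body, indent=2))
```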

Data Retention in the GA4 BigQuery Event Export

BigQuery exports for GA4 are a different scenario. When you link GA4 to BigQuery, you can export raw, event-level data into BigQuery. One of the key benefits of this integration is that once data is exported to BigQuery, it's no longer subject to GA4's data retention policies.

In BigQuery, data retention is managed according to your BigQuery settings and Google Cloud Platform (GCP) billing account policies, not GA4's. This means you can retain and query your raw data in BigQuery indefinitely, subject to your BigQuery storage costs and management policies.
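As noted in the links at the end of this post, each property exports to a dataset named `analytics_<property ID>` with one table per day, and a day's table can keep changing for up to 72 hours. The Python sketch below is our own illustration: it lists the daily tables to re-read and builds a BigQuery SQL query over the wildcard table using `_TABLE_SUFFIX`. The project `my-project` and property ID `123456789` are hypothetical, the 3-day window is our reading of the 72-hour guidance, and the code only builds strings; you would run the query in the BigQuery console or with a client library.

```python
from datetime import date, timedelta
from typing import List


def _suffix(d: date) -> str:
    """GA4 daily export tables are named events_YYYYMMDD."""
    return d.strftime("%Y%m%d")


def tables_to_reload(today: date, project: str, property_id: str, days_back: int = 3) -> List[str]:
    """Daily export tables that may still be updated (newest first).

    Since a table can change for up to 72 hours, re-read the last
    `days_back` complete days rather than only yesterday's table.
    """
    dataset = f"{project}.analytics_{property_id}"
    return [f"{dataset}.events_{_suffix(today - timedelta(days=n))}" for n in range(1, days_back + 1)]


def reload_query(today: date, project: str, property_id: str, days_back: int = 3) -> str:
    """BigQuery standard SQL over the wildcard table, limited to the reload window.

    Streaming tables (events_intraday_YYYYMMDD) also match `events_*`, but their
    suffix starts with "intraday_", which sorts after any suffix that starts
    with a digit, so the BETWEEN filter excludes them.
    """
    start = _suffix(today - timedelta(days=days_back))
    end = _suffix(today - timedelta(days=1))
    return (
        "SELECT event_date, event_name, COUNT(*) AS event_count\n"
        f"FROM `{project}.analytics_{property_id}.events_*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'\n"
        "GROUP BY event_date, event_name"
    )


print(tables_to_reload(date(2024, 3, 2), "my-project", "123456789"))
print(reload_query(date(2024, 3, 2), "my-project", "123456789"))
```

Overwriting the same window on every run means each day is replaced until it settles, which is what keeps the "golden" version in your reporting tables.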

Conclusion

The Data Retention setting should only be modified with full consent and knowledge of the website owner, and in accordance with the organization's data retention and data governance policies.

If someone with access to the property set a retention period, it stays in effect until the setting is changed, and any data it deletes is gone permanently. Changing the data retention setting does not retroactively affect data previously collected; it applies to data collected from the change forward. Data that falls outside the retention period will be deleted unless retention is set to "Do not automatically expire."

For GA4, this only applies to Explorations. User and event data are still available through the API and the BigQuery export.

To extract your UA or Google Analytics 4 data, check out Analytics Canvas and our robust tools designed to store your data and keep it up to date.

Next Steps

Whenever you’re ready… here are 3 ways Canvas can help you with your GA4 reporting challenges:

  1. Extract data from all your properties using the API or BigQuery without writing code
  2. Profile, analyse, and prepare data for reporting  
  3. Maintain your GA4 data warehouse within Analytics Canvas Online or your own DB

Ready for the next step?

  • Start an instant 30 day risk-free trial. No credit card or sales call required. 
  • Schedule a demo for you and your team.
  • Contact us to discuss plans and pricing or activate your subscription 

Wondering if Canvas is right for you? Check out the related articles to learn more about our Data Studio Partner connector. 

Notes and Links from Table

1 - GA4 Explorer does not have all the dimensions and metrics found in the GA4 Reporting UI, and there are cases where the data returned won’t match - https://support.google.com/analytics/answer/9371379

2 - The GA4 Data API only has some of the dimensions and metrics that are available in the Explorer user interface in GA4; they are listed here: https://developers.google.com/analytics/devguides/reporting/data/v1/api-schema

3 - The GA4 BigQuery export contains raw event data; the linked page describes the data model of the data exported into BigQuery. Each GA4 property is linked to a BigQuery dataset named analytics_x, where “x” is the property ID. Every day, a new table is created. Data in the tables can be updated for up to 72 hours, so it's important to keep reloading data from these tables for at least that long to ensure the “golden” version of the data is used in any reporting tables and your data warehouse. https://support.google.com/analytics/answer/7029846

4 - GA4 BigQuery Export Setup. It's important to set up your BigQuery export as soon as possible, as there is no backfill: you will only have data from the time you turn it on. To keep a table for every day and have access to processed data, be sure to select both the Daily and the Streaming export options. https://support.google.com/analytics/answer/9823238

5 - GA4 Data API Quota.   https://developers.google.com/analytics/devguides/reporting/data/v1/quotas

6 - GA4 Data API Response JSON structure https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/RunReportResponse

7 - BigQuery Pricing https://cloud.google.com/bigquery/pricing

8 - GA4 data thresholds: data is often withheld to prevent anyone from inferring the identity of individual users based on demographics, interests, etc. This, along with differences in how Google calculates counts for signals data, means that event counts will often not match. https://support.google.com/analytics/answer/9383630?hl=en

9 - In a number of cases “(Other)” can be a real pain! Reporting in particular will return “(Other)”. Explorer, on the other hand, does not return “(Other)”, so as long as you are not at the sampling limits it is one option. The key to these cardinality limits is not to store too many unique values if you don’t have to. For example, strip URL parameters from page paths, and don’t save your own customer ID or a unique timestamp into a custom event; these will all increase how often you see “(Other)”, even if you don’t include that specific field in your query, because it's the cardinality of the underlying table that matters. https://support.google.com/analytics/answer/9309767?hl=en

10 - For many reports, certain aggregate tables have a limit of 50,000 rows: if the number of unique values on a given day exceeds this, the excess rows are rolled up under (Other). This means that the values for dimensional breakdowns are approximate, since some of the values shown might also have been counted under (Other). https://support.google.com/analytics/answer/10702008
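Note 9's advice to strip URL parameters from page paths can be sketched as follows. This is a minimal illustration in Python using only the standard library; `clean_page_path` and its `keep_params` allow-list are hypothetical names, and in practice the same logic would live wherever you set the page path before it is sent to GA4.

```python
from typing import Iterable
from urllib.parse import parse_qsl, urlencode, urlsplit


def clean_page_path(url_or_path: str, keep_params: Iterable[str] = ()) -> str:
    """Reduce a URL or path to a low-cardinality page path.

    Drops the scheme/host, the fragment, and every query parameter except
    those in `keep_params` (an allow-list you choose, e.g. a search term).
    """
    parts = urlsplit(url_or_path)
    path = parts.path or "/"
    allowed = set(keep_params)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in allowed
    ]
    return f"{path}?{urlencode(kept)}" if kept else path


print(clean_page_path("/pricing?utm_source=news&session=abc#top"))  # /pricing
print(clean_page_path("/search?q=shoes&sessionid=9f2", keep_params={"q"}))  # /search?q=shoes
```

An allow-list is used rather than a block-list so that tracking parameters added later don't silently raise cardinality again. Note that `urlencode` re-encodes kept values, so a space arrives as `+`.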