December 20, 2023
Thresholding applied in Google Analytics 4? Do this
Updated: December 20th, 2023
Here’s something that you might often face in Google Analytics 4. You open a report and see an orange exclamation mark at the top of the report. You click it and see this warning: “Google Analytics has applied thresholding to one or more cards in this report and will only display the data when the data meets the minimum aggregation thresholds.”
How can that be? The report is unsampled, but that “threshold” sounds like sampling, where you get only a portion of the data you captured.
In this blog post, I will explain what thresholding is, what happens when it’s applied, and how to avoid it.
Table of contents
- What is causing this?
- What is the impact of Thresholding in Google Analytics 4?
- Why is Google doing this?
- How to avoid Thresholding in GA4?
- What to do if you see a Thresholding warning?
- Report without user metrics is not thresholded
- It’s not the end of the world (or is it?)
- Final words
If you are in a hurry and just need the solution
In this article, I explain thresholding, why it happens, how to avoid it, and a workaround. But if you are in a hurry and just want the solution, you can jump straight to the “What to do if you see a Thresholding warning?” chapter.
What is causing this?
Thresholds in Google Analytics 4 are caused by a feature called Google Signals. It is disabled by default, but if you turn it on, things might get weird.
Why would you want to enable Google Signals in the first place? There are at least two reasons. But first, let’s quickly learn what Google Signals is in general.
Google Signals enables the tracking of users across devices and platforms. When enabled, Google Signals collects data from users who have signed in to a Google account and have enabled the feature in their Google Account settings. This data is then used to provide insights into your audience’s demographics, interests, and other characteristics. You can learn more about it here.
If Google Signals is active, your GA4 property will collect more data and unlock certain features. That’s where we come to at least two reasons why people might want to enable Google Signals:
- It will start populating demographic data in GA4
- It lets you reuse Google Analytics audiences as retargeting audiences in Google Ads (thus, you can show more targeted ads to them)
But that comes with one caveat: thresholding.
Update: Google announced that Google Signals will no longer be included in the GA4 reporting identity starting from February 12th, 2024. This means that thresholding caused by Google Signals should go away from then.
What is the impact of Thresholding in Google Analytics 4?
If you are looking at a report and the property contains data from Google Signals, Google Analytics will hide the rows with small user numbers. I don’t know the exact number, but it looks like something below 50 users/events per row.
So if you are looking at a Traffic Acquisition report and some traffic sources generated fewer than 50 users in that timeframe, the GA4 interface will hide that data. It is still stored in the database, but it’s not displayed.
Here’s an example. I know that there are hundreds of unique traffic sources driving visitors to a website (I checked Universal Analytics). But if data thresholding kicks in, you will see only those that drove more than 50 (or so) users.
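To make the effect concrete, here is a minimal Python sketch that mimics what a thresholded report does to its rows. The cutoff of 50 is only the guess mentioned above (the real value is system-defined), and the function name, row structure, and numbers are made up for illustration.

```python
# Assumed cutoff: the article's guess of "about 50", not an official value.
ASSUMED_MIN_USERS = 50


def apply_thresholding(rows, min_users=ASSUMED_MIN_USERS):
    """Split report rows into (visible, hidden), the way a thresholded report would."""
    visible = [row for row in rows if row["users"] >= min_users]
    hidden = [row for row in rows if row["users"] < min_users]
    return visible, hidden


# Hypothetical Traffic Acquisition rows (made-up numbers).
rows = [
    {"source": "google / organic", "users": 1200},
    {"source": "newsletter / email", "users": 85},
    {"source": "forum.example / referral", "users": 12},
    {"source": "partner.example / referral", "users": 3},
]

visible, hidden = apply_thresholding(rows)
hidden_users = sum(row["users"] for row in hidden)
total_users = sum(row["users"] for row in rows)

print("Visible sources:", [row["source"] for row in visible])
print("Hidden sources:", [row["source"] for row in hidden])
print(f"Share of users in hidden rows: {hidden_users / total_users:.1%}")
```

The hidden rows are not deleted in this sketch, only left out of the visible list, which mirrors the point that the data stays in the database and is simply not displayed.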
Why is Google doing this?
Officially, they say this is to prevent us (GA users) from identifying individual users based on the data that Google Signals adds to our reports (e.g., age, gender, etc.).
Honestly, I have no idea how I could identify a user based on that (because, for example, Google Signals data is not exported to BigQuery), but that’s Google’s position. And there isn’t much we, as GA users, can do here. Thresholds are system-defined, and we cannot adjust them.
How to avoid Thresholding in GA4?
You have three options:
- Don’t enable Google Signals. But this means that you will not have certain demographics data (e.g., age, gender)
- Have Google Signals enabled but disable the “Include Google signals in reporting identity” option in Google Signals settings. I explain this later in this article.
- Have Google Signals enabled, but change Reporting Identity to “Device-based”.
Let me explain them.
What to do if you see a Thresholding warning?
If you are on this page, this means that you already enabled Google Signals in the past, and you are facing this nasty issue. Now what?
One workaround will help you turn off thresholding – changing the default reporting identity. But there’s a caveat too. First, let me explain where to change this, and then I will explain the implications.
Note: I have added several updates to this article. So, make sure to read the entire article.
Default reporting identity is a feature that affects how Google Analytics calculates users of your website/app. Should it use only cookie data? Should it also use User ID data (that you may be already sending to GA)? Should Google signals data be included too?
You can change it by going to Admin > Reporting Identity.
Here you will see two options (but actually, there are three). Click Show all.
- Device-based reporting identity is the most basic. It uses just the Device ID (a.k.a. first-party cookie). If the same person uses multiple browsers/devices, GA will treat them as separate users.
- Observed is a bit more advanced. It uses cookie data, Google Signals data (if you enabled it), and user ID (if you are tracking that too). Things such as user ID or Google Signals data can help GA to deduplicate certain users and understand that a person using several devices might still be the same person.
- Blended is the most advanced. It includes all the previous identity methods, plus it uses machine learning to fill in the gaps and model data. You need to implement Google consent mode to unlock this feature.
If you use an Observed or Blended reporting identity (and you have collected data from Google Signals), thresholding will probably be applied.
BUT if you switch to Device-based, then Google Signals will not be used to calculate users, and thresholding will go away.
The good thing about reporting identity is that you can switch/change this as many times as you want and whenever you want. The data stored in GA’s database will not be affected. And reporting identity is applied retroactively too.
So in most cases, you can continue using the Observed identity, and if you are curious about rows with small numbers, you can quickly switch to the Device-based identity and back.
Just remember that when you use device-based, things like User ID are not taken into the calculation of your reports, thus user counts will be less accurate. So that’s the main caveat.
Don’t worry. Reporting identity does not affect the data collection. So if you switched to device-based (while your GA4 is collecting user IDs), all data would be collected. But it won’t be used in user calculations until you switch back to observed or blended identity.
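Here is a rough Python sketch of why user counts differ between identities. It is a deliberate simplification: it only models Device ID and User ID (Google Signals and modeling are left out), and the event structure and function names are hypothetical, not how GA4 works internally.

```python
def count_users_device_based(events):
    """Device-based: every distinct device ID counts as a separate user."""
    return len({event["device_id"] for event in events})


def count_users_observed(events):
    """Simplified 'Observed': prefer user_id when present, otherwise fall back to device_id.

    Google Signals is intentionally omitted from this sketch.
    """
    identities = set()
    for event in events:
        if event.get("user_id"):
            identities.add(("user_id", event["user_id"]))
        else:
            identities.add(("device_id", event["device_id"]))
    return len(identities)


# Hypothetical events: the same logged-in person on two devices, plus one anonymous visitor.
events = [
    {"device_id": "laptop-1", "user_id": "u42"},
    {"device_id": "phone-7", "user_id": "u42"},
    {"device_id": "tablet-3", "user_id": None},
]

print("Device-based users:", count_users_device_based(events))
print("Observed users:", count_users_observed(events))
```

Both functions read the same list of events, which reflects the point above: switching the reporting identity changes how users are counted, not what was collected.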
Sometimes, a bug happens
Occasionally, I have noticed that the thresholding warning remains even after I change the reporting identity to device-based. In those cases, doing a hard refresh (CTRL + F5 on Windows) sometimes helps. If not, I ignore the warning because the reports start showing the rows with small numbers too.
Maybe when you are reading this, the issue is already fixed. But keep this in mind.
Update #1: A report without user metrics will not be thresholded
Here’s a thing I learned after I wrote this article. Thresholding also depends on which metrics you use in the report. If the report does not include user metrics (e.g., Total users, Active users, Users, Event count per user, etc.), then thresholding will not be applied to that report.
So if it makes sense and is possible in your situation, you can try to remove the user metric(s) from the report and still see other numbers (even if your reporting identity is not device-based).
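As a small illustration, the check below flags whether a list of report metrics contains any user metric. The metric names follow the GA4 Data API naming style, but the set is illustrative and incomplete, and the helper name is hypothetical.

```python
# Illustrative, incomplete set of user-based metric names (GA4 Data API style).
# Treat this list as an assumption, not an official definition.
USER_METRICS = {"totalUsers", "activeUsers", "newUsers", "eventCountPerUser"}


def has_user_metrics(report_metrics):
    """Return True if any metric in the report is a user metric."""
    return any(metric in USER_METRICS for metric in report_metrics)


print(has_user_metrics(["sessions", "totalUsers"]))
print(has_user_metrics(["sessions", "eventCount"]))
```

If the function returns False for the metrics you need, that report is a candidate for staying unthresholded even with the Observed or Blended identity.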
Update #2: Don’t include Google Signals in your reporting identity
Google has released a new feature that allows you to exclude Google Signals data from the reporting identity. This means that you can still use Google Signals for audiences (imported in Google Ads), but it will not mess with user counts and thresholding in your reports if you are using blended or observed reporting identities.
You can manage this in Admin > Data Settings > Data Collection. Then disable the “Include Google signals in reporting identity” option.
It’s not the end of the world (or is it?)
Sometimes yes, sometimes no.
Based on what I have seen, rows with small numbers (at least in the traffic acquisition report) usually account for less than 5% of all traffic. So that’s not a big deal for data accuracy, because GA4 then tries to fill in some gaps with modeled data or User ID/Google Signals.
But there might also be situations where the impact is much larger. For example, small websites (that get just hundreds of visitors per day/week) might face a more significant challenge. Imagine that you cannot see half of your events in reports because there just aren’t many. Then you will be forced to stick with a device-based reporting identity.
So I would suggest regularly switching between reporting identity settings to double-check the impact. I wish there was a quick way to change the reporting identity directly in the reports/main interface (rather than going to the Admin section). Also, another wish would be to have a separate reporting identity that lets us continue using the user id but not Google Signals.
Thresholding applied in Google Analytics 4: Final words
This is one of those articles where I wish the thing I am writing about did not exist in GA4.
Data thresholding in Google Analytics 4 is not sampling. Those are different things. Thresholding is applied when your GA4 property meets all of these conditions:
- You have collected some data through Google Signals (by enabling them at some point)
- Your reporting identity is either Blended or Observed
- AND a report (that you’re looking at) contains rows with small user/event/session numbers (I don’t know the exact number, but I would say it should be 50 or below)
In that case, rows with small numbers will be hidden and not displayed in the report (even though that data is still available somewhere in the background).
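The three conditions above can be written as a single check. This is a sketch under the article's assumptions: the cutoff of 50 is a guess, and the function name, parameter names, and identity labels are hypothetical.

```python
def thresholding_expected(signals_data_collected, reporting_identity,
                          smallest_row_users, assumed_min_users=50):
    """Return True only when all three conditions from the list above hold.

    assumed_min_users is the article's guess, not an official value.
    """
    return (
        signals_data_collected
        and reporting_identity in {"blended", "observed"}
        and smallest_row_users < assumed_min_users
    )


# Google Signals data collected, Observed identity, and a row with 12 users.
print(thresholding_expected(True, "observed", 12))
# Same property and report after switching to the device-based identity.
print(thresholding_expected(True, "device_based", 12))
```

Because all three conditions are joined with `and`, breaking any one of them (for example, by switching the reporting identity) is enough for the check to return False.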
To avoid data thresholding in the future, don’t enable Google Signals (if you don’t plan to use remarketing features or demographic reports in GA). If you have already done it, you can change the reporting identity to device-based whenever you want, and you are free to switch between them. This setting does not impact the data you have collected; it only affects the way numbers are calculated.
42 COMMENTS
Does thresholding apply to data from the API as well?
Didn't try it yet
Yes it does apply, but not in BigQuery link.
Analytics Mania, you are a hero.
Will any of this matter if Google goes all in on Bard and changes how source links are displayed to (and valued by) users?
Your guess is as good as mine
Hi,
as Jakub said, I assume that the data from GA4 that we display on Looker Studio will not display either because of the threshold.
Thank you.
Hi Julius, very interesting blogpost, thank you!
So if I understand correctly: if you are not using user ID, you can use Google Signals ánd avoid thresholding at the same time (using device-based reporting)?
No. User ID is not causing thresholding problems. Google signals does. So not using user ID but having Google signals will not avoid thresholding.
Hey Julius
Is there another way to get away from that? Because I use GA4 audiences for remarketing purposes on google ads so I don't want to turn off google signals on ga4.
Thanks,
All workarounds are explained in the blog post
Hi Julius,
thank you for your post.
do you think specified audiences (too narrow) can make this issue last longer than it should when signaling is applied?
Thank you in advance,
Hi Julius, so grateful for your posts. I'm running into this thresholding vs google signals issue with a lot of my clients.
Do you think the best strategy would be to turn Google Signals on, but also switch over to using Device-based reporting identity?
It seems that this way, Google Signals could be used for audience building for Google Ads (for example), but the GA4 reports would not have thresholding applied.
I usually try to not enable signals at all. And then build remarketing audiences with a basic Google Ads remarketing tag.
If the company has more complex requirements where more advanced audiences are needed, then Google Signals will be enabled and then I would be switching between reporting identities (if Thresholding causes a lot of problems).
Hi,
Pls tell me if I need to disable Google Signals? Is it a good thing, or shall I ignore this?
Pls guide.
Thanks,
Bhawana
I already explained in the guide what to do about it
Hi Julius, I've switched to 'Device based' reporting identity but my custom report is still sampled. Do custom dimensions that are triggered every time a page is viewed affect data sampling? I'm not using them as dimensions in my report, but does the fact that they are triggered in the background affect it?
Thresholding and data sampling are two different things. This blog post is about thresholding.
Does the threshold show less data in the graphs too? Like a drop in the users trend, which is not seen in the clicks in Google Search Console or Google Ads?
Thank you for deep explanation of the topic. Great job.
Hello,
So if I get it right, when analyzing users, sessions etc. it is more accurate to check reports with the observed or advanced reporting ID, but to go into details and analyzing events for example, it is better to be based on device model only. Is this right ?
Thanks for your help and this blogpost by the way.
Thank you for the great explanation. One question: is it possible that data thresholding is causing more of my conversions in traffic acquisition to be marked as unassigned / (not set)?
Most likely no
If we switch to device-based, will we stop seeing demographic data?
Demographic details report will still show data.
Brilliant explanation. Thanks for this.
Question - Does thresholding also limit events data (e.g. purchase events)?
Thank you for the explanation. If I have the threshold this may be causing some purchase events not to show?
What if you run two properties simultaneously? One with signals for advertising purposes and one without for reporting purposes.
I had this Thresholding applied option when I enabled analytics on firebase for my iOS app, I think it's related to Apple policy.
Thanks for this article it was really helpful for me.
Thanks for the article. Quite insightful. My only question is on historic data comparison. It seems that enabling/disabling thresholding would make it difficult to compare historic data with current data, wouldn't it, given that the displayed data changes? Or am I missing something?
Hi.
First thanks for your super practical articles.
If I choose the device-based option, does it change Google Ads remarketing preferences or harm them?
If I delete analytics access and add the website as new in another account. Will this action solve thresholding problem?
Thank you, this solved our issue - we'd only just noticed the lack of results for sparsely-viewed pages!
I was looking for an answer to why the total user count ("Users") was different in every report even though they are all defined as "The total number of active users".
There were huge discrepancies between reports for some sites!
Like, 100k difference in a month.
I found your solution here to be the solution to my reporting question as well. I guess it makes sense, but also.. it's not very clear at the same time.
I've found that thresholding is applied to the API data even if Signals is off. Or maybe it's not thresholding or sampling or any other name they have for the same thing, but it's definitely there. Pull a report with just Sessions and it's a dead match. Add Region and rows will suddenly disappear.
I got around some of this by creating separate profiles with filters applied. Apply the Region filter ahead of time and the numbers still match up. It's a pain, but when Region is a client priority 1 it's worth the effort. Then just accept the rest of the data fudging they do since as you say it's like 5% or so.
Just passing by to say THANK YOU! <3
I've been struggling with this for 3 days.
Could thresholding data cause a drop in all of the traffic sources on both GA4 and Universal Analytics?
Hello Julius, I have thresholding applied on several pages. They are displayed in explorations though. I need to fire a custom event when users reach those pages. Will the event be fired even though the reports do not show such pages?
The event will be fired
Is reporting identity impacting BQ export?
From the article I understood that this is not the case however this reporting identity is the only thing that (I found) differs my two GAs (I know what should be data levels there) and one of them is missing over half of the PVs (no matter the device). First GA is Blended and second is also Blended but modelling is not available (yet?). I have Google Signals off.
reporting identity does not impact BQ export
Thanks for making this post, Julius! I had the Thresholding Applied warning on a client's GA4 account. I noticed that none of the conversions were showing in the Engagement -> Conversions section. I also noticed that none of my client's confirmation pages were showing up in the Engagement -> Pages and screens section. I followed your instructions and changed the reporting identity to Device-based. As soon as I did that, the conversions that I created were showing up as well as the urls in the Engagement -> Pages and screens section. I also turned off Include Google signals in reporting identity. I'm not sure what kind of difference that will make, but I followed your instructions anyway. Your post was a lifesaver. Thanks again!
Thank you Julius, this is great! It's nice that we can toggle the views in 'reporting identity' however, do we know if the 'reporting identity' has any impact on GA4 Conversions used in Google Ads. For example, if we're using GA4 conversions in Google Ads and 'reporting identity' is not set to device based, will the conversions shown in Google Ads also be thresholded?