Are you tired of spending a fortune on querying session data in Google Analytics 4 (GA4) BigQuery (BQ)? Well, I’ve got great news for you! I’ve discovered a clever workaround that can reduce your GA4 BQ SQL cost by over 90% when counting sessions. Let me show you how.

The Problem with Counting Sessions in GA4

Counting sessions in GA4 typically requires unnesting the entire dataset to retrieve the session ID. This process can be time-consuming and expensive, especially for large datasets. But fear not, as I have found a solution that can significantly reduce your costs.

Introducing the Campaign ID Method

Instead of relying on the traditional session counting approach, I propose using the campaign ID as a substitute. Why campaign ID, you may ask? Well, campaign ID is rarely used and is automatically pulled at the event level in the pipeline. This gives us a unique opportunity to leverage it for session counting.

The Methodology Made Simple

Here’s a step-by-step breakdown of how you can implement this cost-saving technique:

  1. Create a custom HTML code that generates cookies to mimic session behavior. These cookies should have a 30-minute expiry time.
  2. Capture the cookies and place them in the campaign ID parameter.
  3. When querying the data, use collected_traffic_source.manual_campaign_id as the session ID instead of the traditional approach.
  4. Enjoy the cost savings and improved efficiency!

Step by step

First, setup the start time cookies which help setting up the session id.

step1

<script>
function captureInitialVisitStartTime(cookie_name) {
var sessionData = getSessionData(cookie_name);
if (!sessionData) {
    var currentTime = Math.floor(Date.now() / 1000);
    var expiryTime = new Date();
    expiryTime.setTime(expiryTime.getTime() + (30 * 60 * 1000)); // 30 minutes expiry time
    document.cookie = cookie_name + '=' + currentTime + '; expires=' + expiryTime.toUTCString() + '; path=/';
}
}

function refreshTTL(cookie_name) {
var sessionData = getSessionData(cookie_name);
if (sessionData) {
    var expiryTime = new Date();
    expiryTime.setTime(expiryTime.getTime() + (30 * 60 * 1000));  // 30 minutes expiry time
    document.cookie = cookie_name + '=' + sessionData + '; expires=' + expiryTime.toUTCString() + '; path=/';
}
}

function isSessionActive(cookie_name) {
var sessionData = getSessionData(cookie_name);
if (sessionData) {
    return true;
}
return false;
}

function getSessionData(cookie_name) {
var cookies = document.cookie.split(';');
for (var i = 0; i < cookies.length; i++) {
    var cookie = cookies[i].trim();
    if (cookie.startsWith(cookie_name + '=')) {
    var cookieValue = cookie.substring(cookie_name.length + 1);
    return cookieValue;
    }
}
return null;
}

var cookie_name = 'start_time'

// Capture initial visit start time of a user
captureInitialVisitStartTime(cookie_name);

// Refresh the TTL window whenever a user visits a page
refreshTTL(cookie_name);

// Check if a session is active for a user
var sessionActive = isSessionActive(cookie_name);
</script>

Set the cookies to be fired when container intialised

step2

Set the variable to capture the cookies value

step3

Put the value in your configuration. Done!

step4

Limitations to Consider

Like any solution, this one also has a few limitations. Here are the two main ones to keep in mind:

  1. Cookies: This method relies on cookies to save the start time and enable the value to be read by Google Tag Manager (GTM). If cookies are blocked, the start time cannot be saved, and the value cannot be read.
  2. Slight Variation in Session Numbers: It’s important to note that the session numbers may not be an exact match due to Google’s session attribution logic. However, after rigorous testing with real-life traffic data, I am confident that this session calculation is more than sufficient for accurate session counting.

Demo & Evaluation

Normal SQL (shout out to ga4sql)

Bytes processed:3.17 KB

SELECT
COUNT(DISTINCT session_id) AS sessions
FROM
(
    SELECT
    CONCAT(
        user_pseudo_id,
        (
        SELECT
            value.int_value
        FROM
            UNNEST (event_params)
        WHERE
            key = 'ga_session_id'
        )
    ) AS session_id
    FROM
`XXYYZZ.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20231030' AND '20231102' 
    GROUP BY
    session_id
)

SQL for workaround

Bytes processed:165 B

SELECT
collected_traffic_source.manual_source,
collected_traffic_source.manual_medium,
COUNT(DISTINCT collected_traffic_source.manual_campaign_id) as session,
FROM
`XXYYZZ.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20231030' AND '20231102' 
AND collected_traffic_source.manual_source IS NOT NULL
GROUP BY 
1,2

Leave aside how clean the SQL looks now, the size process is 3.17KB/165B = 19.2121212 That’s nearly 20 times difference!

In Conclusion

By utilizing this innovative campaign ID method, you can save a significant amount on your GA4 BQ SQL cost. Not only will you reduce your expenses, but you’ll also enjoy a more streamlined approach to session counting. So why wait? Implement this cost-saving technique today and reap the benefits!

Stay tuned for more valuable insights and tips on optimizing your GA4 BQ usage. Happy analyzing!