Unlocking GA4 Data, Say Goodbye to SQL and Get All the Insights You Need!
Using ChatGPT to write SQL queries effortlessly
I am ashamed; as an analyst, a core part of my job is to write proficient SQL. In fact, if you ask ChatGPT which skills an analyst needs, it confirms the same.
What are some technical skill sets required for an analyst?
ChatGPT response:
Database Management: Understanding of SQL (Structured Query Language) for querying and managing databases. Knowledge of relational database concepts is important.
I have to admit that from time to time, I do feel a bit of a headache when writing SQL, but I can still confidently say that I am quite skilled at it.
However, today, I have finally lived long enough to see myself become a villain. I crafted this prompt precisely so that I would never need to write any SQL ever again, period.
Are you certain you want to embrace the dark side with me?
Okay… here’s the prompt
I have a table in BigQuery with the following data schema:
event_date
event_timestamp
event_name
event_params
event_previous_timestamp
event_value_in_usd
event_bundle_sequence_id
event_server_timestamp_offset
user_id
user_pseudo_id
privacy_info
user_properties
user_first_touch_timestamp
user_ltv
device
geo
app_info
traffic_source
stream_id
platform
event_dimensions
ecommerce
items
collected_traffic_source
is_active_user
event_params
items.item_id
items.item_name
items.item_brand
items.item_variant
items.item_category
items.item_category2
items.item_category3
items.item_category4
items.item_category5
items.price_in_usd
items.price
items.quantity
items.item_revenue_in_usd
items.item_revenue
items.item_refund_in_usd
items.item_refund
items.coupon
items.affiliation
items.location_id
items.item_list_id
items.item_list_name
items.item_list_index
items.promotion_id
items.promotion_name
items.creative_name
items.creative_slot
privacy_info.analytics_storage
privacy_info.ads_storage
privacy_info.uses_transient_token
user_properties
user_ltv.revenue
user_ltv.currency
device.category
device.mobile_brand_name
device.mobile_model_name
device.mobile_marketing_name
device.mobile_os_hardware_model
device.operating_system
device.operating_system_version
device.vendor_id
device.advertising_id
device.language
device.is_limited_ad_tracking
device.time_zone_offset_seconds
device.browser
device.browser_version
device.web_info
device.web_info.browser
device.web_info.browser_version
device.web_info.hostname
traffic_source.name
traffic_source.medium
traffic_source.source
geo.continent
geo.country
geo.region
geo.city
geo.sub_continent
geo.metro
app_info.id
app_info.version
app_info.install_store
app_info.firebase_app_id
app_info.install_source
event_dimensions.hostname
ecommerce.total_item_quantity
ecommerce.purchase_revenue_in_usd
ecommerce.purchase_revenue
ecommerce.refund_value_in_usd
ecommerce.refund_value
ecommerce.shipping_value_in_usd
ecommerce.shipping_value
ecommerce.tax_value_in_usd
ecommerce.tax_value
ecommerce.unique_items
ecommerce.transaction_id
collected_traffic_source.manual_campaign_id
collected_traffic_source.manual_campaign_name
collected_traffic_source.manual_source
collected_traffic_source.manual_medium
collected_traffic_source.manual_term
collected_traffic_source.manual_content
collected_traffic_source.gclid
collected_traffic_source.dclid
collected_traffic_source.srsltid
event_params is a nested structure of key & value pairs. Under value there are 4 types, and one event_params can have many key & value pairs.
The structure is as below:
event_params.key
event_params.value
event_params.value.string_value
event_params.value.int_value
event_params.value.float_value
event_params.value.double_value
user_properties is a nested structure of key & value pairs. Under value there are 5 fields, and one user_properties can have many key & value pairs.
The structure is as below:
user_properties.key
user_properties.value
user_properties.value.string_value
user_properties.value.int_value
user_properties.value.float_value
user_properties.value.double_value
user_properties.value.set_timestamp_micros
items is a list of item_id entries in an array structure. Each item_id can have up to 25 dimensions, one items array can hold many item_id entries, and each item_id has a 1:1 relationship with the rest of its dimensions.
The structure is as below:
items.item_id
items.item_name
items.item_brand
items.item_variant
items.item_category
items.item_category2
items.item_category3
items.item_category4
items.item_category5
items.price_in_usd
items.price
items.quantity
items.item_revenue_in_usd
items.item_revenue
items.item_refund_in_usd
items.item_refund
items.coupon
items.affiliation
items.location_id
items.item_list_id
items.item_list_name
items.item_list_index
items.promotion_id
items.promotion_name
items.creative_name
items.creative_slot
Finally, remember that only items, user_properties and event_params are repeated (array) fields that need UNNEST; the rest can be read directly with dot notation.
Also, each event can have multiple event_params.key entries, meaning that if you use a subquery in the WHERE condition, the other event_params.key values will likely not be accessible.
When selecting an event_name with a condition on one event_params.key while pulling a different event_params.key from the same event, it is recommended to first create a table of those key-value pairs, then query that table for the final output.
Here is an example of getting the key "source" from event_params and counting the unique number of user_pseudo_id.
SELECT
  (SELECT params.value.string_value FROM UNNEST(event_params) AS params WHERE params.key = 'source') AS session_source,
  COUNT(DISTINCT user_pseudo_id) AS unique_users
FROM
  `YOUR_TABLE_*`
WHERE
  _TABLE_SUFFIX BETWEEN "20230101" AND "20230701"
GROUP BY
  session_source
Now, based on the above information, can you write me a standard SQL to get the event number of event with name page_view and the corresponding page_location in the event_params?
I only want to get those event where the event_params.key is with source equal facebook.
All you need to do is change the final sentences:
Now, based on the above information, can you write me a standard SQL to get the event number of event with name page_view and the corresponding page_location in the event_params?
into your own request.
There you go. You are welcome.
You have made me jobless.
Testing
Jokes aside, although this prompt is powerful, it only hits the mark 80-90% of the time, so you will still need to adjust the generated SQL a bit.
Here are some sample results I got from ChatGPT.
Getting page views that originated from a specific source
My input prompt:
Now, based on the above information, can you write me a standard SQL to get the event number of event with name page_view and the corresponding page_location in the event_params? I only want to get those event where the event_params.key is with source equal facebook.
ChatGPT response:
WITH EventsWithFacebookSource AS (
  SELECT
    event_name,
    (SELECT params.value.string_value FROM UNNEST(event_params) AS params WHERE params.key = 'source') AS event_source,
    (SELECT params.value.string_value FROM UNNEST(event_params) AS params WHERE params.key = 'page_location') AS page_location
  FROM
    `YOUR_TABLE_*`
  WHERE
    _TABLE_SUFFIX BETWEEN "20230101" AND "20230701"
    AND event_name = 'page_view'
)
SELECT
  event_source,
  page_location,
  COUNT(*) AS event_count
FROM
  EventsWithFacebookSource
WHERE
  event_source = 'facebook'
GROUP BY
  event_source, page_location
ORDER BY
  event_count DESC;
This is a good one.
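It works because both keys are pulled out of the same event before any filtering happens. As a hand-written sketch (not ChatGPT output), the same count can also be expressed in a single pass with EXISTS, which checks for the source key without flattening the event, so page_location stays readable. The alias ep is just an illustrative name, and `YOUR_TABLE_*` and the date range are the same placeholders used in the prompt.

```sql
SELECT
  -- Read page_location from the same event that passed the filter below.
  (SELECT ep.value.string_value
   FROM UNNEST(event_params) AS ep
   WHERE ep.key = 'page_location') AS page_location,
  COUNT(*) AS event_count
FROM
  `YOUR_TABLE_*`
WHERE
  _TABLE_SUFFIX BETWEEN "20230101" AND "20230701"
  AND event_name = 'page_view'
  -- Keep only events whose 'source' parameter equals 'facebook'.
  AND EXISTS (
    SELECT 1
    FROM UNNEST(event_params) AS ep
    WHERE ep.key = 'source'
      AND ep.value.string_value = 'facebook'
  )
GROUP BY
  page_location
ORDER BY
  event_count DESC;
```

Whether this or the WITH version reads better is a matter of taste; the WITH version is easier to extend when you need more keys in the final output.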
###
My input prompt:
can you write me the SQL to list all source (source is from event_params.key) and the corresponding event count?
ChatGPT response:
SELECT
  (SELECT params.value.string_value FROM UNNEST(event_params) AS params WHERE params.key = 'source') AS session_source,
  COUNT(*) AS event_count
FROM
  `YOUR_TABLE_*`
WHERE
  _TABLE_SUFFIX BETWEEN "20230101" AND "20230701"
GROUP BY
  session_source
ORDER BY
  event_count DESC;
This one is good too!
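One thing to watch in this result: events that carry no source key at all are grouped under NULL, because the inner SELECT returns nothing for them. A minimal sketch, assuming you would rather see those events under a readable label; the '(not set)' label and the ep alias are illustrative choices, not something ChatGPT returned.

```sql
SELECT
  -- Fall back to a label when the event has no 'source' parameter.
  IFNULL(
    (SELECT ep.value.string_value
     FROM UNNEST(event_params) AS ep
     WHERE ep.key = 'source'),
    '(not set)'
  ) AS session_source,
  COUNT(*) AS event_count
FROM
  `YOUR_TABLE_*`
WHERE
  _TABLE_SUFFIX BETWEEN "20230101" AND "20230701"
GROUP BY
  session_source
ORDER BY
  event_count DESC;
```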
Getting the top page views for users who come from a specific source
My input prompt:
Now, based on the above information, can you write me a standard SQL to get the top 10 page_location? page_location is a key under the string_value in the event page_view, I only want to get those event where the event_params.key is with source equal facebook.
ChatGPT response:
WITH PageViews AS (
  SELECT
    user_pseudo_id,
    (SELECT params.value.string_value FROM UNNEST(event_params) AS params WHERE params.key = 'source') AS event_source,
    (SELECT params.value.string_value FROM UNNEST(event_params) AS params WHERE params.key = 'page_location') AS page_location
  FROM
    `YOUR_TABLE_*`
  WHERE
    _TABLE_SUFFIX BETWEEN "20230101" AND "20230701"
    AND event_name = 'page_view'
)
SELECT
  page_location,
  COUNT(DISTINCT user_pseudo_id) AS unique_users
FROM
  PageViews
WHERE
  event_source = 'facebook'
GROUP BY
  page_location
ORDER BY
  unique_users DESC
LIMIT
  10;
This is a good one.
Counting sessions by source and checking for a scroll event per session
My input prompt:
Now, based on the above information, can you group the session by source, session is defined as distinctive ga_session_id under event_params.value.int_value, then calculate how many of those session have completed a ‘scroll’ event_name?
ChatGPT response:
WITH SessionSources AS (
  SELECT
    (SELECT params.value.string_value FROM UNNEST(event_params) AS params WHERE params.key = 'source') AS session_source,
    (SELECT params.value.int_value FROM UNNEST(event_params) AS params WHERE params.key = 'ga_session_id') AS session_id
  FROM
    `YOUR_TABLE_*`
  WHERE
    _TABLE_SUFFIX BETWEEN "20230101" AND "20230701"
)
SELECT
  session_source,
  COUNT(DISTINCT session_id) AS sessions_with_scroll_event
FROM
  SessionSources
WHERE
  session_id IN (
    SELECT
      (SELECT params.value.int_value FROM UNNEST(event_params) AS params WHERE params.key = 'ga_session_id') AS session_id
    FROM
      `YOUR_TABLE_*`
    WHERE
      _TABLE_SUFFIX BETWEEN "20230101" AND "20230701" AND
      event_name = 'scroll'
  )
GROUP BY
  session_source
Not exactly what I wanted: it only includes sessions that contain the event, so if I used it directly, I would get a 100% conversion rate. I would give it 70% for completion!
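To close that gap, the query has to count every session per source and, separately, the sessions that contain a scroll. Below is a hand-written sketch of one way to do that, not ChatGPT output. It rests on two assumptions: a session is identified by user_pseudo_id together with ga_session_id (ga_session_id alone can repeat across users), and a session's source is taken as the MAX of the source values seen in it, which is a simplification when a session carries more than one. The names SessionEvents, Sessions, has_scroll and scroll_rate are illustrative.

```sql
WITH SessionEvents AS (
  -- One row per event, tagged with its session id and source.
  SELECT
    user_pseudo_id,
    event_name,
    (SELECT ep.value.int_value
     FROM UNNEST(event_params) AS ep
     WHERE ep.key = 'ga_session_id') AS session_id,
    (SELECT ep.value.string_value
     FROM UNNEST(event_params) AS ep
     WHERE ep.key = 'source') AS event_source
  FROM
    `YOUR_TABLE_*`
  WHERE
    _TABLE_SUFFIX BETWEEN "20230101" AND "20230701"
),
Sessions AS (
  -- One row per session, with a flag for whether it saw a scroll.
  SELECT
    user_pseudo_id,
    session_id,
    MAX(event_source) AS session_source,  -- assumption: one source per session
    LOGICAL_OR(event_name = 'scroll') AS has_scroll
  FROM
    SessionEvents
  WHERE
    session_id IS NOT NULL
  GROUP BY
    user_pseudo_id, session_id
)
SELECT
  session_source,
  COUNT(*) AS total_sessions,
  COUNTIF(has_scroll) AS sessions_with_scroll,
  SAFE_DIVIDE(COUNTIF(has_scroll), COUNT(*)) AS scroll_rate
FROM
  Sessions
GROUP BY
  session_source
ORDER BY
  total_sessions DESC;
```

Because total_sessions includes sessions without a scroll, scroll_rate no longer comes out as 100% by construction.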
Conclusion
Utilizing ChatGPT for SQL can undoubtedly reduce the time required to write a query, and it also has the potential to lead to more creative and potentially efficient query formulations. (One issue is that the SQL generated by ChatGPT often lacks reader-friendliness.)
In summary, while technical knowledge and skills do hold value, they can depreciate over time. While possessing technical skills remains crucial (as evident in their necessity for quality assurance, modifications, and drafting the right prompt), understanding the ‘how’ and identifying the correct solutions have grown increasingly important.
Being an adept analyst entails more than just possessing the skill set to perform a job or being a subject matter expert. I believe that the finest analysts are those who can pose the right questions, pinpoint the genuine problems, and subsequently provide fitting solutions that create a meaningful impact.