Explore Snowplow data
Data should now be loaded into your warehouse. In this section we will take a closer look at the output to mitigate data issues and get familiar with the derived tables.
Step 1: Check the output schemas
Head to the SQL editor of your choice (e.g.: Snowflake Web UI) to check the model’s output. You should be able to see three new schemas created:
- [your_custom_schema]_scratch: drop and recompute models that aid the incremental run
- [your_custom_schema]_derived: main output models you can use in your downstream models and reporting
- [your_custom_schema]_snowplow_manifest: tables that help the integrity and core incremental logic of the model
Step 2: Explore your data
Take some time to familiarize yourself with the derived tables. You could run a few simple queries such as the ones listed below. Make sure to modify the schema to be aligned with your custom dbt schema.
Find out the number of page reads using derived.snowplow_web_page_views:
WITH READS AS (
SELECT
PAGE_TITLE,
COUNT(*)
FROM
YOUR_CUSTOM_SCHEMA_DERIVED.SNOWPLOW_WEB_PAGE_VIEWS
WHERE
ENGAGED_TIME_IN_S > 60
AND VERTICAL_PIXELS_SCROLLED > 5000
GROUP BY 1
ORDER BY 2 DESC
)
SELECT * FROM READS
Calculate the bounce rate using derived.snowplow_sessions:
WITH BOUNCE_RATE AS (
SELECT
FIRST_PAGE_URLPATH,
COUNT(DISTINCT DOMAIN_SESSIONID) AS SESSIONS,
COUNT(DISTINCT CASE WHEN PAGE_VIEWS = 1 THEN DOMAIN_SESSIONID END) / COUNT(DISTINCT DOMAIN_SESSIONID) AS BOUNCE_RATE
FROM YOUR_CUSTOM_SCHEMA_DERIVED.SNOWPLOW_WEB_SESSIONS
GROUP BY 1
ORDER BY SESSIONS DESC
)
SELECT * FROM BOUNCE_RATE
Find out details about the highest engaged user using derived.snowplow_users:
WITH ENGAGEMENT AS (
SELECT *
FROM YOUR_CUSTOM_SCHEMA_DERIVED.SNOWPLOW_WEB_USERS
ORDER BY ENGAGED_TIME_IN_S DESC
LIMIT 1
)
SELECT * FROM ENGAGEMENT
Check out the database section of the documentation site for a full breakdown of what the output should look like.