Primary features returned from the Snowplow dbt web model can be grouped into categories based on their origin:
First we need to create a view or table of users features based on their first website visit. If you are using your own conversion flag ensure you include this so we are ready to train and test our models. Conversion can be derived from a Snowplow tracked event or using other sources like Salesforce data.
If you are using the sample dataset from the Advanced Analytics for Web
accelerator, use the
converted_users CTE shown in the example below. Otherwise replace or remove this with your own conversion events or tables.
create or replace view first_touch_user_features as ( with first_page_view as ( select user_id, absolute_time_in_s, vertical_percentage_scrolled, geo_country, geo_region, br_lang, device_family, os_family from snowplow_web_page_views where page_view_in_session_index = 1 qualify row_number() over (partition by user_id order by start_tstamp) = 1 ), converted_users as ( -- Replace or remove this with your own website conversion logic select distinct user_id, True as converted from snowplow_web_page_views where (page_urlpath like '/get-started/%' or page_urlpath = '/contact-us/') and engaged_time_in_s >= 30 ) select u.user_id, u.first_page_title, u.refr_urlhost, u.refr_medium, u.mkt_medium, u.mkt_source, u.mkt_term, u.mkt_campaign, u.engaged_time_in_s, pv.absolute_time_in_s, pv.vertical_percentage_scrolled, pv.geo_country, pv.geo_region, pv.br_lang, pv.device_family, pv.os_family, coalesce(c.converted, false) as converted_user -- Your conversion flag here from snowplow_web_users u inner join first_page_view pv on u.user_id = pv.user_id left join converted_users c on u.user_id = c.user_id qualify row_number() over (partition by u.user_id order by u.start_tstamp) = 1 )
Here we have just selected a few of the features Snowplow behavioral data has to offer. You can include more columns before going through feature engineering as you start using ML to predict more and more types of user behavior, building out a richer view of each of your customers / users.
Consider adding this step as a custom model in dbt so the table is kept up to date when your ML model is in production. Read more about adding custom dbt models here .