Set up and run the snowplow-fractribution package
The snowplow_web_page_views table (created by the snowplow_web package) and the sample_events_attribution table will be used to demonstrate how to set up and run the snowplow_fractribution dbt package to create the tables needed for attribution modeling.
As a reminder, we will use the term channels throughout this Accelerator; however, you are free to define these however you like, and they can include other things, e.g. specific campaigns.
Step 1: Set up variables
The snowplow_fractribution dbt package comes with a list of variables specified with default values that you may need to overwrite in your own dbt project's dbt_project.yml file. For details, have a look at our docs, which contain descriptions and default values for each variable.
For the sake of simplicity we have selected the variables that you will most likely need to overwrite; the rest can be changed at a later stage if and when needed.
- snowplow__conversion_window_start_date: The start date in UTC for the window of conversions to include
- snowplow__conversion_window_end_date: The end date in UTC for the window of conversions to include
- snowplow__conversion_hosts: url_hosts to consider
- snowplow__path_lookback_steps: The limit for the number of marketing channels to look at before the conversion (default is 0 = unlimited)
- snowplow__path_lookback_days: Restrict the model to marketing channels within this many days of the conversion (values of 30, 14 or 7 are recommended)
- snowplow__path_transforms: A dictionary of path transforms and their arguments (see the Path Transform Options section below)
- snowplow__consider_intrasession_channels: Boolean. If false, only considers the channel at the start of the session (i.e. the first page view). If true, considers multiple channels in the conversion session as well as historically.
- snowplow__channels_to_exclude: List of channels to exclude from the analysis (empty to keep all channels). For example, users may want to exclude the 'Direct' channel from the analysis.
- snowplow__channels_to_include: List of channels to include in the analysis (empty to keep all channels). For example, users may want to include only the 'Direct' channel in the analysis.
- snowplow__page_views_source: The page views table to use.
- snowplow__conversions_source: The table to use for your conversion events.
The default source schemas and tables used by the snowplow_fractribution
package are:
- derived.snowplow_web_page_views for the page_views data (page_views_source)
- atomic.events for the Snowplow transaction event data (conversions_source)
If either of these differs in your warehouse, set the correct names as variables in your own dbt_project.yml, e.g.:
snowplow__page_views_source: custom_schema_derived.snowplow_web_page_views
snowplow__conversions_source: custom_schema_derived.conversions
The page_views_source is straightforward: it needs to match the derived page_views table generated by the snowplow-web data model.
As for the conversions_source, there are multiple options:
- You can leave the default atomic events table as is, then use the conversion_clause() and conversion_value() macros to define the field which contains the conversion value and the logic to filter to conversion events only.
- To take advantage of our out-of-the-box data modeling and avoid the potentially costly filter within the events table, you may instead wish to use the in-built conversions modeling that is part of our web package. Once you have defined this, you can set snowplow__conversions_source to "{{ ref('snowplow_web_sessions') }}" and adjust conversion_clause() and conversion_value() to refer to the specific conversion column this will generate.
- Another out-of-the-box option is to use the snowplow-ecommerce data model's snowplow_ecommerce_transaction_interactions table, which generates the transaction_revenue field needed to calculate the total revenue from conversions. For more details on how to achieve this, check out the E-commerce accelerator. The example dbt model below shows how this revenue field is derived from the events processed in the run:

{{ config(
  tags=["this_run"],
  sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
) }}

with revenue as (

  select
    e.event_id,
    e.derived_tstamp,
    e.contexts_com_snowplowanalytics_snowplow_ecommerce_transaction_1[0]:revenue::decimal(7,2) as transaction_revenue

  from {{ ref('snowplow_web_base_events_this_run') }} as e

  where event_name = 'snowplow_ecommerce_action'
    and transaction_revenue is not null

)

select *
from revenue
For the sake of simplicity, if you use the sample data as a data source, all you need to do is set snowplow__conversions_source: 'atomic.sample_events_attribution' in your dbt_project.yml, as it comes with a tr_total field containing the revenue data.
💡 You only need to set the variables for those that differ from the default.
Below is an example snippet of these variables in your dbt_project.yml
:
vars:
  snowplow_fractribution:
    snowplow__conversion_window_start_date: '2022-06-03'
    snowplow__conversion_window_end_date: '2022-08-01'
    snowplow__conversion_hosts: ['a.com']
    snowplow__path_lookback_steps: 0
    snowplow__path_lookback_days: 30
    snowplow__path_transforms: {'exposure_path': null}
    snowplow__consider_intrasession_channels: false
    snowplow__channels_to_exclude: []
    snowplow__channels_to_include: []
    snowplow__page_views_source: 'custom_schema_derived.snowplow_web_page_views'
    snowplow__conversions_source: 'atomic.sample_events_attribution'
Path Transform Options
Paths to conversion are often similar, but not identical. As such, path transforms reduce unnecessary complexity in similar paths before running the attribution algorithm.
There are six path transform options available:
- exposure_path (default in this package): identical events in succession are reduced to one: A → A → B becomes A → B. A compromise between first and unique.
- unique_path: all events in a path are treated as unique (no reduction of complexity). Best for smaller datasets (small lookback window) without a lot of retargeting.
- first_path: keep only the first occurrence of any event: A → B → A becomes A → B. Best for brand awareness marketing.
- frequency_path: keep a count of the events' frequency: A → A → B becomes A(2) → B. Best when there is a lot of retargeting.
- remove_if_last_and_not_all: requires a channel to be passed as a parameter; that channel is removed from the end of the path unless removing it would empty the whole path. E.g. with target element A, the path A → B → A → A becomes A → B.
- remove_if_not_all: requires a channel to be passed as a parameter; that channel is removed from the path altogether unless that would remove the whole path. E.g. with target element A, the path A → B → A → A becomes B. (See the example configuration after this list.)
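As an illustration, a parameterized transform could be configured in dbt_project.yml roughly as follows. This is a sketch only: the exact argument format expected for the channel parameter (a single string vs. a list) is an assumption here, so check the package docs before using it.

vars:
  snowplow_fractribution:
    # keep the default exposure_path transform, and additionally remove the
    # 'Direct' channel from paths unless it makes up the whole path
    # (the single-string argument format is an assumption -- check the package docs)
    snowplow__path_transforms: {'exposure_path': null, 'remove_if_not_all': 'Direct'}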
Snowflake Only
If you are using Snowflake, you can automatically run the Python scripts using Snowpark when running the dbt package. This is done using macros that create and run a stored procedure on Snowpark after the dbt models have completed. To enable this you need to set some additional variables. For example, to enable this and use the last_touch attribution model:

# dbt_project.yml
...
vars:
  snowplow_fractribution:
    snowplow__run_python_script_in_snowpark: true
    snowplow__attribution_model_for_snowpark: 'last_touch'
Step 2: Configure macros
Configure the conversion_clause macro
The conversion_clause macro specifies how to filter Snowplow events to only conversion events. How this is filtered will depend on your definition of a conversion. The default filters to events where tr_total > 0, but this could instead filter on event_name = 'checkout', for example. The sample data uses tr_total, so you don't need to change anything for now if you rely on that. If you are using the e-commerce model, you will still need to set this to enable the models to run (even though all events are conversions in the e-commerce model); just change it to transaction_revenue > 0.
If you wish to change this filter, copy the conversion_clause.sql
file from the macros folder in the snowplow_fractribution
package (at [dbt_project_name]/dbt_packages/snowplow_fractribution/macros/conversion_clause.sql
) and add it to the macros folder of your own dbt project. Update the filter and save the file.
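To give a sense of what the edit looks like, here is a minimal sketch of a copied-and-edited conversion_clause.sql using the e-commerce filter mentioned above. The dispatch wrapper and default__ macro naming shown here are assumptions about the file's structure; keep whatever structure is in the file you actually copied and change only the filter expression.

-- macros/conversion_clause.sql (copied into your own dbt project)
-- Sketch only: keep the structure of the copied file; edit just the filter.
{% macro conversion_clause() %}
    {{ return(adapter.dispatch('conversion_clause', 'snowplow_fractribution')()) }}
{% endmacro %}

{% macro default__conversion_clause() %}
    -- treat e-commerce transactions with revenue as conversions
    transaction_revenue > 0
{% endmacro %}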
Configure the conversion_value macro
The conversion_value macro specifies either a single column or a calculated value that represents the value associated with that conversion. The default is tr_total
, but revenue
or a calculation using revenue
and discount_amount
from the default e-commerce schema, for example, could similarly be used.
If you wish to change this value, copy the conversion_value.sql
file from the macros folder in the snowplow_fractribution
package (at [dbt_project_name]/dbt_packages/snowplow_fractribution/macros/conversion_value.sql
) and add it to the macros folder of your own dbt project. Update the value and save the file.
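As an illustration, an edited conversion_value.sql might look like the sketch below, using the revenue and discount_amount columns mentioned above. As with the previous macro, the default__ naming and dispatch wrapper are assumptions; keep the structure of the file you copied and edit only the value expression.

-- macros/conversion_value.sql (copied into your own dbt project)
-- Sketch only: keep the structure of the copied file; edit just the value.
{% macro conversion_value() %}
    {{ return(adapter.dispatch('conversion_value', 'snowplow_fractribution')()) }}
{% endmacro %}

{% macro default__conversion_value() %}
    -- net conversion value: gross revenue minus any discount
    revenue - discount_amount
{% endmacro %}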
Configure the channel_classification macro (Optional)
The channel_classification macro is used to perform channel classifications. This can be altered to generate your expected channels if they differ from the channels generated in the default macro. It is highly recommended that you examine and configure this macro when using your own data, as the default values will not consider any custom marketing parameters.
Ensure that any channels listed in the snowplow__channels_to_exclude
variable are specified in this channel classification macro, as events are filtered based on the channels created here. (E.g. if you wish to exclude events from the Direct
channel, you first need to have the events classified as Direct
in order to subsequently filter them out).
If you wish to change the channel classification macro, copy the channel_classification.sql
file from the macros folder in the snowplow_fractribution package (at [dbt_project_name]/dbt_packages/snowplow_fractribution/macros/channel_classification.sql
) and add it to the macros folder of your own dbt project. Update the SQL and save the file.
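To illustrate the general shape, a minimal sketch of a custom channel_classification.sql is shown below. The column names (mkt_medium, mkt_source) and channel labels used here are illustrative assumptions; start from the default logic in the file you copied and keep its macro structure.

-- macros/channel_classification.sql (copied into your own dbt project)
-- Sketch only: column names and channel labels are illustrative.
{% macro channel_classification() %}
    {{ return(adapter.dispatch('channel_classification', 'snowplow_fractribution')()) }}
{% endmacro %}

{% macro default__channel_classification() %}
    case
        when lower(mkt_medium) in ('cpc', 'ppc', 'paid') then 'Paid_Search'
        when lower(mkt_medium) = 'email' then 'Email'
        when mkt_medium is null and mkt_source is null then 'Direct'
        else 'Other'
    end
{% endmacro %}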
Step 3: Run the model
Execute the following either through your CLI or from within dbt Cloud
dbt run --select snowplow_fractribution
This should take a couple of minutes to run.
Step 4: Check the output schema
Head to the SQL editor of your choice (e.g.: Snowflake Web UI) to check the model’s output. You should be able to see the data under the _derived schema.
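For example, a quick query like the following confirms that the channel counts table was created. The custom_schema_derived schema name is only a placeholder here; substitute whichever derived schema your profile writes to.

select *
from custom_schema_derived.snowplow_fractribution_channel_counts
limit 10;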
Step 5: Explore the data created by your dbt models
Take some time to familiarize yourself with the derived tables. These tables are used in the next step to attribute revenue to channels. Tables output by the snowplow_fractribution
dbt package are:
- snowplow_fractribution_channel_counts: Number of events grouped by channel, campaign, source and medium.
- snowplow_fractribution_channel_spend: Spend on each channel, used in ROAS calculations.
- snowplow_fractribution_conversions_by_customer_id: Conversion revenue for each conversion, along with the associated customerid.
- snowplow_fractribution_path_summary: Summary of different path combinations and associated conversions/non-conversions.
- snowplow_fractribution_paths_to_conversion: Path combinations leading to conversion.
- snowplow_fractribution_paths_to_non_conversion: Path combinations leading to non-conversion.
- snowplow_fractribution_sessions_by_customer_id: Channel information by session timestamp, where an event timestamp is considered as the session start.
Next we will run a Python script to create the main output table, the snowplow_fractribution_report_table
.