Set up and run the snowplow-fractribution package

The snowplow_web_page_views table (created by the snowplow_web package) and the sample_events_attribution table will be used to demonstrate how to set up and run the snowplow_fractribution dbt package to create the tables needed for attribution modeling.

As a reminder, we use the term channels throughout this Accelerator; however, you are free to define these however you like, and they can include other things, e.g. specific campaigns.


Step 1: Set up variables

The snowplow_fractribution dbt package comes with a list of variables with default values that you may need to overwrite in your own project's dbt_project.yml file. For details, have a look at our docs, which contain descriptions and default values for each variable.

For the sake of simplicity, we have selected the variables that you will most likely need to overwrite; the rest can be changed at a later stage if and when needed.

  • snowplow__conversion_window_start_date: The start date in UTC for the window of conversions to include
  • snowplow__conversion_window_end_date: The end date in UTC for the window of conversions to include
  • snowplow__conversion_hosts: url_hosts to consider
  • snowplow__path_lookback_steps: The limit for the number of marketing channels to look at before the conversion (default is 0 = unlimited)
  • snowplow__path_lookback_days: Restrict the model to marketing channels within this many days of the conversion (values of 30, 14 or 7 are recommended)
  • snowplow__path_transforms: A dictionary of path transforms and their arguments (see below section Path Transform Options)
  • snowplow__consider_intrasession_channels: Boolean. If false, only considers the channel at the start of the session (i.e. first page view). If true, considers multiple channels in the conversion session as well as historically.
  • snowplow__channels_to_exclude: List of channels to exclude from analysis (empty to keep all channels). For example, users may want to exclude the ‘Direct’ channel from the analysis.
  • snowplow__channels_to_include: List of channels to include in the analysis (empty to keep all channels). For example, users may want to include the ‘Direct’ channel only in the analysis.
  • snowplow__page_views_source: The page views table to use.
  • snowplow__conversions_source: The table to use for your conversion events.

The default source schemas and tables used by the snowplow_fractribution package are:

  • derived.snowplow_web_page_views for the page_views data (page_views_source)
  • atomic.events for the Snowplow transaction event data (conversions_source)

If either of these differs in your warehouse, set the correct names as variables in your own dbt_project.yml, e.g.:

  • snowplow__page_views_source: custom_schema_derived.snowplow_web_page_views
  • snowplow__conversions_source: custom_schema_derived.conversions

The page_views_source is straightforward in that it will need to match the derived page_views table generated by the snowplow-web data model.

As for the conversions_source, there are multiple options:

  1. You can leave the default atomic events table as is, then use the conversion_clause() and conversion_value() macros to define the field that contains the conversion value and the logic to filter to conversion events only.

  2. To take advantage of our out-of-the-box data modeling and avoid the potentially costly filter on the events table, you may prefer to use the built-in conversions modeling that is part of our web package. Once you have defined this, you can set snowplow__conversions_source to "{{ ref('snowplow_web_sessions') }}" and adjust conversion_clause() and conversion_value() to refer to the specific conversion column this will generate (see the snippet after this list).

  3. Another out-of-the-box option is to use the snowplow-ecommerce data model's snowplow_ecommerce_transaction_interactions table, which generates the transaction_revenue field needed to calculate the total revenue from conversions. For more details on how to achieve this, check out the E-commerce accelerator.

    
    -- Example model deriving a revenue value per transaction event from the
    -- Snowplow e-commerce transaction context (Snowflake syntax)
    {{
        config(
            tags=["this_run"],
            sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
        )
    }}
    
    with revenue as (
        select
            e.event_id,
            e.derived_tstamp,
            e.contexts_com_snowplowanalytics_snowplow_ecommerce_transaction_1[0]:revenue::decimal(7,2) as transaction_revenue
    
        from {{ ref('snowplow_web_base_events_this_run') }} as e
    
        where
            e.event_name = 'snowplow_ecommerce_action'
            -- repeat the context expression rather than the column alias for portability
            and e.contexts_com_snowplowanalytics_snowplow_ecommerce_transaction_1[0]:revenue is not null
    )
    
    select *
    from revenue
    
  4. For the sake of simplicity, if you use the sample data as a data source, all you need to do is set snowplow__conversions_source: 'atomic.sample_events_attribution' in your dbt_project.yml, as it comes with a tr_total field containing the revenue data.
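
For example, options 2 and 4 above would be set in your dbt_project.yml along the following lines (use only one; the second value is commented out for comparison):

    vars:
      snowplow_fractribution:
        # Option 2: the conversions modeling built into the web package
        snowplow__conversions_source: "{{ ref('snowplow_web_sessions') }}"
        # Option 4: the sample data used in this accelerator
        # snowplow__conversions_source: 'atomic.sample_events_attribution'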


💡 You only need to set the variables for those that differ from the default.

Below is an example snippet of these variables in your dbt_project.yml:

vars:
  snowplow_fractribution:
    snowplow__conversion_window_start_date: '2022-06-03'
    snowplow__conversion_window_end_date: '2022-08-01'
    snowplow__conversion_hosts: ['a.com']
    snowplow__path_lookback_steps: 0
    snowplow__path_lookback_days: 30
    snowplow__path_transforms: {'exposure_path': null}
    snowplow__consider_intrasession_channels: false
    snowplow__channels_to_exclude: []
    snowplow__channels_to_include: []
    snowplow__page_views_source: 'custom_schema_derived.snowplow_web_page_views'
    snowplow__conversions_source: 'atomic.sample_events_attribution'

Path Transform Options

Paths to conversion are often similar, but not identical. As such, path transforms reduce unnecessary complexity in similar paths before running the attribution algorithm.

There are six path transform options available:

  • exposure_path (default in this package): the same events in succession are reduced to one: A → A → B becomes A → B. A compromise between first and unique.
  • unique_path: all events in a path are treated as unique (no reduction of complexity). Best for smaller datasets (small lookback window) without a lot of retargeting.
  • first_path: keep only the first occurrence of any event: A → B → A becomes A → B. Best for brand awareness marketing.
  • frequency_path: keep a count of the events’ frequency: A → A → B becomes A(2) → B. Best when there is a lot of retargeting.
  • remove_if_last_and_not_all: requires a channel as a parameter; that channel is removed from the end of the path, unless the path consists of that channel only. E.g. with target element A, the path A → B → A → A becomes A → B.
  • remove_if_not_all: requires a channel as a parameter; that channel is removed from the whole path, unless the path consists of that channel only. E.g. with target element A, the path A → B → A → A becomes B (for how to pass the parameter, see the snippet after this list).
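
For example, a sketch of how combining transforms might look in dbt_project.yml, assuming a transform's argument is passed as the dictionary value (mirroring the default {'exposure_path': null} format):

    vars:
      snowplow_fractribution:
        # 'exposure_path' takes no argument; 'remove_if_not_all' takes the channel to remove
        snowplow__path_transforms: {'exposure_path': null, 'remove_if_not_all': 'Direct'}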

Snowflake Only

If you are using Snowflake, you can automatically run the Python scripts using Snowpark when running the dbt package. This is done using macros that create and run a stored procedure on Snowpark after the dbt models have completed. To enable this, you need to set some additional variables. For example, to enable this and use the last_touch attribution model:

# dbt_project.yml
...
vars:
  snowplow_fractribution:
    snowplow__run_python_script_in_snowpark: true
    snowplow__attribution_model_for_snowpark: 'last_touch'

Step 2: Configure macros

Configure the conversion_clause macro

The conversion_clause macro specifies how to filter Snowplow events to conversion events only. How this is filtered will depend on your definition of a conversion. The default filters to events where tr_total > 0, but it could instead filter on event_name = 'checkout', for example. The sample data uses tr_total, so you don't need to change anything for now if you rely on that. If you are using the e-commerce model, you will still need to set this for the models to run (even though all events are conversions in the e-commerce model); just change it to transaction_revenue > 0.

If you wish to change this filter, copy the conversion_clause.sql file from the macros folder in the snowplow_fractribution package (at [dbt_project_name]/dbt_packages/snowplow_fractribution/macros/conversion_clause.sql) and add it to the macros folder of your own dbt project. Update the filter and save the file.
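
As a rough sketch only (assuming the copied file dispatches to a default__conversion_clause() macro; keep whatever wrapper the file you copied actually contains), a checkout-based filter might look like this:

    {# Hypothetical contents of macros/conversion_clause.sql in your own project.
       Only the filter changes; for the e-commerce model mentioned above it would
       instead be: transaction_revenue > 0 #}
    {% macro default__conversion_clause() %}
        event_name = 'checkout'
    {% endmacro %}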

Configure the conversion_value macro

The conversion_value macro specifies either a single column or a calculated value that represents the value associated with that conversion. The default is tr_total, but revenue or a calculation using revenue and discount_amount from the default e-commerce schema, for example, could similarly be used.

If you wish to change this value, copy the conversion_value.sql file from the macros folder in the snowplow_fractribution package (at [dbt_project_name]/dbt_packages/snowplow_fractribution/macros/conversion_value.sql) and add it to the macros folder of your own dbt project. Update the value and save the file.
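
As a sketch only (again assuming the default__ dispatch structure of the copied file), a calculated value based on the e-commerce revenue and discount_amount fields might look like this:

    {# Hypothetical contents of macros/conversion_value.sql in your own project.
       The macro must return a single column or expression available on your
       conversions source. #}
    {% macro default__conversion_value() %}
        revenue - coalesce(discount_amount, 0)
    {% endmacro %}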

Configure the channel_classification macro (Optional)

The channel_classification macro is used to classify events into channels. It can be altered to generate your expected channels if they differ from the channels generated by the default macro. It is highly recommended that you examine and configure this macro when using your own data, as the default values will not take any custom marketing parameters into account.

Ensure that any channels listed in the snowplow__channels_to_exclude variable are specified in this channel classification macro, as events are filtered based on the channels created here. (E.g. if you wish to exclude events from the Direct channel, you first need to have the events classified as Direct in order to subsequently filter them out).

If you wish to change the channel classification macro, copy the channel_classification.sql file from the macros folder in the snowplow_fractribution package (at [dbt_project_name]/dbt_packages/snowplow_fractribution/macros/channel_classification.sql) and add it to the macros folder of your own dbt project. Update the SQL and save the file.
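
Purely as an illustrative sketch (the package's default macro is far more thorough, and the channel labels below are arbitrary), a simplified classification based on the marketing and referrer fields of the page views data might look like this:

    {# Hypothetical, heavily simplified macros/channel_classification.sql.
       Returns a SQL case expression that labels each event with a channel;
       mkt_medium and refr_medium come from the Snowplow page views data. #}
    {% macro default__channel_classification() %}
        case
            when lower(mkt_medium) in ('cpc', 'ppc', 'paid') then 'Paid_Search'
            when lower(mkt_medium) = 'email' then 'Email'
            when mkt_medium is null and refr_medium is null then 'Direct'
            else 'Other'
        end
    {% endmacro %}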


Step 3: Run the model

Execute the following either through your CLI or from within dbt Cloud:

dbt run --select snowplow_fractribution

This should take a couple of minutes to run.


Step 4: Check the output schema

Head to the SQL editor of your choice (e.g. the Snowflake Web UI) to check the model's output. You should be able to see the data under the _derived schema.
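
For example, in Snowflake you could run something like the following to confirm the tables exist (MY_DB and MY_SCHEMA_derived are placeholders for your own database and derived schema):

    -- List the tables created by the snowplow_fractribution package (Snowflake syntax)
    show tables like 'snowplow_fractribution%' in schema MY_DB.MY_SCHEMA_derived;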


Step 5: Explore the data created by your dbt models

Take some time to familiarize yourself with the derived tables. These tables are used in the next step to attribute revenue to channels. Tables output by the snowplow_fractribution dbt package are:

  • snowplow_fractribution_channel_counts: Number of events grouped by channel, campaign, source and medium.
  • snowplow_fractribution_channel_spend: Spend on each channel, used in ROAS calculations.
  • snowplow_fractribution_conversions_by_customer_id: Conversion revenue for each conversion, along with the associated customer id.
  • snowplow_fractribution_path_summary: Summary of different path combinations and associated conversion/non-conversions.
  • snowplow_fractribution_paths_to_conversion: Path combinations leading to conversion.
  • snowplow_fractribution_paths_to_non_conversion: Path combinations leading to non-conversion.
  • snowplow_fractribution_sessions_by_customer_id: Channel information by session timestamp, where an event timestamp is considered as the session start.
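
A couple of illustrative starting points for exploring these tables (MY_DB.MY_SCHEMA_derived is again a placeholder for your own derived schema):

    -- Compare the number of distinct converting and non-converting paths
    select
        (select count(*) from MY_DB.MY_SCHEMA_derived.snowplow_fractribution_paths_to_conversion) as converting_paths,
        (select count(*) from MY_DB.MY_SCHEMA_derived.snowplow_fractribution_paths_to_non_conversion) as non_converting_paths;
    
    -- Inspect how events have been classified into channels
    select * from MY_DB.MY_SCHEMA_derived.snowplow_fractribution_channel_counts limit 20;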

Next we will run a Python script to create the main output table, the snowplow_fractribution_report_table.