Fractribution Quickstart
The Fractribution package is no longer maintained; please refer to the Attribution package for marketing attribution analysis with Snowplow.
Requirements#
In addition to dbt being installed and a web events dataset being available in your database:
- have the snowplow_web_page_views derived table available as a source (generated by the snowplow_web package)
- have a table with revenue data by users (domain_userid, user_id) that serves as another source for the fractribution calculations; you can choose either of the following options:
  - your atomic.events table with any self-describing event that captures revenue data
  - the snowplow_ecommerce_transaction_interactions derived table generated by the snowplow_ecommerce package
  - any custom incremental table that is built on top of the snowplow_web model and results in an aggregated revenue dataset
- have python or docker installed (or you can use Snowpark on Snowflake)
Installation#
Make sure to create a new dbt project and import this package via the packages.yml as recommended by dbt, or add it to an existing top-level project. Do not fork the packages themselves.
Check dbt Hub for the latest installation instructions, or read the dbt docs for more information on installing packages. If you are using multiple packages you may need to up/downgrade a specific package to ensure compatibility.
packages:
- package: snowplow/snowplow_fractribution
version: 0.3.6
Make sure to run the dbt deps command after updating your packages.yml to ensure you have the specified version of each package installed in your project.
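For example, run this from your dbt project directory:
dbt deps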
Setup#
1. Override the dispatch order in your project#
To take advantage of the optimized upsert that the Snowplow packages offer, you need to ensure that certain macros are called from snowplow_utils first, before dbt-core. This can be achieved by adding the following to the top level of your dbt_project.yml file:
dispatch:
- macro_namespace: dbt
search_order: ['snowplow_utils', 'dbt']
If you do not do this the package will still work, but the incremental upserts will become more costly over time.
2. Set variables#
The package has some variables that need to be set before it can be run, you should edit these in your dbt_project.yml file. Further customization can be done via the variables listed in the configuration page.
- snowplow__conversion_window_start_date: The start date in UTC for the window of conversions to include
- snowplow__conversion_window_end_date: The end date in UTC for the window of conversions to include
- snowplow__conversion_hosts: url_hosts to process
- snowplow__path_transforms: A dictionary of path transforms and their arguments (see the Path Transform Options section)
vars:
snowplow_fractribution:
snowplow__conversion_window_start_date: '2022-01-01'
snowplow__conversion_window_end_date: '2023-02-01'
snowplow__conversion_hosts: ['mysite.com']
snowplow__path_transforms: {'exposure_path': null}
If you are using Snowflake, you can automatically run the python scripts using Snowpark when running the dbt package. This is done using macros that create and run a stored procedure on Snowpark after the dbt models have completed.
To enable this you need to set some additional variables. For example, to enable this and use the last_touch attribution model:
vars:
snowplow_fractribution:
snowplow__run_python_script_in_snowpark: true
snowplow__attribution_model_for_snowpark: 'last_touch'
Modifying the conversions source
By default, the snowplow__conversions_source is your atomic events table. In most cases this is likely to be what you want to use; however, if you have already defined the in-built conversions modeling as part of our web package, you may wish to use that instead by setting snowplow__conversions_source to "{{ ref('snowplow_web_sessions') }}".
Alternatively, if you are using Redshift/Postgres, you may wish to include additional fields from a Self-Describing Event or an Entity. To do this, you should create a new model in your project, e.g. models/snowplow/snowplow_joined_events_table.sql, which should have something like the following content:
For more information about dealing with duplicates and the macro in this code, make sure to see our deduplication docs.
with {{ snowplow_utils.get_sde_or_context('atomic',
                                          'my_custom_context',
                                          "'" ~ get_lookback_date_limits('min') ~ "'",
                                          "'" ~ get_lookback_date_limits('max') ~ "'",
                                          'my_prefix') }}
select
    events.*,
    b.*
from {{ source('atomic', 'events') }} as events
left join my_custom_context b on
    events.event_id = b.my_prefix__id
    and events.collector_tstamp = b.my_prefix__tstamp
where
    -- use the appropriate partition key to filter on in addition to this; add a bit of a buffer if it is not derived_tstamp
    date(derived_tstamp) >= '{{ get_lookback_date_limits("min") }}'
    and date(derived_tstamp) <= '{{ get_lookback_date_limits("max") }}'
Finally, ensure you set the snowplow__conversions_source variable to "{{ ref('snowplow_joined_events_table') }}".
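As a minimal sketch, following the vars layout shown earlier, this could sit in your dbt_project.yml as:
vars:
  snowplow_fractribution:
    snowplow__conversions_source: "{{ ref('snowplow_joined_events_table') }}"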
3. Configure macros#
While the macro and matching columns/tables use the term channel, it is entirely possible to define this in terms of something else, e.g. a campaign.
All of the macros below are intended to be modified by users to fit their own use case. If you wish to change one, copy the macro from the macros folder in the snowplow_fractribution package (e.g. at [dbt_project_name]/dbt_packages/snowplow_fractribution/macros/conversion_clause.sql) and add it to the macros folder of your own dbt project, where you are free to make any alterations. You will find a detailed guide / illustration with sample code within the individual macros themselves.
To override these macros correctly with those in your project, ensure you prefix the macro name with default__ in the definition, e.g.:
{% macro default__conversion_value() %}
tr_total/100
{% endmacro %}
conversion_clause macro
The conversion_clause macro specifies how to filter Snowplow events to only successful conversion events. How this is filtered will depend on your definition of a conversion. The default is filtering to events where tr_total > 0, but this could instead filter on event_name = 'checkout', for example. If you are using the e-commerce model, you will still need to set this for the fractribution code to run (even though all events are conversions in the e-commerce model); in this case change it to transaction_revenue > 0.
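For example, an override for the e-commerce case described above might look like the following (a sketch, assuming your conversions source exposes a transaction_revenue column):
{% macro default__conversion_clause() %}
    {# sketch: counts any event with positive transaction revenue as a conversion #}
    transaction_revenue > 0
{% endmacro %}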
conversion_value macro
The conversion_value macro specifies either a single column or a calculated value that represents the value associated with that conversion. The default is tr_total, but revenue, or a calculation using revenue and discount_amount from the default e-commerce schema, for example, could similarly be used.
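A possible override, shown only as a sketch and assuming your conversions source exposes the revenue and discount_amount fields from the default e-commerce schema (adjust the column names to your own data):
{% macro default__conversion_value() %}
    {# sketch: assumes revenue and discount_amount columns; net revenue after discounts #}
    revenue - coalesce(discount_amount, 0)
{% endmacro %}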
channel_classification macro
The channel_classification macro is used to perform channel classifications. This can be altered to generate your expected channels if they differ from the channels generated in the default macro. It is highly recommended that you examine and configure this macro when using your own data, as the ROAS calculations and attribution calculations will run against these channel definitions, and the default values will not consider any custom marketing parameters.
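As an illustration only (not the package default), an override could classify channels from the marketing fields in your data, assuming mkt_medium and mkt_source columns are available in the source the macro runs against:
{% macro default__channel_classification() %}
    {# illustration only: assumes mkt_medium and mkt_source columns are available #}
    case
        when lower(mkt_medium) in ('cpc', 'ppc', 'paid_search') then 'Paid_Search'
        when lower(mkt_medium) = 'email' then 'Email'
        when mkt_medium is null and mkt_source is null then 'Direct'
        else 'Other'
    end
{% endmacro %}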
channel_spend macro
The channel_spend macro is used to query the spend by channels. It requires a user-supplied SQL script to extract the total ad spend by channel.
Required output format:
- channel: STRING NOT NULL
- spend: FLOAT64 (or other warehouse equivalent). Use the same monetary units as conversion revenue, and NULL if unknown.
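A minimal sketch of an override, assuming a hypothetical ad_spend model in your own project with channel and spend columns, and assuming the macro returns a complete query matching the required output format above:
{% macro default__channel_spend() %}
    {# sketch: 'ad_spend' is a hypothetical model in your project; replace with your own spend source #}
    select
        channel,
        sum(spend) as spend
    from {{ ref('ad_spend') }}
    group by 1
{% endmacro %}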
4. Run the model#
Execute the following either through your CLI, within dbt Cloud, or within the Snowplow Console:
dbt run --select snowplow_fractribution
5. Run the python script to generate the final models#
Depending on your setup, please follow the appropriate steps below. All these methods will create the following tables in your warehouse:
- snowplow_fractribution_path_summary_with_channels
- snowplow_fractribution_report_table
- snowplow_fractribution_channel_attribution
Run on Snowflake using Snowpark
If you have already enabled this, the tables will have been built as part of step 4. If you wish to re-run just the attribution modeling for some reason, you can run the following:
dbt run --select snowplow_fractribution_call_snowpark_macros