Attribution Quickstart
Requirements
In addition to dbt being installed and a web events dataset being available in your database:
- have the Campaign attribution enrichment enabled to generate the marketing campaign fields (mkt_medium, mkt_source, mkt_term, mkt_content, and mkt_campaign), which are needed for channel and campaign classification and ultimately for calculating the attribution
- have the Referer parser enrichment enabled to extract attribution data from referer URLs
- have the snowplow_unified_views table available as a path (touch points) source (generated by the snowplow_unified package)
- have the snowplow_unified_conversions table available as a conversions source, including the revenue (generated by the optional conversions module of the snowplow_unified package). Make sure you use v0.4.0 and above (from unified 0.2.0 onwards)
- have a spend source table of your choice (optional) available, containing your marketing spend data by channel and/or campaign with a timestamp field that denotes the period. This is needed for the ROAS calculation in the drop and recompute report table
Installation
Make sure to create a new dbt project and import this package via the packages.yml as recommended by dbt, or add it to an existing top-level project. Do not fork the packages themselves.
Check dbt Hub for the latest installation instructions, or read the dbt docs for more information on installing packages. If you are using multiple packages you may need to up/downgrade a specific package to ensure compatibility.
packages:
  - package: snowplow/snowplow_attribution
    version: 0.4.0
Make sure to run the dbt deps command after updating your packages.yml to ensure you have the specified version of each package installed in your project.
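For example, from the root of your dbt project:
dbt deps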
Setup
1. Set variables
Marketing attribution analysis can be done in many different ways depending on your business and needs. Our aim has always been to keep the package flexible, from selecting a different data source to reshaping how the customer journey to conversion is adjusted for the analysis. We suggest taking the time upfront, before running the model, to understand which setup you need, so that you can avoid having to do a full refresh of the package later.
Each of these variables is explained below; you should edit them in your dbt_project.yml file. Further customization can be done via the variables listed in the configuration page.
The setup also assumes you would like to run the package in an incremental fashion. To run it on an ad-hoc basis, check out our custom section.
- snowplow__attribution_start_date: The start date in UTC for the package to run. Please bear in mind that, for the first run of the package, it will process all paths from the path source (snowplow_unified_views, unless changed) from this date until the timestamp of the last conversion event in the conversions source.
Changes to make if not using the default snowplow_unified source tables (for path or conversions):
- snowplow__conversion_path_source: The source table for the path information. By default, it takes values from {{ target.schema ~ '_derived' }}.snowplow_unified_views
- snowplow__conversions_source: The source table for the conversions & revenue information. By default, it takes values from {{ target.schema ~ '_derived' }}.snowplow_unified_conversions
- snowplow__conversion_hosts: The url_hosts to process; if left empty, all hosts are included
Optional changes:
- snowplow__conversion_clause: A user-defined SQL clause to filter on specific conversions if needed. Defaults to 'cv_value > 0'
- snowplow__path_transforms: A dictionary of path transforms and their arguments (see the Transform Paths section, which includes other relevant variable changes such as the snowplow__path_lookback_days and snowplow__path_lookback_steps variables)
- snowplow__spend_source: The optional source table for the spend information used to calculate ROAS in the snowplow_attribution_overview. By default it is empty and will not get processed
- snowplow__conversion_window_days: By default the package processes the last complete n number of days (calculated from the last processed path in the path source) to dynamically define the conversion window for each incremental run.
For a full list of variables you can use and their definitions, check out the Configurations page.
vars:
  snowplow_attribution:
    snowplow__attribution_start_date: '2023-01-01'
    snowplow__conversion_path_source: 'my_schema_derived.snowplow_unified_views'
    snowplow__conversions_source: 'my_schema_derived.snowplow_unified_conversions'
    snowplow__conversion_hosts: ['mysite.com']
    snowplow__path_transforms: {'exposure_path': null}
Decide on running optional reporting models
By default, the package creates a view used for BI reporting (the attribution overview), enabled via the snowplow__enable_attribution_overview variable. In case you would like to use the Attribution Modeling Data App (coming soon!), this does not need to be enabled, as the app will take care of querying this data for you directly.
There is also a snowplow__enable_paths_to_non_conversion variable to produce a drop and recompute table for more advanced analysis, which is disabled by default. The data app does not yet support recreating the same data, so if you need that table, enable it by setting the variable to true.
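As a minimal sketch, these toggles could look like the following in your dbt_project.yml (the boolean values shown are illustrative; the attribution overview is enabled by default and the paths-to-non-conversion table is disabled by default):
vars:
  snowplow_attribution:
    snowplow__enable_attribution_overview: true
    snowplow__enable_paths_to_non_conversion: true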
2. Configure macros
All the below macros are created with the intention of letting users modify them to fit their own use case. If you wish to change one, copy the macro from the macros folder in the snowplow_attribution package (e.g. at [dbt_project_name]/dbt_packages/snowplow_attribution/macros/paths_to_conversion.sql) and add it to the macros folder of your own dbt project, where you are free to make any alterations. You will find a detailed guide / illustration with sample code within the individual macros themselves.
To overwrite these macros correctly with those in your project, ensure you prefix the macro name with default__ in the definition, e.g.:
{% macro default__channel_classification() %}
case when (...)
{% endmacro %}
channel_classification macro
The channel_classification macro is used to classify each marketing touchpoint from your path source, deciding which marketing channel it was driven by. By default it expects an already classified field called default_channel_group. In case the Attribution package is used together with the Unified Digital package, it is advisable to classify your page views and sessions upstream using the channel_group_query() macro. For an in-depth guide on how to achieve this, check the channel group query macro documentation.
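As an illustration only, an override in your own project's macros folder might look like the sketch below; the mkt_medium and mkt_source columns come from the Campaign attribution enrichment mentioned in the requirements, and the groupings shown are arbitrary examples rather than the package's built-in logic:
{% macro default__channel_classification() %}
    -- illustrative grouping; replace with your own channel definitions
    case
        when lower(mkt_medium) in ('cpc', 'ppc', 'paid') then 'Paid Search'
        when lower(mkt_medium) = 'email' then 'Email'
        when lower(mkt_medium) in ('social', 'social-media') then 'Social'
        when mkt_medium is null and mkt_source is null then 'Direct'
        else 'Other'
    end
{% endmacro %}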
attribution_overview macro
Defines the SQL for the view called attribution overview, which provides the main report calculating the return on advertising spend (ROAS). In order to do that you need a marketing spend table as an additional source, containing the spend information by channel and/or campaign with a timestamp to filter on the period. If you are happy with the logic of the macro, you can just define your spend table in the snowplow__spend_source variable and let it run.
paths_to_conversion macro
Macro to allow flexibility for users to modify the definition of the paths_to_conversion incremental table. By default the incremental table uses the snowplow_unified_views and snowplow_unified_conversions source tables, but this macro allows for flexibility around that. You can also restrict which conversion events to take.
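If your goal is only to restrict which conversion events are processed, you may not need to touch the macro at all; the snowplow__conversion_clause variable described above can often achieve this. The sketch below assumes your conversions source exposes a cv_type column (check your own schema; the filter shown is illustrative only):
vars:
  snowplow_attribution:
    snowplow__conversion_clause: "cv_type = 'purchase' and cv_value > 0"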
3. Good-to-know
Please consider the points below before running the package, as they might save you time in the long run.
Configuring the spend_source
To make setting up your attribution modeling package easier, we provide a snowplow__spend_source variable which you can use to directly reference a table or view where you store information about your marketing spend. We suggest you simply create a view on top of your existing table, making sure you have the following fields in place:
- a timestamp field called spend_tstamp
- a varchar/text field called channel
- a varchar/text field called campaign
- a numeric field called spend
Your channel and campaign spend data will be summed separately in the attribution_overview, so it should not matter how you structure your data; it is even fine to have the same channel twice for the same period, but do keep in mind that the model will not apply any deduplication.
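As a sketch only, a view like the following could adapt an existing spend table to this shape; the source table and column names (my_schema.marketing_spend, spend_date, channel_name, campaign_name, spend_usd) are hypothetical and should be replaced with your own:
create or replace view my_schema_derived.marketing_spend_source as
select
    cast(spend_date as timestamp) as spend_tstamp,  -- period the spend applies to
    channel_name                  as channel,
    campaign_name                 as campaign,
    spend_usd                     as spend
from my_schema.marketing_spend
You would then point the snowplow__spend_source variable at this view.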
Running both Unified and Attribution dbt packages from the same project
If you run both packages from the same dbt project, you can point the Attribution source variables at the Unified models directly using ref():
snowplow__conversion_path_source: "{{ ref('snowplow_unified_views') }}"
snowplow__conversions_source: "{{ ref('snowplow_unified_conversions') }}"
snowplow__user_mapping_source: "{{ ref('snowplow_unified_user_mapping') }}"
Keep in mind that the manifest tables are still not linked; each package's statefulness is dictated by its own set of tables and values. For instance, pay attention to how you use the time-specific variables that dictate the scope of the data being processed:
vars:
  snowplow_unified:
    snowplow__start_date: '2024-06-03'
  snowplow_attribution:
    snowplow__attribution_start_date: '2024-09-03'
The snowplow__start_date variable used for the unified package dictates from when to process web and mobile events into the views, sessions, users and conversions tables. You may have been running that package for a long time and only recently decided to carry out marketing attribution analysis, in which case you may only be interested in the most recent data; it is then perfectly fine to set a much later date as the attribution start date.
Keeping cohesion between the packages
There is, however, no dependency between the packages when it comes to syncing them. You may want to run your Unified package once an hour so that the more business-critical tables get updated, while processing the Attribution package once a month, if that is what makes sense for your business. There is no concept of backfill limits or partial reprocessing for the Attribution package: regardless of the last run, the package will process all the data since the last processed conversion event that became available in the source tables.
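For instance, you might schedule the two packages as separate jobs; the cadences below are only an example:
# hourly job
dbt run --select snowplow_unified
# monthly job
dbt run --select snowplow_attribution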
4. Run the model
Execute the following either through your CLI, within dbt Cloud, or within Snowplow BDP:
dbt run --select snowplow_attribution