Attribution Quickstart

Requirements

In addition to dbt being installed and a web events dataset being available in your database:

  • have the Campaign attribution enrichment enabled to generate the marketing campaign fields (mkt_medium, mkt_source, mkt_term, mkt_content, and mkt_campaign) needed for channel and campaign classification and, ultimately, for calculating the attribution
  • have the Referer parser enrichment enabled to extract attribution data from referer URLs
  • have the snowplow_unified_views table available as a path (touch point) source (generated by the snowplow_unified package)
  • have the snowplow_unified_conversions table available as a conversions source, including revenue (generated by the optional conversions module of the snowplow_unified package). Make sure you use v0.4.0 and above (from unified 0.2.0 onwards)
  • have a spend source table of your choice available (optional), containing your marketing spend data by channel and/or campaign with a timestamp field that denotes the period. This is needed for the ROAS calculation in the drop and recompute report table

Installation

Make sure to create a new dbt project and import this package via the packages.yml as recommended by dbt, or add it to an existing top-level project. Do not fork the packages themselves.

Check dbt Hub for the latest installation instructions, or read the dbt docs for more information on installing packages. If you are using multiple packages you may need to up/downgrade a specific package to ensure compatibility.

packages.yml

packages:
  - package: snowplow/snowplow_attribution
    version: 0.4.0

note

Make sure to run the dbt deps command after updating your packages.yml to ensure you have the specified version of each package installed in your project.
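For reference, the command is run from the root of your dbt project:

dbt deps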

Setup

1. Set variables

Marketing attribution analysis can be done in many different ways depending on your business and its needs. Our aim has always been to keep the package flexible, from selecting a different data source to reshaping how the customer journey to conversion is adjusted for the analysis. We suggest taking the time initially, before running the model, to understand which setup you need in order to avoid having to do a full refresh of the package later.

Each of these variables is explained below; edit them in your dbt_project.yml file. Further customization can be done via the variables listed in the configuration page.

The setup also assumes you would like to run the package in an incremental fashion. To run it on an ad-hoc basis check out our custom section.

  • snowplow__attribution_start_date: The start date in UTC for the package to run. Please bear in mind that, on the first run of the package, it will process all paths from the path_source (snowplow_unified_views, unless changed) between this date and the timestamp of the last conversion event in the conversions_source

Changes to make if not using the default snowplow_unified source tables (for path or conversions):

  • snowplow__conversion_path_source: The source table for the path information. By default, it will take values from {{target.schema ~ '_derived'}}.snowplow_unified_views
  • snowplow__conversions_source: The source table for the conversions & revenue information. By default it will take values from {{target.schema ~ '_derived'}}.snowplow_unified_conversions
  • snowplow__conversion_hosts: The url_hosts to process; if left empty, all hosts are included

Optional changes:

  • snowplow__conversion_clause: A user-defined SQL snippet to filter on specific conversions if needed. Defaults to 'cv_value > 0'
  • snowplow__path_transforms: A dictionary of path transforms and their arguments (see the Transform Paths section, which also covers other relevant variables such as snowplow__path_lookback_days and snowplow__path_lookback_steps)
  • snowplow__spend_source: The optional source table for the spend information used to calculate ROAS in the snowplow_attribution_overview. By default, it is empty and will not get processed
  • snowplow__conversion_window_days: By default the package processes the last complete n days (calculated from the last processed path in the path source) to dynamically define the conversion window for each incremental run
  • For a full list of variables you can use and their definitions, check out the Configurations page.
dbt_project.yml

vars:
  snowplow_attribution:
    snowplow__attribution_start_date: '2023-01-01'
    snowplow__conversion_path_source: 'my_schema_derived.snowplow_unified_views'
    snowplow__conversions_source: 'my_schema_derived.snowplow_unified_conversions'
    snowplow__conversion_hosts: ['mysite.com']
    snowplow__path_transforms: {'exposure_path': null}

Decide on running optional reporting models

By default, the package creates a view used for BI reporting (snowplow_attribution_overview), controlled by the snowplow__enable_attribution_overview variable. If you plan to use the Attribution Modeling Data App (coming soon!), this view is not required, as the app will take care of querying the data for you directly.

There is also a snowplow__enable_paths_to_non_conversion variable to produce a drop and recompute table for more advanced analysis, which is disabled by default. The Data App does not yet support recreating the same data, so if you need this table, enable it by setting the variable to true, as shown below.
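For example, to keep the attribution overview and also build the paths-to-non-conversion table, you could set the following in your dbt_project.yml (both variables are shown for clarity; snowplow__enable_attribution_overview is already true by default):

vars:
  snowplow_attribution:
    snowplow__enable_attribution_overview: true
    snowplow__enable_paths_to_non_conversion: true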

2. Configure macros

All the macros below are created with the intention of letting users modify them to fit their own use case. If you wish to change one, copy the macro from the macros folder in the snowplow_attribution package (e.g. at [dbt_project_name]/dbt_packages/snowplow_attribution/macros/paths_to_conversion.sql) and add it to the macros folder of your own dbt project, where you are free to make any alterations. You will find a detailed guide / illustration with sample code within the individual macros themselves.

tip

To overwrite these macros correctly with those in your project, ensure you prefix the macro name with default__ in the definition, e.g.

{% macro default__channel_classification() %}
case when (...)
{% endmacro %}

channel_classification macro

The channel_classification macro is used to classify each marketing touchpoint from your path source, deciding which marketing channel it was driven by. By default it expects an already classified field called default_channel_group. In case the Attribution package is used together with the Unified Digital package, it is advisable to classify your pageviews and sessions upstream using the channel_group_query() macro. For an in-depth guide on how to achieve this, check the channel group query macro documentation.
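If you prefer to classify touchpoints inside the Attribution package instead, a minimal sketch of an override might look like the following, assuming your path source exposes the mkt_medium and mkt_source fields from the Campaign attribution enrichment (the channel names and rules here are illustrative only, not the package's built-in logic):

{% macro default__channel_classification() %}
    -- illustrative rules only: adapt to your own channel taxonomy
    case
        when lower(mkt_medium) in ('cpc', 'ppc', 'paid') then 'Paid Search'
        when lower(mkt_medium) = 'email' then 'Email'
        when lower(mkt_medium) = 'social' then 'Organic Social'
        when mkt_medium is null and mkt_source is null then 'Direct'
        else 'Other'
    end
{% endmacro %}

Place this in your own project's macros folder, as described in the tip above.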

attribution_overview macro

Defines the SQL for the attribution overview view, which provides the main report calculating the return on advertising spend (ROAS). In order to do that you need a marketing spend table as an additional source, containing the spend information by channel and/or campaign with a timestamp to filter on the period. If you are happy with the logic of the macro, you can just define your spend table in the snowplow__spend_source variable and let it run.
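If the default logic suits you, pointing the package at your spend data is a one-line addition to your dbt_project.yml (my_schema_derived.marketing_spend is a hypothetical table/view name):

vars:
  snowplow_attribution:
    snowplow__spend_source: 'my_schema_derived.marketing_spend'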

paths_to_conversion macro

Macro to allow flexibility for users to modify the definition of the paths_to_conversion incremental table. By default the incremental table uses the snowplow_unified_views and snowplow_unified_conversions source tables, but this macro allows for flexibility around that. You can also restrict which conversion events to take into account.
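A lightweight way to restrict conversions without overriding the macro itself is the snowplow__conversion_clause variable from step 1. As a sketch, assuming your conversions source has a cv_type field (that field name is an assumption, not something guaranteed by the package):

vars:
  snowplow_attribution:
    snowplow__conversion_clause: "cv_type = 'purchase' and cv_value > 0"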

3. Good-to-know

Please consider the below before running the package as it might help you save time in the long run.

Configuring the spend_source

To make setting up your attribution modeling package easier, we provide a snowplow__spend_source variable which you can use to directly reference a table or view where you store information about your marketing spend. We suggest you create a view on top of your existing table, making sure you have the following fields in place:

  • a timestamp field called spend_tstamp
  • a varchar/text field called channel
  • a varchar/text field called campaign
  • a numeric field called spend

Your channel and campaign spend will be summed separately in the attribution_overview, so it should not matter how you structure your data; it is even fine to have the same channel appear twice for the same period, but do keep in mind that the model will not apply any deduplication.
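A minimal sketch of such a view, assuming a hypothetical raw table my_schema.raw_marketing_spend whose columns carry equivalent information:

create or replace view my_schema_derived.marketing_spend as
select
    spend_date    as spend_tstamp, -- timestamp denoting the spend period
    channel_name  as channel,      -- varchar/text channel name
    campaign_name as campaign,     -- varchar/text campaign name
    spend_amount  as spend         -- numeric spend value
from my_schema.raw_marketing_spend;

You can then reference this view in the snowplow__spend_source variable.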

Running both Unified and Attribution dbt packages from the same project

Although auto-deployed packages managed via Console run in separate projects, for others there may be use cases where it is more practical to run both the Unified Digital and Attribution dbt packages from the same project. We purposefully did not directly link the two packages, to allow for flexibility, but there is a way to run them from the same project. When specifying the sources, just make sure you change the default source references to ref() calls instead of hard-coding the schema.table_name for these variables, so that dbt can establish a proper lineage graph and process things in the right order:
snowplow__conversion_path_source: "{{ ref('snowplow_unified_views') }}"
snowplow__conversions_source: "{{ ref('snowplow_unified_conversions') }}"
snowplow__user_mapping_source: "{{ ref('snowplow_unified_user_mapping') }}"

Keep in mind that the manifest tables are still not linked, therefore each project's statefulness is dictated by its own set of tables and values. For instance, you might want to keep in mind how you use the time-specific variables that dictate the scope of processing the data:

vars:
  snowplow_unified:
    snowplow__start_date: '2024-06-03'
  snowplow_attribution:
    snowplow__attribution_start_date: '2024-09-03'

The snowplow__start_date variable used for the unified package dictates from when to process web and mobile events into the views, sessions, users and conversions tables. You may have been running that package for a long time and only recently decided to carry out marketing attribution analysis, being interested only in the most recent dataset. In that case it is perfectly fine to set a much later date as the attribution start date.

Keeping cohesion between the packages

In case of data issues (e.g. you may want to reprocess the analysis based on the latest user stitching status), the data models produced by the Attribution package can be reprocessed fully without the need to refresh the Unified package, but not the other way round, as the Unified package is one of the primary sources relied upon when creating the incremental tables. If the underlying data source changes in a way that impacts the views or conversions source data, you will need to refresh the Attribution package as well.

There is, however, no dependency between the packages when it comes to syncing them. You may want to run your Unified package once an hour so the more business-critical tables get updated, while processing the Attribution package once a month, if that's what makes sense for your business. There is no concept of backfill limits or partial reprocessing for the Attribution package: regardless of the last run, the package will reprocess all the data since the last processed conversion event that became available in the source tables.

4. Run the model

Execute the following either through your CLI, within dbt Cloud, or within Snowplow BDP:

dbt run --select snowplow_attribution
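If you run both the Unified and Attribution packages from the same project as described above, you can select them together and let dbt order them via the ref() lineage (this assumes the source variables point at ref() calls as shown in the Good-to-know section):

dbt run --select snowplow_unified snowplow_attribution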