
Attribution Quickstart

Requirements

In addition to dbt being installed and a web events dataset being available in your database:

  • have the snowplow_unified_views table available as a path (touch points) source (generated by the snowplow_unified package)
  • have the snowplow_unified_conversions table available as a conversions source, including revenue (generated by the optional conversions module of the snowplow_unified package). Make sure you use snowplow_unified v0.4.0 or above (the conversions module is available from unified 0.2.0 onwards)
  • optionally, have a spend source table of your choice available containing your marketing spend data by channel and/or campaign, with a timestamp field denoting the period. This is needed for the ROAS calculation in the drop and recompute report table

Installation

Make sure to create a new dbt project and import this package via packages.yml as recommended by dbt, or add it to an existing top-level project. Do not fork the packages themselves.

Check dbt Hub for the latest installation instructions, or read the dbt docs for more information on installing packages. If you are using multiple packages you may need to up/downgrade a specific package to ensure compatibility.

packages.yml

packages:
  - package: snowplow/snowplow_attribution
    version: 0.2.0
note

Make sure to run the dbt deps command after updating your packages.yml to ensure you have the specified version of each package installed in your project.
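For example, from the root of your project:

dbt deps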

Setup

1. Set variables

Marketing attribution analysis can be done in many different ways depending on your business and its needs. Our aim has always been to give users the flexibility to make changes, from selecting a different data source to reshaping how the customer journey to conversion is adjusted for the analysis. We suggest taking the time upfront, before running the model, to understand the setup you need, so you can avoid having to do a full refresh of the package later.

Each of these variables is explained below; edit them in your dbt_project.yml file. Further customization can be done via the variables listed on the configuration page.

  • snowplow__attribution_start_date: The start date in UTC for the package to run. Bear in mind that on its first run the package will process all paths from the path source (snowplow_unified_views, unless changed) from this date until the timestamp of the last conversion event in the conversions source

Changes to make if not using the default snowplow_unified source tables (for path or conversions):

  • snowplow__conversion_path_source: The source table for the path information. By default it will take values from {{target.schema ~ '_derived'}}.snowplow_unified_views
  • snowplow__conversions_source: The source table for the conversions & revenue information. By default it will take values from {{target.schema ~ '_derived'}}.snowplow_unified_conversions
  • snowplow__conversion_hosts: url_hosts to process; if left empty, all hosts are included

Optional changes:

  • snowplow__conversion_clause: A user-defined SQL clause to filter on specific conversions if needed. Defaults to 'cv_value > 0'
  • snowplow__path_transforms: A dictionary of path transforms and their arguments (see Transform Paths section)
  • snowplow__spend_source: The optional source table for the spend information used to calculate ROAS in the snowplow_attribution_overview. By default it is empty and will not get processed
dbt_project.yml
vars:
  snowplow_attribution:
    snowplow__attribution_start_date: '2023-01-01'
    snowplow__conversion_path_source: 'my_schema_derived.snowplow_unified_views'
    snowplow__conversions_source: 'my_schema_derived.snowplow_unified_conversions'
    snowplow__conversion_hosts: ['mysite.com']
    snowplow__path_transforms: {'exposure_path': null}
Decide on running optional reporting models

By default the package creates a view used for BI reporting, attribution_overview, controlled by the snowplow__enable_attribution_overview variable. If you plan to use the Attribution Modeling Data App (coming soon!), this view is not required, as the app will take care of querying the data for you directly.

There is also a snowplow__enable_paths_to_non_conversion variable that produces a drop and recompute table for more advanced analysis; it is disabled by default. The data app does not yet support recreating the same data, so if you need it, enable it by setting the variable to true.
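Both toggles live under the package's vars in dbt_project.yml; for example, to keep the overview enabled and also build the paths-to-non-conversion table:

vars:
  snowplow_attribution:
    # BI reporting view, enabled by default
    snowplow__enable_attribution_overview: true
    # drop and recompute table, disabled by default
    snowplow__enable_paths_to_non_conversion: true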

2. Configure macros

All the macros below are designed to be modified by users to fit their own use case. If you wish to change one, copy the macro from the macros folder of the snowplow_attribution package (e.g. at [dbt_project_name]/dbt_packages/snowplow_attribution/macros/paths_to_conversion.sql) and add it to the macros folder of your own dbt project, where you are free to make any alterations. You will find a detailed guide / illustration with sample code within the individual macros themselves.

tip

To overwrite these macros correctly with those in your project, ensure you prefix the macro name with default__ in the definition, e.g.

{% macro default__channel_classification() %}
case when (...)
{% endmacro %}

channel_classification macro

The channel_classification macro is used to perform channel classifications. We encourage users to define this at the upstream source level already, if possible. If you are using snowplow_unified_views or sessions as a base for the path source, you can leave this untouched and the Attribution package will take the default_channel_group field as is. Alternatively, you can overwrite the logic based on the fields available in your source table. It is highly recommended that you examine and configure this macro (or the upstream one) when using your own data, as the ROAS and attribution calculations will run against these channel definitions, and the default values will not consider any custom marketing parameters.
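As an illustration, a custom override could look like the minimal sketch below. The mkt_medium / mkt_source fields and the groupings are assumptions for the example, not the package's built-in logic; base yours on whatever fields your path source actually contains.

{% macro default__channel_classification() %}
    case
        -- hypothetical groupings: swap in the marketing fields and
        -- channel names that match your own data and spend source
        when lower(mkt_medium) in ('cpc', 'ppc', 'paid-search') then 'Paid_Search'
        when lower(mkt_medium) = 'email' then 'Email'
        when lower(mkt_medium) in ('social', 'paid-social') then 'Social'
        when mkt_medium is null and mkt_source is null then 'Direct'
        else 'Other'
    end
{% endmacro %}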

attribution_overview macro

Defines the SQL for the view called attribution_overview, which provides the main report calculating the return on advertising spend (ROAS). To calculate ROAS you need a marketing spend table as an additional source, containing the spend information by channel and/or campaign with a timestamp to filter on the period. If you are happy with the logic of the macro, you can simply define your spend table in the snowplow__spend_source variable and let it run.
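For example, assuming your spend data lives in a view called my_schema.marketing_spend (a placeholder name), the variable would be set like so in dbt_project.yml:

vars:
  snowplow_attribution:
    snowplow__spend_source: 'my_schema.marketing_spend'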

paths_to_conversion macro

Macro that gives users the flexibility to modify the definition of the paths_to_conversion incremental table. By default the incremental table uses the snowplow_unified_views and snowplow_unified_conversions source tables, but this macro allows for flexibility around that. You can also restrict which conversion events to include.
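As a sketch of the simpler route, if your conversions source has a cv_type column (an assumption here; check your own table), you could keep only purchase conversions via the snowplow__conversion_clause variable instead of overriding the macro:

vars:
  snowplow_attribution:
    # hypothetical filter; cv_type must exist in your conversions source
    snowplow__conversion_clause: "cv_type = 'purchase' and cv_value > 0"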

3. Good-to-know

Please consider the below before running the package as it might help you save time in the long run.

Configuring the spend_source

To make setting up your attribution modeling package easier, we provide a snowplow__spend_source variable which you can use to directly reference a table or view where you store information about your marketing spend. We suggest you simply create a view on top of your existing table, making sure you have the following fields in place:

  • a timestamp field called spend_tstamp
  • a varchar/text field called channel
  • a varchar/text field called campaign
  • a numeric field called spend

Your channel and campaign spend data will be summed separately in the attribution_overview, so it should not matter how you structure your data; it is even fine to have the same channel appear twice for the same period. Do keep in mind, however, that the model will not apply any deduplication.
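For instance, a minimal view over a hypothetical raw_marketing_spend table could look like the sketch below; all table and column names on the right-hand side are placeholders, and the cast syntax may need adjusting for your warehouse.

create or replace view my_schema.marketing_spend as
select
    cast(spend_date as timestamp) as spend_tstamp,  -- period the spend relates to
    channel,    -- should line up with your channel_classification output
    campaign,
    cast(spend_usd as numeric) as spend
from my_schema.raw_marketing_spend;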

Running both Unified and Attribution dbt packages from the same project
Although auto-deployed packages managed via Console run in separate projects, for others there may be use cases where it is more practical to run both the Unified Digital and Attribution dbt packages from the same project. We purposefully did not directly link the two packages and this method of running is non-standard, but there is a way to make it work. When specifying the sources, just make sure you change the default source references to use ref() instead of hard coding the schema.table_name for these variables:
snowplow__conversion_path_source: "{{ ref('snowplow_unified_views') }}"
snowplow__conversions_source: "{{ ref('snowplow_unified_conversions') }}"

Keep in mind that the manifest tables are still not linked; therefore each project's statefulness is dictated by its own set of tables and values.

4. Run the model

Execute the following either through your CLI, within dbt Cloud, or within Snowplow BDP:

dbt run --select snowplow_attribution