
Snowplow Optimized Materialization

tip

This functionality requires dispatching to our macros over those in dbt Core; see how to do this on the dispatch setup page.
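As a rough sketch, the dispatch configuration in your dbt_project.yml looks something like the following (this assumes the overridden macros live in the snowplow_utils package; check the dispatch setup page for the exact search order to use):

dbt_project.yml
# Tell dbt to look in snowplow_utils before dbt core when resolving
# the built-in incremental macros (package name assumed for illustration).
dispatch:
  - macro_namespace: dbt
    search_order: ['snowplow_utils', 'dbt']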

Our packages make use of the standard dbt incremental materialization with an optimization applied for incremental models. The advantage is that it limits the scan of the target table when updating/inserting based on the new data, which improves performance and reduces cost. We do this by overriding the macro that generates the SQL for the merge and delete+insert incremental strategies.

All other features of the incremental materialization are supported, including incremental_predicates and on_schema_change. The code for the overridden macro can be found here.
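As a rough illustration only (table names, column names, and values below are hypothetical), the effect of the optimization is an extra predicate on the upsert date key in the generated merge, so the warehouse only scans the relevant slice of the target table:

merge into derived.snowplow_page_views as tgt
using scratch.snowplow_page_views_this_run as src
    on tgt.page_view_id = src.page_view_id
    -- extra bounds derived from the new data (lower bound minus the lookback buffer)
    -- limit how much of the target table is scanned during the upsert
    and tgt.start_tstamp between '2024-05-01 00:00:00' and '2024-06-15 12:30:00'
when matched then update set
    start_tstamp = src.start_tstamp,
    engaged_time_in_s = src.engaged_time_in_s
when not matched then insert (page_view_id, start_tstamp, engaged_time_in_s)
    values (src.page_view_id, src.start_tstamp, src.engaged_time_in_s)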

Usage

Controlling the buffer size

We calculate the upper and lower limit from the source table (usually a this run table), then extend the lower limit by a buffer to cover late-arriving data or other issues with specific timestamps. This buffer is controlled by the snowplow__upsert_lookback_days variable, which usually defaults to 30.
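For example, to change the buffer you could override the variable in your dbt_project.yml (the snowplow_web scope below is just an example; use whichever of our packages you are running):

dbt_project.yml
vars:
  snowplow_web:
    # shrink the late-arriving-data buffer from the default of 30 days to 7
    snowplow__upsert_lookback_days: 7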

To disable this buffer entirely, either set snowplow__upsert_lookback_days to 0, or set disable_upsert_lookback to true in your model config if you only want to do this for a specific model. Note this is a model config, not a variable.
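As a quick sketch for a hypothetical custom model, the per-model flag looks like this:

my_custom_model.sql
{# disable_upsert_lookback is a model config, not a variable #}
{{
    config(
        materialized='incremental',
        unique_key='page_view_id',
        upsert_date_key='start_tstamp',
        snowplow_optimize=true,
        disable_upsert_lookback=true
    )
}}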

Adding to your own models

All our incremental models have this functionality enabled by default, but if you wish to use it in a custom model you need to ensure a unique_key and upsert_date_key are provided in the model config, and that snowplow_optimize=true is set in the config as well, e.g.

my_model.sql
{{
    config(
        materialized='incremental',
        unique_key='page_view_id',
        upsert_date_key='start_tstamp',
        snowplow_optimize=true,
        ...
    )
}}

Disabling the optimization

To disable the optimized upsert entirely and use the default incremental materialization, set snowplow_optimize to false in your model config. Note this is a model config, not a variable.
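For example, in a hypothetical custom model:

my_custom_model.sql
{# falls back to the standard dbt incremental materialization #}
{{
    config(
        materialized='incremental',
        unique_key='page_view_id',
        snowplow_optimize=false
    )
}}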