Configure the Unified Digital data model

This page helps you configure the Snowplow Unified Digital dbt package. You can customize variables, generate configuration code, and set output schemas.

Package configuration variables

This package uses a set of variables that are preset to recommended values for optimal model performance. Depending on your use case, you might want to override these values by adding them to your dbt_project.yml file.

Variable name prefix

All variables in Snowplow packages start with snowplow__, but we have removed this prefix in the tables below for brevity.
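As a minimal sketch of such an override, the fragment below sets a few variables under the package's scope in dbt_project.yml. The variable names are written with the full snowplow__ prefix and are assumed here for illustration, so check them against the package's variable tables. All values are placeholders.

```yaml
# dbt_project.yml (sketch; variable names assumed, values are placeholders)
vars:
  snowplow_unified:
    snowplow__atomic_schema: atomic        # schema (dataset) that contains your atomic events
    snowplow__events_table: events         # name of the table that contains your atomic events
    snowplow__start_date: "2024-01-01"     # date to start processing from on a first run or full refresh
    snowplow__app_id: ["my_web_app"]       # hypothetical app ID to filter on
```

Any variable left out of this block keeps the default value shipped with the package.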

tip

When you modify session_identifiers or user_identifiers, or use session_sql or user_sql, in the Unified package, the resulting values overwrite the domain_sessionid and domain_userid fields in the tables, rather than populating session_identifier and user_identifier as in the core utils implementation. This is for historical reasons, to avoid breaking changes. The original values of these fields are kept in original_domain_sessionid and original_domain_userid in each table.
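To make the tip concrete, here is a hedged sketch of a custom session identifier. It assumes each entry is a mapping with schema and field keys, as in the Snowplow utils package. The entity name com_example_session_context_1 and its field are hypothetical placeholders.

```yaml
# dbt_project.yml (sketch; entry shape assumed, entity and field names are hypothetical)
vars:
  snowplow_unified:
    snowplow__session_identifiers:
      - schema: com_example_session_context_1   # hypothetical entity (context) column
        field: session_id                       # hypothetical field inside that entity
      - schema: atomic                          # fall back to a column on the atomic events table
        field: domain_sessionid
```

With a configuration like this, the domain_sessionid column in the derived tables would hold the resolved custom identifier, and the tracker's own value would remain available in original_domain_sessionid.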

Warehouse and tracker

Operation and logic

Entities (contexts), filters, and logs

Warehouse-specific

Config generator

Use the inputs below to generate the code to place in your dbt_project.yml file to configure the package. Any values you do not specify keep their default values from the package.

Project Variables:

yaml
vars:
  snowplow_unified: null

Output schemas

By default, all scratch/staging tables are created in the <target.schema>_scratch schema, derived tables in <target.schema>_derived, and manifest tables in <target.schema>_snowplow_manifest. Some of these schemas are only used by specific packages, so make sure you add the correct configuration for each package you are using. To change the defaults, add the following to your dbt_project.yml file:

tip

If you want to use just your connection schema with no suffixes, set the +schema: values to null.

yaml
models:
  snowplow_unified:
    base:
      manifest:
        +schema: my_manifest_schema
      scratch:
        +schema: my_scratch_schema
    sessions:
      +schema: my_derived_schema
      scratch:
        +schema: my_scratch_schema
    user_mapping:
      +schema: my_derived_schema
    users:
      +schema: my_derived_schema
      scratch:
        +schema: my_scratch_schema
    views:
      +schema: my_derived_schema
      scratch:
        +schema: my_scratch_schema
seeds:
  snowplow_unified:
    +schema: my_seed_schema
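As a sketch of the tip about dropping suffixes, the same structure can carry null values so that the models land in your connection schema. Only part of the tree is shown here, and the remaining folders would follow the same pattern.

```yaml
# dbt_project.yml (sketch; repeat the pattern for the other model folders)
models:
  snowplow_unified:
    base:
      manifest:
        +schema: null   # no suffix: use the target schema as-is
      scratch:
        +schema: null
    sessions:
      +schema: null
      scratch:
        +schema: null
seeds:
  snowplow_unified:
    +schema: null
```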

Config generator inputs

Schema (dataset) that contains your atomic events

Database that contains your atomic events

Target name of your development environment as defined in your `profiles.yml` file

The name of the table that contains your atomic events

Page ping heartbeat time as defined in your tracker configuration

Minimum visit length as defined in your tracker configuration

The users module requires data from the derived sessions table. If you choose to disable the standard sessions table in favor of your own custom table, set this to reference your new table e.g. {{ ref("snowplow_unified_sessions_custom") }}
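A hedged sketch of pointing the users module at a custom sessions table follows. The variable name snowplow__sessions_table is an assumption based on the naming used in other Snowplow dbt packages. The model name is the example from the description above.

```yaml
# dbt_project.yml (sketch; variable name assumed)
vars:
  snowplow_unified:
    # Reference your custom sessions model instead of the standard derived table
    snowplow__sessions_table: "{{ ref('snowplow_unified_sessions_custom') }}"
```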

Grant Select List

Enable granting usage on schemas

The maximum number of days of new data to process since the latest event processed

Conversion Definition

The number of days to use for web vital measurements (if enabled)

The percentile used for the web vitals measurements produced for all page views (if enabled)

The maximum allowed number of days between the event creation and it being sent to the collector

The number of hours to look back before the latest event processed, to account for late-arriving data that comes out of order

The maximum allowed session length in days. For a session exceeding this length, all events after this limit will stop being processed

Number of days to limit the scan of the `snowplow_unified_base_sessions_lifecycle_manifest` manifest to

The date to start processing events from in the package on first run or a full refresh, based on `collector_tstamp`

Number of days to look back over the incremental derived tables during the upsert
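The look-back and limit settings described above might be combined as in the sketch below. The variable names are assumed from the snowplow__ naming convention and the numbers are illustrative only, not confirmed package defaults.

```yaml
# dbt_project.yml (sketch; variable names assumed, values illustrative)
vars:
  snowplow_unified:
    snowplow__backfill_limit_days: 30      # max days of new data processed per run
    snowplow__days_late_allowed: 3         # max days between event creation and sending to the collector
    snowplow__lookback_window_hours: 6     # hours to look back for late-arriving data
    snowplow__max_session_days: 3          # events after this session length are not processed
    snowplow__session_lookback_days: 730   # limit on the sessions lifecycle manifest scan
    snowplow__upsert_lookback_days: 30     # look-back over derived tables during the upsert
```

Larger windows catch more late data at the cost of scanning more rows on each run, so these values are worth tuning against your warehouse costs.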

Session Identifiers

User Identifiers

Custom SQL for your events this run table.

Use refr fields when mkt fields are null for default channel group

Use this field to enable initial checks when changing the package configuration to ensure quick fails if the configuration is incorrect

Use this field to bypass the optimization that avoids long scans caused by late-arriving data, for trackers that do not send dvce_created_tstamp or dvce_sent_tstamp.

Use this field to accept the Snowplow user license.

App IDs

Page View Passthroughs

Session Passthroughs

User First Passthroughs

User Last Passthroughs

User Conversion Passthroughs

View Aggregations

Session Aggregations

User First Aggregations

The catalogue your atomic events table is in

(Redshift) Entities or SDEs

Enable running the models on an Iceberg table using Lake Loader data
