Skip to main content

Snowplow Identities dbt package

CloudPrivate Managed CloudSelf-HostedAddon

The package source code can be found in the snowplow/dbt-snowplow-identities repository.

The Snowplow Identities dbt package transforms the raw identity entities and merge events produced by Identities into a set of derived tables for identity resolution, identifier lookup, and audit.

The package supports Snowflake and BigQuery.

Check out the Quick Start and configuration pages to get started.

Simple incremental strategy

The package uses a simplified, timestamp-based incremental strategy rather than the session-based incremental logic used by other Snowplow dbt packages.

This means that the standard guidance around package mechanics, custom models, and dbt operations doesn't apply to this package.

Key fields

The Identities models use these field names consistently:

FieldDescription
snowplow_idThe Snowplow ID enriched onto the event by the Identities service in real time. Represents the identity at the time the event occurred. It may later be merged into a different ID.
active_snowplow_idThe current, unified identifier that represents a cluster of identities. When identities merge over time, this field always points to the surviving parent ID in the cluster. One active_snowplow_id maps to many snowplow_id values.
previous_snowplow_idUsed in the ID changes table to denote the snowplow_id that was merged into another ID.

Output models

The package produces six output models.

snowplow_identities_id_changes

A fact table containing a complete history of all changes to Snowplow ID clusters, including both merges and new ID creation. Use this table for auditing the full lifecycle of identity changes.

ColumnDescription
id_change_keySurrogate primary key (hash of snowplow_id + previous_snowplow_id + effective_at)
snowplow_idThe Snowplow ID after this change
previous_snowplow_idThe ID that was merged; NULL for new ID creation
effective_atEvent timestamp; derived_tstamp for creates, merged_at for merges
changed_atProcessing timestamp, recording when dbt materialized this record
change_typeEither created or merged
first_seen_event_idThe first event that triggered this change
first_seen_app_idThe app_id where this change was first observed

snowplow_identities_snowplow_id_mapping

A mapping between any previous snowplow_id and its current active_snowplow_id. The model automatically resolves chains of merges, so you always get the most current unified ID. Only merged snowplow_id values appear in this table — unmerged IDs are their own active_snowplow_id by definition.

ColumnDescription
active_snowplow_idThe current, canonical Snowplow ID
snowplow_idA merged Snowplow ID currently represented by the active_snowplow_id
merged_atTimestamp of when this mapping was last modified
model_tstampTimestamp of the dbt run that produced this record

To look up the active_snowplow_id for records in another table, use a left join with a COALESCE fallback:

sql
select
s.*,
coalesce(m.active_snowplow_id, s.snowplow_id) as active_snowplow_id
from some_other_model s
left join snowplow_identities_snowplow_id_mapping m
on s.snowplow_id = m.snowplow_id

The COALESCE handles the case where a snowplow_id has not been merged — it is still its own active_snowplow_id.

snowplow_identities_id_mapping_scd

A slowly changing dimension (SCD) type 2 table that captures how the canonical Snowplow ID mapping changes over time. This is useful for audit purposes, and reporting that requires the state of the identity graph at a specific point in time.

ColumnDescription
active_snowplow_idThe canonical Snowplow ID for this time period.
snowplow_idA Snowplow ID represented by the active_snowplow_id (including the active one itself).
effective_atTimestamp from which this record is valid.
superseded_atTimestamp at which this record was superseded. NULL for the current record.
is_currentBoolean flag; true when superseded_at is NULL.

To query the state of the identity graph at a specific point in time:

sql
select *
from snowplow_identities_id_mapping_scd
where effective_at <= '2026-01-15'
and (superseded_at > '2026-01-15' or superseded_at is null)

snowplow_identities_identifier_mapping

Contains all external identifiers linked to an active_snowplow_id. The list of identifier types is configurable and should reflect the identifiers you have configured in Console, plus any additional identifiers you want available in the output tables. Identifiers are stored in a normalized format with id_type and id_value columns rather than pivoted into separate columns per type.

Use this table to look up addressable identifiers, such as email, that can be activated in downstream systems.

ColumnDescription
active_snowplow_idThe current, canonical Snowplow ID
id_typeThe type of identifier e.g. email, user_id, domain_userid
id_valueThe actual identifier value, or its SHA-256 hash if hashing is enabled
first_seen_event_idThe first event that contributed this identifier
first_app_idThe app_id of the event where this identifier was first observed
last_app_idThe app_id of the event where this identifier was most recently observed
first_seen_atTimestamp of the event where this identifier was first observed
last_seen_atTimestamp of the event where this identifier was last observed
uuidSurrogate key

snowplow_identities_new_identities

One row per snowplow_id, recording the first and last observed activity for that identity. The snowplow__identifiers variable determines which identifier columns appear in this table.

ColumnDescription
snowplow_idThe Snowplow ID; unique key for this table
created_atTimestamp when this identity was first created, from the identity entity
first_seen_event_idThe first event associated with this identity
first_app_idThe app_id where this identity was first observed
last_app_idThe app_id where this identity was most recently observed
first_derived_tstampderived_tstamp of the first event for this identity
last_derived_tstampderived_tstamp of the most recent event for this identity
(identifier columns)One column per configured identifier (e.g., domain_userid, user_id)

snowplow_identities_merge_events

One row per identity merge event. Each row records the surviving (active) Snowplow ID and the full hierarchy of IDs that were merged into it. Use this table to audit the raw merge operations as they occurred.

ColumnDescription
merge_event_idThe event_id of the identity_merge event — unique key for this table
active_snowplow_idThe surviving Snowplow ID after this merge
collector_tstampTimestamp when the merge event was collected
derived_tstampTimestamp when the merge event occurred
mergedArray containing the full merge hierarchy (all IDs merged into active_snowplow_id)
mergesArray of objects describing each individual merge operation, with snowplow_id, merged_at, and triggering_event_id