The snowplow-web dbt package provides a means to run the standard web model via dbt. It processes events incrementally and is designed in a modular manner, allowing you to easily integrate your own custom SQL into the incremental framework provided by the package.
- Web Page context enabled (enabled by default in v3+).
- Page view events implemented.
- dbt must be installed.
- Redshift, Snowflake, BigQuery and PostgreSQL
Installing the package
Add the snowplow-web package to your
packages.yml file. For more information refer to dbt's package hub.
1 - Check source data
This package will by default assume your Snowplow events data is contained in the
atomic schema of your target.database. In order to change this, please add the following to your
2 - Enabled desired contexts
The web package has the option to join in data from the following 3 Snowplow enrichments:
By default these are all disabled in the web package. Assuming you have the enrichments turned on in your Snowplow pipeline, to enable the contexts within the package please add the following to your
3 - Filter your data set
You can specify both
start_date at which to start processing events and the
app_id's to filter for. By default the
start_date is set to
2020-01-01 and all
app_id's are selected. To change this please add the following to your
4 - Verify page ping variables
min_visit_length (default 5) and
heartbeat (default 10) differs from the defaults provided in this package, you can override by adding to your
snowplow__min_visit_length: 5 # Default value
snowplow__heartbeat: 10 # Default value
There are many additional configurations you can make to the model such as changing destination schemas, disabling modules and removing bot traffic.
For more details please refer to the snowplow-web package documentation.
The Snowplow web model is designed to be run as a whole, which ensures all incremental tables are kept in sync. As such, we suggest running the model using:
dbt run --models snowplow_web tag:snowplow_web_incremental
We strongly advise reading the operation section on the full doc site for more information on operations such as full-refreshes of the web model and backfilling data, since operations such as full refreshes deviate slightly from the native dbt implementation.
A full suite of tests are included within the package to ensure data quality. For more information on these and suggested implementations when running in orchestration please refer to the doc site.
Customizing the web model
The package is designed in a modular manner, allowing you to easily add in your own custom SQL as well incorporate data from events outside of page views and page pings.