Media Player Quickstart

Unleash the power of your behavioral data
If you're looking for a more guided approach that contains information about tracking and modeling your data, check out our Video and Media Analytics Accelerator!
Release Version | Actively Maintained | Snowplow Personal and Academic License

Requirements

In addition to dbt being installed and a web or mobile events dataset being available in your database:

The model is compatible with all versions of our media tracking APIs. These have evolved over time and may track media events using either of two sets of event and context schemas:

  1. Version 1 media schemas:

  2. Version 2 media schemas (preferred):

Installation

Make sure to create a new dbt project and import this package via the packages.yml as recommended by dbt, or add it to an existing top-level project. Do not fork the packages themselves.

Check dbt Hub for the latest installation instructions, or read the dbt docs for more information on installing packages. If you are using multiple packages you may need to up/downgrade a specific package to ensure compatibility.

packages.yml

packages:
  - package: snowplow/snowplow_media_player
    version: 0.9.2

note

Make sure to run the dbt deps command after updating your packages.yml to ensure you have the specified version of each package installed in your project.
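
If you would rather accept compatible patch releases than pin one exact version, dbt's packages.yml also accepts a version range. The sketch below assumes you want to stay on the 0.9.x line. The bounds are illustrative only, so check dbt Hub for the versions that suit your setup.

```yaml
# packages.yml
packages:
  - package: snowplow/snowplow_media_player
    # Illustrative range (an assumption): any 0.9.x release from 0.9.2 onward
    version: [">=0.9.2", "<0.10.0"]
```

A range like this can make it easier for dbt deps to find a version that is compatible with the other packages in your project.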

Setup

1. Override the dispatch order in your project

To take advantage of the optimized upsert that the Snowplow packages offer, you need to ensure that certain macros are called from snowplow_utils first, before dbt-core. This can be achieved by adding the following to the top level of your dbt_project.yml file:

dbt_project.yml
dispatch:
  - macro_namespace: dbt
    search_order: ['snowplow_utils', 'dbt']

If you do not do this, the package will still work, but the incremental upserts will become more costly over time.

2. Adding the selectors.yml file

Within the packages we have provided a suite of suggested selectors to run and test the models within the package together with the media player model. This leverages dbt's selector flag. You can find out more about each selector in the YAML Selectors section.

These are defined in the selectors.yml file (source) within the package. To use these selections, however, you will need to copy this file into your own dbt project directory. This is a top-level file and should therefore sit alongside your dbt_project.yml file. If you are using multiple packages in your project, you will need to combine the contents of their selectors.yml files into a single file.
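
As a rough sketch of what a combined file could look like, the example below puts selectors from two packages under a single selectors: key. The definitions shown are simplified placeholders and not the packages' actual selector definitions, and the second package name is only an assumed example. Copy the real entries from each package's own selectors.yml.

```yaml
# selectors.yml (sits alongside dbt_project.yml)
selectors:
  # Placeholder definition: copy the real one from the media player package
  - name: snowplow_media_player
    description: Run the models in the Snowplow Media Player package
    definition:
      method: package
      value: snowplow_media_player

  # Hypothetical second package, shown only to illustrate combining files
  - name: snowplow_web
    description: Run the models in another Snowplow package
    definition:
      method: package
      value: snowplow_web
```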

3. Check source data

This package will by default assume your Snowplow events data is contained in the atomic schema of your target.database, in the table labeled events. In order to change this, please add the following to your dbt_project.yml file:

dbt_project.yml
vars:
  snowplow_media_player:
    snowplow__atomic_schema: schema_with_snowplow_events
    snowplow__database: database_with_snowplow_events
    snowplow__events_table: table_of_snowplow_events

Databricks only

Please note that your target.database is NULL if using Databricks. In Databricks, schemas and databases are used interchangeably, so the dbt implementation of Databricks always uses the schema value. Adjust your snowplow__atomic_schema value if you need to.
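
For Databricks, a minimal sketch of the source configuration might therefore set only the schema and table. The schema name below is a hypothetical placeholder, not a package default.

```yaml
# dbt_project.yml
vars:
  snowplow_media_player:
    # Hypothetical schema name; on Databricks only the schema value is used
    snowplow__atomic_schema: snowplow_atomic
    snowplow__events_table: events
```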

4. Filter your data set

You can specify the start_date from which to start processing events, the app_ids to filter for, and the event_name values to filter on. By default, the start_date is set to 2020-01-01, all app_ids are selected, and all events with the com.snowplowanalytics.snowplow.media or media_player_event event name are surfaced. To change this, add or modify the following in your dbt_project.yml file:

dbt_project.yml
# ...
vars:
  snowplow_media_player:
    snowplow__start_date: 'yyyy-mm-dd'
    snowplow__app_id: ['my_app_1','my_app_2']
    snowplow__media_event_names: ['media_player_event', 'my_custom_media_event']

5. Additional vendor specific configuration

BigQuery Only

Verify which column your events table is partitioned on. It will likely be partitioned on collector_tstamp or derived_tstamp. If it is partitioned on collector_tstamp you should set snowplow__derived_tstamp_partitioned to false. This will ensure only the collector_tstamp column is used for partition pruning when querying the events table:

dbt_project.yml
# ...
vars:
  snowplow_media_player:
    snowplow__derived_tstamp_partitioned: false

6. Enable desired contexts and configuration

The media player package creates tables that depend on the existence of certain context entities that are tracked by the media plugins in the Snowplow trackers. Depending on which media plugin or tracking implementation you use, you will need to enable the relevant contexts in your dbt_project.yml.

6a. Using trackers with support for the version 2 media schemas

This option applies if you are tracking media events with the Snowplow Media plugin or the Vimeo plugin for the JavaScript tracker, or with the iOS/Android trackers.

dbt_project.yml
# ...
vars:
  snowplow_media_player:
    # don't use the older version 1 of the media player context schema
    snowplow__enable_media_player_v1: false
    # use the version 2 of the media player context schema
    snowplow__enable_media_player_v2: true
    # use the media session context schema (unless disabled on the tracker)
    snowplow__enable_media_session: true
    # depending on whether you track ads, ad breaks and progress within ads:
    snowplow__enable_media_ad: true
    snowplow__enable_media_ad_break: true
    snowplow__enable_ad_quartile_event: true
    # depending on whether you track events from web or mobile apps:
    snowplow__enable_web_events: true
    snowplow__enable_mobile_events: true

6b. Using the HTML5 media tracking plugin for JavaScript tracker

dbt_project.yml
# ...
vars:
  snowplow_media_player:
    # use the version 1 of the media player context schema used by the HTML5 media tracking plugin
    snowplow__enable_media_player_v1: true
    # don't use the version 2 of the media player context schema as it is not tracked by the plugin
    snowplow__enable_media_player_v2: false
    # don't use the media session context schema as it is not tracked by the plugin
    snowplow__enable_media_session: false
    # set to true if the HTML5 media element context schema is enabled
    snowplow__enable_whatwg_media: true
    # set to true if the HTML5 video element context schema is enabled
    snowplow__enable_whatwg_video: true

6c. Using the YouTube tracking plugin for JavaScript tracker

dbt_project.yml
# ...
vars:
  snowplow_media_player:
    # use the version 1 of the media player context schema used by the YouTube plugin
    snowplow__enable_media_player_v1: true
    # don't use the version 2 of the media player context schema as it is not tracked by the plugin
    snowplow__enable_media_player_v2: false
    # don't use the media session context schema as it is not tracked by the plugin
    snowplow__enable_media_session: false
    # set to true if the YouTube context schema is enabled
    snowplow__enable_youtube: true

For other variables you can configure please see the model configuration section.

7. Optimize your project

High-volume optimizations can be applied at a later stage if needed, but you can do a lot upfront by choosing carefully which column to use for snowplow__session_timestamp, the timestamp column used for sessionization. Ideally, set it to the column your events table is partitioned on. It defaults to collector_tstamp, but depending on your loader, load_tstamp may be the more sensible value:

dbt_project.yml
vars:
  snowplow_media_player:
    snowplow__session_timestamp: 'load_tstamp'
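
The snippets in the steps above each start from the vars: key, but in practice they all belong in one dbt_project.yml under a single vars: block rather than repeating the key. The following is a minimal sketch of how steps 1, 3, 4, 6a and 7 might look once merged. It assumes a version 2 media tracking setup, and every value is a placeholder or an illustrative choice rather than a recommendation.

```yaml
# dbt_project.yml (sketch: placeholder values, adjust to your project)
dispatch:
  - macro_namespace: dbt
    search_order: ['snowplow_utils', 'dbt']

vars:
  snowplow_media_player:
    # Step 3: where the events table lives
    snowplow__atomic_schema: schema_with_snowplow_events
    snowplow__events_table: table_of_snowplow_events
    # Step 4: filters (the date is an illustrative assumption)
    snowplow__start_date: '2023-01-01'
    snowplow__app_id: ['my_app_1', 'my_app_2']
    # Step 6a: version 2 media schemas
    snowplow__enable_media_player_v1: false
    snowplow__enable_media_player_v2: true
    snowplow__enable_media_session: true
    # Step 7: timestamp column used for sessionization
    snowplow__session_timestamp: 'load_tstamp'
```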

8. Verify your variables using our Config guides (Optional)

If you are unsure whether the default values set are good enough in your case or you would already like to maximize the potential of your models, you can dive deeper into the meaning behind our variables on our Config page. It includes a Config Generator to help you create all your variable configurations, if necessary.

9. Run your model

You can now run your models for the first time by running the below command (see the operation page for more information on operation of the package):

dbt run --selector snowplow_media_player