Media Player Quickstart
Requirements
In addition to dbt being installed and a web or mobile events dataset being available in your database:
- A dataset of media events must be available in the database. You can collect media events using our plugins for the JavaScript tracker or using the iOS and Android trackers: Media plugin, HTML5 media player plugin, YouTube plugin, Vimeo plugin or the iOS and Android media APIs
- Have the webPage context enabled on web, or the screen context on mobile (enabled by default).
- Have session tracking enabled on the tracker (enabled by default on mobile).
The model is compatible with all versions of our media tracking APIs. These have evolved over time and may track media events using one of two sets of event and context schemas:
Version 1 media schemas:
- media-player event schema used for all media events.
- media-player context v1 schema.
- Depending on the plugin / intention there are player-specific contexts:
- in case of embedded YouTube tracking: Have the YouTube specific context schema enabled.
- in case of HTML5 audio or video tracking: Have the HTML5 media element context schema enabled.
- in case of HTML5 video tracking: Have the HTML5 video element context schema enabled.
Version 2 media schemas (preferred):
- per-event media event schemas.
- media-player context v2 schema.
- optional media-session context schema.
- optional media-ad and ad break context schema.
Installation
Make sure to create a new dbt project and import this package via the packages.yml as recommended by dbt, or add it to an existing top-level project. Do not fork the packages themselves.
Check dbt Hub for the latest installation instructions, or read the dbt docs for more information on installing packages. If you are using multiple packages you may need to up/downgrade a specific package to ensure compatibility.
packages:
  - package: snowplow/snowplow_media_player
    version: 0.9.1
Make sure to run the dbt deps command after updating your packages.yml to ensure you have the specified version of each package installed in your project.
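If you need to stay compatible with other packages, dbt also accepts a version range in place of a single pinned version. The following is only a sketch: the range bounds are illustrative assumptions rather than a recommendation from this guide, so check dbt Hub for the versions that suit your project.

```yaml
# packages.yml (illustrative; the bounds below are assumed, not prescribed)
packages:
  - package: snowplow/snowplow_media_player
    # accept any 0.9.x release instead of pinning one exact version
    version: [">=0.9.0", "<0.10.0"]
```

Run dbt deps again after changing the range so that the resolved version is installed.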
Setup
1. Override the dispatch order in your project
To take advantage of the optimized upsert that the Snowplow packages offer, you need to ensure that certain macros are called from snowplow_utils first, before dbt-core. This can be achieved by adding the following to the top level of your dbt_project.yml file:
dispatch:
  - macro_namespace: dbt
    search_order: ['snowplow_utils', 'dbt']
If you do not do this the package will still work, but the incremental upserts will become more costly over time.
2. Adding the selectors.yml file
Within the packages we have provided a suite of suggested selectors to run and test the models within the package together with the media player model. This leverages dbt's selector flag. You can find out more about each selector in the YAML Selectors section.
These are defined in the selectors.yml file (source) within the package; however, in order to use these selections you will need to copy this file into your own dbt project directory. This is a top-level file and therefore should sit alongside your dbt_project.yml file. If you are using multiple packages in your project you will need to combine the contents of these into a single file.
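As a rough sketch of what a combined file can look like, the example below puts two selectors under a single top-level selectors key. The selector name snowplow_media_player matches the one used in the run command in this guide, but both definition bodies and the second selector (other_package_selector, other_package) are hypothetical placeholders. Copy the real definitions from each package's own selectors.yml.

```yaml
# selectors.yml (sketch; definitions are placeholders, not the package's own)
selectors:
  - name: snowplow_media_player
    description: Assumed definition; replace with the one shipped in the package
    definition:
      union:
        - method: package
          value: snowplow_media_player

  # hypothetical selector copied in from a second package
  - name: other_package_selector
    description: Placeholder for a selector from another package
    definition:
      union:
        - method: package
          value: other_package
```

dbt reads only one selectors.yml per project, which is why the entries from each package need to live in the same list.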
3. Check source data
This package will by default assume your Snowplow events data is contained in the atomic schema of your target.database, in the table labeled events. In order to change this, please add the following to your dbt_project.yml file:
vars:
  snowplow_media_player:
    snowplow__atomic_schema: schema_with_snowplow_events
    snowplow__database: database_with_snowplow_events
    snowplow__events_table: table_of_snowplow_events
Please note that your target.database is NULL if you are using Databricks. In Databricks, schemas and databases are used interchangeably, so the dbt implementation for Databricks always uses the schema value. Adjust your snowplow__atomic_schema value if you need to.
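For a Databricks target, a minimal sketch of the override might set only the schema and table. This assumes snowplow__database can be left unset because the schema value is the one used; the values are the same placeholders as above.

```yaml
# dbt_project.yml (Databricks sketch; values are placeholders)
vars:
  snowplow_media_player:
    # target.database is NULL on Databricks, so the schema value locates the events table
    snowplow__atomic_schema: schema_with_snowplow_events
    snowplow__events_table: table_of_snowplow_events
```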
4. Filter your data set
You can specify the start_date at which to start processing events, the app_ids to filter for, and the event_name values to filter on. By default the start_date is set to 2020-01-01, all app_ids are selected, and all events with the com.snowplowanalytics.snowplow.media or the media_player_event event name are surfaced. To change this, please add or modify the following in your dbt_project.yml file:
...
vars:
  snowplow_media_player:
    snowplow__start_date: 'yyyy-mm-dd'
    snowplow__app_id: ['my_app_1','my_app_2']
    snowplow__media_event_names: ['media_player_event', 'my_custom_media_event']
5. Additional vendor specific configuration
Verify which column your events table is partitioned on. It will likely be partitioned on collector_tstamp or derived_tstamp. If it is partitioned on collector_tstamp, you should set snowplow__derived_tstamp_partitioned to false. This will ensure only the collector_tstamp column is used for partition pruning when querying the events table:
...
vars:
  snowplow_media_player:
    snowplow__derived_tstamp_partitioned: false
6. Enable desired contexts and configuration
The media player package creates tables that depend on the existence of certain context entities that are tracked by the media plugins in the Snowplow trackers. Depending on which media plugin or tracking implementation you use, you will need to enable the relevant contexts in your dbt_project.yml.
6a. Using trackers with support for the version 2 media schemas
This option applies if you are tracking media events with either the Snowplow Media plugin or the Vimeo plugin for the JavaScript tracker, or with the iOS/Android trackers.
...
vars:
  snowplow_media_player:
    # don't use the older version 1 of the media player context schema
    snowplow__enable_media_player_v1: false
    # use the version 2 of the media player context schema
    snowplow__enable_media_player_v2: true
    # use the media session context schema (unless disabled on the tracker)
    snowplow__enable_media_session: true
    # depending on whether you track ads, ad breaks and progress within ads:
    snowplow__enable_media_ad: true
    snowplow__enable_media_ad_break: true
    snowplow__enable_ad_quartile_event: true
    # depending on whether you track events from web or mobile apps:
    snowplow__enable_web_events: true
    snowplow__enable_mobile_events: true
6b. Using the HTML5 media tracking plugin for JavaScript tracker
...
vars:
  snowplow_media_player:
    # use the version 1 of the media player context schema used by the HTML5 plugin
    snowplow__enable_media_player_v1: true
    # don't use the version 2 of the media player context schema as it is not tracked by the plugin
    snowplow__enable_media_player_v2: false
    # don't use the media session context schema as it is not tracked by the plugin
    snowplow__enable_media_session: false
    # set to true if the HTML5 media element context schema is enabled
    snowplow__enable_whatwg_media: true
    # set to true if the HTML5 video element context schema is enabled
    snowplow__enable_whatwg_video: true
6c. Using the YouTube tracking plugin for JavaScript tracker
...
vars:
  snowplow_media_player:
    # use the version 1 of the media player context schema used by the YouTube plugin
    snowplow__enable_media_player_v1: true
    # don't use the version 2 of the media player context schema as it is not tracked by the plugin
    snowplow__enable_media_player_v2: false
    # don't use the media session context schema as it is not tracked by the plugin
    snowplow__enable_media_session: false
    # set to true if the YouTube context schema is enabled
    snowplow__enable_youtube: true
For other variables you can configure please see the model configuration section.
7. Optimize your project
There are ways to handle high-volume optimizations at a later stage if needed, but you can do a lot upfront by carefully choosing the value of snowplow__session_timestamp, which identifies the timestamp column used for sessionization. This should ideally be the column your events table is partitioned on. It defaults to collector_tstamp, but depending on your loader, load_tstamp may be the more sensible value to use:
vars:
  snowplow_media_player:
    snowplow__session_timestamp: 'load_tstamp'
8. Verify your variables using our Config guides (Optional)
If you are unsure whether the default values set are good enough in your case or you would already like to maximize the potential of your models, you can dive deeper into the meaning behind our variables on our Config page. It includes a Config Generator to help you create all your variable configurations, if necessary.
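To see how the fragments from steps 1 to 7 fit together, here is a sketch of a single dbt_project.yml excerpt. It assumes a web-only setup using the version 2 media schemas, no ad tracking, and an events table partitioned on collector_tstamp. All values (schema, table, date, app id) are placeholders chosen for illustration.

```yaml
# dbt_project.yml excerpt (sketch; all values are illustrative placeholders)
dispatch:
  - macro_namespace: dbt
    search_order: ['snowplow_utils', 'dbt']

vars:
  snowplow_media_player:
    # step 3: where the events table lives
    snowplow__atomic_schema: schema_with_snowplow_events
    snowplow__events_table: table_of_snowplow_events
    # step 4: which events to process
    snowplow__start_date: '2023-01-01'
    snowplow__app_id: ['my_app_1']
    # step 5: events table assumed to be partitioned on collector_tstamp
    snowplow__derived_tstamp_partitioned: false
    # step 6a: version 2 media schemas, web only, no ad tracking
    snowplow__enable_media_player_v1: false
    snowplow__enable_media_player_v2: true
    snowplow__enable_media_session: true
    snowplow__enable_media_ad: false
    snowplow__enable_media_ad_break: false
    snowplow__enable_ad_quartile_event: false
    snowplow__enable_web_events: true
    snowplow__enable_mobile_events: false
    # step 7: sessionize on the partition column
    snowplow__session_timestamp: 'collector_tstamp'
```

All variables sit under one vars block keyed by the package name, so each step adds keys to the same block instead of creating a second one.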
9. Run your model
You can now run your models for the first time by running the below command (see the operation page for more information on operation of the package):
dbt run --selector snowplow_media_player