Skip to main content

Understanding the structure of Snowplow data

Overview

In order to analyze Snowplow data, it is important to understand how it is structured. We have tried to make the structure of Snowplow data as simple, logical, and easy-to-query as possible.

Each line represents one event

Each line in the Snowplow events table represents a single event, be that a page view, add to basket, play video, like etc.

Structured data

Snowplow data is structured: individual fields are stored in their own columns, making writing sophisticated queries on the data easy, and making it straightforward for analysts to plugin any kind of analysis tool into their Snowplow data to compose and execute queries.

Extensible schema

Snowplow started life as a web analytics data warehousing platform, and has a basic schema suitable for performing web analytics, with a wide range of web-specific dimensions (related to page URLs, browsers, operating systems, devices, IP addresses, cookie IDs) and web-specific events (page views, page pings, transactions). All of these fields can be found in the atomic.events table, which is a "fat" (many columns) table.

As Snowplow has evolved into a general purpose event analytics platform, we've enabled Snowplow users to define additional event types (we call these self describing events) and define their own entities (we call these custom entities) so that they can extend the schema to suit their own businesses.

For Snowplow users running Amazon Redshift, each type of self-describing event and each type of entity will be stored in their own dedicated tables. These additional tables can be joined back to the core atomic.events table, by joining on the root_id field in the self-describing event / entity table with the event_id in the atomic.events table, and the root_tstamp and collector_tstamp field in the respective tables. For users on other warehouses these will be additional columns in the atomic.events table.

Single table

All the events are effectively stored in a single table, making running queries across the data very easy. Even if you're running Snowplow with Redshift and have extended the schema as described above, you can still query the data as if it were in a single fat table. This is because:

  • The joins from the additional tables to the core atomic.events table are one-to-one.
  • The field joined on is the distribution key for both tables, so queries are as fast as if the data were in a single table.

Immutable log

The Snowplow data table is designed to be immutable: the data in each line should not change over time. Data points that we would expect to change over time (e.g. what cohort a particular user belongs to, how we classify a particular visitor) can be derived from Snowplow data. However, our recommendation is that these derived fields should be defined and calculated at analysis time, stored in a separate table and joined to the Snowplow events table when performing any analysis.

Out-of-the-box fields

The sections below go over the standard fields found in all Snowplow events grouped into categories. Some are platform independent (=comes out-of-the-box for both web and mobile trackers) and some will only get populated for a specific platform as indicated in the tables below ( ✅ | ❌ ).

As we are moving away from the wide table structure and shift the canonical fields into specific, tematically grouped entities (contexts, self-describing events) in places we will include a link to the specific tracking documentation so that you can find more information of the out-of-the-box options.

We also provide a Source for each field, however sometimes a field may have multiple sources e.g. a value originally set by a tracker but is overwritten by a later enrichment; in this case we have tended to classify this as the earliest source and adding more details where useful.


Common fields (platform and event independent)

Application fields

FieldTypeDescriptionReqd?ExampleSourceWebMobile
app_idtextApplication IDYes'angry-birds'Tracking
platformtextPlatform, limited to specific valuesYes'web'Tracking

The application ID is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev.

The platform ID is used to distinguish the same app running on different platforms, e.g. iOS vs web.

Date / time fields

FieldTypeDescriptionReqd?ExampleSourceWebMobile
collector_tstamptimestampTimestamp for the event recorded by the collectorYes'2013-11-26 00:02:05'Pipeline
dvce_created_tstamptimestampTimestamp for the event recorded on the client deviceNo'2013-11-26 00:03:57.885'Tracking
dvce_sent_tstamptimestampWhen the event was actually sent by the client deviceNo'2013-11-26 00:03:58.032'Tracking
etl_tstamptimestampTimestamp for when the event was validated and enriched. Note: the name is historical and does not mean that the event is loaded at this point (this is further downstream).No'2017-01-26 00:01:25.292'Pipeline
os_timezonetextClient operating system timezoneNo'Europe/London'Tracking: Timezone Plugin
derived_tstamptimestampTimestamp making allowance for inaccurate device clockNo'2013-11-26 00:02:04'Default Enrichment
true_tstamptimestampUser-set "true timestamp" for the eventNo'2013-11-26 00:02:04'Tracking
load_tstamptimestampTimestamp for when the data was loaded into the warehouse, best choice for incremental processingNo'2013-11-26 00:02:04'Pipeline

Event / transaction fields

FieldTypeDescriptionReqd?ExampleSourceWebMobile
eventtextThe type of event recordedYes'page_view'Tracking
event_idtextA UUID for each eventYes'c6ef3124-b53a-4b13-a233-0088f79dcbcb'Tracking (or enrichment if empty)
txn_idintTransaction ID set client-side, used to de-dupe recordsNo421828Tracking (Deprecated)
event_fingerprinttextHash client-set event fields, used to de-dupe recordsNoAADCE520E20C2899F4CED228A79A3083Event Fingerprint Enrichment

A complete list of event types is given here.

Snowplow version fields

FieldTypeDescriptionReqd?ExampleSourceWebMobile
v_trackertextTracker versionYes'js-3.0.0'Tracking
v_collectortextCollector versionYes'ssc-2.1.0-kinesis'Pipeline
v_etltextETL versionYes'snowplow-micro-1.1.0-common-1.4.2'Default Enrichment
name_trackertextTracker namespaceNo'sp1'Tracking

Some Snowplow Trackers allow the user to name each specific Tracker instance. name_tracker corresponds to this name, and can be used to distinguish which tracker generated which events.

FieldTypeDescriptionReqd?ExampleSourceWebMobile
user_idtextUnique ID set by businessNo'c94f860b-1266-4dad-ae57-3a36a414a521'Tracking
domain_useridtextUser ID set by Snowplow using 1st party client-set cookie, (only used for Web events!)No'4b0dfa75-9a8c-46a1-9691-01add9db4200'Tracking
network_useridtextUser ID set by Snowplow using server-set cookie, which may be 1st or 3rd party, depending on collector configuration.No'ecdff4d0-9175-40ac-a8bb-325c49733607'Tracking or Pipeline
user_ipaddresstextUser IP address, can be overwritten with the IP Anonymization EnrichmentNo'92.231.54.234'Tracking or Pipeline
domain_sessionidxintA visit / session indexNo3Tracking
domain_sessionidtextA visit / session identifierNo'c6ef3124-b53a-4b13-a233-0088f79dcbcb'Tracking

domain_sessionidx is the number of the current user session. For example, an event occurring during a user's first session would have domain_sessionidx set to 1. The JavaScript Tracker calculates this field by storing a visit count in a first-party cookie. Whenever the Tracker fires an event, if more than 30 minutes have elapsed since the last event, the visitor count is increased by 1. (Whenever an event is fired, a "session cookie" is created and set to expire in 30 minutes. This is how the Tracker can tell whether the visit count should be incremented.) Thirty minutes is the default value and can be changed using the sessionCookieTimeout configuration option in the tracker.

It is worth to call out here that the domain_userid is regarded as the most reliable session based identifier for most use cases, it is therefore treated as the primary user_identifier field in our data models that rely on sessionisation (e.g. the Unified data model). These are only populated for web events, the mobile equivalent is embedded into an ootb entity called Session Context Entity which are then extracted (session__user_id) then coalesced with the web based domain_userid by default and being referred to as user_identifier within the data modeling package.

Device and operating system fields

FieldTypeDescriptionReqd?ExampleSourceWebMobile
useragenttextRaw useragentNoTracking or Pipeline
dvce_typetextType of deviceNo'Computer'Deprecated
dvce_ismobilebooleanIs the device mobile?No1Deprecated
dvce_screenheightintScreen height in pixelsNo1024Tracking
dvce_screenwidthintScreen width in pixelsNo1900Tracking
os_nametextName of operating systemNo'Android'Deprecated
os_familytextOperating system familyNo'Linux'Deprecated
os_manufacturertextCompany responsible for OSNo'Apple'Deprecated

For more information on this topic please check out the relevant Tracking Documentation.

Location fields

FieldTypeDescriptionReqd?ExampleSourceWebMobile
geo_countrytextISO 3166-1 code for the country the visitor is located inNo'GB', 'US'IP Enrichment
geo_regiontextISO-3166-2 code for country region the visitor is inNo'I9', 'TX'IP Enrichment
geo_citytextCity the visitor is inNo'New York', 'London'IP Enrichment
geo_zipcodetextPostcode the visitor is inNo'94109'IP Enrichment
geo_latitudetextVisitor location latitudeNo37.443604IP Enrichment
geo_longitudetextVisitor location longitudeNo-122.4124IP Enrichment
geo_region_nametextVisitor region nameNo'Florida'IP Enrichment
geo_timezonetextVisitor timezone nameNo'Europe/London'IP Enrichment

For more information on this topic please check out the relevant Tracking Documentation.

IP address-based fields

FieldTypeDescriptionReqd?ExampleSourceWebMobile
ip_isptextVisitor's ISPNo'FDN Communications'IP Enrichment
ip_organizationtextOrganization associated with the visitor's IP address - defaults to ISP name if none is foundNo'Bouygues Telecom'IP Enrichment
ip_domaintextSecond level domain name associated with the visitor's IP addressNo'nuvox.net'IP Enrichment
ip_netspeedtextVisitor's connection typeNo'Cable/DSL'IP Enrichment

Metadata fields

Fields containing information about the event type.

FieldTypeDescriptionReqd?ExampleSourceWebMobile
event_vendortextWho defined the eventYes'com.acme'Default Enrich (event schema)
event_nametextEvent nameYes'link_click'Default Enrich (event schema)
event_formattextFormat for eventYes'jsonschema'Default Enrich (event schema)
event_versiontextVersion of event schemaYes'1-0-2'Default Enrich (event schema)

Marketing / traffic source fields

FieldTypeDescriptionReqd?ExampleSourceWebMobile
mkt_mediumtextType of traffic sourceNo'cpc', 'affiliate', 'organic', 'social'Campaign Attribution Enrichment
mkt_sourcetextThe company / website where the traffic came fromNo'Google', 'Facebook'Campaign Attribution Enrichment
mkt_termtextAny keywords associated with the referrerNo'new age tarot decks'Campaign Attribution Enrichment
mkt_contenttextThe content of the ad. (Or an ID so that it can be looked up.)No13894723Campaign Attribution Enrichment
mkt_campaigntextThe campaign IDNo'diageo-123'Campaign Attribution Enrichment
mkt_clickidtextThe click IDNo'ac3d8e459'Campaign Attribution Enrichment
mkt_networktextThe ad network to which the click ID belongsNo'DoubleClick'Campaign Attribution Enrichment

Platform-specific fields

Web-specific fields

Page Fields
FieldTypeDescriptionReqd?ExampleSourceWebMobile
page_urltextThe page URLNo'http://www.example.com'Tracking
page_urlschemetextScheme aka protocolNo'https'Default Enrichment (url)
page_urlhosttextHost aka domainNo'“www.snowplowanalytics.com'Default Enrichment (url)
page_urlportintPort if specified, scheme dependent if not (443 for https, 80 for http)No80Default Enrichment (url)
page_urlpathtextPath to pageNo'/product/index.html'Default Enrichment (url)
page_urlquerytextQuerystringNo'id=GTM-DLRG'Default Enrichment (url)
page_urlfragmenttextFragment aka anchorNo'4-conclusion'Default Enrichment (url)
page_referrertextURL of the referrerNo'http://www.referrer.com'Tracking
page_titletextWeb page titleNo'Snowplow Docs - Understanding the structure of Snowplow data'Tracking
refr_urlschemetextReferrer schemeNo'http'Default Enrichment (referer)
refr_urlhosttextReferrer hostNo'www.bing.com'Default Enrichment (referer)
refr_urlportintReferrer portNo80Default Enrichment (referer)
refr_urlpathtextReferrer page pathNo'/images/search'Default Enrichment (referer)
refr_urlquerytextReferrer URL querystringNo'q=psychic+oracle+cards'Default Enrichment (referer)
refr_urlfragmenttextReferrer URL fragmentNoDefault Enrichment (referer)
refr_mediumtextType of referrerNo'search', 'internal'Referrer Parser Enrichment
refr_sourcetextName of referrer if recognizedNo'Bing images'Referrer Parser Enrichment
refr_termtextKeywords if source is a search engineNo'psychic oracle cards'Referrer Parser Enrichment
refr_domain_useridtextThe Snowplow domain_userid of the referring websiteNo'bc2e92ec6c204a14'Default Enrichment (cross_domain)
refr_dvce_tstamptimestampThe time of attaching the domain_userid to the inbound linkNo'2013-11-26 00:02:05'Default Enrichment (cross_domain)
Document fields
FieldTypeDescriptionReqd?ExampleSourceWebMobile
doc_charsettextThe page’s character encodingNo, 'UTF-8'Tracking
doc_widthintThe page's width in pixelsNo1024Tracking
doc_heightintThe page's height in pixelsNo3000Tracking
Browser fields
FieldTypeDescriptionReqd?ExampleSourceWebMobile
user_fingerprintintA user fingerprint generated by looking at the individual browser featuresNo2161814971Tracking (Deprecated)
br_nametextBrowser nameNo'Firefox 12'Deprecated
br_versiontextBrowser versionNo'12.0'Deprecated
br_familytextBrowser familyNo'Firefox'Deprecated
br_typetextBrowser typeNo'Browser'Deprecated
br_renderenginetextBrowser rendering engineNo'GECKO'Deprecated
br_langtextLanguage the browser is set toNo'en-GB'Tracking
br_features_pdfbooleanWhether the browser recognizes PDFsNo1Deprecated
br_features_flashbooleanWhether Flash is installedNo1Deprecated
br_features_javabooleanWhether Java is installedNo1Deprecated
br_features_directorbooleanWhether Adobe Shockwave is installedNo1Deprecated
br_features_quicktimebooleanWhether QuickTime is installedNo1Deprecated
br_features_realplayerbooleanWhether RealPlayer is installedNo1Deprecated
br_features_windowsmediabooleanWhether mplayer2 is installedNo1Deprecated
br_features_gearsbooleanWhether Google Gears is installedNo1Deprecated
br_features_silverlightbooleanWhether Microsoft Silverlight is installedNo1Deprecated
br_cookiesbooleanWhether cookies are enabledNo1Tracking
br_colordepthintBit depth of the browser color paletteNo24Tracking
br_viewheightintViewport heightNo1000Tracking
br_viewwidthintViewport widthNo1000Tracking

For more information on this topic please check out the relevant Tracking Documentation.

Event-specific fields

Snowplow includes specific fields to capture data associated with specific events.

Note that to date, all event types have been defined by Snowplow. Also note that event_vendor values follow the Java package naming convention.

Snowplow currently supports the following event types:

Event typeValue of event field in model
Page views'page_view'
Page pings'page_ping'
E-commerce transactions'transaction' and 'transaction_item'
Structured events'struct'
Self-describing events'unstruct' (for legacy reasons)

Details of which fields are available for which events are given below. In some cases these events will store values in the atomic.events columns, which are listed below.

Page views

There are currently no fields that are specific to page_view events: all the fields that are required are part of the standard fields available for any web-based event e.g. page_urlscheme, page_title.

Page pings

There are four additional fields included with page pings that indicate how a user has scrolled over a web page since the last page ping:

FieldTypeDescriptionReqd?Example
pp_xoffset_minintegerMinimum page x offset seen in the last ping periodNo10
pp_xoffset_maxintegerMaximum page x offset seen in the last ping periodNo100
pp_yoffset_minintegerMinimum page y offset seen in the last ping periodNo5
pp_yoffset_maxintegerMaximum page y offset seen in the last ping periodNo200

E-commerce transactions

There are a large number of fields specifically for transaction events.

Fields that start tr_ relate to the transaction as a whole. Fields that start ti_ refer to the specific item included in the transaction. (E.g. a product in the basket.) Single transactions typically span multiple lines of data: there will be a single line where event = transaction, where the tr_ fields are set, and multiple lines (one for each product included) where event = transaction_item and the ti_ fields are set.

FieldTypeDescriptionReqd?Example
tr_orderidtextOrder IDYes'#134'
tr_affiliationtextTransaction affiliation (e.g. store where sale took place)No'web'
tr_totaldecimalTotal transaction valueYes12.99
tr_taxdecimalTotal tax included in transaction valueNo3.00
tr_shippingdecimalDelivery cost chargedNo0.00
tr_total_base*decimalTotal in base currencyNo12.99
tr_tax_base*decimalTotal tax in base currencyNo3.00
tr_shipping_base*decimalDelivery cost in base currencyNo0.00
tr_citytextDelivery address, cityNo'London'
tr_statetextDelivery address, stateNo'Washington'
tr_countrytextDelivery address, countryNo'France'
tr_currencytextCurrencyNo'USD'
ti_orderidtextOrder IDYes'#134'
ti_skutextProduct SKUYes'pbz00123'
ti_nametextProduct nameNo'Cone pendulum'
ti_categorytextProduct categoryNo'New Age'
ti_pricedecimalProduct unit priceYes9.99
ti_price_base*decimalPrice in base currencyNo9.99
ti_quantityintegerNumber of product in transactionYes2
ti_currencytextCurrencyNo'EUR'
base_currency*textReporting currencyNo'GBP'

* Set exclusively by the Currency conversion enrichment.

For more information on this topic please check out the relevant Tracking Documentation.

Structured events

Structured events allow you to send your own custom data, as long as it fits in the following 5 fields:

FieldTypeDescriptionReqd?Example
se_categorytextCategory of eventYes*'ecomm', 'video'
se_actiontextAction performed / event nameYes*'add-to-basket', 'play-video'
se_labeltextThe object of the action e.g. the ID of the video played or SKU of the product added-to-basketNo'pbz00123'
se_propertytextA property associated with the object of the actionNo'HD', 'large'
se_valuedecimalA value associated with the event / action e.g. the value of goods added-to-basketNo9.99

* These fields are only required for struct events.

For more information on this topic please check out the relevant Tracking Documentation.

Self-describing events

Self-describing events can contain their own set of fields, defined by their schema.

For each type of self-describing event, there will be a dedicated column (or table, in case of Redshift and Postgres) that holds the event-specific fields.

See querying data for more details on the structure and how to query it in different warehouses. You might also want to check how schema definitions translate to the warehouse.

For more information on this topic please check out the relevant Tracking Documentation.

Entities

Entities (also known as contexts) provide extra information about the event, such as data describing a product or a user.

For each type of entity, there will be a dedicated column (or table, in case of Redshift and Postgres) that holds entity-specific fields. Note that an event can have any number of entities attached, including multiple entities of the same type. For this reason, the data inside the entity columns is an array.

See querying data for more details on the structure and how to query it in different warehouses. You might also want to check how schema definitions translate to the warehouse.

For more information on this topic please check out the relevant Tracking Documentation.

Out-of-the-box self-describing events and entities

These are also a variety of self-describing events and entities defined by Snowplow. You can find their schemas here.