Skip to main content

Understanding the structure of Snowplow data

Overview​

In order to analyze Snowplow data, it is important to understand how it is structured. We have tried to make the structure of Snowplow data as simple, logical, and easy-to-query as possible.

Each line represents one event​

Each line in the Snowplow events table represents a single event, be that a page view, add to basket, play video, like etc.

Structured data​

Snowplow data is structured: individual fields are stored in their own columns, making writing sophisticated queries on the data easy, and making it straightforward for analysts to plugin any kind of analysis tool into their Snowplow data to compose and execute queries.

Extensible schema​

Snowplow started life as a web analytics data warehousing platform, and has a basic schema suitable for performing web analytics, with a wide range of web-specific dimensions (related to page URLs, browsers, operating systems, devices, IP addresses, cookie IDs) and web-specific events (page views, page pings, transactions). All of these fields can be found in the atomic.events table, which is a "fat" (many columns) table.

As Snowplow has evolved into a general purpose event analytics platform, we've enabled Snowplow users to define additional event types (we call these self describing events) and define their own entities (we call these custom entities) so that they can extend the schema to suit their own businesses.

For Snowplow users running Amazon Redshift, each type of self-describing event and each type of entity will be stored in their own dedicated tables. These additional tables can be joined back to the core atomic.events table, by joining on the root_id field in the self-describing event / entity table with the event_id in the atomic.events table, and the root_tstamp and collector_tstamp field in the respective tables. For users on other warehouses these will be additional columns in the atomic.events table.

Single table​

All the events are effectively stored in a single table, making running queries across the data very easy. Even if you're running Snowplow with Redshift and have extended the schema as described above, you can still query the data as if it were in a single fat table. This is because:

  • The joins from the additional tables to the core atomic.events table are one-to-one.
  • The field joined on is the distribution key for both tables, so queries are as fast as if the data were in a single table.

Immutable log​

The Snowplow data table is designed to be immutable: the data in each line should not change over time. Data points that we would expect to change over time (e.g. what cohort a particular user belongs to, how we classify a particular visitor) can be derived from Snowplow data. However, our recommendation is that these derived fields should be defined and calculated at analysis time, stored in a separate table and joined to the Snowplow events table when performing any analysis.

Out-of-the-box fields​

The sections below go over the standard fields found in all Snowplow events grouped into categories. Some are platform independent (=comes out-of-the-box for both web and mobile trackers) and some will only get populated for a specific platform as indicated in the tables below ( βœ… | ❌ ).

As we are moving away from the wide table structure and shift the canonical fields into specific, tematically grouped entities (contexts, self-describing events) in places we will include a link to the specific tracking documentation so that you can find more information of the out-of-the-box options.

We also provide a Source for each field, however sometimes a field may have multiple sources e.g. a value originally set by a tracker but is overwritten by a later enrichment; in this case we have tended to classify this as the earliest source and adding more details where useful.


Common fields (platform and event independent)​

Application fields​

FieldTypeDescriptionReqd?ExampleSourceWebMobile
app_idtextApplication IDYes'angry-birds'Trackingβœ…βœ…
platformtextPlatform, limited to specific valuesYes'web'Trackingβœ…βœ…

The application ID is used to distinguish different applications that are being tracked by the same Snowplow stack, e.g. production versus dev.

The platform ID is used to distinguish the same app running on different platforms, e.g. iOS vs web.

Date / time fields​

FieldTypeDescriptionReqd?ExampleSourceWebMobile
collector_tstamptimestampTimestamp for the event recorded by the collectorYes'2013-11-26 00:02:05'Pipelineβœ…βœ…
dvce_created_tstamptimestampTimestamp for the event recorded on the client deviceNo'2013-11-26 00:03:57.885'Trackingβœ…βœ…
dvce_sent_tstamptimestampWhen the event was actually sent by the client deviceNo'2013-11-26 00:03:58.032'Trackingβœ…βœ…
etl_tstamptimestampTimestamp for when the event was validated and enriched. Note: the name is historical andΒ does not meanΒ that the event is loaded at this point (this is further downstream).No'2017-01-26 00:01:25.292'Pipelineβœ…βœ…
os_timezonetextClient operating system timezoneNo'Europe/London'Tracking:Β Timezone Pluginβœ…βœ…
derived_tstamptimestampTimestamp making allowance for inaccurate device clockNo'2013-11-26 00:02:04'Default Enrichmentβœ…βœ…
true_tstamptimestampUser-set "true timestamp" for the eventNo'2013-11-26 00:02:04'Trackingβœ…βœ…
load_tstamptimestampTimestamp for when the data was loaded into the warehouse, best choice for incremental processingNo'2013-11-26 00:02:04'Pipelineβœ…βœ…

Event / transaction fields​

FieldTypeDescriptionReqd?ExampleSourceWebMobile
eventtextThe type of event recordedYes'page_view'Trackingβœ…βœ…
event_idtextA UUID for each eventYes'c6ef3124-b53a-4b13-a233-0088f79dcbcb'Tracking (or enrichment if empty)βœ…βœ…
txn_idintTransaction ID set client-side, used to de-dupe recordsNo421828Tracking (Deprecated)βœ…βœ…
event_fingerprinttextHash client-set event fields, used to de-dupe recordsNoAADCE520E20C2899F4CED228A79A3083Event Fingerprint Enrichmentβœ…βœ…

A complete list of event types is given here.

Snowplow version fields​

FieldTypeDescriptionReqd?ExampleSourceWebMobile
v_trackertextTracker versionYes'js-3.0.0'Trackingβœ…βœ…
v_collectortextCollector versionYes'ssc-2.1.0-kinesis'Pipelineβœ…βœ…
v_etltextETL versionYes'snowplow-micro-1.1.0-common-1.4.2'Default Enrichmentβœ…βœ…
name_trackertextTracker namespaceNo'sp1'Trackingβœ…βœ…

Some Snowplow Trackers allow the user to name each specific Tracker instance. name_tracker corresponds to this name, and can be used to distinguish which tracker generated which events.

FieldTypeDescriptionReqd?ExampleSourceWebMobile
user_idtextUnique ID set by businessNo'c94f860b-1266-4dad-ae57-3a36a414a521'Trackingβœ…βœ…
domain_useridtextUser ID set by Snowplow using 1st party client-set cookie, (only used for Web events!)No'4b0dfa75-9a8c-46a1-9691-01add9db4200'Trackingβœ…βŒ
network_useridtextUser ID set by Snowplow using server-set cookie, which may be 1st or 3rd party, depending on collector configuration.No'ecdff4d0-9175-40ac-a8bb-325c49733607'Tracking or Pipelineβœ…βœ…
user_ipaddresstextUser IP address, can be overwritten with theΒ IP Anonymization EnrichmentNo'92.231.54.234'Tracking or Pipelineβœ…βœ…
domain_sessionidxintA visit / session indexNo3Trackingβœ…βŒ
domain_sessionidtextA visit / session identifierNo'c6ef3124-b53a-4b13-a233-0088f79dcbcb'Trackingβœ…βŒ

domain_sessionidx is the number of the current user session. For example, an event occurring during a user's first session would have domain_sessionidx set to 1. The JavaScript Tracker calculates this field by storing a visit count in a first-party cookie. Whenever the Tracker fires an event, if more than 30 minutes have elapsed since the last event, the visitor count is increased by 1. (Whenever an event is fired, a "session cookie" is created and set to expire in 30 minutes. This is how the Tracker can tell whether the visit count should be incremented.) Thirty minutes is the default value and can be changed using the sessionCookieTimeout configuration option in the tracker.

It is worth to call out here that the domain_userid is regarded as the most reliable session based identifier for most use cases, it is therefore treated as the primary user_identifier field in our data models that rely on sessionisation (e.g. the Unified data model). These are only populated for web events, the mobile equivalent is embedded into an ootb entity called Session Context Entity which are then extracted (session__user_id) then coalesced with the web based domain_userid by default and being referred to as user_identifier within the data modeling package.

Device and operating system fields​

FieldTypeDescriptionReqd?ExampleSourceWebMobile
useragenttextRaw useragentNoTracking or Pipelineβœ…βœ…
dvce_typetextType of deviceNo'Computer'Deprecatedβœ…βœ…
dvce_ismobilebooleanIs the device mobile?No1Deprecatedβœ…βœ…
dvce_screenheightintScreen height in pixelsNo1024Trackingβœ…βœ…
dvce_screenwidthintScreen width in pixelsNo1900Trackingβœ…βœ…
os_nametextName of operating systemNo'Android'Deprecatedβœ…βœ…
os_familytextOperating system familyNo'Linux'Deprecatedβœ…βœ…
os_manufacturertextCompany responsible for OSNo'Apple'Deprecatedβœ…βœ…

For more information on this topic please check out the relevant Tracking Documentation.

Location fields​

FieldTypeDescriptionReqd?ExampleSourceWebMobile
geo_countrytextISO 3166-1 code for the country the visitor is located inNo'GB', 'US'IP Enrichmentβœ…βœ…
geo_regiontextISO-3166-2 code for country region the visitor is inNo'I9', 'TX'IP Enrichmentβœ…βœ…
geo_citytextCity the visitor is inNo'New York', 'London'IP Enrichmentβœ…βœ…
geo_zipcodetextPostcode the visitor is inNo'94109'IP Enrichmentβœ…βœ…
geo_latitudetextVisitor location latitudeNo37.443604IP Enrichmentβœ…βœ…
geo_longitudetextVisitor location longitudeNo-122.4124IP Enrichmentβœ…βœ…
geo_region_nametextVisitor region nameNo'Florida'IP Enrichmentβœ…βœ…
geo_timezonetextVisitor timezone nameNo'Europe/London'IP Enrichmentβœ…βœ…

For more information on this topic please check out the relevant Tracking Documentation.

IP address-based fields​

FieldTypeDescriptionReqd?ExampleSourceWebMobile
ip_isptextVisitor's ISPNo'FDN Communications'IP Enrichmentβœ…βœ…
ip_organizationtextOrganization associated with the visitor's IP address - defaults to ISP name if none is foundNo'Bouygues Telecom'IP Enrichmentβœ…βœ…
ip_domaintextSecond level domain name associated with the visitor's IP addressNo'nuvox.net'IP Enrichmentβœ…βœ…
ip_netspeedtextVisitor's connection typeNo'Cable/DSL'IP Enrichmentβœ…βœ…

Metadata fields​

Fields containing information about the event type.

FieldTypeDescriptionReqd?ExampleSourceWebMobile
event_vendortextWho defined the eventYes'com.acme'Default Enrich (event schema)βœ…βœ…
event_nametextEvent nameYes'link_click'Default Enrich (event schema)βœ…βœ…
event_formattextFormat for eventYes'jsonschema'Default Enrich (event schema)βœ…βœ…
event_versiontextVersion of event schemaYes'1-0-2'Default Enrich (event schema)βœ…βœ…

Marketing / traffic source fields​

FieldTypeDescriptionReqd?ExampleSourceWebMobile
mkt_mediumtextType of traffic sourceNo'cpc', 'affiliate', 'organic', 'social'Campaign Attribution Enrichmentβœ…βœ…
mkt_sourcetextThe company / website where the traffic came fromNo'Google', 'Facebook'Campaign Attribution Enrichmentβœ…βœ…
mkt_termtextAny keywords associated with the referrerNo'new age tarot decks'Campaign Attribution Enrichmentβœ…βœ…
mkt_contenttextThe content of the ad. (Or an ID so that it can be looked up.)No13894723Campaign Attribution Enrichmentβœ…βœ…
mkt_campaigntextThe campaign IDNo'diageo-123'Campaign Attribution Enrichmentβœ…βœ…
mkt_clickidtextThe click IDNo'ac3d8e459'Campaign Attribution Enrichmentβœ…βœ…
mkt_networktextThe ad network to which the click ID belongsNo'DoubleClick'Campaign Attribution Enrichmentβœ…βœ…

Platform-specific fields​

Web-specific fields​

Page Fields​
FieldTypeDescriptionReqd?ExampleSourceWebMobile
page_urltextThe page URLNo'http://www.example.com'Trackingβœ…βŒ
page_urlschemetextScheme aka protocolNo'https'Default Enrichment (url)βœ…βŒ
page_urlhosttextHost aka domainNo'β€œwww.snowplowanalytics.com'Default Enrichment (url)βœ…βŒ
page_urlportintPort if specified, scheme dependent if not (443 for https, 80 for http)No80Default Enrichment (url)βœ…βŒ
page_urlpathtextPath to pageNo'/product/index.html'Default Enrichment (url)βœ…βŒ
page_urlquerytextQuerystringNo'id=GTM-DLRG'Default Enrichment (url)βœ…βŒ
page_urlfragmenttextFragment aka anchorNo'4-conclusion'Default Enrichment (url)βœ…βŒ
page_referrertextURL of the referrerNo'http://www.referrer.com'Trackingβœ…βŒ
page_titletextWeb page titleNo'Snowplow Docs - Understanding the structure of Snowplow data'Trackingβœ…βŒ
refr_urlschemetextReferrer schemeNo'http'Default Enrichment (referer)βœ…βŒ
refr_urlhosttextReferrer hostNo'www.bing.com'Default Enrichment (referer)βœ…βŒ
refr_urlportintReferrer portNo80Default Enrichment (referer)βœ…βŒ
refr_urlpathtextReferrer page pathNo'/images/search'Default Enrichment (referer)βœ…βŒ
refr_urlquerytextReferrer URL querystringNo'q=psychic+oracle+cards'Default Enrichment (referer)βœ…βŒ
refr_urlfragmenttextReferrer URL fragmentNoDefault Enrichment (referer)βœ…βŒ
refr_mediumtextType of referrerNo'search', 'internal'Referrer Parser Enrichmentβœ…βŒ
refr_sourcetextName of referrer if recognizedNo'Bing images'Referrer Parser Enrichmentβœ…βŒ
refr_termtextKeywords if source is a search engineNo'psychic oracle cards'Referrer Parser Enrichmentβœ…βŒ
refr_domain_useridtextThe Snowplow domain_userid of the referring websiteNo'bc2e92ec6c204a14'Default Enrichment (cross_domain)βœ…βŒ
refr_dvce_tstamptimestampThe time of attaching the domain_userid to the inbound linkNo'2013-11-26 00:02:05'Default Enrichment (cross_domain)βœ…βŒ
Document fields​
FieldTypeDescriptionReqd?ExampleSourceWebMobile
doc_charsettextThe page’s character encodingNo, 'UTF-8'Trackingβœ…βŒ
doc_widthintThe page's width in pixelsNo1024Trackingβœ…βŒ
doc_heightintThe page's height in pixelsNo3000Trackingβœ…βŒ
Browser fields​
FieldTypeDescriptionReqd?ExampleSourceWebMobile
user_fingerprintintA user fingerprint generated by looking at the individual browser featuresNo2161814971Tracking (Deprecated)❌❌
br_nametextBrowser nameNo'Firefox 12'Deprecated❌❌
br_versiontextBrowser versionNo'12.0'Deprecated❌❌
br_familytextBrowser familyNo'Firefox'Deprecated❌❌
br_typetextBrowser typeNo'Browser'Deprecated❌❌
br_renderenginetextBrowser rendering engineNo'GECKO'Deprecated❌❌
br_langtextLanguage the browser is set toNo'en-GB'Trackingβœ…βŒ
br_features_pdfbooleanWhether the browser recognizes PDFsNo1Deprecatedβœ…βŒ
br_features_flashbooleanWhether Flash is installedNo1Deprecatedβœ…βŒ
br_features_javabooleanWhether Java is installedNo1Deprecatedβœ…βŒ
br_features_directorbooleanWhether Adobe Shockwave is installedNo1Deprecatedβœ…βŒ
br_features_quicktimebooleanWhether QuickTime is installedNo1Deprecatedβœ…βŒ
br_features_realplayerbooleanWhether RealPlayer is installedNo1Deprecatedβœ…βŒ
br_features_windowsmediabooleanWhether mplayer2 is installedNo1Deprecatedβœ…βŒ
br_features_gearsbooleanWhether Google Gears is installedNo1Deprecatedβœ…βŒ
br_features_silverlightbooleanWhether Microsoft Silverlight is installedNo1Deprecatedβœ…βŒ
br_cookiesbooleanWhether cookies are enabledNo1Trackingβœ…βŒ
br_colordepthintBit depth of the browser color paletteNo24Trackingβœ…βŒ
br_viewheightintViewport heightNo1000Trackingβœ…βŒ
br_viewwidthintViewport widthNo1000Trackingβœ…βŒ

For more information on this topic please check out the relevant Tracking Documentation.

Event-specific fields​

Snowplow includes specific fields to capture data associated with specific events.

Note that to date, all event types have been defined by Snowplow. Also note that event_vendor values follow the Java package naming convention.

Snowplow currently supports the following event types:

Event typeValue of event field in model
Page views'page_view'
Page pings'page_ping'
E-commerce transactions'transaction' and 'transaction_item'
Structured events'struct'
Self-describing events'unstruct' (for legacy reasons)

Details of which fields are available for which events are given below. In some cases these events will store values in the atomic.events columns, which are listed below.

Page views​

There are currently no fields that are specific to page_view events: all the fields that are required are part of the standard fields available for any web-based event e.g. page_urlscheme, page_title.

Page pings​

There are four additional fields included with page pings that indicate how a user has scrolled over a web page since the last page ping:

FieldTypeDescriptionReqd?Example
pp_xoffset_minintegerMinimum page x offset seen in the last ping periodNo10
pp_xoffset_maxintegerMaximum page x offset seen in the last ping periodNo100
pp_yoffset_minintegerMinimum page y offset seen in the last ping periodNo5
pp_yoffset_maxintegerMaximum page y offset seen in the last ping periodNo200

E-commerce transactions​

There are a large number of fields specifically for transaction events.

Fields that start tr_ relate to the transaction as a whole. Fields that start ti_ refer to the specific item included in the transaction. (E.g. a product in the basket.) Single transactions typically span multiple lines of data: there will be a single line where event = transaction, where the tr_ fields are set, and multiple lines (one for each product included) where event = transaction_item and the ti_ fields are set.

FieldTypeDescriptionReqd?Example
tr_orderidtextOrder IDYes'#134'
tr_affiliationtextTransaction affiliation (e.g. store where sale took place)No'web'
tr_totaldecimalTotal transaction valueYes12.99
tr_taxdecimalTotal tax included in transaction valueNo3.00
tr_shippingdecimalDelivery cost chargedNo0.00
tr_total_base*decimalTotal in base currencyNo12.99
tr_tax_base*decimalTotal tax in base currencyNo3.00
tr_shipping_base*decimalDelivery cost in base currencyNo0.00
tr_citytextDelivery address, cityNo'London'
tr_statetextDelivery address, stateNo'Washington'
tr_countrytextDelivery address, countryNo'France'
tr_currencytextCurrencyNo'USD'
ti_orderidtextOrder IDYes'#134'
ti_skutextProduct SKUYes'pbz00123'
ti_nametextProduct nameNo'Cone pendulum'
ti_categorytextProduct categoryNo'New Age'
ti_pricedecimalProduct unit priceYes9.99
ti_price_base*decimalPrice in base currencyNo9.99
ti_quantityintegerNumber of product in transactionYes2
ti_currencytextCurrencyNo'EUR'
base_currency*textReporting currencyNo'GBP'

* Set exclusively by the Currency conversion enrichment.

For more information on this topic please check out the relevant Tracking Documentation.

Structured events​

Structured events allow you to send your own custom data, as long as it fits in the following 5 fields:

FieldTypeDescriptionReqd?Example
se_categorytextCategory of eventYes*'ecomm', 'video'
se_actiontextAction performed / event nameYes*'add-to-basket', 'play-video'
se_labeltextThe object of the action e.g. the ID of the video played or SKU of the product added-to-basketNo'pbz00123'
se_propertytextA property associated with the object of the actionNo'HD', 'large'
se_valuedecimalA value associated with the event / action e.g. the value of goods added-to-basketNo9.99

* These fields are only required for struct events.

For more information on this topic please check out the relevant Tracking Documentation.

Self-describing events​

Self-describing events can contain their own set of fields, defined by their schema.

For each type of self-describing event, there will be a dedicated column (or table, in case of Redshift and Postgres) that holds the event-specific fields.

See querying data for more details on the structure and how to query it in different warehouses. You might also want to check how schema definitions translate to the warehouse.

For more information on this topic please check out the relevant Tracking Documentation.

Entities​

Entities (also known as contexts) provide extra information about the event, such as data describing a product or a user.

For each type of entity, there will be a dedicated column (or table, in case of Redshift and Postgres) that holds entity-specific fields. Note that an event can have any number of entities attached, including multiple entities of the same type. For this reason, the data inside the entity columns is an array.

See querying data for more details on the structure and how to query it in different warehouses. You might also want to check how schema definitions translate to the warehouse.

For more information on this topic please check out the relevant Tracking Documentation.

Out-of-the-box self-describing events and entities​

These are also a variety of self-describing events and entities defined by Snowplow. You can find their schemas here.