
User and session identification

User identifiers​

Tracker (in-browser/on-device) generated user identifier​

This user identifier is generated by the tracker. It takes the form of a UUID. It is persisted either in the browser cookies or in the app user settings storage (e.g., UserDefaults on iOS).

It is the default identifier used in our dbt packages to identify users.

In the tracked events, it can appear either:

  1. As the domain_userid parameter in the atomic event properties.
    • This is currently the default behavior on Web apps.
    • It is not supported in mobile apps.
    • This is referred to as the default user_identifier field in the dbt-snowplow-unified package
  2. As the userId property in the client_session context entity (see below).
    • This is an optional configuration in the JavaScript tracker for Web apps.
    • It is the default behavior on mobile apps.
    • This is referred to as the default user_identifier field in the dbt-snowplow-unified package (used to be referred to as device_user_id in our legacy dbt-snowplow-mobile package).
Coalesced into the user_identifier in our unified dbt package

A good practice is to coalesce the domain_userid and the userId property in the client session context entity so that the value can come from either place. This is also what our unified dbt package does: it provides the information under a single user_identifier field.
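The coalescing described above can be sketched in a few lines of Python. This is only an illustration, not the logic of the dbt package: the event layout (atomic fields at the top level, the entity under a hypothetical "client_session" key) and the precedence given to domain_userid are assumptions made for the example.

```python
from typing import Optional


def user_identifier(event: dict) -> Optional[str]:
    """Return the tracker generated user ID from whichever place it was tracked."""
    # Hypothetical layout: atomic fields at the top level, and the
    # client_session entity (if attached) under the "client_session" key.
    session_entity = event.get("client_session") or {}
    # Assumed precedence: domain_userid first, then the entity's userId.
    return event.get("domain_userid") or session_entity.get("userId")


# A Web event carrying domain_userid and a mobile event carrying only the entity.
web_event = {"domain_userid": "7a62ec9d-2aa0-4426-b014-eba2d0dcfebb"}
mobile_event = {"client_session": {"userId": "5d79770b-015b-4af8-8c91-b2ed6faf4b1e"}}

web_id = user_identifier(web_event)
mobile_id = user_identifier(mobile_event)
```

Either event yields a single identifier, so downstream logic does not need to know which tracking mode produced it.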

In case you want to change the default tracking behavior, refer to the following documentation:

note

In mobile apps, there are additional on-device identifiers provided by the platform – advertising ID (IDFA) and vendor ID (IDFV, app set ID). These can be tracked in the mobile context entity.

Server (Collector) generated user identifier​

The Snowplow Collector generates a user identifier that is stored in cookies for the Collector domain. This is the network_userid property in the atomic events table and it has the format of a UUID.

The identifier is available both in Web and mobile apps. However, in Android apps, it is stored in memory so it is reset after the app restarts.

In most scenarios, this identifier may have a longer lifetime than the tracker generated identifier. However, browsers can restrict its lifetime for different reasons, such as when the Snowplow Collector is on a third-party domain relative to the website (not recommended), or due to the ITP restrictions in Safari (Snowplow provides a solution to mitigate this problem: the ID service).

info

network_userid is captured via a cookie set by the Snowplow Collector. It can be overridden by setting tnuid on a tracker request payload, but it is typically expected to be populated by the Collector cookies.

Business user identifier​

This is an external identifier that the app provides to the tracker. Most commonly it is the username or e-mail address of the logged-in user (e.g., jon.doe@email.com).

Referred to as the user_id in the atomic events and our dbt packages

The business user identifier is provided under the user_id field in the atomic events as well as our dbt packages.

All our trackers have an API to set this identifier. You can find it in the JavaScript tracker docs here and the mobile trackers docs here.

This identifier can be very useful to stitch the generated tracker identifiers together in order to identify the same user across multiple browsers or devices. See below for more information on user stitching.

IP address​

Although it's not a very reliable identifier, the events also contain the IP address of the user which can be used to identify them to some extent. The value is stored in the user_ipaddress property in the atomic events table.

Session identifiers​

In Snowplow events, sessions are tracked on the client-side – by the tracker. The session information is then attached to the events when they are tracked.

The session information consists of two main pieces of information (when using the client session entity discussed below, there is also extra information tracked with sessions):

  1. Session identifier. UUID identifier generated by the tracker at the start of the session.
  2. Session index. Index of the session for the current user (per the tracker generated user identifier).

Sessions have a configurable inactivity timeout (which defaults to 30 minutes). After no user activity for the set timeout, a new session is started. On the Web, activity is detected as any user interaction with the page (mouse movement, clicks, ...). In mobile apps, activity is detected using the tracked events.
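As a rough illustration of the timeout behavior, the following Python sketch models the session state for one user. It is a toy model rather than tracker code: the class and method names are hypothetical, and the exact boundary handling (a gap strictly longer than the timeout starts a new session) is an assumption.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SessionState:
    """Toy model of the session state a tracker keeps for one user."""
    timeout: timedelta = timedelta(minutes=30)  # default inactivity timeout
    session_id: Optional[str] = None
    session_index: int = 0
    last_activity: Optional[datetime] = None

    def on_activity(self, now: datetime) -> str:
        """Record activity at `now` and return the session ID it belongs to."""
        expired = (
            self.last_activity is None
            or now - self.last_activity > self.timeout
        )
        if expired:
            # Start a new session: fresh UUID and incremented index.
            self.session_id = str(uuid.uuid4())
            self.session_index += 1
        self.last_activity = now
        return self.session_id


state = SessionState()
start = datetime(2022, 1, 1)
first = state.on_activity(start)
same = state.on_activity(start + timedelta(minutes=10))   # within the timeout
later = state.on_activity(start + timedelta(minutes=41))  # 31 minutes of inactivity
```

Here the first two calls share a session, while the third starts a new one and raises the session index to 2.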

Identity stitching in our dbt packages​

Identity stitching is the process of taking various user identifiers and combining them into a single user identifier, to better identify and track users throughout their journey on your site/app. It effectively allows you to attribute logged-in and non-logged-in sessions and page views back to a single user. Our dbt packages support identity stitching as explained here.

In order for identity stitching to be reliable, it is necessary to follow two recommendations in tracking:

  1. Set the business user identifier.
  2. Reset generated identifiers after the user logs out.

Setting the business user identifier​

The stitching functionality in our dbt packages makes use of the business user identifier (see above) to stitch the logged-in sessions and page views to non-logged-in sessions. It stitches the business user ID with tracker generated user identifiers to identify sessions and page views from the same user.

When the business user ID is present in an event, identity stitching assigns the same stitched user ID to all events that share that event's tracker generated user ID. This means that all previous sessions on the same browser or app will have the same stitched user ID as the one with the assigned business user ID.

The business user ID is therefore a requirement for identity stitching.
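A minimal sketch of this idea in Python follows. It is not the dbt packages' implementation: the stitch function, the stitched_user_id key, the choice that the most recent business user ID wins, and the fallback to the tracker generated ID are all assumptions made for illustration.

```python
def stitch(events: list[dict]) -> list[dict]:
    """Return copies of the events with a stitched_user_id added."""
    # Map each tracker generated ID to the most recent business user ID seen
    # with it (events are assumed to be sorted by time).
    mapping: dict[str, str] = {}
    for event in events:
        if event.get("user_id"):
            mapping[event["user_identifier"]] = event["user_id"]
    # Fall back to the tracker generated ID when no business ID is known.
    return [
        {**e, "stitched_user_id": mapping.get(e["user_identifier"], e["user_identifier"])}
        for e in events
    ]


events = [
    {"user_identifier": "a", "user_id": None},                 # before login
    {"user_identifier": "a", "user_id": "jon.doe@email.com"},  # after login
    {"user_identifier": "b", "user_id": None},                 # never logged in
]
stitched = stitch(events)
```

The event tracked before login receives the same stitched ID as the logged-in event, while the browser that never logged in keeps its tracker generated ID.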

Reset generated identifiers after the user logs out​

In settings where multiple users share the same browser, identity stitching may wrongly attribute sessions coming from the same browser. Because the tracker generated user identifier will be consistent for all sessions from the browser, the stitched user identifier will also be consistent regardless of whether one or more users log in.

If you want to make sure that events tracked after a user logs out are not attributed to the same user ID, you can reset the user and session identifiers in the tracker. To do this on the JavaScript tracker, make use of the function to clear user data (such a feature is not yet available on the mobile trackers). Clearing the user data on the tracker will result in new tracker generated user and session identifiers. It will not clear the server generated network_userid user identifier.
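The effect of such a reset can be pictured with a toy model of the identifiers held in one browser. The BrowserState class and its clear_user_data method are hypothetical names used only for this sketch; they are not the tracker's API.

```python
import uuid
from dataclasses import dataclass, field


def new_id() -> str:
    """Generate a fresh UUID string."""
    return str(uuid.uuid4())


@dataclass
class BrowserState:
    """Toy model of the identifiers stored in one browser."""
    domain_userid: str = field(default_factory=new_id)   # tracker generated
    session_id: str = field(default_factory=new_id)      # tracker generated
    network_userid: str = field(default_factory=new_id)  # Collector cookie

    def clear_user_data(self) -> None:
        """Mimic the reset: only the tracker generated IDs are regenerated."""
        self.domain_userid = new_id()
        self.session_id = new_id()


browser = BrowserState()
before = (browser.domain_userid, browser.session_id, browser.network_userid)
browser.clear_user_data()  # e.g. called when the user logs out
after = (browser.domain_userid, browser.session_id, browser.network_userid)
```

After the reset, the first two identifiers differ from their earlier values while network_userid is unchanged, which is why events can still be linked through the server generated identifier.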

Where to find the identifiers in events?​

The identifiers are either added in the atomic event properties or as a client session context entity.

Client session context entity​

The client session context entity is attached on mobile trackers by default and optionally on the JavaScript tracker. It contains the tracker generated user and session identifiers.

Event: client_session

Schema for a client-generated user session

Schema URI: iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-0-2

| Web | Mobile | Tracked automatically |
| --- | --- | --- |
| ✅ | ✅ | ✅ |

👀 Example
{
"sessionIndex": 7,
"sessionId": "bca0fa0e-853c-41cf-9cc4-15048f6f0ff5",
"previousSessionId": "fa008142-c427-4289-8424-6fb2b6576692",
"userId": "7a62ec9d-2aa0-4426-b014-eba2d0dcfebb",
"firstEventId": "1548BE58-4CE7-4A32-A5E8-2696ECE941F4",
"eventIndex": 66,
"storageMechanism": "SQLITE",
"firstEventTimestamp": "2022-01-01T00:00:00Z"
}
📃 Schema properties definition

| Property | Type | Description | Required? |
| --- | --- | --- | --- |
| userId | "string" | An identifier for the user of the session. It is set on app install. | ✅ |
| sessionId | "string" | An identifier for the session | ✅ |
| sessionIndex | "integer" | The index of the current session for this user | ✅ |
| eventIndex | ["null","integer"] | Optional index of the current event in the session | ❌ |
| previousSessionId | ["null","string"] | The previous session identifier for this user | ✅ |
| storageMechanism | One of: SQLITE, COOKIE_1, COOKIE_3, LOCAL_STORAGE, FLASH_LSO | The mechanism that the session information has been stored on the device | ✅ |
| firstEventId | ["null","string"] | The optional identifier of the first event for this session | ❌ |
| firstEventTimestamp | ["null","string"] | Optional date-time timestamp of when the first event in the session was tracked | ❌ |

❓ How to query the event in the warehouse?
select
  unstruct_event_com_snowplowanalytics_snowplow_client_session_1_0_2
from
  PIPELINE_NAME.events events
where
  events.collector_tstamp > timestamp_sub(current_timestamp(), interval 1 hour)
  and events.event = 'unstruct'
  and events.event_name = 'client_session'
  and events.event_vendor = 'com.snowplowanalytics.snowplow'

Properties in the atomic events​

See the table on this page to get an overview of the user and session related fields in the atomic events table.

Anonymisation​

By default, Snowplow events carry identifiers that can be considered personally identifiable information (PII): user and session IDs as well as the IP address. However, Snowplow also allows you to track data without these identifiers. It provides two ways of limiting the amount of PII you capture and store:

  • Not collecting the PII in the first place using the anonymous tracking feature on our trackers.
  • Pseudonymizing the PII during enrichment.

Anonymous tracking​

Anonymous tracking is a feature on our JavaScript and mobile trackers that lets you disable the collection of PII. In particular, it can disable the collection of the following identifiers:

  • the tracker generated user identifier,
  • the tracker generated session identifier,
  • the server generated user identifier,
  • the business user identifier,
  • the user's IP address.

There are several levels of anonymous tracking that our trackers provide:

  1. Fully anonymous tracking that prevents collecting both client and server identifiers.
  2. Client-only anonymisation – in this setting, the server assigned identifiers (network user ID and IP address) are still collected, but no other identifiers are collected on the tracker.
  3. Anonymisation with session tracking – this prevents collecting all user identifiers (optionally server assigned ones can be enabled), but still tracks sessions and adds session information to events.
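The three levels can be compared with a small Python sketch that removes fields from an already tracked event. This is purely illustrative: real anonymous tracking prevents the identifiers from being collected in the first place, and the Level enum, the anonymise function, the keep_server_ids flag, and the grouping of atomic fields below are assumptions made for the example.

```python
from enum import Enum


class Level(Enum):
    FULL = "full"
    CLIENT_ONLY = "client_only"
    WITH_SESSION_TRACKING = "with_session_tracking"


# Assumed grouping of atomic fields for the purpose of this sketch.
CLIENT_USER_FIELDS = {"domain_userid", "user_id"}
SESSION_FIELDS = {"domain_sessionid", "domain_sessionidx"}
SERVER_FIELDS = {"network_userid", "user_ipaddress"}


def anonymise(event: dict, level: Level, keep_server_ids: bool = False) -> dict:
    """Return a copy of the event without the identifiers the level excludes."""
    dropped = set(CLIENT_USER_FIELDS)  # never collected at any level
    if level is Level.FULL:
        dropped |= SESSION_FIELDS | SERVER_FIELDS
    elif level is Level.CLIENT_ONLY:
        dropped |= SESSION_FIELDS      # server assigned IDs are still collected
    elif level is Level.WITH_SESSION_TRACKING and not keep_server_ids:
        dropped |= SERVER_FIELDS       # session fields are kept
    return {k: v for k, v in event.items() if k not in dropped}


event = {
    "event_name": "page_view",
    "domain_userid": "u1", "user_id": "jon.doe@email.com",
    "domain_sessionid": "s1", "domain_sessionidx": 7,
    "network_userid": "n1", "user_ipaddress": "203.0.113.5",
}
full = anonymise(event, Level.FULL)
client_only = anonymise(event, Level.CLIENT_ONLY)
with_sessions = anonymise(event, Level.WITH_SESSION_TRACKING)
```

In this model, full anonymisation leaves only the non-identifying fields, client-only anonymisation keeps the network user ID and IP address, and session tracking keeps the session fields.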

How to track?​

PII pseudonymization using enrichment​

The PII (personally identifiable information) pseudonymization enrichment runs after all the other enrichments and pseudonymizes the fields that are configured as PII.

It enables Snowplow users to better protect the privacy rights of data subjects, which helps with regulatory compliance.
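To give a feel for what pseudonymizing configured fields means, here is a minimal Python sketch. It does not reproduce the enrichment or its configuration: the pseudonymize function, the use of salted SHA-256, and the choice of fields are assumptions for illustration only.

```python
import hashlib


def pseudonymize(event: dict, pii_fields: set[str], salt: str) -> dict:
    """Return a copy of the event with the configured fields replaced by hashes."""
    out = dict(event)
    for name in pii_fields:
        value = out.get(name)
        if value is not None:
            # Assumed strategy: salted SHA-256, so equal inputs map to equal
            # pseudonyms and the original value is not stored.
            digest = hashlib.sha256((salt + str(value)).encode("utf-8"))
            out[name] = digest.hexdigest()
    return out


event = {"user_id": "jon.doe@email.com", "user_ipaddress": "203.0.113.5", "event_name": "page_view"}
hashed = pseudonymize(event, {"user_id", "user_ipaddress"}, salt="example-salt")
```

Because the hash is deterministic for a given salt, the same user still maps to the same pseudonym across events, so analyses that group by user keep working.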

Full details of this enrichment are available here.