Introduction to Snowplow Collector

The collector receives raw Snowplow events sent over HTTP by trackers or webhooks, serializes them, and writes them to a sink. The currently supported sinks are:

Amazon Kinesis
Google Pub/Sub
Apache Kafka
NSQ
Amazon SQS
stdout for a custom stream collection process
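The receive, serialize, write flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Sink` interface, `StdoutSink`, `serialize`, and `collect` are hypothetical names, and JSON stands in for whatever serialization format the collector really uses.

```python
import json
import sys
from typing import Protocol


class Sink(Protocol):
    """Hypothetical sink interface (not the collector's real API)."""

    def write(self, record: bytes) -> None: ...


class StdoutSink:
    """Writes one record per line to a text stream, stdout by default."""

    def __init__(self, stream=sys.stdout):
        self.stream = stream

    def write(self, record: bytes) -> None:
        self.stream.write(record.decode("utf-8") + "\n")


def serialize(headers: dict, body: str) -> bytes:
    # Assumption: JSON is used here purely for illustration.
    return json.dumps({"headers": headers, "body": body}, sort_keys=True).encode("utf-8")


def collect(headers: dict, body: str, sink: Sink) -> None:
    """Serialize one raw event and hand it to the configured sink."""
    sink.write(serialize(headers, body))
```

Keeping the sink behind a small interface is what would let the same request-handling code target Kinesis, Kafka, or stdout; for example, `collect({"User-Agent": "demo"}, "e=pv", StdoutSink())` prints a single JSON line.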

The collector supports cross-domain Snowplow deployments by setting a user ID (used to identify unique visitors) server-side, so that the same user can be reliably identified across domains.

How it works

User identification

The collector allows the use of a third-party cookie, making user tracking across domains possible.

In a nutshell: the collector receives events from a tracker, sets/updates a third-party user tracking cookie, and returns the pixel to the client. The ID in this third-party user tracking cookie is stored in the network_userid field in Snowplow events.

In pseudocode terms:

if (request contains an "sp" cookie) {
    Record that cookie's value as the user identifier
    Set that cookie again with a now+1 year cookie expiry
    Add the headers and payload to the output array
} else {
    Generate a new user identifier
    Set the "sp" cookie to that identifier with a now+1 year cookie expiry
    Add the headers and payload to the output array
}
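For readers who prefer something runnable, the pseudocode above might look like the following minimal sketch. It is not the collector's actual implementation: `handle_request`, the dictionary shapes, and the use of a random UUID for new identifiers are assumptions made for illustration, and "one year" is approximated as 365 days.

```python
import uuid
from datetime import datetime, timedelta, timezone

COOKIE_NAME = "sp"
COOKIE_LIFETIME = timedelta(days=365)  # approximation of "now + 1 year"


def handle_request(cookies, headers, payload, output, now=None):
    """Append the event to `output` and return the cookie to set on the response."""
    now = now or datetime.now(timezone.utc)

    # Reuse the identifier from an existing "sp" cookie, if the request has one.
    user_id = cookies.get(COOKIE_NAME)
    if user_id is None:
        # Assumption: a new visitor gets a random UUID as identifier.
        user_id = str(uuid.uuid4())

    # Both branches record the headers and payload alongside the identifier.
    output.append({"network_userid": user_id, "headers": headers, "payload": payload})

    # Both branches (re)set the cookie, pushing the expiry forward.
    return {"name": COOKIE_NAME, "value": user_id, "expires": now + COOKIE_LIFETIME}
```

Calling `handle_request` a second time with the cookie returned by the first call keeps the same identifier and only extends the expiry, which is the behaviour that makes the ID stable across visits and domains.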

Technical architecture

The collector is written in Scala and built on top of http4s.

GitHub repository
