Set up Collector

A Terraform module is available that deploys the collector on an AWS EC2 instance, removing the need for this manual setup.

Run the collector

The collector is published on Docker Hub in several flavours, one per sink. Pull the image that matches the sink you are using:

bash
docker pull snowplow/scala-stream-collector-kinesis:3.7.0
docker pull snowplow/scala-stream-collector-pubsub:3.7.0
docker pull snowplow/scala-stream-collector-kafka:3.7.0
docker pull snowplow/scala-stream-collector-rabbitmq-experimental:3.7.0
docker pull snowplow/scala-stream-collector-nsq:3.7.0
docker pull snowplow/scala-stream-collector-sqs:3.7.0
docker pull snowplow/scala-stream-collector-stdout:3.7.0

The application is configured by passing a HOCON file on the command line:

bash
docker run --rm \
  -v "$PWD/config.hocon:/snowplow/config.hocon" \
  -p 8080:8080 \
  "snowplow/scala-stream-collector-${flavour}:3.7.0" --config /snowplow/config.hocon
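
The `${flavour}` placeholder in the command above has to be set to one of the sink names from the pull commands. As a minimal sketch, a small helper could validate the flavour and build the image name. The function name `collector_image` is hypothetical and not part of any Snowplow tooling; the flavour names and the `3.7.0` tag are the ones listed above.

```shell
# Build the Docker image name for a given sink flavour.
# Flavour names and the default 3.7.0 tag come from the pull commands above;
# the function name collector_image is a hypothetical helper.
collector_image() {
  local flavour="$1" version="${2:-3.7.0}"
  case "$flavour" in
    kinesis|pubsub|kafka|rabbitmq-experimental|nsq|sqs|stdout)
      printf 'snowplow/scala-stream-collector-%s:%s\n' "$flavour" "$version"
      ;;
    *)
      # Reject anything that is not one of the published flavours.
      echo "unknown flavour: $flavour" >&2
      return 1
      ;;
  esac
}
```

The result can then be passed to `docker run` in place of the image argument, for example `"$(collector_image kafka)"`.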

Alternatively, you can download a jar file from the GitHub release and run it:

bash
java -jar scala-stream-collector-kinesis-3.7.0.jar --config /path/to/config.hocon

Telemetry notice

By default, Snowplow collects telemetry data for Collector (since version 2.4.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!).

This data is anonymous and minimal, and since our code is open source, you can inspect what’s collected.

If you wish to help us further, you can optionally provide your email (or just a UUID) in the collector.telemetry.userProvidedId configuration setting.

If you wish to disable telemetry, you can do so by setting collector.telemetry.disable to true.
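
As a rough illustration of the two telemetry settings described above, the following sketch prints the corresponding HOCON lines using dotted-path keys. The helper name `telemetry_fragment` is hypothetical, and where the output should be placed within your own `config.hocon` is an assumption to check against your existing configuration layout.

```shell
# Print a HOCON line for one of the telemetry settings named above.
# Usage: telemetry_fragment disable
#        telemetry_fragment id <email-or-uuid>
# telemetry_fragment is a hypothetical helper, not a Snowplow command.
telemetry_fragment() {
  case "$1" in
    disable)
      # Turns telemetry off.
      printf 'collector.telemetry.disable = true\n'
      ;;
    id)
      # Optional identifier (an email or a UUID) sent with telemetry.
      printf 'collector.telemetry.userProvidedId = "%s"\n' "$2"
      ;;
    *)
      echo "usage: telemetry_fragment disable | id <value>" >&2
      return 1
      ;;
  esac
}
```

Running `telemetry_fragment disable` prints the single line `collector.telemetry.disable = true`, which can be copied into the configuration file.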

See our telemetry principles for more information.

Health check

Pinging the collector on the /health path should return a 200 OK response:

bash
curl http://localhost:8080/health
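
When the collector is started from a script, it may take a moment before the `/health` path responds. The sketch below polls the endpoint until it returns 200 or a retry limit is reached. The function name `wait_for_collector` and the default retry count and delay are assumptions chosen for illustration.

```shell
# Poll a health URL until it returns HTTP 200.
# Usage: wait_for_collector URL [ATTEMPTS] [DELAY_SECONDS]
# The name, the default of 10 attempts, and the 2 second delay are illustrative.
wait_for_collector() {
  local url="$1" attempts="${2:-10}" delay="${3:-2}" status i
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
    # curl exits non-zero if the connection fails, so ignore its exit status here.
    status="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$status" = "200" ]; then
      echo "collector healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "collector not healthy after $attempts attempts" >&2
  return 1
}
```

For the setup above, this could be called as `wait_for_collector http://localhost:8080/health`.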
