
Set up the stream collector

Available on Terraform Registry

A Terraform module is available that deploys the stream collector on an AWS EC2 instance, without the need for this manual setup.

Run the collector

The stream collector is published on Docker Hub in several flavours, one per sink. Pull the image that matches the sink you are using:

docker pull snowplow/scala-stream-collector-kinesis:3.1.0
docker pull snowplow/scala-stream-collector-pubsub:3.1.0
docker pull snowplow/scala-stream-collector-kafka:3.1.0
docker pull snowplow/scala-stream-collector-rabbitmq-experimental:3.1.0
docker pull snowplow/scala-stream-collector-nsq:3.1.0
docker pull snowplow/scala-stream-collector-sqs:3.1.0
docker pull snowplow/scala-stream-collector-stdout:3.1.0

The application is configured by passing a HOCON file on the command line, where ${flavour} is the sink name from the image you pulled:

docker run --rm \
-v $PWD/config.hocon:/snowplow/config.hocon \
-p 8080:8080 \
snowplow/scala-stream-collector-${flavour}:3.1.0 --config /snowplow/config.hocon
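
To avoid retyping the image name, the command above can be wrapped in a small script. This is only a sketch: the function names collector_image and run_collector are hypothetical helpers, not part of the collector, and the script assumes config.hocon sits in the current directory.

```shell
# Hypothetical helper: builds the image name for a flavour and an
# optional version (defaults to 3.1.0, the version used on this page).
collector_image() {
  printf '%s\n' "snowplow/scala-stream-collector-$1:${2:-3.1.0}"
}

# Hypothetical helper: runs the collector for the given flavour with the
# same mount, port mapping and --config argument as the command above.
run_collector() {
  docker run --rm \
    -v "$PWD/config.hocon:/snowplow/config.hocon" \
    -p 8080:8080 \
    "$(collector_image "$1")" --config /snowplow/config.hocon
}

# Example: run_collector kinesis
```

Quoting the volume argument keeps the command working when the current directory contains spaces.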

Alternatively, you can download a jar file from the GitHub release and run it:

java -jar scala-stream-collector-kinesis-3.1.0.jar --config /path/to/config.hocon
Telemetry notice

By default, Snowplow collects telemetry data for Collector (since version 2.4.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!).

This data is anonymous and minimal, and since our code is open source, you can inspect what’s collected.

If you wish to help us further, you can optionally provide your email (or just a UUID) in the collector.telemetry.userProvidedId configuration setting.

If you wish to disable telemetry, you can do so by setting collector.telemetry.disable to true.

See our telemetry principles for more information.

Health check

Pinging the collector on the /health path should return a 200 OK response:

curl http://localhost:8080/health
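
In a startup script it can be useful to wait until the collector answers before sending events. The following is a minimal sketch, assuming the collector listens on localhost:8080 as in the commands above; wait_for_health is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
# Hypothetical helper: polls the /health endpoint until it answers with
# a successful status or the attempts run out.
# Arguments: URL, number of attempts, delay in seconds between attempts.
wait_for_health() {
  health_url="${1:-http://localhost:8080/health}"
  health_attempts="${2:-30}"
  health_delay="${3:-1}"
  health_try=0
  while [ "$health_try" -lt "$health_attempts" ]; do
    # --fail makes curl exit non-zero on HTTP error statuses;
    # --silent hides progress output; the response body is discarded.
    if curl --fail --silent --output /dev/null "$health_url"; then
      return 0
    fi
    health_try=$((health_try + 1))
    sleep "$health_delay"
  done
  return 1
}

# Example: wait_for_health http://localhost:8080/health 30 1 && echo "collector is up"
```

The function returns 0 as soon as one request succeeds and 1 if every attempt fails, so it can be chained with && or used in an if statement.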