S3 Loader
Snowplow S3 Loader consumes records from an Amazon Kinesis stream and writes them to S3. A typical Snowplow pipeline uses the S3 Loader in more than one place:
- Load enriched events from the "enriched" stream. These serve as input to the RDB Loader when loading into a warehouse.
- Load failed events from the "bad" stream.
Records that cannot be written to S3 successfully are sent to a second Kinesis stream together with the error message.
Output format: GZIP
The records are treated as byte arrays containing UTF-8 encoded strings (whether CSV, JSON, or TSV). Newlines are used to separate the records written to a file. This format can be used with the Snowplow Kinesis Enriched stream, among other streams.
Gzip encoding is generally used for both enriched data and bad data.
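Once files start landing, you can sanity-check the output from any machine with AWS credentials. The bucket and prefix below are hypothetical placeholders for whatever output path you configure:

```bash
# List the most recent objects under the (hypothetical) output prefix.
aws s3 ls s3://my-snowplow-output/enriched/ --recursive | tail -5

# Stream one gzipped object to stdout and print the first few
# newline-separated records (substitute a real object key).
aws s3 cp s3://my-snowplow-output/enriched/<object-key>.gz - | gunzip | head -3
```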
Running
Available on Terraform Registry
A Terraform module is available which deploys the Snowplow S3 Loader on AWS EC2 for use with Kinesis. For other environments, please see the installation options below.
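As a rough illustration, using the module looks like the sketch below. The module source and variable names here are assumptions modelled on Snowplow's quickstart examples; verify both against the module's Terraform Registry page for your version:

```hcl
# Sketch only: source address and variable names are assumptions based on
# Snowplow's quickstart examples; confirm against the Terraform Registry.
module "s3_loader_enriched" {
  source = "snowplow-devops/s3-loader-kinesis-ec2/aws"

  name       = "s3-loader-enriched-server"
  vpc_id     = var.vpc_id
  subnet_ids = var.public_subnet_ids

  ssh_key_name     = var.ssh_key_name
  ssh_ip_allowlist = ["0.0.0.0/0"]

  in_stream_name  = var.enriched_stream_name # Kinesis stream to read
  bad_stream_name = var.bad_stream_name      # stream for failed writes

  s3_bucket_name   = var.s3_bucket_name
  s3_object_prefix = "enriched/"

  purpose = "ENRICHED_EVENTS"
}
```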
Docker image
We publish two different flavours of the Docker image:
- `snowplow/snowplow-s3-loader:3.0.0`
- `snowplow/snowplow-s3-loader:3.0.0-distroless` (lightweight alternative)
Here is a standard command to run the loader on an EC2 instance in AWS:

```bash
# Logs go to CloudWatch (group "snowplow-s3-loader", stream named after the
# instance ID); the current directory is mounted so the container can read
# the configuration file.
docker run \
  -d \
  --name snowplow-s3-loader \
  --restart always \
  --log-driver awslogs \
  --log-opt awslogs-group=snowplow-s3-loader \
  --log-opt awslogs-stream="$(ec2metadata --instance-id)" \
  --network host \
  -v $(pwd):/snowplow/config \
  -e 'JAVA_OPTS=-Xms512M -Xmx1024M -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN' \
  snowplow/snowplow-s3-loader:3.0.0 \
  --config /snowplow/config/config.hocon
```
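The `--config` flag points at a HOCON file. A minimal sketch of what `config.hocon` might contain for loading gzipped enriched events follows; the region, stream, and bucket values are placeholders, and the field names follow the loader's published reference configuration, so verify them against the docs for your release:

```hocon
{
  "region": "eu-west-1"           # placeholder AWS region
  "purpose": "ENRICHED_EVENTS"    # treat input as enriched events

  "input": {
    "appName": "acme-s3-loader"   # KCL application name, used for checkpointing
    "streamName": "enriched-good" # Kinesis stream to consume
    "position": "LATEST"
    "maxRecords": 10
  }

  "output": {
    "s3": {
      "path": "s3://acme-snowplow-output/enriched/"
      "compression": "GZIP"
      "maxTimeout": 2000          # ms
    }
    "bad": {
      "streamName": "s3-loader-bad" # receives records that fail to write
    }
  }

  "buffer": {                     # flush to S3 when any limit is reached
    "byteLimit": 1048576
    "recordLimit": 500
    "timeLimit": 60000
  }
}
```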