RDB shredder configuration reference

caution
You are reading documentation for an outdated version. Here’s the latest one!

Shredder and loader use different configurations starting from 2.0.0. An example config for shredder can be found here.

This is the complete list of options that can be configured:

| Option | Description |
| --- | --- |
| input | Required. S3 URL of the enriched archive. It must be populated separately with run=YYYY-MM-DD-hh-mm-ss directories. |
| output.path | Required. S3 URL of the shredded output. |
| output.compression | Optional. One of "NONE" or "GZIP". Default value is GZIP. |
| output.region | Optional if it can be resolved through the AWS region provider chain. AWS region of the S3 bucket. |
| queue.type | Required. Type of the queue. It can be either sqs or sns. |
| queue.queueName | Required if the queue type is sqs. Name of the SQS queue. |
| queue.topicArn | Required if the queue type is sns. ARN of the SNS topic. |
| queue.region | Optional if it can be resolved through the AWS region provider chain. AWS region of the SQS queue or SNS topic. |
| formats.default | Required, either TSV or JSON. Data format produced by default by the shredder. TSV is recommended because it enables table autocreation, but it requires Iglu Server to be available with known schemas (including Snowplow schemas). JSON does not require Iglu Server, but it requires Redshift JSONPaths to be configured and does not support table autocreation. |
| formats.tsv | Required, list of Iglu URIs, but can be set to an empty list []. If default is set to JSON, the schemas in this list will still be shredded into TSV. |
| formats.json | Required, list of Iglu URIs, but can be set to an empty list []. If default is set to TSV, the schemas in this list will still be shredded into JSON. |
| formats.skip | Required, list of Iglu URIs, but can be set to an empty list []. Schemas for which loading can be skipped. |
| monitoring.sentry.dsn | Optional. For tracking runtime exceptions. |
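
As an illustration, the options listed above could be combined into a single HOCON file along the following lines. This is a minimal sketch and not the official example config: it assumes that the dotted option names map to nested objects, and every bucket name, region, queue name, and Sentry DSN shown is a placeholder.

```hocon
{
  # Placeholder bucket; the archive is expected to contain
  # run=YYYY-MM-DD-hh-mm-ss directories.
  "input": "s3://example-bucket/enriched/archive/",

  "output": {
    # Placeholder bucket for the shredded output.
    "path": "s3://example-bucket/shredded/",
    # Optional; "NONE" or "GZIP" (GZIP is the default).
    "compression": "GZIP",
    # Optional if the AWS region provider chain can resolve it.
    "region": "eu-central-1"
  },

  "queue": {
    # Either "sqs" or "sns".
    "type": "sqs",
    # Required for sqs; placeholder queue name.
    "queueName": "example-queue",
    # Optional if the AWS region provider chain can resolve it.
    "region": "eu-central-1"
  },

  "formats": {
    # TSV needs Iglu Server; JSON needs Redshift JSONPaths.
    "default": "TSV",
    # All three lists are required but may be empty.
    "tsv": [],
    "json": [],
    "skip": []
  },

  "monitoring": {
    # Optional; placeholder DSN.
    "sentry": { "dsn": "https://public@sentry.example.com/1" }
  }
}
```

For an SNS setup, the same sketch would set queue.type to sns and supply queue.topicArn in place of queue.queueName.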