Concepts
Application overview
Architecture
Scaling
Horizontal scaling
Sources
Sources deal with retrieving data from the input stream and forwarding it for processing. Once messages are either filtered or successfully sent, they are acked (if the source technology supports acking); otherwise, messages will be retrieved again by the source. Sources also have a setting which controls concurrency for the instance: `concurrent_writes`.
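As an illustration, a source configuration might look something like the sketch below. This is a minimal HCL-style sketch assuming an SQS source; the block layout and attribute names other than `concurrent_writes` are illustrative assumptions rather than the definitive configuration reference.

```hcl
# Illustrative sketch only: attribute names other than concurrent_writes
# are assumptions, not the authoritative configuration reference.
source {
  use "sqs" {
    # Hypothetical attributes identifying the queue to read from
    queue_name = "my-input-queue"
    region     = "eu-central-1"

    # The concurrency setting mentioned above: how many messages
    # this instance processes in parallel.
    concurrent_writes = 50
  }
}
```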
Transformations and Filters
Transformations allow you to modify messages' data on the fly before they're sent to the destination. There is a set of built-in transformations designed specifically for Snowplow data (for example, transforming Snowplow enriched events to JSON). You can also configure a script to transform your data however you require, for example to rename fields or change a field's format.
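For example, a custom script transformation might be wired up along the following lines. This is a hedged sketch only: the `transform` and `use "js"` block names and the `script_path` attribute are assumptions for illustration, and `rename-fields.js` is a hypothetical script that renames fields in each message.

```hcl
# Illustrative sketch only: block and attribute names are assumptions.
transform {
  use "js" {
    # Hypothetical path to a custom script that renames fields
    # before the message reaches the target.
    script_path = "/path/to/rename-fields.js"
  }
}
```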
Targets
Targets check for validity and size restrictions, batch data where appropriate, and send data to the destination stream.
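A target is configured in a similar way. The sketch below assumes a Kinesis destination, with all block and attribute names as illustrative assumptions rather than the exact configuration reference.

```hcl
# Illustrative sketch only: block and attribute names are assumptions.
target {
  use "kinesis" {
    # Hypothetical attributes identifying the destination stream
    stream_name = "my-output-stream"
    region      = "eu-central-1"
  }
}
```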
Batching model
Messages are processed in batches according to how the source provides data. The Kinesis and Pub/Sub sources provide data message by message, so data is handled in batches of one message. The SQS source is batched according to how the SQS queue returns messages.
Failure model
Failure targets