
Transformer Kafka configuration reference

The configuration reference on this page is written for Transformer Kafka 6.1.3.

An example of the minimal required config for Transformer Kafka can be found here, and a more detailed one here.
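The following is a rough sketch of a minimal configuration. It is not the linked example. It sets only the core input, output and queue parameters described in the table below. The topic names, broker address and storage account URL are hypothetical placeholders. `bootstrapServers` is assumed to take a comma-separated string of host:port pairs, like Kafka's own `bootstrap.servers` setting.

```hocon
# Hypothetical values throughout: replace topics, brokers and the storage URL with your own.
input {
  # Kafka topic holding the enriched events to transform
  topicName = "enriched"
  # Assumed format: comma-separated host:port pairs
  bootstrapServers = "localhost:9092"
}

output {
  # Azure Blob Storage location for the transformed output (hypothetical account)
  path = "https://myaccount.blob.core.windows.net/transformed/"
}

queue {
  # Kafka topic the transformer uses to communicate with the loader
  topicName = "loaderTopic"
  bootstrapServers = "localhost:9092"
}

# Accept the license here, or set ACCEPT_LIMITED_USE_LICENSE=yes in the environment instead
license {
  accept = true
}
```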

License

Since version 6.0.0, RDB Loader is released under the Snowplow Limited Use License (FAQ).

To accept the terms of the license and run RDB Loader, set the ACCEPT_LIMITED_USE_LICENSE=yes environment variable. Alternatively, you can configure the license.accept option, like this:

license {
  accept = true
}
| Parameter | Description |
|---|---|
| input.topicName | Name of the Kafka topic to read from. |
| input.bootstrapServers | A list of host:port pairs to use for establishing the initial connection to the Kafka cluster. |
| input.consumerConf | Optional. Kafka consumer configuration. See https://kafka.apache.org/documentation/#consumerconfigs for all properties. |
| output.path | Azure Blob Storage path to the transformer output. |
| output.compression | Optional. One of NONE or GZIP. The default is GZIP. |
| output.bad.type | Optional. Type of bad output sink: either kafka or file. The default is file. When file, failed events are written as files under the URI configured in output.path. |
| output.bad.topicName | Required if output.bad.type is kafka. Name of the Kafka topic that will receive the bad data. |
| output.bad.bootstrapServers | Required if output.bad.type is kafka. A list of host:port pairs to use for establishing the initial connection to the Kafka cluster. |
| output.producerConf | Optional. Kafka producer configuration. See https://kafka.apache.org/documentation/#producerconfigs for all properties. |
| queue.topicName | Name of the Kafka topic used to communicate with the loader. |
| queue.bootstrapServers | A list of host:port pairs to use for establishing the initial connection to the Kafka cluster. |
| queue.producerConf | Optional. Kafka producer configuration. See https://kafka.apache.org/documentation/#producerconfigs for all properties. |
| monitoring.metrics.* | Send metrics to a StatsD server or stdout. |
| monitoring.metrics.statsd.* | Optional. For sending metrics (good and bad event counts) to a StatsD server. |
| monitoring.metrics.statsd.hostname | Required if the monitoring.metrics.statsd section is configured. The host name of the StatsD server. |
| monitoring.metrics.statsd.port | Required if the monitoring.metrics.statsd section is configured. Port of the StatsD server. |
| monitoring.metrics.statsd.tags | Optional. Tags used to annotate the StatsD metrics with any contextual information. |
| monitoring.metrics.statsd.prefix | Optional. Configures the prefix of StatsD metric names. The default is snowplow.transformer. |
| monitoring.metrics.stdout.* | Optional. For sending metrics to stdout. |
| monitoring.metrics.stdout.prefix | Optional. Overrides the default metric prefix. |
| telemetry.disable | Optional. Set to true to disable telemetry. |
| telemetry.userProvidedId | Optional. See here for more information. |
| monitoring.sentry.dsn | Optional. For tracking runtime exceptions. |
| validations.* | Optional. Criteria to validate events against. |
| validations.minimumTimestamp | This is currently the only validation criterion. It checks that all timestamps in the event are not earlier than a specific point in time, e.g. 2021-11-18T11:00:00.00Z. |
| featureFlags.* | Optional. Enable features that are still in beta, or which aim to enable smoother upgrades. |
| featureFlags.legacyMessageFormat | Optional. Setting this to true allows you to use a new version of the transformer with an older version of the loader. |
| featureFlags.enableMaxRecordsPerFile | (since 5.4.0) Optional, default true. When enabled, the output.maxRecordsPerFile configuration parameter is used. |
| featureFlags.truncateAtomicFields | (since 5.4.0) Optional, default false. When enabled, the event's atomic fields are truncated (based on the length limits from the atomic JSON schema) before transformation. |
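The options in the table above nest according to their dotted names. The HOCON sketch below combines many of them into one file. It is illustrative only. Every topic name, host, URL and tag value is a hypothetical placeholder. Some formats are also assumptions rather than confirmed formats:

- statsd.tags is treated as a key-value map.
- consumerConf and producerConf are treated as maps from Kafka property names to string values.

```hocon
# Sketch only: all topic names, hosts, URLs and tag values are hypothetical.
input {
  topicName = "enriched"
  bootstrapServers = "localhost:9092"
  # Passed to the Kafka consumer; keys are standard Kafka consumer property names
  consumerConf {
    "auto.offset.reset" = "earliest"
  }
}

output {
  path = "https://myaccount.blob.core.windows.net/transformed/"
  compression = "GZIP"
  # Send failed events to a Kafka topic instead of files under output.path
  bad {
    type = "kafka"
    topicName = "bad"
    bootstrapServers = "localhost:9092"
  }
}

queue {
  topicName = "loaderTopic"
  bootstrapServers = "localhost:9092"
  # Passed to the Kafka producer; keys are standard Kafka producer property names
  producerConf {
    "acks" = "all"
  }
}

monitoring {
  metrics {
    # Emit good and bad event counts to a StatsD server
    statsd {
      hostname = "localhost"
      port = 8125
      # Assumed to be a key-value map of tag names to values
      tags {
        env = "dev"
      }
      prefix = "snowplow.transformer"
    }
  }
  # Hypothetical DSN for reporting runtime exceptions to Sentry
  sentry {
    dsn = "https://public@sentry.example.com/1"
  }
}

telemetry {
  disable = false
}

validations {
  # Timestamps in each event are checked against this point in time
  minimumTimestamp = "2021-11-18T11:00:00.00Z"
}

featureFlags {
  legacyMessageFormat = false
  enableMaxRecordsPerFile = true
  truncateAtomicFields = false
}
```

Only the StatsD metrics sink is shown. To send metrics to stdout, configure monitoring.metrics.stdout under monitoring.metrics, with its optional prefix.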