Transformer Pubsub configuration reference
The configuration reference on this page is written for Transformer Pubsub 6.1.3.
An example of the minimal required config for the Transformer Pubsub can be found here and a more detailed one here.
License
Since version 6.0.0, RDB Loader is released under the Snowplow Limited Use License (FAQ).
To accept the terms of the license and run RDB Loader, set the ACCEPT_LIMITED_USE_LICENSE=yes environment variable. Alternatively, you can configure the license.accept option, like this:
license {
  accept = true
}
Parameter | Description |
---|---|
input.subscription | Name of the Pubsub subscription with the enriched events |
input.parallelPullCount | Optional. Default value 1. Number of threads used internally by the Permutive library to handle incoming messages. These threads do very little "work" apart from writing the message to a concurrent queue. |
input.bufferSize | Optional. Default value 500. The max size of the buffer queue used between fs2-pubsub and java-pubsub libraries. |
input.maxAckExtensionPeriod | Optional. Default value '1 hour'. The maximum period a message ack deadline will be extended. |
output.path | Required. GCS URI of the transformed output. It must use the gs:// URI scheme. |
output.compression | Optional. One of NONE or GZIP. The default is GZIP. |
output.bufferSize | Optional. Default value 4096. During the window period, processed items are stored in a buffer. This value determines the size of this buffer. When its limit is reached, buffer content is flushed to blob storage. |
output.maxRecordsPerFile (since 5.4.0) | Optional. Default = 10000. Max number of events per parquet partition. |
output.bad.type (since 5.4.0) | Optional. Either pubsub or file, default value file. Type of bad output sink. When set to file, failed events are written as files under the URI configured in output.path. |
output.bad.topic (since 5.4.0) | Required if output.bad.type is pubsub. Name of the PubSub topic that will receive the bad data. |
output.bad.batchSize (since 5.4.0) | Optional. Default = 1000, max = 1000. Maximum number of messages sent to PubSub within a batch. When the buffer reaches this number of messages they are sent. |
output.bad.requestByteThreshold (since 5.4.0) | Optional. Default = 8000000, max = 10 MB. Maximum number of bytes sent to PubSub within a batch. When the buffer reaches this size, the messages are sent. |
output.bad.delayThreshold (since 5.4.0) | Optional. Default = 200 milliseconds. Delay threshold to use for PubSub batching. Once this amount of time has elapsed, messages in the buffer are sent even if batchSize and requestByteThreshold have not been reached. |
queue.topic | Name of the Pubsub topic used to communicate with Loader |
formats.fileFormat | Optional. Either JSON or PARQUET. The default at the moment is JSON. |
windowing | Optional. Frequency at which the shredding complete message is emitted. The default is 5 minutes. Note that transformer-pubsub has a problem acking messages when the window period is greater than 10 minutes, so it is advisable to keep the window period at 10 minutes or less. |
monitoring.metrics.* | Send metrics to a StatsD server or stdout. |
monitoring.metrics.statsd.* | Optional. For sending metrics (good and bad event counts) to a StatsD server. |
monitoring.metrics.statsd.hostname | Required if monitoring.metrics.statsd section is configured. The host name of the StatsD server. |
monitoring.metrics.statsd.port | Required if monitoring.metrics.statsd section is configured. Port of the StatsD server. |
monitoring.metrics.statsd.tags | Optional. Tags which are used to annotate the StatsD metric with any contextual information. |
monitoring.metrics.statsd.prefix | Optional. Configures the prefix of StatsD metric names. The default is snowplow.transformer. |
monitoring.metrics.stdout.* | Optional. For sending metrics to stdout. |
monitoring.metrics.stdout.prefix | Optional. Overrides the default metric prefix. |
telemetry.disable | Optional. Set to true to disable telemetry. |
telemetry.userProvidedId | Optional. See here for more information. |
monitoring.sentry.dsn | Optional. For tracking runtime exceptions. |
featureFlags.enableMaxRecordsPerFile (since 5.4.0) | Optional, default = true. When enabled, the output.maxRecordsPerFile configuration parameter is used. |
validations.* | Optional. Criteria to validate events against |
validations.minimumTimestamp | This is currently the only validation criterion. It checks that no timestamp in the event is earlier than a specific point in time, e.g. 2021-11-18T11:00:00.00Z. |
featureFlags.* | Optional. Enable features that are still in beta, or which aim to enable smoother upgrades. |
featureFlags.legacyMessageFormat | Optional. Setting this to true allows you to use a new version of the transformer with an older version of the loader. |
featureFlags.truncateAtomicFields (since 5.4.0) | Optional, default false. When enabled, the event's atomic fields are truncated (based on the length limits from the atomic JSON schema) before transformation. |
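
To show how the dotted parameter names in the table map onto nested configuration blocks, here is a small illustrative sketch in the same format as the license snippet above. It is not the official minimal config: every value is a placeholder, the project, subscription, topic, and bucket names are hypothetical, and the full resource-path form used for the subscription and topics is an assumption. The application may also require fields that are not covered by this table, so check the linked example configs before using it.

```hocon
# Illustrative sketch only: all names and values below are placeholders.
license {
  accept = true
}

input {
  # Subscription with the enriched events (hypothetical name; path form is an assumption)
  subscription = "projects/my-project/subscriptions/enriched-sub"
}

output {
  # Must use the gs:// URI scheme (hypothetical bucket)
  path = "gs://my-bucket/transformed/"
  compression = "GZIP"

  bad {
    # Send failed events to a PubSub topic instead of writing files
    type = "pubsub"
    topic = "projects/my-project/topics/bad-topic"
  }
}

queue {
  # Topic used to communicate with Loader (hypothetical name)
  topic = "projects/my-project/topics/loader-topic"
}

# Kept at or below 10 minutes because of the acking caveat noted in the table
windowing = "5 minutes"

monitoring {
  metrics {
    statsd {
      # hostname and port are required once the statsd section is present
      hostname = "localhost"
      port = 8125
    }
  }
}
```

Settings left out of the sketch, such as input.bufferSize or output.bufferSize, fall back to the defaults listed in the table.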