RDB loader configuration reference
caution
You are reading documentation for an outdated version. Here’s the latest one!
The shredder and loader use different configurations starting from 2.0.0. An example config for the loader can be found here.
This is the complete list of options that can be configured:
Option | Description |
---|---|
region | Optional if it can be resolved with the AWS region provider chain. AWS region of the S3 bucket. |
messageQueue | Required. Name of the SQS queue used by the shredder and loader to communicate. |
jsonpaths | Optional. An S3 URI that holds JSONPath files. |
storage.host | Required. Host name of Redshift. |
storage.port | Required. Port of Redshift. |
storage.port | Required. Port of redshift. |
storage.database | Required. Name of the database. |
storage.roleArn | Required. AWS Role ARN allowing Redshift to load data from S3. |
storage.schema | Required. Redshift schema name, e.g. "atomic" |
storage.username | Required. DB user with permission to load data. |
storage.password | Required. Password of the DB user. |
storage.jdbc.blockingRows | Optional. Refer to the Redshift JDBC driver reference. |
storage.jdbc.disableIsValidQuery | Optional. Refer to the Redshift JDBC driver reference. |
storage.jdbc.dsiLogLevel | Optional. Refer to the Redshift JDBC driver reference. |
storage.jdbc.filterLevel | Optional. Refer to the Redshift JDBC driver reference. |
storage.jdbc.loginTimeout | Optional. Refer to the Redshift JDBC driver reference. |
storage.jdbc.logLevel | Optional. Refer to the Redshift JDBC driver reference. |
storage.jdbc.socketTimeout | Optional. Refer to the Redshift JDBC driver reference. |
storage.jdbc.ssl | Optional. Refer to the Redshift JDBC driver reference. |
storage.jdbc.sslMode | Optional. Refer to the Redshift JDBC driver reference. |
storage.jdbc.sslRootCert | Optional. Refer to the Redshift JDBC driver reference. |
storage.jdbc.tcpKeepAlive | Optional. Refer to the Redshift JDBC driver reference. |
storage.jdbc.tcpKeepAliveMinutes | Optional. Refer to the Redshift JDBC driver reference. |
storage.maxError | Optional. Configures the Redshift MAXERROR load option. Default value 10. |
monitoring.webhook.endpoint | Optional. An HTTP endpoint where monitoring alerts should be sent. |
monitoring.webhook.tags | Optional. Custom key-value pairs which can be added to the monitoring webhooks. E.g. {"tag1": "label1"} |
monitoring.snowplow.appId | Optional. When using Snowplow tracking, set this appId in the event. |
monitoring.snowplow.collector | Optional. Set to a collector URL to turn on Snowplow tracking. |
monitoring.sentry.dsn | Optional. For tracking runtime exceptions. |
monitoring.statsd.hostname | Optional. Hostname of the statsd server to which loading metrics (latency and event counts) are sent. |
monitoring.statsd.port | Optional. Port of the statsd server. |
monitoring.statsd.tags | Tags used to annotate the statsd metric with any contextual information, e.g. { "key1": "value1", "key2": "value2" }. |
monitoring.statsd.prefix | Optional. Configures the prefix of statsd metric names. Default value “snowplow.rdbloader”. |
monitoring.folders.staging | Required if the folder monitoring section is included in the config. Folder monitoring periodically checks for unloaded or corrupted folders; this is the path where the loader can store auxiliary logs. The loader must be able to write here, and Redshift must be able to load from here. |
monitoring.folders.period | Required if the folder monitoring section is included in the config. How often to check for unloaded or corrupted folders. |
monitoring.folders.since | Required if the folder monitoring section is included in the config. Specifies from when folder monitoring will start to monitor. |
monitoring.folders.until | Required if the folder monitoring section is included in the config. Specifies until when folder monitoring will monitor. |
monitoring.folders.shredderOutput | Required if the folder monitoring section is included in the config. Path to the shredded archive. |
monitoring.healthCheck.frequency (added in 2.1.0) | Optional. How often to run a periodic DB health check, which raises a warning if the DB does not respond to a SELECT 1. |
monitoring.healthCheck.timeout (added in 2.1.0) | Optional. How long to wait for a health check response. |
retryQueue.period (added in 2.1.0) | Optional. Configures a backlog of recently failed folders that could be automatically retried. period is how often a batch of failed folders should be pulled into a discovery queue. |
retryQueue.size (added in 2.1.0) | Required if the retryQueue section is included. How many failures should be kept in memory. After the limit is reached, new failures are dropped. |
retryQueue.maxAttempts (added in 2.1.0) | Required if the retryQueue section is included. How many attempts to make for each folder. After the limit is reached, new failures are dropped. |
retryQueue.interval (added in 2.1.0) | Required if the retryQueue section is included. Artificial pause after each failed folder before it is added to the retry queue. |
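
To show how the dotted option names in the table might map onto a config file, here is a minimal sketch in HOCON. It assumes that each dot in an option name corresponds to one level of nesting. Every value (queue name, host, ARN, bucket paths, credentials) is a hypothetical placeholder, and the duration syntax (for example "1 hour") is an assumption, since the table does not specify value formats. The loader may also expect settings that are not listed in this table, so treat the example config referenced above as the authoritative layout.

```hocon
{
  # All values below are hypothetical placeholders, not defaults.
  "region": "us-east-1",
  "messageQueue": "example-queue",

  "storage": {
    "host": "redshift.example.com",
    "port": 5439,
    "database": "snowplow",
    "roleArn": "arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
    "schema": "atomic",
    "username": "loader_user",
    "password": "example-password",
    "maxError": 10          # the documented default
  },

  "monitoring": {
    # Optional section. Once it is present, all five keys are required.
    "folders": {
      "staging": "s3://example-bucket/loader/logs/",
      "period": "1 hour",   # duration format is an assumption
      "since": "14 days",   # duration format is an assumption
      "until": "7 days",    # duration format is an assumption
      "shredderOutput": "s3://example-bucket/shredded/"
    },
    # Optional, available from 2.1.0.
    "healthCheck": {
      "frequency": "20 minutes",
      "timeout": "15 seconds"
    }
  },

  # Optional, available from 2.1.0. If the section is included,
  # size, maxAttempts and interval are required.
  "retryQueue": {
    "period": "30 minutes",
    "size": 64,
    "maxAttempts": 3,
    "interval": "5 seconds"
  }
}
```

Leaving out the whole monitoring.folders or retryQueue block is the way to disable that feature; including a block with only some of its keys would leave required options unset.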