Monitoring
The loader app has several types of monitoring built in to help the pipeline operator: folder monitoring, warehouse health checks, StatsD metrics, Sentry alerts, and Snowplow tracking.
For all monitoring configuration options, see the configuration reference.
Webhook alerts
The loader can send POST requests via HTTP webhook to a configurable URL whenever there is an issue which needs investigation by the pipeline operator. The webhook payload conforms to the alert schema on Iglu Central.
You can configure where the webhook is sent by setting the monitoring.webhook section in the config.hocon file.
The webhook monitoring can be used for folder monitoring and warehouse health checks.
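While setting up the webhook, it can help to inspect what arrives at the configured URL. The following is a minimal sketch using only the Python standard library. It assumes the payload uses the usual Iglu self-describing envelope with `schema` and `data` keys, and it makes no assumption about the fields inside `data`. The names `summarize_alert` and `AlertHandler`, and the port in the comment, are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alert(body: bytes) -> dict:
    """Parse a webhook body and return its schema URI and raw data.

    Assumption: the body is a self-describing JSON ({"schema": ..., "data": ...}).
    The contents of "data" are passed through untouched.
    """
    payload = json.loads(body.decode("utf-8"))
    return {"schema": payload.get("schema"), "data": payload.get("data")}


class AlertHandler(BaseHTTPRequestHandler):
    """Accepts POST requests and prints a summary of each alert."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        summary = summarize_alert(self.rfile.read(length))
        print("alert received:", summary)
        self.send_response(200)
        self.end_headers()


# To try it locally (arbitrary port chosen for illustration):
# HTTPServer(("127.0.0.1", 8000), AlertHandler).serve_forever()
```

Pointing the webhook at such a receiver is only useful for debugging; in production the URL would normally belong to an alerting or incident management service.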
Folder monitoring
A webhook alert is sent whenever the loader identifies inconsistencies between the transformed output in S3 and the data in the warehouse. The algorithm is as follows:
- Check if all folders on S3 have a shredding_complete.json file (the legacy name is kept for backwards compatibility, but this applies to wide row format data as well). A missing file suggests the transformer failed to complete writing the transformed data, and so manual intervention is required to remove the folder from S3 and rerun.
- Check if all folders on S3 created within a specific time range are listed in the warehouse manifest table. This table is maintained by the loader and contains information about loads. If a folder is missing from the manifest table, it suggests the loader has previously tried and failed to load it. Manual intervention is required to resend the shredding_complete.json message via SQS / SNS to trigger reloading of the folder.
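The two checks can be sketched in a few lines of Python. This is an illustration of the logic described here, not the loader's actual implementation: it works on in-memory data, assumes the folder listing has already been restricted to the monitored time range, and reports a folder with no completion file only once (as incomplete) rather than also flagging it as missing from the manifest. The function name `find_problem_folders` and the folder names in the usage example are hypothetical.

```python
from typing import Dict, List, Set

COMPLETION_FILE = "shredding_complete.json"


def find_problem_folders(
    s3_folders: Dict[str, Set[str]],
    manifest: Set[str],
) -> Dict[str, List[str]]:
    """Classify S3 folders according to the two folder monitoring checks.

    s3_folders maps each folder to the set of file names it contains.
    manifest is the set of folders recorded in the warehouse manifest table.
    """
    incomplete = []  # no completion file: transformer did not finish
    unloaded = []    # complete on S3 but absent from the manifest
    for folder, files in sorted(s3_folders.items()):
        if COMPLETION_FILE not in files:
            incomplete.append(folder)
        elif folder not in manifest:
            unloaded.append(folder)
    return {"incomplete": incomplete, "unloaded": unloaded}
```

For example, given three folders where `run=2/` lacks the completion file and only `run=1/` appears in the manifest, the function returns `run=2/` as incomplete and `run=3/` as unloaded:

```python
folders = {
    "run=1/": {"shredding_complete.json", "part-0"},
    "run=2/": {"part-0"},
    "run=3/": {"shredding_complete.json"},
}
print(find_problem_folders(folders, manifest={"run=1/"}))
```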
Folder monitoring is configured by setting the monitoring.folders section in the config.hocon file.
Warehouse health check
The loader can send an alert if the warehouse does not respond to a periodic SELECT 1 statement. For each failed health check, a POST request is sent via the webhook.
The health check is configured by setting the monitoring.healthCheck section in the config.hocon file.
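The idea behind the health check can be sketched as a single probe that runs SELECT 1 and raises an alert on any failure. The example below is a hedged illustration, not the loader's code: it uses `sqlite3` from the Python standard library as a stand-in for a real warehouse connection, and the names `check_health`, `connect`, and `alert` are hypothetical. In the loader, the alert would be the webhook POST request and the probe would run on a schedule.

```python
import sqlite3
from typing import Callable


def check_health(
    connect: Callable[[], sqlite3.Connection],
    alert: Callable[[str], None],
) -> bool:
    """Run SELECT 1 once; on failure call alert(message) and return False."""
    try:
        conn = connect()
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
        return True
    except Exception as exc:  # any failure counts as an unhealthy warehouse
        alert(f"health check failed: {exc}")
        return False
```

A caller would invoke it periodically, for instance `check_health(lambda: sqlite3.connect(":memory:"), print)`, replacing the connection factory and the alert callback with real ones.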
StatsD and stdout
StatsD is a daemon that aggregates and summarizes application metrics. It receives metrics sent by the application over UDP, and then periodically flushes the aggregated metrics to a pluggable storage backend.
The loader can emit metrics to a StatsD daemon describing every batch it processes. Here is a string representation of the metrics it sends:
snowplow.rdbloader.count_good:42|c|#tag1:value1
snowplow.rdbloader.count_bad:2|c|#tag1:value1
snowplow.rdbloader.latency_collector_to_load_min:123.4|g|#tag1:value1
snowplow.rdbloader.latency_collector_to_load_max:234.5|g|#tag1:value1
snowplow.rdbloader.latency_transformer_start_to_load:66.6|g|#tag1:value1
snowplow.rdbloader.latency_transformer_end_to_load:44.4|g|#tag1:value1
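Each line above follows the pattern `name:value|type|#tags`, where `c` marks a counter and `g` a gauge. When debugging with a local UDP listener it can be useful to parse these strings. The following sketch does so in Python; it assumes that multiple tags, if present, are comma-separated as in the DogStatsD convention, and the names `Metric` and `parse_metric` are hypothetical.

```python
from typing import Dict, NamedTuple


class Metric(NamedTuple):
    name: str
    value: float
    kind: str  # "c" for counter, "g" for gauge
    tags: Dict[str, str]


def parse_metric(line: str) -> Metric:
    """Parse 'name:value|type|#k1:v1,k2:v2' into a Metric."""
    parts = line.strip().split("|")
    # The metric name itself contains dots but no colon, so split on the last colon.
    name, _, raw_value = parts[0].rpartition(":")
    kind = parts[1]
    tags: Dict[str, str] = {}
    for part in parts[2:]:
        if part.startswith("#"):
            for pair in part[1:].split(","):
                key, _, val = pair.partition(":")
                tags[key] = val
    return Metric(name, float(raw_value), kind, tags)
```

For instance, `parse_metric("snowplow.rdbloader.count_good:42|c|#tag1:value1")` yields a counter named `snowplow.rdbloader.count_good` with value `42.0` and tags `{"tag1": "value1"}`.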
These are the meanings of the individual metrics:
- count_good: the total number of good events in the batch that was loaded
- count_bad: the total number of bad events in the batch that was loaded (available since version 5.4.0)
- latency_collector_to_load_min: for the most recent event in the batch, the time difference between reaching the collector and getting loaded to the warehouse
- latency_collector_to_load_max: for the oldest event in the batch, the time difference between reaching the collector and getting loaded to the warehouse
- latency_transformer_start_to_load: the time difference between the transformer starting on this batch and the loader completing loading it into the warehouse
- latency_transformer_end_to_load: the time difference between the transformer completing this batch and the loader completing loading it into the warehouse
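The relationship between the two collector-to-load latencies can be illustrated with a short calculation: the minimum comes from the newest event in the batch and the maximum from the oldest. This is a sketch under assumptions, not the loader's code; the function name `collector_to_load_latency` is hypothetical, and reporting the result in seconds is an assumption made for the example.

```python
from datetime import datetime, timezone
from typing import Iterable, Tuple


def collector_to_load_latency(
    collector_tstamps: Iterable[datetime], loaded_at: datetime
) -> Tuple[float, float]:
    """Return (min, max) collector-to-load latency for a batch, in seconds."""
    stamps = list(collector_tstamps)
    newest, oldest = max(stamps), min(stamps)
    # The newest event waited the least, the oldest event waited the most.
    return (
        (loaded_at - newest).total_seconds(),
        (loaded_at - oldest).total_seconds(),
    )


loaded = datetime(2024, 1, 1, 12, 10, tzinfo=timezone.utc)
stamps = [
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, 8, tzinfo=timezone.utc),
]
print(collector_to_load_latency(stamps, loaded))  # (120.0, 600.0)
```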
StatsD monitoring is configured by setting the monitoring.metrics.statsd section in the config.hocon file.
You can expose these metrics in stdout for easier debugging by setting the monitoring.metrics.stdout section in the config.hocon file.
Sentry
Sentry is a popular error monitoring service, which helps developers diagnose and fix problems in an application. The loader and transformer can both send an error report to Sentry whenever something unexpected happens. The reasons for the error can then be explored in the Sentry server’s UI.
Common reasons might be lost connection to the database, or an HTTP error fetching a schema from an Iglu server.
Sentry monitoring is configured by setting monitoring.sentry.dsn in the config.hocon file.
Snowplow tracking
The loader can emit a Snowplow event to a collector when the application crashes with an unexpected error. The event conforms to the load_failed schema on Iglu Central.
Snowplow tracking is configured by setting the monitoring.snowplow section in the config.hocon file.