The loader app has several types of monitoring built in to help the pipeline operator: folder monitoring, warehouse health checks, StatsD metrics, Sentry alerts, and Snowplow tracking.
For all monitoring configuration options, see the configuration reference.
The loader can send POST requests via HTTP webhook to a configurable URL whenever there is an issue that needs investigation by the pipeline operator. The webhook payload conforms to the alert schema on Iglu Central.
You can configure where the webhook is sent by setting the monitoring.webhook section in the configuration file.
The webhook monitoring can be used for folder monitoring and warehouse health checks.
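As a rough illustration of what a webhook receiver might do with an incoming alert, the sketch below parses a POST body, assuming it arrives as a self-describing JSON with "schema" and "data" keys. The function name parse_alert, the com.example schema URI, and the field names inside "data" are hypothetical and only serve the example; consult the alert schema on Iglu Central for the real fields.

```python
import json


def parse_alert(body: bytes) -> dict:
    """Split a self-describing JSON body into its schema parts and its data."""
    payload = json.loads(body)
    schema_uri = payload["schema"]
    if not schema_uri.startswith("iglu:"):
        raise ValueError(f"Not an Iglu schema URI: {schema_uri}")
    # Iglu schema URIs have the form iglu:vendor/name/format/version.
    vendor, name, fmt, version = schema_uri[len("iglu:"):].split("/")
    return {
        "vendor": vendor,
        "name": name,
        "format": fmt,
        "version": version,
        "data": payload["data"],
    }


# Hypothetical payload: the schema URI and the "data" fields are illustrative only.
example_body = json.dumps({
    "schema": "iglu:com.example/alert/jsonschema/1-0-0",
    "data": {"message": "Folder missing from manifest", "folder": "s3://bucket/run=1/"},
}).encode("utf-8")

alert = parse_alert(example_body)
```

A receiver could then route on alert["name"] and alert["version"] before forwarding the message to an on-call channel.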
A webhook alert is sent whenever the loader identifies inconsistencies between the transformed output in S3 and the data in the warehouse. The algorithm is as follows:
- Check if all folders on S3 have a shredding_complete.json file (the legacy name is kept for backwards compatibility, but this applies to wide row format data as well). A missing file suggests the transformer failed to finish writing the transformed data, so manual intervention is required to remove the folder from S3 and rerun the transformer.
- Check if all folders on S3 created within a specific time range are listed in the warehouse manifest table. This table is maintained by the loader and contains information about loads. If a folder is missing from the manifest table, it suggests the loader has previously tried and failed to load it. Manual intervention is required to resend the shredding_complete.json message via SQS / SNS to trigger reloading of the folder.
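The two checks above can be sketched as follows. This is a minimal in-memory illustration, not the loader's implementation: the function name find_problem_folders and the input shapes (a dict of folder listings and a set of manifest entries) are assumptions made for the example, and a real check would list S3 and query the warehouse instead.

```python
from datetime import datetime, timezone

COMPLETION_FILE = "shredding_complete.json"


def find_problem_folders(folders, manifest, since, until):
    """Return (incomplete, unloaded) folder paths.

    folders:  dict mapping folder path -> (created_at, set of file names)
    manifest: set of folder paths recorded in the warehouse manifest table
    """
    incomplete, unloaded = [], []
    for path, (created_at, files) in sorted(folders.items()):
        if COMPLETION_FILE not in files:
            # Check 1: the transformer never finished writing this folder.
            incomplete.append(path)
        elif since <= created_at <= until and path not in manifest:
            # Check 2: the folder is complete but was never recorded as loaded.
            unloaded.append(path)
    return incomplete, unloaded


def utc(hour):
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


# Hypothetical folder listing used only to exercise the function.
folders = {
    "run=1/": (utc(1), {"part-0.gz", COMPLETION_FILE}),
    "run=2/": (utc(2), {"part-0.gz"}),
    "run=3/": (utc(3), {"part-0.gz", COMPLETION_FILE}),
}
manifest = {"run=1/"}

incomplete, unloaded = find_problem_folders(folders, manifest, utc(0), utc(4))
```

In this sketch a folder without the completion file is reported only once, as incomplete, so that the operator is not told to both remove and reload the same folder.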
Folder monitoring is configured by setting the monitoring.folders section in the configuration file.
Warehouse health check
The loader can send an alert if the warehouse does not respond to a periodic SELECT 1 statement. For each failed health check, a POST request is sent via the webhook.
The health check is configured by setting the monitoring.healthCheck section in the configuration file.
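A single health check cycle might look like the sketch below. It uses an in-memory SQLite database purely as a stand-in for the warehouse, and the names run_health_check and send_alert are hypothetical; in the loader, the alert would go out as a webhook POST request and the check would repeat on a schedule.

```python
import sqlite3


def run_health_check(connection, send_alert):
    """Run SELECT 1 and report a failure through send_alert.

    Returns True if the database answered, False otherwise.
    """
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT 1")
        cursor.fetchone()
        return True
    except Exception as error:
        # One alert per failed check, mirroring the behaviour described above.
        send_alert(f"Warehouse health check failed: {error}")
        return False


alerts = []

# A responsive database passes the check.
healthy_db = sqlite3.connect(":memory:")
healthy = run_health_check(healthy_db, alerts.append)

# A closed connection stands in for an unresponsive warehouse.
closed_db = sqlite3.connect(":memory:")
closed_db.close()
unhealthy = run_health_check(closed_db, alerts.append)
```

Passing the alert function in as a parameter keeps the check itself independent of how alerts are delivered.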
StatsD and stdout
StatsD is a daemon that aggregates and summarizes application metrics. It receives metrics sent by the application over UDP, and then periodically flushes the aggregated metrics to a pluggable storage backend.
The loader can emit metrics to a StatsD daemon describing every batch it processes. Here is a string representation of the metrics it sends:
These are the meanings of the individual metrics:
- count_good: the total number of good events in the batch that was loaded
- count_bad: the total number of bad events in the batch that was loaded (available since version 5.4.0)
- latency_collector_to_load_min: for the most recent event in the batch, the time difference between reaching the collector and getting loaded to the warehouse
- latency_collector_to_load_max: for the oldest event in the batch, the time difference between reaching the collector and getting loaded to the warehouse
- latency_transformer_start_to_load: the time difference between the transformer starting on this batch and the loader completing loading it into the warehouse
- latency_transformer_end_to_load: the time difference between the transformer completing this batch and the loader completing loading it into the warehouse
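To make the latency definitions concrete, the sketch below derives the metrics for one batch from a handful of timestamps and renders them in the StatsD plain-text format (name:value|type). Several things here are assumptions for illustration: the function names, the example.loader prefix, reporting latencies in whole seconds, sending every metric as a gauge ("g"), and naming the oldest event's latency latency_collector_to_load_max.

```python
from datetime import datetime, timezone


def batch_metrics(collector_tstamps, transformer_start, transformer_end,
                  loaded_at, count_good, count_bad):
    """Compute the per-batch metrics from timestamps (latencies in seconds)."""
    def seconds(delta):
        return int(delta.total_seconds())

    return {
        "count_good": count_good,
        "count_bad": count_bad,
        # The most recent event waited the shortest time before loading.
        "latency_collector_to_load_min": seconds(loaded_at - max(collector_tstamps)),
        # The oldest event waited the longest time before loading.
        "latency_collector_to_load_max": seconds(loaded_at - min(collector_tstamps)),
        "latency_transformer_start_to_load": seconds(loaded_at - transformer_start),
        "latency_transformer_end_to_load": seconds(loaded_at - transformer_end),
    }


def to_statsd_lines(metrics, prefix="example.loader"):
    """Render metrics as StatsD lines; the prefix and gauge type are assumptions."""
    return [f"{prefix}.{name}:{value}|g" for name, value in metrics.items()]


def at(minute):
    return datetime(2024, 1, 1, 12, minute, tzinfo=timezone.utc)


metrics = batch_metrics(
    collector_tstamps=[at(0), at(5)],
    transformer_start=at(6),
    transformer_end=at(8),
    loaded_at=at(10),
    count_good=42,
    count_bad=1,
)
lines = to_statsd_lines(metrics)
```

Each resulting line could be sent to the StatsD daemon as a UDP datagram.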
StatsD monitoring is configured by setting the monitoring.metrics.statsd section in the configuration file.
You can expose these metrics in stdout for easier debugging by setting the monitoring.metrics.stdout section in the configuration file.
Sentry is a popular error monitoring service, which helps developers diagnose and fix problems in an application. The loader and transformer can both send an error report to Sentry whenever something unexpected happens. The reasons for the error can then be explored in the Sentry server’s UI.
Common reasons include a lost connection to the database or an HTTP error when fetching a schema from an Iglu server.
Sentry monitoring is configured by setting monitoring.sentry.dsn in the configuration file.
The loader can emit a Snowplow event to a collector when the application crashes with an unexpected error. The event conforms to the load_failed schema on Iglu Central.
Snowplow tracking is configured by setting the monitoring.snowplow section in the configuration file.