Skip to main content

Postgres Loader

With Snowplow Postgres Loader you can load enriched data or failed events into PostgreSQL database.

Production use

The Postgres loader is not recommended for production use, especially with large data volumes. We recommend using a fully-fledged data warehouse like Databricks, Snowflake, BigQuery or Redshift, together with a respective loader.

Schemas in Postgres

For more information on how events are stored in Postgres, check the mapping between Snowplow schemas and the corresponding Postgres column types.

Available on Terraform Registry

A Terraform module which deploys the Snowplow Postgres Loader on AWS EC2 for use with Kinesis. For installing in other environments, please see the other installation options below.

Getting a Docker image

Snowplow Postgres Loader is published on DockerHub:

docker pull snowplow/snowplow-postgres-loader:0.3.3

It accepts very typical configuration for Snowplow Loader:

docker run --rm \
-v $PWD/config:/snowplow/config \
snowplow/snowplow-postgres-loader:0.3.3 \
--resolver /snowplow/config/resolver.json \
--config /snowplow/config/config.hocon

Iglu

Where resolver.json is a typical Iglu Client configuration.

Please pay attention that schemas for all self-describing JSONs flowing through Postgres Loader must be hosted on Iglu Server 0.6.0 or above.
Iglu Central is static registry and if you use Snowplow-authored schemas - you need to upload all schemas from there as well.

Configuration

The configuration file is in HOCON format, and it specifies connection details for the target database and the input stream of events.

{
"input": {
"type": "Kinesis"
"streamName": "enriched-events"
"region": "eu-central-1"
}
"output" : {
"good": {
"type": "Postgres"
"host": "localhost"
"database": "snowplow"
"username": "postgres"
"password": ${POSTGRES_PASSWORD}
"schema": "atomic"
}
}
}

The input section can alternatively specify a GCP PubSub subscription, instead of a kinesis stream like in the example above.

  "input": {
"type": "PubSub"
"projectId": "my-project"
"subscriptionId": "my-subscription"
}

See the configuration reference for a complete description of all parameters.

Other

Loader creates events table on the start and every other table when it first encounters its corresponding schema.

You should ensure that the database and schema specified in the configuration exist before starting the loader.