Loading transformed data

For a high-level overview of the RDB Loader architecture, of which the loader is a part, see RDB Loader.

The loader applications are specialised to a specific storage target. Each one performs three key tasks:

  • Consume messages from SQS / SNS / Pub/Sub / Kafka to discover information about transformed data: where it is stored and what it looks like.
  • Use the information from the message to determine whether any changes to the target table(s) are required, e.g. adding a column for a new event field. If so, submit the appropriate SQL statement for execution by the storage target.
  • Prepare and submit for execution the appropriate SQL COPY statement.
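The three tasks above can be sketched as a small planning function. This is a simplified illustration, not the loaders' actual code: the message shape, the `plan_load` function, the `events` table name, and the column-naming rule are all hypothetical stand-ins for the real RDB Loader protocol and per-target SQL.

```python
import json

# Hypothetical "transformation complete" message; the real payload is defined
# by the RDB Loader message protocol and varies by version and target.
message = json.dumps({
    "base": "s3://archive/run=2024-01-01-00-00-00/",
    "types": [{"schemaKey": "iglu:com.acme/checkout/jsonschema/1-0-1"}],
})

def plan_load(raw, known_columns):
    """Turn a discovery message into the SQL statements a loader might run."""
    msg = json.loads(raw)  # Task 1: consume and parse the discovery message
    statements = []
    # Task 2: widen the target table if the message mentions new event fields.
    for t in msg["types"]:
        column = t["schemaKey"].split("/")[1]  # e.g. "checkout" (illustrative)
        if column not in known_columns:
            statements.append(f"ALTER TABLE events ADD COLUMN {column} VARCHAR")
    # Task 3: copy the transformed files from their staged location.
    statements.append(f"COPY INTO events FROM '{msg['base']}'")
    return statements

for stmt in plan_load(message, known_columns={"page_view"}):
    print(stmt)
```

In the real applications, each generated statement is submitted to the storage target for execution; the exact ALTER and COPY syntax differs between Redshift, Snowflake, and Databricks.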

For loading into Redshift, use the Redshift loader. This loads shredded data into multiple Redshift tables.

For loading into Snowflake, use the Snowflake loader. This loads wide row JSON format data into a single Snowflake table.

For loading into Databricks, use the Databricks loader. This loads wide row Parquet format data into a single Databricks table.

Note: AWS is fully supported for both Snowflake and Databricks. GCP is supported for Snowflake (since 5.0.0). Azure is supported for Snowflake (since 5.7.0).