Redshift
The Redshift integration is available for Snowplow pipelines running on AWS only.
The Snowplow Redshift integration allows you to load enriched event data directly into your Redshift cluster (including Redshift serverless) for analytics, data modeling, and more.
What you will need
Connecting to a destination always involves configuring cloud resources and granting permissions. It's a good idea to make sure you have sufficient privileges before you begin the setup process.
The list below is just a heads-up. The Snowplow Console will guide you through the exact steps to set up the integration.
Keep in mind that you will need to be able to:
- Provide your Redshift cluster endpoint and connection details
- Allow-list Snowplow IP addresses
- Specify the desired database and schema names
- Create a user and a role with the following permissions (a SQL sketch follows this list):
  - Schema ownership (`CREATE SCHEMA ... AUTHORIZATION`)
  - `SELECT` on the system tables `svv_table_info`, `svv_interleaved_columns` and `stv_interleaved_counts`, which is required for maintenance jobs
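For illustration only, the sketch below shows what these statements might look like. All identifiers (the `snowplow_loader` user, `snowplow_role` role, and `atomic` schema) are placeholders, and the Console will show the exact statements for your setup.

```sql
-- Illustrative sketch only; the Console provides the exact statements.
-- snowplow_loader, snowplow_role and atomic are placeholder names.

-- User and role for the loader
CREATE USER snowplow_loader PASSWORD 'ReplaceWithAStrongPassword1';
CREATE ROLE snowplow_role;
GRANT ROLE snowplow_role TO snowplow_loader;

-- Schema ownership, so the loader can create and evolve the event tables
CREATE SCHEMA IF NOT EXISTS atomic AUTHORIZATION snowplow_loader;

-- SELECT on the system tables used by maintenance jobs
GRANT SELECT ON svv_table_info TO snowplow_loader;
GRANT SELECT ON svv_interleaved_columns TO snowplow_loader;
GRANT SELECT ON stv_interleaved_counts TO snowplow_loader;
```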
Getting started
You can add a Redshift destination through the Snowplow Console. (For self-hosted customers, please refer to the Loader API reference instead.)
Step 1: Create a connection
- In Console, navigate to Destinations > Connections
- Select Set up connection
- Choose Loader connection, then Redshift
- Follow the steps to provide all the necessary values
- Click Complete setup to create the connection
Step 2: Create a loader
- In Console, navigate to Destinations > Destination list. Switch to the Available tab and select Redshift
- Select a pipeline: choose the pipeline where you want to deploy the loader.
- Select your connection: choose the connection you configured in step 1.
- Click Continue to deploy the loader
You can review active destinations and loaders by navigating to Destinations > Destination list.
How loading works
The Snowplow data loading process is engineered for large volumes of data. In addition, our loader applications ensure the best representation of Snowplow events. That includes automatically adjusting the tables to account for your custom data, whether it's new event types or new fields.
For more details on the loading flow, see the RDB Loader reference page, where you will find additional information and diagrams.
Snowplow data format in Redshift
The event data is split across multiple tables.
The main table (`events`) contains the atomic fields, such as `app_id`, `user_id`, and so on:

| app_id | collector_tstamp | ... | event_id | ... | user_id | ... |
|---|---|---|---|---|---|---|
| website | 2025-05-06 12:30:05.123 | ... | c6ef3124-b53a-4b13-a233-0088f79dcbcb | ... | c94f860b-1266-4dad-ae57-3a36a414a521 | ... |
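For example, once data is flowing you can query this table directly. The query below is a simple sketch that assumes the default `atomic` schema; adjust the schema name if yours differs.

```sql
-- Daily event counts per application (assumes the default "atomic" schema)
SELECT
    app_id,
    DATE_TRUNC('day', collector_tstamp) AS event_day,
    COUNT(*) AS event_count
FROM atomic.events
GROUP BY 1, 2
ORDER BY event_day;
```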
Snowplow data also includes customizable self-describing events and entities. These use schemas to define which fields should be present, and of what type (e.g. string, number).
For each type of self-describing event and entity, there are additional tables that can be joined with the main table:
`unstruct_event_com_acme_button_press_1`

| root_id | root_tstamp | button_name | button_color | ... |
|---|---|---|---|---|
| c6ef3124-b53a-4b13-a233-0088f79dcbcb | 2025-05-06 12:30:05.123 | Cancel | red | ... |
`contexts_com_acme_product_1`

| root_id | root_tstamp | name | price | ... |
|---|---|---|---|---|
| c6ef3124-b53a-4b13-a233-0088f79dcbcb | 2025-05-06 12:30:05.123 | Salt | 2.60 | ... |
| c6ef3124-b53a-4b13-a233-0088f79dcbcb | 2025-05-06 12:30:05.123 | Pepper | 3.10 | ... |
Note:
- "unstruct[ured] event" and "context" are the legacy terms for self-describing events and entities, respectively
- the `_1` suffix represents the major version of the schema (e.g. `1-x-y`)
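As an illustration of such a join, the sketch below combines the button press events above with the main table. It joins on `root_id` = `event_id`, adding `root_tstamp` = `collector_tstamp` to help Redshift prune data; the `atomic` schema is an assumption and the table names follow the examples above.

```sql
-- Join a self-describing event table back to the main events table
-- (the "atomic" schema and the table name follow the examples above)
SELECT
    e.event_id,
    e.user_id,
    bp.button_name,
    bp.button_color
FROM atomic.events e
JOIN atomic.unstruct_event_com_acme_button_press_1 bp
    ON bp.root_id = e.event_id
   AND bp.root_tstamp = e.collector_tstamp
WHERE e.app_id = 'website';
```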
You can learn more in the API reference section.
Check this guide on querying Snowplow data.