
Iceberg

Cloud availability

The Iceberg integration is available for Snowplow pipelines running on AWS only.

Apache Iceberg is an open table format for data lake architectures. The Snowplow Iceberg integration allows you to load enriched event data (as well as failed events) into Iceberg tables in your data lake for analytics, data modeling, and more.

Iceberg data can be consumed using various tools and products, for example:

  • Amazon Athena
  • Amazon Redshift Spectrum
  • Apache Spark or Amazon EMR
  • Snowflake
  • ClickHouse

We currently only support the Glue Iceberg catalog.

What you will need

Connecting to a destination always involves configuring cloud resources and granting permissions. It's a good idea to make sure you have sufficient privileges before you begin the setup process.

tip

The list below is just a heads up. The Snowplow Console will guide you through the exact steps to set up the integration.

Keep in mind that you will need to be able to:

  • Specify your AWS account ID
  • Provide an S3 bucket and an AWS Glue database
  • Create an IAM role with the following permissions:
    • For the S3 bucket:
      • s3:ListBucket
      • s3:GetObject
      • s3:PutObject
      • s3:DeleteObject
    • For the Glue database:
      • glue:CreateTable
      • glue:GetTable
      • glue:UpdateTable
  • Schedule a regular job to optimize the lake
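
The IAM permissions listed above could be expressed as a policy document along the following lines. This is a minimal sketch, not the policy the Console generates: the function name `build_loader_policy`, the example bucket, region, account ID and database values, and the exact scoping of the `Resource` entries are all assumptions for illustration. Follow the Console for the exact values to use.

```python
import json


def build_loader_policy(bucket: str, region: str, account_id: str, database: str) -> dict:
    """Build an IAM policy document with the S3 and Glue actions listed above.

    All argument values are placeholders supplied by the caller; the resource
    scoping below is an assumption, not taken from the Snowplow Console.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    glue_prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions: apply to objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
            {
                # Glue table actions, scoped to one database in the account's catalog.
                "Effect": "Allow",
                "Action": ["glue:CreateTable", "glue:GetTable", "glue:UpdateTable"],
                "Resource": [
                    f"{glue_prefix}:catalog",
                    f"{glue_prefix}:database/{database}",
                    f"{glue_prefix}:table/{database}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical values; replace with your own bucket, region, account and database.
    policy = build_loader_policy("my-snowplow-lake", "eu-central-1", "123456789012", "snowplow")
    print(json.dumps(policy, indent=2))
```

Splitting `s3:ListBucket` from the object-level actions reflects how S3 permissions work: listing targets the bucket ARN, while get, put and delete target the objects under it.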

Getting started

You can add an Iceberg destination through the Snowplow Console. (For self-hosted customers, please refer to the Loader API reference instead.)

Step 1: Create a connection

  1. In Console, navigate to Destinations > Connections
  2. Select Set up connection
  3. Choose Loader connection, then Iceberg
  4. Follow the steps to provide all the necessary values
  5. Click Complete setup to create the connection

Step 2: Create a loader

  1. In Console, navigate to Destinations > Destination list, switch to the Available tab and select Iceberg
  2. Select a pipeline: choose the pipeline where you want to deploy the loader
  3. Select your connection: choose the connection you configured in Step 1
  4. Select the type of events: enriched events or failed events
  5. Click Continue to deploy the loader

You can review active destinations and loaders by navigating to Destinations > Destination list.

We recommend scheduling regular lake maintenance jobs to ensure the best long-term performance.
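
This page does not specify which maintenance operations to run. As a rough sketch, the snippet below generates Spark SQL statements for three standard Apache Iceberg maintenance procedures (`rewrite_data_files`, `expire_snapshots`, `remove_orphan_files`). The catalog name `glue`, the database name `snowplow`, the 7-day retention window and the function name `maintenance_statements` are assumptions for illustration. The statements would need to be run by a scheduled Spark job configured with the Iceberg runtime.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def maintenance_statements(
    catalog: str,
    database: str,
    table: str = "events",
    retention_days: int = 7,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return Spark SQL CALL statements for common Iceberg maintenance procedures."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).strftime("%Y-%m-%d %H:%M:%S")
    target = f"{database}.{table}"
    return [
        # Compact small data files into larger ones.
        f"CALL {catalog}.system.rewrite_data_files(table => '{target}')",
        # Drop snapshots older than the retention cutoff.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{target}', older_than => TIMESTAMP '{cutoff}')",
        # Delete files no longer referenced by any table metadata.
        f"CALL {catalog}.system.remove_orphan_files(table => '{target}')",
    ]


if __name__ == "__main__":
    # Hypothetical catalog and database names.
    for statement in maintenance_statements("glue", "snowplow"):
        print(statement)
```

Each statement could then be passed to `spark.sql(...)` in a job triggered by whatever scheduler you already use.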

How loading works

The Snowplow data loading process is designed for large volumes of data. Our loader applications also aim to represent Snowplow events as faithfully as possible, which includes automatically adjusting the tables to account for your custom data, whether that is new event types or new fields.

For more details on the loading flow, see the Lake Loader reference page, where you will find additional information and diagrams.

Snowplow data format in Iceberg

All events are loaded into a single table (events).

There are dedicated columns for atomic fields, such as app_id, user_id and so on:

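| app_id  | collector_tstamp        | ... | event_id                             | ... | user_id                              | ... |
|---------|-------------------------|-----|--------------------------------------|-----|--------------------------------------|-----|
| website | 2025-05-06 12:30:05.123 | ... | c6ef3124-b53a-4b13-a233-0088f79dcbcb | ... | c94f860b-1266-4dad-ae57-3a36a414a521 | ... |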

Snowplow data also includes customizable self-describing events and entities. These use schemas to define which fields should be present, and of what type (e.g. string, number).

For self-describing events and entities, there are additional columns, like so:

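| app_id  | ... | unstruct_event_com_acme_button_press_1            | contexts_com_acme_product_1                               |
|---------|-----|---------------------------------------------------|-----------------------------------------------------------|
| website | ... | data for your custom button_press event (as STRUCT) | data for your custom product entities (as ARRAY of STRUCT) |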

Note:

  • "unstruct[ured] event" and "context" are the legacy terms for self-describing events and entities, respectively
  • the _1 suffix represents the major version of the schema (e.g. 1-x-y)
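
The naming pattern described above can be sketched as a small helper. This is an illustration only, not the loader's actual implementation: the function name `column_name` is hypothetical, the input is assumed to be an Iglu schema URI of the form `iglu:vendor/name/jsonschema/major-minor-patch`, and the real loader may apply further normalization that is not covered here.

```python
import re


def column_name(schema_uri: str, kind: str) -> str:
    """Derive the column name for a self-describing event or entity.

    kind is "event" (prefix unstruct_event) or "entity" (prefix contexts).
    """
    match = re.fullmatch(r"iglu:([^/]+)/([^/]+)/jsonschema/(\d+)-\d+-\d+", schema_uri)
    if match is None:
        raise ValueError(f"Not a recognized schema URI: {schema_uri}")
    vendor, name, major = match.groups()
    prefix = {"event": "unstruct_event", "entity": "contexts"}[kind]

    def sanitize(part: str) -> str:
        # Lowercase and replace runs of non-alphanumeric characters with "_".
        return re.sub(r"[^a-z0-9]+", "_", part.lower())

    # Only the major version ends up in the column name.
    return f"{prefix}_{sanitize(vendor)}_{sanitize(name)}_{major}"


if __name__ == "__main__":
    print(column_name("iglu:com.acme/button_press/jsonschema/1-0-0", "event"))
    print(column_name("iglu:com.acme/product/jsonschema/1-2-0", "entity"))
```

Because only the major version is part of the name, schema versions 1-0-0 and 1-2-0 map to the same column.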

You can learn more in the API reference section.

tip

Check this guide on querying Snowplow data. (You will need a query engine such as Spark SQL or Snowflake to query Iceberg tables.)
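
As a rough illustration of querying these columns, the sketch below builds a Spark SQL query that reads atomic fields, one field from the button_press STRUCT, and the number of product entities. The table reference `glue.snowplow.events`, the struct field `button_id` and the function name `button_press_query` are hypothetical; substitute the names from your own catalog and schemas. Other engines may use different syntax for nested fields.

```python
def button_press_query(table: str = "glue.snowplow.events", limit: int = 10) -> str:
    """Build a Spark SQL query over the events table (all names are placeholders)."""
    return (
        "SELECT\n"
        "  app_id,\n"
        "  event_id,\n"
        # STRUCT fields are accessed with dot notation; button_id is hypothetical.
        "  unstruct_event_com_acme_button_press_1.button_id AS button_id,\n"
        # Entities are an ARRAY of STRUCT; size() counts the attached entities.
        "  size(contexts_com_acme_product_1) AS product_count\n"
        f"FROM {table}\n"
        # Keep only rows that carry the custom button_press event.
        "WHERE unstruct_event_com_acme_button_press_1 IS NOT NULL\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(button_press_query())
```

The resulting string could be passed to `spark.sql(...)` in a Spark session that has the Iceberg catalog configured.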
