Load Snowplow data to BigQuery

Cloud availability

The BigQuery integration is available for Snowplow pipelines running on AWS and GCP.

The Snowplow BigQuery integration allows you to load enriched event data (as well as failed events) directly into your BigQuery datasets for analytics, data modeling, and more.

What you will need

Connecting to a destination always involves configuring cloud resources and granting permissions. It's a good idea to make sure you have sufficient privileges before you begin the setup process.

tip

The list below is just a heads up. The Snowplow Console will guide you through the exact steps to set up the integration.

Keep in mind that you will need to be able to:

Provide your Google Cloud Project ID and region
Allow-list Snowplow IP addresses
Specify the desired dataset name
Create a service account with the roles/bigquery.dataEditor role (more permissions will be required for loading failed events and setting up the Data Quality Dashboard)
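
As a rough illustration of the service account item above, the sketch below prints the gcloud commands that would create an account and grant it roles/bigquery.dataEditor. It only builds the command strings and does not run them. The project ID and account name are hypothetical placeholders, and granting the role at project level is an assumption: the Console may ask for a different scope or additional roles, so treat its instructions as authoritative.

```python
from typing import List


def service_account_commands(project_id: str, account_name: str) -> List[str]:
    """Return gcloud commands (as strings) for a loader service account.

    Assumption: the role is granted at project level. Follow the Console
    if it asks for something narrower or for additional roles.
    """
    # Service account emails follow this fixed Google Cloud pattern.
    email = f"{account_name}@{project_id}.iam.gserviceaccount.com"
    return [
        f"gcloud iam service-accounts create {account_name} "
        f"--project={project_id}",
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member=serviceAccount:{email} "
        f"--role=roles/bigquery.dataEditor",
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for command in service_account_commands("my-gcp-project", "snowplow-bq-loader"):
        print(command)
```

Printing the commands rather than executing them lets you review them against the steps shown in Console first.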

Getting started

You can add a BigQuery destination through the Snowplow Console. (For self-hosted customers, please refer to the Loader API reference instead.)

Step 1: Create a connection

In Console, navigate to Destinations > Connections
Select Set up connection
Choose Loader connection, then BigQuery
Follow the steps to provide all the necessary values
Click Complete setup to create the connection

Step 2: Create a loader

In Console, navigate to Destinations > Destination list. Switch to the Available tab and select BigQuery
Select a pipeline: choose the pipeline where you want to deploy the loader.
Select your connection: choose the connection you configured in step 1.
Select the type of events: enriched events or failed events
Click Continue to deploy the loader

You can review active destinations and loaders by navigating to Destinations > Destination list.

How loading works

The Snowplow data loading process is built for large volumes of data. The loader applications also take care of representing Snowplow events faithfully in the warehouse. That includes automatically adjusting the tables to account for your custom data, whether that means new event types or new fields.

tip

For more details on the loading flow, see the BigQuery Loader reference page, where you will find additional information and diagrams.

Snowplow data format in BigQuery

All events are loaded into a single table (events).

There are dedicated columns for atomic fields, such as app_id and user_id:

app_id	collector_tstamp	...	event_id	...	user_id	...
website	2025-05-06 12:30:05.123	...	c6ef3124-b53a-4b13-a233-0088f79dcbcb	...	c94f860b-1266-4dad-ae57-3a36a414a521	...

Snowplow data also includes customizable self-describing events and entities. These use schemas to define which fields should be present, and of what type (e.g. string, number).

For self-describing events and entities, there are additional columns, like so:

app_id	...	unstruct_event_com_acme_button_press_1	contexts_com_acme_product_1
website	...	data for your custom `button_press` event (as `RECORD`)	data for your custom `product` entities (as `REPEATED RECORD`)

Note:

"unstruct[ured] event" and "context" are the legacy terms for self-describing events and entities, respectively
the _1 suffix represents the major version of the schema (e.g. 1-x-y)
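
The naming pattern described above can be sketched as a small helper that turns a schema URI into the expected column name. This is a simplified illustration, not the loader's actual code: the function name is hypothetical, and the real loader may apply further normalization to vendor and schema names than the lower-casing and underscore replacement assumed here.

```python
import re


def column_name(schema_uri: str, kind: str) -> str:
    """Derive a column name from an Iglu schema URI.

    kind is "event" (self-describing event) or "entity".
    Simplified sketch: the normalization rules are an assumption.
    """
    match = re.fullmatch(
        r"iglu:([^/]+)/([^/]+)/jsonschema/(\d+)-\d+-\d+", schema_uri
    )
    if match is None:
        raise ValueError(f"Not a valid schema URI: {schema_uri}")
    vendor, name, major = match.groups()

    # Legacy prefixes: "unstruct_event" for events, "contexts" for entities.
    prefix = {"event": "unstruct_event", "entity": "contexts"}[kind]

    def normalise(part: str) -> str:
        # Lower-case, then collapse anything non-alphanumeric to "_".
        return re.sub(r"[^a-z0-9]+", "_", part.lower())

    # Only the major version (the first number of 1-x-y) ends up in the name.
    return f"{prefix}_{normalise(vendor)}_{normalise(name)}_{major}"


if __name__ == "__main__":
    print(column_name("iglu:com.acme/button_press/jsonschema/1-0-0", "event"))
    print(column_name("iglu:com.acme/product/jsonschema/1-2-3", "entity"))
```

Because only the major version appears in the name, schema versions 1-0-0 and 1-2-3 map to the same column in this sketch.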

You can learn more in the API reference section.

tip

Check this guide on querying Snowplow data.
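
To make the column layout above more concrete, here is a minimal sketch that builds a BigQuery SQL query reading from both kinds of columns. The `RECORD` column is accessed with dot notation, while the `REPEATED RECORD` column is flattened with `UNNEST`. The table path and the fields `button_id` and `sku` are hypothetical; substitute the fields defined in your own schemas.

```python
def build_query(table: str) -> str:
    """Build a BigQuery SQL string over the events table.

    Hypothetical fields: button_id (on the button_press event)
    and sku (on the product entity).
    """
    return f"""
SELECT
  event_id,
  collector_tstamp,
  -- RECORD column: fields are reached with dot notation
  unstruct_event_com_acme_button_press_1.button_id AS button_id,
  -- REPEATED RECORD column: one output row per attached entity
  product.sku AS product_sku
FROM `{table}`
LEFT JOIN UNNEST(contexts_com_acme_product_1) AS product
WHERE DATE(collector_tstamp) = '2025-05-06'
""".strip()


if __name__ == "__main__":
    # Hypothetical project and dataset names.
    print(build_query("my-gcp-project.snowplow.events"))
```

The `LEFT JOIN` keeps events that have no `product` entity attached; those rows would simply have a null `product_sku`.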
