Google Analytics to Snowplow
This guide helps technical implementers migrate from Google Analytics 4 (GA4) to Snowplow.
Platform differences#
There are significant differences between Google Analytics and Snowplow as data platforms. For migration, it's important to understand how Snowplow structures events differently from GA4. This affects both your data layer implementation and how you'll model the warehouse data.
This table shows some key differences:
| Feature | Google Analytics 4 | Snowplow |
|---|---|---|
| Capturing user behavior | Uses gtag('event', ...) with event names, or dataLayer.push() with GTM triggers and variables | Direct tracker SDK calls for built-in or custom event types |
| Contextual event data | Included in the event parameters object | Included in the event as reusable entities |
| Warehouse tables | BigQuery only; daily partitioned events_YYYYMMDD table with nested fields | In most warehouses one big table with columns for every event or entity; in Redshift, one table per custom event or entity |
| Data validation | None | All events and entities defined by JSON schemas |
Event structure#
The main GA4 tracking method is gtag('event', ...) with event names and parameter objects. The event name describes the action taken, and the parameters provide contextual information about that action.
You might have implemented these calls directly in your codebase. More likely, you're using Google Tag Manager (GTM) and tracking behavior via the data layer. When using GTM, your application pushes structured data to the dataLayer array. GTM then processes this data using triggers and variables to fire the appropriate gtag() calls.
Here's an example GA4 web ecommerce event using the data layer. The application pushes a purchase event with an ecommerce object containing transaction details:
dataLayer.push({
  event: 'purchase',
  ecommerce: {
    transaction_id: '12345',
    value: 25.42,
    currency: 'USD',
    items: [{
      item_id: 'SKU_123',
      item_name: 'Product Name',
      price: 25.42,
      quantity: 1
    }]
  }
});
GTM processes this data layer push and fires the corresponding GA4 event. Alternatively, you might have implemented the event directly using gtag() calls, like this:
gtag('event', 'purchase', {
  transaction_id: '12345',
  value: 25.42,
  currency: 'USD',
  items: [{
    item_id: 'SKU_123',
    item_name: 'Product Name',
    price: 25.42,
    quantity: 1
  }]
});
To directly track the same behavior with Snowplow, you could use the web ecommerce trackTransaction event. The tracking code looks similar:
snowplow('trackTransaction', {
  transaction_id: 'T_12345',
  revenue: 25.42,
  currency: 'USD',
  products: [{
    id: 'SKU_123',
    name: 'Product Name',
    price: 25.42,
    quantity: 1
  }]
});
Tracking comparison#
This table shows how common GA4 data layer tracking maps to Snowplow web tracking:
| GA4 Event | GA4 Data Layer Example | Snowplow Implementation |
|---|---|---|
| page_view | Sent automatically by the GA4 configuration tag in GTM, or via dataLayer.push({event: 'page_view'}) | Use trackPageView |
| purchase | dataLayer.push({event: 'purchase', ecommerce: {transaction_id: '123', value: 99.99}}) | Use built-in ecommerce tracking, or define a custom purchase schema containing transaction_id and value properties and track it with trackSelfDescribingEvent |
| Custom events | Define custom GTM configuration then call e.g. dataLayer.push({event: 'read_article', article_title: 'Interesting text'}) | Define a custom read_article schema containing article_title property and track with trackSelfDescribingEvent |
| Send user IDs | dataLayer.push({user_id: 'USER_123'}), configured as user property in GTM | Use setUserId('USER_123') to set the user ID |
| Page metadata | dataLayer.push({page_type: 'product', category: 'electronics'}), configured as custom dimensions in GTM | Define custom entities for page context and attach them to relevant events (see the example after this table) |
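For illustration, here's a minimal sketch of the last two rows using the JavaScript tracker. The com.example vendor and page_metadata schema are hypothetical placeholders:
// Set the user ID once the user is known, e.g. after login
snowplow('setUserId', 'USER_123');

// Attach a custom page entity to a page view event
// (the page_metadata schema here is a placeholder you'd define yourself)
snowplow('trackPageView', {
  context: [{
    schema: 'iglu:com.example/page_metadata/jsonschema/1-0-0',
    data: {
      page_type: 'product',
      category: 'electronics'
    }
  }]
});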
Warehouse loading and data modeling#
The key differences between GA4 and Snowplow events are in the warehouse loading and data modeling.
With GA4, the example purchase event would be loaded into the events_YYYYMMDD table in BigQuery, with nested RECORD fields for parameters and user properties. Each parameter is stored as a key-value pair, with separate columns for each data type: string_value, int_value, float_value, or double_value. Analysts must use complex UNNEST operations to extract parameters and join data.
With Snowplow, the example event would be processed as a transaction event with one product entity. The Snowplow tracker SDKs and pipeline add additional contextual entities to each event, for example information about the specific page or screen view, the user's session, or the browser.
In BigQuery, Databricks, or Snowflake, the event will be loaded into the single atomic.events table, with a column for the transaction event and a column for each entity. In Redshift, each event and entity is loaded into its own table.
You can add the same product entity to any other relevant event, such as add_to_cart or view_item. This simplifies analysis of product lifecycles, from initial product view to purchase.
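For example, with the Snowplow ecommerce plugin for the JavaScript tracker, an add-to-cart event could reuse the same product object shown in the transaction example above (a sketch; the values here are illustrative):
snowplow('trackAddToCart', {
  // The same product entity used in the trackTransaction call
  products: [{
    id: 'SKU_123',
    name: 'Product Name',
    price: 25.42,
    quantity: 1
  }],
  total_value: 25.42,
  currency: 'USD'
});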
Snowplow tracker SDKs provide built-in methods for tracking page views, along with many other kinds of events. The additional entities added depend on which Snowplow SDK you're using, and which enrichments you've configured.
Snowplow provides out-of-the-box dbt data models for initial modeling and common analytics use cases.
Data validation#
GA4 is loosely typed and doesn't validate incoming events. You can define expected properties and custom configuration, but events that don't match their specification are still processed. The defined configurations are mainly used for reporting in the GA4 interface.
The lack of validation leads to fragmented, low-quality data. For example, the price for a purchase event might be accidentally tracked in different places as 25.42, "25.42", or even 2542. In the warehouse, this parameter's values will be split across the float_value, string_value, and int_value columns, making querying more complex.
Event specification and validation isn't optional for Snowplow. Every event and entity is defined by a JSON schema, called a data structure. The data payload itself contains a reference to the specific data structure and version that defines it. The events are always validated as they're processed through the Snowplow pipeline, and events that fail validation are filtered out.
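For example, tracking the custom read_article event from the earlier table with the JavaScript tracker might look like this (the com.example vendor and schema are placeholders):
snowplow('trackSelfDescribingEvent', {
  event: {
    // Reference to the data structure and version that defines this event
    schema: 'iglu:com.example/read_article/jsonschema/1-0-0',
    data: {
      article_title: 'Interesting text'
    }
  }
});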
Snowplow provides monitoring and alerting for failed events. You can choose to load failed events into a separate table in your warehouse, or to analyze them in temporary buckets. This strict approach ensures high data quality.
Tracking plans#
You've probably defined which GA4 events and parameters to track in a tracking plan spreadsheet or similar document. Snowplow provides event data management tools for defining and managing tracking plans. Tracking plans are called data products in Snowplow. Each data product contains a set of related event specifications. Each event specification has one event data structure, and any number of entity data structures.
You can use the Snowplow Console, API, or CLI to define your tracking data structures. For each event you can specify when it should be tracked, and which entities should be added. Once you've defined your event specifications, you may be able to use Snowtype to automatically generate the tracking code snippets.
Migration approaches#
In general, we recommend a parallel-run migration approach. This involves running both GA4 and Snowplow tracking simultaneously, allowing you to validate your Snowplow data in your warehouse before switching over completely.
If you're using GA4 with direct gtag() calls, you can replace these with Snowplow tracker SDK calls in your application code.
However, Snowplow also provides templates and tags for GTM and GTM Server-side. You could choose to migrate your data layer tracking and your modeling implementations in separate stages. This table summarizes the three possible approaches to reach a parallel-run dual tracking implementation:
| Approach | Phase 1 | Phase 2 | Phase 3 | Best for |
|---|---|---|---|---|
| Direct tracker SDK calls | Define Snowplow data products | Add Snowplow tracking calls; implement Snowplow data models | | Teams with engineering resources for tracking implementation; greenfield validation projects |
| Snowplow GTM templates | Define Snowplow data products | Configure Snowplow tags to receive data layer events; implement Snowplow data models | Implement Snowplow tracking calls | Validation of Snowplow data quality for analytics or marketing teams |
| GTM Server-side tagging | Define Snowplow data products | Use GTM Server-side to forward your existing events to Snowplow; implement Snowplow data models | Implement Snowplow tracking calls | Validation of Snowplow data quality for teams already using GTM Server-side |
Using Snowplow's GTM templates or tags is a temporary solution to help you migrate.
Migration phases#
The recommended parallel-run migration process can be divided into four phases:
- Assess and plan
- Validate data
- Finish implementation
- Cutover and finalize
1. Assess and plan#
Your existing GA4 event configuration will serve as the foundation for your new Snowplow implementation. Before migration, we advise auditing your tracking calls and event configuration. Are all the events and parameters still relevant? Does your configuration match your tracking plan?
Start documenting all downstream data consumers, such as BI dashboards, dbt models, ML pipelines, or marketing automation workflows, that may need updating after migration.
Review your existing GA4 implementation using one or more of these methods:
- Tag manager inspection: if using GTM, export your container configuration to identify all GA4 tags, their triggers, and variable mappings
- Programmatic API export: use the Google Analytics Admin API to retrieve the full JSON definition of all custom event configurations
- Code audit: review all dataLayer.push() or gtag('event', ...) calls in your codebase
- Warehouse inference: analyze your BigQuery events_YYYYMMDD tables to identify all event names and parameters in use
For data layer implementations, pay special attention to:
- The structure of your data layer objects and how properties are nested
- Which GTM variables extract and transform data layer properties
- How GTM triggers determine when events are sent
- Any custom JavaScript variables that process data layer data
You'll need to translate your GA4 event configuration into Snowplow data products. Some things to consider:
- Which platforms will you be tracking on? Different Snowplow tracker SDKs include different built-in event types. The web and native mobile trackers are the most fully featured.
- Which GA4 events can be migrated to built-in Snowplow events, and which should be custom self-describing events?
- Are there sets of event parameters used in multiple places that could be defined as entities instead?
- What's the best combination of event properties and entities to capture the same data as GA4?
- If you're using GTM, will you go straight to implementing direct tracker SDK calls, or use Snowplow's GTM templates or GTM Server-side tagging to maintain the data layer pattern initially?
The goal is to create a set of JSON data structures for all your events and entities, organized into data products and event specifications. The best way to import your new data product tracking plans into Snowplow is to use the Snowplow CLI.
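As an illustration, a minimal data structure for the read_article event might look like this sketch (the com.example vendor is a placeholder):
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for a read_article event",
  "self": {
    "vendor": "com.example",
    "name": "read_article",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "article_title": {
      "type": "string",
      "maxLength": 255
    }
  },
  "required": ["article_title"],
  "additionalProperties": false
}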
The Snowplow CLI includes an MCP server to help you translate your GA4 event configuration into Snowplow data products.
In this phase, you'll also need to decide what to do with historical data. There are two main choices:
- Coexistence: leave historical GA4 data in BigQuery. Write queries that combine data from both systems, using a transformation layer (for example, in dbt) to create compatible structures.
- Unification: transform and backfill historical GA4 data into the Snowplow format. This requires a custom engineering project to export GA4 data, reshape it into the Snowplow enriched event format, and load it into the warehouse. The result is a unified historical dataset.
2. Validate data#
This phase involves three main tasks:
- Set up your Snowplow infrastructure
- Start sending events to Snowplow and test your new tracking plan
- Set up data models
Follow the Snowplow CDI getting started instructions to set up your Snowplow infrastructure.
If you haven't done this yet, use the Snowplow CLI to import your new data products plan into Snowplow. You can also inspect and edit data products using the Snowplow Console. Publish them using the Snowplow CLI or Console; once published, they're available to the Snowplow pipeline for event validation.
Use Snowplow Micro to test and validate your Snowplow events locally, before sending them to your warehouse.
Snowplow Micro is a Dockerized local pipeline. It uses the same schema registry for event validation as your production pipeline.
Use Snowplow's data quality monitoring tools to build confidence in your tracking implementation.
Set up your analytics:
- Configure the Snowplow dbt models for initial modeling
- Update your existing data models to handle the new data structures
- Create queries to reconcile your old and new data
- Compare high-level metrics between the two systems, such as daily event counts or unique users
Your GA4 implementation will stay live until the final migration phase. By the end of this phase, you'll have a validated Snowplow implementation that recreates your GA4 tracking and modeling, and confidence that the new pipeline captures your critical business logic correctly.
Direct tracker SDK calls
For the most straightforward migration, add Snowplow tracking in parallel with your existing GA4 tracking:
- Implement a tracker SDK, and track a small number of built-in events, such as page views (see the sketch after this list)
- Use the Snowplow Inspector browser extension to confirm that the tracker is generating the expected events
- Use Snowtype to generate custom tracking code for your data products
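A minimal sketch of the first step, using the JavaScript tracker's tag API (the collector URL and appId are placeholders):
// Initialize a tracker pointing at your Snowplow collector
snowplow('newTracker', 'sp', 'https://collector.example.com', {
  appId: 'my-web-app'
});

// Track a built-in page view event, running alongside the existing GA4 tracking
snowplow('trackPageView');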
Snowplow GTM templates
If you're using GTM and don't have the resources to implement Snowplow tracking yet, you can use Snowplow's GTM tag templates to generate Snowplow events without changing your application code:
- Keep your existing dataLayer.push() calls in your application code
- Install the Snowplow GTM tag template in your GTM web container
- Configure GTM triggers to fire Snowplow tags based on your data layer events
- Use GTM variables to map data layer properties to Snowplow event and entity fields
For example, the existing data layer push stays the same:
dataLayer.push({
  event: 'purchase',
  ecommerce: {
    transaction_id: '12345',
    value: 25.42,
    currency: 'USD'
  }
});
In GTM, you would configure a Snowplow tag that fires when event equals purchase, mapping the ecommerce properties to a custom Snowplow self-describing event or using the built-in ecommerce tracking.
GTM Server-side tagging
If you're already using GTM Server-side to forward events, you could configure Snowplow as a destination using the Snowplow GTM Server-side tag:
- Keep your existing dataLayer.push() calls in your application code
- Configure your GTM web container to forward events to GTM Server-side
- Set up Snowplow tracking in the GTM server-side container
- Use GTM server-side variables and transformations to map data layer data to Snowplow events
3. Finish implementation#
If you used Snowplow GTM templates or GTM server-side tagging in the previous phase, it's time to implement direct Snowplow tracker SDK calls in your application code.
The benefits of a full Snowplow implementation include:
- A complete migration away from GA4
- Access to all Snowplow features, including out-of-the-box tracking definitions and event data management tools
- Improved page performance
- Easier debugging and maintenance
4. Cutover and finalize#
In the final phase, you'll update downstream data consumers, and decommission GA4.
Use Snowplow event forwarding to integrate with third-party destinations. Update all your data consumers to query the new Snowplow data tables.
Remove the GA4 tracking from your codebases. Finally, archive your GA4 configuration and data exports.