# Snowplow Documentation > Authoritative Snowplow documentation for implementing event tracking, validation, enrichment, governance, and delivery of clean event-level behavioral data. Focus areas include composable analytics, composable CDP, in-product personalization, AI agentic applications, and feeding AI-ready real-time data into warehouses, lakes, streams, and real-time tools. Documentation for previous versions of components is available on the site but is not included in this file. --- # Manage your Snowplow account using the Credentials API > Manage your Snowplow account configuration, users, and API keys through Console, including instructions for obtaining JWT tokens via the Credentials API. > Source: https://docs.snowplow.io/docs/account-management/ Manage your account configuration and users using the Snowplow Console. You can also use the underlying API directly. This page describes how to acquire an API key. ## Credentials API The API that drives Console's functionality is [publicly documented](https://console.snowplowanalytics.com/api/msc/v1/docs/index.html?url=/api/msc/v1/docs/docs.yaml) and available for customers to invoke via code. All calls to it need to be properly authenticated using JSON Web Tokens (JWT) that can be acquired via the Credentials API. The following view is available in [Console](https://console.snowplowanalytics.com/), under **Settings** in the navigation bar, then **Manage organization**, then **API keys for managing Snowplow**. Users can view this page only if they have the "view" permission on API keys. ![](/assets/images/accessing-generated-api-keys-8dd552f45cdaec4af7a5a070c498786e.png) API keys generation view You can create multiple API keys, and it's also possible to delete any key. When a new API key is generated, the following view will appear: ![](/assets/images/generated-api-key-v3-b28f53c0b129d6da072b23a0a98d4cf4.png) Newly created API key view Both the API key ID and API key are required. 
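For illustration, the credential exchange described on this page can be sketched in Python. This is a minimal, hypothetical helper (the function names are ours; only the v3 endpoint path and the two header names come from this page). It builds the request without sending it; passing the result to `urllib.request.urlopen` would perform the exchange:

```python
import urllib.request

CONSOLE_API = "https://console.snowplowanalytics.com/api/msc/v1"

def jwt_request(org_id, api_key_id, api_key):
    """Build (but do not send) the v3 token-exchange request."""
    return urllib.request.Request(
        f"{CONSOLE_API}/organizations/{org_id}/credentials/v3/token",
        headers={"X-API-Key-ID": api_key_id, "X-API-Key": api_key},
    )

def bearer_header(access_token):
    """Authorization header for subsequent Console API calls."""
    return {"Authorization": f"Bearer {access_token}"}
```

The JSON body of a successful response carries the token in its `accessToken` field, which you would then pass to `bearer_header` for all subsequent calls.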
The API key functions like a combination of a username and password, and should be treated with the same level of security.

Once you have an API key and key ID, exchanging them for a JWT is straightforward. For example, using curl, the process would look like this:

```bash
curl \
  --header 'X-API-Key-ID: <api-key-id>' \
  --header 'X-API-Key: <api-key>' \
  https://console.snowplowanalytics.com/api/msc/v1/organizations/<organization-id>/credentials/v3/token
```

You can find your Organization ID [on the _Manage organization_ page](https://console.snowplowanalytics.com/settings) in Console.

The curl command above will return a JWT as follows:

```json
{
  "accessToken": "<access-token>"
}
```

You can then use this access token to supply authorization headers for subsequent API requests:

```bash
curl --header 'Authorization: Bearer <access-token>'
```

### Previous versions

A previous version of the token exchange endpoint is still available, requiring only the API key:

```bash
curl \
  --header 'X-API-Key: <api-key>' \
  https://console.snowplowanalytics.com/api/msc/v1/organizations/<organization-id>/credentials/v2/token
```

While this method will continue to work, the endpoint is now deprecated and will be removed in the future. Use the v3 endpoint detailed above instead.

---

# Available Console user permissions and roles

> Configure user permissions in Snowplow Console with Global Admin, User, and Custom roles to control access to environments, data structures, tracking plans, data models, and API keys.
> Source: https://docs.snowplow.io/docs/account-management/managing-permissions/

To set a user's permissions, navigate to **Settings** > **Users** and then to the user whose account you'd like to manage.

## What permissions can be set?
Snowplow Console sets permissions for each area of Console as summarized below:

| **Console feature** | **Description** | **Possible permissions** |
| ------------------- | --------------- | ------------------------ |
| User management | The management and addition of user access. This permission cannot be configured on a Custom role. | No access, Edit, Create |
| Environments | The management of pipeline and development environments. This includes managing which Enrichments run on each environment. | No access, View, Edit |
| Tracking plans | The management and creation of tracking plans. | No access, View, Edit, Create |
| Data structures | The management and creation of the schemas that define the events and entities you are capturing. | No access, View, Edit on development, Edit on production, Create |
| Data models | The management and creation of your data models. | No access, View, Edit, Create |
| API keys | The management and creation of API keys. | No access, View, Manage |

## How are permissions set?

To set permissions for a user, navigate to **Settings** > **Users** and select the user. Within the management screen for that user, you will be able to set their permissions. There are three ways of setting user permissions:

- Global Admin (pre-defined role)
- User (pre-defined role)
- Custom (custom permissions role)

The following tables describe the default permissions for each role.
#### User permission set

| **Console feature** | **Permissions** |
| ------------------- | --------------- |
| Environments | View access |
| User management | View access |
| Tracking plans | View access |
| Data structures | View access |
| Data models | View access |
| API keys | View access |

#### Global Admin permission set

| **Console feature** | **Permissions** |
| ------------------- | --------------- |
| User management | Full access |
| Environments | Full access |
| Tracking plans | Full access |
| Data structures | Full access |
| Data models | Full access |
| API keys | Full access |

#### Custom permission set

| **Console feature** | **Permissions** |
| ------------------- | --------------------------- |
| User management | Customized by you, per user |
| Environments | Customized by you, per user |
| Tracking plans | Customized by you, per user |
| Data structures | Customized by you, per user |
| Data models & jobs | Customized by you, per user |
| API keys | Customized by you, per user |

**A note on API keys and permissions**

Please note:

1. Any API keys you create have full admin permissions
2. Any existing Iglu API keys allow permissions to be side-stepped by connecting directly to Iglu servers

The recommended approach is to remove all existing API keys and Iglu keys, and to set the API keys permission so that only trusted users can create new keys.

## What does each permission mean?

### Environments

An environment is the collective name for your Production pipelines, QA pipelines and development environments. An environment has three permissions:

- **No access** - the user will not see the environment management screens.
- **View** - the user can see the environment management screen, but cannot edit anything. This is the default setting for the User role.
- **Edit** - the user can make edits to the environment. This includes configuration such as enrichment enablement, enrichment configuration and collector configuration.
### Tracking plans Tracking plans have four permissions: - **No access** - the user will not see the tracking plan management screens. - **View** - the user can see the tracking plan management screens, but cannot edit anything. This permission and all tracking plan permissions below require the user to have at least the **View** permission on data structures. - **Edit** - the user can see the tracking plan management screens, and can make edits to existing tracking plans. - **Create** - the user can create new tracking plans. ### Data structures Data structures have five permissions: - **No access** - the user will not see the data structure management screen. - **View** - the user can see the data structure management screen, but cannot edit anything. - **Edit on development** - the user can see the data structure management screen, and can make edits to data structures but only publish them to the development registry. - **Edit on production** - the user can see the data structure management screen, and can make edits to data structures, and can publish changes to the production registry. - **Create** - the user can create new data structures. ### Data models Data models and jobs have four permissions: - **No access** - the user will not see the data model management screens. - **View** - the user can see the data model management screens, but cannot edit anything. This is the default setting for the User role. - **Edit** - the user can see the data model management and can make edits to data models in production. This is the default setting for the Global Admin role. - **Create** - the user can create new data models. ### API keys API keys have four permissions: - **No access** - the user will not see the API key management screens. - **View** - the user can see the API key descriptions but cannot see the keys themselves or manage them. - **Manage** - the user can see and manage the API keys. - **Create** - the user can generate new API keys. 
## Troubleshooting

You shouldn't need to log out for new permissions to take effect. However, if you find that permissions aren't applying as expected, logging out and back in will force them to apply.

---

# Manage users in Console directly or with SSO

> Add and remove users in Snowplow Console, configure Single Sign-On with identity providers including Google Workspace, Entra ID, Okta, and OpenID Connect.
> Source: https://docs.snowplow.io/docs/account-management/managing-users/

There are two ways to add and remove users in Console: directly managed in Console, or managed through your Single Sign-On (SSO) provider. SSO is an authentication process that allows users to access multiple applications after signing in to a central Identity Provider. Snowplow supports SSO integration for the majority of identity providers.

For organizations **not using SSO**, users can be added and removed directly in Console by navigating to **Settings** > **Users** in the navigation and creating a new user, or removing an existing user from there. Newly added users will receive an email to set their password and will be added with limited permissions, which you can then widen.

For organizations **using SSO**, you will need to configure your account with your Identity Provider before you can add or remove users.

## SSO permissions

Only system administrators can set up SSO for their company. For information on setting permissions for individual users, see [Managing user permissions](/docs/account-management/managing-permissions/).

## How to enable SSO for your account

Setting up SSO for your account requires some information to be exchanged between your Identity Provider and Snowplow as the Service Provider. Depending on your Identity Provider, the information that is required is slightly different.

To enable single sign-on (SSO) for Snowplow, follow these steps inside Console:

1. Go to the [manage organization](https://console.snowplowanalytics.com/settings) page.
2. Select [Single sign-on (SSO)](https://console.snowplowanalytics.com/users) from the Users panel. The SSO configuration is only visible to users with the Admin role.
3. Click **Continue** and follow the steps for your Identity Provider.

## Which Identity Providers (IdPs) are supported?

Snowplow's SSO capability enables connections with many IdPs, including:

- ADFS
- Auth0
- Entra ID (formerly known as Azure AD)
- Google Workspace
- Keycloak
- Okta
- PingFederate

Because Snowplow supports OpenID Connect and SAML, virtually any external Identity Provider that uses those standards should work.

## What information will you need from us?

This will differ depending on your Identity Provider, but typically will include information such as:

- **Entity ID** - the URL that identifies the identity provider issuing a SAML request; this will be specific to your identity provider.
- **Metadata URL** - the URL that allows access to obtain SSO configuration data; this will be specific to your identity provider.
- **Redirect Login URL** - the URL where users in the company sign in to the identity provider.
- **User information mapping** - locations of information required by Snowplow Console, such as first name, last name and, optionally, job title.

## What happens when SSO is enabled?

### Adding new users

Snowplow supports just-in-time provisioning with SSO connections. When a user logs in for the first time, a corresponding user account with the same email is created in Snowplow.

A new user created via SSO will have a custom permission set that grants view-only access, as outlined below. This can then be edited by anyone with the Global Admin role on your account. For more details on setting user access, see [Managing user permissions](/docs/account-management/managing-permissions/).

### Existing users

If a user already has a Snowplow account prior to SSO being enabled, the two accounts will be merged, and the user's current permissions will be applied.
### Logging in

When SSO is enabled, anybody who signs into Snowplow Console with an email address that uses your specified domain will be authenticated via SSO and your Identity Provider.

Once SSO is enabled, users on your domain can no longer sign in with their old email address and password, or manage their personal details or password, as these will all be managed within your Identity Provider.

## Disabling SSO

If your company enables SSO, and later decides to disable it:

- Users who did not set up a password before SSO was enabled must click Reset password on the login page to obtain a password.
- Users who set up a password before SSO was enabled can log in with their old username and password.

---

# Go Analytics SDK

> Go library for parsing Snowplow enriched events with efficient field access and transformation to JSON or maps for serverless functions.
> Source: https://docs.snowplow.io/docs/api-reference/analytics-sdk/analytics-sdk-go/

## 1. Overview

The [Snowplow Analytics SDK for Go](https://github.com/snowplow/snowplow-golang-analytics-sdk) lets you work with [Snowplow enriched events](/docs/fundamentals/canonical-event/) in your Go event processing, data modeling and machine-learning jobs. You can use this SDK with [AWS Lambda](https://aws.amazon.com/lambda/), [Google Cloud Functions](https://cloud.google.com/functions/), [Google App Engine](https://cloud.google.com/appengine) and other Golang-compatible data processing frameworks.

## 2. Compatibility

The Snowplow Analytics SDK for Go was tested with Go versions 1.13-1.15. There are only two external dependencies currently:

- `github.com/json-iterator/go` - used to parse JSON
- `github.com/pkg/errors` - used to provide an improvement on the standard error reporting.

## 3. Setup

snowplow/snowplow-golang-analytics-sdk can be imported into a project as a Go module:

`go get github.com/snowplow/snowplow-golang-analytics-sdk`

## 4. Usage

### 4.1 Overview

The [Snowplow Analytics SDK for Go](https://github.com/snowplow/snowplow-golang-analytics-sdk) provides an API to parse an enriched event from its TSV string form to a `ParsedEvent` slice of strings, then a set of methods to transform the entire event or a subset of fields into either `JSON` or `map` form. It also offers methods to efficiently get a field from the `ParsedEvent`.

### 4.2 Summary of example usage

```bash
go get github.com/snowplow/snowplow-golang-analytics-sdk
```

```go
import (
	"fmt"

	"github.com/snowplow/snowplow-golang-analytics-sdk/analytics"
)

parsed, err := analytics.ParseEvent(event) // where event is a valid TSV string Snowplow event
if err != nil {
	fmt.Println(err)
}

parsed.ToJson() // whole event to JSON
parsed.ToMap()  // whole event to map
parsed.GetValue("page_url") // get a value for a single canonical field
parsed.GetSubsetMap("page_url", "domain_userid", "contexts", "derived_contexts") // get a map of values for a set of canonical fields
parsed.GetSubsetJson("page_url", "unstruct_event") // get a JSON of values for a set of canonical fields
```

### 4.3 API

```go
func ParseEvent(event string) (ParsedEvent, error)
```

ParseEvent takes a Snowplow enriched event TSV string as input, and returns a `ParsedEvent` typed slice of strings. Methods may then be called on the resulting ParsedEvent type to transform the event, or a subset of the event, to a map or JSON.

```go
func (event ParsedEvent) ToJson() ([]byte, error)
```

ToJson transforms a valid Snowplow ParsedEvent to a JSON object.

```go
func (event ParsedEvent) ToMap() (map[string]interface{}, error)
```

ToMap transforms a valid Snowplow ParsedEvent to a Go map.

```go
func (event ParsedEvent) GetSubsetJson(fields ...string) ([]byte, error)
```

GetSubsetJson returns a JSON object containing a subset of the event, containing only the atomic fields provided, without processing the rest of the event.
For custom events and contexts, only "unstruct\_event", "contexts", or "derived\_contexts" may be provided, which will produce the entire data object for that field. For contexts, the resultant map will contain all occurrences of all contexts within the provided field.

```go
func (event ParsedEvent) GetSubsetMap(fields ...string) (map[string]interface{}, error)
```

GetSubsetMap returns a map of a subset of the event, containing only the atomic fields provided, without processing the rest of the event. For custom events and entities, only "unstruct\_event", "contexts", or "derived\_contexts" may be provided, which will produce the entire data object for that field. For contexts, the resultant map will contain all occurrences of all contexts within the provided field.

```go
func (event ParsedEvent) GetValue(field string) (interface{}, error)
```

GetValue returns the value for a provided atomic field, without processing the rest of the event. For unstruct\_event, it returns a map of only the data for the unstruct event. For contexts and derived\_contexts, it returns the data for all contexts or derived\_contexts in the event.

```go
func (event ParsedEvent) ToJsonWithGeo() ([]byte, error)
```

ToJsonWithGeo adds the geo\_location field, and transforms a valid Snowplow ParsedEvent to a JSON object.

```go
func (event ParsedEvent) ToMapWithGeo() (map[string]interface{}, error)
```

ToMapWithGeo adds the geo\_location field, and transforms a valid Snowplow ParsedEvent to a Go map.

---

# JavaScript and TypeScript Analytics SDK

> Lightweight JavaScript and TypeScript library to transform Snowplow enriched TSV events into JSON for serverless functions and Node.js.
> Source: https://docs.snowplow.io/docs/api-reference/analytics-sdk/analytics-sdk-javascript/

## Overview

The [Snowplow JavaScript and TypeScript Analytics SDK](https://github.com/snowplow-incubator/snowplow-js-analytics-sdk) lets you work with [Snowplow enriched events](/docs/fundamentals/canonical-event/) in your JavaScript event processing, data modeling and machine-learning jobs. You can use this SDK with [AWS Lambda](https://aws.amazon.com/lambda/), [Google Cloud Functions](https://cloud.google.com/functions/), [Google App Engine](https://cloud.google.com/appengine) and other JavaScript-compatible frameworks.

## Setup

Install using your preferred package manager, such as npm:

```bash
npm install --save snowplow-analytics-sdk
```

## Usage

### Overview

The [Snowplow JavaScript and TypeScript Analytics SDK](https://github.com/snowplow-incubator/snowplow-js-analytics-sdk) provides an API to parse an enriched event from its TSV string form to a `JSON` string.

### Example

To consume events in an AWS Lambda function, you would do something like this in your `app.js`:

```javascript
const { transform } = require('snowplow-analytics-sdk');

module.exports.handler = (input) => {
  let event = transform(
    Buffer.from(input.Records[0].kinesis.data, 'base64').toString('utf8'),
  );
  // ...
};
```

Or in `app.ts`:

```typescript
import { transform } from 'snowplow-analytics-sdk';

export function handler(input: any) {
  let event = transform(
    Buffer.from(input.Records[0].kinesis.data, 'base64').toString('utf8'),
  );
  // ...
}
```

## API

### `transform(event: string): Event`

- `event: string` - TSV string containing event data.

Returns the decoded [Snowplow enriched event](/docs/fundamentals/canonical-event/).

---

# .NET Analytics SDK

> .NET SDK with JSON event transformer for processing Snowplow enriched events in Azure Data Lake Analytics, Azure Functions, and C# applications.
> Source: https://docs.snowplow.io/docs/api-reference/analytics-sdk/analytics-sdk-net/

## 1.
Overview The [Snowplow Analytics SDK for .NET](https://github.com/snowplow/snowplow-dotnet-analytics-sdk) lets you work with [Snowplow enriched events](/docs/fundamentals/canonical-event/) in your .NET event processing, data modeling and machine-learning jobs. You can use this SDK with [Azure Data Lake Analytics](https://azure.microsoft.com/en-gb/services/data-lake-analytics/), [Azure Functions](https://azure.microsoft.com/en-gb/services/functions/), [AWS Lambda](https://aws.amazon.com/lambda/), [Microsoft Orleans](https://dotnet.github.io/orleans/) and other .NET-compatible data processing frameworks. The .NET Analytics SDK makes it significantly easier to build applications that consume Snowplow enriched data directly from Event Hubs or Azure Blob Storage. ## 2. Compatibility Snowplow .NET Analytics SDK targets [.NET Standard 1.3](https://github.com/dotnet/standard/blob/master/docs/versions.md). ## 3. Setup To add the .NET Analytics as a dependency to your project, install it in the Visual Studio Package Manager Console using [NuGet](https://www.nuget.org/): ```powershell Install-Package Snowplow.Analytics ``` ## 4. Event Transformer ### 4.1 Overview The Snowplow enriched event is a relatively complex TSV string containing self-describing JSONs. Rather than work with this structure directly, Snowplow analytics SDKs ship with _event transformers_, which translate the Snowplow enriched event format into something more convenient for engineers and analysts. As the Snowplow enriched event format evolves towards a cleaner [Apache Avro](https://avro.apache.org/)-based structure, we will be updating this Analytics SDK to maintain compatibility across different enriched event versions. Working with the Snowplow .NET Analytics SDK therefore has two major advantages over working with Snowplow enriched events directly: 1. The SDK reduces your development time by providing analyst- and developer-friendly transformations of the Snowplow enriched event format 2. 
The SDK futureproofs your code against new releases of Snowplow which update our enriched event format Currently the Analytics SDK for .NET ships with one event transformer: the JSON Event Transformer. ### 4.2 The JSON Event Transformer The JSON Event Transformer takes a Snowplow enriched event and converts it into a JSON ready for further processing. This transformer was adapted from the code used to load Snowplow events into Elasticsearch in the Kinesis real-time pipeline. The JSON Event Transformer converts a Snowplow enriched event into a single JSON like so: ```json { "app_id":"demo", "platform":"web", "etl_tstamp":"2015-12-01T08:32:35.048Z", "collector_tstamp":"2015-12-01T04:00:54.000Z", "dvce_tstamp":"2015-12-01T03:57:08.986Z", "event":"page_view", "event_id":"f4b8dd3c-85ef-4c42-9207-11ef61b2a46e", "txn_id":null, "name_tracker":"co", "v_tracker":"js-2.5.0", "v_collector":"clj-1.0.0-tom-0.2.0",... ``` The most complex piece of processing is the handling of the self-describing JSONs found in the enriched event's `unstruct_event`, `contexts` and `derived_contexts` fields. All self-describing JSONs found in the event are flattened into top-level plain (i.e. not self-describing) objects within the enriched event JSON. For example, if an enriched event contained a `com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1`, then the final JSON would contain: ```json { "app_id":"demo", "platform":"web", "etl_tstamp":"2015-12-01T08:32:35.048Z", "unstruct_event_com_snowplowanalytics_snowplow_link_click_1": { "targetUrl":"http://www.example.com", "elementClasses":["foreground"], "elementId":"exampleLink" },... 
``` ### 4.3 Examples You can convert an enriched event TSV string to a JSON like this: ```csharp using Snowplow.Analytics.Json; using Snowplow.Analytics.Exceptions; try { EventTransformer.Transform(enrichedEventTsv); } catch (SnowplowEventTransformationException sete) { sete.ErrorMessages.ForEach((message) => Console.WriteLine(message)); } ``` If there are any problems in the input TSV (such as unparseable JSON fields or numeric fields), the `transform` method will throw a `SnowplowEventTransformationException`. This exception contains a list of error messages - one for every problematic field in the input. --- # Parse enriched events in Azure Data Lake > Custom U-SQL extractor for parsing Snowplow enriched events in Azure Data Lake Analytics with direct access to nested context fields. > Source: https://docs.snowplow.io/docs/api-reference/analytics-sdk/analytics-sdk-net/snowplow-event-extractor/ [Azure Data Lake](https://azure.microsoft.com/en-in/solutions/data-lake/) is a secure and scalable data storage and analytics service. [Azure Data Lake Analytics](https://azure.microsoft.com/en-in/services/data-lake-analytics/) includes [U-SQL](https://blogs.msdn.microsoft.com/visualstudio/2015/09/28/introducing-u-sql-a-language-that-makes-big-data-processing-easy/), a big-data query language for writing queries that analyze data. ## Event Extractor Snowplow Event Extractor is an ADLA custom extractor that allows you to parse **[Snowplow enriched events](/docs/fundamentals/canonical-event/)**. Snowplow’s enrichment process outputs enriched events in a TSV format consisting of 131 fields. 
EventExtractor implements the IExtractor interface:

```csharp
[SqlUserDefinedExtractor]
public class EventExtractor : IExtractor
{
    private static readonly char ROW_DELIMITER = '\t';

    public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
    {
        // Split the input based on ROW_DELIMITER.
        // Set the output data on the output object.
        // EventExtractor only outputs columns and values that are defined with the output.
        yield break; // placeholder: the real implementation yields one IRow per event
    }
}
```

## Usage

The following is a basic U-SQL script that uses the Event Extractor:

```sql
DECLARE @input_file string = @"\snowplow\event.tsv";

@rs0 =
    EXTRACT app_id string,
            platform string
    FROM @input_file
    USING new Snowplow.EventExtractor();
```

The most complex piece of processing is the handling of the self-describing JSONs found in the enriched event's unstruct\_event, contexts and derived\_contexts fields. Consider the contexts found in the TSV:

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0",
  "data": [{
    "schema": "iglu:org.schema/WebPage/jsonschema/1-0-0",
    "data": {
      "genre": "blog",
      "inLanguage": "en-US",
      "datePublished": "2014-11-06T00:00:00Z",
      "author": "Devesh Shetty",
      "breadcrumb": ["blog", "releases"]
    }
  }, {
    "schema": "iglu:org.w3/PerformanceTiming/jsonschema/1-0-0",
    "data": {
      "navigationStart": 1415358089861,
      "unloadEventStart": 1415358090270,
      "unloadEventEnd": 1415358090287,
      "redirectStart": 0,
      "redirectEnd": 0
    }
  }]
}
```

One way to fetch data from a context would be to use a user-defined function (UDF):

```sql
DECLARE @input_file string = @"\snowplow\event.tsv";

// Extract context from the TSV
@rs0 =
    EXTRACT context string
    FROM @input_file
    USING new Snowplow.EventExtractor();

/* context has a nested data array */
@parseData =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(context, "data[*]").Values AS data_arr
    FROM @rs0;

/* The nested data array inside context consists of an array from which we parse the inner data field */
@parseGenre =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(data_arr, "$.data.genre").Values AS genre
    FROM @parseData;
```

The above process can get quite complex, so to abstract away the complexity, the Snowplow Event Extractor follows a simple mapping:

```sql
DECLARE @input_file string = @"\snowplow\event.tsv";

// Extract genre from context directly
@rsGenre =
    EXTRACT context.data.genre
    FROM @input_file
    USING new Snowplow.EventExtractor();
```

---

# Python Analytics SDK

> Python SDK for processing Snowplow enriched events with run manifest support for idempotent data processing in PySpark and AWS Lambda.
> Source: https://docs.snowplow.io/docs/api-reference/analytics-sdk/analytics-sdk-python/

## 1. Overview

The [Snowplow Analytics SDK for Python](https://github.com/snowplow/snowplow-python-analytics-sdk) lets you work with [Snowplow enriched events](/docs/fundamentals/canonical-event/) in your Python event processing, data modeling and machine-learning jobs. You can use this SDK with [Apache Spark](http://spark.apache.org/), [AWS Lambda](https://aws.amazon.com/lambda/), and other Python-compatible data processing frameworks.

## 2. Compatibility

The Snowplow Python Analytics SDK was tested with Python versions 2.7, 3.3, 3.4 and 3.5. As analytics SDKs are intended to be used heavily in conjunction with data-processing engines such as [Apache Spark](http://spark.apache.org/), our goal is to maintain compatibility with all versions that PySpark supports. Whenever possible, we try to maintain compatibility with a broader range of Python versions and computing environments. This is achieved mostly by minimizing and isolating third-party dependencies and libraries.

There is currently only one external dependency:

- [Boto3](https://aws.amazon.com/sdk-for-python/) - the AWS Python SDK, used to provide access to Event Load Manifests.

This dependency can be installed from the package manager of the host system or through PyPI.

## 3. Setup

### 3.1 PyPI

The Snowplow Python Analytics SDK is published to [PyPI](https://pypi.python.org/), the official third-party software repository for the Python programming language. This makes it easy to either install the SDK locally, or to add it as a dependency into your own Python app or Spark job.

### 3.2 pip

To install the Snowplow Python Analytics SDK locally, assuming you already have pip installed:

```bash
$ pip install snowplow_analytics_sdk --upgrade
```

To add the Snowplow Analytics SDK as a dependency to your own Python app, edit your `requirements.txt` and add:

```text
snowplow_analytics_sdk==0.2.3
```

### 3.3 easy\_install

If you are still using easy\_install:

```bash
$ easy_install -U snowplow_analytics_sdk
```

## 4. Run Manifests

### 4.1 Overview

The [Snowplow Analytics SDK for Python](https://github.com/snowplow/snowplow-python-analytics-sdk) provides an API to work with run manifests. A run manifest is a simple way to mark a chunk (a particular run) of enriched data as processed, for example by an Apache Spark data-modeling job.

### 4.2 Usage

Run manifest functionality resides in the `snowplow_analytics_sdk.run_manifests` module. The main class is `RunManifests`, which provides access to a DynamoDB table via `contains` and `add`, as well as a `create` method to initialize the table with appropriate settings. Another commonly used function is `list_runids`, which, given an S3 client and a path to a folder such as `enriched.archive` or `shredded.archive` from `config.yml`, lists all folders that match the Snowplow run id format (`run-YYYY-mm-DD-hh-MM-SS`). Using `list_runids` and `RunManifests`, you can list job runs and safely process them one by one without risk of reprocessing.
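As a side note, the run id format above can be checked with a short pattern. A minimal sketch; the `is_run_id` helper is illustrative and not part of the SDK:

```python
import re

# Matches Snowplow run id folders, e.g. "run-2017-01-26-00-01-25/"
RUN_ID_FORMAT = re.compile(r"run-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}/?$")

def is_run_id(s3_key):
    """Return True if an S3 key component looks like a Snowplow run id."""
    return RUN_ID_FORMAT.search(s3_key) is not None
```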
### 4.3 Example

Here's a short usage example:

```python
from boto3 import client

from snowplow_analytics_sdk.run_manifests import *

s3 = client('s3')
dynamodb = client('dynamodb')

dynamodb_run_manifests_table = 'snowplow-run-manifests'
enriched_events_archive = 's3://acme-snowplow-data/storage/enriched-archive/'

run_manifests = RunManifests(dynamodb, dynamodb_run_manifests_table)
run_manifests.create()  # This should be called only once

for run_id in list_runids(s3, enriched_events_archive):
    if not run_manifests.contains(run_id):
        process(run_id)
        run_manifests.add(run_id)
    else:
        pass
```

In the above example, we create two AWS service clients: one for S3 (to list job runs) and one for DynamoDB (to access manifests). These clients are provided via the [boto3](https://aws.amazon.com/sdk-for-python/) Python AWS SDK and can be initialized with static credentials or with system-provided credentials. Then we list all run ids in a particular S3 path and process (with the user-provided `process` function) only those that have not been processed already. Note that `run_id` is a simple string containing the S3 key of a particular job run.

The `RunManifests` class is a simple API wrapper for DynamoDB, with which you can:

- `create` a DynamoDB table for manifests
- `add` a run id to the table
- check if the table `contains` a run id

---

# Scala Analytics SDK

> Parse Snowplow enriched events into case classes with JSON transformation and event inventory metadata for Apache Spark, Flink, and AWS Lambda.
> Source: https://docs.snowplow.io/docs/api-reference/analytics-sdk/analytics-sdk-scala/

## 1. Overview

The [Snowplow Analytics SDK for Scala](https://github.com/snowplow/snowplow-scala-analytics-sdk) lets you work with [Snowplow enriched events](/docs/fundamentals/canonical-event/) in your Scala event processing, data modeling and machine-learning jobs.
You can use this SDK with [Apache Spark](http://spark.apache.org/), [AWS Lambda](https://aws.amazon.com/lambda/), [Apache Flink](https://flink.apache.org/), [Scalding](https://github.com/twitter/scalding), [Apache Samza](http://samza.apache.org/) and other JVM-compatible data processing frameworks. The Scala Analytics SDK makes it significantly easier to build applications that consume Snowplow enriched data directly from Kinesis or S3.

## 2. Compatibility

The Snowplow Scala Analytics SDK is compiled against Scala 2.12 and 2.13. The minimum required Java runtime is JRE 8.

## 3. Setup

The latest version of the Snowplow Scala Analytics SDK is 3.0.0 and it is available on Maven Central.

### 3.1 SBT

If you're using SBT, add the following lines to your build file:

```scala
// Dependency
libraryDependencies += "com.snowplowanalytics" %% "snowplow-scala-analytics-sdk" % "3.0.0"
```

Note the double percent (`%%`) between the group and artifactId. This ensures that you get the right package for your Scala version.

### 3.2 Gradle

If you are using Gradle in your own job, then add the following lines to your `build.gradle` file:

```gradle
dependencies {
    ...
    // Snowplow Scala Analytics SDK
    compile 'com.snowplowanalytics:snowplow-scala-analytics-sdk_2.12:3.0.0'
}
```

Note that you need to change `_2.12` to `_2.13` in the artifactId if you're using Scala 2.13.

### 3.3 Maven

If you are using Maven in your own job, then add the following lines to your `pom.xml` file:

```xml
<dependency>
    <groupId>com.snowplowanalytics</groupId>
    <artifactId>snowplow-scala-analytics-sdk_2.12</artifactId>
    <version>3.0.0</version>
</dependency>
```

Note that you need to change `_2.12` to `_2.13` in the artifactId if you're using Scala 2.13.

## 4. Scala Analytics SDK Event Transformer

### 4.1 Overview

The Snowplow enriched event is a relatively complex TSV string containing scalars and self-describing JSONs.
Rather than work with this structure directly, Snowplow analytics SDKs ship with _event transformers_, which translate the Snowplow enriched event format into other formats that are more convenient for engineers and analysts. As the Snowplow enriched event format evolves towards a cleaner [Apache Avro](https://avro.apache.org/)-based structure, we will be updating this SDK to maintain compatibility across different enriched event versions.

Working with the Snowplow Scala Analytics SDK therefore has two major advantages over working with Snowplow enriched events directly:

1. The SDK reduces your development time by providing analyst- and developer-friendly transformations of the Snowplow enriched event format
2. The SDK futureproofs your code against new releases of Snowplow which update our enriched event format

Currently the Analytics SDK for Scala ships with one event transformer: the JSON Event Transformer.

### 4.2 The JSON Event Transformer

The JSON Event Transformer takes a Snowplow enriched event and converts it into a JSON ready for further processing. This transformer was adapted from the code used to load Snowplow events into Elasticsearch in the Kinesis real-time pipeline.

The JSON Event Transformer converts a Snowplow enriched event into an instance of the `Event` case class, a representation of a canonical Snowplow event, like so:

```scala
Event(
  app_id = Some("angry-birds"),
  platform = Some("web"),
  etl_tstamp = Some(Instant.parse("2017-01-26T00:01:25.292Z")),
  collector_tstamp = Instant.parse("2013-11-26T00:02:05Z"),
  dvce_created_tstamp = Some(Instant.parse("2013-11-26T00:03:57.885Z")),
  event = Some("page_view"),
  event_id = UUID.fromString("c6ef3124-b53a-4b13-a233-0088f79dcbcb"),
  txn_id = Some(41828),
  name_tracker = Some("cloudfront-1"),
  v_tracker = Some("js-2.1.0"),
  v_collector = "clj-tomcat-0.1.0",
  v_etl = "serde-0.5.2"
  /* ... */
)
```

This case class can be rendered into a JSON object, and subsequently a JSON string, or used to interact with the event's fields in a typesafe manner.

The most complex piece of processing is the handling of the self-describing JSONs found in the enriched event's `unstruct_event`, `contexts` and `derived_contexts` fields. Currently there are two alternative behaviors for handling them in the event transformer:

1. Under the original "lossy" behavior, if an enriched event contained a `com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1`, then the unstructured event field would be rendered in the final JSON like this:

   ```json
   "unstruct_event_com_snowplowanalytics_snowplow_link_click_1": {
     "targetUrl": "http://www.example.com",
     "elementClasses": ["foreground"],
     "elementId": "exampleLink"
   }
   ```

2. Under the new "lossless" behavior, available since 0.3.1, if an enriched event contained a `com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1`, then the final JSON (if turned into a string) would contain a self-describing JSON object instead:

   ```json
   "unstruct_event": {
     "schema": "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1",
     "data": {
       "targetUrl": "http://www.example.com",
       "elementClasses": ["foreground"],
       "elementId": "exampleLink"
     }
   }
   ```

Along with the `Event` case class, the JSON Event Transformer comes with the following functions:

- `Event.parse(line)` - similar to the old `transform` function, this method accepts an enriched Snowplow event as a string and returns an `Event` instance as a result.
- `event.toJson(lossy)` - similar to the old `getValidatedJsonEvent` function, it transforms an `Event` into a validated JSON whose keys are the field names corresponding to the EnrichedEvent POJO of the Scala Common Enrich project. If the `lossy` argument is true, any self-describing events in the fields (`unstruct_event`, `contexts` and `derived_contexts`) are returned in a "shredded" format, e.g. `"unstruct_event_com_acme_1_myField": "value"`. If it is set to false, they are not flattened into underscore-separated top-level fields, and the standard self-describing format is used instead.
- `event.inventory` - extracts metadata from the event containing information about the types and Iglu URIs of its shred properties (`unstruct_event`, `contexts` and `derived_contexts`). Unlike version 0.3.0, it no longer requires a `transformWithInventory` call and can be obtained from any `Event` instance.
- `atomic` - returns the event as a map of keys to Circe JSON values, while dropping inventory fields. This method can be used to modify an event's JSON AST before converting it into a final result.
- `ordered` - returns the event as a list of key/Circe JSON value pairs. Unlike `atomic`, which has randomized key ordering, this method returns the keys in the order of the canonical event model, and is particularly useful for working with relational databases.

#### Event inventory

An event's inventory is simply a list of metadata about its shredded types:

1. Where it was extracted from: the `unstruct_event` column (`UnstructEvent`), the `contexts` column (`Contexts(CustomContexts)`) or the `derived_contexts` column (`Contexts(DerivedContexts)`)
2. Its Iglu URI (e.g. `iglu:com.acme/context/jsonschema/1-0-0`), stored as an Iglu `SchemaKey` instance

### 4.3 Examples

#### 4.3.1 Using from Apache Spark

The Scala Analytics SDK is a great fit for performing Snowplow [event data modeling](http://snowplowanalytics.com/blog/2016/03/16/introduction-to-event-data-modeling/) in Apache Spark and Spark Streaming.
Here's the code we use internally for our own data modeling jobs:

```scala
import cats.data.Validated
import com.snowplowanalytics.snowplow.analytics.scalasdk.Event

val events = input.flatMap(line =>
  Event.parse(line) match {
    case Validated.Valid(event) => Some(event.toJson(true).noSpaces)
    case Validated.Invalid(_) => None
  }
)

val dataframe = spark.read.json(events: _*)
```

#### 4.3.2 Using from AWS Lambda

The Scala Analytics SDK is a great fit for performing analytics-on-write, monitoring or alerting on Snowplow event streams using AWS Lambda. Here's some sample code for transforming enriched events into JSON inside a Scala Lambda:

```scala
import com.snowplowanalytics.snowplow.analytics.scalasdk.Event

def recordHandler(event: KinesisEvent) {

  val events = for {
    rec <- event.getRecords
    line = new String(rec.getKinesis.getData.array())
    event = Event.parse(line)
  } yield event

  /* ... */
}
```

---

# Analytics SDKs for event data transformation

> Transform Snowplow enriched TSV events into JSON for data modeling and machine learning in Scala, JavaScript, Go, Python, and .NET.
> Source: https://docs.snowplow.io/docs/api-reference/analytics-sdk/

The Snowplow Analytics SDKs are designed for data engineers and data scientists working with Snowplow in a number of languages.

Some good use cases for the SDKs include:

1. Transforming the Enriched TSV to Enriched JSON for further processing
2. Developing AI/ML models on your event data
3. Performing analytics-on-write in AWS Lambda as part of our Kinesis real-time pipeline
4. Within Snowplow pipeline components to process event data

## Snowplow Analytics SDKs

- [Scala Analytics SDK](/docs/api-reference/analytics-sdk/analytics-sdk-scala/) - lets you work with Snowplow enriched events in your Scala event processing, data modeling and machine-learning jobs. You can use this SDK with Apache Spark, AWS Lambda, GCP Cloud Functions, Apache Flink and other Scala-compatible data processing frameworks.
- [JavaScript and TypeScript Analytics SDK](/docs/api-reference/analytics-sdk/analytics-sdk-javascript/) - lets you work with Snowplow enriched events in your Node.js or other JavaScript environments. This SDK can be used with AWS Lambda and Google Cloud Functions.
- [Go Analytics SDK](/docs/api-reference/analytics-sdk/analytics-sdk-go/) - lets you work with Snowplow enriched events in your Go environments. This SDK can be used with AWS Lambda and Google Cloud Functions.
- [Python Analytics SDK](/docs/api-reference/analytics-sdk/analytics-sdk-python/) - lets you work with Snowplow enriched events in your Python event processing, data modeling and machine-learning jobs. You can use this SDK with Apache Spark, AWS Lambda, GCP Cloud Functions and other Python-compatible data processing frameworks.
- [.NET Analytics SDK](/docs/api-reference/analytics-sdk/analytics-sdk-net/) - lets you work with Snowplow enriched events in your .NET event processing, data modeling and machine-learning jobs. You can use this SDK with Azure Data Lake Analytics, Azure Functions, AWS Lambda, GCP Cloud Functions and other .NET-compatible data processing frameworks.

---

# Dataflow Runner for AWS EMR clusters

> CLI tool for launching and managing AWS EMR clusters with templated playbooks for Hadoop and Spark jobs with distributed locking support.
> Source: https://docs.snowplow.io/docs/api-reference/dataflow-runner/

Dataflow Runner is a system for creating and running [AWS EMR](https://aws.amazon.com/emr/) jobflow clusters and steps. It uses templated playbooks to define your cluster, and the Hadoop/Spark/et al. jobs that you want to run.

### Installation

- Platform-native binaries are available from the GitHub releases [page](https://github.com/snowplow/dataflow-runner/releases).
- Docker images are available at [DockerHub](https://hub.docker.com/r/snowplow/dataflow-runner) as of version `0.7.3`.
### Cluster Configuration

A cluster configuration contains all of the information needed to create a new cluster which is ready to accept a playbook. Currently AWS EMR is the only supported data-flow fabric.

For the cluster template see: [config/cluster.json.sample](https://github.com/snowplow/dataflow-runner/blob/master/config/cluster.json.sample)

### Playbook Configuration

A playbook consists of one or more _steps_. Steps are added to the cluster and run in series.

For the playbook template see: [config/playbook.json.sample](https://github.com/snowplow/dataflow-runner/blob/master/config/playbook.json.sample)

### Templates

Configuration files are run through Golang's [text template processor](http://golang.org/pkg/text/template/). The template processor can access all _variables_ defined on the command line using the `--vars` argument.

For example, to use the `--vars` argument with a playbook step:

```json
{
  "type": "CUSTOM_JAR",
  "name": "Combine Months",
  "actionOnFailure": "CANCEL_AND_WAIT",
  "jar": "s3://snowplow-hosted-assets/3-enrich/hadoop-event-recovery/snowplow-hadoop-event-recovery-0.2.0.jar",
  "arguments": [
    "com.snowplowanalytics.hadoop.scalding.SnowplowEventRecoveryJob",
    "--hdfs",
    "--input",
    "hdfs:///local/monthly/{{.inputVariable}}",
    "--output",
    "hdfs:///local/recovery/{{.outputVariable}}"
  ]
}
```

You would then pass the following command:

```bash
host> ./dataflow-runner run --emr-playbook ${emr-playbook-path} --emr-cluster j-2DPBXD87LSGP9 --vars inputVariable,input,outputVariable,output
```

This would resolve to:

```json
{
  "type": "CUSTOM_JAR",
  "name": "Combine Months",
  "actionOnFailure": "CANCEL_AND_WAIT",
  "jar": "s3://snowplow-hosted-assets/3-enrich/hadoop-event-recovery/snowplow-hadoop-event-recovery-0.2.0.jar",
  "arguments": [
    "com.snowplowanalytics.hadoop.scalding.SnowplowEventRecoveryJob",
    "--hdfs",
    "--input",
    "hdfs:///local/monthly/input",
    "--output",
    "hdfs:///local/recovery/output"
  ]
}
```

The following custom functions are also supported:

- `nowWithFormat [timeFormat]`: where `timeFormat` is a valid Golang [time format](http://golang.org/pkg/time/#Time.Format)
- `timeWithFormat [epoch] [timeFormat]`: where `epoch` is the number of seconds elapsed between January 1st 1970 and a certain point in time, as a string, and `timeFormat` is a valid Golang [time format](http://golang.org/pkg/time/#Time.Format)
- `systemEnv "ENV_VAR"`: where `ENV_VAR` is a key for a valid environment variable
- `base64 [string]`: will base64-encode the string passed as argument
- `base64File "path/to/file.txt"`: will base64-encode the content of the file located at the path passed as argument

### CLI Commands

There are several commands that can be used to manage your data-flow fabric.

#### `up`: Launches a new EMR cluster

```text
NAME:
   dataflow-runner up - Launches a new EMR cluster

USAGE:
   dataflow-runner up [command options] [arguments...]

OPTIONS:
   --emr-config value  EMR config path
   --vars value        Variables that will be used by the templater
```

This command will launch a new cluster ready for step execution. The output should look something like the following:

```bash
host> ./dataflow-runner up --emr-config ${emr-config-path}
INFO[0001] Launching EMR cluster with name 'dataflow-runner - sample name'...
INFO[0001] EMR cluster is in state STARTING - need state WAITING, checking again in 20 seconds...
INFO[0021] EMR cluster is in state STARTING - need state WAITING, checking again in 20 seconds...
# this goes for a few lines, omitted for brevity
INFO[0227] EMR cluster is in state STARTING - need state WAITING, checking again in 20 seconds...
INFO[0248] EMR cluster is in state BOOTSTRAPPING - need state WAITING, checking again in 20 seconds...
INFO[0269] EMR cluster is in state BOOTSTRAPPING - need state WAITING, checking again in 20 seconds...
INFO[0289] EMR cluster is in state BOOTSTRAPPING - need state WAITING, checking again in 20 seconds...
INFO[0310] EMR cluster launched successfully; Jobflow ID: j-2DPBXD87LSGP9
```

#### `run`: Adds jobflow steps to a running EMR cluster

```text
NAME:
   dataflow-runner run - Adds jobflow steps to a running EMR cluster

USAGE:
   dataflow-runner run [command options] [arguments...]

OPTIONS:
   --emr-playbook value  Playbook path
   --emr-cluster value   Jobflow ID
   --async               Asynchronous execution of the jobflow steps
   --lock value          Path to the lock held for the duration of the jobflow steps. This is materialized by a file or a KV entry in Consul depending on the --consul flag.
   --softLock value      Path to the lock held for the duration of the jobflow steps. This is materialized by a file or a KV entry in Consul depending on the --consul flag. Released no matter if the operation failed or succeeded.
   --consul value        Address of the Consul server used for distributed locking
   --vars value          Variables that will be used by the templater
```

This command adds new steps to the already running cluster. By default this command is blocking - however, if you wish to submit and forget, simply supply the `--async` argument. The output should look something like the following:

```bash
host> ./dataflow-runner run --emr-playbook ${emr-playbook-path} --emr-cluster j-2DPBXD87LSGP9
INFO[0310] Successfully added 2 steps to the EMR cluster with jobflow id 'j-2DPBXD87LSGP9'...
ERRO[0357] Step 'Combine Months' with id 's-9WZ0VFKC770J' was FAILED
ERRO[0358] Step 'Combine Months 2' with id 's-37F9PKSXBHDAU' was CANCELLED
ERRO[0358] 2/2 steps failed to complete successfully
```

In this case the first step failed, which meant that the second step was cancelled. This behavior depends on your `actionOnFailure` - you can choose either:

1. `CANCEL_AND_WAIT`: cancels all other currently queued jobs and returns the cluster to a waiting state, ready for new job submissions.
2. `CONTINUE`: moves on to the next step regardless of whether this one failed or not.

**Note**: We have removed the ability to terminate the jobflow on failure; to terminate, you will need to use the `down` command.

Additionally, Dataflow Runner can acquire a lock before starting the job, which can prevent other jobs from running at the same time. The lock is released when:

- the job has terminated (whether successfully or with failure), with the `--softLock` flag
- the job has succeeded, with the `--lock` flag ("hard lock")

As the above implies, if a job fails and the `--lock` flag was used, manual cleaning of the lock will be required.

Additionally, supplying a [Consul](https://www.consul.io/) address through the `--consul` flag will make this lock distributed. When the `--consul` flag is used, the lock will be materialized by a key-value pair in Consul, for which the key is the value supplied with the `--lock` or `--softLock` argument. Otherwise, it will be materialized by a file on the machine located at the specified path (either relative to your working directory or absolute).

#### `down`: Terminates a running EMR cluster

```text
NAME:
   dataflow-runner down - Terminates a running EMR cluster

USAGE:
   dataflow-runner down [command options] [arguments...]

OPTIONS:
   --emr-config value   EMR config path
   --emr-cluster value  Jobflow ID
   --vars value         Variables that will be used by the templater
```

When you are done with the EMR cluster you can terminate it by using the `down` command. This takes the original EMR configuration and the jobflow id to then go and terminate the cluster. The output should look something like the following:

```bash
host> ./dataflow-runner down --emr-config ${emr-config-path} --emr-cluster j-2DPBXD87LSGP9
INFO[0358] Terminating EMR cluster with jobflow id 'j-2DPBXD87LSGP9'...
INFO[0358] EMR cluster is in state TERMINATING - need state TERMINATED, checking again in 20 seconds...
INFO[0378] EMR cluster is in state TERMINATING - need state TERMINATED, checking again in 20 seconds...
INFO[0399] EMR cluster is in state TERMINATING - need state TERMINATED, checking again in 20 seconds...
INFO[0420] EMR cluster is in state TERMINATING - need state TERMINATED, checking again in 20 seconds...
INFO[0440] Transient EMR run completed successfully
```

#### `run-transient`: Launches, runs and then terminates an EMR cluster

```text
NAME:
   dataflow-runner run-transient - Launches, runs and then terminates an EMR cluster

USAGE:
   dataflow-runner run-transient [command options] [arguments...]

OPTIONS:
   --emr-config value    EMR config path
   --emr-playbook value  Playbook path
   --lock value          Path to the lock held for the duration of the jobflow steps. This is materialized by a file or a KV entry in Consul depending on the --consul flag.
   --softLock value      Path to the lock held for the duration of the jobflow steps. This is materialized by a file or a KV entry in Consul depending on the --consul flag. Released no matter if the operation failed or succeeded.
   --consul value        Address of the Consul server used for distributed locking
   --vars value          Variables that will be used by the templater
```

This command is a combination of `up`, `run` and `down`, designed to mimic the current `EmrEtlRunner` behavior.

---

# Enrich configuration reference

> Complete HOCON configuration reference for Snowplow Enrich applications including monitoring, validation, and stream-specific settings.
> Source: https://docs.snowplow.io/docs/api-reference/enrichment-components/configuration-reference/

This page lists the configuration options for Enrich applications.

## License

Enrich is released under the [Snowplow Limited Use License](/limited-use-license-1.1/) ([FAQ](/docs/licensing/limited-use-license-faq/)).
To accept the terms of the license and run Enrich, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable. Alternatively, you can configure the `license.accept` option, like this:

```json
"license": {
  "accept": true
}
```

## Common parameters

| parameter | description |
| --- | --- |
| `cpuParallelismFraction` (since _6.0.0_) | Optional. Default: `1`. Controls how the app splits the workload into concurrent batches which can be run in parallel. E.g. if there are 4 available processors and `cpuParallelismFraction` = 0.75, then we process 3 batches concurrently. Adjusting this value can cause the app to use more or less of the available CPU. |
| `sinkParallelismFraction` (since _6.0.0_) | Optional. Default: `2`. Controls the number of sink jobs that can be run in parallel. E.g. if there are 4 available processors and `sinkParallelismFraction` = 2, then we run 8 sink jobs concurrently. Adjusting this value can cause the app to use more or less of the available CPU. |
| `assetsUpdatePeriod` | Optional. E.g. `7 days`. Period after which enrich assets (e.g. the MaxMind database for the IpLookups enrichment) should be checked for updates. Assets will never be updated if this key is missing. |
| `monitoring.sentry.dsn` | Optional. E.g. `http://sentry.acme.com`. To track uncaught runtime exceptions in Sentry. |
| `monitoring.sentry.environment` (since _6.7.0_) | Optional. Environment name to use when reporting exceptions in Sentry. |
| `monitoring.sentry.tags.*` | Optional. A map of key/value strings which are passed as tags when reporting exceptions to Sentry. |
| `monitoring.metrics.statsd.hostname` | Optional. E.g. `localhost`. Hostname of the StatsD server to send enrichment metrics (latency and event counts) to. |
| `monitoring.metrics.statsd.port` | Optional. E.g. `8125`. Port of the StatsD server. |
| `monitoring.metrics.statsd.period` | Optional. E.g. `10 seconds`. How frequently to send metrics to the StatsD server. |
| `monitoring.metrics.statsd.tags` | Optional. E.g. `{ "env": "prod" }`. Key-value pairs attached to each metric sent to StatsD to provide contextual information. |
| `monitoring.metrics.statsd.prefix` | Optional. Default: `snowplow.enrich`. Prefix of StatsD metric names. |
| `monitoring.healthProbe.port` (since _6.0.0_) | Optional. Default: `8000`. Opens an HTTP server that returns OK only if the app is healthy. |
| `monitoring.healthProbe.unhealthyLatency` (since _6.0.0_) | Optional. Default: `2 minutes`. The health probe becomes unhealthy if any received event is still not fully processed before this cutoff time. |
| `telemetry.disable` | Optional. Set to `true` to disable [telemetry](/docs/get-started/self-hosted/telemetry/). |
| `telemetry.userProvidedId` | Optional. See [here](/docs/get-started/self-hosted/telemetry/#how-can-i-help) for more information. |
| `validation.acceptInvalid` (since _6.0.0_) | Optional. Default: `false`. Enrich _6.0.0_ introduces the validation of enriched events against the atomic schema before emitting. If set to `false`, a failed event will be emitted instead of the enriched event if validation fails. If set to `true`, invalid enriched events will be emitted, as before. |
| `validation.atomicFieldsLimits` (since _4.0.0_) | Optional. For the defaults, see [here](https://github.com/snowplow/enrich/blob/master/modules/common/src/main/resources/reference.conf). Configuration for custom maximum atomic field (string) lengths. It's a map-like structure with keys being atomic field names and values being their max allowed length. |
| `validation.maxJsonDepth` (since _6.0.0_) | Optional. Default: `40`. Maximum allowed depth for the JSON entities in the events. An event will be sent to the bad row stream if it contains a JSON entity with a depth that exceeds this value. |
| `validation.exitOnJsCompileError` (since _6.0.0_) | Optional. Default: `true`. If set to `true`, Enrich will exit with an error if the JS enrichment script is invalid. If set to `false`, Enrich will continue to run if the JS enrichment script is invalid, but every event will end up as a bad row. |
| `decompression.maxBytesInBatch` (since _6.1.0_) | Optional. Default: `10000000` (10 MB). Although a compressed message from the Collector is limited to 1 MB, it could become several times bigger after decompression. To avoid loading an enormous amount of data into memory, Enrich will decompress the message in portions (batches). This parameter specifies the maximum size of such a batch. As soon as the decompressed batch reaches `maxBytesInBatch`, it is emitted for further processing, and a new batch is started. |
| `decompression.maxBytesSinglePayload` (since _6.1.0_) | Optional. Default: `10000000` (10 MB). Each compressed Collector message contains a number of payloads, which contain one or more events. While the Collector already enforces some payload size limits, this setting exists as a safety check to prevent Enrich from loading large amounts of data into memory. Specifically, if an individual payload exceeds `maxBytesSinglePayload`, it will result in a [size violation](/docs/api-reference/failed-events/#size-violation). |
| `http.client.requestTimeout` (since _6.5.0_) | Optional. Default: `5 seconds`. Timeout for internal HTTP requests used by the Iglu resolver, alerts, telemetry, and metadata endpoints. |
| `iglu.maxRetry` (since _6.5.0_) | Optional. Default: `2`. Maximum number of retries for failed Iglu requests. Lower values allow Enrich to fail faster when Iglu Server is unavailable, falling back to cached schemas instead of blocking on retries. |
| `iglu.maxWait` (since _6.5.0_) | Optional. Default: `1 second`. Maximum wait time for exponential backoff between Iglu request retries. |
| `jsAllowedJavaClasses` (since _6.6.0_) | Optional. Default: `["*"]`. List of Java classes that the [JavaScript enrichment](/docs/pipeline/enrichments/available-enrichments/custom-javascript-enrichment/) is allowed to access. This affects both `new` and `.` usage. Examples: `["java.lang.String", "java.net.URL"]`, `["java.net.*"]`. By default, all classes are allowed. |

## enrich-pubsub

A minimal configuration file can be found on the [Github repo](https://github.com/snowplow/enrich/blob/master/config/config.pubsub.minimal.hocon), as well as a [comprehensive one](https://github.com/snowplow/enrich/blob/master/config/config.pubsub.reference.hocon).

| parameter | description |
| --- | --- |
| `input.subscription` | Required. E.g. `projects/example-project/subscriptions/collectorPayloads`. PubSub subscription identifier for the collector payloads. |
| `input.durationPerAckExtension` | Optional. Default: `15 seconds`. PubSub ack deadlines are extended for this duration when needed. |
| `input.minRemainingAckDeadline` | Optional. Default: `0.1`. Controls when ack deadlines are re-extended, for a message that is close to exceeding its ack deadline. For example, if `durationPerAckExtension` is `60 seconds` and `minRemainingAckDeadline` is `0.1`, then the Source will wait until there are `6 seconds` left of the remaining deadline before re-extending the message deadline. |
| `input.retries.transientErrors.delay` (since _6.2.0_) | Optional. Default: `100 millis`. Backoff delay between retry attempts for transient GRPC failures. |
| `input.retries.transientErrors.attempts` (since _6.2.0_) | Optional. Default: `10`. Max number of retry attempts for transient GRPC failures. |
| `output.good.topic` | Required. E.g. `projects/example-project/topics/enriched`. Name of the PubSub topic that will receive the enriched events. |
| `output.good.attributes` | Optional. Enriched event fields to add as PubSub message attributes. For example, if this is `[ "app_id" ]` then the enriched event's `app_id` field will be an attribute of the PubSub message, as well as being a field within the enriched event. |
| `output.good.batchSize` | Optional. Default: `100`. Enriched events are sent to PubSub in batches not exceeding this size. |
| `output.good.requestByteThreshold` | Optional. Default: `1000000`. Enriched events are sent to PubSub in batches not exceeding this number of bytes. |
| `output.good.retries.transientErrors.delay` (since _6.2.0_) | Same as `input.retries.transientErrors.delay` for good events. |
| `output.good.retries.transientErrors.attempts` (since _6.2.0_) | Same as `input.retries.transientErrors.attempts` for good events. |
| `output.failed.topic` | Required. E.g. `projects/example-project/topics/failed`. Name of the PubSub topic that will receive the failed events (same format as the enriched events). |
| `output.failed.batchSize` | Same as `output.good.batchSize` for failed events. |
| `output.failed.requestByteThreshold` | Same as `output.good.requestByteThreshold` for failed events. |
| `output.failed.retries.transientErrors.delay` (since _6.2.0_) | Same as `input.retries.transientErrors.delay` for failed events. |
| `output.failed.retries.transientErrors.attempts` (since _6.2.0_) | Same as `input.retries.transientErrors.attempts` for failed events. |
| `output.bad.topic` | Required. E.g. `projects/example-project/topics/bad`. Name of the PubSub topic that will receive the failed events in the "bad row" format (JSON). |
| `output.bad.batchSize` | Same as `output.good.batchSize` for failed events in the "bad row" format (JSON). |
| `output.bad.requestByteThreshold` | Same as `output.good.requestByteThreshold` for failed events in the "bad row" format (JSON). |
| `output.bad.retries.transientErrors.delay` (since _6.2.0_) | Same as `input.retries.transientErrors.delay` for failed events in the "bad row" format (JSON). |
| `output.bad.retries.transientErrors.attempts` (since _6.2.0_) | Same as `input.retries.transientErrors.attempts` for failed events in the "bad row" format (JSON). |

## enrich-kinesis

A minimal configuration file can be found on the [Github repo](https://github.com/snowplow/enrich/blob/master/config/config.kinesis.minimal.hocon), as well as a [comprehensive one](https://github.com/snowplow/enrich/blob/master/config/config.kinesis.reference.hocon).

| parameter | description |
| --- | --- |
| `input.appName` | Optional. Default: `snowplow-enrich-kinesis`. Name of the application which the KCL daemon should assume. A DynamoDB table with this name will be created. |
| `input.streamName` | Required. E.g. `raw`. Name of the Kinesis stream with the collector payloads to read from. |
| `input.initialPosition.type` | Optional. Default: `TRIM_HORIZON`. Sets the initial position for consuming the Kinesis stream. Possible values: `LATEST` (most recent data), `TRIM_HORIZON` (oldest available data), `AT_TIMESTAMP` (start from the record at or after the specified timestamp). |
| `input.initialPosition.timestamp` | Required for `AT_TIMESTAMP`. E.g. `2020-07-17T10:00:00Z`. |
| `input.retrievalMode.type` | Optional. Default: `Polling`. Sets the mode for retrieving records. Possible values: `Polling` or `FanOut`. |
| `input.retrievalMode.maxRecords` | Required for `Polling`. Default: `1000`. Maximum size of a batch returned by a call to `getRecords`. Records are checkpointed after a batch has been fully processed; thus the smaller `maxRecords`, the more often records can be checkpointed into DynamoDB, but possibly at the cost of reduced throughput. |
| `input.retrievalMode.idleTimeBetweenReads` | Optional for `Polling`. Default: `200 millis`. Idle time between `getRecords` requests. |
| `input.workerIdentifier` (since _6.0.0_) | Required. Name of this KCL worker used in the DynamoDB lease table. |
| `input.leaseDuration` (since _6.0.0_) | Optional. Default: `10 seconds`. Duration of shard leases. KCL workers must periodically refresh leases in the DynamoDB table before this duration expires. |
| `input.maxLeasesToStealAtOneTimeFactor` (since _6.0.0_) | Optional. Default: `2.0`. Controls how to pick the max number of leases to steal at one time. E.g. if there are 4 available processors and `maxLeasesToStealAtOneTimeFactor` = 2.0, then the KCL is allowed to steal up to 8 leases. Allows bigger instances to more quickly acquire the shard leases they need to combat latency. |
| `input.checkpointThrottledBackoffPolicy.minBackoff` (since _6.0.0_) | Optional. Default: `100 millis`. Minimum backoff before retrying when the DynamoDB provisioned throughput limit is exceeded. |
| `input.checkpointThrottledBackoffPolicy.maxBackoff` (since _6.0.0_) | Optional. Default: `1 second`. Maximum backoff before retrying when the DynamoDB provisioned throughput limit is exceeded. |
| `input.debounceCheckpoints` (since _6.0.0_) | Optional. Default: `10 seconds`. How frequently to checkpoint our progress to the DynamoDB table. By increasing this value, we can decrease the write-throughput requirements of the DynamoDB table. |
| `input.apiCallAttemptTimeout` (since _6.6.0_) | Optional. Default: `15 seconds`. Timeout for API call attempts to Kinesis, DynamoDB, and CloudWatch. |
| `output.good.streamName` | Required. E.g. `enriched`. Name of the Kinesis stream to write the enriched events to. |
| `output.good.partitionKey` | Optional. How the output stream will be partitioned in Kinesis. Events with the same partition key value will go to the same shard. Possible values: `event_id`, `event_fingerprint`, `domain_userid`, `network_userid`, `user_ipaddress`, `domain_sessionid`, `user_fingerprint`. If not specified, the partition key will be a random UUID. |
| `output.good.throttledBackoffPolicy.minBackoff` (since _6.0.0_) | Optional. Default: `100 milliseconds`. Minimum backoff before retrying when writing fails due to exceeded Kinesis write throughput. |
| `output.good.throttledBackoffPolicy.maxBackoff` (since _6.0.0_) | Optional. Default: `1 second`. Maximum backoff before retrying when writing fails due to exceeded Kinesis write throughput. |
| `output.good.recordLimit` | Optional. Default: `500`. Maximum number of records we are allowed to send to Kinesis in one PutRecords request. |
| `output.good.byteLimit` | Optional. Default: `5242880`. Maximum number of bytes we are allowed to send to Kinesis in one PutRecords request. |
| `output.good.maxRetries` (since _6.3.0_) | Optional. Default: `10`. Maximum number of retries by the Kinesis client. |
| `output.failed.streamName` | Required. E.g. `failed`.
Name of the Kinesis stream that will receive the failed events (same format as the enriched events). | | `output.failed.throttledBackoffPolicy.minBackoff` (since _6.0.0_) | Same as `output.good.throttledBackoffPolicy.minBackoff` for failed events. | | `output.failed.throttledBackoffPolicy.maxBackoff` (since _6.0.0_) | Same as `output.good.throttledBackoffPolicy.maxBackoff` for failed events. | | `output.failed.recordLimit` | Same as `output.good.recordLimit` for failed events. | | `output.failed.byteLimit` | Same as `output.good.byteLimit` for failed events. | | `output.failed.maxRetries` (since _6.3.0_) | Same as `output.good.maxRetries` for failed events. | | `output.bad.streamName` | Required. E.g. `bad`. Name of the Kinesis stream that will receive the failed events in the "bad row" format (JSON). | | `output.bad.throttledBackoffPolicy.minBackoff` (since _6.0.0_) | Same as `output.good.throttledBackoffPolicy.minBackoff` for failed events in the "bad row" format (JSON). | | `output.bad.throttledBackoffPolicy.maxBackoff` (since _6.0.0_) | Same as `output.good.throttledBackoffPolicy.maxBackoff` for failed events in the "bad row" format (JSON). | | `output.bad.recordLimit` | Same as `output.good.recordLimit` for failed events in the "bad row" format (JSON). | | `output.bad.byteLimit` | Same as `output.good.byteLimit` for failed events in the "bad row" format (JSON). | | `output.bad.maxRetries` (since _6.3.0_) | Same as `output.good.maxRetries` for failed events in the "bad row" format (JSON). | ## enrich-kafka A minimal configuration file can be found on the [Github repo](https://github.com/snowplow/enrich/blob/master/config/config.kafka.minimal.hocon), as well as a [comprehensive one](https://github.com/snowplow/enrich/blob/master/config/config.kafka.reference.hocon). 
| parameter | description |
| --- | --- |
| `input.topicName` | Required. Name of the Kafka topic to read collector payloads from. |
| `input.bootstrapServers` | Required. A list of `host:port` pairs to use for establishing the initial connection to the Kafka cluster. |
| `input.debounceCommitOffsets` (since _6.0.0_) | Optional. Default: `10 seconds`. How frequently to commit our progress back to Kafka. By increasing this value, we decrease the number of requests made to the Kafka broker. |
| `input.commitTimeout` (since _6.3.0_) | Optional. Default: `15 seconds`. The time to wait for offset commits to complete. If an offset commit doesn't complete within this time, a CommitTimeoutException will be raised instead. |
| `input.consumerConf` | Optional. Kafka consumer configuration. See [the docs](https://kafka.apache.org/documentation/#consumerconfigs) for all properties. |
| `output.good.topicName` | Required. Name of the Kafka topic to write to. |
| `output.good.bootstrapServers` | Required. A list of `host:port` pairs to use for establishing the initial connection to the Kafka cluster. |
| `output.good.producerConf` | Optional. Kafka producer configuration. See [the docs](https://kafka.apache.org/documentation/#producerconfigs) for all properties. |
| `output.good.partitionKey` | Optional. Enriched event field to use as the Kafka partition key. |
| `output.good.attributes` | Optional. Enriched event fields to add as Kafka record headers. |
| `output.failed.topicName` | Optional. Name of the Kafka topic that will receive the failed events (same format as the enriched events). |
| `output.failed.bootstrapServers` | Same as `output.good.bootstrapServers` for failed events. |
| `output.failed.producerConf` | Same as `output.good.producerConf` for failed events. |
| `output.bad.topicName` | Optional. Name of the Kafka topic that will receive the failed events in the "bad row" format (JSON). |
| `output.bad.bootstrapServers` | Same as `output.good.bootstrapServers` for failed events in the "bad row" format (JSON). |
| `output.bad.producerConf` | Same as `output.good.producerConf` for failed events in the "bad row" format (JSON). |
| `blobClients.accounts` (since _6.0.0_) | Optional. Array of Azure Blob Storage accounts to download enrichment assets from. |

Example values for the Azure storage accounts:

- `{ "name": "storageAccount1" }`: public account with no auth
- `{ "name": "storageAccount2", "auth": { "type": "default" } }`: private account using the default auth chain
- `{ "name": "storageAccount3", "auth": { "type": "sas", "value": "tokenValue" } }`: private account using SAS token auth

## enrich-nsq

A minimal configuration file can be found on the [GitHub repo](https://github.com/snowplow/enrich/blob/master/config/config.nsq.minimal.hocon), as well as a [comprehensive one](https://github.com/snowplow/enrich/blob/master/config/config.nsq.reference.hocon).

| parameter | description |
| --- | --- |
| `input.topic` | Required. Name of the NSQ topic with the collector payloads. |
| `input.lookupHost` | Required. The host name of the NSQ lookup application. |
| `input.lookupPort` | Required. The port number of the NSQ lookup application. |
| `input.channel` | Optional. Default: `collector-payloads-channel`. Name of the NSQ channel used to retrieve collector payloads. |
| `output.good.topic` | Required. Name of the NSQ topic that will receive the enriched events. |
| `output.good.nsqdHost` | Required. The host name of the nsqd application. |
| `output.good.nsqdPort` | Required. The port number of the nsqd application. |
| `output.failed.topic` | Required. Name of the NSQ topic that will receive the failed events (same format as the enriched events). |
| `output.failed.nsqdHost` | Required. The host name of the nsqd application. |
| `output.failed.nsqdPort` | Required. The port number of the nsqd application. |
| `output.bad.topic` | Required. Name of the NSQ topic that will receive the failed events in the "bad row" format (JSON). |
| `output.bad.nsqdHost` | Required. The host name of the nsqd application. |
| `output.bad.nsqdPort` | Required. The port number of the nsqd application. |
| `blobClients.accounts` (since _6.0.0_) | Optional. Array of Azure Blob Storage accounts to download enrichment assets from. |

## Enriched events validation against atomic schema

Enriched events are expected to match the [atomic](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0) schema. However, until `3.0.0`, it was never checked that the enriched events emitted by Enrich were valid. If an event is not valid against the `atomic` schema, a [failed event](/docs/fundamentals/failed-events/) should be emitted instead of the enriched event. However, this is a breaking change, and we want to give users some time to adapt, in case they currently work downstream with enriched events that are not valid against `atomic`. For this reason, this new validation was added as a feature that can be deactivated as follows:

```json
"validation": {
  "acceptInvalid": true
}
```

In this case, enriched events that are not valid against the `atomic` schema will still be emitted as before, so that Enrich `3.0.0` can be fully backward compatible. There are two ways to know whether the new validation would have had an impact:

1. A new metric `invalid_enriched` has been introduced. It reports the number of enriched events that were not valid against the `atomic` schema. Like the other metrics, it can be seen on stdout and/or StatsD.
2. Each time an enriched event is invalid against the `atomic` schema, a line is logged with the failed event (add `-Dorg.slf4j.simpleLogger.log.InvalidEnriched=debug` to the `JAVA_OPTS` to see it).

If `acceptInvalid` is set to `false`, a failed event will be emitted instead of the enriched event whenever it is not valid against the `atomic` schema. Once we know that our customers no longer have any invalid enriched events, we'll remove the feature flag, and it will no longer be possible to emit invalid enriched events.

Since `4.0.0`, it is possible to configure the lengths of the atomic fields. Below is an example:

```hcl
{
  ...
  # Optional. Configuration section for various validation-oriented settings.
  "validation": {
    # Optional. Configuration for custom maximum atomic fields (strings) length.
    # Map-like structure with keys being field names and values being their max allowed length
    "atomicFieldsLimits": {
      "app_id": 5
      "mkt_clickid": 100000
      # ...and any other 'atomic' field with custom limit
    }
  }
}
```

## Enrichments

The list of enrichments that can be configured can be found on [this page](/docs/pipeline/enrichments/available-enrichments/).

---

# Enrich Kafka for Azure deployments

> Standalone JVM application for enriching Snowplow events from Kafka topics with configurable enrichments and validation for Azure deployments.

> Source: https://docs.snowplow.io/docs/api-reference/enrichment-components/enrich-kafka/

`enrich-kafka` is a standalone JVM application that reads from and writes to Kafka. It can be run from anywhere, as long as it can communicate with your Kafka cluster. It is published on Docker Hub and can be run with the following command:

```bash
docker run \
  -it --rm \
  -v $PWD:/snowplow \
  snowplow/snowplow-enrich-kafka:6.9.0 \
  --enrichments /snowplow/enrichments \
  --iglu-config /snowplow/resolver.json \
  --config /snowplow/config.hocon
```

The above assumes that you have the following directory structure:

1. an `enrichments` directory (possibly empty) with all [enrichment configuration JSONs](/docs/pipeline/enrichments/available-enrichments/)
2. an Iglu Resolver [configuration JSON](/docs/api-reference/iglu/iglu-resolver/)
3. a [configuration HOCON](/docs/api-reference/enrichment-components/configuration-reference/)

It is possible to use environment variables in all of the above (for Iglu and enrichments, starting from `3.7.0` only).

The configuration guide can be found on [this page](/docs/api-reference/enrichment-components/configuration-reference/) and information about monitoring on [this one](/docs/api-reference/enrichment-components/monitoring/).

**Telemetry notice**

By default, Snowplow collects telemetry data for Enrich Kafka (since version 3.0.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1).

If you wish to help us further, you can optionally provide your email (or just a UUID) in the `telemetry.userProvidedId` configuration setting. If you wish to disable telemetry, you can do so by setting `telemetry.disable` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information.

---

# Enrich Kinesis for AWS streams

> Standalone JVM application for enriching Snowplow events from AWS Kinesis streams with configurable enrichments and validation.

> Source: https://docs.snowplow.io/docs/api-reference/enrichment-components/enrich-kinesis/

`enrich-kinesis` is a standalone JVM application that reads from and writes to Kinesis streams. It can be run from anywhere, as long as it has permissions to access the streams.
It is published on Docker Hub and can be run with the following command:

```bash
docker run \
  -it --rm \
  -v $PWD:/snowplow \
  -e AWS_ACCESS_KEY_ID=xxx \
  -e AWS_SECRET_ACCESS_KEY=xxx \
  snowplow/snowplow-enrich-kinesis:6.9.0 \
  --enrichments /snowplow/enrichments \
  --iglu-config /snowplow/resolver.json \
  --config /snowplow/config.hocon
```

The above assumes that you have the following directory structure:

- an `enrichments` directory (possibly empty) with all [enrichment configuration JSONs](/docs/pipeline/enrichments/available-enrichments/)
- an Iglu Resolver [configuration JSON](/docs/api-reference/iglu/iglu-resolver/)
- a [configuration HOCON](/docs/api-reference/enrichment-components/configuration-reference/)

It is possible to use environment variables in all of the above (for Iglu and enrichments, starting from `3.7.0` only). Depending on where the app runs, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` might not be required.

The configuration guide can be found on [this page](/docs/api-reference/enrichment-components/configuration-reference/) and information about monitoring on [this one](/docs/api-reference/enrichment-components/monitoring/).

**Telemetry notice**

By default, Snowplow collects telemetry data for Enrich Kinesis (since version 3.0.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1).

If you wish to help us further, you can optionally provide your email (or just a UUID) in the `telemetry.userProvidedId` configuration setting. If you wish to disable telemetry, you can do so by setting `telemetry.disable` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information.
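To make the environment variable support mentioned above concrete, here is a hedged sketch of a fragment of `config.hocon` for enrich-kinesis. The stream names and the `RAW_STREAM` variable are hypothetical, and this is not a complete configuration (see the minimal example on GitHub); it only illustrates standard HOCON optional substitution (`${?VAR}`), which lets an environment variable override a literal value:

```hcl
{
  "input": {
    # Hypothetical default; if RAW_STREAM is set in the environment,
    # the ${?VAR} substitution on the next line overrides it.
    "streamName": "raw"
    "streamName": ${?RAW_STREAM}
  }
  "output": {
    "good": { "streamName": "enriched" }
    "failed": { "streamName": "failed" }
    "bad": { "streamName": "bad" }
  }
}
```

You would then pass the variable to the container, e.g. `docker run -e RAW_STREAM=raw-prod ...`.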
---

# Enrich NSQ for cloud-agnostic applications

> Cloud-agnostic standalone JVM application for enriching Snowplow events from NSQ with configurable enrichments and validation.

> Source: https://docs.snowplow.io/docs/api-reference/enrichment-components/enrich-nsq/

`enrich-nsq` is a standalone JVM application that reads from and writes to NSQ. It can be run from anywhere, as long as it can communicate with your NSQ cluster. It is published on Docker Hub and can be run with the following command:

```bash
docker run \
  -it --rm \
  -v $PWD:/snowplow \
  snowplow/snowplow-enrich-nsq:6.9.0 \
  --enrichments /snowplow/enrichments \
  --iglu-config /snowplow/resolver.json \
  --config /snowplow/config.hocon
```

The above assumes that you have the following directory structure:

1. an `enrichments` directory (possibly empty) with all [enrichment configuration JSONs](/docs/pipeline/enrichments/available-enrichments/)
2. an Iglu Resolver [configuration JSON](/docs/api-reference/iglu/iglu-resolver/)
3. a [configuration HOCON](/docs/api-reference/enrichment-components/configuration-reference/)

It is possible to use environment variables in all of the above.

The configuration guide can be found on [this page](/docs/api-reference/enrichment-components/configuration-reference/) and information about monitoring on [this one](/docs/api-reference/enrichment-components/monitoring/).

**Telemetry notice**

By default, Snowplow collects telemetry data for Enrich NSQ (since version 3.8.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1).

If you wish to help us further, you can optionally provide your email (or just a UUID) in the `telemetry.userProvidedId` configuration setting.
If you wish to disable telemetry, you can do so by setting `telemetry.disable` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information.

---

# Enrich PubSub for Google Cloud

> Standalone JVM application for enriching Snowplow events from Google Cloud PubSub with configurable enrichments and validation.

> Source: https://docs.snowplow.io/docs/api-reference/enrichment-components/enrich-pubsub/

`enrich-pubsub` is a standalone JVM application that reads from and writes to PubSub topics. It can be run from anywhere, as long as it has permissions to access the topics. It is published on Docker Hub and can be run with the following command:

```bash
docker run \
  -it --rm \
  -v $PWD:/snowplow \
  -e GOOGLE_APPLICATION_CREDENTIALS=/snowplow/snowplow-gcp-account-11aa55ff6b1b.json \
  snowplow/snowplow-enrich-pubsub:6.9.0 \
  --enrichments /snowplow/enrichments \
  --iglu-config /snowplow/resolver.json \
  --config /snowplow/config.hocon
```

The above assumes that you have the following directory structure:

1. a GCP credentials [JSON file](https://cloud.google.com/docs/authentication/getting-started)
2. an `enrichments` directory (possibly empty) with all [enrichment configuration JSONs](/docs/pipeline/enrichments/available-enrichments/)
3. an Iglu Resolver [configuration JSON](/docs/api-reference/iglu/iglu-resolver/)
4. an enrich-pubsub [configuration HOCON](/docs/api-reference/enrichment-components/configuration-reference/)

It is possible to use environment variables in all of the above (for Iglu and enrichments, starting from `3.7.0` only).

The configuration guide can be found on [this page](/docs/api-reference/enrichment-components/configuration-reference/) and information about monitoring on [this one](/docs/api-reference/enrichment-components/monitoring/).

**Telemetry notice**

By default, Snowplow collects telemetry data for Enrich PubSub (since version 3.0.0).
Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1).

If you wish to help us further, you can optionally provide your email (or just a UUID) in the `telemetry.userProvidedId` configuration setting. If you wish to disable telemetry, you can do so by setting `telemetry.disable` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information.

---

# Enrich applications for different platforms

> Technical documentation for Snowplow enrichment applications that validate and enrich collector payloads on Kinesis, PubSub, Kafka, and NSQ.

> Source: https://docs.snowplow.io/docs/api-reference/enrichment-components/

This is the technical documentation for the enrichment step. If you are not yet familiar with this step of the pipeline, please refer to [this page](/docs/pipeline/enrichments/). Here is the list of the enrichment assets:

## [enrich-kinesis](/docs/api-reference/enrichment-components/enrich-kinesis/) (AWS)

Standalone JVM application that reads collector payloads from a Kinesis stream and outputs back to Kinesis.

## [enrich-pubsub](/docs/api-reference/enrichment-components/enrich-pubsub/) (GCP)

Standalone JVM application that reads collector payloads from a PubSub subscription and outputs back to PubSub.

## [enrich-kafka](/docs/api-reference/enrichment-components/enrich-kafka/) (Azure)

Standalone JVM application that reads collector payloads from a Kafka topic and outputs back to Kafka.

## [enrich-nsq](/docs/api-reference/enrichment-components/enrich-nsq/) (cloud agnostic)

Standalone JVM application that reads collector payloads from NSQ and outputs back to NSQ.
---

# Monitoring in Enrich applications

> Monitor Snowplow Enrich applications with StatsD metrics for event counts, latency tracking, and health probes.

> Source: https://docs.snowplow.io/docs/api-reference/enrichment-components/monitoring/

The Enrich applications have monitoring built in, to help the pipeline operator.

## Statsd

[Statsd](https://github.com/statsd/statsd) is a daemon that aggregates and summarizes application metrics. It receives metrics sent by the application over UDP, and then periodically flushes the aggregated metrics to a [pluggable storage backend](https://github.com/statsd/statsd/blob/master/docs/backend.md).

Enrich can periodically emit event-based metrics to a statsd daemon. Here is a string representation of the metrics it sends:

```text
snowplow.enrich.raw:42|c|#tag1:value1
snowplow.enrich.good:30|c|#tag1:value1
snowplow.enrich.failed:10|c|#tag1:value1
snowplow.enrich.bad:12|c|#tag1:value1
snowplow.enrich.e2e_latency_millis:123.4|g|#tag1:value1
snowplow.enrich.latency_millis:123.4|g|#tag1:value1
snowplow.enrich.invalid_enriched:0|c|#tag1:value1
```

- `raw`: total number of raw collector payloads received.
- `good`: total number of events successfully enriched.
- `failed` (`incomplete` before version _6.0.0_): total number of failed events due to schema violations or enrichment failures (if the feature is enabled).
- `bad`: total number of failed events, e.g. due to a schema violation, an invalid collector payload, or an enrichment failure.
- `e2e_latency_millis` (`latency` before version _6.0.0_): time difference between the collector timestamp and the time the event is emitted to the output stream.
- `latency_millis` (since _6.0.0_): delay between the input record getting written to the stream and Enrich starting to process it.
- `invalid_enriched`: number of enriched events that were not valid against the [atomic](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0) schema.

Note that the count metrics (`raw`, `good`, `bad` and `invalid_enriched`) refer to the updated count since the previous metric was emitted. A collector payload can carry multiple events, so it is possible for `good` to be larger than `raw`. The latency metrics (`e2e_latency_millis` and `latency_millis`) refer to the maximum latency of all events since the previous metric was emitted.

Statsd monitoring is configured by setting the `monitoring.metrics.statsd` section in [the hocon file](/docs/api-reference/enrichment-components/configuration-reference/):

```json
"monitoring": {
  "metrics": {
    "statsd": {
      "hostname": "localhost"
      "port": 8125
      "tags": {
        "tag1": "value1"
        "tag2": "value2"
      }
      "prefix": "snowplow.enrich."
      "period": "10 seconds"
    }
  }
}
```

## Sentry

[Sentry](https://docs.sentry.io/) is a popular error monitoring service, which helps developers diagnose and fix problems in an application. Enrich can send an error report to Sentry whenever something unexpected happens while trying to enrich an event. The reasons for the error can then be explored in the Sentry server’s UI.
Sentry monitoring is configured by setting the `monitoring.sentry.dsn` key in [the hocon file](/docs/api-reference/enrichment-components/configuration-reference/) with the URL of your Sentry server:

```json
"monitoring": {
  "sentry": {
    "dsn": "http://sentry.acme.com"
  }
}
```

---

# Enrich 4.0.x upgrade guide

> Upgrade guide for Snowplow Enrich 4.0.x covering license acceptance, atomic field limits, and deprecated assets.

> Source: https://docs.snowplow.io/docs/api-reference/enrichment-components/upgrade-guides/4-0-x-upgrade-guide/

## Breaking changes

### New license

Since version 4.0.0, Enrich has been migrated to use the [Snowplow Limited Use License](/limited-use-license-1.0/) ([FAQ](/docs/licensing/limited-use-license-faq/)).

### `stream-enrich` assets and `enrich-rabbitmq` deprecated

As announced a while ago, these assets are now removed from the codebase.

## Upgrading

### License acceptance

You have to explicitly accept the [Snowplow Limited Use License](/limited-use-license-1.0/) ([FAQ](/docs/licensing/limited-use-license-faq/)). To do so, either set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable, or update the following section in the configuration:

```hcl
{
  license {
    accept = true
  }
  ...
}
```

### Atomic fields limits

Several [atomic](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0) fields, such as `mkt_clickid`, have length limits defined (in this case, 128 characters). Recent versions of Enrich enforce these limits, so that oversized data does not break loading into the warehouse columns. However, over time we’ve observed that valid data does not always fit these limits. For example, TikTok click IDs can be up to 500 (or 1000, according to some sources) characters long.
In this release, we are adding a way to configure the limits, and we are increasing the default limits for several fields:

- `mkt_clickid` limit increased from `128` to `1000`
- `page_url` limit increased from `4096` to `10000`
- `page_referrer` limit increased from `4096` to `10000`

Depending on your [configuration](/docs/api-reference/enrichment-components/configuration-reference/), this might be a breaking change:

- If you have `featureFlags.acceptInvalid` set to `true` in Enrich, then you probably don’t need to worry, because you had no validation in the first place (although we do recommend enabling it).
- If you have `featureFlags.acceptInvalid` set to `false` (default), then previously invalid events might become valid (which is a good thing), and you need to prepare your warehouse for this eventuality:
  - For Redshift, you should resize the respective columns, e.g. to `VARCHAR(1000)` for `mkt_clickid`. If you don’t, Redshift will truncate the values.
  - For Snowflake and Databricks, we recommend removing the VARCHAR limit altogether. Otherwise, loading might break with longer values. Alternatively, you can alter the Enrich configuration to revert the changes in the defaults.
  - For BigQuery, no steps are necessary.

Below is an example of how to configure these limits:

```hcl
{
  ...
  # Optional. Configuration section for various validation-oriented settings.
  "validation": {
    # Optional. Configuration for custom maximum atomic fields (strings) length.
    # Map-like structure with keys being field names and values being their max allowed length
    "atomicFieldsLimits": {
      "app_id": 5
      "mkt_clickid": 100000
      # ...and any other 'atomic' field with custom limit
    }
  }
}
```

---

# Enrich 6.0.x upgrade guide

> Upgrade guide for Snowplow Enrich 6.0.x covering common-streams refactoring, configuration changes, and deprecated features.
> Source: https://docs.snowplow.io/docs/api-reference/enrichment-components/upgrade-guides/6-0-x-upgrade-guide/

In version 6.0.0, Enrich is refactored to use the [common-streams](https://github.com/snowplow-incubator/common-streams) libraries under the hood. [common-streams](https://github.com/snowplow-incubator/common-streams) is a collection of libraries containing the streaming-related constructs commonly used across many Snowplow streaming applications.

[common-streams](https://github.com/snowplow-incubator/common-streams) allows many different settings to be adjusted. It also provides battle-tested default values for most of these settings, so we recommend using the defaults whenever possible. You can find more information about the defaults in the [configuration reference](/docs/api-reference/enrichment-components/configuration-reference/).

We also took this opportunity to make a few breaking changes:

### Config Field Changes

In version 6.0.0, some of the config fields are renamed or moved to a different section:

- The `incomplete` stream config field is renamed to `failed`.
- The `acceptInvalid` and `exitOnJsCompileError` fields under the `featureFlags` section are moved under the `validation` section.
- The `experimental.metadata` section is moved to the root level.
- In enrich-kafka, the `output.good.headers` field is renamed to `output.good.attributes`.
- In enrich-kafka, the `blobStorage.azureStorage.accounts` section is moved to `blobClients.accounts`.
- In enrich-kafka, we are now using [static membership](https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A+Introduce+static+membership+protocol+to+reduce+consumer+rebalances) for the consumer, to reduce rebalancing in case of pod restart or crash. The default value for `group.instance.id` is set to the host name.

### Feature Deprecations

- The output `pii` stream is removed, as in our experience it is not used.
There will no longer be an option to write `pii_transformation` events to an extra output stream.
- Remote adapters are removed. This was another feature with little to no usage, which allowed Enrich to support custom payloads (sent to a configured URL for translation into the expected format). In practice, most of this can already be achieved with [Iglu Webhooks](/docs/sources/webhooks/iglu-webhook/).
- Reading events from files and writing events to files is no longer supported. This has never been a viable option for production setups.
- In enrich-kinesis, passing enrichment configs to the application via DynamoDB is no longer possible.
- In enrich-kinesis, it is no longer possible to send KCL metrics to CloudWatch.

## Metrics Changes

Existing metrics will continue to be emitted. Three new metrics are added:

- `failed`: Same value as the `incomplete` metric, for the transition. The goal is to remove the `incomplete` metric and to be consistent with the naming of streams/topics in the configuration.
- `e2e_latency_millis`: Same value as the `latency` metric, for the transition. The goal is to remove the `latency` metric so that the naming is consistent across applications.
- `latency_millis`: Delay between the input record getting written to the stream and Enrich starting to process it.

Furthermore, the old `latency` metric has changed subtly. Before, it represented the latency of the most recently processed event. Now it refers to the _maximum latency of all events_ since the previous metric was emitted.

---

# Upgrade guides for Enrich

> Step-by-step guides for upgrading Snowplow Enrich applications to newer versions with breaking changes and migration instructions.

> Source: https://docs.snowplow.io/docs/api-reference/enrichment-components/upgrade-guides/

This section contains information to help you upgrade to newer versions of Enrich.
## [📄️ 4.0.x upgrade guide](/docs/api-reference/enrichment-components/upgrade-guides/4-0-x-upgrade-guide/) [Upgrade guide for Snowplow Enrich 4.0.x covering license acceptance, atomic field limits, and deprecated assets.](/docs/api-reference/enrichment-components/upgrade-guides/4-0-x-upgrade-guide/) ## [📄️ 6.0.x upgrade guide](/docs/api-reference/enrichment-components/upgrade-guides/6-0-x-upgrade-guide/) [Upgrade guide for Snowplow Enrich 6.0.x covering common-streams refactoring, configuration changes, and deprecated features.](/docs/api-reference/enrichment-components/upgrade-guides/6-0-x-upgrade-guide/) --- # Failed event types > Reference guide for Snowplow failed event types including schema violations, enrichment failures, and loader errors, with recovery recommendations. > Source: https://docs.snowplow.io/docs/api-reference/failed-events/ This page lists all the possible types of [failed events](/docs/fundamentals/failed-events/). ## Where do failed events originate? While an event is being processed by the pipeline, it is checked to ensure it meets specific formatting and configuration expectations. These include checks like: does it match the schema it is associated with, were enrichments successfully applied, and was the payload sent by the tracker acceptable. Generally, the [Collector](/docs/api-reference/stream-collector/) tries to write any payload to the raw stream, no matter its content, and no matter whether it is valid. This explains why many of the failure types are filtered out by the [Enrich](/docs/api-reference/enrichment-components/) application, and not any earlier. > **Note:** The Collector might receive events in batches. If something is wrong with the Collector payload as a whole (e.g. due to a [Collector payload format violation](#collector-payload-format-violation)), the generated failed event would represent an entire batch of Snowplow events.
> > Once the Collector payload successfully reaches the validation and enrichment steps, it is split into its constituent events. Each of them would fail (or not fail) independently (e.g. due to an [enrichment failure](#enrichment-failure)). This means that each failed event generated at this stage represents a single Snowplow event. ## Schema violation This failure type is produced during the process of [validation and enrichment](/docs/pipeline/enrichments/). It concerns the [self-describing events](/docs/fundamentals/events/#self-describing-events) and [entities](/docs/fundamentals/entities/) which can be attached to your Snowplow event. **Details** In order for an event to be processed successfully: 1. There must be a schema in an [Iglu repository](/docs/api-reference/iglu/iglu-repositories/) corresponding to each self-describing event or entity. The enrichment app must be able to look up the schema in order to validate the event. 2. Each self-describing event or entity must conform to the structure described in the schema. For example, all required fields must be present, and all fields must be of the expected type. If your pipeline is generating schema violations, it might mean there is a problem with your tracking, or a problem with your [Iglu resolver](/docs/api-reference/iglu/iglu-resolver/), which lists where schemas should be found. The error details in the schema violation JSON object should give you a hint about what the problem might be. Snowplow customers should check in the Snowplow Console that all data structures are correct and have been [promoted to production](/docs/event-studio/data-structures/). Snowplow Self-Hosted users should check that the Enrichment app is configured with an [Iglu resolver file](/docs/api-reference/iglu/iglu-resolver/) that points to a repository containing the schemas. Next, check the tracking code in your custom application, and make sure the entities you are sending conform to the schema definition.
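To illustrate the second requirement, here is a minimal, hand-rolled sketch of required-field and type checking in Python. This is not the actual Enrich validation logic (Enrich performs full JSON Schema validation); the `ad_click` schema fragment and the helper names are hypothetical:

```python
# Hypothetical ad_click schema fragment: required fields and expected types.
# This is NOT the real Enrich validator, just an illustration of the checks.
AD_CLICK_SCHEMA = {
    "required": ["bannerId"],
    "properties": {"bannerId": {"type": "string"}},
}

JSON_TYPES = {"string": str, "boolean": bool, "number": (int, float)}

def violations(entity: dict, schema: dict) -> list:
    """Return human-readable violations of the schema by the entity."""
    errors = [
        f"missing required field: {field}"
        for field in schema.get("required", [])
        if field not in entity
    ]
    for field, spec in schema.get("properties", {}).items():
        if field in entity and not isinstance(entity[field], JSON_TYPES[spec["type"]]):
            errors.append(f"field {field}: expected {spec['type']}")
    return errors

print(violations({"bannerId": "4acd518feb82"}, AD_CLICK_SCHEMA))  # []
print(violations({"bannerId": 42}, AD_CLICK_SCHEMA))
```

A real schema violation bad row reports much richer error details, but the underlying checks are of this shape.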
Once you have fixed your tracking, you might want to also [recover the failed events](/docs/monitoring/recovering-failed-events/), to avoid any data loss. Because this failure is handled during enrichment, events in the real time good stream are free of this violation type. The schema violation schema can be found [here](https://github.com/snowplow/iglu-central/tree/master/schemas/com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema). ## Enrichment failure This failure type is produced by the [Enrich](/docs/pipeline/enrichments/) application, and it represents any failure by one of your configured enrichments to enrich the event. **Details** There are many reasons why an enrichment might fail; here are some examples: - You are using the [custom SQL enrichment](/docs/pipeline/enrichments/available-enrichments/custom-sql-enrichment/) but the credentials for accessing the database are wrong - You are using the [IP lookup enrichment](/docs/pipeline/enrichments/available-enrichments/ip-lookup-enrichment/) but have misconfigured the location of the MaxMind database - You are using the [custom API request enrichment](/docs/pipeline/enrichments/available-enrichments/custom-api-request-enrichment/) but the API server is not responding - The raw event contained an unstructured event field or a context field which was not valid JSON - An Iglu server responded with an unexpected error response, so the event schema could not be resolved If your pipeline is generating enrichment failures, it might mean there is a problem with your enrichment configuration. The error details in the enrichment failure JSON object should give you a hint about what the problem might be. Once you have fixed your enrichment configuration, you might want to also [recover the failed events](/docs/monitoring/recovering-failed-events/), to avoid any data loss. Because this failure is handled during enrichment, events in the real time good stream are free of this violation type.
The enrichment failure schema can be found [here](https://github.com/snowplow/iglu-central/tree/master/schemas/com.snowplowanalytics.snowplow.badrows/enrichment_failures/jsonschema). ## Collector payload format violation This failure type is produced by the [Enrich](/docs/pipeline/enrichments/) application, when Collector payloads from the raw stream are deserialized. **Details** Violations could be: - Malformed HTTP requests - Truncation - Invalid query string encoding in the URL - A path not respecting `/vendor/version` The most likely source of this failure type is bot traffic that has hit the Collector with an invalid HTTP request. Bots are prevalent on the web, so do not be surprised if your Collector receives some of this traffic. Generally you would ignore, and not try to recover, a Collector payload format violation, because it likely did not originate from a tracker or a webhook. Because this failure is handled during enrichment, events in the real time good stream are free of this violation type. The Collector payload format violation schema can be found [here](https://github.com/snowplow/iglu-central/tree/master/schemas/com.snowplowanalytics.snowplow.badrows/collector_payload_format_violation/jsonschema). ## Adapter failure This failure type is produced by the [Enrich](/docs/pipeline/enrichments/) application, when it tries to interpret a Collector payload from the raw stream as an HTTP request from a [3rd party webhook](/docs/sources/webhooks/). > **Info:** Many adapter failures are caused by bot traffic, so do not be surprised to see some of them in your pipeline. **Details** The failure could be: 1. The vendor/version combination in the Collector URL is not supported. For example, imagine an HTTP request sent to `/com.sandgrod/v3`, which is a misspelling of the [sendgrid adapter](http://sendgrid.com) endpoint. 2. The webhook sent by the 3rd party does not conform to the expected structure and list of fields for this webhook.
For example, imagine the 3rd party webhook payload is updated and stops sending a field that it was sending before. If you believe you are missing data because of a misconfigured webhook, you might fix the webhook and then [recover the failed events](/docs/monitoring/recovering-failed-events/). Because this failure is handled during enrichment, events in the real time good stream are free of this violation type. The adapter failure schema can be found [here](https://github.com/snowplow/iglu-central/tree/master/schemas/com.snowplowanalytics.snowplow.badrows/adapter_failures/jsonschema). ## Tracker protocol violation This failure type is produced by the [Enrich](/docs/pipeline/enrichments/) application, when an HTTP request does not conform to our [Snowplow Tracker Protocol](/docs/events/). **Details** Snowplow trackers send HTTP requests to the `/i` endpoint or the `/com.snowplowanalytics.snowplow/tp2` endpoint, and they are expected to conform to this protocol. Many tracker protocol violations are caused by bot traffic, so do not be surprised to see some of them in your pipeline. Another likely source is misconfigured query parameters if you are using the [pixel tracker](/docs/sources/pixel-tracker/). In this case, you might fix the application sending the events, and then [recover the failed events](/docs/monitoring/recovering-failed-events/). Because this failure is handled during enrichment, events in the real time good stream are free of this violation type. The tracker protocol violation schema can be found [here](https://github.com/snowplow/iglu-central/tree/master/schemas/com.snowplowanalytics.snowplow.badrows/tracker_protocol_violations/jsonschema). ## Size violation This failure type can be produced either by the [Collector](/docs/api-reference/stream-collector/) or by the [Enrich](/docs/pipeline/enrichments/) application.
It happens when the size of the raw event or enriched event is too big for the output message queue. In this case, the event is truncated and wrapped in a size violation failed event instead. **Details** Failures of this type cannot be [recovered](/docs/monitoring/recovering-failed-events/). The best you can do is to fix any application that is sending oversized events. Because this failure is handled during collection or enrichment, events in the real time good stream are free of this violation type. The size violation schema can be found [here](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.badrows/size_violation/jsonschema/1-0-0). ## Loader parsing error This failure type can be produced by [any loader](/docs/api-reference/loaders-storage-targets/), if the enriched event in the real time good stream cannot be parsed in the canonical TSV event format. For example, if the row does not have enough columns (131 are expected) or the `event_id` is not a UUID. This error type is uncommon and unexpected, because it can only be caused by an invalid message in the stream of validated enriched events. **Details** This failure type cannot be [recovered](/docs/monitoring/recovering-failed-events/). The loader parsing error schema can be found [here](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.badrows/loader_parsing_error/jsonschema/2-0-0). ## Loader Iglu error This failure type can be produced by [any loader](/docs/api-reference/loaders-storage-targets/) and describes an error using the [Iglu](/docs/api-reference/iglu/) subsystem. **Details** For example: - A schema is not available in any of the repositories listed in the [Iglu resolver](/docs/api-reference/iglu/iglu-resolver/). - Some loaders (e.g.
[RDB loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/) and [Postgres loader](/docs/api-reference/loaders-storage-targets/snowplow-postgres-loader/)) make use of the "schema list" API endpoints, which are only implemented for an [Iglu server](/docs/api-reference/iglu/iglu-repositories/iglu-server/) repository. A loader Iglu error will be generated if the schema is in a [static repo](/docs/api-reference/iglu/iglu-repositories/static-repo/) or [embedded repo](/docs/api-reference/iglu/iglu-repositories/jvm-embedded-repo/). - The loader cannot auto-migrate a database table. If a schema version is incremented from `1-0-0` to `1-0-1`, then it is expected to be [a non-breaking change](/docs/api-reference/iglu/common-architecture/schemaver/), and many loaders (e.g. RDB loader) attempt to execute an `ALTER TABLE` statement to accommodate the new schema in the warehouse. But if the schema change is breaking (e.g. a string field changed to an integer field), then the database migration is not possible. This failure type cannot be [recovered](/docs/monitoring/recovering-failed-events/). The loader Iglu error schema can be found [here](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.badrows/loader_iglu_error/jsonschema/2-0-0). ## Loader recovery error (legacy) Only the [BigQuery repeater](/docs/api-reference/loaders-storage-targets/bigquery-loader/previous-versions/bigquery-loader-1.x/#snowplow-bigquery-repeater) generated this error. We call it "loader recovery error" because the purpose of the repeater was to recover from previously failed inserts. It represents the case when the software could not re-insert the row into the database due to a runtime failure or invalid data in the source. **Details** This failure type cannot be [recovered](/docs/monitoring/recovering-failed-events/).
The loader recovery error schema can be found [here](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.badrows/loader_recovery_error/jsonschema/1-0-0). ## Loader runtime error This failure type can be produced by any loader and generally describes any runtime error that we did not catch elsewhere. For example, a DynamoDB outage, or a null pointer exception. This error type is uncommon and unexpected, and it probably indicates a mistake in the configuration or a bug in the software. **Details** This failure type cannot be [recovered](/docs/monitoring/recovering-failed-events/). The loader runtime error schema can be found [here](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.badrows/loader_runtime_error/jsonschema/1-0-1). ## Relay failure This failure type is only produced by relay jobs, which transfer Snowplow data into a 3rd party platform. This error type is uncommon and unexpected, and it probably indicates a mistake in the configuration or a bug in the software. **Details** This failure type cannot be [recovered](/docs/monitoring/recovering-failed-events/). The relay failure schema can be found [here](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.badrows/relay_failure/jsonschema/1-0-0). ## Generic error This is a failure type for anything that does not fit into the other categories, and is unlikely enough that we have not created a special category. The failure error messages should give you a hint about what has happened. **Details** This failure type cannot be [recovered](/docs/monitoring/recovering-failed-events/). The generic error schema can be found [here](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.badrows/generic_error/jsonschema/1-0-0).
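All of the failure types above are emitted as self-describing JSONs, so the failure category can be read off the `schema` URI of a bad row. A minimal Python sketch, using an abbreviated and hypothetical bad row (real bad rows carry full `processor` and `failure` details):

```python
import json

# Abbreviated, hypothetical bad row; real ones contain full failure details.
bad_row = json.loads("""{
  "schema": "iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-0",
  "data": {"processor": {"artifact": "enrich"}, "failure": {}}
}""")

def failure_type(row: dict) -> str:
    # Iglu URI layout: iglu:vendor/name/format/version -> the name is the category
    return row["schema"].removeprefix("iglu:").split("/")[1]

print(failure_type(bad_row))  # schema_violations
```

Routing bad rows by this category is a common first step when deciding which failures are worth recovering.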
--- # Core Iglu components and data structures > Platform-agnostic core components for Iglu with SchemaKey, SchemaVer, and SchemaCriterion data structures for consistent schema handling. > Source: https://docs.snowplow.io/docs/api-reference/iglu/common-architecture/iglu-core/ Iglu is designed to be independent of any particular programming language or platform. However, there is a growing set of applications, besides clients and registries, using concepts that originated from Iglu. To keep data structures and behavior consistent among these applications, we're developing Iglu core libraries for different languages. ## Basic data structures Every language has its own unique features, and a particular Iglu core implementation may or may not use them. One common rule for all Iglu core implementations is to minimize dependencies; ideally, an Iglu core library should have no external dependencies. Another rule is to implement the required basic data structures (in the form of classes, structs, ADTs, or any other appropriate construct) and functions. ### SchemaKey This data structure contains information about a self-describing datum, such as a Snowplow unstructured event or context. It should also have related `parse` functions, which can parse a `SchemaKey` from the most common representations: an Iglu URI (a string of the form `iglu:com.acme/someschema/format/1-0-0`) and an Iglu path (the same, but without the `iglu:` protocol part). A reverse `asString` function is required as well. This can also include appropriate regular expressions to extract and validate the schema key. A function for parsing a `SchemaKey` from a JSON Schema is optional if there is no default JSON library (as there is in JavaScript), but it can be included within some interface. More information can be found in the [Self-describing JSON Schemas](/docs/api-reference/iglu/common-architecture/self-describing-json-schemas/) and [Self-describing JSONs](/docs/api-reference/iglu/common-architecture/self-describing-jsons/) wiki pages.
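As an illustration of what such `parse` and `asString` functions might look like, here is a Python sketch (not code from any official Iglu core library; the regular expression is a simplified assumption, not the exact one Iglu uses):

```python
import re

# Simplified Iglu URI pattern: iglu:vendor/name/format/MODEL-REVISION-ADDITION
IGLU_URI = re.compile(
    r"^iglu:([a-zA-Z0-9_.-]+)/([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)/(\d+-\d+-\d+)$"
)

def parse_schema_key(uri: str) -> dict:
    """Parse a SchemaKey from an Iglu URI, raising on malformed input."""
    m = IGLU_URI.match(uri)
    if m is None:
        raise ValueError(f"not a valid Iglu URI: {uri}")
    vendor, name, fmt, version = m.groups()
    return {"vendor": vendor, "name": name, "format": fmt, "version": version}

def as_string(key: dict) -> str:
    """The reverse operation: render a SchemaKey back to an Iglu URI."""
    return "iglu:{vendor}/{name}/{format}/{version}".format(**key)

key = parse_schema_key("iglu:com.acme/someschema/jsonschema/1-0-0")
print(as_string(key))  # iglu:com.acme/someschema/jsonschema/1-0-0
```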
### SchemaMap This entity is almost isomorphic to `SchemaKey` and contains the same information: vendor, name, format, and version. But unlike `SchemaKey`, it is supposed to be attached only to schemas, not to datums. In schemas, the same information usually has a different representation, and the version is always _full_, as opposed to a datum's possibly _partial_ version. ### SchemaVer This is the part of `SchemaKey` and `SchemaMap` that carries the semantic schema version: a triplet of MODEL, REVISION, and ADDITION. Like `SchemaKey`, it should provide a `parse` function with regular expressions as well as an `asString` method. It can be either _full_ (e.g. `1-2-0`) or _partial_ (e.g. `1-?-?`), the latter suited for schema inference. More information can be found in the dedicated wiki page: [SchemaVer](/docs/api-reference/iglu/common-architecture/schemaver/). ### SchemaCriterion The last core data structure is `SchemaCriterion`, the default way to filter self-describing entities. It represents a `SchemaKey` divided into six parts, where the last three (MODEL, REVISION, ADDITION) _can_ be left unfilled, so one can match all entities regardless of the unfilled parts. `SchemaCriterion` must also provide a regular expression, and `parse` and `asString` functions (with unfilled parts rendered as asterisks). Another required function is `matches`, which accepts a `SchemaCriterion` and a `SchemaKey` and returns a boolean indicating whether the key matched. Bear in mind that criteria matching versions like `.../*-1-*` or `.../*-*-0` are perfectly valid; they're useful if you want to match all initial schemas. ## Implementations Currently we have only the [Scala Iglu Core](https://github.com/snowplow/iglu/wiki/Scala-Iglu-Core), which can be considered the reference implementation. Besides the data structures described above, it includes type classes and container classes to improve type safety. These type classes and containers are completely optional in other implementations.
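The `matches` behavior described above can be sketched as follows (again a Python illustration under simplified assumptions, not code from any Iglu library; unfilled version parts are modeled as `None`):

```python
def parse_criterion(uri: str) -> tuple:
    """Parse e.g. iglu:com.acme/event/jsonschema/1-*-* into six parts."""
    vendor, name, fmt, version = uri.removeprefix("iglu:").split("/")
    model, revision, addition = (
        None if part == "*" else int(part) for part in version.split("-")
    )
    return (vendor, name, fmt, model, revision, addition)

def parse_key(uri: str) -> tuple:
    """Parse a full SchemaKey into the same six-part shape."""
    vendor, name, fmt, version = uri.removeprefix("iglu:").split("/")
    model, revision, addition = (int(part) for part in version.split("-"))
    return (vendor, name, fmt, model, revision, addition)

def matches(criterion: tuple, key: tuple) -> bool:
    """Unfilled (None) parts match anything; the rest must be equal."""
    return all(c is None or c == k for c, k in zip(criterion, key))

crit = parse_criterion("iglu:com.acme/event/jsonschema/1-*-*")
print(matches(crit, parse_key("iglu:com.acme/event/jsonschema/1-2-0")))  # True
print(matches(crit, parse_key("iglu:com.acme/event/jsonschema/2-0-0")))  # False
```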
--- # Iglu architecture and design principles > Technical design principles for the Iglu schema registry including self-describing JSON schemas, SchemaVer versioning, and schema resolution algorithms. > Source: https://docs.snowplow.io/docs/api-reference/iglu/common-architecture/ Iglu is built on a set of technical design decisions which are documented in this section. It is this set of design decisions that allows Iglu clients and repositories to interoperate. ## Common architecture aspects Please review the following design documents: - [Self-describing JSON Schemas](/docs/api-reference/iglu/common-architecture/self-describing-json-schemas/) - simple extensions to JSON Schema which **semantically identify** and version a given JSON Schema - [Self-describing JSONs](/docs/api-reference/iglu/common-architecture/self-describing-jsons/) - a standardized JSON format which co-locates a reference to the instance's JSON Schema alongside the instance's data - [SchemaVer](/docs/api-reference/iglu/common-architecture/schemaver/) - how we semantically version schemas - [Schema resolution](/docs/api-reference/iglu/common-architecture/schema-resolution/) - our public algorithm for how we determine in which order we check Iglu repositories for a given schema --- # Schema resolution algorithm for Iglu clients > Standard schema resolution algorithm for Iglu clients with registry prioritization, caching, and lookup strategies. > Source: https://docs.snowplow.io/docs/api-reference/iglu/common-architecture/schema-resolution/ This page describes the schema resolution algorithm, which is standard for all Iglu clients. Currently only the [Iglu Scala client](https://github.com/snowplow/iglu-scala-client) fully follows this algorithm; other clients may be missing some parts, but we're working on making their behavior consistent. ## 1.
Prerequisites Before going further, it is important to understand basic Iglu client configuration and essential concepts like Resolver, Registry (or Repository), and Schema. Here is a quick overview of these concepts; if you're already familiar with them, you may want to skip this section. Iglu clients are configured via a JSON object described in a dedicated schema called [resolver-config](https://github.com/snowplow/iglu-central/tree/master/schemas/com.snowplowanalytics.iglu/resolver-config/jsonschema). Here we'll be using the JSON resolver configuration, which is platform-independent and the most widespread. ### 1.1 Resolver The Resolver is the primary object of an Iglu client library; it contains all the logic necessary to fetch a requested schema from the appropriate registry (repository) and cache it properly. The Resolver has two main properties: a cache size (`cacheSize`) and a list of registries (`repositories`). ### 1.2 Registries **NOTE:** the term _repository_ is deprecated; _registry_ is the default term to use when referring to schema storage. So far, we've not renamed all occurrences, so for now the two can be used interchangeably. Each registry in the resolver configuration has several values common to all types of registries, such as `name`, `vendorPrefixes` and `priority`. Each registry also has a type, which is defined inside the `connection` property. The one important thing to know about registry types here is that each type has its own priority hardcoded inside the client library. Below we'll refer to this hardcoded priority as `classPriority` and to the user-defined priority as `instancePriority`. Usually, the "safer" the registry, the higher its `classPriority`, so local registries are preferred over remote ones. ### 1.3 Cache All Iglu clients use an internal cache to store registry responses. Thanks to this, it is safe to launch Hadoop/Spark jobs with an embedded Iglu client, as it will not generate an enormous number of IO calls.
#### 1.3.1 Cache algorithm The cache stores not just plain schemas, but information about the responses from each registry. This allows the client to make different decisions depending on what exactly went wrong with a particular request. Once a schema has been successfully fetched, it is stored until it gets evicted by the [LRU cache](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_Recently_Used_\(LRU\)) algorithm. This eviction in turn happens only if the cache map has reached its limit (defined by `cacheSize`) and the particular schema hasn't been requested for longer than all others. #### 1.3.2 Cache TTL Since version 0.5.0, the Iglu Scala Client supports the `cacheTtl` property. It is especially useful for real-time pipelines, which can otherwise cache a "failure" for a very long time; the TTL is a mechanism to ensure that a whole day's data won't go to the bad stream. Note, however, that the client also tries to re-resolve successfully fetched schemas; this allows operators to patch (re-upload) schemas without bringing the pipeline down (although patching is not recommended). `cacheTtl` is available since [version `1-0-2`](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-2) of the resolver config. ## 2. Lookup algorithm Overall, the schema resolution algorithm can be described by the following flowchart: ![](/assets/images/schema-resolution-flowchart-156afc49078500123a19aaf9fab7e4e7.png) A few important things to note: - If a registry responded with a "NotFound" error, a "missing" value will be cached and this registry won't be queried again until the "missing" value is evicted by the LRU algorithm - If a registry responded with an error other than "NotFound" (for example "TimeoutError", "NetworkError", "ServerFault", etc.), a "needToRetry" value will be cached and the Resolver will give this registry three more chances.
After three failed lookups, a "missing" value will be cached - These "missing" and "needToRetry" values in the cache are per-registry, not per-schema, which means that if `registryA` responded "NotFound" for the schema `iglu:com.acme/event/jsonschema/1-0-0` and `registryB` responded with a TimeoutError, the resolver will immediately abandon `registryA` but keep trying to query `registryB` up to three more times. ## 3. Registry priority For each particular schema lookup, registries are prioritized. In other words, they are sorted according to the following input parameters (ordered by their significance): - `vendorPrefix` - the Resolver will always look first in those registries whose `vendorPrefix`es match the `SchemaKey`'s vendor. This **does not** mean registries with an unmatched `vendorPrefix` will be skipped; it means they will be queried last. - `classPriority` - a value hardcoded in the client library for each type of registry. Whatever priority (a low integer value meaning high priority) was set up in the configuration for a particular registry, it will be overridden by `classPriority`, so an embedded registry will always be checked before an HTTP one (unless the priority is influenced by `vendorPrefix`) - `instancePriority` - a user-defined value. It influences only registries with the same `classPriority`. One important thing to note is that both priorities (`classPriority` and `instancePriority`) order registries in ascending order: a lower number means a higher priority. Think of it as an ascending list of numbers: `[1,2,3,4]` - the smallest always comes first. --- # SchemaVer semantic versioning for schemas > Overview of a semantic versioning system for JSON schemas.
> Source: https://docs.snowplow.io/docs/api-reference/iglu/common-architecture/schemaver/ _This page is adapted from the Snowplow Analytics blog post, [Introducing SchemaVer for semantic versioning of schemas](http://snowplowanalytics.com/blog/2014/05/13/introducing-schemaver-for-semantic-versioning-of-schemas/)._ ### Overview With the advent of our new self-describing JSON Schemas, it became necessary to implement some kind of versioning for those JSON Schemas so they could evolve over time. Our approach is based on [semantic versioning](http://semver.org/) (SemVer for short) which, as a reminder, looks like this: `MAJOR.MINOR.PATCH` - `MAJOR` which you're supposed to use when you make backwards-incompatible API changes - `MINOR` when you add backwards-compatible functionality - `PATCH` when you make backwards-compatible bug fixes As is, SemVer does not suit schema versioning well. Indeed, there is no such thing as a bug fix for a JSON Schema, and the idea of an API doesn't really translate to JSON Schemas either. That's why we decided to introduce our own schema versioning notion: SchemaVer.
SchemaVer is defined as follows: `MODEL-REVISION-ADDITION` - `MODEL` when you make a breaking schema change which will prevent interaction with _any_ historical data - `REVISION` when you introduce a schema change which _may_ prevent interaction with _some_ historical data - `ADDITION` when you make a schema change that is compatible with _all_ historical data ### Addition example By way of example, if we were to modify an existing JSON Schema representing an ad click with version `1-0-0` defined as follows: ```json { "$schema": "http://json-schema.org/schema#", "type": "object", "properties": { "bannerId": { "type": "string" } }, "required": ["bannerId"], "additionalProperties": false } ``` and introduce a new `impressionId` property to obtain the following JSON Schema: ```json { "$schema": "http://json-schema.org/schema#", "type": "object", "properties": { "bannerId": { "type": "string" }, "impressionId": { "type": "string" } }, "required": ["bannerId"], "additionalProperties": false } ``` Because the new `impressionId` is **not** a required property and because the `additionalProperties` in our `1-0-0` version was set to `false`, any historical data following the `1-0-0` schema will work with this new schema. According to our definition of SchemaVer, we are consequently looking at an `ADDITION` and the schema's version becomes `1-0-1`. ### Revision example If we continue with the same example, but modify the `additionalProperties` property to true to get the following schema: ```json { "$schema": "http://json-schema.org/schema#", "type": "object", "properties": { "bannerId": { "type": "string" }, "impressionId": { "type": "string" } }, "required": ["bannerId"], "additionalProperties": true } ``` We are now at version `1-0-2`. 
After a while, we decide to add a new `cost` property: ```json { "$schema": "http://json-schema.org/schema#", "type": "object", "properties": { "bannerId": { "type": "string" }, "impressionId": { "type": "string" }, "cost": { "type": "number", "minimum": 0 } }, "required": ["bannerId"], "additionalProperties": true } ``` The problem now is that since we modified `additionalProperties` to true before adding the `cost` field, someone might have added their own `cost` field in the meantime, following a different set of rules (for example, it could be an amount followed by the currency, such as 1.00$, making the effective type string and not number), so we cannot be sure that this new schema validates all historical data. As a result, this new JSON Schema is a `REVISION` of the previous one, and its version becomes `1-1-0`. ### Model example Time goes by and we choose to completely rework our JSON Schema, identifying an ad click only through a `clickId` property, so our schema becomes: ```json { "$schema": "http://json-schema.org/schema#", "type": "object", "properties": { "clickId": { "type": "string" }, "cost": { "type": "number", "minimum": 0 } }, "required": ["clickId"], "additionalProperties": false } ``` The change is so important that we cannot realistically expect our historical data to interact with this new JSON Schema; consequently, the `MODEL` is changed and the schema's version becomes `2-0-0`. Another important thing to notice is that we switched `additionalProperties` back to false in order to avoid unnecessary future revisions.
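Since a SchemaVer is a MODEL-REVISION-ADDITION triplet, versions compare component-wise, with MODEL the most significant part. A quick Python sketch (for illustration only), using the versions from the worked example above:

```python
def parse_schemaver(version: str) -> tuple:
    """Split a MODEL-REVISION-ADDITION string into a comparable triplet."""
    model, revision, addition = (int(part) for part in version.split("-"))
    return (model, revision, addition)

# The versions from the ad click example, in the order they were created:
history = ["1-0-0", "1-0-1", "1-0-2", "1-1-0", "2-0-0"]

# Component-wise tuple comparison reproduces the chronological order.
print(sorted(history, key=parse_schemaver) == history)  # True
```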
### Additional differences There are a few additional differences between our own SchemaVer and SemVer: - we use hyphens instead of periods to separate the components that make up our SchemaVer - the versioning starts with `1-0-0` instead of `0.1.0` The design considerations behind those decisions can be found in the blog post on [SchemaVer](http://snowplowanalytics.com/blog/2014/05/13/introducing-schemaver-for-semantic-versioning-of-schemas/). --- # Self-describing JSON schemas and vendor metadata > JSON Schema extension with self property containing vendor, name, format, and version metadata for schema identification. > Source: https://docs.snowplow.io/docs/api-reference/iglu/common-architecture/self-describing-json-schemas/ _This page is adapted from the Snowplow Analytics blog post, [Introducing self-describing JSONs](http://snowplowanalytics.com/blog/2014/05/15/introducing-self-describing-jsons/)._ With the explosion of possible event types caused by Snowplow going from a web analytics to a general event analytics platform, it became necessary to give some coherence to the events sent to Snowplow. Since Snowplow deals only with JSON, we chose to rely on JSON Schemas. In addition to the usual JSON Schema content, we decided to make the schema self-describing by adding information we already knew about it, such as: - `vendor`, which tells us who created this JSON Schema - `name`, which is the JSON Schema's name - `format`, which in our case will be a JSON Schema - `version`, which is the JSON Schema's version (using [SchemaVer](/docs/api-reference/iglu/common-architecture/schemaver/)) We encapsulated all this information in a `self` property.
As an example, we would go from this JSON Schema: ```json { "$schema": "http://json-schema.org/schema#", "type": "object", "properties": { "bannerId": { "type": "string" } }, "required": ["bannerId"], "additionalProperties": false } ``` to this one: ```json { "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "self": { "vendor": "com.snowplowanalytics", "name": "ad_click", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "bannerId": { "type": "string" } }, "required": ["bannerId"], "additionalProperties": false } ``` incorporating the aforementioned `self` property. Notice that we also changed the `$schema` property to [our own JSON Schema](http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#) which enforces the `self` property. To make our JSONs self-describing we still have to reference this JSON Schema in our JSONs. This process is described in [Self-describing JSONs](/docs/api-reference/iglu/common-architecture/self-describing-jsons/). --- # Self-describing JSON format > Standardized JSON format linking data instances to their schemas via Iglu URI references in a schema field. > Source: https://docs.snowplow.io/docs/api-reference/iglu/common-architecture/self-describing-jsons/ _This page is adapted from the Snowplow Analytics blog post, [Introducing self-describing JSONs](http://snowplowanalytics.com/blog/2014/05/15/introducing-self-describing-jsons/)._ In this section, we will be describing the approach we chose to link together a JSON with its JSON Schema in order to make it self-describing. Instead of embedding the JSON Schema directly into the JSON itself which would be very wasteful in terms of space, we chose only to store a reference to its JSON Schema. 
For example, let's say we have a JSON representing a click on an ad like so: ```json { "bannerId": "4acd518feb82" } ``` which is supposed to conform to this [self-describing JSON Schema](/docs/api-reference/iglu/common-architecture/self-describing-json-schemas/): ```json { "$schema": "http://json-schema.org/schema#", "self": { "vendor": "com.snowplowanalytics", "name": "ad_click", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "bannerId": { "type": "string" } }, "required": ["bannerId"], "additionalProperties": false } ``` Our self-describing JSON will look like this: ```json { "schema": "iglu:com.snowplowanalytics/ad_click/jsonschema/1-0-0", "data": { "bannerId": "4acd518feb82" } } ``` Notice the two main differences compared to our original JSON: 1. There is a new `schema` field located at the root of the JSON which contains (in a space-efficient format) all the information required to uniquely identify the associated JSON Schema. The schema's URI follows this pattern: ![](/assets/images/iglu-schema-key-bcb8f8d1b9814714ec9590690ebb4394.png) 1. The data contained in the original JSON has been encapsulated in a `data` field to prevent any accidental collisions should the JSON already have a `schema` field. This way, our JSON becomes de facto self-describing, embedding a link to its JSON Schema. Back to [Common architecture](/docs/api-reference/iglu/common-architecture/). --- # Set up an Iglu Central mirror > Create public mirrors or private clones of Iglu Central schema registry for offline access or reduced latency. > Source: https://docs.snowplow.io/docs/api-reference/iglu/iglu-central-setup/ This guide is designed for Iglu users wanting to create a public mirror or private clone of [Iglu Central](/docs/api-reference/iglu/iglu-repositories/iglu-central/). There are a couple of reasons you may want to do this: 1. You may want to access Iglu Central from a software system that cannot access the open internet. 2.
You may want a mirror of Iglu Central which has lower latency to your software system. This guide is divided into two sections: 1. Create Iglu Central Mirror 2. Update your Iglu client configuration to point to your new Iglu Central ## Create Iglu Central Mirror ### Hosting an Iglu Server based mirror You can mirror Iglu Central using [`igluctl`](/docs/api-reference/iglu/igluctl-2/index.md): ```bash git clone https://github.com/snowplow/iglu-central cd iglu-central igluctl static push --public schemas/ http://MY-IGLU-URL 00000000-0000-0000-0000-000000000000 ``` For further information on Iglu Central, consult the [Iglu Central setup guide](/docs/api-reference/iglu/iglu-central-setup/). ### Hosting a Static Repository based mirror Iglu Central is built on top of the Iglu static repo server, so the first step is to [set up a static repo](/docs/api-reference/iglu/iglu-repositories/static-repo/). You can give your copy of Iglu Central a name like: ```text http://iglucentral.acme.com ``` Once you have completed this static repo setup, copy into your `/schemas` sub-folder **all** of the schemas that you can find [in the Iglu Central GitHub Repo](https://github.com/snowplow/iglu-central/tree/master/schemas). Once you have done this, check that your schemas are publicly accessible, for example: ```text http://iglucentral.acme.com/schemas/com.snowplowanalytics.self-desc/instance/jsonschema/1-0-2 ``` ## Update your Iglu client configuration You now need to update your Iglu client configuration to point to your Iglu Central mirror, rather than the original.
Given a standard Iglu client configuration: ```json { "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-2", "data": { "cacheSize": 500, "repositories": [ { "name": "Iglu Central", "priority": 0, "vendorPrefixes": [ "com.snowplowanalytics" ], "connection": { "http": { "uri": "https://iglucentral.com" } } } ] } } ``` Update it to point to your Iglu Central mirror: ```json { "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-2", "data": { "cacheSize": 500, "repositories": [ { "name": "Acme Corp's Iglu Central mirror", "priority": 0, "vendorPrefixes": [ "com.snowplowanalytics" ], "connection": { "http": { "uri": "http://iglucentral.acme.com" } } } ] } } ``` And that's it - your Iglu client should now resolve to your Iglu Central mirror, rather than the original. --- # Iglu client libraries > Client libraries for resolving schemas from Iglu repositories in Scala and Objective-C with embedded and remote repository support. > Source: https://docs.snowplow.io/docs/api-reference/iglu/iglu-clients/ Iglu clients are used for interacting with Iglu server repos and for resolving schemas in embedded and remote Iglu schema repositories. ## Technical architecture In this diagram we show an Iglu client resolving a schema from Iglu Central, one embedded repository and a further two remote HTTP repositories: ![](/assets/images/iglu-clients-2a639a6f765d5146f869eb947a42f15c.png) For more information on the rules governing resolving schemas from multiple repositories, see [Schema resolution](/docs/api-reference/iglu/common-architecture/schema-resolution/). 
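As a rough illustration of the priority idea behind resolution (the authoritative rules live on the Schema resolution page linked above; the repository names and priorities here are made up), a client prefers repositories whose `vendorPrefixes` match the schema's vendor, then falls back to `priority`, with a lower value tried first:

```python
# Hypothetical sketch of the repository-selection idea behind schema
# resolution. This is NOT the real Iglu client algorithm, just an
# illustration of how vendorPrefixes and priority interact.

def resolution_order(schema_vendor, repositories):
    def sort_key(repo):
        # Repositories whose vendorPrefixes match the schema's vendor
        # come first; ties are broken by priority (lower = tried first).
        matches = any(schema_vendor.startswith(p)
                      for p in repo.get("vendorPrefixes", []))
        return (0 if matches else 1, repo["priority"])
    return [r["name"] for r in sorted(repositories, key=sort_key)]

repos = [
    {"name": "Iglu Central", "priority": 0,
     "vendorPrefixes": ["com.snowplowanalytics"]},
    {"name": "Acme repo", "priority": 1,
     "vendorPrefixes": ["com.acme"]},
]
print(resolution_order("com.acme", repos))
# ['Acme repo', 'Iglu Central']
```

Even though "Iglu Central" has the lower priority number, the vendor-prefix match sends the lookup to "Acme repo" first for `com.acme` schemas.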
## Available Iglu clients There are currently two Iglu client libraries implemented: | **Client** | **Description** | **Status** | | ------------------------------------------------------------- | ------------------------------------- | ---------------- | | [Scala client](https://github.com/snowplow/iglu-scala-client) | An Iglu client and resolver for Scala | Production-ready | | [Objc client](https://github.com/snowplow/iglu-objc-client) | An Iglu client and resolver for OSX | Unsupported | --- # Objective-C Iglu client > Iglu client library for Objective-C with JSON schema resolution and validation for iOS 7.0+ and macOS 10.9+. > Source: https://docs.snowplow.io/docs/api-reference/iglu/iglu-clients/objc-client/ The [Iglu Objc client](https://github.com/snowplow/iglu-objc-client) allows you to resolve JSON Schemas from embedded and remote repositories. It does not yet let you write to repositories in any way (e.g. you can't publish new schemas to an Iglu repository). This client library should be straightforward to use if you are comfortable with Objective-C development. ## Client compatibility The Obj-C client is compatible with OSX 10.9+ and iOS 7.0+. ## Dependencies The library is dependent on [KiteJSONValidator](https://github.com/samskiter/KiteJSONValidator) for all JSON Schema validation. ## Setup ### CocoaPods We support installing the Obj-C client via CocoaPods, which is the easiest way to install it: 1. Install CocoaPods using `gem install cocoapods` 2. Create the file `Podfile` in the root of your Xcode project directory, if you don't have one already 3. Add the following line into it: ```ruby pod 'SnowplowIgluClient' ``` 4. Run `pod install` in the same directory ### Manual Setup If you prefer not to use CocoaPods, you can grab the client from our [GitHub repo](https://github.com/snowplow/iglu-objc-client.git) and import it into your project.
#### Clone the client First, git clone the latest version of the client to your local machine: ```bash git clone https://github.com/snowplow/iglu-objc-client.git ``` If you don't have git installed locally, [install it](http://git-scm.com/downloads) first. #### Copy the client into your project You first need to copy the client's `SnowplowIgluClient` sub-folder into your Xcode project's folder. The command will look something like this: ```bash cp -r iglu-objc-client/SnowplowIgluClient MyObjcApp/MyObjcApp/ ``` - Replace `MyObjcApp` with the name of your own app, and tweak the source code sub-folder accordingly. - Next, drag and drop the sub-folder `MyObjcApp/MyObjcApp/SnowplowIgluClient` into your Xcode project's workspace. - Make sure that the suggested options for adding `SnowplowIgluClient` are set to **Create groups**, then click **Finish**. #### Copy required resources (Optional) The client requires two schemas for initial operation: the first for validating that a JSON is a correct self-describing JSON, and the second for validating the resolver-config JSON passed to it at startup. The client will look for these in a resource bundle named `SnowplowIgluResources`. To get this bundle you will need to: - Open the `SnowplowIgluClient.xcworkspace` in Xcode. - Build the `SnowplowIgluResources` scheme. - In your `Products` folder within Xcode you should now see a `SnowplowIgluResources.bundle`. - Copy this bundle to your project. Alternatively you can also include the standard Snowplow repository in your resolver-config: ```json { "name": "Iglu Central", "vendorPrefixes": [ "com.snowplowanalytics" ], "connection": { "http": { "uri": "https://iglucentral.com" } }, "priority": 0 } ``` This will allow the client to download the required schemas at runtime. ## Initialization Assuming you have completed the setup for your Objective-C project, you are now ready to initialize the Obj-C client.
### Importing the library All interactions are handled through the Obj-C client's `IGLUClient` class. Import the header for the client like so: ```objc #import "IGLUClient.h" ``` You are now ready to create your Obj-C client. ### JSON-based initialization You will need to supply either a resolver-config as an `NSString` or the URL to your resolver-config as an argument for the client. If a valid resolver-config is not passed in, the client will throw an **exception**. To make this step a touch easier we have included several utility functions for getting `NSString`s from URLs and file paths. For example, grabbing your resolver-config from a local source and creating the client could look like this: ```objc #import "IGLUUtilities.h" // Create Client NSString * resolverAsString = [IGLUUtilities getStringWithFilePath:@"your_iglu_resolver.json" andDirectory:@"Your_Directory" andBundle:[NSBundle bundleForClass:[self class]]]; IGLUClient * client = [[IGLUClient alloc] initWithJsonString:resolverAsString andBundles:nil]; ``` To create a client from a URL: ```objc // The URL is passed as an NSString IGLUClient * client = [[IGLUClient alloc] initWithUrlPath:@"https://raw.githubusercontent.com/snowplow/snowplow/master/3-enrich/config/iglu_resolver.json" andBundles:nil]; ``` The `andBundles:` argument of the client init accepts an `NSMutableArray` of bundle objects. These objects will be used to search for files for any embedded repositories you include. To add to the available bundles you can use: ```objc [client addToBundles:yourBundleObject]; ``` ## Validating JSON Once you have successfully created a client you can start validating your self-describing JSON. **NOTE:** All JSONs must first be parsed into an `NSDictionary` before they can be validated.
To parse your JSON String as an `NSDictionary` you can use `IGLUUtilities` like so: ```objc NSDictionary * jsonDictionary = [IGLUUtilities parseToJsonWithString:yourStringHere]; ``` To validate your JSON: ```objc BOOL result = [client validateJson:jsonDictionary]; ``` The above command is telling the client to: - Check the JSON is a valid self-describing JSON - Check the JSON validates against its own schema Currently the only output from the client will be a `YES` or `NO` response, as the underlying library does not yet support error reporting. --- # Scala Iglu client > Production-ready Scala Iglu client and schema resolver for JVM applications with SBT and Gradle integration. > Source: https://docs.snowplow.io/docs/api-reference/iglu/iglu-clients/scala-client-setup/ The [Scala client](https://github.com/snowplow/iglu-scala-client) is an Iglu client and schema resolver implemented in Scala. Setting up the Scala client to use from your own code is straightforward. For actual examples of initialization you can look at the [Scala client](https://github.com/snowplow/iglu-scala-client) page. ## Integration options To minimize jar bloat, we have tried to keep external dependencies to a minimum. The main dependencies are on Jackson and JSON Schema-related libraries. ## Setup ### Hosting The Scala client is published to Snowplow's [hosted Maven repository](http://maven.snplow.com), which should make it easy to add it as a dependency into your own JVM app. The current version of the Scala client is 4.0.3. ### SBT Add this to your SBT config: ```scala // Dependency val igluClient = "com.snowplowanalytics" %% "iglu-scala-client" % "4.0.3" ``` ### Gradle Add into `build.gradle`: ```gradle dependencies { ... // Iglu client compile 'com.snowplowanalytics:iglu-scala-client:4.0.3' } ``` Now read the [Scala client API](https://github.com/snowplow/iglu-scala-client) to start using the Scala client.
--- # Iglu Central public schema repository > Public machine-readable repository of Snowplow JSON schemas hosted on Amazon S3 with self-hosting support via igluctl. > Source: https://docs.snowplow.io/docs/api-reference/iglu/iglu-repositories/iglu-central/ [Iglu Central](https://iglucentral.com/) is a public repository of JSON Schemas hosted by Snowplow Analytics. As far as we know, Iglu Central is the first public **machine-readable** schema repository - all prior efforts we have seen are human-browsable directories of articles about schemas (e.g. [schema.org](http://schema.org/)). Think of Iglu Central as being like [RubyGems.org](http://rubygems.org/) or [Maven Central](http://central.maven.org/), but for storing publicly available JSON Schemas. ## Technical architecture Under the hood, Iglu Central is built and run as a static Iglu repository, which is simply an Iglu repository server structured as a static website serving its whole content over HTTP, and is hosted on Amazon S3. ![iglu-central-img](/assets/images/iglu-central-c0427b712c8c80ad53d1a8a2b7e6871d.png) The [deployment process](/docs/api-reference/iglu/iglu-central-setup/) for Iglu Central is documented on this wiki in case a user wants to set up a public mirror or private instance of Iglu Central. Iglu Central is available for view at [https://iglucentral.com](https://iglucentral.com/). Although Iglu Central is primarily designed to be consumed by [Iglu clients](/docs/api-reference/iglu/iglu-clients/), the root index page for Iglu Central links to all schemas currently hosted on Iglu Central. ## Self Hosting Iglu Central schemas The schemas for Iglu Central are stored in GitHub, in [snowplow/iglu-central](https://github.com/snowplow/iglu-central).
You can mirror Iglu Central using [`igluctl`](/docs/api-reference/iglu/igluctl-2/index.md): ```bash git clone https://github.com/snowplow/iglu-central cd iglu-central igluctl static push --public schemas/ http://CHANGE-TO-MY-IGLU-URL.elb.amazonaws.com 00000000-0000-0000-0000-000000000000 ``` For further information on Iglu Central, consult the [Iglu Central setup guide](/docs/api-reference/iglu/iglu-central-setup/). --- # Iglu Server > RESTful interface for publishing, testing, and serving Iglu schemas with comprehensive API endpoints for schema management, validation, and authentication. > Source: https://docs.snowplow.io/docs/api-reference/iglu/iglu-repositories/iglu-server/ The [Iglu Server](https://github.com/snowplow-incubator/iglu-server/) is an Iglu schema registry which allows you to publish, test and serve schemas via an easy-to-use RESTful interface. It is split into a few services which will be detailed in the following sections. ## Setup an Iglu Server Information on setting up an instance of the Iglu Server can be found in [the setup guide](/docs/api-reference/iglu/iglu-repositories/iglu-server/setup/). ## 1. The schema service (`/api/schemas`) The schema service allows you to interact with Iglu schemas using simple HTTP requests. ### 1.1 POST requests Sending a `POST` request to the schema service allows you to publish a new self-describing schema to the repository.
As an example, let's assume you own the `com.acme` vendor prefix (more information on that can be found in the API authentication section) and have a JSON schema defined as follows: ```json { "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "description": "Schema for an Acme Inc ad click event", "self": { "vendor": "com.acme", "name": "ad_click", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "clickId": { "type": "string" }, "targetUrl": { "type": "string", "minLength": 1 } }, "required": ["targetUrl"], "additionalProperties": false } ``` The schema can be added to your repository by making a `POST` request to the following endpoint with the schema included in the request's body: ```text HOST/api/schemas/ ``` By default, the schema will not be public (available to others) - this can be changed by adding an `isPublic` query parameter and setting its value to `true`. For example, the following request: ```bash curl HOST/api/schemas -X POST -H "apikey: YOUR_APIKEY" -d @myschema.json ``` will produce a response like this one, if no errors are encountered: ```json { "message": "Schema created", "updated": false, "location": "iglu:com.acme/ad_click/jsonschema/1-0-0", "status":201 } ``` _Please note:_ This endpoint must be used with an API key with a `schema_action` permission of `CREATE`. ### 1.2 PUT requests Another way of adding schemas is using a `PUT` request. Just like a `POST` request, it allows you to publish a schema to the Iglu Server by including it in the request's body, with an optional `isPublic` parameter used to set the visibility of a schema. Unlike `POST` requests, `PUT` requests require a schema's Iglu URI to be included in the request URI (i.e. `HOST/api/schemas/vendor/name/format/version`). However, this means that a schema included in the request's body can be non-self-describing as well as self-describing.
Note that in the latter case the URL path must match the schema's metadata, so a schema described as `iglu:com.snplow/foo/jsonschema/1-0-0` cannot be published using the `/api/schemas/com.acme/bar/jsonschema/1-0-0` PUT endpoint. For example: ```bash curl HOST/api/schemas/com.acme/ad_click/jsonschema/1-0-0 -X PUT -H "apikey: YOUR_APIKEY" -d @myschema.json ``` _Please note:_ This endpoint must be used with an API key with a `schema_action` permission of `CREATE`. ### 1.3 Single-schema GET requests A schema previously added to the repository can be retrieved by making a `GET` request: ```bash curl HOST/api/schemas/com.acme/ad_click/jsonschema/1-0-0 -X GET -H "apikey: YOUR_APIKEY" ``` The JSON response should look like this: ```json { "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "description": "Schema for an Acme Inc ad click event", "self": { "vendor": "com.acme", "name": "ad_click", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "clickId": { "type": "string" }, "targetUrl": { "type": "string", "minLength": 1 } }, "required": ["targetUrl"], "additionalProperties": false } ``` GET requests support a `repr` URL parameter, allowing you to specify three different ways of representing an Iglu schema. It can take one of three values: `canonical`, `meta`, or `uri`. `repr=canonical` returns the schema as a self-describing Iglu schema - it is also the default representation method if no query parameter is specified. (An example of this representation can be seen above.)
`repr=meta` adds an additional `metadata` field to the schema, containing some meta information about it - this would make the JSON response look like this: ```json { "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "description": "Schema for an Acme Inc ad click event", "self": { "vendor": "com.acme", "name": "ad_click", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "clickId": { "type": "string" }, "targetUrl": { "type": "string", "minLength": 1 } }, "required": ["targetUrl"], "additionalProperties": false, "metadata": { "createdAt":"2019-04-01T14:23:45.173728Z", "updatedAt":"2019-04-01T14:23:45.173728Z", "isPublic":true } } ``` `repr=uri` simply returns a schema's Iglu URI - this is used internally in the Iglu Server, but public requests are also supported. A response with this URL parameter set would look like this: ```text "iglu:com.acme/ad_click/jsonschema/1-0-0" ``` _Please note:_ While `metadata`/`body` query parameters used in previous versions of the Iglu Server are supported, they have been deprecated in favor of the single `repr` parameter. ### 1.4 Multiple GET requests You can also retrieve multiple schemas in a single `GET` request using a few different endpoints. #### Vendor-based requests Every schema belonging to a single vendor can be retrieved by sending a `GET` request to the following endpoint: ```text HOST/api/schemas/vendor ``` ```bash curl HOST/api/schemas/com.acme -X GET -H "apikey: YOUR_APIKEY" ``` You will get back an array of every schema belonging to this vendor. You can use the same approach to get a list of schemas by vendor and name: ```text HOST/api/schemas/vendor/name ``` ```bash curl HOST/api/schemas/com.acme/ad_click -X GET -H "apikey: YOUR_APIKEY" ``` Or simply retrieve every single public schema accessible to you: ```text HOST/api/schemas ``` or `/api/schemas/public` in pre-0.5.0 releases.
```bash curl HOST/api/schemas -X GET -H "apikey: YOUR_APIKEY" ``` _Please note:_ you can only retrieve schemas that can be read by your API key. This means that if you do not own a vendor you're requesting schemas for, you will only be able to retrieve the vendor's public schemas (if any exist). ### 1.5 Swagger support We have added [Swagger](https://swagger.io/) support to our API so you can explore the repository server’s API interactively. The Swagger UI is available at the following URL: ```text http://$HOST/static/swagger-ui/index.html ``` ## 2. Schema validation and the validation service (`/api/validation`) When adding a schema to the repository, the repository will validate that the provided schema is self-describing - an overview of this concept can be found in the [Self-describing JSON schemas](/docs/api-reference/iglu/common-architecture/self-describing-json-schemas/) wiki page. In practice this means your schema should contain a `self` property, which itself contains the following properties: - `vendor` - `name` - `format` - `version` Non-self-describing schemas can only be added to a repository using a PUT call to the schemas API that specifies the schema's vendor, name, format and version in the URL itself rather than in the schema.
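In code, the self-describing check described above boils down to verifying that a `self` object carrying those four properties is present. Here is a hypothetical Python sketch of that check (the Iglu Server's real validation is considerably more thorough):

```python
# Illustrative sketch only: checks that a schema carries a "self" object
# with the four required properties (vendor, name, format, version).

REQUIRED_SELF_KEYS = {"vendor", "name", "format", "version"}

def is_self_describing(schema):
    """Return True if the schema has a complete "self" metadata object."""
    self_block = schema.get("self")
    return isinstance(self_block, dict) and REQUIRED_SELF_KEYS <= self_block.keys()

schema = {
    "self": {"vendor": "com.acme", "name": "ad_click",
             "format": "jsonschema", "version": "1-0-0"},
    "type": "object",
}
print(is_self_describing(schema))              # True
print(is_self_describing({"type": "object"}))  # False
```

A schema failing this check can still enter the repository, but only via the PUT endpoint that carries the vendor, name, format, and version in the URL.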
The Iglu Server's Validation service can also be used to check that a schema is valid without adding it to the repository, using the following endpoint: ```text POST HOST/api/schemas/validation/validate/schema/format ``` ```bash curl HOST/api/validation/validate/schema/jsonschema -X POST -d @myevent.json ``` The response received will be a detailed report containing information about the schema's validity, as well as potential errors or warnings: ```json { "message": "The schema has some issues", "report": [ { "message": "The schema is missing the \"description\" property", "level": "INFO", "pointer": "/properties/targetUrl" }, { "message": "A string type in the schema doesn't contain \"maxLength\" or format which is required", "level": "WARNING", "pointer": "/properties/targetUrl" }, { "message": "The schema is missing the \"description\" property", "level": "INFO", "pointer": "/properties/clickId" }, { "message": "A string type in the schema doesn't contain \"maxLength\" or format which is required", "level": "WARNING", "pointer": "/properties/clickId" }, { "message": "Use \"type: null\" to indicate a field as optional for properties clickId", "level": "INFO", "pointer": "/" } ] } ``` Another endpoint in the validation service allows you to validate self-describing _data_ against a schema already located in the Iglu Server repository, if it is accessible to your API key: ```text POST HOST/api/schemas/validation/validate/instance ``` ```bash curl HOST/api/validation/validate/instance -X POST -H "apikey: YOUR_APIKEY" -d @myevent.json ``` The service will then either confirm the instance's validity: ```json { "message": "Instance is valid iglu:com.acme/ad_click/jsonschema/1-0-0" } ``` Or, if it has some issues, return a detailed report about its problems: ```json { "message":"The instance is invalid against its schema", "report":[ { "message": "$.targetUrl: must be at least 1 characters long", "path": "$.targetUrl", "keyword": "minLength", "targets": [ "1" ] } ] }
``` Like the schema service, the validation service is also accessible through Swagger UI. ## 3. Drafts service (`/api/drafts`) The draft schema service lets you interact with draft schemas using simple HTTP requests. Draft schemas are variants of Iglu schemas with simple versions that start with `1` and can be increased as needed. ### 3.1 POST requests Sending a `POST` request to the draft service allows you to publish a new self-describing schema to the repository. As an example, let's assume you own the `com.acme` vendor prefix (more information on that can be found in the API authentication section) and have a JSON schema defined as follows: ```json { "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "description": "Schema for an Acme Inc ad click event", "self": { "vendor": "com.acme", "name": "ad_click", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "clickId": { "type": "string" }, "targetUrl": { "type": "string", "minLength": 1 } }, "required": ["targetUrl"], "additionalProperties": false } ``` The schema can be added to your repository as a draft by making a `POST` request to the following endpoint with the schema included in the request's body, and its vendor, name, format and desired draft number added to the request's URL: ```text HOST/api/drafts/vendor/name/format/draftNumber ``` By default, the schema draft will not be public (available to others) - this can be changed by adding an `isPublic` query parameter and setting its value to `true`. 
For example, the following request: ```bash curl HOST/api/drafts/com.acme/ad_click/jsonschema/1 -X POST -H "apikey: YOUR_APIKEY" -d @myschema.json ``` will produce a response like this one, if no errors are encountered: ```json { "message": "Schema created", "updated": false, "location": "iglu:com.acme/ad_click/jsonschema/1", "status":201 } ``` _Please note:_ This endpoint must be used with an API key with a `schema_action` permission of `CREATE`. ### 3.2 Single-draft GET requests A schema draft previously added to the repository can be retrieved by making a `GET` request: ```bash curl HOST/api/drafts/com.acme/ad_click/jsonschema/1 -X GET -H "apikey: YOUR_APIKEY" ``` The JSON response should look like this: ```json { "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "description": "Schema for an Acme Inc ad click event", "self": { "vendor": "com.acme", "name": "ad_click", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "clickId": { "type": "string" }, "targetUrl": { "type": "string", "minLength": 1 } }, "required": ["targetUrl"], "additionalProperties": false } ``` GET requests support a `repr` URL parameter, allowing you to specify three different ways of representing an Iglu schema. It can take one of three values: `canonical`, `meta`, or `uri`. `repr=canonical` returns the schema as a self-describing Iglu schema - it is also the default representation method if no query parameter is specified. (An example of this representation can be seen above.)
`repr=meta` adds an additional `metadata` field to the schema, containing some meta information about it - this would make the JSON response look like this: ```json { "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "description": "Schema for an Acme Inc ad click event", "self": { "vendor": "com.acme", "name": "ad_click", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "clickId": { "type": "string" }, "targetUrl": { "type": "string", "minLength": 1 } }, "required": ["targetUrl"], "additionalProperties": false, "metadata": { "createdAt":"2019-04-01T14:23:45.173728Z", "updatedAt":"2019-04-01T14:23:45.173728Z", "isPublic":true } } ``` `repr=uri` simply returns a schema's Iglu URI - this is used internally in the Iglu Server, but public requests are also supported. A response with this URL parameter set would look like this: ```text "iglu:com.acme/ad_click/jsonschema/1-0-0" ``` _Please note:_ While `metadata`/`body` query parameters used in previous versions of the Iglu Server are supported, they have been deprecated in favor of the single `repr` parameter. ## 4. Debug (`/api/debug`) and metadata (`/api/meta`) services The Iglu Server includes several endpoints for inspecting its own state. The `/api/debug` endpoint is only active when `debug` is set to true in the Iglu Server's configuration file, and returns the Iglu Server's internal state if in-memory storage is used instead of PostgreSQL. 
**This endpoint should be used for internal development and testing only!** The `/api/meta/health` endpoint will respond with a simple OK string if the server is available: ```bash curl HOST/api/meta/health OK ``` The `/api/meta/health/db` endpoint is similar, but will also check if the database is accessible if PostgreSQL storage is used (in-memory storage is always accessible): ```bash curl HOST/api/meta/health/db OK ``` The `/api/meta/server` endpoint returns information about the server - its version, database type, certain configuration settings, etc. If an `apiKey` header is included in the request, it will also return information about the key's permissions and the number of schemas accessible to it: ```bash curl HOST/api/meta/server -H 'apikey: YOUR_APIKEY' { "version": "0.6.0", "authInfo": { "vendor": "", "schema": "CREATE_VENDOR", "key": [ "CREATE", "DELETE" ] }, "database": "postgres", "schemaCount": 18, "debug": true, "patchesAllowed": false } ``` ## 5. API keys and the authentication service (`/api/auth`) In order to use the routes of the Iglu Server's API that require either write access to the repository or read access to non-public schemas, you will need an API key, passed as an `apiKey` HTTP header to relevant calls of those services. The Iglu Server's administrator is responsible for distributing API keys to the repository's clients. To do so, they will need a super API key which will let them generate other keys - this key will have to be manually added to the database: ```sql INSERT INTO permissions VALUES ('api_key_here', '', TRUE, 'CREATE_VENDOR'::schema_action, '{"CREATE", "DELETE"}'::key_action[]) ``` Thanks to this super API key, the server's administrator will be able to use the API key generation service, which lets them create and revoke API keys.
A pair of API keys for a given vendor can be generated by submitting a POST request to the keygen endpoint, with the prefix included in the request's body:

```text
POST HOST/api/auth/keygen
```

```bash
curl HOST/api/auth/keygen \
  -X POST \
  -H 'apikey: ADMIN_APIKEY' \
  -H "Content-Type: application/json" \
  -d '{"vendorPrefix": "com.acme"}'
```

If no errors occur, the method should return two UUIDs that act as API keys for the given vendor:

```json
{
  "read": "bfa90866-aa14-4b92-b6ef-d421fd688b54",
  "write": "6175aa41-d3b7-4e4f-9fb4-3a170f3c6c16"
}
```

One of those API keys will have read access and will let you retrieve private schemas or drafts (or other vendors' public schemas) through `GET` requests. The other will have both read and write access - you will therefore be able to publish and modify schemas or drafts through `POST` and `PUT` requests in addition to being able to retrieve them. It is then up to you to distribute those two keys however you want. The keys grant access to every schema whose vendor starts with `com.acme`, though wildcard (`*`) vendor keys can also be generated.

Existing API keys can be revoked by sending a `DELETE` request to the same endpoint:

```text
DELETE HOST/api/auth/keygen
```

```bash
curl HOST/api/auth/keygen?key=APIKEY_TO_DELETE -X DELETE -H 'apikey: ADMIN_APIKEY'
```

If the operation succeeds, it will return a simple JSON response:

```json
{
  "message": "Keys have been deleted"
}
```

---

# Iglu Server configuration reference

> Complete reference of all configuration options for Iglu Server, including database, networking, webhooks, and advanced settings.
> Source: https://docs.snowplow.io/docs/api-reference/iglu/iglu-repositories/iglu-server/reference/

This is a complete list of the options that can be configured in the Iglu Server HOCON config file. The [example configs in github](https://github.com/snowplow-incubator/iglu-server/tree/master/config) show how to prepare an input file.
## License

Iglu Server is released under the [Snowplow Limited Use License](https://docs.snowplow.io/limited-use-license-1.1/) ([FAQ](/docs/licensing/limited-use-license-faq/)).

To accept the terms of the license and run Iglu Server, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable. Alternatively, you can configure the `license.accept` option, like this:

```hcl
license {
  accept = true
}
```

## Common options

| parameter | description |
| --- | --- |
| `repoServer.interface` | Optional. Default: `0.0.0.0`. Address on which the server listens to http connections. |
| `repoServer.port` | Optional. Default: `8080`. Port on which the server listens. |
| `repoServer.idleTimeout` | Default: `30 seconds`. TCP connections are dropped after this timeout expires. In case Iglu Server runs behind a load balancer, this should slightly exceed the load balancer's idle timeout. |
| `repoServer.hsts.enable` _(since 0.12.0)_ | Default: `false`. Whether to send an [HSTS header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Strict-Transport-Security). |
| `repoServer.hsts.maxAge` _(since 0.12.0)_ | Default: `365 days`. The maximum age for the HSTS header. |
| `database.type` | Optional. Default: `postgres`. Can be changed to `dummy` during development for in-memory only storage. |
| `database.host` | Required. Host name for Postgres database. |
| `database.port` | Optional. Default: `5432`. Port for Postgres database. |
| `database.dbname` | Required. Name of Postgres database. |
| `database.username` | Required. Username for connecting to Postgres. |
| `database.password` | Required. Password for connecting to Postgres. |
| `swagger.baseUrl` | Optional. Example: `/custom/prefix`. Customise the api base url in Swagger. Helpful for when running iglu-server behind a proxy server. |
| `debug` | Optional. Default: `false`. Enable additional debug api endpoint to respond with all internal state. |
| `patchesAllowed` | Optional. Default: `false`. If `true`, allows overwriting a given version of a schema with new content. See [amending schemas](/docs/fundamentals/schemas/versioning/). |
| `webhooks.schemaPublished` | Optional. Array with the list of webhooks that will be called when a schema is published or updated with a vendor that matches the specified prefixes. See the [examples in github](https://github.com/snowplow-incubator/iglu-server/blob/0.8.7/config/config.reference.hocon#L81-L99). |
| `webhooks.schemaPublished.uri` | Required. URI of the HTTP server that will receive the webhook event. |
| `webhooks.schemaPublished.vendorPrefixes` | Optional. Example: `["com", "org.acme", "org.snowplow"]`. List of schema prefixes (regexes) that should be sent via the webhook. |
| `webhooks.schemaPublished.usePost` (since _0.8.7_) | Optional. Default: `false`. Whether to use `POST` to send request via the webhook. |
| `superApiKey` | Optional. Set a super api key with permission to read/write any schema, and add other api keys. |

## Advanced options

We believe these advanced options are set to sensible defaults, and hopefully you won't need to ever change them.

| parameter | description |
| --- | --- |
| `repoServer.threadPool.type` | Default: `fixed` for a fixed thread pool. Can be `cached` for a cached thread pool. Type of the thread pool used by the underlying BlazeServer for executing Futures. |
| `repoServer.threadPool.size` | Optional. Default: `4`. Size of the thread pool if the type is `fixed`. |
| `repoServer.maxConnections` | Optional. Default: `1024`. Maximum number of client connections that may be active at any time. |
| `database.pool.type` | Optional. Default: `hikari` (recommended for production). Can be changed to `nopool` to remove the upper bound on the number of connections. |
| `database.pool.maximumPoolSize` | Optional. Default: `5`. Maximum number of connections in the Hikari pool. |
| `database.pool.connectionTimeout` | Optional. Default: `30 seconds`. Timeout on the Hikari connection pool. |
| `database.pool.maxLifetime` | Optional. Default: `1800 seconds`. Maximum lifetime of a connection in the Hikari pool. |
| `database.pool.minimumIdle` | Optional. Default: `5`. Minimum number of idle connections in the Hikari pool. |
| `database.pool.connectionPool.type` | Optional. Default: `fixed` for a fixed thread pool (recommended in production). Type of the thread pool used for awaiting connection to the database. |
| `database.pool.connectionPool.size` | Optional. Default: `4`. Number of threads to use when the connection pool has type `fixed`. |
| `database.pool.transactionPool.type` | Optional. Default: `cached` (recommended for production). Type of the thread pool used for blocking JDBC operations. |
| `preTerminationPeriod` (since _0.8.0_) | Optional. Default: `1 second`. How long the server should pause after receiving a sigterm before starting the graceful shutdown. During this period the server continues to accept new connections and respond to requests. |
| `preTerminationUnhealthy` (since _0.8.0_) | Optional. Default: `false`. During the `preTerminationPeriod`, the server can be configured to return 503s on the `/health` endpoint. Can be helpful for removing the server from a load balancer's targets. |

---

# Setup guide for Iglu Server

> Deploy Iglu Server with Docker or Terraform for PostgreSQL-backed schema repository with RESTful API and authentication.
> Source: https://docs.snowplow.io/docs/api-reference/iglu/iglu-repositories/iglu-server/setup/

For more information on the architecture of the Iglu Server, please read [the technical documentation](/docs/api-reference/iglu/iglu-repositories/iglu-server/).

## Available on Terraform Registry

A Terraform module is available which deploys an Iglu Server on AWS EC2 without the need for this manual setup.

## 1. Run the Iglu Server

Iglu Server is [published on Docker Hub](https://hub.docker.com/repository/docker/snowplow/iglu-server).

```bash
$ docker pull snowplow/iglu-server:0.14.1
```

The application is configured by passing a HOCON file on the command line:

```bash
$ docker run --rm \
  -v $PWD/config.hocon:/iglu/config.hocon \
  snowplow/iglu-server:0.14.1 --config /iglu/config.hocon
```

Alternatively, you can download and run [a jar file from the github release](https://github.com/snowplow-incubator/iglu-server/releases):

```bash
$ java -jar iglu-server-0.14.1.jar --config /path/to/config.hocon
```

Here is an example of a minimal configuration file (HOCON, so commas between fields are optional):

```hcl
{
  "database": {
    "host": "postgres"
    "dbname": "igludb"
    "username": "postgres"
    "password": "mysecret"
  }
  "superApiKey": "bb7b7503-40d3-459c-943a-f8d31a6f5638"
}
```

See [the configuration reference](/docs/api-reference/iglu/iglu-repositories/iglu-server/reference/) for a complete description of all parameters. We also provide a [docker-compose.yml](https://github.com/snowplow-incubator/iglu-server/blob/master/docker/docker-compose.yml) to help you get started.

## 2. Initialize the database

> **Note:** Iglu Server has been successfully tested with PostgreSQL 16.3, but should work with PostgreSQL 8.2 or newer.
With a fresh install you need to manually create the database:

```bash
$ psql -U postgres -c "CREATE DATABASE igludb"
```

And then use the `setup` command of the Iglu Server to create the database tables:

```bash
$ docker run --rm \
  -v $PWD/config.hocon:/iglu/config.hocon \
  snowplow/iglu-server:0.14.1 setup --config /iglu/config.hocon
```

## 3. Use the API key generation service

The super API key you put in the configuration file is able to generate further API keys for your clients through the API key generation service. To generate a pair of read and write API keys for a specific vendor prefix, simply send a `POST` request to this URL using your super API key in an `apikey` HTTP header:

```text
HOST/api/auth/keygen
```

For example:

```bash
curl \
  HOST/api/auth/keygen \
  -X POST \
  -H 'apikey: your_super_apikey' \
  -d '{"vendorPrefix":"com.acme"}'
```

**Note:** From 0.6.0 the vendor prefix is supplied as `vendorPrefix` in a JSON body; prior to this it was a `vendor_prefix` query parameter.

You should receive a JSON response like this one:

```json
{
  "read": "an-uuid",
  "write": "another-uuid"
}
```

If you need to revoke a specific API key, you can do so by sending a `DELETE` request to the following endpoint:

```text
HOST/api/auth/keygen?key=some-uuid
```

For example:

```bash
curl \
  'HOST/api/auth/keygen?key=some-uuid' \
  -X DELETE \
  -H 'apikey: your_super_apikey'
```

You should now be all set up to use the Iglu Server. If you would like to know more about the Iglu Server, please read the [technical documentation](/docs/api-reference/iglu/iglu-repositories/iglu-server/).

## Dummy mode

Since 0.6.0, Iglu Server supports a dummy DB mode. In this mode, the server does not require persistent storage such as PostgreSQL and stores all data in memory. Use this for debugging purposes only: all your data will be lost after a restart. To enable dummy mode, set the `database.type` setting to `"dummy"`.
The dummy Iglu Server works with a single hardcoded master API key - `48b267d7-cd2b-4f22-bae4-0f002008b5ad` - which you can use to upload your schemas and create new API keys.

## Logging

Iglu Server uses the [SLF4J Simple Logger](https://www.slf4j.org/api/org/slf4j/impl/SimpleLogger.html) underneath, which can be configured via system properties. For example, the first property below redirects logs to a file and the second suppresses the very verbose `SelectorLoop` output:

```bash
$ java \
  -Dorg.slf4j.simpleLogger.logFile=server.log \
  -Dorg.slf4j.simpleLogger.log.org.http4s.blaze.channel.nio1.SelectorLoop=warn \
  -jar iglu-server-0.14.1.jar --config /path/to/config.hocon
```

On debug loglevel, `SchemaService` will print all HTTP requests and responses.

---

# Introduction to Iglu repositories for schema storage

> Remote and embedded Iglu repositories for storing and serving JSON schemas via HTTP or embedded in JVM applications.
> Source: https://docs.snowplow.io/docs/api-reference/iglu/iglu-repositories/

An Iglu repository acts as a store of data schemas (currently JSON Schemas only). Hosting JSON Schemas in an Iglu repository allows you to use those schemas in Iglu-capable systems such as Snowplow.

## Technical architecture

So far we support two types of Iglu repository:

1. **Remote repositories** - essentially websites containing schemas which an Iglu client can query over HTTP
2. **Embedded repositories** - which are embedded in a piece of software (typically alongside an Iglu client)

In this diagram we show an Iglu client resolving a schema from Iglu Central, one embedded repository and a further two remote HTTP repositories:

![iglu-repos-img](/assets/images/iglu-repos-82e5f47255a46f97fe0b46b8abe90934.png)

## Available Iglu repositories

We currently have three Iglu "repo" technologies available for deploying your Iglu repository - follow the links to find out more:

| **Repository** | **Category** | **Description** | **Status** |
| --- | --- | --- | --- |
| Iglu Server | Remote | An Iglu repository server structured as a RESTful API | Production-ready |
| Static repo | Remote | An Iglu repository server structured as a static website | Production-ready |
| JVM-embedded repo | Embedded | An Iglu repository embedded in a Java or Scala application | Production-ready |

## Iglu Central

[Iglu Central](https://iglucentral.com/) is a public repository of JSON Schemas hosted by [Snowplow Analytics](http://snowplowanalytics.com/). For more information on its technical architecture, see [Iglu Central](/docs/api-reference/iglu/iglu-central-setup/).

---

# JVM-embedded Iglu repository for applications

> Embed JSON schemas in Java or Scala application resources for bootstrap schema resolution without remote dependencies.
> Source: https://docs.snowplow.io/docs/api-reference/iglu/iglu-repositories/jvm-embedded-repo/

A JVM-embedded repo is an Iglu repository **embedded** inside a Java or Scala application, typically alongside the [Scala client](/docs/api-reference/analytics-sdk/analytics-sdk-scala/).

## Technical architecture

A JVM-embedded repo is simply a set of schemas stored in an Iglu-compatible path inside the `resources` folder of a Java or Scala application.
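Conceptually, resolving a schema from an embedded repo is just a classpath lookup. The sketch below (in Python for brevity; the function name is illustrative, not the Scala client's API) shows how an Iglu URI maps onto a resource path inside the repository:

```python
# Illustrative sketch: map an Iglu URI onto the resource path where an
# embedded repository stores the corresponding schema file.
# `embedded_resource_path` is a hypothetical name, not part of any SDK.

def embedded_resource_path(repo_root: str, iglu_uri: str) -> str:
    """Translate e.g. iglu:com.myvendor/myschema/jsonschema/1-0-0 into a
    classpath resource path under <repo_root>/schemas/."""
    assert iglu_uri.startswith("iglu:")
    vendor, name, fmt, version = iglu_uri[len("iglu:"):].split("/")
    return f"{repo_root}/schemas/{vendor}/{name}/{fmt}/{version}"
```

With the `/my-repo` path from the setup steps below, `iglu:com.myvendor/myschema/jsonschema/1-0-0` resolves to `/my-repo/schemas/com.myvendor/myschema/jsonschema/1-0-0`.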
As an embedded repo, there is no mechanism for updating the schemas stored in the repository following the release of the host application.

## Example

For an example of a JVM-embedded repo, check out the repository embedded in the Iglu Scala client itself. This embedded repository is used to bootstrap the Iglu Scala client with JSON Schemas that it needs before it can access any remote repositories.

## Setup

### 1. Prepare your files

You need to create a file structure for your JSON Schemas. Please check out the template we provide here. Make the following changes:

- Replace `com.myvendor` with your company domain, reverse-ordered
- Replace `myschema` with the name of your first JSON Schema
- Leave `jsonschema` as-is (we only support JSON Schemas for now)
- Replace `1-0-0` with the schema specification of your first JSON Schema

Writing JSON Schemas is out of scope for this setup guide - see [Self-describing-JSONs-and-JSON-Schemas](/docs/api-reference/iglu/common-architecture/self-describing-json-schemas/) for details.

Done? Now you are ready to embed your files.

### 2. Embed your files

You now need to embed your JSON Schema files into your Java or Scala application. The Iglu Scala client will expect to find these JSON Schema files included in the application as resources. Therefore, you should store the files in a path something like this:

```text
myapp/src/main/resources/my-repo/schemas
```

### 3. Update your Iglu client configuration

Finally, update your Iglu client configuration so that it can resolve your new repository. For details on how to do this, check out the page on [Iglu client configuration](/docs/api-reference/iglu/iglu-resolver/). In the case above, the `path` you would specify for your embedded Iglu repository would be simply `/my-repo`.

---

# Static Iglu repository on web servers

> Host Iglu schemas as a static website on S3, Apache, Nginx, or IIS for HTTP-accessible schema repositories.
> Source: https://docs.snowplow.io/docs/api-reference/iglu/iglu-repositories/static-repo/

A static repo is simply an Iglu repository server structured as a static website. [Iglu Central](/docs/api-reference/iglu/iglu-central-setup/) can be used as an example, [serving](https://iglucentral.com/) its whole content over HTTP.

## 1. Choose a hosting partner

We host our static Iglu registry using Amazon S3, but you can choose any existing webserver your company is already using, such as Apache, IIS or Nginx.

## 2. Prepare your files

You need to create a file structure for your JSON Schemas. Please check out the template we provide here. Make the following changes:

- Replace `com.myvendor` with your company domain, reverse-ordered
- Replace `myschema` with the name of your first JSON Schema
- Leave `jsonschema` as-is (we only support JSON Schemas for now)
- Replace `1-0-0` with the schema specification of your first JSON Schema

Writing JSON Schemas is out of scope for this setup guide - see [Self-describing-JSONs-and-JSON-Schemas](/docs/api-reference/iglu/common-architecture/self-describing-json-schemas/) for details.

Done? Now you are ready to host your files.

## 3. Host the files in your schema registry

To host your static schema registry, follow the AWS guide, [Host a Static Website on Amazon Web Services](http://docs.aws.amazon.com/gettingstarted/latest/swh/website-hosting-intro.html). To host your static schema on an alternative webserving platform, please consult the appropriate webserver documentation or talk to your Systems team.

## 4. Update your Iglu client configuration

Finally, update your Iglu client configuration so that it can resolve your new registry. For details on how to do this, check out the page on [Iglu client configuration](/docs/api-reference/iglu/iglu-resolver/).
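As an illustration of the registry layout from step 2, the sketch below (hypothetical helper, not part of Snowplow tooling) writes a self-describing schema into the expected folder structure, ready to be synced to S3 or any web server:

```python
import json
from pathlib import Path

# Hypothetical helper: place a self-describing schema at the standard
# static-registry path <root>/schemas/<vendor>/<name>/<format>/<version>.

def write_schema(registry_root: str, schema: dict) -> Path:
    s = schema["self"]
    path = Path(registry_root, "schemas",
                s["vendor"], s["name"], s["format"], s["version"])
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. schemas/com.myvendor/myschema/jsonschema
    path.write_text(json.dumps(schema, indent=2))
    return path
```

The resulting tree can then be uploaded as-is, for example with `aws s3 sync`.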
---

# Iglu Resolver configuration for Snowplow applications

> Configure Iglu Resolver for schema fetching and validation in Snowplow enrichers and loaders with cache and repository settings.
> Source: https://docs.snowplow.io/docs/api-reference/iglu/iglu-resolver/

Iglu Resolver is a component embedded into many Snowplow applications, including enrichers and loaders. It's responsible for fetching schemas from Iglu registries and validating data against these schemas. Most of the time, configuring the Iglu Resolver (or Client) means adding the following JSON file:

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
  "data": {
    "cacheSize": 500,
    "cacheTtl": 600,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": {
            "uri": "https://iglucentral.com"
          }
        }
      },
      {
        "name": "Custom Iglu Server",
        "priority": 0,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": {
            "uri": "https://${iglu_server_hostname}/api",
            "apikey": "${iglu_server_apikey}"
          }
        }
      }
    ]
  }
}
```

The above configuration assumes Snowplow-authored schemas (Iglu Central) will be used in a pipeline, and that you have your own registry (Iglu Server) hosted at `https://${iglu_server_hostname}/` with an API key, `${iglu_server_apikey}`, with read rights.

### Configuration parameters

- `cacheSize` determines how many individual schemas we will keep cached in our Iglu client (to save additional lookups)
- `cacheTtl` determines how long a schema can live in the cache before being reloaded (in seconds)
- `repositories` is a JSON array of repositories to look up schemas in
- `priority` and `vendorPrefixes` help the resolver to know which repository to check first for a given schema.
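The role of `priority` and `vendorPrefixes` can be sketched roughly as follows. This is a simplification in Python with an illustrative function name; the authoritative behavior is Iglu's documented resolution algorithm:

```python
# Simplified model: repositories whose vendorPrefixes match the schema's
# vendor are tried first; within each group, lower `priority` values win.
# Not the resolver's actual code - an assumption-laden illustration only.

def lookup_order(repositories: list[dict], schema_vendor: str) -> list[str]:
    def rank(repo: dict):
        prefix_match = any(schema_vendor.startswith(prefix)
                           for prefix in repo.get("vendorPrefixes", []))
        return (0 if prefix_match else 1, repo.get("priority", 0))
    return [repo["name"] for repo in sorted(repositories, key=rank)]
```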
For details see Iglu's [repository resolution algorithm](/docs/api-reference/iglu/common-architecture/schema-resolution/#3-registry-priority)

---

# Igluctl CLI for schema management

> Command-line tool for validating, publishing, and managing JSON schemas in Iglu registries with DDL generation and verification.
> Source: https://docs.snowplow.io/docs/api-reference/iglu/igluctl-2/

Iglu is a schema repository for JSON Schema. A schema repository (sometimes called a registry) is like npm or Maven or git, but holds data schemas instead of software or code. Iglu is used extensively in Snowplow. This document is for version 0.13.0.

## Igluctl

Iglu provides a CLI application, called igluctl, which allows you to perform the most common tasks on an Iglu registry. So far, the overall structure of igluctl commands looks like the following:

- `lint` - validate a set of JSON Schemas for syntax and consistency of their properties
- `static` - work with a static Iglu registry
- `generate` - verify that a schema has evolved correctly within the same major version (e.g. from `1-a-b` to `1-c-d`) for Redshift and Postgres warehouses, and generate DDLs and migrations from a set of JSON Schemas. If the schema has not evolved correctly and backward-incompatible data is sent within the transformer's aggregation window, loading would fail for all events.
- `push` - push a set of JSON Schemas from a static registry to a full-featured one (Scala Registry for example)
- `pull` - pull a set of JSON Schemas from a registry to a local folder
- `deploy` - run the entire schema workflow using a config file. This can be used to chain multiple commands, i.e. `lint` followed by `push` and `s3cp`.
- `s3cp` - copy JSONPaths or schemas to an S3 bucket
- `server` - work with an Iglu server
  - `keygen` - generate read and write API keys on Iglu Server
- `table-check` - check given Redshift or Postgres tables against an Iglu Server
- `verify` (since 0.13.0) - work with schemas to check their evolution
  - `redshift` - verify that a schema has evolved correctly within the same major version (e.g. from `1-a-b` to `1-c-d`) for loading into Redshift. It reports the major schema versions within which schema evolution rules were broken.
  - `parquet` - verify that a schema has evolved correctly within the same major version (e.g. from `1-a-b` to `1-c-d`) for parquet transformation (for loading into Databricks). It reports the breaking schema versions.

## Downloading and running Igluctl

Download the latest Igluctl from GitHub releases and unzip the file:

```bash
$ wget https://github.com/snowplow/igluctl/releases/download/0.13.0/igluctl_0.13.0.zip
$ unzip igluctl_0.13.0.zip
```

To run Igluctl you can, for example, pass the `--help` option to see information on the different commands and flags like this:

```bash
$ ./igluctl --help
```

> **Note:** If you are on Windows, then you'll need to run Igluctl like this:
>
> ```bash
> $ java -jar igluctl --help
> ```
>
> Below and everywhere in the documentation you'll find example commands without this `java -jar` prefix, so please remember to add it when running Igluctl.

Note that Igluctl expects [JRE 8](http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html) or later, and [Iglu Server](/docs/api-reference/iglu/iglu-repositories/iglu-server/) 0.6.0 or later to run.

## lint

`igluctl lint` validates JSON Schemas.
It is designed to be run against file-based schema registries with the standard Iglu folder structure:

```text
schemas
└── com.example
    └── my-schema
        └── jsonschema
            ├── 1-0-0
            └── 1-0-1
```

You can validate _all_ the schemas in the registry:

```bash
$ /path/to/igluctl lint /path/to/schema/registry/schemas
```

Alternatively you can validate an individual schema e.g.:

```bash
$ /path/to/igluctl lint /path/to/schema/registry/schemas/com.example_company/example_event/jsonschema/1-0-0
```

Examples of errors that are identified:

- JSON Schema has inconsistent self-describing information and path on filesystem
- JSON Schema has an invalid `$schema` keyword. It should always be set to the [iglu-specific](http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#) value, while users tend to set it to Draft v4 or even to a self-referencing Iglu URI
- JSON Schema is invalid against its standard (empty `required`, string `maximum` and similar)
- JSON Schema contains properties which contradict each other, like `{"type": "integer", "maxLength": 0}` or `{"maximum": 0, "minimum": 10}`. These schemas are inherently useless, as for some validators there is no JSON instance they can validate

The above cases can be very hard to spot without a specialized tool, as they are still valid JSON, and in the last case even valid JSON Schema - so they will pass a standard JSON Schema validator.

`lint` has two options:

- `--skip-checks`, which lints without the specified linters, given as a comma-separated list. To see available linters and their explanations, run `$ /path/to/igluctl --help`
- `--skip-schemas`, which lints all the schemas except the schemas passed to this option as a comma-separated list. For example, running `/path/to/igluctl lint /path/to/schema/registry/schemas --skip-schemas iglu:com.acme/click/jsonschema/1-0-1,iglu:com.acme/scroll/jsonschema/1-0-1` will lint all schemas in `/path/to/schema/registry/schemas` except the two schemas passed via `--skip-schemas`.
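As a toy illustration (not igluctl's actual code) of the contradiction checks described above, a linter only needs to compare properties pairwise to catch schemas that no JSON instance can satisfy:

```python
# Toy contradiction checks: these schemas are valid JSON Schema, yet no
# instance can (usefully) satisfy them. Function name is illustrative.

def contradictions(prop: dict) -> list[str]:
    errs = []
    if "minimum" in prop and "maximum" in prop and prop["minimum"] > prop["maximum"]:
        errs.append("`minimum` is greater than `maximum`")
    if prop.get("type") == "integer" and "maxLength" in prop:
        errs.append("`maxLength` has no effect on an integer type")
    return errs
```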
Note: the `--severityLevel` option is deprecated and removed as of version 0.4.0.

Linters fall into two groups: those that may be skipped and those that may not. By default, all of them are enabled, but igluctl users can skip any combination of `rootObject`, `unknownFormats`, `numericMinMax`, `stringLength`, `optionalNull`, `description` through `--skip-checks`. Igluctl lets you skip the checks below:

| NAME | DEFINITION |
| --- | --- |
| `rootObject` | Check that root of schema has object type and contains properties |
| `unknownFormats` | Check that schema doesn't contain unknown formats |
| `numericMinMax` | Check that schema with numeric type contains both minimum and maximum properties |
| `stringLength` | Check that schema with string type contains maxLength property or other ways to extract max length |
| `optionalNull` | Check that non-required fields have null type |
| `description` | Check that property contains description |

A sample usage could be as follows:

```bash
$ ./igluctl lint --skip-checks description,rootObject /path/to/schema/registry/schemas
```

Note that linter names are case sensitive.

Igluctl also includes many checks proving that schemas don't have conflicting expectations (such as a `minimum` value bigger than `maximum`). Schemas with such expectations are valid according to the specification, but do not make any sense in real-world use cases. These checks are mandatory and cannot be disabled.

`igluctl lint` will exit with status code 1 if it encounters at least one error.

## static generate

`igluctl static generate` generates corresponding [Redshift](http://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html) DDL files (`CREATE TABLE` statements) and migration scripts (`ALTER TABLE` statements).
As of version 0.11.0 this command will also validate the compatibility of a schema family and display warnings if there is an incompatible evolution.

```bash
$ ./igluctl static generate $INPUT
```

You can also specify a directory for output (the current dir is used as default):

```bash
$ ./igluctl static generate --output $DDL_DIR $INPUT
```

### Generating migration Redshift table scripts to accommodate updated schema versions

If an input directory is specified with several self-describing JSON Schemas with a single REVISION, Igluctl will generate migration scripts to update (`ALTER`) Redshift tables for older schema versions to support the latest schema version. For example, having the following self-describing JSON Schemas as an input:

- schemas/com.acme/click\_event/1-0-0
- schemas/com.acme/click\_event/1-0-1
- schemas/com.acme/click\_event/1-0-2

Igluctl will generate the following migration scripts:

- sql/com.acme/click\_event/1-0-0/1-0-1 to alter table from 1-0-0 to 1-0-1
- sql/com.acme/click\_event/1-0-0/1-0-2 to alter table from 1-0-0 to 1-0-2
- sql/com.acme/click\_event/1-0-1/1-0-2 to alter table from 1-0-1 to 1-0-2

These migrations (and all subsequent table definitions) are aware of column order and will ensure that new columns are added at the end of the table definition. This means that the tables can be updated in-place with single `ALTER TABLE` statements.

### Handling union types

One of the more problematic scenarios to handle when generating Redshift table definitions is handling `UNION` field types e.g. `["integer", "string"]`. Union types will be transformed into the most general type. In the above example (a union of an integer and string type) the corresponding Redshift column will be a `VARCHAR(4096)`.
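The migration and union-type rules above can be sketched in a few lines. This is illustrative Python, not igluctl's DDL generator; the table name and the scalar type mapping are assumptions for the example:

```python
# Sketch of the two rules above: new columns are appended via ALTER TABLE,
# and union types fall back to the most general form, VARCHAR(4096).
# The scalar-type mapping below is an assumed simplification.

def redshift_type(prop: dict) -> str:
    if isinstance(prop.get("type"), list):   # union, e.g. ["integer", "string"]
        return "VARCHAR(4096)"
    return {"integer": "BIGINT",
            "number": "DOUBLE PRECISION",
            "boolean": "BOOLEAN"}.get(prop.get("type"), "VARCHAR(4096)")

def migration(table: str, old_props: dict, new_props: dict) -> list[str]:
    """Emit ALTER TABLE statements for properties added in the newer schema
    version, preserving order so columns land at the end of the table."""
    return [
        f'ALTER TABLE {table} ADD COLUMN "{name}" {redshift_type(prop)};'
        for name, prop in new_props.items() if name not in old_props
    ]
```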
### Missing schema versions

The `static generate` command will check the versions of the schemas inside `input` as follows:

- If the user specified a folder and one of the schemas has no 1-0-0, or misses any other versions in between (e.g. it has 1-0-0 and 1-0-2) - it refuses to do anything (but proceeds with the `--force` option)
- If the user specified the full path to a schema file and this file is not 1-0-0 - it just prints a warning
- If the user specified the full path to a schema file and it is 1-0-0 - all good

## static push

`igluctl static push` publishes schemas stored locally to a remote [Iglu Server](https://github.com/snowplow/iglu-server). It accepts three required arguments:

- `host` - Iglu Server host name or IP address with optional port and endpoint. It should conform to the pattern `host:port/path` (or just `host`) **without** the http\:// prefix.
- `apikey` - master API key, used to create temporary write and read keys
- `path` - path to your static registry (local folder containing schemas)

It also accepts an optional `--public` argument which makes schemas available without an `apikey` header.

```bash
$ ./igluctl static push /path/to/static/registry iglu.acme.com:80/iglu-server f81d4fae-7dec-11d0-a765-00a0c91e6bf6
```

## static pull

`igluctl static pull` downloads schemas stored on a remote [Iglu Server](https://github.com/snowplow/iglu-server) to a local folder. It accepts three required arguments:

- `host` - Scala Iglu Registry host name or IP address with optional port and endpoint. It should conform to the pattern `host:port/path` (or just `host`) **without** the http\:// prefix.
- `apikey` - master API key, used to create temporary write and read keys
- `path` - path to your static registry (local folder to download to)

```bash
$ ./igluctl static pull /path/to/static/registry iglu.acme.com:80/iglu-server f81d4fae-7dec-11d0-a765-00a0c91e6bf6
```

## static s3cp

`igluctl static s3cp` enables you to upload JSON Schemas to a chosen S3 bucket. This is helpful for generating a remote Iglu registry which can be served from S3 over http(s). `igluctl static s3cp` accepts two required arguments and several options:

- `input` - path to your files. Required.
- `bucket` - S3 bucket name. Required.
- `s3path` - optional S3 path to prepend your input root. Usually you don't need it.
- `accessKeyId` - your AWS Access Key Id. This may or may not be required, depending on your preferred authentication option.
- `secretAccessKey` - your AWS Secret Access Key. This may or may not be required, depending on your preferred authentication option.
- `profile` - your AWS profile name. This may or may not be required, depending on your preferred authentication option.
- `region` - AWS S3 region. Default: `us-west-2`
- `skip-schema-lists` - do not generate and upload schema list objects. If using a static registry for all Snowplow applications, don't enable this setting, as some components still require lists to function correctly.

`igluctl static s3cp` tries to closely follow the AWS CLI authentication process. First it checks whether a profile name or an `accessKeyId`/`secretAccessKey` pair is provided and uses it. If neither of the above is provided, it looks at the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. If those aren't available either, it checks the `~/.aws/config` file. If all of the above fail, it exits with an error.

## static deploy

`igluctl static deploy` performs the whole schema workflow at once.
It accepts one required argument: - `config` - path to the configuration file ```bash $ ./igluctl static deploy /path/to/config/file ``` Your configuration file should be a HOCON file, following the [reference example](https://github.com/snowplow/igluctl/blob/0.8.0/config/deploy.reference.hocon). For backwards compatibility with previous versions, you could also provide a [self-describing JSON](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.iglu/igluctl_config/jsonschema/1-0-0). Example: ```json { "lint": { "skipWarnings": true "includedChecks": [ "rootObject" "unknownFormats" "numericMinMax" "stringLength" "optionalNull" "description" "stringMaxLengthRange" ] } "generate": { "dbschema": "atomic" "force": false } "actions": [ { "action": "push" "isPublic": true "apikey": "bd96b5ff-7eb7-4085-83e0-97ac4954b891" "apikey": ${APIKEY_1} } { "action": "s3cp" "uploadFormat": "jsonschema" "profile": "profile-1" "region": "eu-east-2" } ] } ``` ## server keygen `igluctl server keygen` generates read and write API keys on an Iglu Server. It accepts two required arguments: - `host` - Iglu Server host name or IP address with optional port and endpoint. It should conform to the pattern `host:port/path` (or just `host`) **without** the http\:// prefix. - `apikey` - master API key, used to create temporary write and read keys It also accepts a `--vendor-prefix` argument, which will be associated with the generated key. ```bash $ ./igluctl server keygen --vendor-prefix com.acme iglu.acme.com:80/iglu-server f81d4fae-7dec-11d0-a765-00a0c91e6bf6 ``` ## table-check `igluctl table-check` checks a given Redshift or Postgres schema against an Iglu repository. As of version 0.11.0, it cross-verifies the column types as well as the names. 
It supports two interfaces: - `igluctl table-check --server ` to check all tables - `igluctl table-check --resolver --schema ` to check a particular table It also accepts a number of arguments: ```bash --resolver Iglu resolver config path --schema Schema to check against. It should have iglu: format --server Iglu Server URL --apikey Iglu Server Read ApiKey (non master) --dbschema Database schema --host Database host address --port Database port --dbname Database name --username Database username --password Database password ``` ```bash $ ./igluctl table-check --resolver --schema ...connection parameters ``` or ```bash $ ./igluctl table-check --server ...connection params ``` ## verify parquet `igluctl verify parquet` verifies that schemas are evolved correctly within the same major version (e.g. from `1-a-b` to `1-c-d`) for the parquet transformation (for loading into Databricks). It reports the breaking schema versions. It accepts one required argument: - `input` - path to your schema files. Example command: ```bash $ ./igluctl verify parquet /path/to/static/registry ``` Example output: ```text Breaking change introduced by 'com.acme/product/jsonschema/1-0-2'. Changes: Incompatible type change Long to Double at /item/price Breaking change introduced by 'com.acme/user/jsonschema/1-0-1'. Changes: Incompatible type change Long to Integer at /id Breaking change introduced by 'com.acme/item/jsonschema/1-1-0'. Changes: Incompatible type change String to Json at /metadata ``` ## verify redshift `igluctl verify redshift` verifies that schemas are evolved correctly within the same major version (e.g. from `1-a-b` to `1-c-d`) for loading into Redshift. It reports the major schema versions within which the schema evolution rules were broken. It accepts two required arguments and one optional argument: - `server` - Iglu Server URL. 
- `apikey` - Iglu Server Read ApiKey - `--verbose/-v` - whether to emit a detailed report (disabled by default) Example command: ```bash $ ./igluctl verify redshift --server iglu.acme.com --apikey f81d4fae-7dec-11d0-a765-00a0c91e6bf6 ``` Example output: ```text iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-*-* iglu:com.snowplowanalytics.snowplow/event_fingerprint_config/jsonschema/1-*-* iglu:com.snowplowanalytics.snowplow.badrows/loader_runtime_error/jsonschema/1-*-* ``` Example command with verbose output: ```bash $ ./igluctl verify redshift --server iglu.acme.com --apikey f81d4fae-7dec-11d0-a765-00a0c91e6bf6 --verbose ``` Example output: ```text iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-*-*: iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-2: [Incompatible types in column cache_size old RedshiftBigInt new RedshiftDouble] iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3: [Incompatible types in column cache_size old RedshiftBigInt new RedshiftDouble] iglu:com.snowplowanalytics.snowplow/event_fingerprint_config/jsonschema/1-*-*: iglu:com.snowplowanalytics.snowplow/event_fingerprint_config/jsonschema/1-0-1: [Incompatible types in column parameters.hash_algorithm old RedshiftChar(3) new RedshiftVarchar(6)] iglu:com.snowplowanalytics.snowplow.badrows/loader_runtime_error/jsonschema/1-*-*: iglu:com.snowplowanalytics.snowplow.badrows/loader_runtime_error/jsonschema/1-0-1: [Making required column nullable error] ``` --- # Introduction to the Iglu schema registry > Machine-readable schema registry for JSON and Thrift schemas with self-describing JSON support, schema versioning, and distributed repositories. > Source: https://docs.snowplow.io/docs/api-reference/iglu/ **Iglu** is a machine-readable schema registry for [JSON](http://json-schema.org/) and Thrift schemas. A schema registry is like npm, Maven, or Git, but holds data schemas instead of software or code. Iglu consists of three key technical aspects: 1. 
A [common architecture](/docs/api-reference/iglu/common-architecture/) that informs all aspects of Iglu 2. [Iglu registries](/docs/api-reference/iglu/iglu-repositories/) that can host a set of [self-describing JSON schemas](/docs/api-reference/iglu/common-architecture/self-describing-json-schemas/) 3. [Iglu clients](/docs/api-reference/iglu/iglu-clients/) that can resolve schemas from one or more Iglu registries ## Iglu explained **Iglu** is built on a set of technical design decisions that allow Iglu clients and registries to interoperate. The key design components are: - [Self-describing JSON schema](/docs/api-reference/iglu/common-architecture/self-describing-json-schemas/): extensions to JSON schema that semantically identify and version a given JSON schema - [Self-describing JSON](/docs/api-reference/iglu/common-architecture/self-describing-jsons/): a standardized JSON format which co-locates a reference to the instance's JSON schema alongside the instance's data - [SchemaVer](/docs/api-reference/iglu/common-architecture/schemaver/): how we semantically version schemas - [Schema resolution](/docs/api-reference/iglu/common-architecture/schema-resolution/): our public algorithm for how we determine in which order we check Iglu registries for a given schema **Iglu clients** are used for interacting with Iglu server repos and for resolving schemas in embedded and remote Iglu schema registries. In the below diagram we show an Iglu client resolving a schema from Iglu Central, one embedded registry and a further two remote HTTP registries: ![Iglu client](/assets/images/iglu-clients-2a639a6f765d5146f869eb947a42f15c.png) An **Iglu registry** acts as a store of data schemas. Hosting JSON schemas in an Iglu registry allows you to use those schemas in Iglu-capable systems such as Snowplow. 
So far we support two types of Iglu registry: - **Remote registries** - essentially websites containing schemas which an Iglu client can query over HTTP - **Embedded registries** - which are embedded in a piece of software (typically alongside an Iglu client) In the below diagram we show an Iglu client resolving a schema from Iglu Central, one embedded registry and a further two remote HTTP registries: ![Iglu repositories](/assets/images/iglu-repos-82e5f47255a46f97fe0b46b8abe90934.png) **Iglu Central** ([https://iglucentral.com](https://iglucentral.com/)) is a public registry of Snowplow JSON schemas. Under the covers, Iglu Central is built and run as a **static Iglu registry**, hosted on Amazon S3. > A **static repo** is simply an Iglu registry server structured as a static website. ![Iglu Central](/assets/images/iglu-central-c0427b712c8c80ad53d1a8a2b7e6871d.png) The **deployment process** for Iglu Central is documented in [Iglu Central setup](/docs/api-reference/iglu/iglu-central-setup/) in case you want to set up a public mirror or private instance of Iglu Central. --- # Manage schemas using Iglu Server > Manage schemas with Iglu Server or host a static Iglu registry in Amazon S3 or Google Cloud Storage for self-hosted Snowplow deployments. > Source: https://docs.snowplow.io/docs/api-reference/iglu/manage-schemas/ To manage your [schemas](/docs/fundamentals/schemas/), you will need to have an [Iglu Server](/docs/api-reference/iglu/iglu-repositories/iglu-server/) installed (you will already have one if you followed the [Snowplow Self-Hosted Quick Start](/docs/get-started/self-hosted/)). Alternatively, you can host a [static Iglu registry](/docs/api-reference/iglu/iglu-repositories/static-repo/) in Amazon S3 or Google Cloud Storage. ## Create a schema First, design the schema for your custom event (or entity). 
For example: ```json { "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "description": "Schema for a button click event", "self": { "vendor": "com.snowplowanalytics", "name": "button_click", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "id": { "type": "string", "minLength": 1 }, "target": { "type": "string" }, "content": { "type": "string" } }, "required": ["id"], "additionalProperties": false } ``` Next, save this schema in the following folder structure, with a filename of `1-0-0` (without any extension): ```text schemas └── com.snowplowanalytics └── button_click └── jsonschema └── 1-0-0 ``` > **Tip:** If you update the `vendor` or the `name` in the example, you should update the above path too. Finally, to upload your schema to your Iglu registry, you can use [igluctl](/docs/api-reference/iglu/igluctl-2/): ```bash igluctl static push --public ``` See the [Igluctl reference page](/docs/api-reference/iglu/igluctl-2/#static-push) for more information on the `static push` command. ## Versioning schemas When evolving your [schema](/docs/fundamentals/schemas/) and [uploading](/docs/api-reference/iglu/manage-schemas/) it to your [Iglu Server](/docs/api-reference/iglu/iglu-repositories/iglu-server/), you will need to choose how to increment its version. There are two kinds of schema changes: - **Non-breaking** - a non-breaking change is backward compatible with historical data and increments the `patch` number i.e. `1-0-0` -> `1-0-1`, or the middle digit i.e. `1-0-0` -> `1-1-0`. - **Breaking** - a breaking change is not backwards compatible with historical data and increments the `model` number i.e. `1-0-0` -> `2-0-0`. Different data warehouses handle schema evolution slightly differently. Use the table below as a guide for incrementing the schema version appropriately. 
| | Redshift | Snowflake, BigQuery, Databricks | | -------------------------------------------- | ------------ | ------------------------------- | | **Add / remove / rename an optional field** | Non-breaking | Non-breaking | | **Add / remove / rename a required field** | Breaking | Breaking | | **Change a field from optional to required** | Breaking | Breaking | | **Change a field from required to optional** | Breaking | Non-breaking | | **Change the type of an existing field** | Breaking | Breaking | | **Change the size of an existing field** | Non-breaking | Non-breaking | > **Warning:** In Redshift and Databricks, changing _size_ may also mean _type_ change. For example, changing the `maximum` integer from `30000` to `100000`. See our documentation on [how schemas translate to database types](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/). --- # JSON Schema reference > Complete reference for JSON Schema draft 4 features supported in Snowplow data structures, including validation keywords and type definitions. > Source: https://docs.snowplow.io/docs/api-reference/json-schema-reference/ [Snowplow schemas](/docs/fundamentals/schemas/) are based on the [JSON Schema](https://json-schema.org/) standard ([draft 4](https://datatracker.ietf.org/doc/html/draft-fge-json-schema-validation-00)). This reference provides comprehensive documentation for all JSON Schema features that are supported in Snowplow. Understanding the full capabilities of JSON Schema allows you to create more precise and robust [data structures](/docs/fundamentals/schemas/) that ensure your data quality and provide clear documentation for your tracking implementation. 
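To make the draft-4 validation mechanics concrete before diving into the individual keywords, here is a small, self-contained Python sketch (not Snowplow code, and not a full validator) that hand-rolls a tiny subset of draft-4 keywords — `type`, `required`, `properties`, `minLength`, `enum`, and `additionalProperties` — and applies it to a button-click-style payload. Production pipelines use a complete JSON Schema draft 4 validator; this is purely illustrative.

```python
# Toy subset of JSON Schema draft 4 validation, for illustration only.
def validate(instance, schema, path=""):
    """Return a list of human-readable violations (empty list means valid)."""
    errors = []
    expected = schema.get("type")
    if expected:
        allowed = expected if isinstance(expected, list) else [expected]
        type_map = {"string": str, "integer": int, "number": (int, float),
                    "boolean": bool, "object": dict, "array": list,
                    "null": type(None)}
        # booleans are ints in Python, but not JSON Schema integers/numbers
        if not any(isinstance(instance, type_map.get(t, object))
                   and not (t in ("integer", "number") and isinstance(instance, bool))
                   for t in allowed):
            errors.append(f"{path or '/'}: expected {allowed}")
            return errors
    if isinstance(instance, str):
        if "minLength" in schema and len(instance) < schema["minLength"]:
            errors.append(f"{path}: shorter than minLength {schema['minLength']}")
        if "enum" in schema and instance not in schema["enum"]:
            errors.append(f"{path}: not one of {schema['enum']}")
    if isinstance(instance, dict):
        for field in schema.get("required", []):
            if field not in instance:
                errors.append(f"{path}/{field}: required field missing")
        props = schema.get("properties", {})
        for key, value in instance.items():
            if key in props:
                errors.extend(validate(value, props[key], f"{path}/{key}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}/{key}: additional property not allowed")
    return errors

# Validation rules of a hypothetical button-click schema (the "self" metadata
# is omitted here because it does not affect instance validation).
button_click = {
    "type": "object",
    "properties": {"id": {"type": "string", "minLength": 1},
                   "target": {"type": "string"}},
    "required": ["id"],
    "additionalProperties": False,
}

print(validate({"id": "checkout", "target": "/cart"}, button_click))  # []
print(validate({"target": 42, "extra": True}, button_click))
```

The second call reports three violations: the missing required `id`, the wrong type for `target`, and the disallowed additional property `extra` — the same classes of failure a real validator would surface as failed events.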
## Schema structure Every Snowplow schema must follow this basic structure with Snowplow-specific metadata and JSON Schema validation rules: ```json { "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "description": "Human-readable description of the schema purpose", "self": { "vendor": "com.example", "name": "schema_name", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { // Field definitions go here }, "additionalProperties": false, "required": ["required_field_name"] } ``` ### Required components Every Snowplow schema must include these components: - **`$schema`**: Must be `"http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#"` - **`self`** object containing: - **`vendor`**: Your organization identifier (e.g., `"com.example"`) - **`name`**: The schema name - **`format`**: Must be `"jsonschema"` - **`version`**: Semantic version (e.g., `"1-0-0"`) - **`type`**: Must be `"object"` for the root level ### Optional components These components are optional but commonly used: - **`description`**: Human-readable description of the schema purpose (highly recommended) - **`properties`**: Object defining the fields and their validation rules - **`additionalProperties`**: Whether additional properties are allowed (commonly set to `false`) - **`required`**: Array of required field names - **`minProperties`** / **`maxProperties`**: Constraints on number of properties ## Core validation keywords ### Type validation The `type` keyword specifies the expected data type for a value. 
Snowplow supports all JSON Schema primitive types: #### String type ```json { "user_name": { "type": "string", "description": "The user's display name" } } ``` #### Number and integer types ```json { "price": { "type": "number", "description": "Product price in USD" }, "quantity": { "type": "integer", "description": "Number of items purchased" } } ``` #### Boolean type ```json { "is_premium": { "type": "boolean", "description": "Whether the user has a premium account" } } ``` #### Array type ```json { "tags": { "type": "array", "description": "Product tags", "items": { "type": "string" } } } ``` #### Object type ```json { "address": { "type": "object", "description": "User's shipping address", "properties": { "street": {"type": "string"}, "city": {"type": "string"}, "postal_code": {"type": "string"} } } } ``` #### Null type ```json { "middle_name": { "type": ["string", "null"], "description": "User's middle name (optional)" } } ``` ### Multiple types You can specify multiple acceptable types using an array: ```json { "user_id": { "type": ["string", "integer"], "description": "User identifier (string or numeric)" }, "optional_field": { "type": ["string", "null"], "description": "Optional text field" } } ``` ## String validation ### Length constraints Control the minimum and maximum length of string values: ```json { "username": { "type": "string", "minLength": 3, "maxLength": 20, "description": "Username between 3-20 characters" }, "password": { "type": "string", "minLength": 8, "description": "Password must be at least 8 characters" } } ``` ### Enumeration Restrict values to a specific set of allowed strings: ```json { "status": { "type": "string", "enum": ["active", "inactive", "pending", "suspended"], "description": "Account status" }, "color": { "type": "string", "enum": ["red", "green", "blue", "yellow"], "description": "Primary color selection" } } ``` ### Pattern matching Use regular expressions to validate string format: ```json { "product_code": { "type": 
"string", "pattern": "^[A-Z]{2}-\\d{4}$", "description": "Product code format (e.g., AB-1234)" }, "phone_number": { "type": "string", "pattern": "^\\+?[1-9]\\d{1,14}$", "description": "International phone number format" } } ``` > **Tip:** For common formats like email addresses, URLs, and dates, prefer using the `format` keyword instead of regular expressions for better readability and standardized validation. ### Format validation Use the `format` keyword to validate common string formats: ```json { "email": { "type": "string", "format": "email", "description": "Valid email address" }, "website": { "type": "string", "format": "uri", "description": "Website URL" }, "server_ip": { "type": "string", "format": "ipv4", "description": "IPv4 address of the server" }, "created_at": { "type": "string", "format": "date-time", "description": "ISO 8601 timestamp" }, "user_id": { "type": "string", "format": "uuid", "description": "UUID identifier" } } ``` #### Supported format values - **`uri`**: Uniform Resource Identifier - **`ipv4`**: IPv4 address (e.g., "192.168.1.1") - **`ipv6`**: IPv6 address - **`email`**: Email address - **`date-time`**: ISO 8601 date-time (e.g., "2023-12-25T10:30:00Z") - **`date`**: ISO 8601 date (e.g., "2023-12-25") - **`hostname`**: Internet hostname - **`uuid`**: UUID string ## Numeric validation ### Range constraints Set minimum and maximum values for numbers and integers: ```json { "age": { "type": "integer", "minimum": 0, "maximum": 150, "description": "Person's age in years" }, "discount_rate": { "type": "number", "minimum": 0, "maximum": 1, "description": "Discount rate between 0 and 1" } } ``` ### Multiple constraints Combine multiple numeric validations: ```json { "rating": { "type": "number", "minimum": 1, "maximum": 5, "multipleOf": 0.5, "description": "Star rating in half-point increments" } } ``` ## Array validation ### Length constraints Control the size of arrays: ```json { "favorite_colors": { "type": "array", "minItems": 1, 
"maxItems": 5, "description": "User's favorite colors (1-5 selections)", "items": { "type": "string", "enum": ["red", "blue", "green", "yellow", "purple", "orange"] } } } ``` ### Item validation Define validation rules for array items: ```json { "purchase_items": { "type": "array", "description": "Items in the purchase", "items": { "type": "object", "properties": { "product_id": {"type": "string"}, "quantity": {"type": "integer", "minimum": 1}, "price": {"type": "number", "minimum": 0} }, "required": ["product_id", "quantity", "price"], "additionalProperties": false } } } ``` ### Unique items Ensure all array items are unique: ```json { "user_tags": { "type": "array", "uniqueItems": true, "description": "Unique tags assigned to user", "items": { "type": "string" } } } ``` ## Object validation ### Property requirements Specify which object properties are required: ```json { "user_profile": { "type": "object", "properties": { "first_name": {"type": "string"}, "last_name": {"type": "string"}, "email": {"type": "string"}, "phone": {"type": ["string", "null"]} }, "required": ["first_name", "last_name", "email"], "additionalProperties": false } } ``` ### Additional properties Control whether additional properties are allowed: ```json { "strict_object": { "type": "object", "properties": { "name": {"type": "string"}, "value": {"type": "number"} }, "additionalProperties": false, "description": "Only name and value properties allowed" }, "flexible_object": { "type": "object", "properties": { "core_field": {"type": "string"} }, "additionalProperties": true, "description": "Additional properties are permitted" } } ``` ### Property count constraints Limit the number of properties in an object: ```json { "metadata": { "type": "object", "minProperties": 1, "maxProperties": 10, "additionalProperties": {"type": "string"}, "description": "Metadata with 1-10 string properties" } } ``` ## Advanced validation patterns ### Schema composition Use `oneOf` and `anyOf` to create flexible 
validation rules: #### Using oneOf Validate that data matches exactly one of several schemas: ```json { "contact_info": { "type": "object", "oneOf": [ { "properties": { "type": {"enum": ["email"]}, "email": {"type": "string", "format": "email"} }, "required": ["type", "email"], "additionalProperties": false }, { "properties": { "type": {"enum": ["phone"]}, "phone": {"type": "string", "pattern": "^\\+?[1-9]\\d{1,14}$"} }, "required": ["type", "phone"], "additionalProperties": false }, { "properties": { "type": {"enum": ["address"]}, "street": {"type": "string"}, "city": {"type": "string"}, "postal_code": {"type": "string"} }, "required": ["type", "street", "city", "postal_code"], "additionalProperties": false } ] } } ``` #### Using anyOf Validate that data matches one or more of several schemas: ```json { "user_permissions": { "type": "object", "anyOf": [ { "properties": { "can_read": {"type": "boolean"} }, "required": ["can_read"] }, { "properties": { "can_write": {"type": "boolean"} }, "required": ["can_write"] }, { "properties": { "can_admin": {"type": "boolean"} }, "required": ["can_admin"] } ] } } ``` ## Limitations and unsupported features While Snowplow supports most JSON Schema Draft 4 features, there are some limitations to be aware of: - **`$ref`**: Schema references are not supported in property definitions - **`allOf`**: Schema intersection is not supported - **`not`**: Negation validation is not supported - **`dependencies`**: Property dependencies are not supported - **`exclusiveMinimum`** and **`exclusiveMaximum`**: Exclusive bounds are not supported Instead of unsupported features, use these approaches: ```json { // Instead of $ref, define inline schemas "address": { "type": "object", "properties": { "street": {"type": "string"}, "city": {"type": "string"}, "country": {"type": "string", "enum": ["US", "CA", "UK", "DE"]} }, "required": ["street", "city", "country"], "additionalProperties": false }, // Instead of exclusiveMinimum/exclusiveMaximum, use 
minimum/maximum with adjusted values "percentage": { "type": "number", "minimum": 0, "maximum": 99.99, "description": "Percentage value (0 to less than 100)" }, // Use format validation for common patterns "created_date": { "type": "string", "format": "date-time", "description": "ISO 8601 timestamp" } } ``` --- # BigQuery Loader configuration reference > Configure BigQuery Streaming Loader with BigQuery, Kinesis, and Pub/Sub settings for streaming enriched Snowplow events. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/bigquery-loader/configuration-reference/ The configuration reference in this page is written for BigQuery Loader `2.1.0` ### License The BigQuery Loader is released under the [Snowplow Limited Use License](/limited-use-license-1.1/) ([FAQ](/docs/licensing/limited-use-license-faq/)). To accept the terms of license and run the loader, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable. Alternatively, configure the `license.accept` option in the config file: ```json "license": { "accept": true } ``` ### BigQuery configuration | Parameter | Description | | ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `output.good.project` | Required. The GCP project to which the BigQuery dataset belongs | | `output.good.dataset` | Required. The BigQuery dataset to which events will be loaded | | `output.good.table` | Optional. Default value `events`. Name to use for the events table | | `output.good.credentials` | Optional. Service account credentials (JSON). If not set, default credentials will be sourced from the usual locations, e.g. 
file pointed to by the `GOOGLE_APPLICATION_CREDENTIALS` environment variable | ### Streams configuration **AWS:** | Parameter | Description | | --- | --- | | `input.streamName` | Required. Name of the Kinesis stream with the enriched events | | `input.appName` | Optional, default `snowplow-bigquery-loader`. Name to use for the DynamoDB table, used by the underlying Kinesis Client Library (KCL) for managing leases. | | `input.initialPosition.type` | Optional, default `LATEST`. Allowed values are `LATEST`, `TRIM_HORIZON`, `AT_TIMESTAMP`. When the loader is deployed for the first time, this controls from where in the Kinesis stream it should start consuming events. On all subsequent deployments of the loader, the loader will resume from the offsets stored in the DynamoDB table. | | `input.initialPosition.timestamp` | Required if `input.initialPosition` is `AT_TIMESTAMP`. A timestamp in ISO8601 format from where the loader should start consuming events. | | `input.retrievalMode` | Optional, default `Polling`. Change to `FanOut` to enable the enhanced fan-out feature of Kinesis. | | `input.retrievalMode.maxRecords` | Optional. Default value 1000. How many events the Kinesis client may fetch in a single poll. Only used when `input.retrievalMode` is `Polling`. | | `input.workerIdentifier` | Optional. Defaults to the `HOSTNAME` environment variable. The name of this KCL worker used in the DynamoDB lease table. | | `input.leaseDuration` | Optional. Default value `10 seconds`. The duration of shard leases. KCL workers must periodically refresh leases in the DynamoDB table before this duration expires. 
| | `input.maxLeasesToStealAtOneTimeFactor` | Optional. Default value `2.0`. Controls how to pick the max number of shard leases to steal at one time. E.g. If there are 4 available processors, and `maxLeasesToStealAtOneTimeFactor = 2.0`, then allow the loader to steal up to 8 leases. Allows bigger instances to more quickly acquire the shard leases they need to combat latency. | | `input.checkpointThrottledBackoffPolicy.minBackoff` | Optional. Default value `100 milliseconds`. Initial backoff used to retry checkpointing if we exceed the DynamoDB provisioned write limits. | | `input.checkpointThrottledBackoffPolicy.maxBackoff` | Optional. Default value `1 second`. Maximum backoff used to retry checkpointing if we exceed the DynamoDB provisioned write limits. | | `output.bad.streamName` | Required. Name of the Kinesis stream that will receive failed events. | | `output.bad.throttledBackoffPolicy.minBackoff` | Optional. Default value `100 milliseconds`. Initial backoff used to retry sending failed events if we exceed the Kinesis write throughput limits. | | `output.bad.throttledBackoffPolicy.maxBackoff` | Optional. Default value `1 second`. Maximum backoff used to retry sending failed events if we exceed the Kinesis write throughput limits. | | `output.bad.recordLimit` | Optional. Default value 500. The maximum number of records we are allowed to send to Kinesis in 1 PutRecords request. | | `output.bad.byteLimit` | Optional. Default value 5242880. The maximum number of bytes we are allowed to send to Kinesis in 1 PutRecords request. | | `output.bad.maxRecordSize` | Optional. Default value 1000000. Any single failed event sent to Kinesis should not exceed this size in bytes. | | `output.bad.maxRetries` (since 2.1.0) | Optional. Default value 10. Maximum number of retries by Kinesis Client. 
| **GCP:** | Parameter | Description | | --- | --- | | `input.subscription` | Required, e.g. `projects/myproject/subscriptions/snowplow-enriched`. Name of the Pub/Sub subscription with the enriched events | | `input.durationPerAckExtension` | Optional. Default value `15 seconds`. Pub/Sub ack deadlines are extended for this duration when needed. | | `input.minRemainingAckDeadline` | Optional. Default value `0.1`. Controls when ack deadlines are re-extended, for a message that is close to exceeding its ack deadline. For example, if `durationPerAckExtension` is `15 seconds` and `minRemainingAckDeadline` is `0.1` then the loader will wait until there is `1.5 seconds` left of the remaining deadline, before re-extending the message deadline. | | `input.maxMessagesPerPull` | Optional. Default value 1000. How many Pub/Sub messages to pull from the server in a single request. | | `input.debounceRequests` | Optional. Default value `100 millis`. Adds an artificial delay between consecutive requests to Pub/Sub for more messages. Under some circumstances, this was found to slightly alleviate a problem in which Pub/Sub might re-deliver the same messages multiple times. | | `input.retries.transientErrors.delay` (since 2.1.0) | Optional. Default value `100 millis`. Backoff delay for follow-up attempts after transient errors. | | `input.retries.transientErrors.attempts` (since 2.1.0) | Optional. Default value `10`. Maximum number of attempts, after which the loader will crash and exit. | | `output.bad.topic` | Required, e.g. `projects/myproject/topics/snowplow-bad`. 
Name of the Pub/Sub topic that will receive failed events. | | `output.bad.batchSize` | Optional. Default value 1000. Bad events are sent to Pub/Sub in batches not exceeding this count. | | `output.bad.requestByteThreshold` | Optional. Default value 1000000. Bad events are sent to Pub/Sub in batches with a total size not exceeding this byte threshold | | `output.bad.maxRecordSize` | Optional. Default value 9000000. Any single failed event sent to Pub/Sub should not exceed this size in bytes | | `output.retries.transientErrors.delay` (since 2.1.0) | Optional. Default value `100 millis`. Backoff delay for follow-up attempts after transient errors. | | `output.retries.transientErrors.attempts` (since 2.1.0) | Optional. Default value `10`. Maximum number of attempts, after which the loader will crash and exit. | *** ## Other configuration options | Parameter | Description | | --- | --- | | `batching.maxBytes` | Optional. Default value `10000000`. Events are emitted to BigQuery when the batch reaches this size in bytes | | `batching.maxDelay` | Optional. Default value `1 second`. Events are emitted to BigQuery after a maximum of this duration, even if the `maxBytes` size has not been reached | | `batching.writeBatchConcurrency` | Optional. Default value 2. How many batches can we send simultaneously over the network to BigQuery | | `cpuParallelism.parseBytesFactor` | Optional. Default value `0.1`. Controls how many batches of bytes we can parse into enriched events simultaneously. E.g. 
If there are 2 cores and `parseBytesFactor = 0.1` then only one batch gets processed at a time. Adjusting this value can cause the app to use more or less of the available CPU. | | `cpuParallelism.transformFactor` | Optional. Default value `0.75`. Controls how many batches of enriched events we can transform into BigQuery format simultaneously. E.g. If there are 4 cores and `transformFactor = 0.75` then 3 batches get processed in parallel. Adjusting this value can cause the app to use more or less of the available CPU. | | `retries.setupErrors.delay` | Optional. Default value `30 seconds`. Configures exponential backoff on errors related to how BigQuery is set up for this loader. Examples include authentication errors and permissions errors. This class of errors is reported periodically to the monitoring webhook. | | `retries.transientErrors.delay` | Optional. Default value `1 second`. Configures exponential backoff on errors that are likely to be transient. Examples include server errors and network errors. | | `retries.transientErrors.attempts` | Optional. Default value 5. Maximum number of attempts to make before giving up on a transient error. | | `skipSchemas` | Optional, e.g. `["iglu:com.example/skipped1/jsonschema/1-0-0"]` or with wildcards `["iglu:com.example/skipped2/jsonschema/1-*-*"]`. A list of schemas that won't be loaded to BigQuery. This feature could be helpful when recovering from edge-case schemas which for some reason cannot be loaded to the table. | | `legacyColumnMode` | Optional. Default value `false`. When this mode is enabled, the loader uses the legacy column style used by the v1 BigQuery loader. For example, an entity for a `1-0-0` schema is loaded into a column ending in `_1_0_0`, instead of a column ending in `_1`. This feature could be helpful when migrating from the v1 loader to the v2 loader. | | `legacyColumns` | Optional, e.g. 
`["iglu:com.example/legacy/jsonschema/1-0-0"]` or with wildcards `["iglu:com.example/legacy/jsonschema/1--"]`.Schemas for which to use the legacy column style used by the v1 BigQuery loader, even when `legacyColumnMode` is disabled. | | `exitOnMissingIgluSchema` | Optional. Default value `true`.Whether the loader should crash and exit if it fails to resolve an Iglu Schema.We recommend `true` because Snowplow enriched events have already passed validation, so a missing schema normally indicates an error that needs addressing.Change to `false` so events go the failed events stream instead of crashing the loader. | | `monitoring.metrics.statsd.hostname` | Optional. If set, the loader sends statsd metrics over UDP to a server on this host name. | | `monitoring.metrics.statsd.port` | Optional. Default value 8125. If the statsd server is configured, this UDP port is used for sending metrics. | | `monitoring.metrics.statsd.tags.*` | Optional. A map of key/value pairs to be sent along with the statsd metric. | | `monitoring.metrics.statsd.period` | Optional. Default `1 minute`. How often to report metrics to statsd. | | `monitoring.metrics.statsd.prefix` | Optional. Default `snowplow.bigquery-loader`. Prefix used for the metric name when sending to statsd. | | `monitoring.webhook.endpoint` | Optional, e.g. `https://webhook.example.com`. The loader will send to the webhook a payload containing details of any error related to how BigQuery is set up for this loader. | | `monitoring.webhook.tags.*` | Optional. A map of key/value strings to be included in the payload content sent to the webhook. | | `monitoring.webhook.heartbeat.*` | Optional. Default value `5.minutes`. How often to send a heartbeat event to the webhook when healthy. | | `monitoring.sentry.dsn` | Optional. Set to a Sentry URI to report unexpected runtime exceptions. | | `monitoring.sentry.tags.*` | Optional. A map of key/value strings which are passed as tags when reporting exceptions to Sentry. 
| | `telemetry.disable` | Optional. Set to `true` to disable [telemetry](/docs/get-started/self-hosted/telemetry/). | | `telemetry.userProvidedId` | Optional. See [here](/docs/get-started/self-hosted/telemetry/#how-can-i-help) for more information. | | `http.client.maxConnectionsPerServer` | Optional. Default value 4. Configures the internal HTTP client used for iglu resolver, alerts and telemetry. The maximum number of open HTTP requests to any single server at any one time. | --- # BigQuery Streaming Loader > Stream Snowplow events to BigQuery from Kinesis or Pub/Sub with real-time loading and schema evolution. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/bigquery-loader/ The BigQuery Streaming Loader is an application that loads Snowplow events to BigQuery. **AWS:** On AWS, the BigQuery Streaming Loader continually pulls events from Kinesis and writes to BigQuery using the [BigQuery Storage API](https://cloud.google.com/bigquery/docs/write-api). ```mermaid flowchart LR stream[["Enriched Events (Kinesis stream)"]] loader{{"BigQuery Loader"}} subgraph bigquery [BigQuery] table[("Events table")] end stream-->loader-->|BigQuery Storage API|bigquery ``` The BigQuery Loader is published as a Docker image which you can run on any AWS VM. ```bash docker pull snowplow/bigquery-loader-kinesis:2.1.0 ``` To run the loader, mount your config file into the docker image, and then provide the file path on the command line. ```bash docker run \ --mount=type=bind,source=/path/to/myconfig,destination=/myconfig \ snowplow/bigquery-loader-kinesis:2.1.0 \ --config=/myconfig/loader.hocon \ --iglu-config /myconfig/iglu.hocon ``` Where `loader.hocon` is loader's [configuration file](/docs/api-reference/loaders-storage-targets/bigquery-loader/#configuring-the-loader) and `iglu.hocon` is [iglu resolver](/docs/api-reference/iglu/iglu-resolver/) configuration. 
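
As a minimal illustrative sketch of what the `iglu.hocon` resolver file can contain (the repository list and `cacheSize` here are assumptions — point it at your own schema registries), a resolver configuration typically looks like:

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      }
    ]
  }
}
```

See the [Iglu resolver](/docs/api-reference/iglu/iglu-resolver/) documentation for all available options.
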

**GCP:** On GCP, the BigQuery Streaming Loader continually pulls events from Pub/Sub and writes to BigQuery using the [BigQuery Storage API](https://cloud.google.com/bigquery/docs/write-api).

```mermaid
flowchart LR
  stream[["Enriched Events (Pub/Sub stream)"]]
  loader{{"BigQuery Loader"}}
  subgraph bigquery [BigQuery]
    table[("Events table")]
  end
  stream-->loader-->|BigQuery Storage API|bigquery
```

The BigQuery Loader is published as a Docker image which you can run on any GCP VM.

```bash
docker pull snowplow/bigquery-loader-pubsub:2.1.0
```

To run the loader, mount your config file into the Docker image, and then provide the file path on the command line.

```bash
docker run \
  --mount=type=bind,source=/path/to/myconfig,destination=/myconfig \
  snowplow/bigquery-loader-pubsub:2.1.0 \
  --config=/myconfig/loader.hocon \
  --iglu-config /myconfig/iglu.hocon
```

Where `loader.hocon` is the loader's [configuration file](/docs/api-reference/loaders-storage-targets/bigquery-loader/#configuring-the-loader) and `iglu.hocon` is the [Iglu resolver](/docs/api-reference/iglu/iglu-resolver/) configuration.

***

> **Tip:** For more information on how events are stored in BigQuery, check the [mapping between Snowplow schemas and the corresponding BigQuery column types](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/?warehouse=bigquery).

## Configuring the loader

The loader config file is in HOCON format, and it allows configuring many different properties of how the loader runs. The simplest possible config file just needs a description of your pipeline inputs and outputs:

**AWS:** a minimal configuration is available [on GitHub](https://github.com/snowplow-incubator/snowplow-bigquery-loader/blob/master/config/config.kinesis.minimal.hocon).

**GCP:** a minimal configuration is available [on GitHub](https://github.com/snowplow-incubator/snowplow-bigquery-loader/blob/master/config/config.pubsub.minimal.hocon).

***

See the [configuration reference](/docs/api-reference/loaders-storage-targets/bigquery-loader/configuration-reference/) for all possible configuration parameters.

### Iglu

The BigQuery Loader requires an [Iglu resolver file](/docs/api-reference/iglu/iglu-resolver/) which describes the Iglu repositories that host your schemas. This should be the same Iglu configuration file that you used in the Enrichment process.

## Metrics

The BigQuery Loader can be configured to send the following custom metrics to a [StatsD](https://www.datadoghq.com/statsd-monitoring/) receiver:

| Metric | Definition |
| ------ | ---------- |
| `events_good` | A count of events that are successfully written to BigQuery. |
| `events_bad` | A count of failed events that could not be loaded, and were instead sent to the bad output stream. |
| `latency_millis` | The time in milliseconds from when events are written to the source stream of events (i.e. by Enrich) until when they are read by the loader. |
| `e2e_latency_millis` | The end-to-end latency of the Snowplow pipeline: the time in milliseconds from when an event was received by the collector until it is written into BigQuery. |

See the `monitoring.metrics.statsd` options in the [configuration reference](/docs/api-reference/loaders-storage-targets/bigquery-loader/configuration-reference/) for how to configure the StatsD receiver.

**Telemetry notice**

By default, Snowplow collects telemetry data for BigQuery Loader (since version 2.0.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what's collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1).

If you wish to help us further, you can optionally provide your email (or just a UUID) in the `telemetry.userProvidedId` configuration setting.

If you wish to disable telemetry, you can do so by setting `telemetry.disable` to `true`.

See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information.

---

# BigQuery Loader 1.0.x upgrade guide

> Upgrade BigQuery Loader from 0.6.x to 1.0.x with HOCON config, new load_tstamp field, and StreamLoader migration steps.
>
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/bigquery-loader/upgrade-guides/1-0-x-upgrade-guide/

## Configuration

The only breaking change from the 0.6.x series is the new format of the configuration file. That used to be a self-describing JSON but is now HOCON. Additionally, some app-specific command-line arguments have been incorporated into the config, such as Repeater's `--failedInsertsSub` option. For more details, see the [setup guide](/docs/api-reference/loaders-storage-targets/bigquery-loader/) and [configuration reference](/docs/api-reference/loaders-storage-targets/bigquery-loader/previous-versions/bigquery-loader-1.x/configuration-reference/).


Using Repeater as an example, if your configuration for 0.6.x looked like this:

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow.storage/bigquery_config/jsonschema/1-0-0",
  "data": {
    "name": "Alpha BigQuery test",
    "id": "31b1559d-d319-4023-aaae-97698238d808",
    "projectId": "com-acme",
    "datasetId": "snowplow",
    "tableId": "events",
    "input": "enriched-sub",
    "typesTopic": "types-topic",
    "typesSubscription": "types-sub",
    "badRows": "bad-topic",
    "failedInserts": "failed-inserts-topic",
    "load": {
      "mode": "STREAMING_INSERTS",
      "retry": false
    },
    "purpose": "ENRICHED_EVENTS"
  }
}
```

it will now look like this:

```json
{
  "projectId": "com-acme"

  "loader": {
    "input": {
      "subscription": "enriched-sub"
    }
    "output": {
      "good": {
        "datasetId": "snowplow"
        "tableId": "events"
      }
      "bad": {
        "topic": "bad-topic"
      }
      "types": {
        "topic": "types-topic"
      }
      "failedInserts": {
        "topic": "failed-inserts-topic"
      }
    }
  }

  "mutator": {
    "input": {
      "subscription": "types-sub"
    }
    "output": {
      "good": ${loader.output.good} # will be automatically inferred
    }
  }

  "repeater": {
    "input": {
      "subscription": "failed-inserts-sub"
    }
    "output": {
      "good": ${loader.output.good} # will be automatically inferred
      "deadLetters": {
        "bucket": "gs://dead-letter-bucket"
      }
    }
  }

  "monitoring": {} # disabled
}
```

And instead of running it like this:

```bash
$ ./snowplow-bigquery-repeater \
    --config=$CONFIG \
    --resolver=$RESOLVER \
    --failedInsertsSub="failed-inserts-sub" \
    --deadEndBucket="gs://dead-letter-bucket" \
    --desperatesBufferSize=20 \
    --desperatesWindow=20 \
    --backoffPeriod=900 \
    --verbose
```

you will run it like this:

```bash
$ docker run \
    -v /path/to/resolver.json:/resolver.json \
    snowplow/snowplow-bigquery-repeater:1.0.1 \
    --config=$CONFIG \
    --resolver=/resolver.json \
    --bufferSize=20 \
    --timeout=20 \
    --backoffPeriod=900 \
    --verbose
```

## New events table field

The first time you deploy Mutator 1.0.0, it will add a new column to your events table: `load_tstamp`. This represents the exact moment when the row was inserted into BigQuery. It shows you when events have arrived in the warehouse, which makes it possible to use incremental processing of newly arrived data in your downstream data modeling.

Depending on your traffic volume and pattern, there might be a short time period in which the loader app cannot write to BigQuery because the new column hasn't propagated and is not yet visible to all workers. For that reason, **we recommend that you upgrade Mutator first**.

## Migrating to StreamLoader

StreamLoader has been built as a standalone application that replaces the Apache Beam implementation and no longer requires you to use Dataflow. Depending on your data volume and traffic patterns, this might lead to significant cost reductions.

However, by migrating away from Dataflow, you no longer benefit from its exactly-once processing guarantees. As such, there could be a slight increase in the number of duplicate events loaded into BigQuery. Duplicate events are generally to be expected in a Snowplow pipeline, which provides an at-least-once guarantee. In our tests, we found that duplicates arise only during extreme autoscaling of the loader, e.g. if your pipeline has a sudden extreme spike in events. Aside from autoscaling events, we found the number of duplicate rows to be very low; however, this depends on the type of worker infrastructure you use.

---

# BigQuery Loader 2.0.0 upgrade guide

> Guide for upgrading to BigQuery Loader 2.0.0, including configuration changes, new column naming strategy, and recovery columns for schema evolution.
>
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/

## Configuration

BigQuery Loader 2.0.0 brings changes to the loading setup.
It is no longer necessary to configure and deploy three independent applications (Loader, Repeater and Mutator in [1.X](/docs/api-reference/loaders-storage-targets/bigquery-loader/previous-versions/bigquery-loader-1.x/)) in order to load your data to BigQuery. Starting from 2.0.0, only one application is needed, which naturally introduces some breaking changes to the configuration file structure. See the [configuration reference](/docs/api-reference/loaders-storage-targets/bigquery-loader/configuration-reference/) for all possible configuration parameters and the minimal [configuration samples](https://github.com/snowplow-incubator/snowplow-bigquery-loader/blob/v2/config) for each of the supported cloud environments.

## Infrastructure

Apart from Repeater and Mutator, other infrastructure components have become obsolete:

- The `types` Pub/Sub topic connecting Loader and Mutator.
- The `failedInserts` Pub/Sub topic connecting Loader and Repeater.
- The `deadLetter` GCS bucket used by Repeater to store data that repeatedly failed to be inserted into BigQuery.

This means that failed events are now written to the failed events Pub/Sub topic, configured as `output.bad.topic`, rather than directly to the GCS bucket as before. This change was made to consolidate all types of event failures into a single place.

## Events table format

Starting from 2.0.0, BigQuery Loader changes its output column naming strategy. For example, for the [ad\_click event](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.media/ad_click_event/jsonschema/1-0-0):

- Before the upgrade, the corresponding column would be named `unstruct_event_com_snowplowanalytics_snowplow_media_ad_click_event_1_0_0`.
- After the upgrade, the new column will be named `unstruct_event_com_snowplowanalytics_snowplow_media_ad_click_event_1`.

All self-describing events and entities will be loaded into new "major version"-oriented columns. Old "full version"-oriented columns will remain unchanged, but no new data will be loaded into them (the 2.0.0 loader just ignores these columns).

The new column naming scheme has several advantages:

- Fewer columns created (BigQuery has a limit on the total number of columns)
- No need to update data models (or use complex macros) every time a new minor version of a schema is created

The catch is that you have to follow the rules of schema evolution more strictly to ensure data from different schema versions can fit in the same column — see below.

> **Tip:** If you are using [Snowplow dbt models](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-models/), they will automatically consolidate the data between `_1_0_0` and `_1` style columns, because they look at the major version prefix (e.g. `_1`), which is common to both.
>
> If you are not using Snowplow dbt models but still use dbt, you can employ [this macro](https://github.com/snowplow/dbt-snowplow-utils#combine_column_versions-source) to manually aggregate the data across old and new columns.

### Enable legacy mode for the old table format

To simplify migration to the new table format, it is possible to run the 2.x loader in legacy mode, so that it loads self-describing events and entities using the old column names of the 1.x loader.

**Option 1:** In the configuration file, set `legacyColumnMode` to `true`. When this mode is enabled, the loader uses the legacy column style for all self-describing events and entities.

**Option 2:** In the configuration file, set `legacyColumns` to list specific schemas for which to use the legacy column style. This list is used when `legacyColumnMode` is `false` (the default). For example:

```json
"legacyColumns": [
  "iglu:com.example/legacy_a/jsonschema/1-0-0",
  "iglu:com.example/legacy_b/jsonschema/1-*-*"
]
```

## Recovery columns

### What is schema evolution?

One of Snowplow's key features is the ability to [define custom schemas and validate events against them](/docs/fundamentals/schemas/). Over time, users often evolve their schemas, e.g. by adding new fields or changing existing fields. To accommodate these changes, BigQuery Loader 2.0.0 automatically adjusts the database tables in the warehouse accordingly.

There are two main types of schema changes:

**Breaking**: The schema version has to be changed in a major way (`1-2-3` → `2-0-0`). As of BigQuery Loader 2.0.0, each major schema version has its own column (`..._1`, `..._2`, etc, for example: `contexts_com_snowplowanalytics_ad_click_1`).

**Non-breaking**: The schema version can be changed in a minor way (`1-2-3` → `1-3-0` or `1-2-3` → `1-2-4`). Data is stored in the same database column. The loader tries to format the incoming data according to the latest version of the schema it has seen (for a given major version, e.g. `1-*-*`). For example, if a batch contains events with schema versions `1-0-0`, `1-0-1` and `1-0-2`, the loader derives the output schema based on version `1-0-2`. Then the loader instructs BigQuery to adjust the database column and load the data.

### Recovering from invalid schema evolution

Let's consider these two schemas as an example of breaking schema evolution (changing the type of a field from `integer` to `string`) within the same major version (`1-0-0` and `1-0-1`):

```json
{
  // 1-0-0
  "properties": {
    "a": {"type": "integer"}
  }
}
```

```json
{
  // 1-0-1
  "properties": {
    "a": {"type": "string"}
  }
}
```

With BigQuery Loader 1.x, data for each version would go to its own column — no issue. With BigQuery Loader 2.x, there is only one column. But strings and integers can't coexist! To avoid crashing or losing data, BigQuery Loader 2.0.0 proceeds by creating a new column for the data with schema `1-0-1`, e.g. `contexts_com_snowplowanalytics_ad_click_1_0_1_recovered_9999999`, where:

- `1_0_1` is the version of the offending schema;
- `9999999` is a hash code unique to the schema (i.e. it will change if the schema is overwritten with a different one).

If you create a new schema `1-0-2` that reverts the offending changes and is again compatible with `1-0-0`, the data for events with that schema will be written to the original column as expected.

> **Tip:** You might find that some of your schemas were evolved incorrectly in the past, which results in the creation of these "recovery" columns after the upgrade. To address this for a given schema, create a new _minor_ schema version that reverts the breaking changes introduced in previous versions. (Or, if you want to keep the breaking change, create a new _major_ schema version.) You can set it to [supersede](/docs/fundamentals/schemas/versioning/#mark-a-schema-as-superseded) the previous version(s), so that events are automatically validated against the new schema.

> **Note:** If events with incorrectly evolved schemas never arrive, the recovery column will not be created.

You can read more about schema evolution and how recovery columns work [here](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/?warehouse=bigquery#versioning).

---

# Upgrade guides for the BigQuery Loader

> Upgrade guides for BigQuery Loader with breaking changes, migration steps, and compatibility notes for major versions.
>
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/bigquery-loader/upgrade-guides/

This section contains information to help you upgrade to newer versions of the BigQuery Loader.

---

# Databricks Streaming Loader configuration reference

> Configure Databricks Streaming Loader with Unity Catalog, Kinesis, Pub/Sub, and Kafka settings for lakehouse streaming.
>
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/configuration-reference/

The configuration reference in this page is written for Databricks Streaming Loader `0.4.0`.

### License

The Databricks Streaming Loader is released under the [Snowplow Limited Use License](/limited-use-license-1.1/) ([FAQ](/docs/licensing/limited-use-license-faq/)).

To accept the terms of license and run the loader, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable. Alternatively, configure the `license.accept` option in the config file:

```json
"license": {
  "accept": true
}
```

### Databricks configuration

| Parameter | Description |
| --------- | ----------- |
| `output.good.host` | Required, e.g. `https://{workspace-id}.cloud.databricks.com`. URL of the Databricks workspace. |
| `output.good.catalog` | Required. Name of the Databricks catalog containing the volume. |
| `output.good.schema` | Required. Name of the Databricks schema containing the volume. |
| `output.good.volume` | Required. Name of the Databricks volume to which this loader will upload staging files. This must be an [external](https://docs.databricks.com/aws/en/volumes/managed-vs-external) volume. |
| `output.good.token` | Required if using PAT authentication. A Databricks [personal access token](https://docs.databricks.com/aws/en/dev-tools/auth). |
| `output.good.oauth.clientId` | Required if using OAuth authentication. The client ID for a Databricks [service principal](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m). |
| `output.good.oauth.clientSecret` | Required if using OAuth authentication. The client secret for a Databricks [service principal](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m). |
| `output.good.compression` | Optional. Default value `snappy`. Compression algorithm for the uploaded staging parquet files. |
| `output.good.httpTimeout` | Optional. Default value `20 seconds`. Timeout duration of the Databricks SDK's HTTP client. |

### Streams configuration

**AWS:**

| Parameter | Description |
| --------- | ----------- |
| `input.streamName` | Required. Name of the Kinesis stream with the enriched events. |
| `input.appName` | Optional, default `snowplow-databricks-loader`. Name to use for the DynamoDB table, used by the underlying Kinesis Client Library for managing leases. |
| `input.initialPosition.type` | Optional, default `LATEST`. Allowed values are `LATEST`, `TRIM_HORIZON`, `AT_TIMESTAMP`. When the loader is deployed for the first time, this controls from where in the Kinesis stream it should start consuming events. On all subsequent deployments of the loader, the loader will resume from the offsets stored in the DynamoDB table. |
| `input.initialPosition.timestamp` | Required if `input.initialPosition` is `AT_TIMESTAMP`. A timestamp in ISO8601 format from where the loader should start consuming events. |
| `input.retrievalMode` | Optional, default `Polling`. Change to `FanOut` to enable the enhanced fan-out feature of Kinesis. |
| `input.retrievalMode.maxRecords` | Optional. Default value 1000. How many events the Kinesis client may fetch in a single poll. Only used when `input.retrievalMode` is `Polling`. |
| `input.workerIdentifier` | Optional. Defaults to the `HOSTNAME` environment variable. The name of this KCL worker used in the DynamoDB lease table. |
| `input.leaseDuration` | Optional. Default value `10 seconds`. The duration of shard leases. KCL workers must periodically refresh leases in the DynamoDB table before this duration expires. |
| `input.maxLeasesToStealAtOneTimeFactor` | Optional. Default value `2.0`. Controls how to pick the max number of shard leases to steal at one time. E.g. if there are 4 available processors, and `maxLeasesToStealAtOneTimeFactor = 2.0`, then allow the loader to steal up to 8 leases. Allows bigger instances to more quickly acquire the shard leases they need to combat latency. |
| `input.checkpointThrottledBackoffPolicy.minBackoff` | Optional. Default value `100 milliseconds`. Initial backoff used to retry checkpointing if we exceed the DynamoDB provisioned write limits. |
| `input.checkpointThrottledBackoffPolicy.maxBackoff` | Optional. Default value `1 second`. Maximum backoff used to retry checkpointing if we exceed the DynamoDB provisioned write limits. |
| `output.bad.streamName` | Required. Name of the Kinesis stream that will receive failed events. |
| `output.bad.throttledBackoffPolicy.minBackoff` | Optional. Default value `100 milliseconds`. Initial backoff used to retry sending failed events if we exceed the Kinesis write throughput limits. |
| `output.bad.throttledBackoffPolicy.maxBackoff` | Optional. Default value `1 second`. Maximum backoff used to retry sending failed events if we exceed the Kinesis write throughput limits. |
| `output.bad.recordLimit` | Optional. Default value 500. The maximum number of records we are allowed to send to Kinesis in 1 PutRecords request. |
| `output.bad.byteLimit` | Optional. Default value 5242880. The maximum number of bytes we are allowed to send to Kinesis in 1 PutRecords request. |
| `output.bad.maxRecordSize` | Optional. Default value 1000000. Any single failed event sent to Kinesis should not exceed this size in bytes. |
| `output.bad.maxRetries` (since 0.4.0) | Optional. Default value 10. Maximum number of retries by the Kinesis client. |

**GCP:**

| Parameter | Description |
| --------- | ----------- |
| `input.subscription` | Required, e.g. `projects/myproject/subscriptions/snowplow-enriched`. Name of the Pub/Sub subscription with the enriched events. |
| `input.durationPerAckExtension` | Optional. Default value `15 seconds`. Pub/Sub ack deadlines are extended for this duration when needed. |
| `input.minRemainingAckDeadline` | Optional. Default value `0.1`. Controls when ack deadlines are re-extended, for a message that is close to exceeding its ack deadline. For example, if `durationPerAckExtension` is `60 seconds` and `minRemainingAckDeadline` is `0.1`, then the loader will wait until there is `6 seconds` left of the remaining deadline, before re-extending the message deadline. |
| `input.maxMessagesPerPull` | Optional. Default value 1000. How many Pub/Sub messages to pull from the server in a single request. |
| `input.debounceRequests` | Optional. Default value `100 millis`. Adds an artificial delay between consecutive requests to Pub/Sub for more messages. Under some circumstances, this was found to slightly alleviate a problem in which Pub/Sub might re-deliver the same messages multiple times. |
| `input.retries.transientErrors.delay` | Optional. Default value `100 millis`. Backoff delay for follow-up attempts. |
| `input.retries.transientErrors.attempts` | Optional. Default value `10`. Maximum number of attempts, after which the loader will crash and exit. |
| `output.bad.topic` | Required, e.g. `projects/myproject/topics/snowplow-bad`. Name of the Pub/Sub topic that will receive failed events. |
| `output.bad.batchSize` | Optional. Default value 1000. Bad events are sent to Pub/Sub in batches not exceeding this count. |
| `output.bad.requestByteThreshold` | Optional. Default value 1000000. Bad events are sent to Pub/Sub in batches with a total size not exceeding this byte threshold. |
| `output.bad.maxRecordSize` | Optional. Default value 9000000. Any single failed event sent to Pub/Sub should not exceed this size in bytes. |
| `output.retries.transientErrors.delay` | Optional. Default value `100 millis`. Backoff delay for follow-up attempts. |
| `output.retries.transientErrors.attempts` | Optional. Default value `10`. Maximum number of attempts, after which the loader will crash and exit. |

**Azure:**

| Parameter | Description |
| --------- | ----------- |
| `input.topicName` | Required. Name of the Kafka topic for the source of enriched events. |
| `input.bootstrapServers` | Required. Hostname and port of Kafka bootstrap servers hosting the source of enriched events. |
| `input.consumerConf.*` | Optional. A map of key/value pairs for [any standard Kafka consumer configuration option](https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html). |
| `input.debounceCommitOffsets` | Optional. Default value 10 seconds. How frequently to commit our progress back to Kafka. By increasing this value, we decrease the number of requests made to the Kafka broker. |
| `input.commitTimeout` (since 0.4.0) | Optional. Default value 15 seconds. The time to wait for offset commits to complete. If an offset commit doesn't complete within this time, a CommitTimeoutException will be raised instead. |
| `output.bad.topicName` | Required. Name of the Kafka topic that will receive failed events. |
| | `output.bad.bootstrapServers` | Required. Hostname and port of Kafka bootstrap servers hosting the bad topic | | `output.bad.producerConf.*` | Optional. A map of key/value pairs for [any standard Kafka producer configuration option](https://docs.confluent.io/platform/current/installation/configuration/producer-configs.html). | | `output.bad.maxRecordSize` | Optional. Default value 1000000. Any single failed event sent to Kafka should not exceed this size in bytes | > **Info:** You can use the `input.consumerConf` and `output.bad.producerConf` options to configure authentication to Azure event hubs using SASL. For example: > > ```json > "input.consumerConf": { > "security.protocol": "SASL_SSL" > "sasl.mechanism": "PLAIN" > "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\$ConnectionString\" password=;" > } > ``` *** ## Other configuration options | Parameter | Description | | ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `batching.maxBytes` | Optional. Default value `16000000`. Events are uploaded to the Databricks volume when the batch reaches this size in bytes | | `batching.maxDelay` | Optional. Default value `1 second`. Events are uploaded to the Databricks volume after a maximum of this duration, even if the `maxBytes` size has not been reached | | `batching.uploadParallelismFactor` | Optional. Default value 3. Controls how many batches can we send simultaneously over the network to Databricks. E.g. If there are 4 available processors, and `uploadParallelismFactor` is 3.5, then the loader sends up to 14 batches in parallel. 
Adjusting this value can cause the app to use more or less of the available CPU. | | `cpuParallelismFactor` | Optional. Default value 0.75. Controls how the app splits the workload into concurrent batches which can be run in parallel. For example, if there are 4 available processors and `cpuParallelismFactor` is 0.75, then the loader processes 3 batches concurrently. Adjusting this value can cause the app to use more or less of the available CPU. | | `retries.setupErrors.delay` | Optional. Default value `30 seconds`. Configures exponential backoff on errors related to how Databricks is set up for this loader. Examples include authentication errors and permissions errors. This class of errors is reported periodically to the monitoring webhook. | | `retries.transientErrors.delay` | Optional. Default value `1 second`. Configures exponential backoff on errors that are likely to be transient. Examples include server errors and network errors. | | `retries.transientErrors.attempts` | Optional. Default value 5. Maximum number of attempts to make before giving up on a transient error. | | `skipSchemas` | Optional, e.g. `["iglu:com.example/skipped1/jsonschema/1-0-0"]` or with wildcards `["iglu:com.example/skipped2/jsonschema/1-*-*"]`. A list of schemas that won't be loaded to Databricks. This feature could be helpful when recovering from edge-case schemas which for some reason cannot be loaded to the table. | | `exitOnMissingIgluSchema` | Optional. Default value `true`. Whether the loader should crash and exit if it fails to resolve an Iglu Schema. We recommend `true` because Snowplow enriched events have already passed validation, so a missing schema normally indicates an error that needs addressing. Change to `false` so events go to the failed events stream instead of crashing the loader. | | `monitoring.metrics.statsd.hostname` | Optional. If set, the loader sends statsd metrics over UDP to a server on this host name. | | `monitoring.metrics.statsd.port` | Optional.
Default value 8125. If the statsd server is configured, this UDP port is used for sending metrics. | | `monitoring.metrics.statsd.tags.*` | Optional. A map of key/value pairs to be sent along with the statsd metric. | | `monitoring.metrics.statsd.period` | Optional. Default `1 minute`. How often to report metrics to statsd. | | `monitoring.metrics.statsd.prefix` | Optional. Default `snowplow.databricks-loader`. Prefix used for the metric name when sending to statsd. | | `monitoring.webhook.endpoint` | Optional, e.g. `https://webhook.example.com`. The loader will send to the webhook a payload containing details of any error related to how Databricks is set up for this loader. | | `monitoring.webhook.tags.*` | Optional. A map of key/value strings to be included in the payload content sent to the webhook. | | `monitoring.webhook.heartbeat.*` | Optional. Default value `5.minutes`. How often to send a heartbeat event to the webhook when healthy. | | `monitoring.sentry.dsn` | Optional. Set to a Sentry URI to report unexpected runtime exceptions. | | `monitoring.sentry.tags.*` | Optional. A map of key/value strings which are passed as tags when reporting exceptions to Sentry. | | `telemetry.disable` | Optional. Set to `true` to disable [telemetry](/docs/get-started/self-hosted/telemetry/). | | `telemetry.userProvidedId` | Optional. See [here](/docs/get-started/self-hosted/telemetry/#how-can-i-help) for more information. | | `http.client.maxConnectionsPerServer` | Optional. Default value 4. Configures the internal HTTP client used for iglu resolver, alerts and telemetry. The maximum number of open HTTP requests to any single server at any one time. | --- # Databricks Streaming Loader > Load Snowplow events to Databricks with low latency using Lakeflow Declarative Pipelines and Unity Catalog volumes. 
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/ > **Info:** You will need a premium Databricks plan to use Lakeflow Declarative Pipelines. The Databricks Streaming Loader is an application that integrates with a Databricks [Lakeflow Declarative Pipeline](https://docs.databricks.com/aws/en/dlt/) to load Snowplow events into Databricks with low latency. **AWS:** There are two parts to how the Databricks Streaming Loader works. In the first part, you use Snowplow's Databricks Streaming Loader to push staging files into a [Unity Catalog volume](https://docs.databricks.com/aws/en/volumes/). ```mermaid flowchart LR stream[["Enriched Events (Kinesis stream)"]] loader{{"Databricks Streaming Loader"}} subgraph databricks [Databricks] volume[("Volume")] end stream-->loader-->|REST API|databricks ``` In the second part, you use a Databricks Lakeflow Declarative Pipeline to load the staging files into a Streaming Live Table. ```mermaid flowchart LR subgraph databricks [Databricks] direction LR volume[("Volume")] pipeline{{"Lakeflow Declarative Pipeline"}} table[("Streaming Live Table")] volume-->pipeline-->table end ``` The Databricks Streaming Loader is published as a Docker image which you can run on any AWS VM. ```bash docker pull snowplow/databricks-loader-kinesis:0.4.0 ``` To run the loader, mount your config file into the docker image, and then provide the file path on the command line. We recommend setting your Databricks credentials via environment variables, e.g. `DATABRICKS_CLIENT_SECRET`, so that you can refer to them in the config file. 
```bash docker run \ --mount=type=bind,source=/path/to/myconfig,destination=/myconfig \ --env DATABRICKS_CLIENT_ID="${DATABRICKS_CLIENT_ID}" \ --env DATABRICKS_CLIENT_SECRET="${DATABRICKS_CLIENT_SECRET}" \ snowplow/databricks-loader-kinesis:0.4.0 \ --config=/myconfig/loader.hocon \ --iglu-config=/myconfig/iglu.hocon ``` Where `loader.hocon` is the loader's [configuration file](/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/#configuring-the-loader) and `iglu.hocon` is the [iglu resolver](/docs/api-reference/iglu/iglu-resolver/) configuration. **GCP:** There are two parts to how the Databricks Streaming Loader works. In the first part, you use Snowplow's Databricks Streaming Loader to push staging files into a [Unity Catalog volume](https://docs.databricks.com/aws/en/volumes/). ```mermaid flowchart LR stream[["Enriched Events (Pub/Sub stream)"]] loader{{"Databricks Streaming Loader"}} subgraph databricks [Databricks] volume[("Volume")] end stream-->loader-->|REST API|databricks ``` In the second part, you use a Databricks Lakeflow Declarative Pipeline to load the staging files into a Streaming Live Table. ```mermaid flowchart LR subgraph databricks [Databricks] direction LR volume[("Volume")] pipeline{{"Lakeflow Declarative Pipeline"}} table[("Streaming Live Table")] volume-->pipeline-->table end ``` The Databricks Streaming Loader is published as a Docker image which you can run on any GCP VM. ```bash docker pull snowplow/databricks-loader-pubsub:0.4.0 ``` To run the loader, mount your config file into the docker image, and then provide the file path on the command line. We recommend setting your Databricks credentials via environment variables, e.g. `DATABRICKS_CLIENT_SECRET`, so that you can refer to them in the config file.
```bash docker run \ --mount=type=bind,source=/path/to/myconfig,destination=/myconfig \ --env DATABRICKS_CLIENT_ID="${DATABRICKS_CLIENT_ID}" \ --env DATABRICKS_CLIENT_SECRET="${DATABRICKS_CLIENT_SECRET}" \ snowplow/databricks-loader-pubsub:0.4.0 \ --config=/myconfig/loader.hocon \ --iglu-config=/myconfig/iglu.hocon ``` Where `loader.hocon` is the loader's [configuration file](/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/#configuring-the-loader) and `iglu.hocon` is the [iglu resolver](/docs/api-reference/iglu/iglu-resolver/) configuration. **Azure:** There are two parts to how the Databricks Streaming Loader works. In the first part, you use Snowplow's Databricks Streaming Loader to push staging files into a [Unity Catalog volume](https://docs.databricks.com/aws/en/volumes/). ```mermaid flowchart LR stream[["Enriched Events (Kafka stream)"]] loader{{"Databricks Streaming Loader"}} subgraph databricks [Databricks] volume[("Volume")] end stream-->loader-->|REST API|databricks ``` In the second part, you use a Databricks Lakeflow Declarative Pipeline to load the staging files into a Streaming Live Table. ```mermaid flowchart LR subgraph databricks [Databricks] direction LR volume[("Volume")] pipeline{{"Lakeflow Declarative Pipeline"}} table[("Streaming Live Table")] volume-->pipeline-->table end ``` The Databricks Streaming Loader is published as a Docker image which you can run on any Azure VM. ```bash docker pull snowplow/databricks-loader-kafka:0.4.0 ``` To run the loader, mount your config file into the docker image, and then provide the file path on the command line. We recommend setting your Databricks credentials via environment variables, e.g. `DATABRICKS_CLIENT_SECRET`, so that you can refer to them in the config file.
```bash docker run \ --mount=type=bind,source=/path/to/myconfig,destination=/myconfig \ --env DATABRICKS_CLIENT_ID="${DATABRICKS_CLIENT_ID}" \ --env DATABRICKS_CLIENT_SECRET="${DATABRICKS_CLIENT_SECRET}" \ snowplow/databricks-loader-kafka:0.4.0 \ --config=/myconfig/loader.hocon \ --iglu-config=/myconfig/iglu.hocon ``` Where `loader.hocon` is the loader's [configuration file](/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/#configuring-the-loader) and `iglu.hocon` is the [iglu resolver](/docs/api-reference/iglu/iglu-resolver/) configuration. *** > **Tip:** For more information on how events are stored in Databricks, check the [mapping between Snowplow schemas and the corresponding Databricks column types](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/?warehouse=databricks). ## Configuring the loader The loader config file is in HOCON format, and it allows configuring many different properties of how the loader runs. The simplest possible config file just needs a description of your pipeline inputs and outputs: **AWS:** [View on GitHub](https://github.com/snowplow-incubator/databricks-loader/blob/develop/config/config.kinesis.minimal.hocon) **GCP:** [View on GitHub](https://github.com/snowplow-incubator/databricks-loader/blob/develop/config/config.pubsub.minimal.hocon) **Azure:** [View on GitHub](https://github.com/snowplow-incubator/databricks-loader/blob/develop/config/config.kafka.minimal.hocon) *** See the [configuration reference](/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/configuration-reference/) for all possible configuration parameters. ### Iglu The Databricks Streaming Loader requires an [Iglu resolver file](/docs/api-reference/iglu/iglu-resolver/) which describes the Iglu repositories that host your schemas. This should be the same Iglu configuration file that you used in the Enrichment process.
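As an illustrative sketch, a minimal resolver file that resolves only the standard Snowplow schemas from Iglu Central might look like the following; in practice your file will usually also list your own schema registries, and the cache size shown here is just an assumed example value:

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": { "uri": "http://iglucentral.com" }
        }
      }
    ]
  }
}
```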
## Configuring the Databricks Lakeflow Declarative Pipeline Create a Pipeline in your Databricks workspace, and copy the following SQL into the associated `.sql` file: ```sql CREATE STREAMING LIVE TABLE events CLUSTER BY (load_tstamp, event_name) TBLPROPERTIES ( 'delta.dataSkippingStatsColumns' = 'load_tstamp,collector_tstamp,derived_tstamp,dvce_created_tstamp,true_tstamp,event_name' ) AS SELECT *, current_timestamp() as load_tstamp FROM cloud_files( "/Volumes////events", "parquet", map( "cloudfiles.inferColumnTypes", "false", "cloudfiles.includeExistingFiles", "false", -- set to true to load files already present in the volume "cloudfiles.schemaEvolutionMode", "addNewColumns", "cloudfiles.partitionColumns", "", "cloudfiles.useManagedFileEvents", "true", "datetimeRebaseMode", "CORRECTED", "int96RebaseMode", "CORRECTED", "mergeSchema", "true" ) ) ``` Replace `/Volumes////events` with the correct path to your volume. Note that the volume must be an [external volume](https://docs.databricks.com/aws/en/volumes/) in order to use the cloud files option `cloudfiles.useManagedFileEvents`, which is highly recommended for this integration. ## Metrics The Databricks Streaming Loader can be configured to send the following custom metrics to a [StatsD](https://www.datadoghq.com/statsd-monitoring/) receiver: | Metric | Definition | | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `events_good` | A count of events that are successfully written to the Databricks volume. | | `events_bad` | A count of failed events that could not be loaded, and were instead sent to the bad output stream. | | `latency_millis` | The time in milliseconds from when events are written to the source stream of events (i.e. by Enrich) until when they are read by the loader. | | `e2e_latency_millis` | The end-to-end latency of the Snowplow pipeline.
The time in milliseconds from when an event was received by the collector, until it is written to the Databricks volume. | See the `monitoring.metrics.statsd` options in the [configuration reference](/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/configuration-reference/) for how to configure the StatsD receiver. **Telemetry notice** By default, Snowplow collects telemetry data for Databricks Streaming Loader (since version 0.1.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1). If you wish to help us further, you can optionally provide your email (or just a UUID) in the `telemetry.userProvidedId` configuration setting. If you wish to disable telemetry, you can do so by setting `telemetry.disable` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information. --- # Elasticsearch Loader for Kinesis and NSQ > Load Snowplow enriched and bad events from Kinesis or NSQ streams into Elasticsearch or OpenSearch clusters. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/elasticsearch/ If you are using [Enrich](/docs/api-reference/enrichment-components/) to write enriched Snowplow events to one stream and bad events to another, you can use the Elasticsearch Loader to read events from either of those streams and write them to [Elasticsearch](http://www.elasticsearch.org/overview/). It works with either Kinesis or NSQ streams. > **Warning:** We only offer this loader on AWS or as part of [Snowplow Mini](/docs/api-reference/snowplow-mini/). 
## What the data looks like There are a few changes compared to the [standard structure of Snowplow data](/docs/fundamentals/canonical-event/). ### Boolean fields reformatted All boolean fields like `br_features_java` are normally either `"0"` or `"1"`. In Elasticsearch, these values are converted to `false` and `true`. ### New `geo_location` field The `geo_latitude` and `geo_longitude` fields are combined into a single `geo_location` field of Elasticsearch's ["geo\_point" type](https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html). ### Self-describing events Each [self-describing event](/docs/fundamentals/events/#self-describing-events) gets its own field (same [naming rules](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/?warehouse=snowflake#location) as for Snowflake). For example: ```json { "unstruct_com_snowplowanalytics_snowplow_link_click_1": { "targetUrl": "http://snowplow.io", "elementId": "action", "elementClasses": [], "elementTarget": "" } } ``` ### Entities Each [entity](/docs/fundamentals/entities/) type attached to the event gets its own field (same [naming rules](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/?warehouse=snowflake#location) as for Snowflake). The field contains an array with the data for all entities of the given type. For example: ```json { "contexts_com_acme_user_1": [ { "name": "Alice" } ], "contexts_com_acme_product_1": [ { "name": "Apple" }, { "name": "Orange" } ] } ``` ## Setup guide ### Configuring Elasticsearch #### Getting started First off, install and set up Elasticsearch. 
For more information, check out the [installation guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html) and [Supported versions of OpenSearch and Elasticsearch](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html#choosing-version) for the latest ElasticSearch/OpenSearch versions supported by AWS. > **Note:** We support ElasticSearch v6.x and v7.x. We also support OpenSearch v1.x and v2.x. We do not support ElasticSearch v8.x currently. #### Raising the file limit Elasticsearch keeps a lot of files open simultaneously, so you will need to increase the maximum number of files a user can have open. To do this: ```bash sudo vim /etc/security/limits.conf ``` Append the following lines to the file: ```bash {{USERNAME}} soft nofile 32000 {{USERNAME}} hard nofile 32000 ``` Where `{{USERNAME}}` is the name of the user running Elasticsearch. You will need to log out and restart Elasticsearch before the new file limit takes effect. To check that this new limit has taken effect you can run the following command from the terminal: ```bash curl localhost:9200/_nodes/process?pretty ``` If `max_file_descriptors` equals 32000, Elasticsearch is running with the new limit. #### Defining the mapping Use the following request to create the mapping with Elasticsearch 7+: ```bash curl -XPUT 'http://localhost:9200/snowplow' -H 'Content-Type: application/json' -d '{ "settings": { "analysis": { "analyzer": { "default": { "type": "keyword" } } } }, "mappings": { "properties": { "geo_location": { "type": "geo_point" } } } }' ``` Note that Elasticsearch 7+ [no longer uses mapping types](https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html). If you have an older version, you might need to include mapping types in the above snippet. This initialization sets the default analyzer to "keyword". This means that string fields will not be split into separate tokens for the purposes of searching.
This saves space and ensures that URL fields are handled correctly. If you want to tokenize specific string fields, you can change the "properties" field in the mapping like this: ```bash curl -XPUT 'http://localhost:9200/snowplow' -H 'Content-Type: application/json' -d '{ "settings": { "analysis": { "analyzer": { "default": { "type": "keyword" } } } }, "mappings": { "properties": { "geo_location": { "type": "geo_point" }, "field_to_tokenize": { "type": "text", "analyzer": "english" } } } }' ``` ### Installing the Elasticsearch Loader The Elasticsearch Loader is published on Docker Hub: ```bash docker pull snowplow/snowplow-elasticsearch-loader:2.1.3 ``` The container can be run with the following command: ```bash docker run \ -v /path/to/config.hocon:/snowplow/config.hocon \ snowplow/snowplow-elasticsearch-loader:2.1.3 \ --config /snowplow/config.hocon ``` Alternatively, you can download and run a [jar file from the GitHub release](https://github.com/snowplow/snowplow-elasticsearch-loader/releases): ```bash java -jar snowplow-elasticsearch-loader-2.1.3.jar --config /path/to/config.hocon ``` ### Using the Elasticsearch Loader #### Configuration The sink is configured using a HOCON file, for which you can find examples [here](https://github.com/snowplow/snowplow-elasticsearch-loader/tree/master/config). These are the fields: | Name | Description | | ---------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | | purpose | Required. "ENRICHED\_EVENTS" for a stream of successfully enriched events, "BAD\_ROWS" for a stream of bad events, "JSON" for writing plain json. | | input.type | Required. Configures where input events will be read from. Can be "kinesis", "stdin" or "nsq". | | input.streamName | Required when `input.type` is kinesis or nsq.
Name of the stream to read from. | | input.initialPosition | Required when `input.type` is kinesis. Specifies where to start reading from the stream the first time the app is run. "TRIM\_HORIZON" for as far back as possible, "LATEST" for as recent as possible, "AT\_TIMESTAMP" for after a specified timestamp. | | input.initialTimestamp | Used when `input.type` is kinesis. Required when `input.initialPosition` is "AT\_TIMESTAMP". Specifies the timestamp to start reading from. | | input.maxRecords | Used when `input.type` is kinesis. Optional. Maximum number of records fetched in a single request. Default value 10000. | | input.region | Used when `input.type` is kinesis. Optional if it can be resolved with [AWS region provider chain](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/regions/providers/DefaultAwsRegionProviderChain.html). Region where the Kinesis stream is located. | | input.customEndpoint | Used when `input.type` is kinesis. Optional. Custom endpoint to override AWS Kinesis endpoints; this can be used to specify local endpoints when using Localstack. | | input.dynamodbCustomEndpoint | Used when `input.type` is kinesis. Optional. Custom endpoint to override AWS DynamoDB endpoints for Kinesis checkpoints lease table; this can be used to specify local endpoints when using Localstack. | | input.appName | Used when `input.type` is kinesis. Optional. Used by a DynamoDB table to maintain stream state. Default value "snowplow-elasticsearch-loader". | | input.buffer.byteLimit | Used when `input.type` is kinesis. Optional. The limit of the buffer in terms of bytes. When this value is exceeded, events will be sent to Elasticsearch. Default value 1000000. | | input.buffer.recordLimit | Used when `input.type` is kinesis. Optional. The limit of the buffer in terms of record count. When this value is exceeded, events will be sent to Elasticsearch. Default value 500.
| | input.buffer.timeLimit | Used when `input.type` is kinesis. Optional. The time limit in milliseconds to wait to send the buffer to Elasticsearch. Default value 500. | | input.channelName | Required when `input.type` is nsq. Channel name for NSQ source stream. If more than one application is reading from the same NSQ topic at the same time, each must have a unique channel name to be able to get all the data from the topic. | | input.nsqlookupdHost | Required when `input.type` is nsq. Host name for nsqlookupd. | | input.nsqlookupdPort | Required when `input.type` is nsq. HTTP port for nsqlookupd. | | output.good.type | Required. Configure where to write good events. Can be "elasticsearch" or "stdout". | | output.good.client.endpoint | Required. The Elasticsearch cluster endpoint. | | output.good.client.port | Optional. The port the Elasticsearch cluster can be accessed on. Default value 9200. | | output.good.client.username | Optional. HTTP Basic Auth username. Can be removed if not active. | | output.good.client.password | Optional. HTTP Basic Auth password. Can be removed if not active. | | output.good.client.shardDateFormat | Optional. Formatting used for sharding good stream, e.g. \_yyyy-MM-dd. Can be removed if not needed. | | output.good.client.shardDateField | Optional. Timestamp field for sharding good stream. If not specified, derived\_tstamp is used. | | output.good.client.maxRetries | Optional. The maximum number of request attempts before giving up. Default value 6. | | output.good.client.ssl | Optional. Whether to use ssl or not. Default value false. | | output.good.aws.signing | Optional. Whether to activate AWS signing or not. It should be activated if AWS OpenSearch service is used. Default value false. | | output.good.aws.region | Optional. Region where the AWS OpenSearch service is located. | | output.good.cluster.index | Required. The Elasticsearch index name. | | output.good.cluster.documentType | Optional. The Elasticsearch index type.
Index types are deprecated in ES >=7.x, so this option shouldn't be set with ES >=7.x. | | output.good.chunk.byteLimit | Optional. Bulk requests to Elasticsearch will be split into chunks according to the given byte limit. Default value 1000000. | | output.good.chunk.recordLimit | Optional. Bulk requests to Elasticsearch will be split into chunks according to the given record limit. Default value 500. | | output.bad.type | Required. Configure where to write failed events. Can be "kinesis", "nsq", "stderr" or "none". | | output.bad.streamName | Required. Stream name for events which are rejected by Elasticsearch. | | output.bad.region | Used when `output.bad.type` is kinesis. Optional if it can be resolved with [AWS region provider chain](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/regions/providers/DefaultAwsRegionProviderChain.html). Region where the bad Kinesis stream is located. | | output.bad.customEndpoint | Used when `output.bad.type` is kinesis. Optional. Custom endpoint to override AWS Kinesis endpoints; this can be used to specify local endpoints when using Localstack. | | output.bad.nsqdHost | Required when `output.bad.type` is nsq. Host name for nsqd. | | output.bad.nsqdPort | Required when `output.bad.type` is nsq. HTTP port for nsqd. | | monitoring.snowplow\.collector | Optional. Snowplow collector URI for monitoring. Can be removed together with monitoring section. | | monitoring.snowplow\.appId | Optional. The app id used in decorating the events sent for monitoring. Can be removed together with monitoring section. | | monitoring.metrics.cloudWatch | Optional. Whether to enable CloudWatch metrics or not. Default value true. | #### Document count To check the number of documents in an Elasticsearch or OpenSearch cluster, use the [Count API](https://docs.opensearch.org/latest/api-reference/search-apis/count/) provided by Elasticsearch/OpenSearch. For example, to get the total number of documents in the cluster, use `GET _count`.
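As a sketch, assuming a cluster running locally on the default port and an index named `snowplow` (both assumptions from the setup steps above), an index-scoped count request would look like:

```bash
# Count documents in the "snowplow" index only
# (the host, port, and index name here are assumptions)
curl 'http://localhost:9200/snowplow/_count?pretty'
```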
--- # Google Cloud Storage Loader > Archive Snowplow events from Pub/Sub to Google Cloud Storage buckets using Dataflow with windowing and partitioning. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/google-cloud-storage-loader/ [Cloud Storage Loader](https://github.com/snowplow-incubator/snowplow-google-cloud-storage-loader/) is a [Dataflow](https://cloud.google.com/dataflow/) job which dumps events from an input [PubSub](https://cloud.google.com/pubsub/) subscription into a [Cloud Storage](https://cloud.google.com/storage/) bucket. Cloud Storage Loader is built on top of [Apache Beam](https://beam.apache.org/) and its Scala wrapper [SCIO](https://github.com/spotify/scio). ## Running Cloud Storage Loader comes both as a Docker image and a ZIP archive. ### Docker The Docker image can be found on [Docker Hub](https://hub.docker.com/r/snowplow/snowplow-google-cloud-storage-loader). A container can be run as follows: ```bash docker run \ -v $PWD/config:/snowplow/config \ # if running outside GCP -e GOOGLE_APPLICATION_CREDENTIALS=/snowplow/config/credentials.json \ # if running outside GCP snowplow/snowplow-google-cloud-storage-loader:0.5.6 \ --runner=DataFlowRunner \ --jobName=[JOB-NAME] \ --project=[PROJECT] \ --streaming=true \ --workerZone=[ZONE] \ --inputSubscription=projects/[PROJECT]/subscriptions/[SUBSCRIPTION] \ --outputDirectory=gs://[BUCKET]/YYYY/MM/dd/HH/ \ # partitions by date --outputFilenamePrefix=output \ # optional --shardTemplate=-W-P-SSSSS-of-NNNNN \ # optional --outputFilenameSuffix=.txt \ # optional --windowDuration=5 \ # optional, in minutes --compression=none \ # optional, gzip, bz2 or none --numShards=1 # optional ``` To display the help message: ```bash docker run snowplow/snowplow-google-cloud-storage-loader:0.5.6 \ --help ``` To display documentation about Cloud Storage Loader-specific options: ```bash docker run snowplow/snowplow-google-cloud-storage-loader:0.5.6 \
--help=com.snowplowanalytics.storage.googlecloudstorage.loader.Options ``` ### ZIP archive Archive is hosted on GitHub at this URI: ```bash https://github.com/snowplow-incubator/snowplow-google-cloud-storage-loader/releases/download/0.5.6/snowplow-google-cloud-storage-loader-0.5.6.zip ``` Once unzipped the artifact can be run as follows: ```bash ./bin/snowplow-google-cloud-storage-loader \ --runner=DataFlowRunner \ --project=[PROJECT] \ --streaming=true \ --workerZone=[ZONE] \ --inputSubscription=projects/[PROJECT]/subscriptions/[SUBSCRIPTION] \ --outputDirectory=gs://[BUCKET]/YYYY/MM/dd/HH/ \ # partitions by date --outputFilenamePrefix=output \ # optional --shardTemplate=-W-P-SSSSS-of-NNNNN \ # optional --outputFilenameSuffix=.txt \ # optional --windowDuration=5 \ # optional, in minutes --compression=none \ # optional, gzip, bz2 or none --numShards=1 # optional ``` To display the help message: ```bash ./bin/snowplow-google-cloud-storage-loader --help ``` To display documentation about Cloud Storage Loader-specific options: ```bash ./bin/snowplow-google-cloud-storage-loader --help=com.snowplowanalytics.storage.googlecloudstorage.loader.Options ``` ## Configuration ### Cloud Storage Loader specific options - `--inputSubscription=String` The Cloud Pub/Sub subscription to read from, formatted like projects/\[PROJECT]/subscriptions/\[SUB]. Required. - `--outputDirectory=gs://[BUCKET]/` The Cloud Storage directory to output files to, ending in /. Required. - `--outputFilenamePrefix=String` The prefix for output files. Default: output. Optional. - `--shardTemplate=String` A valid shard template as described [here](https://javadoc.io/static/com.google.cloud.dataflow/google-cloud-dataflow-java-sdk-all/1.7.0/com/google/cloud/dataflow/sdk/io/ShardNameTemplate.html), which will be part of the filenames. Default: `-W-P-SSSSS-of-NNNNN`. Optional. - `--outputFilenameSuffix=String` The suffix for output files. Default: .txt. Optional. 
- `--windowDuration=Int` The window duration in minutes. Default: 5. Optional. - `--compression=String` The compression used (gzip, bz2 or none). Note that bz2 can't be loaded into BigQuery. Default: no compression. Optional. - `--numShards=Int` The maximum number of output shards produced when writing. Default: 1. Optional. - `--dateFormat=YYYY/MM/dd/HH/` A date format string used for partitioning via date in `outputDirectory` and `partitionedOutputDirectory`. Default: `YYYY/MM/dd/HH/`. Optional. For example, the date format `YYYY/MM/dd/HH/` would produce a directory structure like this: ```bash gs://bucket/ └── 2022 └── 12 └── 15 ├── ... ├── 18 ├── 19 ├── 20 └── ... ``` - `--partitionedOutputDirectory=gs://[BUCKET]/` The Cloud Storage directory to output files to, partitioned by schema, ending with /. Unpartitioned data will be sent to `outputDirectory`. Optional. ### Dataflow options To run the Cloud Storage Loader on Dataflow, it is also necessary to specify additional configuration options. None of these options have default values, and they are all required. - `--runner=DataFlowRunner` Passing the string `DataFlowRunner` specifies that we want to run on Dataflow. - `--jobName=[NAME]` Specify a name for your Dataflow job that will be created. - `--project=[PROJECT]` The name of your GCP project. - `--streaming=true` Pass `true` to notify Dataflow that we're running a streaming application. - `--workerZone=[ZONE]` The [zone](https://cloud.google.com/compute/docs/regions-zones) where the Dataflow nodes (effectively [GCP Compute Engine](https://cloud.google.com/compute/) nodes) will be launched. - `--region=[REGION]` The [region](https://cloud.google.com/compute/docs/regions-zones) where the Dataflow job will be launched. - `--gcpTempLocation=gs://[BUCKET]/` The GCS bucket where temporary files necessary to run the job (e.g. JARs) will be stored. The list of all the options can be found at . 
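To make the filename options above concrete, here is a small illustrative shell sketch (not the loader's actual code) of how the default shard template `-W-P-SSSSS-of-NNNNN` combines with the prefix and suffix, where `W` stands for the window, `P` for the pane, `SSSSS` for the shard index and `NNNNN` for the shard count; the specific window and pane values are assumptions:

```bash
# Illustrative only: expand the default shard template by hand
prefix="output"; suffix=".txt"
window="2022-12-15T18:00"   # W: the time window
pane="0"                    # P: the pane within the window
shard=0; num_shards=1       # SSSSS / NNNNN, zero-padded to 5 digits
name="-${window}-${pane}-$(printf '%05d' "$shard")-of-$(printf '%05d' "$num_shards")"
echo "${prefix}${name}${suffix}"
# → output-2022-12-15T18:00-0-00000-of-00001.txt
```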
--- # Warehouse and lake data loaders > Load Snowplow enriched events into data warehouses and lakes including BigQuery, Redshift, Snowflake, Databricks, and S3. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/ Snowplow provides loader applications for several different warehouses and lakes, across different clouds. Choose a loader based on your use case and data needs. --- # Lake Loader configuration reference > Configure Lake Loader for Delta Lake and Iceberg tables with Kinesis, Pub/Sub, and Kafka stream settings for data lakes. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/lake-loader/configuration-reference/ The configuration reference on this page is written for Lake Loader `0.9.1` ### License The Lake Loader is released under the [Snowplow Limited Use License](/limited-use-license-1.1/) ([FAQ](/docs/licensing/limited-use-license-faq/)). To accept the terms of the license and run the loader, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable. Alternatively, configure the `license.accept` option in the config file: ```json "license": { "accept": true } ``` ### Table configuration **Delta Lake:** | Parameter | Description | | --- | --- | | `output.good.location` | Required, e.g. `gs://mybucket/events`. URI of the bucket location to which to write Snowplow enriched events in Delta format. 
The URI should start with `s3a://` on AWS, `gs://` on GCP, or `abfs://` on Azure. | | `output.good.deltaTableProperties.*` | Optional. A map of key/value strings corresponding to Delta's table properties. These can be anything [from the Delta table properties documentation](https://docs.delta.io/latest/table-properties.html). The default properties include configuring Delta's [data skipping feature](https://docs.delta.io/latest/optimizations-oss.html#data-skipping) for the important Snowplow timestamp columns: `load_tstamp`, `collector_tstamp`, `derived_tstamp`, `dvce_created_tstamp`. | **Iceberg / Glue:** | Parameter | Description | | --- | --- | | `output.good.type` | Required, set this to `Iceberg` | | `output.good.catalog.type` | Required, set this to `Glue` | | `output.good.location` | Optional, e.g. `s3a://mybucket/events`. URI of the bucket location to which to write Snowplow enriched events in Iceberg format. The URI should start with `s3a://`. If not provided, the catalog's default warehouse location will be used. | | `output.good.database` | Required. Name of the database in the Glue catalog | | `output.good.table` | Required. The name of the table in the Glue database | | `output.good.icebergTableProperties.*` | Optional. A map of key/value strings corresponding to Iceberg's table properties. These can be anything [from the Iceberg table properties documentation](https://iceberg.apache.org/docs/latest/configuration/). 
The default properties include configuring Iceberg's column-level statistics for the important Snowplow timestamp columns: `load_tstamp`, `collector_tstamp`, `derived_tstamp`, `dvce_created_tstamp`. | | `output.good.catalog.options.*` | Optional. A map of key/value strings which are passed to the catalog configuration. These can be anything [from the Iceberg catalog documentation](https://iceberg.apache.org/docs/latest/aws/) e.g. `"glue.id": "1234567"` | **Iceberg / REST:** > **Note:** The REST catalog integration has been tested with Snowflake Open Catalog. | Parameter | Description | | -------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | `output.good.type` | Required, set this to `Iceberg` | | `output.good.catalog.type` | Required, set this to `Rest` | | `output.good.catalog.uri` | Required. URI of the REST catalog server, e.g. `http://localhost:8080` | | `output.good.catalog.name` | Required. Name of the catalog | | `output.good.location` | Optional. URI of the bucket location to which to write Snowplow enriched events in Iceberg format. The URI should start with `s3a://`, `gs://`, or `abfs://` depending on your cloud provider. If not provided, the catalog's default warehouse location will be used. | | `output.good.database` | Required. Name of the database in the catalog | | `output.good.table` | Required. The name of the table in the database | | `output.good.icebergTableProperties.*` | Optional. A map of key/value strings corresponding to Iceberg's table properties. 
These can be anything [from the Iceberg table properties documentation](https://iceberg.apache.org/docs/latest/configuration/). The default properties include configuring Iceberg's column-level statistics for the important Snowplow timestamp columns: `load_tstamp`, `collector_tstamp`, `derived_tstamp`, `dvce_created_tstamp`. | | `output.good.catalog.options.*` | Optional. A map of key/value strings which are passed to the catalog configuration. These can be anything [from the Iceberg REST catalog documentation](https://iceberg.apache.org/docs/latest/rest-catalog/), such as authentication credentials. | *** ### Streams configuration **AWS:** | Parameter | Description | | --- | --- | | `input.streamName` | Required. Name of the Kinesis stream with the enriched events | | `input.appName` | Optional, default `snowplow-lake-loader`. Name to use for the DynamoDB table, used by the underlying Kinesis Client Library for managing leases. | | `input.initialPosition.type` | Optional, default `LATEST`. Allowed values are `LATEST`, `TRIM_HORIZON`, `AT_TIMESTAMP`. When the loader is deployed for the first time, this controls from where in the Kinesis stream it should start consuming events. On all subsequent deployments of the loader, the loader will resume from the offsets stored in the DynamoDB table. | | `input.initialPosition.timestamp` | Required if `input.initialPosition` is `AT_TIMESTAMP`. A timestamp in ISO8601 format from where the loader should start consuming events. | | `input.retrievalMode` | Optional, default `Polling`. Change to `FanOut` to enable the enhanced fan-out feature of Kinesis. 
| | `input.retrievalMode.maxRecords` | Optional. Default value 1000. How many events the Kinesis client may fetch in a single poll. Only used when `input.retrievalMode` is Polling. | | `input.workerIdentifier` | Optional. Defaults to the `HOSTNAME` environment variable. The name of this KCL worker used in the dynamodb lease table. | | `input.leaseDuration` | Optional. Default value `10 seconds`. The duration of shard leases. KCL workers must periodically refresh leases in the dynamodb table before this duration expires. | | `input.maxLeasesToStealAtOneTimeFactor` | Optional. Default value `2.0`. Controls how to pick the max number of shard leases to steal at one time. E.g. If there are 4 available processors, and `maxLeasesToStealAtOneTimeFactor = 2.0`, then allow the loader to steal up to 8 leases. Allows bigger instances to more quickly acquire the shard-leases they need to combat latency. | | `input.checkpointThrottledBackoffPolicy.minBackoff` | Optional. Default value `100 milliseconds`. Initial backoff used to retry checkpointing if we exceed the DynamoDB provisioned write limits. | | `input.checkpointThrottledBackoffPolicy.maxBackoff` | Optional. Default value `1 second`. Maximum backoff used to retry checkpointing if we exceed the DynamoDB provisioned write limits. | | `input.maxRetries` (since 0.9.0) | Optional. Default value 10. Maximum number of retries for AWS SDK operations when reading from Kinesis. | | `output.bad.streamName` | Required. Name of the Kinesis stream that will receive failed events. | | `output.bad.throttledBackoffPolicy.minBackoff` | Optional. Default value `100 milliseconds`. Initial backoff used to retry sending failed events if we exceed the Kinesis write throughput limits. | | `output.bad.throttledBackoffPolicy.maxBackoff` | Optional. Default value `1 second`. Maximum backoff used to retry sending failed events if we exceed the Kinesis write throughput limits. | | `output.bad.recordLimit` | Optional. Default value 500. 
The maximum number of records we are allowed to send to Kinesis in 1 PutRecords request. | | `output.bad.byteLimit` | Optional. Default value 5242880. The maximum number of bytes we are allowed to send to Kinesis in 1 PutRecords request. | | `output.bad.maxRecordSize` | Optional. Default value 1000000. Any single failed event sent to Kinesis should not exceed this size in bytes | | `output.bad.maxRetries` (since 0.8.0) | Optional. Default value 10. Maximum number of retries by Kinesis Client. | **GCP:** | Parameter | Description | | --- | --- | | `input.subscription` | Required, e.g. `projects/myproject/subscriptions/snowplow-enriched`. Name of the Pub/Sub subscription with the enriched events | | `input.parallelPullFactor` | Optional. Default value 0.5. `parallelPullFactor * cpu count` will determine the number of threads used internally by the Pub/Sub client library for fetching events | | `input.durationPerAckExtension` | Optional. Default value `600 seconds`. Pub/Sub ack deadlines are extended for this duration when needed. A sensible value is double the size of the `windowing` config parameter, but no higher than 10 minutes. | | `input.minRemainingAckDeadline` | Optional. Default value `0.1`. Controls when ack deadlines are re-extended, for a message that is close to exceeding its ack deadline. For example, if `durationPerAckExtension` is `600 seconds` and `minRemainingAckDeadline` is `0.1` then the loader will wait until there is `60 seconds` left of the remaining deadline, before re-extending the message deadline. 
| | `input.maxMessagesPerPull` | Optional. Default value 1000. How many Pub/Sub messages to pull from the server in a single request. | | `input.retries.transientErrors.delay` (since 0.8.0) | Optional. Default value `100 millis`. Backoff delay for follow-up attempts | | `input.retries.transientErrors.attempts` (since 0.8.0) | Optional. Default value `10`. Max number of attempts, after which Loader will crash and exit | | `output.bad.topic` | Required, e.g. `projects/myproject/topics/snowplow-bad`. Name of the Pub/Sub topic that will receive failed events. | | `output.bad.batchSize` | Optional. Default value 1000. Bad events are sent to Pub/Sub in batches not exceeding this count. | | `output.bad.requestByteThreshold` | Optional. Default value 1000000. Bad events are sent to Pub/Sub in batches with a total size not exceeding this byte threshold | | `output.bad.maxRecordSize` | Optional. Default value 10000000. Any single failed event sent to Pub/Sub should not exceed this size in bytes | | `output.bad.retries.transientErrors.delay` (since 0.8.0) | Optional. Default value `100 millis`. Backoff delay for follow-up attempts | | `output.bad.retries.transientErrors.attempts` (since 0.8.0) | Optional. Default value `10`. Max number of attempts, after which Loader will crash and exit | **Azure:** | Parameter | Description | | ----------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `input.topicName` | Required. Name of the Kafka topic for the source of enriched events. | | `input.bootstrapServers` | Required. Hostname and port of Kafka bootstrap servers hosting the source of enriched events. | | `input.consumerConf.*` | Optional. A map of key/value pairs for [any standard Kafka consumer configuration option](https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html). 
| | `input.commitTimeout` (since 0.8.0) | Optional. Default value 15 seconds. The time to wait for offset commits to complete. If an offset commit doesn't complete within this time, a CommitTimeoutException will be raised instead. | | `output.bad.topicName` | Required. Name of the Kafka topic that will receive failed events. | | `output.bad.bootstrapServers` | Required. Hostname and port of Kafka bootstrap servers hosting the bad topic | | `output.bad.producerConf.*` | Optional. A map of key/value pairs for [any standard Kafka producer configuration option](https://docs.confluent.io/platform/current/installation/configuration/producer-configs.html). | | `output.bad.maxRecordSize` | Optional. Default value 1000000. Any single failed event sent to Kafka should not exceed this size in bytes | > **Info:** You can use the `input.consumerConf` and `output.bad.producerConf` options to configure authentication to Azure event hubs using SASL. For example: > > ```json > "input.consumerConf": { > "security.protocol": "SASL_SSL" > "sasl.mechanism": "PLAIN" > "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\$ConnectionString\" password=;" > } > ``` *** ## Other configuration options | Parameter | Description | | ----------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `windowing` | Optional. Default value `5 minutes`. Controls how often the loader writes/commits pending events to the lake. | | `exitOnMissingIgluSchema` | Optional. Default value `true`. 
Whether the loader should crash and exit if it fails to resolve an Iglu Schema. We recommend `true` because Snowplow enriched events have already passed validation, so a missing schema normally indicates an error that needs addressing. Change to `false` so events go to the failed events stream instead of crashing the loader. | | `respectIgluNullability` | Optional. Default value `true`. Whether the output parquet files should declare nested fields as non-nullable according to the Iglu schema. When `true`, nested fields are nullable only if they are not required fields according to the Iglu schema. When `false`, all nested fields are defined as nullable in the output table's schemas. Set this to `false` if you use a query engine that dislikes non-nullable nested fields of a nullable struct. | | `skipSchemas` | Optional, e.g. `["iglu:com.example/skipped1/jsonschema/1-0-0"]` or with wildcards `["iglu:com.example/skipped2/jsonschema/1-*-*"]`. A list of schemas that won't be loaded to the lake. This feature could be helpful when recovering from edge-case schemas which for some reason cannot be loaded. | | `spark.conf.*` | Optional. A map of key/value strings which are passed to the internal spark context. | | `spark.taskRetries` | Optional. Default value 3. How many times the internal spark context should retry a task in case of failure | | `retries.setupErrors.delay` | Optional. Default value `30 seconds`. Configures exponential backoff on errors related to how the lake is set up for this loader. Examples include authentication errors and permissions errors. This class of errors is reported periodically to the monitoring webhook. | | `retries.transientErrors.delay` | Optional. Default value `1 second`. Configures exponential backoff on errors that are likely to be transient. Examples include server errors and network errors. | | `retries.transientErrors.attempts` | Optional. Default value 5. Maximum number of attempts to make before giving up on a transient error. 
| | `monitoring.metrics.statsd.hostname` | Optional. If set, the loader sends statsd metrics over UDP to a server on this host name. | | `monitoring.metrics.statsd.port` | Optional. Default value 8125. If the statsd server is configured, this UDP port is used for sending metrics. | | `monitoring.metrics.statsd.tags.*` | Optional. A map of key/value pairs to be sent along with the statsd metric. | | `monitoring.metrics.statsd.period` | Optional. Default `1 minute`. How often to report metrics to statsd. | | `monitoring.metrics.statsd.prefix` | Optional. Default `snowplow.lakeloader`. Prefix used for the metric name when sending to statsd. | | `monitoring.webhook.endpoint` | Optional, e.g. `https://webhook.example.com`. The loader will send to the webhook a payload containing details of any error related to how the lake is set up for this loader. | | `monitoring.webhook.tags.*` | Optional. A map of key/value strings to be included in the payload content sent to the webhook. | | `monitoring.webhook.heartbeat.*` | Optional. Default value `5.minutes`. How often to send a heartbeat event to the webhook when healthy. | | `monitoring.healthProbe.port` | Optional. Default value `8000`. Opens an HTTP server that returns OK only if the app is healthy. | | `monitoring.healthProbe.unhealthyLatency` | Optional. Default value `15 minutes`. Health probe becomes unhealthy if any received event is still not fully processed before this cutoff time. | | `monitoring.sentry.dsn` | Optional. Set to a Sentry URI to report unexpected runtime exceptions. | | `monitoring.sentry.tags.*` | Optional. A map of key/value strings which are passed as tags when reporting exceptions to Sentry. | | `telemetry.disable` | Optional. Set to `true` to disable [telemetry](/docs/get-started/self-hosted/telemetry/). | | `telemetry.userProvidedId` | Optional. See [here](/docs/get-started/self-hosted/telemetry/#how-can-i-help) for more information. | | `inMemBatchBytes` | Optional. Default value 50000000. 
Controls how many events are buffered in memory before saving the batch to local disk. The default value works well for reasonably sized VMs. For smaller VMs (e.g. fewer than 2 CPU cores, 8 GB memory) consider decreasing this value. | | `cpuParallelismFraction` | Optional. Default value 0.75. Controls how the app splits the workload into concurrent batches which can be run in parallel. E.g. If there are 4 available processors, and cpuParallelismFraction = 0.75, then we process 3 batches concurrently. The default value works well for most workloads. | | `numEagerWindows` | Optional. Default value 1. Controls how eagerly the loader starts processing the next timed window even when the previous timed window is still finalizing (committing into the lake). By default, we start processing a timed window if the previous 1 window is still finalizing, but we do not start processing a timed window if any older windows are still finalizing. The default value works well for most workloads. | | `http.client.maxConnectionsPerServer` | Optional. Default value 4. Configures the internal HTTP client used for Iglu resolver, alerts and telemetry. The maximum number of open HTTP requests to any single server at any one time. For Iglu Server in particular, this avoids overwhelming the server with multiple concurrent requests. | --- # Open Table Format Lake Loader > Load Snowplow events to data lakes using Delta or Iceberg table formats on S3, GCS, or Azure ADLS Gen2. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/lake-loader/ The Lake Loader is an application that loads Snowplow events to a cloud storage bucket using Open Table Formats. > **Info:** The Lake Loader supports the two major Open Table Formats: [Delta](https://delta.io/) and [Iceberg](https://iceberg.apache.org/). > > For Iceberg tables, the loader supports [AWS Glue](https://docs.aws.amazon.com/glue/) and [Iceberg REST](https://iceberg.apache.org/docs/latest/rest-catalog/) as catalogs. 
The REST catalog integration has been tested with Snowflake Open Catalog. **AWS:** On AWS the Lake Loader continually pulls events from Kinesis and writes to S3. ```mermaid flowchart LR stream[["Enriched Events (Kinesis stream)"]] loader{{"Lake Loader"}} subgraph bucket ["S3"] table[("Events table")] end stream-->loader-->bucket ``` The Lake Loader is published as a Docker image which you can run on any AWS VM. You do not need a Spark cluster to run this loader. ```bash docker pull snowplow/lake-loader-aws:0.9.1 ``` To run the loader, mount your config files into the docker image, and then provide the file paths on the command line: ```bash docker run \ --mount=type=bind,source=/path/to/myconfig,destination=/myconfig \ snowplow/lake-loader-aws:0.9.1 \ --config=/myconfig/loader.hocon \ --iglu-config=/myconfig/iglu.hocon ``` For some output formats, you need to pull a slightly different tag to get a compatible docker image. The [configuration reference](/docs/api-reference/loaders-storage-targets/lake-loader/configuration-reference/) page explains when this is needed. **GCP:** On GCP the Lake Loader continually pulls events from Pub/Sub and writes to GCS. ```mermaid flowchart LR stream[["Enriched Events (Pub/Sub stream)"]] loader{{"Lake Loader"}} subgraph bucket ["GCS"] table[("Events table")] end stream-->loader-->bucket ``` The Lake Loader is published as a Docker image which you can run on any GCP VM. You do not need a Spark cluster to run this loader. ```bash docker pull snowplow/lake-loader-gcp:0.9.1 ``` To run the loader, mount your config files into the docker image, and then provide the file paths on the command line: ```bash docker run \ --mount=type=bind,source=/path/to/myconfig,destination=/myconfig \ snowplow/lake-loader-gcp:0.9.1 \ --config=/myconfig/loader.hocon \ --iglu-config=/myconfig/iglu.hocon ``` For some output formats, you need to pull a slightly different tag to get a compatible docker image. 
The [configuration reference](/docs/api-reference/loaders-storage-targets/lake-loader/configuration-reference/) page explains when this is needed. **Azure:** On Azure the Lake Loader continually pulls events from Kafka and writes to ADLS Gen 2. ```mermaid flowchart LR stream[["Enriched Events (Kafka stream)"]] loader{{"Lake Loader"}} subgraph bucket ["ADLS Gen 2"] table[("Events table")] end stream-->loader-->bucket ``` The Lake Loader is published as a Docker image which you can run on any Azure VM. You do not need a Spark cluster to run this loader. ```bash docker pull snowplow/lake-loader-azure:0.9.1 ``` To run the loader, mount your config files into the docker image, and then provide the file paths on the command line: ```bash docker run \ --mount=type=bind,source=/path/to/myconfig,destination=/myconfig \ snowplow/lake-loader-azure:0.9.1 \ --config=/myconfig/loader.hocon \ --iglu-config=/myconfig/iglu.hocon ``` For some output formats, you need to pull a slightly different tag to get a compatible docker image. The [configuration reference](/docs/api-reference/loaders-storage-targets/lake-loader/configuration-reference/) page explains when this is needed. *** ## Configuring the loader The loader config file is in HOCON format, and it allows configuring many different properties of how the loader runs. The simplest possible config file just needs a description of your pipeline inputs and outputs: **AWS:** [View the minimal example config on GitHub](https://github.com/snowplow-incubator/snowplow-lake-loader/blob/main/config/config.aws.minimal.hocon) **GCP:** [View the minimal example config on GitHub](https://github.com/snowplow-incubator/snowplow-lake-loader/blob/main/config/config.gcp.minimal.hocon) **Azure:** [View the minimal example config on GitHub](https://github.com/snowplow-incubator/snowplow-lake-loader/blob/main/config/config.azure.minimal.hocon) *** See the [configuration reference](/docs/api-reference/loaders-storage-targets/lake-loader/configuration-reference/) for all possible configuration parameters. ### Windowing "Windowing" is an important config setting, which controls how often the Lake Loader commits a batch of events to the data lake. If you adjust this config setting, you should be aware that data lake queries are most efficient when the size of the parquet files in the lake is relatively large. - If you set this to a **low** value, the loader will write events to the lake more frequently, reducing latency. However, the output parquet files will be smaller, which will make querying the data less efficient. - Conversely, if you set this to a **high** value, the loader will generate bigger output parquet files, which are efficient for queries, at the cost of events arriving to the lake with more delay. The default setting is `5 minutes`. For moderate to high volumes, this value strikes a nice balance between the need for large output parquet files and the need for reasonably low latency data. ```text { "windowing": "5 minutes" } ``` If you tune this setting correctly, then your lake can support efficient analytic queries without the need to run an `OPTIMIZE` job on the files. ### Iglu The Lake Loader requires an [Iglu resolver file](/docs/api-reference/iglu/iglu-resolver/) which describes the Iglu repositories that host your schemas. This should be the same Iglu configuration file that you used in the Enrichment process. 
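For reference, a minimal resolver file that only pulls schemas from the public Iglu Central repository looks roughly like the sketch below. This is an illustrative example, not a drop-in configuration: in practice your repository list will also include your own Iglu Server, and the cache values shown are arbitrary:

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
  "data": {
    "cacheSize": 500,
    "cacheTtl": 600,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      }
    ]
  }
}
```

See the [Iglu resolver file](/docs/api-reference/iglu/iglu-resolver/) page for the full list of resolver options.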
### Metrics The Lake Loader can be configured to send the following custom metrics to a [StatsD](https://www.datadoghq.com/statsd-monitoring/) receiver: | Metric | Definition | | --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `events_committed` | A count of events that are successfully written and committed to the lake. Because the loader works in timed windows of several minutes, this metric has a "spiky" value, which is often zero and then periodically spikes up to larger values. | | `events_received` | A count of events received by the loader. Unlike `events_committed` this is a smooth varying metric, because the loader is constantly receiving events throughout a timed window. | | `events_bad` | A count of failed events that could not be loaded, and were instead sent to the bad output stream. | | `latency_millis` | The time in milliseconds from when events are written to the source stream of events (i.e. by Enrich) until when they are read by the loader. | | `processing_latency_millis` | For each window of events, the time in milliseconds from when the first event is read from the stream, until all events are written and committed to the lake. | | `e2e_latency_millis` | The end-to-end latency of the snowplow pipeline. For each window of events, the time in milliseconds from when the first event was received by the collector, until all events are written and committed to the lake. | | `table_data_files_total` | The total number of data files in the table after a commit. This metric helps monitor table growth and the effectiveness of file compaction strategies. | | `table_snapshots_retained` | The number of snapshots retained in the table metadata. 
This metric helps monitor snapshot accumulation and the effectiveness of snapshot expiration policies. | See the `monitoring.metrics.statsd` options in the [configuration reference](/docs/api-reference/loaders-storage-targets/lake-loader/configuration-reference/) for how to configure the StatsD receiver. **Telemetry notice** By default, Snowplow collects telemetry data for Lake Loader (since version 0.1.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1). If you wish to help us further, you can optionally provide your email (or just a UUID) in the `telemetry.userProvidedId` configuration setting. If you wish to disable telemetry, you can do so by setting `telemetry.disable` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information. --- # Lake maintenance jobs > Maintain Delta Lake and Iceberg tables with OPTIMIZE operations and snapshot expiration to manage file size and storage costs. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/lake-loader/maintenance/ **Delta Lake:** The [Delta documentation](https://docs.delta.io/latest/best-practices.html#-delta-compact-files) makes recommendations for running regular maintenance jobs to get the best performance from your lake. This guide expands on those recommendations specifically for your Snowplow events lake. The Snowplow Lake Loader **does not** automatically run the maintenance tasks described below. ## Compact data files We recommend that you schedule a compaction job to run once per day. Your daily compaction job should operate on files that were loaded in the previous calendar day, i.e. the last completed timestamp partition. 
For example, if you run the job via SQL, then use a `WHERE` clause on `load_tstamp_date < current_date()`: ```sql OPTIMIZE <table> WHERE load_tstamp_date < current_date() ``` Data compaction has two benefits for Snowplow data: 1. Queries are more efficient when the underlying parquet files are large. After you compact your files, you will benefit from this whenever you run queries over your historic data, i.e. not just the most recently loaded events. 2. When there are fewer data files, the size of the table's delta log files is smaller. This reduces the overhead of creating a new delta log file, and thus improves the performance of the Lake Loader when committing new events into the lake. This becomes especially important as your lake grows in size over time. ## Vacuum data files We recommend that you schedule a vacuum job to run once per week. A vacuum is needed to clean up unreferenced data files that were logically deleted by the daily compaction jobs. ```sql VACUUM <table>; ``` Unreferenced data files do not negatively impact query performance or write performance. But they do contribute to storage costs. Vacuum jobs need to list every file in the lake directory. In large lakes, there might be a lot of files, requiring the job to use a lot of Spark compute resources. This is why we recommend running it infrequently to offset this impact. **Iceberg:** > **Note:** We support different catalog options for Iceberg lakes. The instructions below are not necessary when using Snowflake Open Catalog. The [Iceberg documentation](https://iceberg.apache.org/docs/latest/maintenance/) makes recommendations for running regular maintenance jobs to get the best performance from your lake. This guide expands on those recommendations specifically for your Snowplow events lake. The Snowplow Lake Loader **does not** automatically run the maintenance tasks described below. ## Expire snapshots We recommend that you schedule the `expireSnapshots` Iceberg action to run once per day. 
The Snowplow Lake Loader is a continuously-running streaming loader, so it creates new snapshots very frequently, each time it commits more events into the lake. So it is especially important to manage the total number of snapshots held by the Iceberg metadata. There are two benefits of expiring snapshots in your Snowplow lake: 1. The snapshot metadata files can be much smaller, because the list of metadata files to track is much smaller. This reduces the overhead of creating a new snapshot file, and thus improves the performance of the Lake Loader when committing new events into the lake. This becomes especially important as your lake grows in size over time. 2. If you regularly run compaction jobs (see below) then you will amass lots of small parquet files, which have since been rewritten into larger parquet files. By expiring snapshots, you will delete the redundant small data files, which will save you some storage cost. For example, if you run the action via a Spark SQL procedure: ```sql CALL catalog_name.system.expire_snapshots( table => 'snowplow.events', stream_results => true ) ``` ## Compact data files We recommend that you schedule the `rewriteDataFiles` Iceberg action to run once per day. Your daily compaction job should operate on files that were loaded in the previous calendar day, i.e. the last completed timestamp partition. For example, if you run the action via a Spark SQL procedure, then use a `where` clause on `load_tstamp < current_date()`: ```sql CALL catalog_name.system.rewrite_data_files( table => 'snowplow.events', where => 'load_tstamp < current_date()' ); ``` The `rewriteDataFiles` action has two benefits for Snowplow data: 1. Queries are more efficient when the underlying parquet files are large. After you compact your files, you will benefit from this whenever you run queries over your historic data, i.e. not just the most recently loaded events. 2. When there are fewer data files, the size of the table's manifest files is smaller. 
This reduces the overhead of creating a new manifest file, and thus improves the performance of the Lake Loader when committing new events into the lake. This becomes especially important as your lake grows in size over time.

## Remove orphan files

We recommend that you schedule the `removeOrphanFiles` Iceberg action to run once per month.

> **Tip:** Your Iceberg table should have `write.metadata.delete-after-commit.enabled=true` set in the table properties. If your Iceberg table was originally created by a Lake Loader older than version 0.7.0, then please run:
>
> ```sql
> ALTER TABLE <table>
> SET TBLPROPERTIES ('write.metadata.delete-after-commit.enabled'='true')
> ```

As long as `delete-after-commit` is enabled in the table properties, the Snowplow Lake Loader should not create orphan files under normal circumstances. But it is technically still possible for the loader to create orphan files under rare exceptional circumstances, e.g. transient network errors, or if the loader exits without completing a graceful shutdown.

Orphan files do not negatively impact query performance or write performance. But they do contribute to storage costs. This action needs to list every file in the lake directory. In large lakes, there might be a lot of files, requiring the job to use a lot of Spark compute resources. This is why we recommend running it infrequently, to offset this impact.

```sql
CALL catalog_name.system.remove_orphan_files(
  table => 'snowplow.events'
);
```

***

---

# Partitioning for data lakes

> Data lake partitioning by load_tstamp and event_name for efficient querying and incremental processing in Delta and Iceberg tables.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/lake-loader/partitions/

A lake created by the Lake Loader has two levels of partitioning:

1. By the date that the event is loaded to the lake.
For Iceberg, we use a [partition transform](https://iceberg.apache.org/spec/#partitioning) `date(load_tstamp)`. For Delta, we create a [generated column](https://delta.io/blog/2023-04-12-delta-lake-generated-columns/) called `load_tstamp_date` defined as `generatedAlwaysAs(CAST(load_tstamp AS DATE))`.

2. By the `event_name` field.

This partitioning structure works very well with queries that filter on `load_tstamp` and/or `event_name`. It works especially well with incremental models, which only ever process the most recently loaded events.

> **Note:** If you are using Snowplow's dbt packages, then set the `session_timestamp` variable to `load_tstamp` to match the table's partitioning.

If you run a query with a clause like `WHERE load_tstamp > ?`, then your query engine can go directly to the partition containing the relevant files. Even better, because Delta and Iceberg collect file-level statistics, such a query can go directly to the relevant files within the partition, matching exactly the `load_tstamp` of interest. If you often write queries over a single type of event, e.g. `WHERE event_name = 'add_to_cart'`, then the query engine can run a very efficient query over the parquet files for that specific event.

> **Note:** The Lake Loader has been optimized for writing into a lake with the default partitioning, and the loader will not perform as well with any other partitioning. For these reasons, we strongly advise that you do not change the partitioning structure of your lake.

---

# S3 loader configuration reference

> Configure S3 Loader with Kinesis input, S3 output, compression, partitioning, and monitoring settings for event archiving.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/s3-loader/configuration-reference/

S3 Loader is released under the [Snowplow Limited Use License](/limited-use-license-1.1/) ([FAQ](/docs/licensing/limited-use-license-faq/)).
To accept the terms of the license and run S3 Loader, configure the `license.accept` option, like this:

```hcl
license {
  accept = true
}
```

This is a complete list of the options that can be configured in the S3 loader HOCON config file. The [example configs in GitHub](https://github.com/snowplow/snowplow-s3-loader/tree/master/config) show how to prepare an input file.

| parameter | description |
| --- | --- |
| `input.streamName` | Required. Name of the Kinesis stream from which to read |
| `input.appName` | Optional. Default: `snowplow-s3-loader`. Kinesis Client Lib app name (corresponds to DynamoDB table name) |
| `input.initialPosition.type` (since 3.0.0) | Optional. Default: `TRIM_HORIZON`. Set the initial position to consume the Kinesis stream. Possible values: `LATEST` (most recent data), `TRIM_HORIZON` (oldest available data), `AT_TIMESTAMP` (start from the record at or after the specified timestamp) |
| `input.initialPosition.timestamp` (since 3.0.0) | Required for `AT_TIMESTAMP`. E.g. `2020-07-17T10:00:00Z` |
| `input.retrievalMode.type` (since 3.0.0) | Optional. Default: `Polling`. Set the mode for retrieving records. Possible values: `Polling` or `FanOut` |
| `input.retrievalMode.maxRecords` (since 3.0.0) | Required for `Polling`. Default: `750`. Maximum size of a batch returned by a call to `getRecords`.
Records are checkpointed after a batch has been fully processed, so the smaller `maxRecords` is, the more often records can be checkpointed into DynamoDB, but possibly at the cost of reduced throughput |
| `input.retrievalMode.idleTimeBetweenReads` (since 3.1.0) | Optional for `Polling`. Default: `1500 millis`. Idle time between `getRecords` requests |
| `input.workerIdentifier` (since 3.0.0) | Optional. Default: host name. Name of this KCL worker used in the DynamoDB lease table |
| `input.leaseDuration` (since 3.0.0) | Optional. Default: `10 seconds`. Duration of shard leases. KCL workers must periodically refresh leases in the DynamoDB table before this duration expires |
| `input.maxLeasesToStealAtOneTimeFactor` (since 3.0.0) | Optional. Default: `2.0`. Controls how to pick the max number of leases to steal at one time. E.g. if there are 4 available processors, and `maxLeasesToStealAtOneTimeFactor = 2.0`, then allow the KCL to steal up to 8 leases. Allows bigger instances to more quickly acquire the shard-leases they need to combat latency |
| `input.checkpointThrottledBackoffPolicy.minBackoff` (since 3.0.0) | Optional. Default: `100 millis`. Minimum backoff before retrying when DynamoDB provisioned throughput is exceeded |
| `input.checkpointThrottledBackoffPolicy.maxBackoff` (since 3.0.0) | Optional. Default: `1 second`. Maximum backoff before retrying when the DynamoDB provisioned throughput limit is exceeded |
| `input.debounceCheckpoints` (since 3.0.0) | Optional. Default: `10 seconds`. How frequently to checkpoint our progress to the DynamoDB table. By increasing this value, we can decrease the write-throughput requirements of the DynamoDB table |
| `input.customEndpoint` | Optional. Override the default endpoint for Kinesis client API calls |
| `input.maxRetries` (since 3.1.0) | Optional. Default: `10`. Maximum number of times the Kinesis client will retry AWS API calls in case of failure |
| `input.apiCallAttemptTimeout` (since 3.1.0) | Optional. Default: `15 seconds`.
Maximum time of a single attempt of an AWS API operation |
| `output.good.path` | Required. Full path to output data, e.g. `s3://acme-snowplow-output/` |
| `output.good.partitionFormat` (since 2.1.0) | Optional. Configures how files are partitioned into S3 directories. When loading self-describing JSONs, you might choose to partition by `{vendor}.{name}/model={model}/date={yyyy}-{MM}-{dd}`. Valid substitutions are `{vendor}`, `{name}`, `{format}`, `{model}` for self-describing JSONs; and `{yyyy}`, `{MM}`, `{dd}`, `{HH}` for year, month, day and hour. Defaults to `{vendor}.{schema}` when loading self-describing JSONs, or blank when loading enriched events |
| `output.good.filenamePrefix` | Optional. Add a prefix to files |
| `output.good.compression` | Optional. Has to be `GZIP` (default) |
| `output.bad.streamName` | Required. Name of a Kinesis stream to output failures |
| `output.bad.throttledBackoffPolicy.minBackoff` (since 3.0.0) | Optional. Default: `100 milliseconds`. Minimum backoff before retrying when writing fails because the Kinesis write throughput is exceeded |
| `output.bad.throttledBackoffPolicy.maxBackoff` (since 3.0.0) | Optional. Default: `1 second`. Maximum backoff before retrying when writing fails because the Kinesis write throughput is exceeded |
| `output.bad.recordLimit` (since 3.0.0) | Optional. Default: `500`. Maximum number of records we are allowed to send to Kinesis in 1 PutRecords request |
| `output.bad.byteLimit` (since 3.0.0) | Optional. Default: `5242880`. Maximum number of bytes we are allowed to send to Kinesis in 1 PutRecords request |
| `output.bad.maxRetries` (since 3.1.0) | Optional. Default: `10`. Maximum number of times the Kinesis client will retry AWS API calls in case of failure |
| `purpose` | Required. `ENRICHED_EVENTS` for enriched events or `SELF_DESCRIBING` for self-describing data |
| `batching.maxBytes` (since 3.0.0) | Optional. Default: `67108864`.
After this amount of compressed bytes has been added to the buffer, it gets written to a file (unless `maxDelay` is reached before) |
| `batching.maxDelay` (since 3.0.0) | Optional. Default: `2 minutes`. After this delay has elapsed, the buffer gets written to a file (unless `maxBytes` is reached before) |
| `cpuParallelismFactor` (since 3.0.0) | Optional. Default: `1`. Controls how the app splits the workload into concurrent batches which can be run in parallel, e.g. if there are 4 available processors and `cpuParallelismFactor = 0.75` then we process 3 batches concurrently. Adjusting this value can cause the app to use more or less of the available CPU |
| `uploadParallelismFactor` (since 3.0.0) | Optional. Default: `2`. Controls the number of upload jobs that can be run in parallel, e.g. if there are 4 available processors and `uploadParallelismFactor = 2` then we run 8 upload jobs concurrently. Adjusting this value can cause the app to use more or less of the available CPU |
| `initialBufferSize` (since 3.0.0) | Optional. Default: none. Overrides the initial size of the byte buffer that holds the compressed events in-memory before they get written to a file. If not set, the initial size is picked dynamically based on other configuration options. The default is known to work well. Increasing this value is a way to reduce in-memory copying, but comes at the cost of increased memory usage |
| `monitoring.sentry.dsn` | Optional. For tracking uncaught run time exceptions |
| `monitoring.metrics.statsd.hostname` | Optional. For sending loading metrics (latency and event counts) to a `statsd` server |
| `monitoring.metrics.statsd.port` | Optional. Port of the statsd server |
| `monitoring.metrics.statsd.tags` | E.g. `{ "key1": "value1", "key2": "value2" }`. Tags are used to annotate the statsd metric with any contextual information |
| `monitoring.metrics.statsd.prefix` | Optional. Default: `snowplow.s3loader`.
Configures the prefix of statsd metric names |
| `monitoring.healthProbe.port` (since 3.0.0) | Optional. Default: `8080`. Port of the HTTP server that returns OK only if the app is healthy |
| `monitoring.healthProbe.unhealthyLatency` (since 3.0.0) | Optional. Default: `2 minutes`. Health probe becomes unhealthy if any received event is still not fully processed before this cutoff time |

---

# S3 Loader

> Archive Snowplow events from Kinesis to S3 in LZO or Gzip format for raw payloads, enriched events, and failed events.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/s3-loader/

Snowplow S3 Loader consumes records from an [Amazon Kinesis](http://aws.amazon.com/kinesis/) stream and writes them to [S3](http://aws.amazon.com/s3/). A typical Snowplow pipeline would use the S3 loader in several places:

- Load enriched events from the "enriched" stream. These serve as input for [the RDB loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/) when loading to a warehouse.
- Load failed events from the "bad" stream.

Records that can't be successfully written to S3 are written to a [second Kinesis stream](https://github.com/snowplow/snowplow-s3-loader/blob/master/examples/config.hocon.sample#L75) with the error message.

## Output format: GZIP

The records are treated as byte arrays containing UTF-8 encoded strings (whether CSV, JSON or TSV). New lines are used to separate records written to a file. This format can be used with the Snowplow Kinesis Enriched stream, among other streams. Gzip encoding is generally used for both enriched data and bad data.

## Running

### Available on Terraform Registry

A Terraform module is available which deploys the Snowplow S3 Loader on AWS EC2 for use with Kinesis. For installing in other environments, please see the other installation options below.
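The newline-delimited GZIP output format described above can be sketched in a few lines of Python. This is an illustrative snippet only (the record content and file name are made up, not produced by the loader): records are UTF-8 strings, one per line, and the whole file is gzip-compressed.

```python
import gzip
import tempfile
from pathlib import Path

# Hypothetical TSV-style records, standing in for enriched events.
records = [
    "app-1\tweb\t2024-01-01 00:00:00.000",
    "app-1\tweb\t2024-01-01 00:00:01.000",
]

path = Path(tempfile.mkdtemp()) / "events.tsv.gz"

# Write: one record per line, gzip-compress the file as a whole.
with gzip.open(path, "wt", encoding="utf-8") as f:
    for record in records:
        f.write(record + "\n")

# Read back: decompress and split on newlines to recover the records.
with gzip.open(path, "rt", encoding="utf-8") as f:
    restored = f.read().splitlines()

assert restored == records
```

Any tool that understands gzip and newline-delimited text (e.g. `zcat file.gz | head`) can consume files in this format.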
### Docker image

We publish two different flavours of the docker image:

- `snowplow/snowplow-s3-loader:3.1.0`
- `snowplow/snowplow-s3-loader:3.1.0-distroless` (lightweight alternative)

Here is a standard command to run the loader on an EC2 instance in AWS:

```bash
docker run \
  -d \
  --name snowplow-s3-loader \
  --restart always \
  --log-driver awslogs \
  --log-opt awslogs-group=snowplow-s3-loader \
  --log-opt awslogs-stream="$(ec2metadata --instance-id)" \
  --network host \
  -v $(pwd):/snowplow/config \
  -e 'JAVA_OPTS=-Xms512M -Xmx1024M -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN' \
  snowplow/snowplow-s3-loader:3.1.0 \
  --config /snowplow/config/config.hocon
```

---

# S3 Loader monitoring

> Monitor S3 Loader with StatsD metrics, Sentry error tracking, and Snowplow event tracking for application health and failures.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/s3-loader/monitoring/

The S3 loader has several types of monitoring built in, to help the pipeline operator: Statsd metrics, Sentry alerts, and Snowplow tracking.

## Statsd

[Statsd](https://github.com/statsd/statsd) is a daemon that aggregates and summarizes application metrics. It receives metrics sent by the application over UDP, and then periodically flushes the aggregated metrics to a [pluggable storage backend](https://github.com/statsd/statsd/blob/master/docs/backend.md).

When processing enriched events, the S3 loader can emit metrics to a statsd daemon describing every S3 file it writes. Here is a string representation of the metrics it sends:

```text
snowplow.s3loader.count:42|c|#tag1:value1
snowplow.s3loader.latency_collector_to_load:123|g|#tag1:value1
snowplow.s3loader.latency_millis:56|g|#tag1:value1
snowplow.s3loader.e2e_latency_millis:123|g|#tag1:value1
```

- `count`: total number of events that got written to S3.
- `latency_collector_to_load`: time difference between reaching the collector and getting loaded to S3 (only for enriched events).
Will eventually be deprecated in favor of `e2e_latency_millis`.
- `latency_millis`: delay between the input record getting written to the stream and the S3 loader starting to process it.
- `e2e_latency_millis`: same as `latency_collector_to_load`, which will eventually be deprecated and replaced with this metric.

Statsd monitoring is configured by setting the `monitoring.metrics.statsd` section in [the hocon file](/docs/api-reference/loaders-storage-targets/s3-loader/configuration-reference/):

```json
"monitoring": {
  "metrics": {
    "statsd": {
      "hostname": "localhost"
      "port": 8125
      "tags": {
        "tag1": "value1"
        "tag2": "value2"
      }
      "prefix": "snowplow.s3loader"
    }
  }
}
```

## Health probe

Starting with version `3.0.0`, the S3 loader has a health probe, configured via the `monitoring.healthProbe` section (see the configuration reference).

## Sentry

[Sentry](https://docs.sentry.io/) is a popular error monitoring service, which helps developers diagnose and fix problems in an application. The S3 loader can send an error report to Sentry whenever something unexpected happens. The reasons for the error can then be explored in the Sentry server's UI. Common reasons might be failure to read or write from Kinesis, or failure to write to S3.

Sentry monitoring is configured by setting the `monitoring.sentry.dsn` key in [the hocon file](/docs/api-reference/loaders-storage-targets/s3-loader/configuration-reference/) with the URL of your Sentry server:

```json
"monitoring": {
  "sentry": {
    "dsn": "http://sentry.acme.com"
  }
}
```

---

# S3 Loader 1.0.0 configuration

> Configure S3 Loader 1.0.0 with HOCON settings for Kinesis or NSQ streams, compression, buffering, and monitoring.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/s3-loader/upgrade-guides/1-0-0-configuration/

The sink is configured using a HOCON file.
These are the fields:

- `source`: Choose kinesis or nsq as a source stream
- `sink`: Choose between kinesis or nsq as a sink stream for failed events
- `aws.accessKey` and `aws.secretKey`: Change these to your AWS credentials. You can alternatively leave them as "default", in which case the [DefaultAWSCredentialsProviderChain](http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html) will be used.
- `kinesis.initialPosition`: Where to start reading from the stream the first time the app is run. "TRIM_HORIZON" for as far back as possible, "LATEST" for as recent as possible, "AT_TIMESTAMP" for after the specified timestamp.
- `kinesis.initialTimestamp`: Timestamp for "AT_TIMESTAMP" initial position
- `kinesis.maxRecords`: Maximum number of records to read per GetRecords call
- `kinesis.region`: The Kinesis region name to use.
- `kinesis.appName`: Unique identifier for the app which ensures that if it is stopped and restarted, it will restart at the correct location.
- `kinesis.customEndpoint`: Optional endpoint URL configuration to override AWS Kinesis endpoints. This can be used to specify local endpoints when using localstack.
- `kinesis.disableCloudWatch`: Optional override to disable CloudWatch metrics for KCL
- `nsq.channelName`: Channel name for the NSQ source stream. If more than one application is reading from the same NSQ topic at the same time, each of them must have a unique channel name to be able to get all the data from the same topic.
- `nsq.host`: Hostname for NSQ tools
- `nsq.port`: HTTP port number for nsqd
- `nsq.lookupPort`: HTTP port number for nsqlookupd
- `streams.inStreamName`: The name of the input stream of the tool which you choose as a source. This should be the stream to which you are writing records with the Scala Stream Collector.
- `streams.outStreamName`: The name of the output stream of the tool which you choose as sink.
This is the stream where records are sent if the compression process fails.

- `streams.buffer.byteLimit`: Whenever the total size of the buffered records exceeds this number, they will all be sent to S3.
- `streams.buffer.recordLimit`: Whenever the total number of buffered records exceeds this number, they will all be sent to S3.
- `streams.buffer.timeLimit`: If this length of time passes without the buffer being flushed, the buffer will be flushed. **Note**: With NSQ streams, only the record limit is taken into account. The other two options will be ignored.
- `s3.region`: The AWS region for the S3 bucket
- `s3.bucket`: The name of the S3 bucket in which files are to be stored
- `s3.format`: The format the app should write to S3 in (`lzo` or `gzip`)
- `s3.maxTimeout`: The maximum amount of time the app attempts to PUT to S3 before it will kill itself

### Monitoring

It's possible to include Snowplow monitoring in the application. This is set up through the `monitoring` section at the bottom of the config file:

- `monitoring.snowplow.collectorUri`: your Snowplow collector URI
- `monitoring.snowplow.appId`: the app-id used in decorating the events sent

To disable Snowplow monitoring, just remove the entire `monitoring` section from the config.

---

# S3 Loader 2.0.0 upgrade guide

> Upgrade S3 Loader to 2.0.0 with configuration refactoring, purpose property, and new Sentry and StatsD monitoring features.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/s3-loader/upgrade-guides/2-0-0-upgrade-guide/

## Caution

If you're upgrading from Snowplow pre-R119 and S3 Loader pre-version 0.7.0, you have to upgrade to version 0.7.0 or 1.0.0 first in order to split bad data produced during the transition period. In [Snowplow R119](https://snowplowanalytics.com/blog/2020/05/12/snowplow-release-r119/) we introduced a new self-describing bad rows format. S3 Loader 0.7.0 was the first version capable of partitioning self-describing data based on its schema.
0.7.0 and 1.0.0 are capable of recognizing at runtime whether the old or new format is being consumed, and use the `partitionedBucket` output path only if necessary, so both formats can be consumed. S3 Loader 2.0.0 supports only the new self-describing format and will raise exceptions if legacy bad data is pushed.

## Config file

In 2.0.0 the S3 Loader went through a major configuration refactoring. A [sample config](https://github.com/snowplow/snowplow-s3-loader/blob/2.0.0/config/config.hocon.sample) is available in the GitHub repository.

- No more `aws` property allowing you to hardcode credentials - the [default credentials chain](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html) is used
- NSQ support has been dropped
- Instead of `kinesis` and `s3`, the topology is now represented as `input` (Kinesis Stream) and `output` (S3 bucket and a Kinesis Stream for bad data)
- `partitionedBucket` property has been removed (see Caution above)
- New `purpose` property allowing the Loader to recognize the data it works with: `ENRICHED` for enriched TSVs enabling latency monitoring, `SELF_DESCRIBING` generally for any self-describing JSON but usually used for bad rows, and `RAW`

## New features

- `monitoring.sentry.dsn` can be used to track exceptions, including internal KCL exceptions
- `monitoring.metrics.statsd` can be used to send observability data to a StatsD-compatible server

---

# S3 Loader 2.2.x upgrade guide

> Upgrade S3 Loader to 2.2.x with separate Docker images for GZip, LZO, and distroless variants for security improvements.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/s3-loader/upgrade-guides/2-2-0-upgrade-guide/

With the 2.2.0 release we started publishing three different flavours of the docker image.
- Pull the `:2.2.9` tag if you only need GZip output format
- Pull the `:2.2.9-lzo` tag if you also need LZO output format
- Pull the `:2.2.9-distroless` tag for a lightweight alternative to `:2.2.9`

```bash
docker pull snowplow/snowplow-s3-loader:2.2.9
docker pull snowplow/snowplow-s3-loader:2.2.9-lzo
docker pull snowplow/snowplow-s3-loader:2.2.9-distroless
```

We removed LZO support from the standard image, because it means we can more easily eliminate security vulnerabilities that are brought in by a dependency on Hadoop version 2.

The "distroless" docker image is built from [a more lightweight base image](https://github.com/GoogleContainerTools/distroless). It provides some security advantages, because it carries only the minimal files and executables needed for the loader to run.

---

# 3.0.0 upgrade guide

> Guide for upgrading to S3 Loader 3.0.0, covering buffering changes, LZO deprecation, configuration refactoring, and filename changes.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/s3-loader/upgrade-guides/3-0-0-upgrade-guide/

S3 loader was using AWS SDK v1, which goes EOL at the end of 2025. Bumping to AWS SDK v2 required a full rewrite of the app.

## Buffering

The S3 loader buffers the events in memory before writing them to S3. There are 2 key differences between the previous loader and the new one:

- In the previous loader we had one buffer per Kinesis shard, each buffer getting written to one file. In the new loader, records from all the Kinesis shards go to the same buffer and file. The consequence is that the new loader writes fewer but bigger files.
- In the previous loader, records were compressed after the buffer was full, before getting written to disk. In the new loader, records get compressed before getting added to the buffer. The consequence is that the new loader writes bigger files (very close to `maxBytes` if this limit is reached before `maxDelay`).
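The new single-buffer, compress-before-buffering behavior can be sketched in Python. This is a hypothetical minimal model for illustration only (the class and its names are made up, not the loader's Scala implementation): because the buffer tracks *compressed* bytes, the flushed output lands very close to the `maxBytes` limit.

```python
import gzip
import time

class CompressedBuffer:
    """Hypothetical sketch: records from all shards share one buffer.
    Each record is gzip-compressed before being added, and the buffer is
    flushed once max_bytes of compressed data accumulate or max_delay
    seconds elapse."""

    def __init__(self, max_bytes: int, max_delay_seconds: float):
        self.max_bytes = max_bytes
        self.max_delay = max_delay_seconds
        self.chunks = []   # compressed gzip members
        self.size = 0      # compressed bytes buffered so far
        self.opened_at = time.monotonic()

    def add(self, record: str):
        # Compress first, then buffer: the size check runs on compressed bytes.
        self.chunks.append(gzip.compress((record + "\n").encode("utf-8")))
        self.size += len(self.chunks[-1])
        if self.size >= self.max_bytes or time.monotonic() - self.opened_at >= self.max_delay:
            return self.flush()
        return None

    def flush(self):
        # Concatenated gzip members form a valid multi-member gzip file.
        data = b"".join(self.chunks)
        self.chunks, self.size = [], 0
        self.opened_at = time.monotonic()
        return data

buf = CompressedBuffer(max_bytes=1, max_delay_seconds=60.0)  # tiny limit: flush on every record
flushed = buf.add("event-from-shard-1")
assert gzip.decompress(flushed).decode("utf-8") == "event-from-shard-1\n"
```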
```mermaid flowchart LR subgraph "Old S3 loader" buffer1["buffer"] buffer2["buffer"] buffer3["buffer"] end shard1["Kinesis shard"] --> buffer1 -->|"compression"| file1["file/"] shard2["Kinesis shard"] --> buffer2 -->|"compression"| file2["file/"] shard3["Kinesis shard"] --> buffer3 -->|"compression"| file3["file/"] classDef noborder stroke:none,fill:none class shard1,shard2,shard3,file1,file2,file3 noborder ``` ```mermaid flowchart LR subgraph "New S3 loader" buffer["buffer"] end shard1["Kinesis shard"] & shard2["Kinesis shard"] & shard3["Kinesis shard"] -->|"compression"| buffer --> file["file/"] classDef noborder stroke:none,fill:none class shard1,shard2,shard3,file noborder ``` ## LZO deprecation Starting from version `3.0.0`, S3 loader should only be used to load enriched events and bad rows (no more `purpose = "RAW"`). The reason for this is that storing the events emitted by the collector is redundant and it is not compatible with features that we have on the roadmap. LZO compression format is not supported any more (it was used in old batch pipelines). Only the following Docker images with GZIP get published: - `snowplow/snowplow-s3-loader:3.1.0` - `snowplow/snowplow-s3-loader:3.1.0-distroless` (lightweight alternative) ## Config file In `3.0.0` S3 Loader went through a major configuration refactoring. A [sample config](https://github.com/snowplow/snowplow-s3-loader/blob/3.0.0/config/config.aws.reference.hocon) is available in GitHub repository. These config fields have been removed: - `region`: it is now retrieved from the region provider chain. - `buffer.recordLimit`: only `maxDelay` and `maxBytes` are now used for the buffering. - `monitoring.snowplow`: Snowplow tracking (sending events e.g. `app_initialized` or `app_heartbeat`) got removed. 
- `output.s3.maxTimeout`

These sections/fields have been renamed:

- `output.s3` -> `output.good`
- `buffer.byteLimit` -> `batching.maxBytes`
- `buffer.timeLimit` -> `batching.maxDelay`
- `input.maxRecords` -> `input.retrievalMode.maxRecords`

For more details, refer to the [configuration reference](/docs/api-reference/loaders-storage-targets/s3-loader/configuration-reference/).

## Change in the filename

There is a change in the name of the files written to S3. In `2.x` the filename was `yyyy-MM-dd-HHmmss--.gz`. In `3.0.0` the new filename is `yyyy-MM-dd-HHmmss-.gz`. The reason for this change is that S3 loader `2.x` wrote one file per Kinesis shard, whereas the new loader writes events from many shards to the same file.

---

# Upgrade guides for the S3 Loader

> Step-by-step upgrade guides for S3 Loader with configuration changes, breaking changes, and migration paths for major versions.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/s3-loader/upgrade-guides/

This section contains information to help you upgrade to newer versions of the S3 Loader.

---

# Schema translation to warehouse column types

> How Snowplow JSON schemas map to column types and structures in Redshift, BigQuery, Snowflake, Databricks, Iceberg, and Delta Lake.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/

[Self-describing events](/docs/fundamentals/events/#self-describing-events) and [entities](/docs/fundamentals/entities/) use [schemas](/docs/fundamentals/schemas/) to define which fields should be present, and of what type (e.g. string, number). This page explains what happens to this information in the warehouse.

## Location

Where can you find the data carried by a self-describing event or an entity?

**Redshift:** Each type of self-describing event and each type of entity get their own dedicated tables.
The name of such a table is composed of the schema vendor, schema name and its major version (more on versioning [later](#versioning)). > **Note:** All characters are converted to lowercase and all symbols (like `.`) are replaced with an underscore. Examples: | Kind | Schema | Resulting table | | --------------------- | ------------------------------------------- | ---------------------------- | | Self-describing event | `com.example/button_press/jsonschema/1-0-0` | `com_example_button_press_1` | | Entity | `com.example/user/jsonschema/1-0-0` | `com_example_user_1` | Inside the table, there will be columns corresponding to the fields in the schema. Their types are determined according to the logic described [below](#types). > **Note:** The name of each column is the name of the schema field converted to snake case. > **Warning:** If an event or entity includes fields not defined in the schema, those fields will not be stored in the warehouse. For example, suppose you have the following field in the schema: ```json "lastName": { "type": "string", "maxLength": 100 } ``` It will be translated into a column called `last_name` (notice the underscore), of type `VARCHAR(100)`. **BigQuery:** **Version 2.x:** Each type of self-describing event and each type of entity get their own dedicated columns in the `events` table. The name of such a column is composed of the schema vendor, schema name and major schema version (more on versioning [later](#versioning)). Examples: | Kind | Schema | Resulting column | | --------------------- | ------------------------------------------- | -------------------------------------------------- | | Self-describing event | `com.example/button_press/jsonschema/1-0-0` | `events.unstruct_event_com_example_button_press_1` | | Entity | `com.example/user/jsonschema/1-0-0` | `events.contexts_com_example_user_1` | **Version 1.x:** Each type of self-describing event and each type of entity get their own dedicated columns in the `events` table. 
The name of such a column is composed of the schema vendor, schema name and full schema version (more on versioning [later](#versioning)). Examples: | Kind | Schema | Resulting column | | --------------------- | ------------------------------------------- | ------------------------------------------------------ | | Self-describing event | `com.example/button_press/jsonschema/1-0-0` | `events.unstruct_event_com_example_button_press_1_0_0` | | Entity | `com.example/user/jsonschema/1-0-0` | `events.contexts_com_example_user_1_0_0` | *** The column name is prefixed by `unstruct_event_` for self-describing events, and by `contexts_` for entities. _(In case you were wondering, those are the legacy terms for self-describing events and entities, respectively.)_ > **Note:** All characters are converted to lowercase and all symbols (like `.`) are replaced with an underscore. For self-describing events, the column will be of a `RECORD` type, while for entities the type will be `REPEATED RECORD` (because an event can have more than one entity attached). Inside the record, there will be fields corresponding to the fields in the schema. Their types are determined according to the logic described [below](#types). > **Note:** The name of each record field is the name of the schema field converted to snake case. > **Warning:** If an event or entity includes fields not defined in the schema, those fields will not be stored in the warehouse. For example, suppose you have the following field in the schema: ```json "lastName": { "type": "string", "maxLength": 100 } ``` It will be translated into a field called `last_name` (notice the underscore), of type `STRING`. **Snowflake:** Each type of self-describing event and each type of entity get their own dedicated columns in the `events` table. The name of such a column is composed of the schema vendor, schema name and major schema version (more on versioning [later](#versioning)). 
The column name is prefixed by `unstruct_event_` for self-describing events, and by `contexts_` for entities. _(In case you were wondering, those are the legacy terms for self-describing events and entities, respectively.)_ > **Note:** All characters are converted to lowercase and all symbols (like `.`) are replaced with an underscore. Examples: | Kind | Schema | Resulting column | | --------------------- | ------------------------------------------- | -------------------------------------------------- | | Self-describing event | `com.example/button_press/jsonschema/1-0-0` | `events.unstruct_event_com_example_button_press_1` | | Entity | `com.example/user/jsonschema/1-0-0` | `events.contexts_com_example_user_1` | For self-describing events, the column will be of an `OBJECT` type, while for entities the type will be an `ARRAY` of objects (because an event can have more than one entity attached). Inside the object, there will be keys corresponding to the fields in the schema. The values for the keys will be of type `VARIANT`. > **Note:** If an event or entity includes fields not defined in the schema, those fields will be included in the object. However, remember that you need to set `additionalProperties` to `true` in the respective schema for such events and entities to pass schema validation. For example, suppose you have the following field in the schema: ```json "lastName": { "type": "string", "maxLength": 100 } ``` It will be translated into an object with a `lastName` key that points to a value of type `VARIANT`. **Databricks, Iceberg, Delta:** Each type of self-describing event and each type of entity get their own dedicated columns in the `events` table. The name of such a column is composed of the schema vendor, schema name and major schema version (more on versioning [later](#versioning)). The column name is prefixed by `unstruct_event_` for self-describing events, and by `contexts_` for entities. 
_(In case you were wondering, those are the legacy terms for self-describing events and entities, respectively.)_ > **Note:** All characters are converted to lowercase and all symbols (like `.`) are replaced with an underscore. Examples: | Kind | Schema | Resulting column | | --------------------- | ------------------------------------------- | -------------------------------------------------- | | Self-describing event | `com.example/button_press/jsonschema/1-0-0` | `events.unstruct_event_com_example_button_press_1` | | Entity | `com.example/user/jsonschema/1-0-0` | `events.contexts_com_example_user_1` | For self-describing events, the column will be of a `STRUCT` type, while for entities the type will be `ARRAY` of `STRUCT` (because an event can have more than one entity attached). Inside the `STRUCT`, there will be fields corresponding to the fields in the schema. Their types are determined according to the logic described [below](#types). > **Note:** The name of each record field is the name of the schema field converted to snake case. > **Warning:** If an event or entity includes fields not defined in the schema, those fields will not be stored in the warehouse. For example, suppose you have the following field in the schema: ```json "lastName": { "type": "string", "maxLength": 100 } ``` It will be translated into a field called `last_name` (notice the underscore), of type `STRING`. **Synapse Analytics:** Each type of self-describing event and each type of entity get their own dedicated columns in the underlying data lake table. The name of such a column is composed of the schema vendor, schema name and major schema version (more on versioning [later](#versioning)). The column name is prefixed by `unstruct_event_` for self-describing events, and by `contexts_` for entities. 
_(In case you were wondering, those are the legacy terms for self-describing events and entities, respectively.)_ > **Note:** All characters are converted to lowercase and all symbols (like `.`) are replaced with an underscore. Examples: | Kind | Schema | Resulting column | | --------------------- | ------------------------------------------- | -------------------------------------------------- | | Self-describing event | `com.example/button_press/jsonschema/1-0-0` | `events.unstruct_event_com_example_button_press_1` | | Entity | `com.example/user/jsonschema/1-0-0` | `events.contexts_com_example_user_1` | The column will be formatted as JSON: an object for self-describing events and an array of objects for entities (because an event can have more than one entity attached). Inside the JSON object, there will be fields corresponding to the fields in the schema. > **Note:** The name of each JSON field is the name of the schema field converted to snake case. > **Warning:** If an event or entity includes fields not defined in the schema, those fields will not be stored in the data lake, and will not be available in Synapse. For example, suppose you have the following field in the schema: ```json "lastName": { "type": "string", "maxLength": 100 } ``` It will be translated into a field called `last_name` (notice the underscore) inside the JSON object. *** ## Versioning What happens when you evolve your schema to a [new version](/docs/event-studio/data-structures/versioning/)?
**Redshift:** Because the table name for the self-describing event or entity includes the major schema version, each major version of a schema gets a new table: | Schema | Resulting table | | ------------------------------------------- | ---------------------------- | | `com.example/button_press/jsonschema/1-0-0` | `com_example_button_press_1` | | `com.example/button_press/jsonschema/1-2-0` | `com_example_button_press_1` | | `com.example/button_press/jsonschema/2-0-0` | `com_example_button_press_2` | When you evolve your schema within the same major version, (non-destructive) changes are applied to the existing table automatically. For example, if you change the `maxLength` of a `string` field, the limit of the `VARCHAR` column would be updated accordingly. > **Info:** If you make a breaking schema change (e.g. change a type of a field from a `string` to a `number`) without creating a new major schema version, the loader will not be able to modify the table to accommodate the new data. > > In this case, _upon receiving the first event with the offending schema_, the loader will instead create a new table, with a name like `com_example_button_press_1_0_1_recovered_9999999`, where: > > - `1-0-1` is the version of the offending schema > - `9999999` is a hash code unique to the schema (i.e. it will change if the schema is overwritten with a different one) > > To resolve this situation: > > - Create a new schema version (e.g. `1-0-2`) that reverts the offending changes and is again compatible with the original table. The data for events with that `1-0-2` schema will start going to the original table as expected. > - You might also want to manually adapt the data in the `..._recovered_...` table and copy it to the original one. > > Note that this behavior was introduced in RDB Loader 6.0.0. In older versions, breaking changes will halt the loading process. 
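The table naming rule above can be sketched in Python. This is an illustration of the documented convention only, not Snowplow's loader code; `redshift_table_name` is a hypothetical helper:

```python
import re

def redshift_table_name(vendor: str, name: str, version: str) -> str:
    """Illustrative only: lowercase the schema vendor and name, replace
    symbols (like '.') with underscores, and append the MAJOR version."""
    major = version.split("-")[0]  # "1-2-0" -> "1"
    sanitize = lambda s: re.sub(r"[^a-zA-Z0-9]+", "_", s).lower()
    return f"{sanitize(vendor)}_{sanitize(name)}_{major}"

print(redshift_table_name("com.example", "button_press", "1-0-0"))  # com_example_button_press_1
print(redshift_table_name("com.example", "button_press", "1-2-0"))  # com_example_button_press_1
print(redshift_table_name("com.example", "button_press", "2-0-0"))  # com_example_button_press_2
```

Note that `1-0-0` and `1-2-0` resolve to the same table, which is why only non-destructive changes can be made within a major version.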
> **Info:** Once the loader creates a column for a given schema version as `NULLABLE` or `NOT NULL`, it will never alter the nullability constraint for that column. For example, if a field is nullable in schema version `1-0-0` and not nullable in version `1-0-1`, the column will remain nullable. (In this example, the Enrich application will still validate data according to the schema, accepting `null` values for `1-0-0` and rejecting them for `1-0-1`.) **BigQuery:** **Version 2.x:** Because the column name for the self-describing event or entity includes the major schema version, each major version of a schema gets a new column: | Schema | Resulting column | | ------------------------------------------- | ------------------------------------------- | | `com.example/button_press/jsonschema/1-0-0` | `unstruct_event_com_example_button_press_1` | | `com.example/button_press/jsonschema/1-2-0` | `unstruct_event_com_example_button_press_1` | | `com.example/button_press/jsonschema/2-0-0` | `unstruct_event_com_example_button_press_2` | When you evolve your schema within the same major version, (non-destructive) changes are applied to the existing column automatically. For example, if you add a new optional field in the schema, a new optional field will be added to the `RECORD`. > **Info:** If you make a breaking schema change (e.g. change a type of a field from a `string` to a `number`) without creating a new major schema version, the loader will not be able to modify the column to accommodate the new data. > > In this case, _upon receiving the first event with the offending schema_, the loader will instead create a new column, with a name like `unstruct_event_com_example_button_press_1_0_1_recovered_9999999`, where: > > - `1-0-1` is the version of the offending schema > - `9999999` is a hash code unique to the schema (i.e. it will change if the schema is overwritten with a different one) > > To resolve this situation: > > - Create a new schema version (e.g. 
`1-0-2`) that reverts the offending changes and is again compatible with the original column. The data for events with that schema will start going to the original column as expected. > - You might also want to manually adapt the data in the `..._recovered_...` column and copy it to the original one. **Version 1.x:** Because the column name for the self-describing event or entity includes the full schema version, each version of a schema gets a new column: | Schema | Resulting column | | ------------------------------------------- | ----------------------------------------------- | | `com.example/button_press/jsonschema/1-0-0` | `unstruct_event_com_example_button_press_1_0_0` | | `com.example/button_press/jsonschema/1-2-0` | `unstruct_event_com_example_button_press_1_2_0` | | `com.example/button_press/jsonschema/2-0-0` | `unstruct_event_com_example_button_press_2_0_0` | If you are [modeling your data with dbt](/docs/modeling-your-data/modeling-your-data-with-dbt/), you can use [this macro](https://github.com/snowplow/dbt-snowplow-utils#combine_column_versions-source) to aggregate the data across multiple columns. > **Info:** While our recommendation is to use major schema versions to indicate breaking changes (e.g. changing a type of a field from a `string` to a `number`), this is not particularly relevant for BigQuery Loader version 1.x. Indeed, each schema version gets its own column, so there is no difference between major and minor versions. 
That said, we believe sticking to our recommendation is a good idea: > > - Breaking changes might affect downstream consumers of the data, even if they don’t affect BigQuery > - Version 2 of the loader has stricter behavior that matches our loaders for other warehouses and lakes *** **Snowflake:** Because the column name for the self-describing event or entity includes the major schema version, each major version of a schema gets a new column: | Schema | Resulting column | | ------------------------------------------- | ------------------------------------------- | | `com.example/button_press/jsonschema/1-0-0` | `unstruct_event_com_example_button_press_1` | | `com.example/button_press/jsonschema/1-2-0` | `unstruct_event_com_example_button_press_1` | | `com.example/button_press/jsonschema/2-0-0` | `unstruct_event_com_example_button_press_2` | > **Info:** While our recommendation is to use major schema versions to indicate breaking changes (e.g. changing a type of a field from a `string` to a `number`), this is not particularly relevant for Snowflake. Indeed, the event or entity data is stored in the column as is in the `VARIANT` form, so Snowflake is not “aware” of the schema. That said, we believe sticking to our recommendation is a good idea: > > - Breaking changes might affect downstream consumers of the data, even if they don’t affect Snowflake > - In the future, you might decide to migrate to a different data warehouse where our rules are stricter (e.g. Databricks) > > Also, creating a new major version of the schema (and hence a new column) is the only way to indicate a change in semantics, where the data is in the same format but has different meaning (e.g. amounts in dollars vs euros). 
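As an illustration of the column-naming rule described above, here is a minimal Python sketch (a hypothetical helper under the documented convention, not Snowplow code) showing that minor schema versions share one column while a major version bump creates a new one:

```python
def snowflake_column(schema_uri: str, entity: bool = True) -> str:
    """Illustrative only: build the events-table column name for a schema.
    Entities use the contexts_ prefix; self-describing events use
    unstruct_event_. Only the MAJOR version is part of the name."""
    vendor, name, _format, version = schema_uri.removeprefix("iglu:").split("/")
    major = version.split("-")[0]
    base = f"{vendor}.{name}".lower().replace(".", "_")
    prefix = "contexts_" if entity else "unstruct_event_"
    return f"{prefix}{base}_{major}"

print(snowflake_column("iglu:com.example/user/jsonschema/1-0-0"))  # contexts_com_example_user_1
print(snowflake_column("iglu:com.example/user/jsonschema/1-2-0"))  # contexts_com_example_user_1 (same column)
print(snowflake_column("iglu:com.example/user/jsonschema/2-0-0"))  # contexts_com_example_user_2
```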
**Databricks, Iceberg, Delta:** Because the column name for the self-describing event or entity includes the major schema version, each major version of a schema gets a new column: | Schema | Resulting column | | ------------------------------------------- | ------------------------------------------- | | `com.example/button_press/jsonschema/1-0-0` | `unstruct_event_com_example_button_press_1` | | `com.example/button_press/jsonschema/1-2-0` | `unstruct_event_com_example_button_press_1` | | `com.example/button_press/jsonschema/2-0-0` | `unstruct_event_com_example_button_press_2` | When you evolve your schema within the same major version, (non-destructive) changes are applied to the existing column automatically. For example, if you add a new optional field in the schema, a new optional field will be added to the `STRUCT`. > **Info:** If you make a breaking schema change (e.g. change a type of a field from a `string` to a `number`) without creating a new major schema version, the loader will not be able to modify the column to accommodate the new data. > > In this case, _upon receiving the first event with the offending schema_, the loader will instead create a new column, with a name like `unstruct_event_com_example_button_press_1_0_1_recovered_9999999`, where: > > - `1-0-1` is the version of the offending schema > - `9999999` is a hash code unique to the schema (i.e. it will change if the schema is overwritten with a different one) > > To resolve this situation: > > - Create a new schema version (e.g. `1-0-2`) that reverts the offending changes and is again compatible with the original column. The data for events with that schema will start going to the original column as expected. > - You might also want to manually adapt the data in the `..._recovered_...` column and copy it to the original one. *** ## Types How do schema types translate to the database types? ### Nullability **Redshift:** All non-required schema fields translate to nullable columns. 
Required fields translate to `NOT NULL` columns: ```json { "properties": { "myRequiredField": {"type": ...} }, "required": [ "myRequiredField" ] } ``` However, it is possible to define a required field where `null` values are allowed (the Enrich application will still validate that the field is present, even if it’s `null`): ```json "myRequiredField": { "type": ["null", ...] } ``` OR ```json "myRequiredField": { "enum": ["null", ...] } ``` In this case, the column will be nullable. It does not matter if `"null"` is in the beginning, middle or end of the list of types or enum values. > **Info:** See also how [versioning](#versioning) affects this. **BigQuery:** All non-required schema fields translate to nullable `RECORD` fields. Required schema fields translate to required `RECORD` fields: ```json { "properties": { "myRequiredField": {"type": ...} }, "required": [ "myRequiredField" ] } ``` However, it is possible to define a required field where `null` values are allowed (the Enrich application will still validate that the field is present, even if it’s `null`): ```json "myRequiredField": { "type": ["null", ...] } ``` OR ```json "myRequiredField": { "enum": ["null", ...] } ``` In this case, the `RECORD` field will be nullable. It does not matter if `"null"` is in the beginning, middle or end of the list of types or enum values. **Snowflake:** All fields are nullable (because they are stored inside the `VARIANT` type). **Databricks, Iceberg, Delta:** All schema fields, including the required ones, translate to nullable fields inside the `STRUCT`. *** ### Types themselves **Redshift:** > **Note:** The row order in this table is important. Type lookup stops after the first match is found scanning from top to bottom. 
| Json Schema | Redshift Type | | --- | --- | | ```json { "enum": [E1, E2, ...] } ```The `enum` can contain more than **one** JavaScript type: `string`, `number\|integer`, `boolean`. For these purposes, `number` and `integer` are the same.`array`, `object`, `NaN` and other types in the `enum` will be cast as fallback `VARCHAR(4096)`._If the content is longer than 4096 characters, it will be truncated when inserted into Redshift._ | `VARCHAR(M)``M` is the maximum size of `json.stringify(E*)` | | ```json { "type": ["boolean", "integer"] } ```OR```json { "type": ["integer", "boolean"] } ``` | `VARCHAR(10)` | | ```json { "type": [T1, T2, ...] } ``` | `VARCHAR(4096)`_If the content is longer than 4096 characters, it will be truncated when inserted into Redshift._ | | ```json { "type": "string", "format": "date-time" } ``` | `TIMESTAMP` | | ```json { "type": "string", "format": "date" } ``` | `DATE` | | ```json { "type": "array" } ``` | `VARCHAR(65535)`_Content is stringified and quoted.__If the content is longer than 65535 characters, it will be truncated when inserted into Redshift._ | | ```json { "type": "integer", "maximum": M } ```* `M` ≤ 32767 | `SMALLINT` | | ```json { "type": "integer", "maximum": M } ```* 32767 < `M` ≤ 2147483647 | `INT` | | ```json { "type": "integer", "maximum": M } ```* `M` > 2147483647 | `BIGINT` | | ```json { "type": "integer", "enum": [E1, E2, ...] } ```* Maximum `E*` ≤ 32767 | `SMALLINT` | | ```json { "type": "integer", "enum": [E1, E2, ...] } ```* 32767 < maximum `E*` ≤ 2147483647 | `INT` | | ```json { "type": "integer", "enum": [E1, E2, ...] } ```* Maximum `E*` > 2147483647 | `BIGINT` | | ```json { "type": "integer" } ``` | `BIGINT` | | ```json { "multipleOf": B } ``` | `INT` | | ```json { "type": "number", "multipleOf": B } ```* Only works for `B` = 0.01 | `DECIMAL(36,2)` | | ```json { "type": "number" } ``` | `DOUBLE` | | ```json { "type": "boolean" } ``` | `BOOLEAN` | | ```json { "type": "string", "minLength": M, "maxLength": M } ```* `M` is the same in `minLength` and `maxLength` | `CHAR(M)` | | ```json { "type": "string", "format": "uuid" } ``` | `CHAR(36)` | | ```json { "type": "string", "format": "ipv6" } ``` | `VARCHAR(39)` | | ```json { "type": "string", "format": "ipv4" } ``` | `VARCHAR(15)` | | ```json { "type": "string", "format": "email" } ``` | `VARCHAR(255)` | | ```json { "type": "string", "maxLength": M } ```* `enum` is not defined | `VARCHAR(M)` | | ```json { "enum": ["E1"] } ```* `E1` is the only element | `CHAR(M)``M` is the size of `json.stringify("E1")` | | If nothing matches above, this is a catch-all. | `VARCHAR(4096)`_Values will be quoted as in JSON.__If the content is longer than 4096 characters, it will be truncated when inserted into Redshift._ | **BigQuery:** > **Note:** The row order in this table is important. Type lookup stops after the first match is found scanning from top to bottom.
| Json Schema | BigQuery Type | | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------ | | ```json { "type": "object", "properties": {...} } ```If the `"properties"` key is missing, the type for the entire object will be `JSON` instead of `RECORD`.Objects can be nullable. Nested fields can also be nullable (same rules as for everything else). | `RECORD` | | ```json { "type": "array", "items": {...} } ```The type of the repeated value is determined by the `"items"` key of the schema. If the `"items"` key is missing, then the repeated type is `JSON`. | `REPEATED` | | ```json { "type": "string", "format": "date-time" } ``` | `TIMESTAMP` | | ```json { "type": "string", "format": "date" } ``` | `DATE` | | ```json { "type": "boolean" } ``` | `BOOLEAN` | | ```json { "type": "string" } ``` | `STRING` | | ```json { "type": "integer" } ``` | `INT` | | ```json { "type": "number" } ```OR```json { "type": [ "integer", "number"] } ``` | `FLOAT` | | ```json { "enum": [I1, I2, ...] } ```* All `Ix` are integer. | `INT` | | ```json { "enum": [I1, N1, ...] } ```* All `Ix`, `Nx` are integer or number. | `FLOAT` | | ```json { "enum": [S1, S2, ...] } ```* All `Sx` are strings | `STRING` | | ```json { "enum": [A1, A2, ...] } ```* `Ax` are a mix of different types | `JSON`_String values will be quoted as in JSON._ | | If nothing matches above, this is a catch-all. | `JSON` | **Snowflake:** All types are `VARIANT`. **Databricks, Iceberg, Delta:** > **Note:** The row order in this table is important. Type lookup stops after the first match is found scanning from top to bottom. 
| Json Schema | Databricks Type | | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------- | | ```json { "type": "object", "properties": {...} } ```The `STRUCT` has nested fields, whose types are determined by the `"properties"` key of the schema.If the `"properties"` key is missing, the type for the entire object will be `STRING` instead of `STRUCT`, and data will be JSON-serialized in the string column.Objects can be nullable. Nested fields can also be nullable (same rules as for everything else). | `STRUCT` | | ```json { "type": "array", "items": {...} } ```The type of values within the `ARRAY` is determined by the `"items"` key of the schema. If the `"items"` key is missing, then the values within the array will have type `STRING`, and array items will be JSON-serialized.Arrays can be nullable. Nested fields can also be nullable (same rules as for everything else). 
| `ARRAY` | | ```json { "type": "string", "format": "date-time" } ``` | `TIMESTAMP` | | ```json { "type": "string", "format": "date" } ``` | `DATE` | | ```json { "type": "boolean" } ``` | `BOOLEAN` | | ```json { "type": "string" } ``` | `STRING` | | ```json { "type": "integer", "minimum": N, "maximum": M } ```* `M` ≤ 2147483647 * `N` ≥ -2147483648 | `INT` | | ```json { "type": "integer", "minimum": N, "maximum": M } ```* `M` ≤ 9223372036854775807 * `N` ≥ -9223372036854775808 | `BIGINT` | | ```json { "type": "integer", "minimum": N, "maximum": M } ```* `M` > 1e38-1 * `N` < -1e38 | `DECIMAL(38,0)` | | ```json { "type": "integer", "minimum": N, "maximum": M } ```* `M` < 1e38-1 * `N` > -1e38 | `DOUBLE` | | ```json { "type": "integer" } ``` | `BIGINT` | | ```json { "type": "number", // OR ["number", "integer"] "minimum": N, "maximum": M, "multipleOf": F } ```* `M` ≤ 2147483647 * `N` ≥ -2147483648 * `F` is integer | `INT` | | ```json { "type": "number", // OR ["number", "integer"] "minimum": N, "maximum": M, "multipleOf": F } ```* `M` ≤ 9223372036854775807 * `N` ≥ -9223372036854775808 * `F` is integer | `BIGINT` | | ```json { "type": "number", // OR ["number", "integer"] "minimum": N, "maximum": M, "multipleOf": F } ```* `M` > 1e38-1 * `N` < -1e38 * `F` is integer | `DECIMAL(38,0)` | | ```json { "type": "number", // OR ["number", "integer"] "minimum": N, "maximum": M, "multipleOf": F } ```* `M` < 1e38-1 * `N` > -1e38 * `F` is integer | `DOUBLE` | | ```json { "type": "number", // OR ["number", "integer"] "multipleOf": F } ```* `F` is integer | `BIGINT` | | ```json { "type": "number", // OR ["number", "integer"] "minimum": N, "maximum": M, "multipleOf": F } ```* `P` ≤ 38, where `P` is the maximum precision (total number of digits) of `M` and `N`, adjusted for the scale (number of digits after the `.`) of `F`.
* `S` is the scale (number of digits after the `.`) of `F`, and it is greater than 0.**More details**`P` = `MAX`(`M.precision` - `M.scale` + `F.scale`, `N.precision` - `N.scale` + `F.scale`)`S` = `F.scale`For example, `M=10.9999, N=-10, F=0.1` will be `DECIMAL(9,1)`. The calculation is as follows: `M` is `DECIMAL(6,4)`, `N` is `DECIMAL(2,0)`, `F` is `DECIMAL(2,1)`. `P` = `MAX`(6 - 4 + 1, 2 - 0 + 1) = 3, rounded up to 9. `S` = 1. The result is `DECIMAL(9,1)`. | `DECIMAL(P,S)`_`P` is rounded up to either `9`, `18` or `38`._ | | ```json { "type": "number", // OR ["number", "integer"] "minimum": N, "maximum": M, "multipleOf": F } ```* `P` > 38, where `P` is the maximum precision (total number of digits) of `M` and `N`, adjusted for the scale (number of digits after the `.`) of `F`. * `S` is the scale (number of digits after the `.`) of `F`, and it is greater than 0.**More details**`P` = `MAX`(`M.precision` - `M.scale` + `F.scale`, `N.precision` - `N.scale` + `F.scale`)For example, `M=10.9999, N=-1e50, F=0.1` will be `DOUBLE`. The calculation is as follows: `M` is `DECIMAL(6,4)`, `N` is `DECIMAL(51,0)`, `F` is `DECIMAL(2,1)`. `P` = `MAX`(6 - 4 + 1, 51 - 0 + 1) = 52 > 38 | `DOUBLE` | | ```json { "type": "number", // OR ["number", "integer"] "minimum": N, "maximum": M, "multipleOf": F } ```* `M` < 1e38-1 * `N` > -1e38 * `F` is integer | `DOUBLE` | | ```json { "type": "number" // OR ["number", "integer"] } ``` | `DOUBLE` | | ```json { "enum": [N1, I1, ...] } ```* All `Nx` and `Ix` are of types number or integer. * Maximum scale (number of digits after the `.`) in the enum list is 0. * Maximum absolute value of the enum list is less than or equal to 2147483647. | `INT` | | ```json { "enum": [N1, I1, ...] } ```* All `Nx` and `Ix` are of types number or integer. * Maximum scale (number of digits after the `.`) in the enum list is 0. * Maximum absolute value of the enum list is less than or equal to 9223372036854775807. | `BIGINT` | | ```json { "enum": [N1, I1, ...] } ```* All `Nx` and `Ix` are of types number or integer. * Maximum scale (number of digits after the `.`) in the enum list is 0. * Maximum absolute value of the enum list is greater than 9223372036854775807. | `BIGINT` | | ```json { "enum": [N1, I1, ...] } ```* All `Nx` and `Ix` are of types number or integer. * Maximum absolute value of the enum list is less than 1e38. * `S` is the maximum scale (number of digits after the `.`) in the enum list and it is greater than 0. * `P` is the precision (total number of digits in `M`), rounded up to `9`, `18` or `38`. | `DECIMAL(P,S)`_`P` is rounded up to either `9`, `18` or `38`._ | | ```json { "enum": [S1, S2, ...] } ```* All `Sx` are strings | `STRING` | | ```json { "enum": [A1, A2, ...] } ```* `Ax` are a mix of different types | `STRING`_String values will be quoted as in JSON._ | | If nothing matches above, this is a catch-all. | `STRING`_Values will be quoted as in JSON._ | *** --- # Snowflake Streaming Loader configuration reference > Configure Snowflake Streaming Loader with Snowpipe Streaming, Kinesis, Pub/Sub, and Kafka settings for real-time warehouse loading. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/configuration-reference/ The configuration reference on this page is written for Snowflake Streaming Loader `0.5.1`. ### License The Snowflake Streaming Loader is released under the [Snowplow Limited Use License](/limited-use-license-1.1/) ([FAQ](/docs/licensing/limited-use-license-faq/)). To accept the terms of the license and run the loader, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable.
Alternatively, configure the `license.accept` option in the config file: ```json "license": { "accept": true } ``` ### Snowflake configuration | Parameter | Description | | --- | --- | | `output.good.url` | Required, e.g. `https://orgname.accountname.snowflakecomputing.com`. URI of the Snowflake account. | | `output.good.user` | Required. Snowflake user with the necessary privileges. | | `output.good.privateKey` | Required. Snowflake private key, used to connect to the account. | | `output.good.privateKeyPassphrase` | Optional. Passphrase for the private key. | | `output.good.role` | Optional. Snowflake role which the Snowflake user should assume. | | `output.good.database` | Required. Name of the Snowflake database containing the events table. | | `output.good.schema` | Required. Name of the Snowflake schema containing the events table. | | `output.good.table` | Optional. Default value `events`. Name to use for the events table. | | `output.good.channel` | Optional. Default value `snowplow`. Prefix to use for the Snowflake channels. The full name will be suffixed with a number, e.g. `snowplow-1`. If you run multiple loaders in parallel, then each loader must be configured with a unique channel prefix. | ### Streams configuration **AWS:** | Parameter | Description | | --- | --- | | `input.streamName` | Required.
Name of the Kinesis stream with the enriched events | | `input.appName` | Optional, default `snowplow-snowflake-loader`. Name to use for the DynamoDB table, used by the underlying Kinesis Consumer Library for managing leases. | | `input.initialPosition.type` | Optional, default `LATEST`. Allowed values are `LATEST`, `TRIM_HORIZON`, `AT_TIMESTAMP`. When the loader is deployed for the first time, this controls from where in the Kinesis stream it should start consuming events. On all subsequent deployments of the loader, the loader will resume from the offsets stored in the DynamoDB table. | | `input.initialPosition.timestamp` | Required if `input.initialPosition` is `AT_TIMESTAMP`. A timestamp in ISO8601 format from where the loader should start consuming events. | | `input.retrievalMode` | Optional, default Polling. Change to FanOut to enable the enhanced fan-out feature of Kinesis. | | `input.retrievalMode.maxRecords` | Optional. Default value 1000. How many events the Kinesis client may fetch in a single poll. Only used when `input.retrievalMode` is Polling. | | `input.workerIdentifier` | Optional. Defaults to the `HOSTNAME` environment variable. The name of this KCL worker used in the DynamoDB lease table. | | `input.leaseDuration` | Optional. Default value `10 seconds`. The duration of shard leases. KCL workers must periodically refresh leases in the DynamoDB table before this duration expires. | | `input.maxLeasesToStealAtOneTimeFactor` | Optional. Default value `2.0`. Controls how to pick the max number of shard leases to steal at one time. E.g. If there are 4 available processors, and `maxLeasesToStealAtOneTimeFactor = 2.0`, then allow the loader to steal up to 8 leases. Allows bigger instances to more quickly acquire the shard-leases they need to combat latency. | | `input.checkpointThrottledBackoffPolicy.minBackoff` | Optional. Default value `100 milliseconds`. Initial backoff used to retry checkpointing if we exceed the DynamoDB provisioned write limits.
| | `input.checkpointThrottledBackoffPolicy.maxBackoff` | Optional. Default value `1 second`. Maximum backoff used to retry checkpointing if we exceed the DynamoDB provisioned write limits. | | `output.bad.streamName` | Required. Name of the Kinesis stream that will receive failed events. | | `output.bad.throttledBackoffPolicy.minBackoff` | Optional. Default value `100 milliseconds`. Initial backoff used to retry sending failed events if we exceed the Kinesis write throughput limits. | | `output.bad.throttledBackoffPolicy.maxBackoff` | Optional. Default value `1 second`. Maximum backoff used to retry sending failed events if we exceed the Kinesis write throughput limits. | | `output.bad.recordLimit` | Optional. Default value 500. The maximum number of records we are allowed to send to Kinesis in 1 PutRecords request. | | `output.bad.byteLimit` | Optional. Default value 5242880. The maximum number of bytes we are allowed to send to Kinesis in 1 PutRecords request. | | `output.bad.maxRecordSize` | Optional. Default value 1000000. Any single failed event sent to Kinesis should not exceed this size in bytes. | **GCP:** | Parameter | Description | | --------------------------------- | ----------- | | `input.subscription` | Required, e.g. `projects/myproject/subscriptions/snowplow-enriched`. Name of the Pub/Sub subscription with the enriched events | | `input.parallelPullFactor` | Optional. Default value 0.5. `parallelPullFactor * cpu count` will determine the number of threads used internally by the Pub/Sub client library for fetching events | | `input.durationPerAckExtension` | Optional. Default value `60 seconds`.
Pub/Sub ack deadlines are extended for this duration when needed. | | `input.minRemainingAckDeadline` | Optional. Default value `0.1`. Controls when ack deadlines are re-extended, for a message that is close to exceeding its ack deadline. For example, if `durationPerAckExtension` is `60 seconds` and `minRemainingAckDeadline` is `0.1` then the loader will wait until there is `6 seconds` left of the remaining deadline, before re-extending the message deadline. | | `input.maxMessagesPerPull` | Optional. Default value 1000. How many Pub/Sub messages to pull from the server in a single request. | | `input.debounceRequests` | Optional. Default value `100 millis`. Adds an artificial delay between consecutive requests to Pub/Sub for more messages. Under some circumstances, this was found to slightly alleviate a problem in which Pub/Sub might re-deliver the same messages multiple times. | | `output.bad.topic` | Required, e.g. `projects/myproject/topics/snowplow-bad`. Name of the Pub/Sub topic that will receive failed events. | | `output.bad.batchSize` | Optional. Default value 1000. Bad events are sent to Pub/Sub in batches not exceeding this count. | | `output.bad.requestByteThreshold` | Optional. Default value 1000000. Bad events are sent to Pub/Sub in batches with a total size not exceeding this byte threshold | | `output.bad.maxRecordSize` | Optional. Default value 9000000. Any single failed event sent to Pub/Sub should not exceed this size in bytes | **Azure:** | Parameter | Description | | ----------------------------- | ----------- | | `input.topicName` | Required. Name of the Kafka topic for the source of enriched events. | | `input.bootstrapServers` | Required. Hostname and port of Kafka bootstrap servers hosting the source of enriched events. | | `input.consumerConf.*` | Optional.
A map of key/value pairs for [any standard Kafka consumer configuration option](https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html). | | `output.bad.topicName` | Required. Name of the Kafka topic that will receive failed events. | | `output.bad.bootstrapServers` | Required. Hostname and port of Kafka bootstrap servers hosting the bad topic | | `output.bad.producerConf.*` | Optional. A map of key/value pairs for [any standard Kafka producer configuration option](https://docs.confluent.io/platform/current/installation/configuration/producer-configs.html). | | `output.bad.maxRecordSize` | Optional. Default value 1000000. Any single failed event sent to Kafka should not exceed this size in bytes | > **Info:** You can use the `input.consumerConf` and `output.bad.producerConf` options to configure authentication to Azure event hubs using SASL. For example: > > ```json > "input.consumerConf": { > "security.protocol": "SASL_SSL" > "sasl.mechanism": "PLAIN" > "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\$ConnectionString\" password=;" > } > ``` *** ## Other configuration options | Parameter | Description | | ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `batching.maxBytes` | Optional. Default value `16000000`. Events are emitted to Snowflake when the batch reaches this size in bytes | | `batching.maxDelay` | Optional. Default value `1 second`. Events are emitted to Snowflake after a maximum of this duration, even if the `maxBytes` size has not been reached | | `batching.uploadParallelismFactor` | Optional. Default value 3.5. 
Controls how many batches we can send simultaneously over the network to Snowflake. E.g. If there are 4 available processors, and `uploadParallelismFactor` is 3.5, then the loader sends up to 14 batches in parallel. Adjusting this value can cause the app to use more or less of the available CPU. | | `cpuParallelismFactor` | Optional. Default value 0.75. Controls how the loader splits the workload into concurrent batches which can be run in parallel. E.g. If there are 4 available processors, and `cpuParallelismFactor` is 0.75, then the loader processes 3 batches concurrently. Adjusting this value can cause the app to use more or less of the available CPU. | | `retries.setupErrors.delay` | Optional. Default value `30 seconds`. Configures exponential backoff on errors related to how Snowflake is set up for this loader. Examples include authentication errors and permissions errors. This class of errors is reported periodically to the monitoring webhook. | | `retries.transientErrors.delay` | Optional. Default value `1 second`. Configures exponential backoff on errors that are likely to be transient. Examples include server errors and network errors. | | `retries.transientErrors.attempts` | Optional. Default value 5. Maximum number of attempts to make before giving up on a transient error. | | `retries.checkCommittedOffset.delay` | Optional. Default value `100 millis`. Configures a delay in between flushing events to Snowflake and fetching the latest offset token from Snowflake to check the events are fully ingested. | | `skipSchemas` | Optional, e.g. `["iglu:com.example/skipped1/jsonschema/1-0-0"]` or with wildcards `["iglu:com.example/skipped2/jsonschema/1-*-*"]`. A list of schemas that won't be loaded to Snowflake. This feature could be helpful when recovering from edge-case schemas which for some reason cannot be loaded to the table. | | `monitoring.metrics.statsd.hostname` | Optional. If set, the loader sends statsd metrics over UDP to a server on this host name.
| | `monitoring.metrics.statsd.port` | Optional. Default value 8125. If the statsd server is configured, this UDP port is used for sending metrics. | | `monitoring.metrics.statsd.tags.*` | Optional. A map of key/value pairs to be sent along with the statsd metric. | | `monitoring.metrics.statsd.period` | Optional. Default `1 minute`. How often to report metrics to statsd. | | `monitoring.metrics.statsd.prefix` | Optional. Default `snowplow.snowflake-loader`. Prefix used for the metric name when sending to statsd. | | `monitoring.webhook.endpoint` | Optional, e.g. `https://webhook.example.com`. The loader will send to the webhook a payload containing details of any error related to how Snowflake is set up for this loader. | | `monitoring.webhook.tags.*` | Optional. A map of key/value strings to be included in the payload content sent to the webhook. | | `monitoring.webhook.heartbeat.*` | Optional. Default value `5.minutes`. How often to send a heartbeat event to the webhook when healthy. | | `monitoring.sentry.dsn` | Optional. Set to a Sentry URI to report unexpected runtime exceptions. | | `monitoring.sentry.tags.*` | Optional. A map of key/value strings which are passed as tags when reporting exceptions to Sentry. | | `telemetry.disable` | Optional. Set to `true` to disable [telemetry](/docs/get-started/self-hosted/telemetry/). | | `telemetry.userProvidedId` | Optional. See [here](/docs/get-started/self-hosted/telemetry/#how-can-i-help) for more information. | | `output.good.jdbcLoginTimeout` | Optional. Sets the login timeout on the JDBC driver which connects to Snowflake | | `output.good.jdbcNetworkTimeout` | Optional. Sets the network timeout on the JDBC driver which connects to Snowflake | | `output.good.jdbcQueryTimeout` | Optional. Sets the query timeout on the JDBC driver which connects to Snowflake | | `http.client.maxConnectionsPerServer` | Optional. Default value 4. Configures the internal HTTP client used for alerts and telemetry. 
The maximum number of open HTTP requests to any single server at any one time. | --- # Snowflake Streaming Loader > Load Snowplow events to Snowflake with sub-minute latency from Kinesis, Pub/Sub, or Kafka using Snowpipe Streaming. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/ The Snowflake Streaming Loader is an application that loads Snowplow events to Snowflake. > **Tip:** Both [Snowflake Streaming Loader](/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/) and [RDB Loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/) can load data into Snowflake. > > Snowflake Streaming Loader is newer and has two advantages: > > - Much lower latency — you can get data in Snowflake in seconds, as opposed to minutes with RDB Loader > - Much lower cost — unlike with RDB Loader, there is no need for EMR and extensive Snowflake compute to load batch files > > We recommend the Streaming Loader over the RDB Loader. If you already use RDB Loader, see the [migration guide](/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/migrating/) for more information. **AWS:** On AWS, the Snowflake Streaming Loader continually pulls events from Kinesis and writes to Snowflake using the [Snowpipe Streaming API](https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-overview). ```mermaid flowchart LR stream[["Enriched Events (Kinesis stream)"]] loader{{"Snowflake Streaming Loader"}} subgraph snowflake [Snowflake] table[("Events table")] end stream-->loader-->|Snowpipe Streaming API|snowflake ``` The Snowflake Streaming Loader is published as a Docker image which you can run on any AWS VM. You do not need a Spark cluster to run this loader. ```bash docker pull snowplow/snowflake-loader-kinesis:0.5.1 ``` To run the loader, mount your config file into the docker image, and then provide the file path on the command line. 
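For example, the config file can pull the private key out of the environment via a HOCON substitution, instead of embedding it inline. This is only a sketch using parameter names from the configuration reference; the user, database, and schema values are placeholders:

```json
"output": {
  "good": {
    "url": "https://orgname.accountname.snowflakecomputing.com"
    "user": "snowplow_loader_user"
    "privateKey": ${SNOWFLAKE_PRIVATE_KEY}
    "database": "snowplow"
    "schema": "atomic"
  }
}
```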
We recommend setting the `SNOWFLAKE_PRIVATE_KEY` environment variable so that you can refer to it in the config file. ```bash docker run \ --mount=type=bind,source=/path/to/myconfig,destination=/myconfig \ --env SNOWFLAKE_PRIVATE_KEY="${SNOWFLAKE_PRIVATE_KEY}" \ snowplow/snowflake-loader-kinesis:0.5.1 \ --config=/myconfig/loader.hocon ``` **GCP:** On GCP, the Snowflake Streaming Loader continually pulls events from Pub/Sub and writes to Snowflake using the [Snowpipe Streaming API](https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-overview). ```mermaid flowchart LR stream[["Enriched Events (Pub/Sub stream)"]] loader{{"Snowflake Streaming Loader"}} subgraph snowflake [Snowflake] table[("Events table")] end stream-->loader-->|Snowpipe Streaming API|snowflake ``` The Snowflake Streaming Loader is published as a Docker image which you can run on any GCP VM. You do not need a Spark cluster to run this loader. ```bash docker pull snowplow/snowflake-loader-pubsub:0.5.1 ``` To run the loader, mount your config file into the docker image, and then provide the file path on the command line. We recommend setting the `SNOWFLAKE_PRIVATE_KEY` environment variable so that you can refer to it in the config file. ```bash docker run \ --mount=type=bind,source=/path/to/myconfig,destination=/myconfig \ --env SNOWFLAKE_PRIVATE_KEY="${SNOWFLAKE_PRIVATE_KEY}" \ snowplow/snowflake-loader-pubsub:0.5.1 \ --config=/myconfig/loader.hocon ``` **Azure:** On Azure, the Snowflake Streaming Loader continually pulls events from Kafka and writes to Snowflake using the [Snowpipe Streaming API](https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-overview). 
```mermaid flowchart LR stream[["Enriched Events (Kafka stream)"]] loader{{"Snowflake Streaming Loader"}} subgraph snowflake [Snowflake] table[("Events table")] end stream-->loader-->|Snowpipe Streaming API|snowflake ``` The Snowflake Streaming Loader is published as a Docker image which you can run on any Azure VM. You do not need a Spark cluster to run this loader. ```bash docker pull snowplow/snowflake-loader-kafka:0.5.1 ``` To run the loader, mount your config file into the docker image, and then provide the file path on the command line. We recommend setting the `SNOWFLAKE_PRIVATE_KEY` environment variable so that you can refer to it in the config file. ```bash docker run \ --mount=type=bind,source=/path/to/myconfig,destination=/myconfig \ --env SNOWFLAKE_PRIVATE_KEY="${SNOWFLAKE_PRIVATE_KEY}" \ snowplow/snowflake-loader-kafka:0.5.1 \ --config=/myconfig/loader.hocon ``` *** ## Configuring the loader The loader config file is in HOCON format, and it allows configuring many different properties of how the loader runs. The simplest possible config file just needs a description of your pipeline inputs and outputs: **AWS:** [View on GitHub](https://github.com/snowplow-incubator/snowplow-snowflake-loader/blob/main/config/config.kinesis.minimal.hocon) **GCP:** [View on GitHub](https://github.com/snowplow-incubator/snowplow-snowflake-loader/blob/main/config/config.pubsub.minimal.hocon) **Azure:** [View on GitHub](https://github.com/snowplow-incubator/snowplow-snowflake-loader/blob/main/config/config.azure.minimal.hocon) *** See the [configuration reference](/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/configuration-reference/) for all possible configuration parameters. --- # Migrating to Snowflake Streaming Loader from RDB Loader > Migrate from RDB Loader to Snowflake Streaming Loader for lower latency and cost with same table or fresh table strategies.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/migrating/ This guide is aimed at Snowplow users who load events into Snowflake via the [RDB Loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/). We recommend migrating to use the [Snowflake Streaming Loader](/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/) because it has much lower latency and is cheaper to run. There are two migration strategies you might take: 1. [Load into the same table as before](#load-into-the-same-table-as-before). This way you have a single events table, containing old events loaded with RDB Loader and new events loaded by the Streaming Loader. 2. [Load into a fresh new table](#load-into-a-fresh-new-table-with-the-streaming-loader) with the Streaming Loader. This is a more cautious approach, but you will need to point any data models or downstream applications, dashboards, etc., to the new table. ## Load into the same table as before The Streaming Loader is fully compatible with the table created and managed by recent versions of RDB Loader. In particular, these aspects are exactly the same as before: - There are 129 columns for the atomic fields, common to all Snowplow events - [Self-describing events](/docs/fundamentals/events/#self-describing-events) are loaded into columns named like `unstruct_event_com_example_button_press_1` - [Entities](/docs/fundamentals/entities/) are loaded into columns named like `contexts_com_example_user_1` - For both self-describing events and entities, a new column is created for each major version of the Iglu schema > **Tip:** [This page](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/) explains how Snowplow data maps to the warehouse in more detail.
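As a rough illustration of the column naming convention above, a schema reference can be mapped to its column name as sketched below. This is not Snowplow's actual implementation — for instance, it assumes the schema vendor and name are already snake_case and ignores any character normalization the real loader performs:

```python
def column_name(iglu_uri: str, kind: str = "entity") -> str:
    """Sketch: map an Iglu schema URI to its warehouse column name.

    Assumes vendor and name are already snake_case.
    """
    # e.g. "iglu:com.example/user/jsonschema/1-0-0"
    vendor, name, _format, version = iglu_uri.removeprefix("iglu:").split("/")
    major_version = version.split("-")[0]  # only the major version matters
    prefix = "contexts" if kind == "entity" else "unstruct_event"
    return f"{prefix}_{vendor.replace('.', '_')}_{name}_{major_version}"

print(column_name("iglu:com.example/user/jsonschema/1-0-0"))
# contexts_com_example_user_1
print(column_name("iglu:com.example/button_press/jsonschema/1-0-3", kind="event"))
# unstruct_event_com_example_button_press_1
```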
You will notice some subtle differences: #### No loader-side deduplication RDB Loader performs [within-batch and cross-batch deduplication](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/deduplication/) during loading. The Streaming Loader does not deduplicate events, so you may see more duplicates than with the RDB Loader. Snowplow's [data models](/docs/modeling-your-data/modeling-your-data-with-dbt/) handle deduplication automatically. If you write custom queries, see [dealing with duplicates](/docs/destinations/warehouses-lakes/querying-data/#dealing-with-duplicates). #### New `_schema_version` property in entities Previously, when loading entities into the table, RDB Loader would drop any information about exactly which version of the schema had been used to validate them. The Streaming Loader adds an extra property called `_schema_version`, so the versioning information is not lost in the warehouse. For example, if a tracker sends an entity like this: ```json { "schema": "iglu:com.example/my_schema/jsonschema/1-0-3", "data": { "a": 1 } } ``` Then the value loaded into the `contexts_com_example_my_schema_1` column is this: ```json { "a": 1, "_schema_version": "1-0-3" } ``` #### Null values omitted from entities For self-describing events and entities, the Streaming Loader omits values that are explicitly set to `null`. This differs from RDB Loader, which stores them. For example, if a tracker sends an entity like this: ```json { "schema": "iglu:com.example/my_schema/jsonschema/1-0-0", "data": { "a": 1, "b": null, "c": null } } ``` Then the value loaded into the `contexts_com_example_my_schema_1` column is this (note the `b` and `c` fields are missing): ```json { "a": 1 } ``` We made this change as a performance optimization for querying data. The Snowflake docs [explain](https://docs.snowflake.com/en/user-guide/semistructured-considerations) that JSON `null` values affect how Snowflake extracts nested properties.
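The null omission described above can be sketched as follows. This is illustrative only, not the loader's actual code; in particular, how the loader treats `null` inside arrays is an assumption here:

```python
def strip_nulls(value):
    """Recursively drop object keys whose value is JSON null (None)."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value

print(strip_nulls({"a": 1, "b": None, "c": None}))
# {'a': 1}
```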
Snowflake automatically builds indexes on the nested properties of VARIANT columns, but only if those properties do not contain explicit `null` values. #### The `load_tstamp` field is required before migrating > **Note:** This only affects users migrating from RDB Loader older than version 4.0.0. The Snowflake events table must have a column named `load_tstamp` of type `TIMESTAMP`. If you have ever used a version of RDB Loader newer than 4.0.0, then it will have already added this column for you. But if you are migrating from an older version of RDB Loader then you will need to add the column manually: ```sql ALTER TABLE events ADD COLUMN load_tstamp TIMESTAMP ``` ## Load into a fresh new table with the Streaming Loader The Snowflake Streaming Loader will automatically create the events table when you run it for the first time. If you are familiar with the RDB Loader's table, then all of the points in the previous section are still relevant to you. You will also notice a few other differences: #### No maximum lengths on VARCHAR columns The old table created by RDB Loader had maximum lengths on some of the columns, e.g. `app_id VARCHAR(255)`. The new Streaming Loader creates columns without max lengths, e.g. `app_id VARCHAR`. #### VARCHAR instead of CHAR When RDB Loader created the events table, it used a mixture of VARCHAR and CHAR column types for the various string fields. For the sake of simplicity, the Streaming Loader uses VARCHAR column types only. In Snowflake, there is no meaningful difference between the two column types. --- # Postgres Loader for testing and development > Load Snowplow enriched events into PostgreSQL database from Kinesis or Pub/Sub for development and testing environments. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-postgres-loader/ With the Snowplow Postgres Loader you can load enriched data or [failed events](/docs/fundamentals/failed-events/) into a PostgreSQL database.
> **Danger:** The Postgres loader is not recommended for production use, especially with large data volumes. We recommend using a fully-fledged data warehouse like Databricks, Snowflake, BigQuery or Redshift, together with the [respective loader](/docs/destinations/warehouses-lakes/). > **Tip:** For more information on how events are stored in Postgres, check the [mapping between Snowplow schemas and the corresponding Postgres column types](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/?warehouse=postgres). ## Available on Terraform Registry A Terraform module is available that deploys the Snowplow Postgres Loader on AWS EC2 for use with Kinesis. For installing in other environments, please see the other installation options below. ## Getting a Docker image Snowplow Postgres Loader is [published on DockerHub](https://hub.docker.com/r/snowplow/snowplow-postgres-loader): ```bash docker pull snowplow/snowplow-postgres-loader:0.3.3 ``` It accepts the typical configuration options for a Snowplow loader: ```bash docker run --rm \ -v $PWD/config:/snowplow/config \ snowplow/snowplow-postgres-loader:0.3.3 \ --resolver /snowplow/config/resolver.json \ --config /snowplow/config/config.hocon ``` ## Iglu Here, `resolver.json` is a typical [Iglu Client](/docs/api-reference/iglu/iglu-resolver/) configuration. **Note that the schemas for all self-describing JSON flowing through the Postgres Loader must be hosted on Iglu Server 0.6.0 or above.** Iglu Central is a static registry; if you use Snowplow-authored schemas, you need to upload those schemas to your own Iglu Server as well. ## Configuration The configuration file is in HOCON format, and it specifies connection details for the target database and the input stream of events.
```json { "input": { "type": "Kinesis" "streamName": "enriched-events" "region": "eu-central-1" } "output" : { "good": { "type": "Postgres" "host": "localhost" "database": "snowplow" "username": "postgres" "password": ${POSTGRES_PASSWORD} "schema": "atomic" } } } ``` The `input` section can alternatively specify a GCP PubSub subscription, instead of a Kinesis stream as in the example above. ```json "input": { "type": "PubSub" "projectId": "my-project" "subscriptionId": "my-subscription" } ``` See [the configuration reference](/docs/api-reference/loaders-storage-targets/snowplow-postgres-loader/postgres-loader-configuration-reference/) for a complete description of all parameters. ## Other The loader creates the `events` table on startup, and creates every other table when it first encounters its corresponding schema. You should ensure that the database and schema specified in the configuration exist before starting the loader. --- # Postgres Loader configuration reference > Configure Postgres Loader with Kinesis, Pub/Sub, or local input sources and database connection settings for event loading. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-postgres-loader/postgres-loader-configuration-reference/ This is a complete list of the options that can be configured in the Postgres Loader's HOCON config file. The [example configs in GitHub](https://github.com/snowplow-incubator/snowplow-postgres-loader/tree/master/config) show how to prepare an input file. | Parameter | Description | | -------------------------------- | ----------- | | `input.type` | Required. Can be "Kinesis", "PubSub" or "Local". Configures where input events will be read from.
| | `input.streamName` | Required when `input.type` is Kinesis. Name of the Kinesis stream to read from. | | `input.region` | Required when `input.type` is Kinesis. AWS region in which the Kinesis stream resides. | | `input.initialPosition` | Optional. Used when `input.type` is Kinesis. Use "TRIM_HORIZON" (the default) to start streaming at the last untrimmed record in the shard, which is the oldest data record in the shard. Or use "LATEST" to start streaming just after the most recent record in the shard. | | `input.retrievalMode.type` | Optional. When `input.type` is Kinesis, this sets the polling mode for retrieving records. Can be "FanOut" (the default) or "Polling". | | `input.retrievalMode.maxRecords` | Optional. Used when `input.retrievalMode.type` is "Polling". Configures how many records are fetched in each poll of the Kinesis stream. Default 10000. | | `input.projectId` | Required when `input.type` is PubSub. The name of your GCP project. | | `input.subscriptionId` | Required when `input.type` is PubSub. ID of the PubSub subscription to read events from. | | `input.path` | Required when `input.type` is Local. Path of the event source. It can be a directory or a file. If it is a directory, all files under it are read recursively. The path can be absolute or relative to the executable. | | `output.good.host` | Required. Hostname of the postgres database. | | `output.good.port` | Optional. Port number of the postgres database. Default 5432. | | `output.good.database` | Required. Name of the postgres database. | | `output.good.username` | Required. Postgres role name to use when connecting to the database. | | `output.good.password` | Required. Password for the postgres user. | | `output.good.schema` | Required. The Postgres schema in which to create tables and write events. | | `output.good.sslMode` | Optional. Configures how the client and server agree on SSL protection. Default "REQUIRE". | | `output.bad.type` | Optional.
Can be "Kinesis", "PubSub", "Local" or "Noop". Configures where failed events will be sent. Default is "Noop", which means failed events will be discarded. | | `output.bad.streamName` | Required when `bad.type` is Kinesis. Name of the Kinesis stream to write to. | | `output.bad.region` | Required when `bad.type` is Kinesis. AWS region in which the Kinesis stream resides. | | `output.bad.projectId` | Required when `bad.type` is PubSub. The name of your GCP project. | | `output.bad.topicId` | Required when `bad.type` is PubSub. ID of the PubSub topic to write failed events to. | | `output.bad.path` | Required when `bad.type` is Local. Path of the file to write failed events to. | | `purpose` | Optional. Set this to "ENRICHED_EVENTS" (the default) when reading the stream of enriched events in TSV format. Set this to "JSON" when reading a stream of self-describing JSON, e.g. Snowplow [bad rows](https://github.com/snowplow/iglu-central/tree/master/schemas/com.snowplowanalytics.snowplow.badrows). | | `monitoring.metrics.cloudWatch` | Optional boolean, with default true. For Kinesis input, this can be used to disable sending metrics to CloudWatch. | #### Advanced options We believe these advanced options are set to sensible defaults, and you will hopefully never need to change them. | Parameter | Description | | ---------------------------------------- | ----------- | | `backoffPolicy.minBackoff` | If the producer (PubSub or Kinesis) fails to send an item, it will retry. This field configures the backoff time for the first retry; every subsequent retry doubles the previous backoff time. | | `backoffPolicy.maxBackoff` | Maximum backoff time for retries. After this value is reached, the backoff time no longer increases. | | `input.checkpointSettings.maxBatchSize` | Used when `input.type` is Kinesis.
Determines the max number of records to aggregate before checkpointing the records. Default is 1000. | | `input.checkpointSettings.maxBatchWait` | Used when `input.type` is Kinesis. Determines the max amount of time to wait before checkpointing the records. Default is 10 seconds. | | `input.checkpointSettings.maxConcurrent` | Used when `input.type` is PubSub. The maximum number of concurrent evaluations for the checkpointer. | | `output.good.maxConnections` | Maximum number of connections the database pool is allowed to reach. Default 10. | | `output.good.threadPoolSize` | Size of the thread pool for blocking database operations. Default is the value of "maxConnections". | | `output.bad.delayThreshold` | Set the delay threshold to use for batching. After this amount of time has elapsed (counting from the first element added), the elements will be wrapped up in a batch and sent. Default 200 milliseconds. | | `output.bad.maxBatchSize` | A batch of messages will be emitted when the number of events in the batch reaches the given size. Default 500. | | `output.bad.maxBatchBytes` | A batch of messages will be emitted when the total size of the batch reaches the given limit. Default 5 MB. | --- # RDB Loader for Redshift and Databricks > Load Snowplow events into Redshift, Databricks, or Snowflake with transformation and deduplication using Spark or stream transformers. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/ We use the name RDB Loader (from "relational database") for a set of applications that can be used to load Snowplow events into a data warehouse. Use these tools if you want to load into **Redshift** (including Redshift serverless), **Databricks**, or **Snowflake** (the latter not recommended). For other destinations, see [here](/docs/api-reference/loaders-storage-targets/).
**Redshift:** **AWS (Batching, recommended):** At a high level, RDB Loader reads batches of enriched Snowplow events, converts them to the format supported by Redshift, stores them in an S3 bucket and instructs Redshift to load them. ```mermaid flowchart LR stream[["Enriched Events (Kinesis stream)"]] s3loader{{"S3 Loader"}} prebucket[("Enriched Events (S3 bucket)")] loader{{"RDB Loader (Transformer and Loader apps)"}} bucket[("Transformed Events (S3 bucket)")] subgraph "Redshift" table[("Events table")] end stream-->s3loader-->prebucket-->loader-->bucket--->Redshift ``` RDB Loader consists of two applications: Transformer and Loader. The following diagram illustrates the interaction between them and Redshift. ```mermaid sequenceDiagram loop Note over Transformer: Read a batch of events Note over Transformer: Transform events to TSV Note over Transformer: Write data to the S3 bucket Transformer->>Loader: Notify the loader (via SQS) Loader->>Redshift: Send SQL commands for loading Note over Redshift: Load the data from the S3 bucket using “COPY FROM” end ``` **AWS (Micro-batching):** At a high level, RDB Loader reads batches of enriched Snowplow events, converts them to the format supported by Redshift, stores them in an S3 bucket and instructs Redshift to load them. ```mermaid flowchart LR stream[["Enriched Events (Kinesis stream)"]] loader{{"RDB Loader (Transformer and Loader apps)"}} bucket[("Transformed Events (S3 bucket)")] subgraph "Redshift" table[("Events table")] end stream-->loader-->bucket--->Redshift ``` RDB Loader consists of two applications: Transformer and Loader. The following diagram illustrates the interaction between them and Redshift.
```mermaid sequenceDiagram loop Note over Transformer: Read a batch of events Note over Transformer: Transform events to TSV Note over Transformer: Write data to the S3 bucket Transformer->>Loader: Notify the loader (via SQS) Loader->>Redshift: Send SQL commands for loading Note over Redshift: Load the data from the S3 bucket using “COPY FROM” end ``` *** **Databricks:** > **Note:** The cloud selection below is for your _pipeline_. We don’t have restrictions on where Databricks itself is deployed. **AWS (Batching, recommended):** At a high level, RDB Loader reads batches of enriched Snowplow events, converts them to the format supported by Databricks, stores them in an S3 bucket and instructs Databricks to load them. ```mermaid flowchart LR stream[["Enriched Events (Kinesis stream)"]] s3loader{{"S3 Loader"}} prebucket[("Enriched Events (S3 bucket)")] loader{{"RDB Loader (Transformer and Loader apps)"}} bucket[("Transformed Events (S3 bucket)")] subgraph "Databricks" table[("Events table")] end stream-->s3loader-->prebucket-->loader-->bucket--->Databricks ``` RDB Loader consists of two applications: Transformer and Loader. The following diagram illustrates the interaction between them and Databricks. ```mermaid sequenceDiagram loop Note over Transformer: Read a batch of events Note over Transformer: Transform events to Parquet Note over Transformer: Write data to the S3 bucket Transformer->>Loader: Notify the loader (via SQS) Loader->>Databricks: Send SQL commands for loading Note over Databricks: Load the data from the S3 bucket using “COPY FROM” end ``` **AWS (Micro-batching):** At a high level, RDB Loader reads batches of enriched Snowplow events, converts them to the format supported by Databricks, stores them in an S3 bucket and instructs Databricks to load them.
```mermaid flowchart LR stream[["Enriched Events (Kinesis stream)"]] loader{{"RDB Loader (Transformer and Loader apps)"}} bucket[("Transformed Events (S3 bucket)")] subgraph "Databricks" table[("Events table")] end stream-->loader-->bucket--->Databricks ``` RDB Loader consists of two applications: Transformer and Loader. The following diagram illustrates the interaction between them and Databricks. ```mermaid sequenceDiagram loop Note over Transformer: Read a batch of events Note over Transformer: Transform events to Parquet Note over Transformer: Write data to the S3 bucket Transformer->>Loader: Notify the loader (via SQS) Loader->>Databricks: Send SQL commands for loading Note over Databricks: Load the data from the S3 bucket using “COPY FROM” end ``` **GCP:** At a high level, RDB Loader reads batches of enriched Snowplow events, converts them to the format supported by Databricks, stores them in a GCS bucket and instructs Databricks to load them. ```mermaid flowchart LR stream[["Enriched Events (Pub/Sub stream)"]] loader{{"RDB Loader (Transformer and Loader apps)"}} bucket[("Transformed Events (GCS bucket)")] subgraph "Databricks" table[("Events table")] end stream-->loader-->bucket--->Databricks ``` RDB Loader consists of two applications: Transformer and Loader. The following diagram illustrates the interaction between them and Databricks. ```mermaid sequenceDiagram loop Note over Transformer: Read a batch of events Note over Transformer: Transform events to Parquet Note over Transformer: Write data to the GCS bucket Transformer->>Loader: Notify the loader (via Pub/Sub) Loader->>Databricks: Send SQL commands for loading Note over Databricks: Load the data from the GCS bucket using “COPY FROM” end ``` *** **Snowflake:** > **Tip:** Both [Snowflake Streaming Loader](/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/) and [RDB Loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/) can load data into Snowflake.
> > Snowflake Streaming Loader is newer and has two advantages: > > - Much lower latency — you can get data in Snowflake in seconds, as opposed to minutes with RDB Loader > - Much lower cost — unlike with RDB Loader, there is no need for EMR and extensive Snowflake compute to load batch files > > We recommend the Streaming Loader over the RDB Loader. If you already use RDB Loader, see the [migration guide](/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/migrating/) for more information. > **Note:** The cloud selection below is for your _pipeline_. We don’t have restrictions on where Snowflake itself is deployed. **AWS (Batching, recommended):** At a high level, RDB Loader reads batches of enriched Snowplow events, converts them to the format supported by Snowflake, stores them in an S3 bucket and instructs Snowflake to load them. ```mermaid flowchart LR stream[["Enriched Events (Kinesis stream)"]] s3loader{{"S3 Loader"}} prebucket[("Enriched Events (S3 bucket)")] loader{{"RDB Loader (Transformer and Loader apps)"}} bucket[("Transformed Events (S3 bucket)")] subgraph "Snowflake" table[("Events table")] end stream-->s3loader-->prebucket-->loader-->bucket--->Snowflake ``` RDB Loader consists of two applications: Transformer and Loader. The following diagram illustrates the interaction between them and Snowflake. ```mermaid sequenceDiagram loop Note over Transformer: Read a batch of events Note over Transformer: Transform events to JSON Note over Transformer: Write data to the S3 bucket Transformer->>Loader: Notify the loader (via SQS) Loader->>Snowflake: Send SQL commands for loading Note over Snowflake: Load the data from the S3 bucket using “COPY FROM” end ``` **AWS (Micro-batching):** At a high level, RDB Loader reads batches of enriched Snowplow events, converts them to the format supported by Snowflake, stores them in an S3 bucket and instructs Snowflake to load them.
```mermaid flowchart LR stream[["Enriched Events (Kinesis stream)"]] loader{{"RDB Loader (Transformer and Loader apps)"}} bucket[("Transformed Events (S3 bucket)")] subgraph "Snowflake" table[("Events table")] end stream-->loader-->bucket--->Snowflake ``` RDB Loader consists of two applications: Transformer and Loader. The following diagram illustrates the interaction between them and Snowflake. ```mermaid sequenceDiagram loop Note over Transformer: Read a batch of events Note over Transformer: Transform events to JSON Note over Transformer: Write data to the S3 bucket Transformer->>Loader: Notify the loader (via SQS) Loader->>Snowflake: Send SQL commands for loading Note over Snowflake: Load the data from the S3 bucket using “COPY FROM” end ``` **GCP:** At a high level, RDB Loader reads batches of enriched Snowplow events, converts them to the format supported by Snowflake, stores them in a GCS bucket and instructs Snowflake to load them. ```mermaid flowchart LR stream[["Enriched Events (Pub/Sub stream)"]] loader{{"RDB Loader (Transformer and Loader apps)"}} bucket[("Transformed Events (GCS bucket)")] subgraph "Snowflake" table[("Events table")] end stream-->loader-->bucket--->Snowflake ``` RDB Loader consists of two applications: Transformer and Loader. The following diagram illustrates the interaction between them and Snowflake. ```mermaid sequenceDiagram loop Note over Transformer: Read a batch of events Note over Transformer: Transform events to JSON Note over Transformer: Write data to the GCS bucket Transformer->>Loader: Notify the loader (via Pub/Sub) Loader->>Snowflake: Send SQL commands for loading Note over Snowflake: Load the data from the GCS bucket using “COPY FROM” end ``` **Azure:** At a high level, RDB Loader reads batches of enriched Snowplow events, converts them to the format supported by Snowflake, stores them in an Azure Blob Storage bucket and instructs Snowflake to load them.
```mermaid flowchart LR stream[["Enriched Events (Kafka stream)"]] loader{{"RDB Loader (Transformer and Loader apps)"}} bucket[("Transformed Events (Azure Blob Storage bucket)")] subgraph "Snowflake" table[("Events table")] end stream-->loader-->bucket--->Snowflake ``` RDB Loader consists of two applications: Transformer and Loader. The following diagram illustrates the interaction between them and Snowflake. ```mermaid sequenceDiagram loop Note over Transformer: Read a batch of events Note over Transformer: Transform events to JSON Note over Transformer: Write data to the Azure Blob Storage bucket Transformer->>Loader: Notify the loader (via Kafka) Loader->>Snowflake: Send SQL commands for loading Note over Snowflake: Load the data from the Azure Blob Storage bucket using “COPY FROM” end ``` *** > **Tip:** For more information on how events are stored in the warehouse, check the [mapping between Snowplow schemas and the corresponding warehouse column types](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/). To run RDB Loader, you will need to run one instance of the Transformer and one instance of the Loader. ## How to pick a transformer The transformer app currently comes in two flavours: a Spark job that processes data in batches, and a long-running streaming app. The process of transforming the data is not dependent on the storage target. Which one is best for your use case depends on three factors: - the cloud provider you want to use (AWS, GCP, or Azure) - your expected data volume - how much importance you place on deduplicating the data before loading it into the data warehouse. ### Based on cloud provider If you want to run the transformer on AWS, you can use the Spark transformer (`snowplow-transformer-batch`) or Transformer Kinesis (`snowplow-transformer-kinesis`). If you want to run the transformer on GCP, you can use Transformer Pubsub (`snowplow-transformer-pubsub`).
If you want to run the transformer on Azure, you can use Transformer Kafka (`snowplow-transformer-kafka`). ### Based on expected data volume The Spark transformer (`snowplow-transformer-batch`) is the best choice for big volumes, as the work can be split across multiple workers. However, the need to run it on EMR creates some overhead that is not justified for low-volume pipelines. The stream transformer (`snowplow-transformer-kinesis`, `snowplow-transformer-pubsub` and `snowplow-transformer-kafka`) is a much leaner alternative, and is suggested for use with low volumes that can be comfortably processed on a single node. However, multiple stream transformers can be run in parallel, so it is possible to process large data volumes with the stream transformer too. To make the best choice, consider: - What is the underlying infrastructure? For example, a single-node stream transformer will perform differently based on the resources it is given by the machine it runs on. - What is the frequency for processing data? For example, even in a low-volume pipeline, if you only run the transform job once a day, the accumulated data might be enough to justify the use of Spark. ### Based on the importance of deduplication The transformer is also in charge of [deduplicating](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/deduplication/) the data. Currently, only the Spark transformer can do that. If duplicates are not a concern, or if you are happy to deal with them after the data has been loaded into the warehouse, then pick a transformer based on your expected volume (see above). Otherwise, use the Spark transformer. ## How to pick a loader based on the destination There are different loader applications depending on the storage target. Currently, RDB Loader supports Redshift (AWS only), Snowflake and Databricks. For loading into **Redshift** (including Redshift serverless), use the `snowplow-rdb-loader-redshift` artifact.
For loading into **Snowflake**, use the `snowplow-rdb-loader-snowflake` artifact. For loading into **Databricks**, use the `snowplow-rdb-loader-databricks` artifact. ## How `transformer` and `loader` interface with other Snowplow components and each other The applications communicate through messages. The transformer consumes enriched TSV-formatted Snowplow events from S3 (AWS) or a stream (AWS, GCP and Azure). It writes its output to blob storage (S3, GCS or Azure Blob Storage). Once it's finished processing a batch of data, it issues a message with details about the run. The loader consumes a stream of these messages and uses them to determine what data needs to be loaded. It issues the necessary SQL commands to the storage target. --- # Load into Databricks using the RDB Loader > Load wide row Parquet data into Databricks with automatic schema creation and Delta Lake optimization. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/loading-transformed-data/databricks-loader/ To set up the Databricks loader, the following resources need to be created: - [Databricks cluster](https://docs.databricks.com/clusters/create-cluster.html) - [Databricks access token](https://docs.databricks.com/dev-tools/api/latest/authentication.html) The `events` table and the database schema will be created automatically by the loader. You can [configure](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/rdb-loader-configuration-reference/) the name of the database schema with the `storage.schema` config field. The table name (`events`) can’t be changed. Keep in mind that the Databricks Loader database user needs to have permissions to create schemas on the given database to be able to perform this operation. Check [this page](https://docs.databricks.com/sql/language-manual/security-grant.html) for more information about granting privileges in Databricks. You can also create the schema manually if you prefer.
## Downloading the artifact The asset is published as a jar file attached to the [GitHub release notes](https://github.com/snowplow/snowplow-rdb-loader/releases) for each version. It's also available as a Docker image on Docker Hub under `snowplow/rdb-loader-databricks:6.3.0`. ## Configuring `rdb-loader-databricks` The loader takes two configuration files: - a `config.hocon` file with application settings - an `iglu_resolver.json` file with the resolver configuration for your [Iglu](https://github.com/snowplow/iglu) schema registry. An example of the minimal required config for the Databricks loader can be found [here](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/aws/databricks.config.minimal.hocon) and a more detailed one [here](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/aws/databricks.config.reference.hocon). For details about each setting, see the [configuration reference](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/rdb-loader-configuration-reference/). See [here](/docs/api-reference/iglu/iglu-resolver/) for details on how to prepare the Iglu resolver file. > **Tip:** All self-describing schemas for events processed by RDB Loader **must** be hosted on [Iglu Server](/docs/api-reference/iglu/iglu-repositories/iglu-server/) version 0.6.0 or above. [Iglu Central](/docs/api-reference/iglu/iglu-repositories/iglu-central/) is a registry containing Snowplow-authored schemas. If you want to use them alongside your own, you will need to add it to your resolver file. Keep in mind that it could override your own private schemas if you give it higher priority. ## Running the Databricks loader The two config files need to be passed in as base64-encoded strings: ```bash $ docker run snowplow/rdb-loader-databricks:6.3.0 \ --iglu-config $RESOLVER_BASE64 \ --config $CONFIG_BASE64 ``` **Telemetry notice** By default, Snowplow collects telemetry data for Databricks Loader (since version 5.0.0).
Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1). If you wish to help us further, you can optionally provide your email (or just a UUID) in the `telemetry.userProvidedId` configuration setting. If you wish to disable telemetry, you can do so by setting `telemetry.disable` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information. --- # Loading transformed data into warehouses > Load transformed Snowplow data into Redshift, Snowflake, or Databricks with automated table management and SQL COPY operations. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/loading-transformed-data/ _For a high-level overview of the RDB Loader architecture, of which the loader is a part, see [RDB Loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/)._ The loader applications are specialised to a specific storage target. Each one performs three key tasks: - Consume messages from SQS / SNS / Pubsub / Kafka to discover information about transformed data: where it is stored and what it looks like. - Use the information from the message to determine if any changes to the target table(s) are required, e.g. to add a column for a new event field. If required, submit the appropriate SQL statement for execution by the storage target. - Prepare and submit for execution the appropriate SQL `COPY` statement. For loading into **Redshift**, use the [Redshift loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/loading-transformed-data/redshift-loader/).
This loads [shredded data](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/#shredded-data) into multiple Redshift tables. For loading into **Snowflake**, use the [Snowflake loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/loading-transformed-data/snowflake-loader/). This loads [wide row JSON format data](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/#wide-row-format) into a single Snowflake table. For loading into **Databricks**, use the [Databricks loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/loading-transformed-data/databricks-loader/). This loads [wide row Parquet format data](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/#wide-row-format) into a single Databricks table. > **Note:** AWS is fully supported for both Snowflake and Databricks. GCP is supported for Snowflake (since 5.0.0). Azure is supported for Snowflake (since 5.7.0). --- # Monitoring RDB Loader > Monitor RDB Loader with folder checks, health checks, StatsD metrics, Sentry alerts, and Snowplow tracking for warehouse loading. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/loading-transformed-data/monitoring/ The loader app has several types of monitoring built in to help the pipeline operator: folder monitoring, warehouse health checks, StatsD metrics, Sentry alerts, and Snowplow tracking. For all monitoring configuration options, see the [configuration reference](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/rdb-loader-configuration-reference/). ## Webhook alerts The loader can send `POST` requests via HTTP webhook to a configurable URL whenever there is an issue which needs investigation by the pipeline operator. 
The webhook payload conforms to the [`alert`](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.monitoring.batch/alert/jsonschema/1-0-0) schema on Iglu Central. You can configure where the webhook is sent by setting the `monitoring.webhook` section in the `config.hocon` file. The webhook monitoring can be used for folder monitoring and warehouse health checks. ### Folder monitoring A webhook alert is sent whenever the loader identifies inconsistencies between the transformed output in S3 and the data in the warehouse. The algorithm is as follows: - Check if all folders on S3 have a `shredding_complete.json` file (the legacy name is kept for backwards compatibility, but this applies to wide row format data as well). A missing file suggests the transformer failed to complete writing the transformed data, and so manual intervention is required to remove the folder from S3 and rerun. - Check if all folders on S3 created within a specific time range are listed in the warehouse manifest table. This table is maintained by the loader and contains information about loads. If a folder is missing from the manifest table, it suggests the loader has previously tried and failed to load it. Manual intervention is required to resend the `shredding_complete.json` message via SQS / SNS to trigger reloading of the folder. Folder monitoring is configured by setting the `monitoring.folders` section in the `config.hocon` file. ### Warehouse health check The loader can send an alert if the warehouse does not respond to a periodic `SELECT 1` statement. For each failed health check, a `POST` request is sent via the webhook. The health check is configured by setting the `monitoring.healthCheck` section in the `config.hocon` file. ## StatsD and stdout [StatsD](https://github.com/statsd/statsd) is a daemon that aggregates and summarizes application metrics. 
It receives metrics sent by the application over UDP, and then periodically flushes the aggregated metrics to a [pluggable storage backend](https://github.com/statsd/statsd/blob/master/docs/backend.md). The loader can emit metrics to a StatsD daemon describing every batch it processes. Here is a string representation of the metrics it sends: ```text snowplow.rdbloader.count_good:42|c|#tag1:value1 snowplow.rdbloader.count_bad:2|c|#tag1:value1 snowplow.rdbloader.latency_collector_to_load_min:123.4|g|#tag1:value1 snowplow.rdbloader.latency_collector_to_load_max:234.5|g|#tag1:value1 snowplow.rdbloader.latency_transformer_start_to_load:66.6|g|#tag1:value1 snowplow.rdbloader.latency_transformer_end_to_load:44.4|g|#tag1:value1 ``` These are the meanings of the individual metrics: - `count_good`: the total number of good events in the batch that was loaded - `count_bad`: the total number of bad events in the batch that was loaded (available since version 5.4.0) - `latency_collector_to_load_min`: for the most recent event in the batch, this is the time difference between reaching the collector and getting loaded to the warehouse - `latency_collector_to_load_max`: for the oldest event in the batch, this is the time difference between reaching the collector and getting loaded to the warehouse - `latency_transformer_start_to_load`: time difference between the transformer starting on this batch and the loader completing loading to the warehouse - `latency_transformer_end_to_load`: time difference between the transformer completing this batch and the loader completing loading it into the warehouse. StatsD monitoring is configured by setting the `monitoring.metrics.statsd` section in the `config.hocon` file. You can expose these metrics in `stdout` for easier debugging by setting the `monitoring.metrics.stdout` section in the `config.hocon` file.
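To make the wire format concrete, here is a small debugging sketch (plain shell, not part of the loader) that splits a StatsD line of the form `name:value|type|#tags` into its parts, using one of the sample lines above:

```shell
# Split a StatsD line of the form name:value|type|#tags using
# shell parameter expansion (sample line taken from the docs above).
line='snowplow.rdbloader.count_good:42|c|#tag1:value1'

name=${line%%:*}                     # metric name (before the first ':')
rest=${line#*:}                      # 42|c|#tag1:value1
value=${rest%%|*}                    # metric value
mtype=${rest#*|}; mtype=${mtype%%|*} # c = counter, g = gauge
tags=${line##*#}                     # tag list after '#'

echo "$name=$value type=$mtype tags=$tags"
# → snowplow.rdbloader.count_good=42 type=c tags=tag1:value1
```

The same splitting works for the gauge metrics, whose values are fractional (e.g. `123.4`).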
## Sentry [Sentry](https://docs.sentry.io/) is a popular error monitoring service, which helps developers diagnose and fix problems in an application. The loader and transformer can both send an error report to Sentry whenever something unexpected happens. The reasons for the error can then be explored in the Sentry server’s UI. Common reasons might be lost connection to the database, or an HTTP error fetching a schema from an Iglu server. Sentry monitoring is configured by setting the `monitoring.sentry.dsn` setting in the `config.hocon` file. ## Snowplow tracking The loader can emit a Snowplow event to a collector when the application crashes with an unexpected error. The event conforms to the [`load_failed`](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.monitoring.batch/load_failed/jsonschema/1-0-0) schema on Iglu Central. Snowplow tracking is configured by setting the `monitoring.snowplow` section in the `config.hocon` file. --- # Load into Redshift using the RDB Loader > Load shredded Snowplow events into Amazon Redshift with automatic schema creation and table management. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/loading-transformed-data/redshift-loader/ The `events` table and the database schema will be created automatically by the loader. You can [configure](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/rdb-loader-configuration-reference/) the name of the database schema with the `storage.schema` config field. The table name (`events`) can’t be changed. Keep in mind that the Redshift Loader database user needs to have permissions to create schemas on the given database to be able to perform this operation. Check [this page](https://docs.aws.amazon.com/redshift/latest/dg/r_GRANT.html) for more information about granting privileges in Redshift. You can also create the schema manually if you prefer.
## Downloading the artifact The asset is published as a jar file attached to the [GitHub release notes](https://github.com/snowplow/snowplow-rdb-loader/releases) for each version. It's also available as a Docker image on Docker Hub under `snowplow/rdb-loader-redshift:6.3.0`. ## Configuring `rdb-loader-redshift` The loader takes two configuration files: - a `config.hocon` file with application settings - an `iglu_resolver.json` file with the resolver configuration for your [Iglu](https://github.com/snowplow/iglu) schema registry. An example of the minimal required config for the Redshift loader can be found [here](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/aws/redshift.config.minimal.hocon) and a more detailed one [here](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/aws/redshift.config.reference.hocon). For details about each setting, see the [configuration reference](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/rdb-loader-configuration-reference/). See [here](/docs/api-reference/iglu/iglu-resolver/) for details on how to prepare the Iglu resolver file. > **Tip:** All self-describing schemas for events processed by RDB Loader **must** be hosted on [Iglu Server](/docs/api-reference/iglu/iglu-repositories/iglu-server/) 0.6.0 or above. [Iglu Central](/docs/api-reference/iglu/iglu-repositories/iglu-central/) is a registry containing Snowplow-authored schemas. If you want to use them alongside your own, you will need to add it to your resolver file. Keep in mind that it could override your own private schemas if you give it higher priority. ## Running the Redshift loader The two config files need to be passed in as base64-encoded strings: ```bash $ docker run snowplow/rdb-loader-redshift:6.3.0 \ --iglu-config $RESOLVER_BASE64 \ --config $CONFIG_BASE64 ``` **Telemetry notice** By default, Snowplow collects telemetry data for Redshift Loader (since version 5.0.0).
Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1). If you wish to help us further, you can optionally provide your email (or just a UUID) in the `telemetry.userProvidedId` configuration setting. If you wish to disable telemetry, you can do so by setting `telemetry.disable` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information. --- # Load into Snowflake using the RDB Loader > Load wide row JSON data into Snowflake on AWS, GCP, or Azure with TempCreds or NoCreds authentication methods. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/loading-transformed-data/snowflake-loader/ It is possible to run Snowflake Loader on AWS, GCP and Azure. ### Setting up Snowflake You can use the steps outlined in our [quick start guide](/docs/get-started/self-hosted/quick-start/?warehouse=snowflake#prepare-the-destination) to create most of the necessary Snowflake resources. There are two different authentication methods with Snowflake Loader: - With the `TempCreds` method, there are no additional Snowflake resources needed. - With the `NoCreds` method, the Loader needs a Snowflake stage. This choice is controlled by the `loadAuthMethod` [configuration setting](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/rdb-loader-configuration-reference/#snowflake-loader-storage-section). > **Note:** For GCP pipelines, only the `NoCreds` method is available. **Using the NoCreds method** First, create a Snowflake stage. 
For that, you will need a Snowflake database, Snowflake schema, Snowflake storage integration, Snowflake file format, and the path to the transformed events bucket (in S3, GCS or Azure Blob Storage). You can follow [this tutorial](https://docs.snowflake.com/en/user-guide/data-load-s3-config-storage-integration.html) to create the storage integration. Assuming you created the other required resources for it, you can create the Snowflake stage by following [this document](https://docs.snowflake.com/en/sql-reference/sql/create-stage.html). Finally, use the `transformedStage` [configuration setting](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/rdb-loader-configuration-reference/#snowflake-loader-storage-section) to point the loader to your stage. ### Running the loader There are dedicated Terraform modules for deploying Snowflake Loader on [AWS](https://registry.terraform.io/modules/snowplow-devops/snowflake-loader-ec2/aws/latest) and [Azure](https://github.com/snowplow-devops/terraform-azurerm-snowflake-loader-vmss). You can see how they are used in our full pipeline deployment examples [here](/docs/get-started/self-hosted/quick-start/). We don't have a Terraform module for deploying Snowflake Loader on GCP yet. Therefore, it needs to be deployed manually at the moment. ### Downloading the artifact The asset is published as a jar file attached to the [GitHub release notes](https://github.com/snowplow/snowplow-rdb-loader/releases) for each version. It's also available as a Docker image on Docker Hub under `snowplow/rdb-loader-snowflake:6.3.0`. ### Configuring `rdb-loader-snowflake` The loader takes two configuration files: - a `config.hocon` file with application settings - an `iglu_resolver.json` file with the resolver configuration for your [Iglu](https://github.com/snowplow/iglu) schema registry.
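The loader run commands in these docs pass both files as base64-encoded strings. A minimal sketch of preparing those values as environment variables, assuming the two files are in the current directory and a GNU coreutils `base64` (the `-w0` flag, which disables line wrapping, is GNU-specific; on macOS use `base64 -i <file>` instead):

```shell
# Encode the loader configuration files into the environment variables
# used by the docker run examples. -w0 keeps each value on a single line.
RESOLVER_BASE64=$(base64 -w0 iglu_resolver.json)
CONFIG_BASE64=$(base64 -w0 config.hocon)

# Sanity check: decoding must reproduce the original file byte for byte.
echo "$RESOLVER_BASE64" | base64 -d | diff - iglu_resolver.json
```

With these variables set, the `docker run` invocations shown for each loader can be used as written.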
| Minimal Configuration | Extended Configuration |
| --- | --- |
| [aws/snowflake.config.minimal.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/aws/snowflake.config.minimal.hocon) | [aws/snowflake.config.reference.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/aws/snowflake.config.reference.hocon) |
| [gcp/snowflake.config.minimal.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/gcp/snowflake.config.minimal.hocon) | [gcp/snowflake.config.reference.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/gcp/snowflake.config.reference.hocon) |
| [azure/snowflake.config.minimal.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/azure/snowflake.config.minimal.hocon) | [azure/snowflake.config.reference.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/azure/snowflake.config.reference.hocon) |

For details about each setting, see the [configuration reference](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/rdb-loader-configuration-reference/). See [here](/docs/api-reference/iglu/iglu-resolver/) for details on how to prepare the Iglu resolver file.

> **Tip:** All self-describing schemas for events processed by RDB Loader **must** be hosted on [Iglu Server](/docs/api-reference/iglu/iglu-repositories/iglu-server/) 0.6.0 or above.

[Iglu Central](/docs/api-reference/iglu/iglu-repositories/iglu-central/) is a registry containing Snowplow-authored schemas. If you want to use them alongside your own, you will need to add it to your resolver file.
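For illustration, a sketch of an `iglu_resolver.json` that registers a private Iglu Server alongside Iglu Central. The private registry name, URI, and API key are hypothetical placeholders; in the standard resolver configuration, repositories with lower `priority` values are preferred:

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "My Private Iglu Server",
        "priority": 0,
        "vendorPrefixes": ["com.acme"],
        "connection": {
          "http": {
            "uri": "https://iglu.acme.example/api",
            "apikey": "REPLACE_ME"
          }
        }
      },
      {
        "name": "Iglu Central",
        "priority": 1,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      }
    ]
  }
}
```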
Keep in mind that it could override your own private schemas if you give it higher priority.

### Running the Snowflake loader

The two config files need to be passed in as base64-encoded strings:

```bash
# Encode the files as single-line base64 (GNU coreutils; on macOS use `base64 -i`)
RESOLVER_BASE64=$(base64 -w0 iglu_resolver.json)
CONFIG_BASE64=$(base64 -w0 config.hocon)

docker run snowplow/rdb-loader-snowflake:6.3.0 \
  --iglu-config $RESOLVER_BASE64 \
  --config $CONFIG_BASE64
```

**Telemetry notice**

By default, Snowplow collects telemetry data for Snowflake Loader (since version 5.0.0). This data is anonymous and minimal; see the telemetry notice above, or our [telemetry principles](/docs/get-started/self-hosted/telemetry/), for what is collected and how to opt out via `telemetry.disable`.

---

# RDB Loader configuration reference

> Configure RDB Loader for Redshift, Snowflake, and Databricks with storage, messaging, scheduling, and monitoring settings.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/rdb-loader-configuration-reference/

The configuration reference in this page is written for RDB Loader 5.0.0 or higher. The configuration reference pages for previous versions can be found [here](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/rdb-loader-configuration-reference/rdb-loader-previous-versions/).
| Minimal Configuration | Extended Configuration |
| --- | --- |
| [aws/redshift.config.minimal.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/aws/redshift.config.minimal.hocon) | [aws/redshift.config.reference.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/aws/redshift.config.reference.hocon) |
| [aws/snowflake.config.minimal.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/aws/snowflake.config.minimal.hocon) | [aws/snowflake.config.reference.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/aws/snowflake.config.reference.hocon) |
| [aws/databricks.config.minimal.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/aws/databricks.config.minimal.hocon) | [aws/databricks.config.reference.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/aws/databricks.config.reference.hocon) |
| [gcp/snowflake.config.minimal.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/gcp/snowflake.config.minimal.hocon) | [gcp/snowflake.config.reference.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/gcp/snowflake.config.reference.hocon) |
| [azure/snowflake.config.minimal.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/azure/snowflake.config.minimal.hocon) | [azure/snowflake.config.reference.hocon](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/loader/azure/snowflake.config.reference.hocon) |

All applications use a common module for core functionality, so only the `storage` sections are different in
their config.

## License

Since version 6.0.0, RDB Loader is released under the [Snowplow Limited Use License](/limited-use-license-1.0/) ([FAQ](/docs/licensing/limited-use-license-faq/)). To accept the terms of the license and run RDB Loader, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable. Alternatively, you can configure the `license.accept` option, like this:

```hcl
license {
  accept = true
}
```

## Password rotation

To rotate the password for your RDB Loader, you'll need to contact [Snowplow Support](https://support.snowplow.io/), and share the intended password using the secure Credential sharing form in Console. To avoid disruption to your pipeline, we'll coordinate a time with you to update the credentials.

## Redshift Loader `storage` section

| Parameter | Description | | --- | --- | | `type` | Optional. The only valid value is the default: `redshift`. | | `host` | Required. Host name of Redshift cluster. | | `port` | Required. Port of Redshift cluster. | | `database` | Required. Redshift database which the data will be loaded to. | | `roleArn` | Required if `NoCreds` is chosen as the load auth method. AWS Role ARN allowing Redshift to load data from S3. | | `schema` | Required. Redshift schema name, eg `atomic`. | | `username` | Required.
DB user with permissions to load data. | | `jdbcAuth.*` (since 6.3.0) | Required. JDBC authentication configuration. Supports two modes: `password` and `iam`. | | `jdbcAuth.type` (since 6.3.0) | Required. The authentication type. Possible values: `password` and `iam`. | | `jdbcAuth.password` (since 6.3.0) | Required if `jdbcAuth.type` is `password`. Password of the DB user. | | `jdbcAuth.roleArn` (since 6.3.0) | Required if `jdbcAuth.type` is `iam`. IAM role ARN with permissions to call `GetClusterCredentials`. | | `jdbcAuth.roleExternalId` (since 6.3.0) | Required if `jdbcAuth.type` is `iam`. External ID for assuming the IAM role, used to restrict role assumption to trusted principals. | | `jdbcAuth.redshiftRegion` (since 6.3.0) | Required if `jdbcAuth.type` is `iam`. AWS region where the Redshift cluster is located. | | `jdbcAuth.clusterId` (since 6.3.0) | Required if `jdbcAuth.type` is `iam`. Redshift cluster identifier. | | `jdbcAuth.roleSessionName` (since 6.3.0) | Optional. Session name for the assumed IAM role. | | `jdbcAuth.credentialsTtl` (since 6.3.0) | Optional. TTL for temporary database credentials when using IAM authentication. Must be between 15 minutes and one hour. | | `maxError` | Optional. Configures the [Redshift MAXERROR load option](https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-load.html#copy-maxerror). The default is 10. | | `loadAuthMethod.*` (since 5.2.0) | Optional, default method is `NoCreds`. Specifies the auth method to use with the `COPY` statement. | | `loadAuthMethod.type` | Required if `loadAuthMethod` section is included. Specifies the type of the authentication method. The possible values are `NoCreds` and `TempCreds`. With `NoCreds`, no credentials will be passed to the `COPY` statement. Instead, Redshift cluster needs to be configured with an AWS Role ARN that allows it to load data from S3. This Role ARN needs to be passed in the `roleArn` setting above. 
You can find more information [here](https://docs.aws.amazon.com/redshift/latest/dg/copy-usage_notes-access-permissions.html). With `TempCreds`, temporary credentials will be created for every load operation and these temporary credentials will be passed to the `COPY` statement. | | `loadAuthMethod.roleArn` | Required if `loadAuthMethod.type` is `TempCreds`. IAM role that is used while creating temporary credentials. This role should allow access to the S3 bucket the transformer will write data to, with the following permissions: `s3:GetObject*`, `s3:ListBucket`, and `s3:GetBucketLocation`. | | `loadAuthMethod.credentialsTtl` (since 5.4.0) | Optional, default value `1 hour`. If `TempCreds` load auth method is used, this value will be used as a session duration of temporary credentials used for loading data and folder monitoring. In that case, it can't be greater than 1 hour and can't be less than 15 minutes. | | `jdbc.*` | Optional. Custom JDBC configuration. The default value is `{"ssl": true}`. | | `jdbc.BlockingRowsMode` | Optional. Refer to the [Redshift JDBC driver reference](https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.54.1082/Amazon+Redshift+JDBC+Connector+Install+Guide.pdf). | | `jdbc.DisableIsValidQuery` | Optional. Refer to the [Redshift JDBC driver reference](https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.54.1082/Amazon+Redshift+JDBC+Connector+Install+Guide.pdf). | | `jdbc.DSILogLevel` | Optional. Refer to the [Redshift JDBC driver reference](https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.54.1082/Amazon+Redshift+JDBC+Connector+Install+Guide.pdf). | | `jdbc.FilterLevel` | Optional. Refer to the [Redshift JDBC driver reference](https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.54.1082/Amazon+Redshift+JDBC+Connector+Install+Guide.pdf). | | `jdbc.loginTimeout` | Optional.
Refer to the [Redshift JDBC driver reference](https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.54.1082/Amazon+Redshift+JDBC+Connector+Install+Guide.pdf). | | `jdbc.loglevel` | Optional. Refer to the [Redshift JDBC driver reference](https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.54.1082/Amazon+Redshift+JDBC+Connector+Install+Guide.pdf). | | `jdbc.socketTimeout` | Optional. Refer to the [Redshift JDBC driver reference](https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.54.1082/Amazon+Redshift+JDBC+Connector+Install+Guide.pdf). | | `jdbc.ssl` | Optional. Refer to the [Redshift JDBC driver reference](https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.54.1082/Amazon+Redshift+JDBC+Connector+Install+Guide.pdf). | | `jdbc.sslMode` | Optional. Refer to the [Redshift JDBC driver reference](https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.54.1082/Amazon+Redshift+JDBC+Connector+Install+Guide.pdf). | | `jdbc.sslRootCert` | Optional. Refer to the [Redshift JDBC driver reference](https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.54.1082/Amazon+Redshift+JDBC+Connector+Install+Guide.pdf). | | `jdbc.tcpKeepAlive` | Optional. Refer to the [Redshift JDBC driver reference](https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.54.1082/Amazon+Redshift+JDBC+Connector+Install+Guide.pdf). | | `jdbc.TCPKeepAliveMinutes` | Optional. Refer to the [Redshift JDBC driver reference](https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.54.1082/Amazon+Redshift+JDBC+Connector+Install+Guide.pdf). 
|

## Snowflake Loader `storage` section

| Parameter | Description | | --- | --- | | `type` | Optional. The only valid value is the default: `snowflake`. | | `snowflakeRegion` | Required. AWS Region used by Snowflake to access its endpoint. | | `username` | Required. Snowflake user with necessary role granted to load data. | | `role` | Optional. Snowflake role with permission to load data. If it is not provided, the default role in Snowflake will be used. | | `password` | Required. Password of the Snowflake user. Can be plain text, read from the EC2 parameter store or GCP secret manager (see below). | | `password.secretStore.parameterName` | Alternative way for passing in the user password. | | `account` | Required. Target Snowflake account. | | `warehouse` | Required. Snowflake warehouse which the SQL statements submitted by Snowflake Loader will run on. | | `database` | Required. Snowflake database which the data will be loaded to. | | `schema` | Required. Target schema. | | `transformedStage.*` | Required if `NoCreds` is chosen as load auth method. Snowflake stage for transformed events. | | `transformedStage.name` | Required if `transformedStage` is included. The name of the stage. | | `transformedStage.location` | Required if `transformedStage` is included. The S3 path used as stage location.
(Not needed since 5.2.0 because it is auto-configured) | | `folderMonitoringStage.*` | Required if `monitoring.folders` section is configured and `NoCreds` is chosen as load auth method. Snowflake stage to load folder monitoring entries into a temporary Snowflake table. | | `folderMonitoringStage.name` | Required if `folderMonitoringStage` is included. The name of the stage. | | `folderMonitoringStage.location` | Required if `folderMonitoringStage` is included. The S3 path used as stage location. (Not needed since 5.2.0 because it is auto-configured) | | `appName` | Optional. Name passed as the `application` property while creating the Snowflake connection. The default is `Snowplow_OSS`. | | `maxError` | Optional. A table copy statement will skip an input file when the number of errors in it exceeds the specified number. This setting is used during initial loading and thus can filter out only invalid JSONs (which is an impossible situation when used with the Transformer). | | `jdbcHost` | Optional. Host for the JDBC driver that has priority over automatically derived hosts. If it is not given, the host will be derived automatically from the given `snowflakeRegion`. | | `loadAuthMethod.*` | Optional, default method is `NoCreds`. Specifies the auth method to use with the `COPY INTO` statement. Note that the `TempCreds` auth method doesn't work when data is loaded from GCS. | | `loadAuthMethod.type` | Required if `loadAuthMethod` section is included. Specifies the type of the auth method. The possible values are `NoCreds` and `TempCreds`. With `NoCreds`, no credentials will be passed to the `COPY INTO` statement. Instead, `transformedStage` and `folderMonitoringStage` specified above will be used. More information can be found [here](https://docs.snowflake.com/en/user-guide/data-load-s3-config-storage-integration.html). With `TempCreds`, temporary credentials will be created for every load operation and these temporary credentials will be passed to the `COPY INTO` statement.
| | `loadAuthMethod.roleArn` | Required if `loadAuthMethod.type` is `TempCreds`. IAM role that is used while creating temporary credentials. This role should allow access to the S3 bucket the transformer will write data to. You can find the list of permissions that need to be granted to the role [here](https://docs.snowflake.com/en/user-guide/data-load-s3-config-aws-iam-user.html). | | `loadAuthMethod.credentialsTtl` (since 5.4.0) | Optional, default value `1 hour`. If `TempCreds` load auth method is used, this value will be used as a session duration of temporary credentials used for loading data and folder monitoring. In that case, it can't be greater than 1 hour and can't be less than 15 minutes. | | `readyCheck` (since 5.4.0) | Optional. Either `ResumeWarehouse` (the default) or `Select1`. The command the loader runs to prepare the JDBC connection. |

## Databricks Loader `storage` section

| Parameter | Description | | --- | --- | | `type` | Optional. The only valid value is the default: `databricks`. | | `host` | Required. Hostname of Databricks cluster. | | `password` | Required if `oauth` section is not defined.
[Databricks access token](https://docs.databricks.com/dev-tools/api/latest/authentication.html). Can be plain text, read from the EC2 parameter store or GCP secret manager (see below). | | `oauth.clientId` | Required if `password` is not defined. Client ID used in the [Databricks OAuth machine-to-machine](https://docs.databricks.com/en/integrations/jdbc/authentication.html#oauth-machine-to-machine-m2m-authentication) authentication flow. | | `oauth.clientSecret` | Required if `password` is not defined. Client secret used in the [Databricks OAuth machine-to-machine](https://docs.databricks.com/en/integrations/jdbc/authentication.html#oauth-machine-to-machine-m2m-authentication) authentication flow. | | `eventsOptimizePeriod` | Optional. The default value is `2 days`. Optimize period per table, which will be used as a predicate for the `OPTIMIZE` command. | | `password.secretManager.parameterName` | Alternative way for passing in the access token. | | `schema` | Required. Target schema. | | `port` | Required. Port of Databricks cluster. | | `httpPath` | Required. HTTP path of the Databricks cluster. Get it from the JDBC connection details after the cluster has been created. | | `catalog` | Optional. The default value is `hive_metastore`. [Databricks unity catalog name](https://docs.databricks.com/data-governance/unity-catalog/index.html). | | `userAgent` | Optional. The default value is `snowplow-rdbloader-oss`. User agent name for Databricks connection. | | `loadAuthMethod.*` | Optional, default method is `NoCreds`. Specifies the auth method to use with the `COPY INTO` statement. | | `loadAuthMethod.type` | Required if `loadAuthMethod` section is included. Specifies the type of the auth method. The possible values are `NoCreds` and `TempCreds`. With `NoCreds`, no credentials will be passed to the `COPY INTO` statement. The Databricks cluster needs to have permission to access the transformer output S3 bucket.
More information can be found [here](https://docs.databricks.com/administration-guide/cloud-configurations/aws/instance-profiles.html). With `TempCreds`, temporary credentials will be created for every load operation and these temporary credentials will be passed to the `COPY INTO` statement. This way, the Databricks cluster doesn't need permission to access the transformer output S3 bucket; that access is provided by the temporary credentials. | | `loadAuthMethod.roleArn` | Required if `loadAuthMethod.type` is `TempCreds`. IAM role that is used while creating temporary credentials. This role should allow access to the S3 bucket the transformer will write data to, with the following permissions: `s3:GetObject*`, `s3:ListBucket`, and `s3:GetBucketLocation`. | | `loadAuthMethod.credentialsTtl` (since 5.4.0) | Optional, default value `1 hour`. If `TempCreds` load auth method is used, this value will be used as a session duration of temporary credentials used for loading data and folder monitoring. In that case, it can't be greater than 1 hour and can't be less than 15 minutes. | | `logLevel` (since 5.3.2) | Optional. The default value is 3. Specifies the JDBC driver log level. 0 - Disable all logging. 1 - Log severe error events that lead the driver to abort. 2 - Log error events that might allow the driver to continue running. 3 (default) - Log events that might result in an error if action is not taken. 4 - Log general information that describes the progress of the driver. 5 - Log detailed information that is useful for debugging the driver. 6 - Log all driver activity.
|

## AWS specific settings

| Parameter | Description |
| --- | --- |
| `region` | Optional if it can be resolved with [AWS region provider chain](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/regions/providers/DefaultAwsRegionProviderChain.html). AWS region of the S3 bucket. |
| `messageQueue.type` | Required. Type of the message queue. It should be `sqs` when application is run on AWS. |
| `messageQueue.queueName` | Required. The name of the SQS queue used by the transformer and loader to communicate. |
| `jsonpaths` | Optional. An S3 URI that holds JSONPath files. |

## GCP specific settings

Only Snowflake Loader can be run on GCP at the moment.

| Parameter | Description |
| --- | --- |
| `messageQueue.type` | Type of the message queue. It should be `pubsub` when application is run on GCP. |
| `messageQueue.subscription` | Required. The name of the Pubsub subscription used by the transformer and loader to communicate. |

## Azure specific settings

Only Snowflake Loader can be run on Azure at the moment.

| Parameter | Description |
| --- | --- |
| `blobStorageEndpoint` | Endpoint of Azure Blob Storage container that contains transformer's output. |
| `azureVaultName` | Name of the Azure Key Vault where application secrets are stored. Required if secret store is used in `storage.password` field. |
| `messageQueue.type` | Type of the message queue. It should be `kafka` when application is run on Azure.
| | `messageQueue.topicName` | Name of the Kafka topic used to communicate with Transformer. | | `messageQueue.bootstrapServers` | A list of host:port pairs to use for establishing the initial connection to the Kafka cluster. | | `messageQueue.consumerConf` | Optional. Kafka consumer configuration. See the Kafka consumer configuration documentation for all properties. |

## Common loader settings

| Parameter | Description | | --- | --- | | `schedules.*` | Optional. Tasks scheduled to run periodically. | | `schedules.noOperation.[*]` | Optional. Array of objects which specifies no-operation windows during which periodically scheduled tasks (configured in this section) will not run. | | `schedules.noOperation.[*].name` | Human-readable name of the no-op window. | | `schedules.noOperation.[*].when` | Cron expression with second granularity. | | `schedules.noOperation.[*].duration` | For how long the loader should be paused. | | `schedules.optimizeEvents` | Optional. The default value is `"0 0 0 ? * *"` (i.e. every day at 00:00, JVM timezone). Cron expression with second granularity that specifies the schedule for periodically running an `OPTIMIZE` statement on the event table. (Only for Databricks Loader) | | `schedules.optimizeManifest` | Optional. The default value is `"0 0 5 ? * *"` (i.e. every day at 05:00 AM, JVM timezone). Cron expression with second granularity that specifies the schedule for periodically running an `OPTIMIZE` statement on the manifest table. (Only for Databricks Loader) | | `retryQueue.*` | Optional. Additional backlog of recently failed folders that could be automatically retried.
Retry queue saves a failed folder and then re-reads the info from `shredding_complete` S3 file. (Despite the legacy name of the message, which is required for backward compatibility, this also works with wide row format data.) | | `retryQueue.period` | Required if `retryQueue` section is configured. How often batch of failed folders should be pulled into a discovery queue. | | `retryQueue.size` | Required if `retryQueue` section is configured. How many failures should be kept in memory. After the limit is reached new failures are dropped. | | `retryQueue.maxAttempts` | Required if `retryQueue` section is configured. How many attempts to make for each folder. After the limit is reached new failures are dropped. | | `retryQueue.interval` | Required if `retryQueue` section is configured. Artificial pause after each failed folder being added to the queue. | | `retries.*` | Optional. Unlike `retryQueue` these retries happen immediately, without proceeding to another message. | | `retries.backoff` | Required if `retries` section is configured. Starting backoff period, eg '30 seconds'. | | `retries.strategy` | Backoff strategy used during retry. The possible values are `JITTER`, `CONSTANT`, `EXPONENTIAL`, `FIBONACCI`. | | `retries.attempts` | Optional. How many attempts to make before sending the message into retry queue. If missing, `cumulativeBound` will be used. | | `retries.cumulativeBound` | Optional. When backoff reaches this delay, eg '1 hour', the loader will stop retrying. If both this and `attempts` are not set, the loader will retry indefinitely. | | `timeouts.loading` | Optional. How long, eg '1 hour', `COPY` statement execution can take before considering Redshift unhealthy. If no progress (ie, moving to a different subfolder) within this period, the loader will abort the transaction. | | `timeouts.nonLoading` | Optional. How long, eg '10 mins', non-loading steps such as `ALTER TABLE` can take before considering Redshift unhealthy. 
| | `timeouts.sqsVisibility` | Optional. The time window in which a message must be acknowledged. Otherwise it is considered abandoned. If a message has been pulled, but hasn't been acked, the time before it is again available to consumers is equal to this, eg '5 mins'. Another consequence is that if the loader has failed on processing a message, the next time it will get this (or anything) from the queue has this delay. | | `readyCheck.*` | Optional. Check the target destination to make sure it is ready. | | `readyCheck.backoff` | Optional. The default value is `15 seconds`. Starting backoff period. | | `readyCheck.strategy` | Optional. The default value is `CONSTANT`. Backoff strategy used during retry. The possible values are `JITTER`, `CONSTANT`, `EXPONENTIAL`, `FIBONACCI`. | | `initRetries.*` | Optional. Retries configuration for initialization block. It will retry on all exceptions from there. | | `initRetries.backoff` | Required if `initRetries` section is configured. Starting backoff period, eg '30 seconds'. | | `initRetries.strategy` | Backoff strategy used during retry. The possible values are `JITTER`, `CONSTANT`, `EXPONENTIAL`, `FIBONACCI`. | | `initRetries.attempts` | Optional. How many attempts to make before sending the message into retry queue. If missing, `cumulativeBound` will be used. | | `initRetries.cumulativeBound` | Optional. When backoff reaches this delay, eg '1 hour', the loader will stop retrying. If both this and `attempts` are not set, the loader will retry indefinitely. | | `telemetry.disable` | Optional. Set to `true` to disable [telemetry](/docs/get-started/self-hosted/telemetry/). | | `telemetry.userProvidedId` | Optional. See [here](/docs/get-started/self-hosted/telemetry/#how-can-i-help) for more information. 
|

## Common monitoring settings

| Parameter | Description | | --- | --- | | `monitoring.webhook.endpoint` | Optional. An HTTP endpoint where monitoring alerts should be sent. | | `monitoring.webhook.tags` | Optional. Custom key-value pairs which can be added to the monitoring webhooks. Eg, `{"tag1": "label1"}`. | | `monitoring.snowplow.appId` | Optional. When using Snowplow tracking, set this `appId` in the event. | | `monitoring.snowplow.collector` | Optional. Set to a collector URL to turn on Snowplow tracking. | | `monitoring.sentry.dsn` | Optional. For tracking runtime exceptions. | | `monitoring.metrics.*` | Send metrics to a StatsD server or stdout. | | `monitoring.metrics.period` | Optional. The default is 5 minutes. Period for metrics emitted periodically. | | `monitoring.metrics.statsd.*` | Optional. For sending loading metrics (latency and event counts) to a StatsD server. | | `monitoring.metrics.statsd.hostname` | Required if `monitoring.metrics.statsd` section is configured. The host name of the StatsD server. | | `monitoring.metrics.statsd.port` | Required if `monitoring.metrics.statsd` section is configured. Port of the StatsD server. | | `monitoring.metrics.statsd.tags` | Optional. Tags which are used to annotate the StatsD metric with any contextual information. | | `monitoring.metrics.statsd.prefix` | Optional. Configures the prefix of StatsD metric names. The default is `snowplow.rdbloader`. | | `monitoring.metrics.stdout.*` | Optional. For sending metrics to stdout. | | `monitoring.metrics.stdout.prefix` | Optional. Overrides the default metric prefix. | | `monitoring.folders.*` | Optional. Configuration for periodic unloaded / corrupted folders checks.
| | `monitoring.folders.staging` | Required if `monitoring.folders` section is configured. Path where loader could store auxiliary logs for folder monitoring. Loader should be able to write here, storage target should be able to load from here. | | `monitoring.folders.period` | Required if `monitoring.folders` section is configured. How often to check for unloaded / corrupted folders. | | `monitoring.folders.since` | Optional. Specifies from when folder monitoring will start to monitor. Note that this is a duration, eg `7 days`, relative to when the loader is launched. | | `monitoring.folders.until` | Optional. Specifies until when folder monitoring will monitor. Note that this is a duration, eg `7 days`, relative to when the loader is launched. | | `monitoring.folders.transformerOutput` | Required if `monitoring.folders` section is configured. Path to transformed archive. | | `monitoring.folders.failBeforeAlarm` | Required if `monitoring.folders` section is configured. How many times the check can fail before generating an alarm. Within the specified tolerance, failures will log a `WARNING` instead. | | `monitoring.healthCheck.*` | Optional. Periodic DB health check, raising a warning if DB hasn't responded to `SELECT 1`. | | `monitoring.healthCheck.frequency` | Required if `monitoring.healthCheck` section is configured. How often to run a periodic DB health check. | | `monitoring.healthCheck.timeout` | Required if `monitoring.healthCheck` section is configured. How long to wait for a health check response. |

---

# Event deduplication with Spark transformer

> Deduplicate Snowplow events in-batch and cross-batch using natural and synthetic deduplication strategies with DynamoDB storage.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/deduplication/ **NOTE:** Deduplication is currently only available in the [Spark transformer](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/spark-transformer/). Duplicates are a common problem in event pipelines. At the root of it is the fact that we can't guarantee every event has a unique UUID because: - We have no exactly-once delivery guarantees - User-side software can send events more than once - Robots can send events reusing the same event ID Depending on your use case, you may choose to ignore duplicates, or deal with them once the events are in the data warehouse. If you are loading into **Redshift** (using [shredded data](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/#shredded-data)), we strongly recommend deduplicating the data upstream of loading. Once duplicates are loaded into separate tables, table joins would create a Cartesian product. This is less of a concern with [wide row format](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/#wide-row-format) loading into **Snowflake**, where it's easier to deduplicate during the data modeling step in the warehouse. This table shows the available deduplication mechanisms: | Strategy | Batch? | Same event ID? | Same event fingerprint? 
| Availability | | ----------------------------------- | ----------- | -------------- | ----------------------- | ----------------- | | In-batch natural deduplication | In-batch | Yes | Yes | Spark transformer | | In-batch synthetic deduplication | In-batch | Yes | No | Spark transformer | | Cross-batch natural deduplication | Cross-batch | Yes | Yes | Spark transformer | | Cross-batch synthetic deduplication | Cross-batch | Yes | No | Not supported | ## In-batch natural deduplication "Natural duplicates" are events which share the same event ID (`event_id`) and the same event payload (`event_fingerprint`), meaning that they are semantically identical to each other. For a given batch of events being processed, RDB Transformer keeps only the first out of each group of natural duplicates and discards all others. To enable this functionality, you need to have the [Event Fingerprint Enrichment](/docs/pipeline/enrichments/available-enrichments/event-fingerprint-enrichment/) enabled in Enrich. This will correctly populate the `event_fingerprint` property. No changes are required in the transformer's own `config.hocon` file. If the fingerprint enrichment is not enabled, the transformer will assign a random UUID to each event, effectively marking all events as non-duplicates (in the 'natural' sense). ## In-batch synthetic deduplication "Synthetic duplicates" are events which share the same event ID (`event_id`), but have a different event payload (`event_fingerprint`), meaning that they can be either semantically independent events or the same events with slightly different payloads (caused by third-party software). 
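To make the two in-batch strategies concrete, here is a minimal Python sketch. This is not the transformer's actual implementation: the dictionary field names and the `duplicate_context` key are illustrative stand-ins for the real event fields and the attached `duplicate` context.

```python
import uuid
from collections import defaultdict

def dedupe_in_batch(events):
    """Sketch of in-batch deduplication over a list of event dicts."""
    # Natural deduplication: keep only the first event for each
    # (event_id, event_fingerprint) pair and discard the rest.
    seen = set()
    natural = []
    for ev in events:
        key = (ev["event_id"], ev["event_fingerprint"])
        if key not in seen:
            seen.add(key)
            natural.append(ev)

    # Synthetic deduplication: any group that still shares an event_id
    # gets a fresh random event_id per member, plus a record of the
    # original id (standing in for the "duplicate" context that the
    # real transformer attaches).
    by_id = defaultdict(list)
    for ev in natural:
        by_id[ev["event_id"]].append(ev)

    out = []
    for eid, group in by_id.items():
        if len(group) == 1:
            out.extend(group)
        else:
            for ev in group:
                out.append({**ev,
                            "event_id": str(uuid.uuid4()),
                            "duplicate_context": {"originalEventId": eid}})
    return out
```

In this sketch, two events sharing both an id and a fingerprint collapse to one, while two events sharing only an id both survive with fresh ids and a pointer back to the original id.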
For a given batch of events being processed, RDB Transformer uses the following strategy: - Collect all the events with identical `event_id` which are left after natural deduplication - Generate new random `event_id` for each of them - Create a [`duplicate`](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/duplicate/jsonschema/1-0-0) context with the original `event_id` for each event where the duplicated `event_id` was found. There is no transformer configuration required for this functionality: deduplication is performed automatically. It is optional but highly recommended to use the [Event Fingerprint Enrichment](/docs/pipeline/enrichments/available-enrichments/event-fingerprint-enrichment/) in Enrich in order to correctly populate the `event_fingerprint` property. ## Cross-batch natural deduplication The strategies described above deal with duplicates within the same batch of data being processed. But what if events are duplicated across batches? To apply any of these strategies, we need to store information about previously seen duplicates, so that we can compare events in the current batch against them. We don't need to store the whole event: just the `event_id` and the `event_fingerprint` fields. We need to store these in a database that allows fast random access, so we chose DynamoDB, a fully managed NoSQL database service. ### How to enable cross-batch natural deduplication To enable cross-batch natural deduplication, you must provide a third configuration option in the `RDB Transformer` step of the Dataflow Runner playbook, using the `--duplicate-storage-config` flag. Like the other options, this needs to be provided as a base64-encoded string. This config file contains information about the DynamoDB table to be used, as well as credentials for accessing it. 
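As an illustration, such a config file can look like the following sketch. The field names follow the events-manifest DynamoDB config schema as documented in that library; all values here are placeholders, so consult the linked documentation below for the authoritative structure.

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow.storage/amazon_dynamodb_config/jsonschema/2-0-0",
  "data": {
    "name": "dedupe-manifest",
    "auth": {
      "accessKeyId": "PLACEHOLDER-ACCESS-KEY-ID",
      "secretAccessKey": "PLACEHOLDER-SECRET-ACCESS-KEY"
    },
    "awsRegion": "us-east-1",
    "dynamodbTable": "snowplow-deduplication",
    "id": "56799a26-980c-4148-8bd9-c021b988c669",
    "purpose": "EVENTS_MANIFEST"
  }
}
```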
For more details on the config file structure, refer to the [Snowplow Events Manifest](https://github.com/snowplow-incubator/snowplow-events-manifest) library and its documentation. An example step definition can look like this: ```json { "type": "CUSTOM_JAR", "name": "RDB Transformer", "actionOnFailure": "CANCEL_AND_WAIT", "jar": "command-runner.jar", "arguments": [ "spark-submit", "--class", "com.snowplowanalytics.snowplow.rdbloader.shredder.batch.Main", "--master", "yarn", "--deploy-mode", "cluster", "s3://snowplow-hosted-assets-eu-central-1/4-storage/transformer-batch/snowplow-transformer-batch-4.1.0.jar", "--iglu-config", "{{base64File "/home/snowplow/configs/snowplow/iglu_resolver.json"}}", "--config", "{{base64File "/home/snowplow/configs/snowplow/config.hocon"}}", "--duplicate-storage-config", "{{base64File "/home/snowplow/configs/snowplow/duplicate-storage-config.json"}}" ] } ``` If this configuration option is not provided, cross-batch natural deduplication will be disabled. In-batch deduplication will still work however. ### Costs and performance implications Cross-batch deduplication uses DynamoDB as transient storage and therefore has associated AWS costs. The default write capacity is 100 units, which should roughly cost USD50 per month. Note that at this rate your shred job can get throttled by insufficient capacity, even with a very powerful EMR cluster. You can tweak throughput to match your needs but that will inflate the bill. ### How RDB Transformer uses DynamoDB for deduplication We store duplicate data in a DynamoDB table with the following attributes: - `eventId`, a String - `fingerprint`, a String - `etlTime`, a Date - `ttl`, a Date. We can query this table to see if the event that is currently being processed has been seen before based on `event_id` and `event_fingerprint`. We store the `etl_timestamp` to prevent issues in case of a failed transformer run. 
If a run fails and is then rerun, we don't want the rerun to consider rows in the DynamoDB table which were written as part of the failed run. Otherwise all events that were processed by the failed run will be rejected as duplicates. To update the DynamoDB table, RDB Transformer uses so-called 'conditional updates' to perform a check-and-set operation on a per-event basis. The algorithm is as follows: - Attempt to write the `(event_id, event_fingerprint, etl_timestamp)` triple to DynamoDB but succeed only if the `(event_id, event_fingerprint)` pair cannot be found in the table with an earlier `etl_timestamp` than the current one. - If the write fails, we have a natural duplicate. We can safely drop it because we know that we have the 'original' of this event already safely in the data warehouse. - If the write succeeds, we know we have an event which is not a natural duplicate. (It could still be a synthetic duplicate however.) The transformer performs this check after grouping the batch by `event_id` and `event_fingerprint`. This ensures that all check-and-set requests for a specific `(event_id, event_fingerprint)` pair will come from a single mapper, avoiding race conditions. To keep the DynamoDB table size in check, we're using the time-to-live feature which provides automatic cleanup after the specified time. For event manifests this time is the ETL timestamp plus 180 days. This is stored in the table's `ttl` attribute. ### Creating the DynamoDB table and IAM policy If you provide a `duplicate-storage-config` that specifies a DynamoDB table but RDB Transformer can't find it upon launch, it will create it with the default provisioned throughput. That might not be enough for the amount of data you want to process. Creating the table upfront gives you the opportunity to spec it out according to your needs. This step is optional but recommended. 1. The table name can be anything, but it must be unique. 2. 
The partition key must be called `eventId` and have type String. The sort key must be called `fingerprint` and have type String. You can refer to the [DynamoDB table definition](#how-rdb-transformer-uses-dynamodb-for-deduplication) above for the full table schema. 3. Uncheck the "Use default settings" checkbox and set "Write capacity units" to 100 (or your desired value). 4. After the table is created, note down its ARN in the "Overview" tab. 5. Create the IAM policy. In the AWS console, navigate to IAM and go to "Policies". Select "Create Your Own Policy" and choose a descriptive name. Here's an example Policy Document that you can paste: ```json { "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1486765706000", "Effect": "Allow", "Action": [ "dynamodb:CreateTable", "dynamodb:DeleteTable", "dynamodb:DescribeTable", "dynamodb:PutItem" ], "Resource": [ "arn:aws:dynamodb:us-east-1:{{AWS_ACCOUNT_ID}}:table/snowplow-deduplication" ] } ] } ``` Replace the element in the `Resource` array with the ARN that you noted down in step 4. If you've already created the table, the policy does not require the `dynamodb:CreateTable` and `dynamodb:DeleteTable` permissions. --- # How the RDB Loader transforms enriched data for warehouses > Transform Snowplow enriched events into shredded data for Redshift or wide row format for Snowflake and Databricks using Spark or stream transformers. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/ _For a high-level overview of the RDB Loader architecture, of which the transformer is a part, see [RDB Loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/)._ The `transformer` application can have two types of output: - 'shredded' data - wide row format. Both the Spark transformer and the stream transformer can output both types. Which type to pick depends on the intended storage target. 
For loading into **Redshift**, use [shredded data](#shredded-data). For loading into **Snowflake**, use [wide row format](#wide-row-format). For loading into **Databricks**, use [wide row format](#wide-row-format). ## Shredded data Shredding is the process of splitting a Snowplow enriched event into several smaller chunks, which can be inserted directly into **Redshift** tables. A Snowplow enriched event is a 131-column tsv line. Each line contains all fields that constitute a specific event, including its id, timestamps, custom and derived contexts, etc. After shredding, the following entities are split out from the original event: 1. **Atomic event.** A tsv line very similar to the enriched event but not containing JSON fields (`contexts`, `derived_contexts` and `unstruct_event`). The results are stored under a path similar to `shredded/good/run=2016-11-26-21-48-42/atomic-events/part-00000` and are available to load with RDB Loader or directly with Redshift `COPY`. 2. **Contexts.** Two JSON fields -- `contexts` and `derived_contexts` -- are extracted from the enriched event. Their original values are validated self-describing JSONs, consisting of a `schema` and a `data` property. After shredding, a third property is added, called `hierarchy`. This `hierarchy` contains fields you can use to later join your context SQL tables with the `atomic.events` table. One atomic event can be associated with multiple context entities. The results are stored under a path like `shredded/good/run=2016-11-26-21-48-42/shredded-types/vendor=com.acme/name=my_context/format=jsonschema/version=1-0-1/part-00000`, where the `part-*` files are valid ndJSON files which can be loaded with RDB Loader or directly with Redshift `COPY`. 3. **Self-describing (unstructured) events.** Same as the contexts described above but there is a strict one-to-one relation with atomic events. 
The results are stored under a path with the same structure as for contexts and are ready to be loaded with RDB Loader or directly with Redshift `COPY`. These files are stored on S3 partitioned by type. When the data is loaded into Redshift, each type goes to a dedicated table. The following diagram illustrates the process: ![](/assets/images/storage-loader-dataflow-96341b5e426da988ea3bc5c07a4949d7.png) **NOTE:** Shredded data can currently only be loaded into **Redshift**. ## Wide row format Unlike shredding, wide row format preserves data as a single line per event, with one column for each different type of contexts and self-describing events. For contexts (aka entities), the type of the column is `ARRAY` and the name looks like `contexts_com_acme_my_custom_entity_schema_1`. Note the plural that matches the `ARRAY` type: in theory each `contexts_*` column may contain multiple entities. For self-describing events, the type of the column is `OBJECT` and the name looks like `unstruct_event_com_acme_my_custom_event_schema_1`. Each line in the table contains only one event. The values in these columns have a recursive structure with arbitrary depth, which depends on the schema that describes them. The results are stored under a path like `output=good/part-00000` and can be loaded with RDB Loader. There are two options for the output file format with wide row transformation: JSON and Parquet. The JSON file format can be used for loading into **Snowflake**, and the Parquet file format can be used for loading into **Databricks**. --- # Spark transformer configuration reference > Configure Spark batch transformer with EMR settings, S3 paths, output formats, and deduplication options for warehouse loading. 
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/spark-transformer/configuration-reference/ The configuration reference in this page is written for Spark Transformer `6.3.0` An example of the minimal required config for the Spark transformer can be found [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/aws/transformer.batch.config.minimal.hocon) and a more detailed one [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/aws/transformer.batch.config.reference.hocon). ## License Since version 6.0.0, RDB Loader is released under the [Snowplow Limited Use License](/limited-use-license-1.0/) ([FAQ](/docs/licensing/limited-use-license-faq/)). To accept the terms of license and run RDB Loader, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable. Alternatively, you can configure the `license.accept` option, like this: ```hcl license { accept = true } ``` | Parameter | Description | | --------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `input` | Required. S3 URI of the enriched archive. It must be populated separately with `run=YYYY-MM-DD-hh-mm-ss` directories. | | `runInterval.*` | Specifies interval to process. | | `runInterval.sinceAge` | Optional. A duration that specifies the maximum age of folders that should get processed. If `sinceAge` and `sinceTimestamp` are both specified, then the latest value of the two determines the earliest folder that will be processed. | | `runInterval.until` | Optional. 
Process until this timestamp. | | `monitoring.sentry.dsn` | Optional. For tracking runtime exceptions. | | `monitoring.metrics.cloudwatch` (since 5.5.0) | Optional. For sending metrics to Cloudwatch. If not set, metrics are not sent. | | `monitoring.metrics.cloudwatch.namespace` (since 5.5.0) | Namespace that will contain the metrics in Cloudwatch. Example: `snowplow/transformer_batch` | | `monitoring.metrics.cloudwatch.transformDuration` (since 5.5.0) | Name of the metric that contains the number of milliseconds needed to transform a folder. Example: `transform_duration` | | `monitoring.metrics.cloudwatch.dimensions` (since 5.5.0) | Any key-value pairs to be added as dimensions in Cloudwatch metrics. Example:```json {"app_version": "x.y.z", "env": "prod"} ``` | | `deduplication.*` | Configure the way in-batch deduplication is performed | | `deduplication.synthetic.type` | Optional. The default is `BROADCAST`. Can be `NONE` (disable), `BROADCAST` (default) and `JOIN` (different low-level implementations). | | `deduplication.synthetic.cardinality` | Optional. The default is 1. Do not deduplicate pairs with less-or-equal cardinality. | | `deduplication.natural` | Optional. The default is 'true'. Enable or disable natural deduplication. Available since `5.1.0` | | `featureFlags.enableMaxRecordsPerFile` (since 5.4.0) | Optional, default = false. When enabled, `output.maxRecordsPerFile` configuration parameter is going to be used. | | `skipSchemas` (since 5.7.1) | Optional, default = none. Supply a list of Iglu URIs and the transformer's output files will omit any columns using that schema. This feature could be helpful when recovering from edge-case schemas which for some reason cannot be loaded to the table. | | `output.path` | Required. S3 URI of the transformed output. | | `output.compression` | Optional. One of `NONE` or `GZIP`. The default is `GZIP`. | | `output.region` | AWS region of the S3 bucket. 
Optional if it can be resolved with [AWS region provider chain](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/regions/providers/DefaultAwsRegionProviderChain.html). | | `output.maxRecordsPerFile` (since 5.4.0) | Optional. Default = 10000. Max number of events per parquet partition. | | `output.bad.type` (since 5.4.0) | Optional. Either `kinesis` or `file`, default value `file`. Type of bad output sink. When `file`, failed events are written as files under URI configured in `output.path`. | | `output.bad.streamName` (since 5.4.0) | Required if output type is `kinesis`. Name of the Kinesis stream to write to. | | `output.bad.region` (since 5.4.0) | AWS region of the Kinesis stream. Optional if it can be resolved with [AWS region provider chain](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/regions/providers/DefaultAwsRegionProviderChain.html). | | `output.bad.recordLimit` (since 5.4.0) | Optional, default = 500. Limits the number of events in a single PutRecords Kinesis request. | | `output.bad.byteLimit` (since 5.4.0) | Optional, default = 5242880. Limits the number of bytes in a single PutRecords Kinesis request. | | `output.bad.backoffPolicy.minBackoff` (since 5.4.0) | Optional, default = 100 milliseconds. Minimum backoff before retrying when writing to Kinesis fails with internal errors. | | `output.bad.backoffPolicy.maxBackoff` (since 5.4.0) | Optional, default = 10 seconds. Maximum backoff before retrying when writing to Kinesis fails with internal errors. | | `output.bad.backoffPolicy.maxRetries` (since 5.4.0) | Optional, default = 10. Maximum number of retries for internal Kinesis errors. | | `output.bad.throttledBackoffPolicy.minBackoff` (since 5.4.0) | Optional, default = 100 milliseconds. Minimum backoff before retrying when writing to Kinesis fails in case of throughput exceeded. | | `output.bad.throttledBackoffPolicy.maxBackoff` (since 5.4.0) | Optional, default = 10 seconds. 
Maximum backoff before retrying when writing to Kinesis fails in case of throughput exceeded. Writing is retried forever. | | `queue.type` | Required. Type of the message queue. Can be either `sqs` or `sns`. | | `queue.queueName` | Required if queue type is `sqs`. Name of the SQS queue. SQS queue needs to be FIFO. | | `queue.topicArn` | Required if queue type is `sns`. ARN of the SNS topic. | | `queue.region` | AWS region of the SQS queue or SNS topic. Optional if it can be resolved with [AWS region provider chain](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/regions/providers/DefaultAwsRegionProviderChain.html). | | `formats.*` | Schema-specific format settings. | | `formats.transformationType` | Required. Type of transformation, either `shred` or `widerow`. See [Shredded data](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/#shredded-data) and [Wide row format](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/#wide-row-format). | | `formats.fileFormat` | Optional. The default is `JSON`. Output file format produced when transformation is `widerow`. Either `JSON` or `PARQUET`. | | `formats.default` | Optional. The default is `TSV`. Data format produced by default when transformation is `shred`. Either `TSV` or `JSON`. `TSV` is recommended as it enables table autocreation, but requires an Iglu Server to be available with known schemas (including Snowplow schemas). `JSON` does not require an Iglu Server, but requires Redshift JSONPaths to be configured and does not support table autocreation. | | `formats.tsv` | Optional. List of Iglu URIs, but can be set to empty list `[]` which is the default. If `default` is set to `JSON` this list of schemas will still be shredded into `TSV`. | | `formats.json` | Optional. List of Iglu URIs, but can be set to empty list `[]` which is the default. 
If `default` is set to `TSV` this list of schemas will still be shredded into `JSON`. | | `formats.skip` | Optional. List of Iglu URIs, but can be set to empty list `[]` which is the default. Schemas for which loading can be skipped. | | `validations.*` | Optional. Criteria to validate events against | | `validations.minimumTimestamp` | This is currently the only validation criterion. It checks that all timestamps in the event are older than a specific point in time, eg `2021-11-18T11:00:00.00Z`. | | `featureFlags.*` | Optional. Enable features that are still in beta, or which aim to enable smoother upgrades. | | `featureFlags.legacyMessageFormat` | This is currently the only feature flag. Setting this to `true` allows you to use a new version of the transformer with an older version of the loader. | | `featureFlags.truncateAtomicFields` (since 5.4.0) | Optional, default `false`. When enabled, the event's atomic fields are truncated (based on the length limits from the atomic JSON schema) before transformation. | --- # Spark transformer for batch processing > Transform enriched Snowplow data in batches using Spark on EMR with support for deduplication and wide row or shredded output. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/spark-transformer/ > **Info:** For a high-level overview of the Transform process, see [Transforming enriched data](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/). For guidance on picking the right `transformer` app, see [How to pick a transformer](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/#how-to-pick-a-transformer). The Spark-based transformer is a batch job designed to be deployed in an EMR cluster and process a bounded data set stored on S3. 
In order to run it, you will need: - the `snowplow-transformer-batch` jar file (from version 3.0.0 this replaces the `snowplow-rdb-shredder` asset) - configuration files for the jar file - an EMR cluster specification - a way to spin up an EMR cluster and submit a job to it. You can use any suitable tool to periodically submit the transformer job to an EMR cluster. We recommend you use our purpose-built [Dataflow Runner](https://github.com/snowplow/dataflow-runner) tool. All the examples below assume that Dataflow Runner is being used. Refer to the app's [documentation](/docs/api-reference/dataflow-runner/) for more details. ## Downloading the artifact The asset is published as a jar file attached to the [Github release notes](https://github.com/snowplow/snowplow-rdb-loader/releases) for each version. It's also available in several S3 buckets that are accessible to an EMR cluster: ```text s3://snowplow-hosted-assets/4-storage/transformer-batch/snowplow-transformer-batch-6.3.0.jar -- or -- s3://snowplow-hosted-assets-{{ region }}/4-storage/transformer-batch/snowplow-transformer-batch-6.3.0.jar ``` where `region` is one of `us-east-1`, `us-east-2`, `us-west-1`, `us-west-2`, `eu-central-1`, `eu-west-2`, `ca-central-1`, `sa-east-1`, `ap-southeast-1`, `ap-southeast-2`, `ap-northeast-1`, `ap-northeast-2`, or `ap-south-1`. Pick the region of your EMR cluster. ## Configuring the EMR cluster > **Warning:** Starting from version `5.5.0`, the batch transformer requires Java 11 on EMR ([default is Java 8](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/configuring-java8.html)). See the `bootstrapActionConfigs` section in the configuration below. 
Here's an example of an EMR cluster config file that can be used with Dataflow Runner: ```json { "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0", "data": { "name": "RDB Transformer", "region": "eu-central-1", "logUri": "s3://com-acme/logs/", "credentials": { "accessKeyId": "env", "secretAccessKey": "env" }, "roles": { "jobflow": "EMR_EC2_DefaultRole", "service": "EMR_DefaultRole" }, "ec2": { "amiVersion": "6.2.0", "keyName": "ec2-key-name", "location": { "vpc": { "subnetId": "subnet-id" } }, "instances": { "master": { "type": "m4.large", "ebsConfiguration": { "ebsOptimized": true, "ebsBlockDeviceConfigs": [] } }, "core": { "type": "r4.xlarge", "count": 1 }, "task": { "type": "m4.large", "count": 0, "bid": "0.015" } } }, "tags": [], "bootstrapActionConfigs": [ { "name": "Use Java 11", "scriptBootstrapAction": { "path": "s3://<bucket>/<prefix>/emr-bootstrap-java-11.sh", "args": [] } } ], "configurations": [ { "classification": "core-site", "properties": { "io.file.buffer.size": "65536" }, "configurations": [] }, { "classification": "yarn-site", "properties": { "yarn.nodemanager.resource.memory-mb": "57344", "yarn.scheduler.maximum-allocation-mb": "57344", "yarn.nodemanager.vmem-check-enabled": "false" }, "configurations": [] }, { "classification": "spark", "properties": { "maximizeResourceAllocation": "false" }, "configurations": [] }, { "classification": "spark-defaults", "properties": { "spark.executor.memory": "7G", "spark.driver.memory": "7G", "spark.driver.cores": "3", "spark.yarn.driver.memoryOverhead": "1024", "spark.default.parallelism": "24", "spark.executor.cores": "1", "spark.executor.instances": "6", "spark.yarn.executor.memoryOverhead": "1024", "spark.dynamicAllocation.enabled": "false" }, "configurations": [] } ], "applications": [ "Hadoop", "Spark" ] } } ``` You will need to replace `<bucket>` and `<prefix>` (in the `bootstrapActionConfigs` section) according to where you placed `emr-bootstrap-java-11.sh`. 
The content of this file should be as follows: ```bash #!/bin/bash set -e sudo update-alternatives --set java /usr/lib/jvm/java-11-amazon-corretto.x86_64/bin/java exit 0 ``` This is a typical cluster configuration for processing \~1.5GB of uncompressed enriched data. You need to change the following settings with your own values: - `region`: the AWS region of your EMR cluster - `logUri`: the location of an S3 bucket where EMR logs will be written - `ec2.keyName` (optional): The name of an EC2 key pair that you’ll use to ssh into the EMR cluster - `ec2.location.vpc.subnetId`: your VPC subnet ID. ## Configuring `snowplow-transformer-batch` The transformer takes two configuration files: - a `config.hocon` file with application settings - an `iglu_resolver.json` file with the resolver configuration for your [Iglu](https://github.com/snowplow/iglu) schema registry. An example of the minimal required config for the Spark transformer can be found [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/aws/transformer.batch.config.minimal.hocon) and a more detailed one [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/aws/transformer.batch.config.reference.hocon). For details about each setting, see the [configuration reference](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/spark-transformer/configuration-reference/). See [here](/docs/api-reference/iglu/iglu-resolver/) for details on how to prepare the Iglu resolver file. > **Tip:** All self-describing schemas for events processed by the transformer **must** be hosted on [Iglu Server](/docs/api-reference/iglu/iglu-repositories/iglu-server/) 0.6.0 or above. [Iglu Central](/docs/api-reference/iglu/iglu-repositories/iglu-central/) is a registry containing Snowplow-authored schemas. If you want to use them alongside your own, you will need to add it to your resolver file. 
Keep in mind that it could override your own private schemas if you give it higher priority. ## Running the Spark transformer To run the transformer on EMR with Dataflow Runner, you need: - the EMR cluster config (see [Configuring the EMR cluster](#configuring-the-emr-cluster) above) - a Dataflow Runner playbook (a DAG with steps to be submitted to the EMR cluster). ### Preparing the Dataflow Runner playbook A typical playbook can look like: ```json { "schema": "iglu:com.snowplowanalytics.dataflowrunner/PlaybookConfig/avro/1-0-1", "data": { "region": "eu-central-1", "credentials": { "accessKeyId": "env", "secretAccessKey": "env" }, "steps": [ { "type": "CUSTOM_JAR", "name": "S3DistCp enriched data archiving", "actionOnFailure": "CANCEL_AND_WAIT", "jar": "/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar", "arguments": [ "--src", "s3://com-acme/enriched/sink/", "--dest", "s3://com-acme/enriched/archive/run={{nowWithFormat "2006-01-02-15-04-05"}}/", "--s3Endpoint", "s3-eu-central-1.amazonaws.com", "--srcPattern", ".*", "--outputCodec", "gz", "--deleteOnSuccess" ] }, { "type": "CUSTOM_JAR", "name": "RDB Transformer", "actionOnFailure": "CANCEL_AND_WAIT", "jar": "command-runner.jar", "arguments": [ "spark-submit", "--class", "com.snowplowanalytics.snowplow.rdbloader.transformer.batch.Main", "--master", "yarn", "--deploy-mode", "cluster", "s3://snowplow-hosted-assets-eu-central-1/4-storage/transformer-batch/snowplow-transformer-batch-6.3.0.jar", "--iglu-config", "{{base64File "/home/snowplow/configs/snowplow/iglu_resolver.json"}}", "--config", "{{base64File "/home/snowplow/configs/snowplow/config.hocon"}}" ] } ], "tags": [] } } ``` This playbook consists of two steps. The first one copies the enriched data to a dedicated directory, from which the transformer will read it. The second step is the transformer Spark job that transforms the data. 
You need to change the following settings with your own values: - `region`: the AWS region of the EMR cluster - `"--src"`: the bucket in which your enriched data is sunk by Enrich - `"--dest"`: the bucket in which the data for your enriched data lake lives. **NOTE:** The `"--src"` and `"--dest"` settings above apply only to the `s3DistCp` step of the playbook. The source and destination buckets for the transformer step are configured via the `config.hocon` file. ### Submitting the job to EMR with Dataflow Runner Here's an example of putting all of the above together on a transient EMR cluster: ```bash $ ./dataflow-runner run-transient \ --emr-config path/to/cluster.config \ --emr-playbook path/to/playbook ``` This will spin up the cluster with the above configuration, submit the steps from the playbook, and terminate the cluster once all steps are completed. For more examples on running EMR jobs with Dataflow Runner, as well as details on cluster configurations and playbooks, see the app's [documentation](/docs/api-reference/dataflow-runner/). It also details how you can submit steps to a persistent EMR cluster. --- # Stream transformer for real-time processing > Transform enriched Snowplow data in real-time from Kinesis, Pub/Sub, or Kafka streams without Spark or EMR. > Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/ > **Info:** For a high-level overview of the Transform process, see [Transforming enriched data](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/). For guidance on picking the right `transformer` app, see [How to pick a transformer](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/#how-to-pick-a-transformer). 
Unlike the [Spark transformer](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/spark-transformer/), the stream transformer reads data directly from the enriched stream and does not use Spark or EMR. It's a plain JVM application, like Stream Enrich or S3 Loader.

Reading directly from the stream means that the transformer can bypass the `s3DistCp` staging/archiving step. Another benefit is that it doesn't process a bounded data set, so it can emit transformed folders based only on its configured frequency. This means the pipeline loading frequency is limited only by the storage target.

The Stream Transformer has three variants: [Transformer Kinesis](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/transformer-kinesis/), [Transformer Pubsub](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/transformer-pubsub/) and [Transformer Kafka](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/transformer-kafka/), for AWS, GCP and Azure respectively.

---

# Transformer Kafka configuration reference

> Configure Transformer Kafka with stream settings, Azure Blob Storage output, windowing, and monitoring for Azure real-time transformation.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/transformer-kafka/configuration-reference/

The configuration reference on this page is written for Transformer Kafka `6.3.0`.

An example of the minimal required config for the Transformer Kafka can be found [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/azure/transformer.kafka.config.minimal.hocon) and a more detailed one [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/azure/transformer.kafka.config.reference.hocon).

## License

Since version 6.0.0, RDB Loader is released under the [Snowplow Limited Use License](/limited-use-license-1.0/) ([FAQ](/docs/licensing/limited-use-license-faq/)). To accept the terms of the license and run RDB Loader, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable. Alternatively, you can configure the `license.accept` option, like this:

```hcl
license {
  accept = true
}
```

| Parameter | Description |
| --- | --- |
| `input.topicName` | Name of the Kafka topic to read from |
| `input.bootstrapServers` | A list of host:port pairs to use for establishing the initial connection to the Kafka cluster |
| `input.consumerConf` | Optional. Kafka consumer configuration. See the Kafka documentation for all properties |
| `output.path` | Azure Blob Storage path to transformer output |
| `output.compression` | Optional. One of `NONE` or `GZIP`. The default is `GZIP`. |
| `output.bad.type` | Optional. Either `kafka` or `file`, default value `file`. Type of bad output sink. When `file`, failed events are written as files under the URI configured in `output.path`. |
| `output.bad.topicName` | Required if output type is `kafka`. Name of the Kafka topic that will receive the bad data. |
| `output.bad.bootstrapServers` | Required if output type is `kafka`. A list of host:port pairs to use for establishing the initial connection to the Kafka cluster |
| `output.producerConf` | Optional. Kafka producer configuration. See the Kafka documentation for all properties |
| `queue.topicName` | Name of the Kafka topic used to communicate with Loader |
| `queue.bootstrapServers` | A list of host:port pairs to use for establishing the initial connection to the Kafka cluster |
| `queue.producerConf` | Optional. Kafka producer configuration. See the Kafka documentation for all properties |
| `monitoring.metrics.*` | Send metrics to a StatsD server or stdout. |
| `monitoring.metrics.statsd.*` | Optional. For sending metrics (good and bad event counts) to a StatsD server. |
| `monitoring.metrics.statsd.hostname` | Required if the `monitoring.metrics.statsd` section is configured. The host name of the StatsD server. |
| `monitoring.metrics.statsd.port` | Required if the `monitoring.metrics.statsd` section is configured. Port of the StatsD server. |
| `monitoring.metrics.statsd.tags` | Optional. Tags which are used to annotate the StatsD metric with any contextual information. |
| `monitoring.metrics.statsd.prefix` | Optional. Configures the prefix of StatsD metric names. The default is `snowplow.transformer`. |
| `monitoring.metrics.stdout.*` | Optional. For sending metrics to stdout. |
| `monitoring.metrics.stdout.prefix` | Optional. Overrides the default metric prefix. |
| `telemetry.disable` | Optional. Set to `true` to disable [telemetry](/docs/get-started/self-hosted/telemetry/). |
| `telemetry.userProvidedId` | Optional. See [here](/docs/get-started/self-hosted/telemetry/#how-can-i-help) for more information. |
| `monitoring.sentry.dsn` | Optional. For tracking runtime exceptions. |
| `featureFlags.enableMaxRecordsPerFile` (since 5.4.0) | Optional, default = true. When enabled, the `output.maxRecordsPerFile` configuration parameter is used. |
| `validations.*` | Optional. Criteria to validate events against |
| `validations.minimumTimestamp` | This is currently the only validation criterion. It checks that all timestamps in the event are older than a specific point in time, e.g. `2021-11-18T11:00:00.00Z`. |
| `featureFlags.*` | Optional. Enable features that are still in beta, or which aim to enable smoother upgrades. |
| `featureFlags.legacyMessageFormat` | This is currently the only feature flag. Setting this to `true` allows you to use a new version of the transformer with an older version of the loader. |
| `featureFlags.truncateAtomicFields` (since 5.4.0) | Optional, default `false`. When enabled, the event's atomic fields are truncated (based on the length limits from the atomic JSON schema) before transformation. |

---

# Transformer Kafka for Azure streams

> Stream transformer for Azure that reads enriched events from Kafka and writes transformed data to Azure Blob Storage in real-time.

> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/transformer-kafka/

> **Info:** This application is available since v5.7.0.

## Downloading the artifact

The asset is published as a jar file attached to the [GitHub release notes](https://github.com/snowplow/snowplow-rdb-loader/releases) for each version. It's also available as a Docker image on Docker Hub under `snowplow/transformer-kafka:6.3.0`.

## Configuring `snowplow-transformer-kafka`

The transformer takes two configuration files:

- a `config.hocon` file with application settings
- an `iglu_resolver.json` file with the resolver configuration for your [Iglu](https://github.com/snowplow/iglu) schema registry.
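The `docker run` invocations later on these pages pass both files as base64-encoded strings. A minimal sketch of producing such a string (the file written here is a stand-in for illustration, and `-w0` assumes GNU coreutils):

```shell
# Create a stand-in resolver file; substitute your real iglu_resolver.json
# and config.hocon paths instead.
printf '{"example":"resolver"}' > iglu_resolver.json
# -w0 disables line wrapping so the output is a single-line base64 string.
RESOLVER_BASE64=$(base64 -w0 iglu_resolver.json)
echo "$RESOLVER_BASE64"
```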
An example of the minimal required config for the Transformer Kafka can be found [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/azure/transformer.kafka.config.minimal.hocon) and a more detailed one [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/azure/transformer.kafka.config.reference.hocon). For details about each setting, see the [configuration reference](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/transformer-kafka/configuration-reference/).

See [here](/docs/api-reference/iglu/iglu-resolver/) for details on how to prepare the Iglu resolver file.

> **Tip:** All self-describing schemas for events processed by the transformer **must** be hosted on [Iglu Server](/docs/api-reference/iglu/iglu-repositories/iglu-server/) 0.6.0 or above. [Iglu Central](/docs/api-reference/iglu/iglu-repositories/iglu-central/) is a registry containing Snowplow-authored schemas. If you want to use them alongside your own, you will need to add it to your resolver file. Keep in mind that it could override your own private schemas if you give it higher priority.

## Running the Transformer Kafka

The two config files need to be passed in as base64-encoded strings:

```bash
$ docker run snowplow/transformer-kafka:6.3.0 \
  --iglu-config $RESOLVER_BASE64 \
  --config $CONFIG_BASE64
```

**Telemetry notice**

By default, Snowplow collects telemetry data for Transformer Kafka (since version 5.7.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1).
If you wish to help us further, you can optionally provide your email (or just a UUID) in the `telemetry.userProvidedId` configuration setting. If you wish to disable telemetry, you can do so by setting `telemetry.disable` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information.

---

# Transformer Kinesis configuration reference

> Configure Transformer Kinesis with stream settings, S3 output, windowing, and monitoring for AWS real-time transformation.

> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/transformer-kinesis/configuration-reference/

The configuration reference on this page is written for Transformer Kinesis `6.3.0`.

An example of the minimal required config for the Transformer Kinesis can be found [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/aws/transformer.kinesis.config.minimal.hocon) and a more detailed one [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/aws/transformer.kinesis.config.reference.hocon).

## License

Since version 6.0.0, RDB Loader is released under the [Snowplow Limited Use License](/limited-use-license-1.0/) ([FAQ](/docs/licensing/limited-use-license-faq/)). To accept the terms of the license and run RDB Loader, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable.
Alternatively, you can configure the `license.accept` option, like this:

```hcl
license {
  accept = true
}
```

| Parameter | Description |
| --- | --- |
| `input.appName` | Optional. KCL app name. The default is `snowplow-rdb-transformer`. |
| `input.streamName` | Required for `kinesis`. Enriched Kinesis stream name. |
| `input.region` | AWS region of the Kinesis stream. Optional if it can be resolved with the [AWS region provider chain](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/regions/providers/DefaultAwsRegionProviderChain.html). |
| `input.position` | Optional. Kinesis position: `LATEST` or `TRIM_HORIZON`. The default is `LATEST`. |
| `windowing` | Optional. Frequency to emit the shredding complete message. The default is `10 minutes`. The maximum allowed value is `60 minutes`. |
| `output.path` | Required. S3 URI of the transformed output. |
| `output.compression` | Optional. One of `NONE` or `GZIP`. The default is `GZIP`. |
| `output.region` | AWS region of the S3 bucket. Optional if it can be resolved with the [AWS region provider chain](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/regions/providers/DefaultAwsRegionProviderChain.html). |
| `output.maxRecordsPerFile` (since 5.4.0) | Optional. Default = 10000. Max number of events per parquet partition. |
| `output.bad.type` (since 5.4.0) | Optional. Either `kinesis` or `file`, default value `file`. Type of bad output sink. When `file`, failed events are written as files under the URI configured in `output.path`. |
| `output.bad.streamName` (since 5.4.0) | Required if output type is `kinesis`. Name of the Kinesis stream to write to. |
| `output.bad.region` (since 5.4.0) | AWS region of the Kinesis stream. Optional if it can be resolved with the [AWS region provider chain](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/regions/providers/DefaultAwsRegionProviderChain.html). |
| `output.bad.recordLimit` (since 5.4.0) | Optional, default = 500. Limits the number of events in a single PutRecords Kinesis request. |
| `output.bad.byteLimit` (since 5.4.0) | Optional, default = 5242880. Limits the number of bytes in a single PutRecords Kinesis request. |
| `output.bad.backoffPolicy.minBackoff` (since 5.4.0) | Optional, default = 100 milliseconds. Minimum backoff before retrying when writing to Kinesis fails with internal errors. |
| `output.bad.backoffPolicy.maxBackoff` (since 5.4.0) | Optional, default = 10 seconds. Maximum backoff before retrying when writing to Kinesis fails with internal errors. |
| `output.bad.backoffPolicy.maxRetries` (since 5.4.0) | Optional, default = 10. Maximum number of retries for internal Kinesis errors. |
| `output.bad.throttledBackoffPolicy.minBackoff` (since 5.4.0) | Optional, default = 100 milliseconds. Minimum backoff before retrying when writing to Kinesis fails in case of exceeded throughput. |
| `output.bad.throttledBackoffPolicy.maxBackoff` (since 5.4.0) | Optional, default = 10 seconds. Maximum backoff before retrying when writing to Kinesis fails in case of exceeded throughput. Writing is retried forever. |
| `queue.type` | Required. Type of the message queue. Can be either `sqs` or `sns`. |
| `queue.queueName` | Required if queue type is `sqs`. Name of the SQS queue. The SQS queue needs to be FIFO. |
| `queue.topicArn` | Required if queue type is `sns`. ARN of the SNS topic. |
| `queue.region` | AWS region of the SQS queue or SNS topic. Optional if it can be resolved with the [AWS region provider chain](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/regions/providers/DefaultAwsRegionProviderChain.html). |
| `formats.*` | Schema-specific format settings. |
| `formats.transformationType` | Required. Type of transformation, either `shred` or `widerow`. See [Shredded data](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/#shredded-data) and [Wide row format](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/#wide-row-format). |
| `formats.fileFormat` | Optional. The default is `JSON`. Output file format produced when transformation is `widerow`. Either `JSON` or `PARQUET`. |
| `formats.default` | Optional. The default is `TSV`. Data format produced by default when transformation is `shred`. Either `TSV` or `JSON`. `TSV` is recommended as it enables table autocreation, but requires an Iglu Server to be available with known schemas (including Snowplow schemas). `JSON` does not require an Iglu Server, but requires Redshift JSONPaths to be configured and does not support table autocreation. |
| `formats.tsv` | Optional. List of Iglu URIs, but can be set to the empty list `[]`, which is the default. If `default` is set to `JSON`, this list of schemas will still be shredded into `TSV`. |
| `formats.json` | Optional. List of Iglu URIs, but can be set to the empty list `[]`, which is the default. If `default` is set to `TSV`, this list of schemas will still be shredded into `JSON`. |
| `formats.skip` | Optional. List of Iglu URIs, but can be set to the empty list `[]`, which is the default. Schemas for which loading can be skipped. |
| `monitoring.metrics.*` | Send metrics to a StatsD server or stdout. |
| `monitoring.metrics.statsd.*` | Optional. For sending metrics (good and bad event counts) to a StatsD server. |
| `monitoring.metrics.statsd.hostname` | Required if the `monitoring.metrics.statsd` section is configured. The host name of the StatsD server. |
| `monitoring.metrics.statsd.port` | Required if the `monitoring.metrics.statsd` section is configured. Port of the StatsD server. |
| `monitoring.metrics.statsd.tags` | Optional. Tags which are used to annotate the StatsD metric with any contextual information. |
| `monitoring.metrics.statsd.prefix` | Optional. Configures the prefix of StatsD metric names. The default is `snowplow.transformer`. |
| `monitoring.metrics.stdout.*` | Optional. For sending metrics to stdout. |
| `monitoring.metrics.stdout.prefix` | Optional. Overrides the default metric prefix. |
| `telemetry.disable` | Optional. Set to `true` to disable [telemetry](/docs/get-started/self-hosted/telemetry/). |
| `telemetry.userProvidedId` | Optional. See [here](/docs/get-started/self-hosted/telemetry/#how-can-i-help) for more information. |
| `monitoring.sentry.dsn` | Optional. For tracking runtime exceptions. |
| `featureFlags.enableMaxRecordsPerFile` (since 5.4.0) | Optional, default = true. When enabled, the `output.maxRecordsPerFile` configuration parameter is used. |
| `validations.*` | Optional. Criteria to validate events against |
| `validations.minimumTimestamp` | This is currently the only validation criterion. It checks that all timestamps in the event are older than a specific point in time, e.g. `2021-11-18T11:00:00.00Z`. |
| `featureFlags.*` | Optional. Enable features that are still in beta, or which aim to enable smoother upgrades. |
| `featureFlags.legacyMessageFormat` | This is currently the only feature flag. Setting this to `true` allows you to use a new version of the transformer with an older version of the loader. |
| `featureFlags.truncateAtomicFields` (since 5.4.0) | Optional, default `false`. When enabled, the event's atomic fields are truncated (based on the length limits from the atomic JSON schema) before transformation. |

---

# Transformer Kinesis for AWS streams

> Stream transformer for AWS that reads enriched events from Kinesis and writes transformed data to S3 in real-time.

> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/transformer-kinesis/

The asset is published as a jar file attached to the [GitHub release notes](https://github.com/snowplow/snowplow-rdb-loader/releases) for each version. It's also available as a Docker image on Docker Hub under `snowplow/transformer-kinesis:6.3.0`.

## Configuring `snowplow-transformer-kinesis`

The transformer takes two configuration files:

- a `config.hocon` file with application settings
- an `iglu_resolver.json` file with the resolver configuration for your [Iglu](https://github.com/snowplow/iglu) schema registry.

An example of the minimal required config for the Transformer Kinesis can be found [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/aws/transformer.kinesis.config.minimal.hocon) and a more detailed one [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/aws/transformer.kinesis.config.reference.hocon). For details about each setting, see the [configuration reference](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/transformer-kinesis/configuration-reference/).

See [here](/docs/api-reference/iglu/iglu-resolver/) for details on how to prepare the Iglu resolver file.

> **Tip:** All self-describing schemas for events processed by the transformer **must** be hosted on [Iglu Server](/docs/api-reference/iglu/iglu-repositories/iglu-server/) 0.6.0 or above. [Iglu Central](/docs/api-reference/iglu/iglu-repositories/iglu-central/) is a registry containing Snowplow-authored schemas. If you want to use them alongside your own, you will need to add it to your resolver file.
Keep in mind that it could override your own private schemas if you give it higher priority.

## Running the Transformer Kinesis

The two config files need to be passed in as base64-encoded strings:

```bash
$ docker run snowplow/transformer-kinesis:6.3.0 \
  --iglu-config $RESOLVER_BASE64 \
  --config $CONFIG_BASE64
```

**Telemetry notice**

By default, Snowplow collects telemetry data for Transformer Kinesis (since version 4.0.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1).

If you wish to help us further, you can optionally provide your email (or just a UUID) in the `telemetry.userProvidedId` configuration setting. If you wish to disable telemetry, you can do so by setting `telemetry.disable` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information.

---

# Transformer Pub/Sub configuration reference

> Configure Transformer Pub/Sub with stream settings, GCS output, windowing, and monitoring for GCP real-time transformation.

> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/transformer-pubsub/configuration-reference/

The configuration reference on this page is written for Transformer Pubsub `6.3.0`.

An example of the minimal required config for the Transformer Pubsub can be found [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/gcp/transformer.pubsub.config.minimal.hocon) and a more detailed one [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/gcp/transformer.pubsub.config.reference.hocon).
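As an unofficial illustration of what such a minimal file contains (the project, subscription, bucket, and topic names below are placeholders; refer to the linked examples for the authoritative layout), the key settings from the reference table are:

```hcl
{
  # Pub/Sub subscription carrying the enriched events (placeholder names)
  "input": {
    "subscription": "projects/my-project/subscriptions/enriched-sub"
  }
  # GCS destination for transformed output; must use the gs:// scheme
  "output": {
    "path": "gs://my-bucket/transformed/"
  }
  # Pub/Sub topic used to communicate with the Loader (placeholder name)
  "queue": {
    "topic": "projects/my-project/topics/loader-topic"
  }
}
```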
## License

Since version 6.0.0, RDB Loader is released under the [Snowplow Limited Use License](/limited-use-license-1.0/) ([FAQ](/docs/licensing/limited-use-license-faq/)). To accept the terms of the license and run RDB Loader, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable. Alternatively, you can configure the `license.accept` option, like this:

```hcl
license {
  accept = true
}
```

| Parameter | Description |
| --- | --- |
| `input.subscription` | Name of the Pubsub subscription with the enriched events |
| `input.parallelPullCount` | Optional. Default value 1. Number of threads used internally by the permutive library to handle incoming messages. These threads do very little "work" apart from writing the message to a concurrent Queue. |
| `input.bufferSize` | Optional. Default value 500. The max size of the buffer queue used between the fs2-pubsub and java-pubsub libraries. |
| `input.maxAckExtensionPeriod` | Optional. Default value `1 hour`. The maximum period a message ack deadline will be extended. |
| `output.path` | Required. GCS URI of the transformed output. It needs to have the `gs://` URI scheme |
| `output.compression` | Optional. One of `NONE` or `GZIP`. The default is `GZIP`. |
| `output.bufferSize` | Optional. Default value 4096. During the window period, processed items are stored in a buffer. This value determines the size of this buffer. When its limit is reached, the buffer content is flushed to blob storage. |
| `output.maxRecordsPerFile` (since 5.4.0) | Optional. Default = 10000. Max number of events per parquet partition. |
| `output.bad.type` (since 5.4.0) | Optional. Either `pubsub` or `file`, default value `file`. Type of bad output sink. When `file`, failed events are written as files under the URI configured in `output.path`. |
| `output.bad.topic` (since 5.4.0) | Required if output type is `pubsub`. Name of the PubSub topic that will receive the bad data. |
| `output.bad.batchSize` (since 5.4.0) | Optional. Default = 1000, max = 1000. Maximum number of messages sent to PubSub within a batch. When the buffer reaches this number of messages, they are sent. |
| `output.bad.requestByteThreshold` (since 5.4.0) | Optional. Default = 8000000, max = 10MB. Maximum number of bytes sent to PubSub within a batch. When the buffer reaches this size, messages are sent. |
| `output.bad.delayThreshold` (since 5.4.0) | Optional. Default = 200 milliseconds. Delay threshold to use for PubSub batching. After this amount of time has elapsed, before `batchSize` and `requestByteThreshold` have been reached, messages from the buffer will be sent. |
| `queue.topic` | Name of the Pubsub topic used to communicate with Loader |
| `formats.fileFormat` | Optional. The default option at the moment is `JSON`. Either `JSON` or `PARQUET`. |
| `windowing` | Optional. Frequency to emit the shredding complete message. The default is `5 minutes`. Note that transformer-pubsub has a known problem with acking messages when the window period is greater than 10 minutes, so it is advisable to keep the window period at or below 10 minutes. |
| `monitoring.metrics.*` | Send metrics to a StatsD server or stdout. |
| `monitoring.metrics.statsd.*` | Optional. For sending metrics (good and bad event counts) to a StatsD server. |
| `monitoring.metrics.statsd.hostname` | Required if the `monitoring.metrics.statsd` section is configured. The host name of the StatsD server. |
| `monitoring.metrics.statsd.port` | Required if the `monitoring.metrics.statsd` section is configured. Port of the StatsD server. |
| `monitoring.metrics.statsd.tags` | Optional. Tags which are used to annotate the StatsD metric with any contextual information. |
| `monitoring.metrics.statsd.prefix` | Optional. Configures the prefix of StatsD metric names. The default is `snowplow.transformer`. |
| `monitoring.metrics.stdout.*` | Optional. For sending metrics to stdout. |
| `monitoring.metrics.stdout.prefix` | Optional. Overrides the default metric prefix. |
| `telemetry.disable` | Optional. Set to `true` to disable [telemetry](/docs/get-started/self-hosted/telemetry/). |
| `telemetry.userProvidedId` | Optional. See [here](/docs/get-started/self-hosted/telemetry/#how-can-i-help) for more information. |
| `monitoring.sentry.dsn` | Optional. For tracking runtime exceptions. |
| `featureFlags.enableMaxRecordsPerFile` (since 5.4.0) | Optional, default = true. When enabled, the `output.maxRecordsPerFile` configuration parameter is used. |
| `validations.*` | Optional. Criteria to validate events against |
| `validations.minimumTimestamp` | This is currently the only validation criterion. It checks that all timestamps in the event are older than a specific point in time, e.g. `2021-11-18T11:00:00.00Z`. |
| `featureFlags.*` | Optional. Enable features that are still in beta, or which aim to enable smoother upgrades. |
| `featureFlags.legacyMessageFormat` | This is currently the only feature flag. Setting this to `true` allows you to use a new version of the transformer with an older version of the loader. |
| `featureFlags.truncateAtomicFields` (since 5.4.0) | Optional, default `false`. When enabled, the event's atomic fields are truncated (based on the length limits from the atomic JSON schema) before transformation. |

---

# Transformer Pub/Sub for GCP streams

> Stream transformer for GCP that reads enriched events from Pub/Sub and writes transformed data to GCS in real-time.

> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/transformer-pubsub/

> **Info:** This application is available since v5.0.0.
## Downloading the artifact

The asset is published as a jar file attached to the [GitHub release notes](https://github.com/snowplow/snowplow-rdb-loader/releases) for each version. It's also available as a Docker image on Docker Hub under `snowplow/transformer-pubsub:6.3.0`.

## Configuring `snowplow-transformer-pubsub`

The transformer takes two configuration files:

- a `config.hocon` file with application settings
- an `iglu_resolver.json` file with the resolver configuration for your [Iglu](https://github.com/snowplow/iglu) schema registry.

An example of the minimal required config for the Transformer Pubsub can be found [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/gcp/transformer.pubsub.config.minimal.hocon) and a more detailed one [here](https://github.com/snowplow/snowplow-rdb-loader/tree/master/config/transformer/gcp/transformer.pubsub.config.reference.hocon). For details about each setting, see the [configuration reference](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/stream-transformer/transformer-pubsub/configuration-reference/).

See [here](/docs/api-reference/iglu/iglu-resolver/) for details on how to prepare the Iglu resolver file.

> **Tip:** All self-describing schemas for events processed by the transformer **must** be hosted on [Iglu Server](/docs/api-reference/iglu/iglu-repositories/iglu-server/) 0.6.0 or above. [Iglu Central](/docs/api-reference/iglu/iglu-repositories/iglu-central/) is a registry containing Snowplow-authored schemas. If you want to use them alongside your own, you will need to add it to your resolver file. Keep in mind that it could override your own private schemas if you give it higher priority.
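For illustration, a resolver file that registers Iglu Central alongside a private registry might look like the following sketch (the private registry name, URI, and vendor prefix are placeholders; consult the [Iglu resolver](/docs/api-reference/iglu/iglu-resolver/) documentation for the exact `priority` semantics before relying on a particular ordering):

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "My private registry",
        "priority": 0,
        "vendorPrefixes": ["com.acme"],
        "connection": { "http": { "uri": "https://iglu.acme.com/api" } }
      },
      {
        "name": "Iglu Central",
        "priority": 1,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": { "http": { "uri": "http://iglucentral.com" } }
      }
    ]
  }
}
```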
## Running the Transformer Pubsub

The two config files need to be passed in as base64-encoded strings:

```bash
$ docker run snowplow/transformer-pubsub:6.3.0 \
  --iglu-config $RESOLVER_BASE64 \
  --config $CONFIG_BASE64
```

**Telemetry notice**

By default, Snowplow collects telemetry data for Transformer PubSub (since version 5.0.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1).

If you wish to help us further, you can optionally provide your email (or just a UUID) in the `telemetry.userProvidedId` configuration setting. If you wish to disable telemetry, you can do so by setting `telemetry.disable` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information.

---

# RDB Loader 1.0.0 upgrade guide

> Upgrade RDB Loader to 1.0.0 with Stream Shredder, unified output partitioning, and new configuration schema for batch processing.

> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/upgrade-guides/1-0-0-upgrade-guide/

This release adds a new experimental Stream Shredder asset and improves the independent Loader architecture introduced in R35. [Release notes](https://github.com/snowplow/snowplow-rdb-loader/releases/tag/1.0.0).

This is the first release in the 1.x branch, and no breaking changes will be introduced until the 2.x release. If you're upgrading from R34 or earlier, we strongly recommend following the [R35 Upgrade Guide](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/upgrade-guides/r35-upgrade-guide/) first.

## Assets

RDB Loader, RDB Shredder and Stream RDB Shredder are all at version 1.0.0, even though the last one is an experimental asset.
RDB Shredder is published on S3:

- `s3://snowplow-hosted-assets-eu-central-1/4-storage/rdb-shredder/snowplow-rdb-shredder-1.0.1.jar`

RDB Loader and RDB Stream Shredder are distributed as Docker images, published on DockerHub:

- `snowplow/snowplow-rdb-loader:1.0.1`
- `snowplow/snowplow-rdb-stream-shredder:1.0.1`

## Configuration changes

All configuration changes are scoped to the `shredder` property. Since we added another type of shredder, you now have to specify the type explicitly:

```json
"shredder": {
  "type": "batch",                                   # Was not necessary in R35
  "input": "s3://snowplow-enriched-archive/path/",   # Remains the same
  "output": ...                                      # Explained below
}
```

The major API change in 1.0.0 is the new partitioning scheme unifying `good` and `bad` output. Whereas previously it was necessary to specify `output` and `outputBad`, now there's only `path` in the `shredder.output` object:

```json
"output": {                                    # Was a string in R35
  "path": "s3://snowplow-shredded-archive/",   # Path to shredded output
  "compression": "GZIP"                        # Output compression, GZIP or NONE
}
```

In the Dataflow Runner playbook you have to specify the new Main classpath for RDB Shredder:

```text
"--class", "com.snowplowanalytics.snowplow.rdbloader.shredder.batch.Main"
```

## Manifest

The new manifest table has the same name as the previous one: `manifest`. To avoid a clash, RDB Loader 1.0.0 checks for the table's existence every time it starts and, if the table exists, checks whether it's the new or the old one. If the table exists and is the legacy one, it will be renamed to `manifest_legacy` (you can remove it manually later) and the new table will be created automatically. If the table doesn't exist, it will be created. No user action is necessary here.

## Stream Shredder

You only need to choose one Shredder: batch or stream.
**For production environments we recommend using the Batch Shredder.**

Stream Shredder is configured within the same configuration file as RDB Loader and RDB Batch Shredder, but using the following properties:

```json
"shredder": {
  # A batch loader would fail if it encountered the stream type
  "type": "stream",
  # Input stream information
  "input": {
    # "file" is another option, but used for debugging only
    "type": "kinesis",
    # KCL app name - a DynamoDB table will be created with the same name
    "appName": "acme-rdb-shredder",
    # Kinesis stream name, must exist
    "streamName": "enriched-events",
    # Kinesis region
    "region": "us-east-1",
    # Kinesis position: LATEST or TRIM_HORIZON
    "position": "LATEST"
  },
  # How often to emit the loading finished message - 5, 10, 15, 20, 30, 60 etc. minutes.
  # This is what controls how often your data will be loaded
  "windowing": "10 minutes",
  # Path to shredded archive, same as for batch
  "output": {
    # Path to shredded output
    "path": "s3://bucket/good/",
    # Shredder output compression, GZIP or NONE
    "compression": "GZIP"
  }
}
```

## Directory structure

There is a major change in the shredder output directory structure, on top of what changed in R35. If you're using a 3rd-party query engine such as Amazon Athena to query shredded data, the new partitioning can break the schema, so we recommend creating a new root for shredded data.
The structure of a typical shredded folder now looks like the following:

```text
run=2021-03-29-15-40-30/
    shredding_complete.json
    output=good/
        vendor=com.snowplowanalytics.snowplow/
            name=atomic/
                format=tsv/
                    model=1/
        vendor=com.acme/
            name=link_click/
                format=json/
                    model=1/
    output=bad/
        vendor=com.snowplowanalytics.snowplow/
            name=loader_parsing_error/
                format=json/
                    model=1/
```

---

# RDB Loader 1.2.0 upgrade guide

> Upgrade RDB Loader to 1.2.0 with webhook monitoring, folder monitoring, and enhanced observability for Redshift loading.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/upgrade-guides/1-2-0-upgrade-guide/

RDB Loader 1.2.0 brings many improvements to the monitoring subsystem. If you're not interested in the new features, you can simply bump the versions. If you need webhook monitoring, read the instructions below on how to enable it. [Release notes](https://github.com/snowplow/snowplow-rdb-loader/releases/tag/1.2.0).

## Assets

RDB Shredder is published on S3:

- `s3://snowplow-hosted-assets-eu-central-1/4-storage/rdb-shredder/snowplow-rdb-shredder-1.2.0.jar`

RDB Loader and RDB Stream Shredder are distributed as Docker images, published on DockerHub:

- `snowplow/snowplow-rdb-loader:1.2.0`
- `snowplow/snowplow-rdb-stream-shredder:1.2.0`

## Enabling Webhook monitoring

All configuration changes are scoped to the `monitoring` property.

```json
"monitoring": {
  "webhook": {
    "endpoint": "https://webhooks.acme.com/rdb-loader",
    "tags": {            # Custom set of tags
      "host": $HOST,     # Environment variables are supported
      "pipeline": "production"
    }
  }
}
```

It's up to you to set up a suitable webhook backend. It can be a Snowplow Iglu webhook or a custom monitoring system.
## Enabling folder monitoring

All configuration changes are scoped to the `monitoring` property.

```json
"monitoring": {
  "folders": {
    # This path will contain temporary files.
    # The Redshift role must have access to this folder
    "staging": "s3://snowplow-acme-com/logging/",
    # How often the check should be performed
    "period": "2 hours"
  }
}
```

It's up to you to set up a suitable webhook backend. It can be a Snowplow Iglu webhook or a custom monitoring system.

---

# RDB Loader 2.0.0 upgrade guide

> Upgrade RDB Loader to 2.0.0 with separate configs for Loader and Shredder, SNS messaging, and split configuration architecture.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/upgrade-guides/2-0-0-upgrade-guide/

RDB Loader 2.0.0 brings the ability to send shredding complete messages from the Shredder to an SNS topic, and splits the configs of RDB Loader and RDB Shredder. From now on, Loader and Shredder will not use the same config. [Release notes](https://github.com/snowplow/snowplow-rdb-loader/releases/tag/2.0.0).

## Assets

RDB Shredder is published on S3:

- `s3://snowplow-hosted-assets-eu-central-1/4-storage/rdb-shredder/snowplow-rdb-shredder-2.0.0.jar`

RDB Loader and RDB Stream Shredder are distributed as Docker images, published on DockerHub:

- `snowplow/snowplow-rdb-loader:2.0.0`
- `snowplow/snowplow-rdb-stream-shredder:2.0.0`

## Sending shredding complete messages from Shredder to SNS

The shredding complete message can be sent to an SNS topic with the following queue configuration:

```json
"queue": {
  "type": "sns",
  "topicArn": "arn:aws:sns:eu-central-1:123456789:test-sns-topic",
  "region": "eu-central-1"
}
```

## New separate configs

RDB Loader and RDB Shredder were using the same HOCON config until version 2.0.0. Starting from 2.0.0, they use separate configs.
Reference docs for the new configs can be found on the following pages:

- [RDB Loader configuration](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/previous-versions/snowplow-rdb-loader/configuration-reference/)
- [RDB Shredder configuration](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/previous-versions/snowplow-rdb-loader/rdb-shredder-configuration-reference/)

---

# RDB Loader 6.0.0 upgrade guide

> Upgrade RDB Loader to 6.0.0 with recovery tables, schema merging, and improved schema evolution for Redshift and other warehouses.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/upgrade-guides/6-0-0-upgrade-guide/

## License

Since version 6.0.0, RDB Loader is released under the [Snowplow Limited Use License](/limited-use-license-1.0/) ([FAQ](/docs/licensing/limited-use-license-faq/)). To accept the terms of the license and run RDB Loader, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable. Alternatively, you can configure the `license.accept` option, like this:

```hcl
license {
  accept = true
}
```

## \[Redshift-only] New migration mechanism & recovery tables

### What is schema evolution?

One of Snowplow's key features is the ability to [define custom schemas and validate events against them](/docs/fundamentals/schemas/). Over time, users often evolve their schemas, e.g. by adding new fields or changing existing fields. To accommodate these changes, RDB Loader automatically adjusts the database tables in the warehouse accordingly.

There are two main types of schema changes:

**Breaking**: The MODEL component of the schema version has to change (`1-2-3` → `2-0-0`). In Redshift, each MODEL schema version has its own table (`..._1`, `..._2`, etc, for example: `com_snowplowanalytics_snowplow_ad_click_1`).

**Non-breaking**: Only the REVISION or ADDITION component of the schema version changes (`1-2-3` → `1-3-0` or `1-2-3` → `1-2-4`). Data is stored in the same database table.
### How it used to work

In the past, the transformer would fetch from the Iglu Server all schemas for an entity (those matching the entity's `vendor/name/model-*-*` criterion), merge them, extract the properties from the JSON, and stringify them using the tab character as a delimiter, inserting the null character wherever a property was missing. Then the loader would adjust the database table and load the file.

This logic relied on two assumptions:

1. **Old events compatible with new schemas.** Events with older schema versions, e.g. `1-0-0` and `1-0-1`, had to be valid against the newer ones, e.g. `1-0-2`. Those that were not valid would result in failed events.
2. **Old columns compatible with new schemas.** The corresponding Redshift table had to be migrated correctly from one version to another. Changes, such as altering the type of a field from `integer` to `string`, would fail. Loading would break with SQL errors and the whole batch would be stuck and hard to recover.

These assumptions were not always clear to users, making the transformer and loader error-prone.

### What happens now?

Transformer and loader are now more robust, and the data is easy to recover if a schema was not evolved correctly.

First, we support schema evolution that's not strictly backwards compatible (although we still recommend against it, since it can confuse downstream consumers of the data). This is done by _merging_ multiple schemas so that both old and new events can coexist. For example, suppose we have these two schemas:

```json
{
  // 1-0-0
  "properties": {
    "a": {"type": "integer"}
  }
}
```

```json
{
  // 1-0-1
  "properties": {
    "b": {"type": "integer"}
  }
}
```

These would be merged into the following:

```json
{
  // merged
  "properties": {
    "a": {"type": "integer"},
    "b": {"type": "integer"}
  }
}
```

Second, the loader does not fail when it can't modify the database table to store both old and new events. (As a reminder, an example would be changing the type of a field from `integer` to `string`.)
Instead, it creates a separate table for the new data as an exception. You can then run SQL statements to resolve the situation as you see fit. For instance, consider these two schemas:

```json
{
  // 1-0-0
  "properties": {
    "a": {"type": "integer"}
  }
}
```

```json
{
  // 1-0-1
  "properties": {
    "a": {"type": "string"}
  }
}
```

Because `1-0-1` events cannot be loaded into the same table as `1-0-0`, the data would be put in a separate table, e.g. `com_snowplowanalytics_ad_click_1_0_1_recovered_9999999`, where:

- `1_0_1` is the version of the offending schema;
- `9999999` is a hash code unique to the schema (i.e. it will change if the schema is overwritten with a different one).

If you create a new schema `1-0-2` that reverts the offending changes and is again compatible with `1-0-0`, the data for events with that schema will be written to the original table as expected.

### Identifying schemas that need patching

After upgrading RDB Loader, you might find that events or entities with some of the old schemas land in recovery tables. To avoid this, the offending older Iglu schemas must be patched to align with the latest one. You can use the latest version of `igluctl` to do this.

To illustrate, let's say we have schema versions `1-0-0` and `1-0-1` that differ in one field only:

- version `1-0-0`: `{ "type": "integer", "maximum": 100 }` - translates to `SMALLINT`. All is good.
- version `1-0-1`: `{ "type": "integer", "maximum": 1000000 }` - a breaking change, as it translates to `INT` and the loader can't migrate the column.

Since version `1-0-1` had a breaking change, loading must have broken with an older version of RDB Loader. To fix that, the user must have updated the warehouse manually (changing the column type to `INT`). After this manual intervention, events with the `1-0-1` schema would have been loaded successfully with older versions of RDB Loader. However, after an upgrade to RDB Loader 6.0.0, events with the `1-0-1` schema will start to land in the recovery table.
Let's see how we can use `igluctl` to solve this problem.

1. Run the `igluctl static generate` command. If a recovery table is to be created, it will show up as a warning message. Example:

   ```bash
   mkdir
   igluctl static pull
   igluctl static generate
   # ...
   # iglu:com.test/test/jsonschema/1-0-1 has a breaking change Incompatible types in column example_field old RedshiftSmallInt new RedshiftInteger
   # ...
   ```

2. Run the `igluctl table-check` command to check whether the table structure is in line with the latest schema version that doesn't contain any breaking changes. With the example schemas above, this would be `1-0-0`, because `1-0-1` contains a breaking change. Example:

   ```bash
   igluctl table-check \
     --server \
     --apikey \
     --host \
     --port \
     --username \
     --password \
     --dbname \
     --dbschema
   # ...
   # * Column doesn't match, expected: 'example_field SMALLINT', actual: 'example_field INT'
   # ...
   ```

We got the `Column doesn't match` output in the above example because the table column had been migrated manually from `SMALLINT` to `INT`. Since the latest schema version without a breaking change is `1-0-0`, the `table-check` command expects to see `SMALLINT` in the table, and therefore it gives the `Column doesn't match` output.

To solve this problem, we should patch `1-0-0` with `{ "type": "integer", "maximum": 1000000 }`. In this case, there won't be any breaking change between versions `1-0-0` and `1-0-1`. RDB Loader 6.0.0+ will then successfully load events into the intended table as expected.

After identifying all the offending schemas, you should patch them to reflect the changes in the warehouse. Schema casting rules can be found [here](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/?warehouse=redshift#types).

#### `$.featureFlags.disableRecovery` configuration

If you have older schemas with breaking changes and don't want the loader to apply the new logic to them, you can use the `$.featureFlags.disableRecovery` configuration.
For the provided schema criteria only, RDB Loader will neither migrate the corresponding shredded table nor create recovery tables for breaking schema versions. The Loader will attempt to load into the corresponding shredded table without migrating. You can set it as follows:

```json
{
  ...
  "featureFlags": {
    "disableRecovery": [
      "iglu:com.example/myschema1/jsonschema/1-*-*",
      "iglu:com.example/myschema2/jsonschema/1-*-*"
    ]
  }
}
```

---

# Upgrade guides for the Snowplow RDB Loader

> Step-by-step upgrade guides for RDB Loader with breaking changes, schema migrations, and compatibility notes for Redshift, Snowflake, and Databricks.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/upgrade-guides/

This section contains information to help you upgrade to newer versions of Snowplow RDB Loader.

---

# RDB Loader R32 upgrade guide

> Upgrade RDB Loader to R32 with EMR 5.19.0, automigrations, and updated shredder and loader versions for Redshift.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/upgrade-guides/r32-upgrade-guide/

We recommend going through the upgrade routine in several independent steps. After every step you should have a working pipeline. If something is not working, or the Shredder produces unexpected failed events, please let us know.

## Updating assets

1. Upgrade EmrEtlRunner to R116 or higher
2. In your `redshift_config.json`:
   1. Update the SchemaVer to `4-0-0`
   2. Add a `"blacklistTabular": null` field into the `data` payload
3. Update your `config.yml` file

```yaml
aws:
  emr:
    ami_version: 5.19.0 # was 5.9.0; Required by RDB Shredder
storage:
  versions:
    rdb_loader: 0.17.0 # was 0.16.0
    rdb_shredder: 0.16.0 # was 0.15.0
```

At this point, your pipeline should be running with the new assets just as it was before, without automigrations and generated TSV. We recommend testing this setup and monitoring shredded failed events for one or two runs before enabling automigrations.
## Iglu Server

Automigrations work only with Iglu Server 0.6.0. This component provides information about how columns should be ordered across different ADDITIONs and REVISIONs. If you don't have Iglu Server 0.6.0 yet, we recommend [setting it up](https://github.com/snowplow/iglu/wiki/Setting-up-an-Iglu-Server). You can still use static registries as a backup; they will continue to work for validation purposes, but won't work for TSV shredding.

Snowplow does not provide a public Iglu Server hosting Iglu Central schemas, so we recommend you mirror Iglu Central with your own Iglu Server:

```bash
$ git clone https://github.com/snowplow/iglu-central.git
$ igluctl static push iglu-central/schemas $YOUR_SERVER_URL $YOUR_API_KEY
$ igluctl static push com.acme-iglu-registry/schemas $YOUR_SERVER_URL $YOUR_API_KEY
```

After setting up the Iglu Server, don't forget to add it to your resolver config.

## Tabular blacklisting

The new RDB Shredder is still able to produce legacy JSON files, but automigrations can be applied only to tables whose data is prepared as TSV.

If you are setting up a new pipeline, you can generate only TSVs, abandoning legacy DDLs (except `atomic.events` and `atomic.manifest`) and JSONPaths altogether. However, if you already have tables deployed whose DDLs were generated manually or via an old version of igluctl, you will likely need to apply so-called _tabular blacklisting_ to these tables. This means the Shredder will keep producing data for these schemas as JSONs, and the Loader won't be able to apply migrations to them. This is necessary because manually generated DDLs are not guaranteed to have a predictable column order, and the only way to map JSON values to their respective columns is JSONPaths files.

[igluctl version 0.7.0](https://github.com/snowplow/igluctl/releases/tag/0.7.0) provides the `rdbms table-check` subcommand, which gets schemas from the Iglu Server, figures out what DDL the Loader would generate, then connects to Redshift and compares those DDLs with the actual state of the table.
Every table that has an incompatible column order will have to be "blacklisted" in the Redshift storage target config (`redshift_config.json`). Here's an example of a blacklist containing several schemas from Iglu Central:

```json
"blacklistTabular": [
  "iglu:org.w3/PerformanceTiming/jsonschema/1-*-*",
  "iglu:com.snowplowanalytics.snowplow/timing/jsonschema/1-*-*",
  "iglu:com.snowplowanalytics.snowplow/screen_view/jsonschema/1-*-*",
  "iglu:com.snowplowanalytics.snowplow/mobile_context/jsonschema/1-*-*",
  "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-*-*",
  "iglu:com.snowplowanalytics.snowplow/geolocation_context/jsonschema/1-*-*",
  "iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-*-*",
  "iglu:com.snowplowanalytics.snowplow/application_error/jsonschema/1-*-*",
  "iglu:com.mandrill/recipient_unsubscribed/jsonschema/1-*-*",
  "iglu:com.mandrill/message_soft_bounced/jsonschema/1-*-*",
  "iglu:com.mandrill/message_sent/jsonschema/1-*-*",
  "iglu:com.mandrill/message_opened/jsonschema/1-*-*",
  "iglu:com.mandrill/message_marked_as_spam/jsonschema/1-*-*",
  "iglu:com.mandrill/message_delayed/jsonschema/1-*-*",
  "iglu:com.mandrill/message_clicked/jsonschema/1-*-*",
  "iglu:com.mandrill/message_bounced/jsonschema/1-*-*"
]
```

As you can see, schemas are specified in schema criterion format (with wildcards everywhere except MODEL).

## Conclusion

At this point, if you track an event with a new schema and this schema resides on an Iglu Server, RDB Shredder will produce TSV data for it and RDB Loader will automatically create a new table. The same goes for ADDITION and REVISION migrations: they're handled by RDB Loader automatically.

---

# RDB Loader R33/R34 upgrade guide

> Upgrade RDB Loader to R33/R34 with Spark 3, EMR 6.1.0, and bugfixes for long text properties in Redshift.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/upgrade-guides/r33-upgrade-guide/

R34 is a release with bugfixes and performance improvements.
R33 was an almost identical release, but with a major bug preventing some long text properties from loading.

## Updating assets

1. Upgrade EmrEtlRunner to 1.0.4 or higher
2. Your `redshift_config.json` should have version `4-0-0`
3. Update your `config.yml` file

```yaml
aws:
  emr:
    ami_version: 6.1.0 # was 5.19.0; Required by Spark 3
storage:
  versions:
    rdb_loader: 0.18.2 # was 0.17.0
    rdb_shredder: 0.18.2 # was 0.16.0
```

---

# RDB Loader R35 upgrade guide

> Upgrade RDB Loader to R35 with independent Loader architecture, SQS messaging, and removal of EmrEtlRunner dependency.
> Source: https://docs.snowplow.io/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/upgrade-guides/r35-upgrade-guide/

R35 is a release with major changes in pipeline architecture:

- No dependency on EmrEtlRunner (neither Shredder nor Loader can be launched using EmrEtlRunner, marking the deprecation of EmrEtlRunner)
- Loader is not an EMR step anymore
- Major changes in directory structure
- New dependency on SQS

[Release notes](https://github.com/snowplow/snowplow-rdb-loader/releases/tag/r35).

This is the last release in the 0.x branch; breaking changes might still be introduced in the 1.0.0 release.

## Assets

Both RDB Shredder and Loader are at version 0.19.0. Both are published on S3:

- `s3://snowplow-hosted-assets-eu-central-1/4-storage/rdb-shredder/snowplow-rdb-shredder-0.19.0.jar`
- `s3://snowplow-hosted-assets-eu-central-1/4-storage/rdb-loader/snowplow-rdb-loader-0.19.0.jar`

For RDB Loader, however, it is recommended to use the Docker image, published on DockerHub: `snowplow/snowplow-rdb-loader:0.19.0`

## New architecture

The previous workflow was orchestrated by EmrEtlRunner, along with multiple S3DistCp steps, recovery scenarios and a dedicated RDB Loader step. RDB Loader found out what data needed to be loaded by scanning S3.

In the new architecture there are two EMR steps:

1.
S3DistCp, copying enriched data sunk by [S3 Loader](/docs/api-reference/loaders-storage-targets/s3-loader/) from the S3 sink bucket (the "enriched stream bucket") into the _enriched data lake_ (aka the shredder input, similar to the previously known "enriched archive")
2. RDB Shredder, picking up all _unprocessed_ folders in the enriched data lake, shredding the data there and writing it into the _shredded data lake_ (previously known as the "shredded archive")

RDB Loader is a stand-alone, long-running app, launched either on an EC2 box or a Fargate cluster. Loading is triggered by an SQS message, sent by the Shredder after it finishes processing a new batch.

RDB Shredder decides that a folder is unprocessed by:

1. Comparing folder names in the _enriched data lake_ and in the _shredded data lake_. Every folder that is **in** enriched, but **not in** shredded, will be considered unprocessed...
2. ...as will any folder in shredded that doesn't have a `shredding_complete.json` file in its root. This file is written at the end of the job and indicates that the job completed successfully; its absence means the shred job was aborted.

If you're upgrading from R34 or earlier, it is strongly recommended to pick new paths for the enriched and shredded archives in order to avoid double-loading, OR to make sure that there's a strict 1:1 correspondence between the contents of the enriched and shredded archives.

We recommend using either [Dataflow Runner](/docs/api-reference/dataflow-runner/) or a boto3 script to launch scheduled S3DistCp and Shredder jobs.
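As an illustration of that decision procedure, here is a minimal sketch using local directories as stand-ins for the S3 enriched and shredded data lakes (the real Shredder lists S3 prefixes; the `run=` names below are invented):

```shell
# Stand-in layout: two runs in "enriched", one fully shredded,
# one shredded but aborted (no shredding_complete.json marker).
mkdir -p enriched/run=2021-01-01-00-00-00 enriched/run=2021-01-02-00-00-00
mkdir -p shredded/run=2021-01-01-00-00-00
touch shredded/run=2021-01-01-00-00-00/shredding_complete.json
mkdir -p shredded/run=2021-01-02-00-00-00   # shred job started but was aborted

# A run folder is unprocessed if it is missing from "shredded",
# or present there without a shredding_complete.json marker.
for run in enriched/run=*/; do
  name=$(basename "$run")
  if [ ! -f "shredded/$name/shredding_complete.json" ]; then
    echo "unprocessed: $name"
  fi
done
```

Running this prints only the aborted run, which is exactly the set of folders the Shredder would pick up on its next pass.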
Here's an example of a Dataflow Runner playbook:

```json
{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/PlaybookConfig/avro/1-0-1",
  "data": {
    "region": "eu-central-1",
    "credentials": {
      "accessKeyId": "env",
      "secretAccessKey": "env"
    },
    "steps": [
      {
        "type": "CUSTOM_JAR",
        "name": "S3DistCp enriched data archiving",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar",
        "arguments": [
          "--src", "s3://com-acme/enriched/sink/",
          "--dest", "s3://com-acme/enriched/archive/run={{nowWithFormat "2006-01-02-15-04-05"}}/",
          "--s3Endpoint", "s3-eu-central-1.amazonaws.com",
          "--srcPattern", ".*",
          "--outputCodec", "gz",
          "--deleteOnSuccess"
        ]
      },
      {
        "type": "CUSTOM_JAR",
        "name": "RDB Shredder",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "command-runner.jar",
        "arguments": [
          "spark-submit",
          "--class", "com.snowplowanalytics.snowplow.shredder.Main",
          "--master", "yarn",
          "--deploy-mode", "cluster",
          "s3://snowplow-hosted-assets-eu-central-1/4-storage/rdb-shredder/snowplow-rdb-shredder-0.19.0.jar",
          "--iglu-config", "{{base64File "/home/snowplow/configs/snowplow/iglu_resolver.json"}}",
          "--config", "{{base64File "/home/snowplow/configs/snowplow/config.hocon"}}"
        ]
      }
    ],
    "tags": [ ]
  }
}
```

We recommend launching RDB Loader as a long-running Docker image.

## New configuration file

The common configuration file, previously known as `config.yml`, and the target JSON configuration file, previously known as `redshift.json`, have been replaced by a [single HOCON file](https://github.com/snowplow/snowplow-rdb-loader/blob/master/config/config.hocon.sample).
Here's an example:

```json
{
  # Human-readable identifier, can be random
  "name": "Acme Redshift",
  # Machine-readable unique identifier, must be a UUID
  "id": "123e4567-e89b-12d3-a456-426655440000",
  # Data Lake (S3) region
  "region": "us-east-1",
  # SQS queue name used by Shredder and Loader to communicate
  "messageQueue": "messages.fifo",
  # Shredder-specific configs
  "shredder": {
    # Path to enriched archive (must be populated separately with run=YYYY-MM-DD-hh-mm-ss directories)
    "input": "s3://com-acme/enriched/archive/",
    # Path to shredded output
    "output": "s3://com-acme/shredded/good/",
    # Path to data that failed to be processed
    "outputBad": "s3://com-acme/shredded/bad/",
    # Shredder output compression, GZIP or NONE
    "compression": "GZIP"
  },
  # Optional. S3 path that holds JSONPaths
  "jsonpaths": "s3://bucket/jsonpaths/",
  # Schema-specific format settings (recommended to leave all three groups empty and use TSV as default)
  # To make it compatible with R34, leave default = TSV and populate the json array with the entries from blacklistTabular
  "formats": {
    # Format used by default (TSV or JSON)
    "default": "TSV",
    # Schemas to be shredded as JSONs, corresponding JSONPath files must be present. Automigrations will be disabled
    "json": [ ],
    # Schemas to be shredded as TSVs, presence of the schema on Iglu Server is necessary.
    # Automigrations enabled
    "tsv": [ ],
    # Schemas that won't be loaded
    "skip": [ ]
  },
  # Warehouse connection details, identical to storage target config
  "storage": {
    # Database, redshift is the only acceptable option
    "type": "redshift",
    # Redshift hostname
    "host": "redshift.amazon.com",
    # Database name
    "database": "snowplow",
    # Database port
    "port": 5439,
    # AWS Role ARN allowing Redshift to load data from S3
    "roleArn": "arn:aws:iam::123456789012:role/RedshiftLoadRole",
    # DB schema name
    "schema": "atomic",
    # DB user with permissions to load data
    "username": "storage-loader",
    # DB password
    "password": "secret",
    # Custom JDBC configuration
    "jdbc": {"ssl": true},
    # MAXERROR, amount of acceptable loading errors
    "maxError": 10,
    "compRows": 100000
  },
  # Additional steps. analyze, vacuum and transit_load are valid values
  "steps": ["analyze"],
  # Observability and logging options
  "monitoring": {
    # Snowplow tracking (optional)
    "snowplow": null,
    # Sentry (optional)
    "sentry": null
  }
}
```

If you need cross-batch deduplication, the file format for the DynamoDB config remains the same.

The CLI arguments have also changed. Both applications now accept only `--iglu-config`, a base64-encoded string representing the [Iglu Resolver JSON](/docs/api-reference/iglu/iglu-resolver/), and `--config`, the base64-encoded HOCON above. The Loader also accepts a `--dry-run` flag.

## SQS

SQS serves as the message bus between Shredder and Loader. The Loader expects to find self-describing messages there with instructions on what to load. The queue must be FIFO.

## Directory structure

There are several major changes in the shredder output directory structure:

1. Elements of the paths have changed from Iglu-compatible to shredder-specific, e.g. `format` can now be either `json` or `tsv` (not `jsonschema` as before), and instead of `version` (which could have been either `1-0-0` or just `1`) it is always just `model`
2. There's no dedicated `atomic-events` folder.
It is replaced with the unified `vendor=com.snowplowanalytics.snowplow/name=atomic/format=tsv/model=1`
3. There are no `shredded-types` or `shredded-tsv` folders either; all types are in the root of the folder.

The structure of a typical shredded folder now looks like the following:

```text
run=2021-01-27-18-35-00/
    vendor=com.snowplowanalytics.snowplow/
        name=atomic/
            format=tsv/
                model=1/
    vendor=com.snowplowanalytics.snowplow/
        name=ad_click/
            format=json/
                model=1/
    vendor=nj.basjes/
        name=yauaa_context/
            format=tsv/
                model=1/
    shredding_complete.json
    _SUCCESS
```

## Caution

We consider this version a public beta. Although it has been carefully tested in sandbox environments, showing significantly decreased AWS costs on the associated infrastructure, it hasn't been used in production yet.

One known issue in this version is the absence of protection against double-loading. If the Loader receives the same SQS message multiple times (e.g. sent manually), the same batch will be loaded multiple times.

We also reserve the right to make other breaking API changes in future versions.

---

# Snowbridge upgrade guide

> Upgrade Snowbridge to version 3.X.X with configuration changes for transformations, new features, and breaking changes.
> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/X-X-upgrade-guide/

## Version 4.0.0 Breaking Changes

### HTTP target: ordered response rule evaluation

**Breaking change**: response rules are now evaluated in the order they are defined in the configuration, rather than being organized in separate `invalid` and `setup` blocks.
**Migration required**: you must update your HTTP target configuration to specify a `type` attribute for each rule.

**Before:**

```hcl
response_rules {
  invalid {
    http_codes = [400]
    body = "Invalid value for 'purchase' field"
  }
  setup {
    http_codes = [401, 403]
  }
}
```

**After (4.0.0):**

```hcl
response_rules {
  rule {
    type = "invalid"
    http_codes = [400]
    body = "Invalid value for 'purchase' field"
  }
  rule {
    type = "setup"
    http_codes = [401, 403]
  }
}
```

**Important**: rules are now evaluated in the order they appear in your configuration. The first matching rule determines the error type.

## Version 3.0.0 Breaking Changes

The breaking changes below were made in version 3.0.0. All other functionality is backwards compatible.

### Lua support removed

Support for Lua transformations has been removed. If you are running a Lua transformation, you can port the logic to [Javascript](/docs/api-reference/snowbridge/configuration/transformations/custom-scripts/javascript-configuration/) or [JQ](/docs/api-reference/snowbridge/configuration/transformations/builtin/jq/).

### HTTP target: non-JSON data no longer supported

We never intended to support non-JSON data, but prior to version 3.0.0, the request body was simply populated with whatever bytes were found in the message data, regardless of whether they were valid JSON. From version 3.0.0 onwards, only valid JSON will work; otherwise the message will be considered invalid and sent to the failure target.

### HTTP target: request batching

Many HTTP APIs allow sending several events in a single request by putting them into a JSON array. Since version 3.0.0, if the Snowbridge source provides data in batches, the HTTP target will batch events in this way. As a consequence, even when the source provides events in single-event batches, each event will now be placed into an array of one element.
For example, prior to version 3.0.0, a request body might look like this:

```text
{"foo": "bar"}
```

But it will now look like this:

```text
[{"foo": "bar"}]
```

As of version 3.0.0, the SQS source provides events in batches of up to ten, and the Kinesis, Kafka, Pubsub, and Stdin sources provide events in single-event batches. This behavior will likely change in a future version.

You can preserve the previous behavior and ensure that requests are always single-event, non-array objects, even with a batching source. To do so, set `request_max_messages` to 1, and provide this template (as long as your data is valid JSON):

```go
loading...
```

[View on GitHub](https://github.com/snowplow/snowbridge/blob/master/assets/docs/configuration/targets/http-template-unwrap-example.file)

---

# Snowbridge batching model

> Understand how Snowbridge handles message batching across sources and targets for optimal throughput and performance.
> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/concepts/batching-model/

Messages are processed in batches according to how the source provides data. The Kinesis and Pubsub sources provide data message by message, so data is handled in batches of one message. The SQS source is batched according to how the SQS queue returns messages.

Transformations always handle one message at a time.

If the source provides the data in batches, the Kinesis, SQS, EventHub and Kafka targets can chunk the data into smaller batches before sending the requests. The EventHub target can further batch the data according to partitionKey, if set (a feature of the EventHub client specifically). The Pubsub and HTTP targets handle messages individually at present.

---

# Snowbridge failure model

> Learn how Snowbridge handles target failures, oversized data, invalid data, transformation failures, and fatal errors.
> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/concepts/failure-model/

## Failure targets

When Snowbridge hits an unrecoverable error — for example [oversized](#oversized-data) or [invalid](#invalid-data) data — it will emit a [failed event](/docs/fundamentals/failed-events/) to the configured failure target. A failure target is the same as a target; the only difference is that the configured destination will receive failed events.

You can find more detail on setting up a failure target in the [configuration section](/docs/api-reference/snowbridge/configuration/targets/).

There are several different failures that Snowbridge may hit.

## Target failure

This is where a request to the destination technology fails or is rejected - for example, an HTTP 400 response is received.

Retry behavior for target failures is determined by the retry configuration. You can find details of this in the [configuration section](/docs/api-reference/snowbridge/configuration/retries/).

As of Snowbridge 2.4.2, the Kinesis target does not treat Kinesis write throughput exceptions as this type of failure. Rather, it has an in-built backoff and retry, which will persist until each event in the batch is either successful, or fails for a different reason.

Before version 3.0.0, Snowbridge treated every kind of target failure the same - it would retry 5 times. If all 5 attempts failed, it would proceed without acking the failed messages. As long as the source's acking model allows for it, these would be re-processed through Snowbridge again.

Each target failure attempt will be reported as a 'MsgFailed' for monitoring purposes.

## Oversized data

Targets have limits to the size of a single message. Where the destination technology has a hard limit, targets are hardcoded to that limit. Otherwise, this is a configurable option in the target configuration.
When a message's data is above this limit, Snowbridge will produce a [size violation failed event](/docs/api-reference/failed-events/#size-violation), and emit it to the failure target.

Writes of oversized messages to the failure target will be recorded with 'OversizedMsg' statistics in monitoring. Any failure to write to the failure target will cause a [fatal failure](#fatal-failure).

## Invalid data

In the unlikely event that Snowbridge encounters data which is invalid for the target destination (for example, empty data is invalid for Pubsub), it will create a [generic error failed event](/docs/api-reference/failed-events/#generic-error), emit it to the failure target, and ack the original message.

As of version 3.0.0, the HTTP target may produce 'invalid' type failures. This occurs when a POST request body cannot be formed; when the templating feature's attempt to template data results in an error; or when the response matches a response rules configuration which specifies that the failure is to be treated as invalid. You can find more details in the [configuration section](/docs/api-reference/snowbridge/configuration/targets/http/).

Transformation failures are also treated as invalid, as described below.

Writes of invalid messages to the failure target will be recorded with 'InvalidMsg' statistics in monitoring. Any failure to write to the failure target will cause a [fatal failure](#fatal-failure).

## Transformation failure

Where a transformation hits an exception, Snowbridge will consider it invalid, assuming that the configured transformation cannot process the data. It will create a [generic error failed event](/docs/api-reference/failed-events/#generic-error), emit it to the failure target, and ack the original message. As long as the built-in transformations are configured correctly, this should be unlikely.
For scripting transformations, Snowbridge assumes that an exception means the data cannot be processed - make sure to construct and test your scripts accordingly.

Writes of invalid messages to the failure target will be recorded with 'InvalidMsg' statistics in monitoring. Any failure to write to the failure target will cause a [fatal failure](#fatal-failure).

## Fatal failure

Snowbridge is built to be averse to crashes, but there are two scenarios where it would be expected to crash.

Firstly, if it hits an error in retrieving data from the source stream, it will log an error and crash. If this occurs, it is normally a case of misconfiguration of the source. If that is not the case, it will be safe to redeploy the app — it will attempt to begin from the first unacked message. This may cause duplicates.

Secondly, as described above, where there are failures it will attempt to reprocess the data if it can, and where failures aren't recoverable it will attempt to handle that via a failure target. Normally, even reaching this point is rare. In the very unlikely event that Snowbridge reaches this point and cannot write to a failure target, the app will crash. Should this happen, and the app is re-deployed, it will begin processing data from the last acked message. Note that the likely impact of this is duplicated sends to the target, but not data loss.

Of course, if you experience crashes or other issues that are not explained by the above, please log an issue detailing the behavior.

---

# Snowbridge core concepts

> Understand Snowbridge architecture including sources, transformations, targets, batching, failure handling, and scaling strategies.
> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/concepts/

Snowbridge's architecture is fairly simple: it receives data from one streaming technology (via [Sources](/docs/api-reference/snowbridge/concepts/sources/)), optionally runs filtering and transformation logic on it (message-by-message, via [Transformations](/docs/api-reference/snowbridge/concepts/transformations/)), and sends the data to another streaming technology or destination (via [Targets](/docs/api-reference/snowbridge/concepts/targets/)). If it is not possible to process or retry the data [as per the failure model](/docs/api-reference/snowbridge/concepts/failure-model/), it outputs a message to another destination (via [Failure Targets](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets)).

Where the source supports acking, Snowbridge only acks messages once the data is successfully sent to either the target or the failure target (in the case of unrecoverable failure). In the case of a recoverable failure — for example, when the target is temporarily unavailable — Snowbridge will not ack the messages and will retry them once the source technology's ack deadline has passed.

![architecture](/assets/images/snowbridge-architecture-12bf56119494c1ca221081e869411e65.jpg)

## Operational details

Data is processed on an at-least-once basis, and there is no guarantee of message order. The application is designed to minimize duplicates as much as possible, but there isn't a guarantee of avoiding them — for example, if there's a failure, it is possible for messages to be delivered without a successful response, and duplicates can occur.

---

# Scaling Snowbridge horizontally

> Scale Snowbridge horizontally across multiple instances with concurrency controls and target provisioning for optimal throughput.
> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/concepts/scaling/

Snowbridge is built to suit a **horizontal scaling** model, and you can safely deploy multiple instances of Snowbridge to consume the same input out-of-the-box. No additional configuration or setup is required for the app to run smoothly across multiple instances/environments, compared to a single instance/environment.

> **Note:** If you are using the Kinesis source, you will need to manually create a few DynamoDB tables as described in [the Kinesis source configuration section](/docs/api-reference/snowbridge/configuration/sources/kinesis/). Snowbridge uses these tables to coordinate multiple instances consuming from the same stream.

How to configure scaling behavior will depend on the infrastructure you're using and the use case you have implemented. For example, if you choose to scale based on CPU usage, note that this metric will be affected by the size and shape of the data, by the transformations and filters used, and, for script transformations, by the content of the scripts.

> **Tip:** Occasionally, new releases of Snowbridge will improve its efficiency. In the past, this has had a large impact on metrics typically used for scaling. To ensure that scaling behaves as expected, we recommend monitoring your metrics after you upgrade Snowbridge or change the transformation configuration.

In addition to configuring the number of Snowbridge instances, you can manage concurrency via the `concurrent_writes` setting (explained in the [next section](#concurrency)). This setting provides a degree of control over throughput and resource usage.

Snowbridge should consume as much data as possible, as fast as possible — a backlog of data or spike in traffic should cause the app's CPU usage to increase significantly. If spikes/backlogs do not induce this behavior, and there are no target retries or failures (see below), then you can increase `concurrent_writes`.
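As a sketch of what that tuning looks like in practice (hedged: the placement of `concurrent_writes` inside the source's `use` block, and the stdin source shown here, are illustrative assumptions — check your source's configuration page for the exact shape):

```hcl
# Illustrative only: raising the source's concurrency setting.
# Total maximum concurrency for a deployment is this value multiplied
# by the number of Snowbridge instances running.
source {
  use "stdin" {
    concurrent_writes = 100
  }
}
```

After changing this value, monitor CPU usage and target retries/failures to confirm the new setting behaves as expected.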
## Concurrency

Snowbridge is a Go application, which makes use of [goroutines](https://golangdocs.com/goroutines-in-golang). You can think of goroutines as lightweight threads.

The source's `concurrent_writes` setting controls how many goroutines may be processing data at once, in a given instance of the app (others may exist separately under the hood, for non-data-processing purposes). You can determine the total maximum concurrency for the entire application by multiplying `concurrent_writes` by the number of horizontal instances of the app. For example, if Snowbridge is deployed via Kubernetes pods, and there are 4 active pods with `concurrent_writes` set to 150, then at any given time there will be up to 600 concurrent goroutines that can process and send data.

## Target scaling

Snowbridge will attempt to send data to the target as fast as resources will allow, so we recommend that you set up the target to scale sufficiently with the expected volume and throughput. Note that in case of failure, Snowbridge will retry sending the messages with an exponential backoff, starting with a 1s delay between retries, and doubling that delay for 5 retries.

If a backlog of data builds up due to some failure — for example, target downtime — then we advise overprovisioning the target until the backlog is processed. That's only required until latency falls back to normal rates.

---

# Introduction to Snowbridge sources

> Configure Snowbridge sources to retrieve data from streams and forward them for processing with acking and concurrency controls.

> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/concepts/sources/

Sources deal with retrieving data from the input stream, and forwarding it for processing — once messages are either filtered or successfully sent, they are then acked (if the source technology supports acking). Otherwise, messages will be retrieved again by the source.
Sources also have a setting which controls concurrency for the instance — `concurrent_writes`.

You can find more detail on setting up a source in the [configuration section](/docs/api-reference/snowbridge/configuration/sources/).

---

# Introduction to Snowbridge targets

> Configure Snowbridge targets to validate, batch, and send data to destination streams with size and validity checks.

> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/concepts/targets/

Targets check for [validity and size restrictions](/docs/api-reference/snowbridge/concepts/failure-model/), [batch data](/docs/api-reference/snowbridge/concepts/batching-model/) where appropriate, and send data to the destination stream.

You can find more detail on setting up a target in the [configuration section](/docs/api-reference/snowbridge/configuration/targets/).

---

# Introduction to Snowbridge transformations and filters

> Transform and filter messages with built-in transformations for Snowplow data or custom JavaScript scripts for flexible data processing.

> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/concepts/transformations/

Transformations allow you to modify messages' data on the fly before they're sent to the destination. There is a set of built-in transformations, specifically for use with Snowplow data (for example, transforming Snowplow enriched events to JSON). You can also configure a script to transform your data however you require - for example, if you need to rename fields or change a field's format.

It's also possible to exclude messages (i.e. not send them to the target) based on a condition, by configuring a special type of transformation called a filter. (Technically, then, filters are transformations, but we sometimes refer to them as a separate concept for clarity.) Again, there are built-in filters to apply to Snowplow data, or you can provide a script to filter the data.
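For instance, a filter and a transformation are each declared as a `transform` block in the configuration. In this hedged sketch, the block and option names (`spEnrichedFilter`, `atomic_field`, `regex`, `spEnrichedToJson`) are assumptions based on the built-in Snowplow transformations — check the configuration section for the exact names:

```hcl
# Hypothetical sketch: filter Snowplow enriched data, then convert it
# to JSON. Names are illustrative assumptions; see the transformations
# configuration docs for the exact built-in names and options.
transform {
  use "spEnrichedFilter" {
    # keep only events whose event_name is page_view
    atomic_field = "event_name"
    regex        = "^page_view$"
  }
}

transform {
  use "spEnrichedToJson" {}
}
```

Blocks are applied in the order they appear, so placing the filter first means filtered-out messages skip the later transformation.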
Transformations operate on a per-message basis, are chained together in the order configured, and the same type of transformation may be configured more than once. We recommend placing filters first for performance reasons. When transformations are chained together, the output of the first is the input of the second; however, transformations may not depend on each other in any other way.

As an example of how transformations relate to each other: if you have a built-in filter with condition A, and a filter with condition B, you can arrange them one after another, so that the data must satisfy A AND B. But you can't arrange them to satisfy A OR B — because the outcome of each must be determined on its own. The latter use case, and further nuanced use cases, can however be achieved using a scripting transformation (in the case of the latter example, a single script can perform both checks with an OR condition).

## Custom scripting transformations

Custom scripting transformations allow you to provide a script to transform the data, set the destination's partition key, or filter the data according to your own logic.

For scripting, you can use JavaScript. Snowbridge uses a runtime engine to run the script against the data. Scripts interface with the rest of the app via the `EngineProtocol` interface, which provides a means to pass data into the scripting layer, and return data from the scripting layer back to the app.

You can find more detail on setting up custom scripts [in the configuration section](/docs/api-reference/snowbridge/configuration/transformations/).

---

# Snowbridge configuration overview

> Configure Snowbridge using HCL format to define sources, transformations, targets, and monitoring for stream replication.

> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/

Snowbridge is configured using [HCL](https://github.com/hashicorp/hcl).
To configure Snowbridge, create your configuration in a file with the `.hcl` extension, and set the `SNOWBRIDGE_CONFIG_FILE` environment variable to the path to your file. By default, the Snowbridge Docker image uses `/tmp/config.hcl` as the config path - when using the Docker images, you can either mount your config file to `/tmp/config.hcl`, or mount it to a different path and set the `SNOWBRIDGE_CONFIG_FILE` environment variable in your Docker container to that path.

Inside the configuration, you can reference environment variables using the `env` object. For example, to refer to an environment variable named `MY_ENV_VAR` in your configuration, you can use `env.MY_ENV_VAR`. We recommend employing environment variables for any sensitive value, such as a password, as opposed to adding the value to the configuration verbatim.

For most options, Snowbridge uses blocks for configuration. The `use` keyword specifies what you'd like to configure - for example, a Kinesis source is configured using `source { use "kinesis" {...}}`. For all configuration blocks except for transformations, you must provide only one block (or none, to use the defaults). For transformations, you may provide zero or more `transform` configuration blocks. They will be applied to the data, one after another, in the order they appear in the configuration. The exception to this is when a filter is applied and the filter condition is met - in this case, the message will be acked and subsequent transformations will not be applied (neither will the data be sent to the destination).

Some application-level options are not contained in a block; instead, they're top-level options in the configuration. For example, to set the log level of the application, we just set the top-level variable `log_level`.

If you do not provide a configuration, or provide an empty one, the application will use the defaults:

- `stdin` source
- no transformations
- `stdout` target
- `stdout` failure target
There will also be no external statistics reporting or Sentry error reporting.

## License

Since version 2.4.0, Snowbridge has been released under the [Snowplow Limited Use License](/limited-use-license-1.0/) ([FAQ](/docs/licensing/limited-use-license-faq/)).

To accept the terms of the license and run Snowbridge, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable. Alternatively, you can configure the `license.accept` option, like this:

```hcl
license {
  accept = true
}
```

## Example configuration

The below example is a complete configuration, which specifies a Kinesis source, a built-in Snowplow filter (which may only be used if the input is Snowplow enriched data), a custom JavaScript transformation, and a Pubsub target, as well as the StatsD stats receiver, and Sentry for error reporting.

In plain terms, this configuration will read data from a Kinesis stream, filter out any data whose `event_name` field is not `page_view`, run a custom JavaScript script upon the data to change the `app_id` to `"1"`, and send the transformed page view data to Pubsub. It will also send statistics about what it's doing to a StatsD endpoint, and will send information about errors to a Sentry endpoint.

```hcl
loading...
```

[View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/overview-full-example.hcl)

---

# Snowbridge monitoring configuration

> Monitor Snowbridge with configurable logging, pprof profiling, StatsD metrics, and Sentry error reporting for observability.

> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/monitoring/

Snowbridge comes with configurable logging, [pprof](https://github.com/google/pprof) profiling, [statsD](https://www.datadoghq.com/statsd-monitoring) statistics, and [Sentry](https://sentry.io/welcome/) integrations to ensure that you know what's going on.

## Logging

Use the `log_level` parameter to specify the log level.

```hcl
loading...
``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/monitoring/log-level-example.hcl) ## Sentry Configuration ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/monitoring/sentry-example.hcl) ## StatsD stats receiver configuration ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/monitoring/statsd-example.hcl) ## End-to-end latency configuration Snowplow Enriched data only: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/metrics/e2e-latency-example.hcl) ## Metric definitions Snowbridge sends the following metrics to statsd: | Metric | Definitions | | ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------- | | `target_success` | Events successfully sent to the target. | | `target_failed` | Events which failed to reach the target, and will be handled by the retry config. Retries which fail are also counted. | | `message_filtered` | Events filtered out via transformation. | | `failure_target_success` | Events we could not send to the target, which are not retryable, successfully sent to the failure target. | | `failure_target_failed` | Events we could not send to the target, which are not retryable, which we failed to send to the failure target. In this scenario, Snowbridge will crash. | | `min_processing_latency` | Min time between entering Snowbridge and write to target. | | `max_processing_latency` | Max time between entering Snowbridge and write to target. | | `min_message_latency` | Min time between entering the source stream and write to target. | | `max_message_latency` | Max time between entering the source stream and write to target. 
| | `min_transform_latency` | Min time between start and completion of transformation. | | `max_transform_latency` | Max time between start and completion of transformation. | | `min_filter_latency` | Min time between entering Snowbridge and being filtered out. | | `max_filter_latency` | Max time between entering Snowbridge and being filtered out. | | `min_request_latency` | Min time between starting request to target and finishing request to target. | | `max_request_latency` | Max time between starting request to target and finishing request to target. | | `sum_request_latency` | Sum of request times, use with `target_request_count` to calculate average request latency. | | `target_request_count` | Number of requests sent to target, use with `sum_request_latency` to calculate average request latency. | | `min_e2e_latency` | Min time between Snowplow collector tstamp and finishing request to target. Enabled via configuration - Snowplow enriched data only. | | `max_e2e_latency` | Max time between Snowplow collector tstamp and finishing request to target. Enabled via configuration - Snowplow enriched data only. | --- # Snowbridge retry behavior configuration (beta) > Configure retry behavior for transient and setup failures with exponential backoff and configurable max attempts. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/retries/ > **Note:** This feature was added in version 3.0.0 > > This feature is in beta status because we may make breaking changes in future versions. This feature allows you to configure the retry behavior when the target encounters a failure in sending the data. There are three types of failure you can define: A **transient failure** is a failure which we expect to succeed again on retry. For example, some temporary network error. Typically, you would configure a short backoff for this type of failure. 
When we encounter a transient failure, we keep processing the rest of the data as normal, under the expectation that everything is operating as normal. The failed data is retried after a backoff. A **setup failure** is one we don't expect to be immediately resolved, for example an incorrect address, or an invalid API key. Typically, you would configure a long backoff for this type of failure, under the assumption that the issue needs to be fixed with either a configuration change or a change to the target itself (e.g. permissions need to be granted). Setup errors will be retried up to the configured `max_attempts` before the app crashes. A **throttle failure** (added in version 4.0.0) is a special type of failure that indicates the target is rate limiting requests. This is handled separately from transient errors to allow different retry behavior - typically with longer delays to respect rate limits. As of version 3.0.0, only the http target can be configured to return setup and throttle errors, via response rules - see [the http target configuration section](/docs/api-reference/snowbridge/configuration/targets/http/). For all other targets, all errors returned will be considered transient, and behavior can be configured using the `transient` block of the retry configuration. Retries will be attempted with an exponential backoff. In other words, on each subsequent failure, the backoff time will double. You can configure transient failures to be retried indefinitely by setting `max_attempts` to 0. As of version 4.0.0, you can configure transient failures to be sent to the failure target after reaching `max_attempts` by setting `invalid_after_max` to `true`. ## Configuration options ```hcl loading... 
``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/retry-example.hcl) --- # Configure HTTP as a Snowbridge source > Configure HTTP source for Snowplow Snowbridge to receive data over HTTP endpoints for experimental stream ingestion. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/sources/http/ > **Note:** This source was added in version 3.6.2 > > This source is experimental and not recommended for production use. ## Configuration options Here is an example of the minimum required configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/sources/http-minimal-example.hcl) Here is an example of every configuration option: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/sources/http-full-example.hcl) --- # Snowbridge source configuration > Configure Snowbridge sources including stdin, Kafka, Kinesis, PubSub, SQS, and HTTP for stream ingestion. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/sources/ **Stdin source** is the default. We also support Kafka, Kinesis, PubSub, SQS and experimental HTTP sources. Stdin source simply treats stdin as the input. It has one optional configuration to set the concurrency. ## Configuration options Here is an example of the minimum required configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/sources/stdin-minimal-example.hcl) Here is an example of every configuration option: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/sources/stdin-full-example.hcl) --- # Configure Kafka as a Snowbridge source > Configure Kafka source for Snowplow Snowbridge to read data from Kafka topics with authentication and consumer group settings. 
> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/sources/kafka/

Authentication is done by providing valid credentials in the configuration.

## Configuration options

Here is an example of the minimum required configuration:

```hcl
loading...
```

[View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/sources/kafka-minimal-example.hcl)

Here is an example of every configuration option:

```hcl
loading...
```

[View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/sources/kafka-full-example.hcl)

---

# Configure Kinesis as a Snowbridge source

> Configure Kinesis source for Snowplow Snowbridge to read from AWS Kinesis streams with DynamoDB checkpointing and authentication.

> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/sources/kinesis/

> **Note:** To use this source, you need the AWS-specific version of Snowbridge that can only be run on AWS. See [the page on Snowbridge distributions](/docs/api-reference/snowbridge/getting-started/) for more information.

## Authentication

Authentication is done via the [AWS authentication environment variables](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html). Optionally, you can use the `role_arn` option to specify an ARN to use on the stream.

## Setup

The AWS Kinesis source requires the additional setup of a set of DynamoDB tables for checkpointing purposes. To set up a Kinesis source, you will need to:

1. Configure the above required variables in the HCL file.
2. Create three DynamoDB tables, which will be used for checkpointing the progress of the replicator on the stream (_Note_: details below).

Under the hood, we are using a fork of the [Kinsumer](https://github.com/snowplow-devops/kinsumer) library, which has defined this DynamoDB table structure - these tables need to be created by hand before the application can launch.
| TableName                                | DistKey        |
| ---------------------------------------- | -------------- |
| `${SOURCE_KINESIS_APP_NAME}_clients`     | ID (String)    |
| `${SOURCE_KINESIS_APP_NAME}_checkpoints` | Shard (String) |
| `${SOURCE_KINESIS_APP_NAME}_metadata`    | Key (String)   |

Assuming your AWS credentials have sufficient permission for Kinesis and DynamoDB, your consumer should now be able to run when you launch the executable.

## Configuration options

Here is an example of the minimum required configuration:

```hcl
loading...
```

[View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/sources/kinesis-minimal-example.hcl)

Here is an example of every configuration option:

```hcl
loading...
```

[View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/sources/kinesis-full-example.hcl)

---

# Configure Pub/Sub as a Snowbridge source

> Configure PubSub source for Snowplow Snowbridge to read from GCP Pub/Sub subscriptions with service account authentication.

> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/sources/pubsub/

Authentication is done using a [GCP Service Account](https://cloud.google.com/docs/authentication/application-default-credentials#attached-sa). Create a service account credentials file, and provide the path to it via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.

Snowbridge connects to Pub/Sub using [Google's Go Pub/Sub SDK](https://cloud.google.com/go/pubsub), which establishes a gRPC connection with TLS encryption.

## Configuration options

Here is an example of the minimum required configuration:

```hcl
loading...
```

[View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/sources/pubsub-minimal-example.hcl)

Here is an example of every configuration option:

```hcl
loading...
``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/sources/pubsub-full-example.hcl) --- # Configure SQS as a Snowbridge source > Configure SQS source for Snowplow Snowbridge to read from AWS SQS queues with IAM authentication and message handling. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/sources/sqs/ Read data from an SQS queue. ## Authentication Authentication is done via the [AWS authentication environment variables](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html). Optionally, you can use the `role_arn` option to specify an ARN to use on the stream. ## Configuration options Here is an example of the minimum required configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/sources/sqs-minimal-example.hcl) Here is an example of every configuration option: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/sources/sqs-full-example.hcl) --- # Configure EventHub as a Snowbridge target > Configure EventHub target for Snowplow Snowbridge to write data to Azure Event Hubs with namespace and environment authentication. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/targets/eventhub/ Authentication for the EventHub target is done by configuring any valid combination of the environment variables [listed in the Azure Event Hubs Client documentation](https://pkg.go.dev/github.com/Azure/azure-event-hubs-go#NewHubWithNamespaceNameAndEnvironment). ## Configuration options Here is an example of the minimum required configuration: ```hcl loading... 
```

[View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/eventhub-minimal-example.hcl)

If you want to use this as a [failure target](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets), then use `failure_target` instead of `target`.

Here is an example of every configuration option:

```hcl
loading...
```

[View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/eventhub-full-example.hcl)

If you want to use this as a [failure target](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets), then use `failure_target` instead of `target`.

---

# Configure HTTP as a Snowbridge target

> Configure HTTP target for Snowplow Snowbridge to send data over HTTP with authentication, OAuth2, request templating, and response rules.

> Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/targets/http/

> **Note:** Version 3.0.0 makes breaking changes to the HTTP target. Details on migrating can be found [in the migration guide](/docs/api-reference/snowbridge/X-X-upgrade-guide/).

## Basic authentication

Where basic authentication is used, it may be configured using the `basic_auth_username` and `basic_auth_password` options. Where an authorization header is used, it may be set via the `headers` option. We recommend using environment variables for sensitive values - which can be done via HCL's native `env.MY_ENV_VAR` format (as seen below).

TLS may be configured by providing the `key_file`, `cert_file`, and `ca_file` options with paths to the relevant TLS files.

## OAuth2

Snowbridge supports sending authorized requests to OAuth2-compliant HTTP targets. This can be enabled by setting `oauth2_client_id`, `oauth2_client_secret`, `oauth2_refresh_token` (these three are long-lived credentials used to generate short-lived bearer access tokens), and `oauth2_token_url` (which is the URL of the authorization server providing access tokens).
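A hedged sketch of what this might look like in the target block, passing the credentials via environment variables (the `url` option name and the endpoint values shown are illustrative assumptions; the `oauth2_*` option names are as described above):

```hcl
# Illustrative sketch: HTTP target with OAuth2 enabled.
# The oauth2_* option names are documented above; "url" and both
# example endpoints are assumptions for illustration only.
target {
  use "http" {
    url                  = "https://api.example.com/track"
    oauth2_client_id     = env.OAUTH2_CLIENT_ID
    oauth2_client_secret = env.OAUTH2_CLIENT_SECRET
    oauth2_refresh_token = env.OAUTH2_REFRESH_TOKEN
    oauth2_token_url     = "https://id.example.com/oauth2/token"
  }
}
```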
Like in the case of basic authentication, we recommend using environment variables for sensitive values. ## Dynamic headers > **Note:** This feature was added in version 2.3.0 When enabled, this feature attaches a header to the data according to what your transformation provides in the `HTTPHeaders` field of `engineProtocol`. Data is batched independently for each dynamic header value before requests are sent. ## Request templating > **Note:** This feature was added in version 3.0.0 This feature allows you to provide a [Golang text template](https://pkg.go.dev/text/template) to construct a request body from a batch of data. This feature is useful for constructing requests to send to an API, for example. Input data must be valid JSON; any message that fails to be marshaled to JSON will be treated as invalid and sent to the failure target. Equally, if an attempt to template a batch of data results in an error, then all messages in the batch will be considered invalid and sent to the failure target. Where the dynamic headers feature is enabled, data is split into batches according to the provided header value, and the templater will operate on each batch separately. ### Helper functions In addition to all base functions available in the Go text/template package, the following custom functions are available for convenience: `prettyPrint` - Because the input to the templater is a Go data structure, simply providing a reference to an object field won't output it as JSON. `prettyPrint` converts the data to prettified JSON. Use it wherever you expect a JSON object in the output. This is compatible with any data type, but it shouldn't be necessary if the data is not an object. `env` - Allows you to set and refer to an env var in your template. Use it when your request body must contain sensitive data, for example an API key.
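To illustrate the helpers in isolation, a request-body template might look like the following sketch. It assumes the templater is applied to the batch as a slice of parsed messages, and `API_KEY` is a hypothetical environment variable:

```text
{
  "api_key": "{{ env "API_KEY" }}",
  "first_event": {{ prettyPrint (index . 0) }}
}
```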
### Template example The following example provides an API key via environment variable, and iterates the batch to provide JSON-formatted data one by one into a new key, inserting a comma before all but the first event. ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/http-template-full-example.file) ## Response rules (beta) > **Note:** This feature was added in version 3.0.0 > > This feature is in beta status because we may make breaking changes in future versions. > > **Breaking change in version 4.0.0**: Response rules are now evaluated in the order they are defined in the configuration. Rules must specify a `type` attribute to distinguish between invalid, setup, and throttle errors, rather than organizing them in separate `invalid` and `setup` blocks. Response rules allow you to configure how the app deals with failures in sending the data. You can configure a response code and an optional string match on the response body to determine how a failure response is handled. Response codes between 200 and 299 are considered successful, and are not handled by this feature. **Response rules are evaluated in the order they are defined in your configuration.** The first matching rule determines how the error is categorized. There are four categories of failure: `invalid` means that the data is considered incompatible with the target for some reason. For example, you may have defined a mapping for a given API, but the event happens to have null data for a required field. In this instance, retrying the data won't fix the issue, so you would configure an invalid response rule, identifying which responses indicate this scenario. Data that matches an invalid response rule is sent to the failure target. `setup` means that this error is not retryable, but is something which can only be resolved by a change in configuration or a change to the target. 
An example of this is an authentication failure - retrying won't fix the issue; the resolution is to grant the appropriate permissions, or provide the correct API key. Data that matches a setup response rule is handled by a retry as determined in the `setup` configuration block of [retry configuration](/docs/api-reference/snowbridge/configuration/retries/). `throttle` (added in version 4.0.0) is a special type of error that indicates the target is rate limiting requests. This is handled separately from transient errors to allow different retry behavior - typically with longer delays to respect rate limits. Data that matches a throttle response rule is handled by a retry as determined in the `throttle` configuration block of [retry configuration](/docs/api-reference/snowbridge/configuration/retries/). `transient` errors are everything else - we assume that the issue is temporary and retrying will resolve the problem. There is no explicit configuration for transient - rather, anything that is not configured as one of the other types is considered transient. ## Configuration options Here is an example of the minimum required configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/http-minimal-example.hcl) If you want to use this as a [failure target](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets), then use `failure_target` instead of `target`. Here is an example of every configuration option: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/http-full-example.hcl) If you want to use this as a [failure target](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets), then use `failure_target` instead of `target`.
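As a sketch of the response rule types described above, a configuration might take the following shape. The `type` attribute and ordered evaluation are as documented; the block and option names used here (`response_rules`, `rule`, `http_codes`, `body`) are illustrative, so check the full configuration example for the exact syntax.

```hcl
response_rules {
  # Rules are evaluated in order; the first match determines the category.
  rule {
    type       = "invalid"      # send matching responses to the failure target
    http_codes = [400]
    body       = "Invalid data" # optional string match on the response body
  }
  rule {
    type       = "throttle"     # retry per the throttle retry configuration
    http_codes = [429]
  }
  rule {
    type       = "setup"        # retry per the setup retry configuration
    http_codes = [401, 403]
  }
}
```

Anything not matched by a rule falls through to the transient category.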
## Example for Google Tag Manager Server Side You can use the HTTP target to send events to Google Tag Manager Server Side, where the [Snowplow Client tag](/docs/destinations/forwarding-events/google-tag-manager-server-side/snowplow-client-for-gtm-ss/) is installed. To do this, you will need to include a [transformation](/docs/api-reference/snowbridge/concepts/transformations/) that converts your events to JSON — [`spEnrichedToJson`](/docs/api-reference/snowbridge/configuration/transformations/builtin/spEnrichedToJson/). Here’s an example configuration. Replace `` with the hostname of your Google Tag Manager instance, and — optionally — `` with your preview mode token.

```hcl
target {
  use "http" {
    url                        = "https:///com.snowplowanalytics.snowplow/enriched"
    request_timeout_in_seconds = 5
    content_type               = "application/json"

    # this line is optional, in case you want to send events to GTM Preview Mode
    headers = "{\"x-gtm-server-preview\": \"\"}"
  }
}

transform {
  use "spEnrichedToJson" {}
}
```

--- # Snowbridge target configuration > Configure Snowbridge targets including stdout, EventHub, HTTP, Kafka, Kinesis, PubSub, and SQS for stream output. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/targets/ **Stdout target** is the default. We also support EventHub, HTTP, Kafka, Kinesis, PubSub, and SQS targets. Stdout target doesn't have any configurable options - when configured it simply outputs the messages to stdout. ## Configuration options Here is an example of the configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/stdout-full-example.hcl) If you want to use this as a [failure target](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets), then use `failure_target` instead of `target`.
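For instance, to use stdout as a failure target, the same `use` block is wrapped in `failure_target` rather than `target` (a sketch based on the `use` block pattern shown elsewhere on this page):

```hcl
# Route messages that cannot be delivered to stdout instead
failure_target {
  use "stdout" {}
}
```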
--- # Configure Kafka as a Snowbridge target > Configure Kafka target for Snowplow Snowbridge to write data to Kafka topics with SASL authentication and TLS encryption. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/targets/kafka/ Where SASL is used, it may be enabled via the `enable_sasl`, `sasl_username`, `sasl_password`, and `sasl_algorithm` options. We recommend using environment variables for sensitive values - which can be done via HCL's native `env.MY_ENV_VAR` format (as seen below). TLS may be configured by providing the `key_file`, `cert_file` and `ca_file` options with paths to the relevant TLS files. ## Configuration options Here is an example of the minimum required configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/kafka-minimal-example.hcl) If you want to use this as a [failure target](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets), then use `failure_target` instead of `target`. Here is an example of every configuration option: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/kafka-full-example.hcl) If you want to use this as a [failure target](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets), then use `failure_target` instead of `target`. --- # Configure Kinesis as a Snowbridge target > Configure Kinesis target for Snowplow Snowbridge to write data to AWS Kinesis streams with throttle retry handling and IAM authentication. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/targets/kinesis/ Authentication is done via the [AWS authentication environment variables](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html). Optionally, you can use the `role_arn` option to specify an ARN to use on the stream.
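For example, the standard AWS variables can be exported in the environment that runs Snowbridge (the values below are AWS's documented placeholder credentials, not real ones):

```shell
# Placeholder credentials; substitute your own IAM values
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
export AWS_REGION="eu-west-1"
```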
## Throttle retries As of 2.4.2, the Kinesis target handles Kinesis write throughput exceptions separately from all other errors and failures. It will back off and retry only the throttled records, with an initial backoff of 50ms, increasing by 50ms each time, until there are no more throttle errors. ## Configuration options Here is an example of the minimum required configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/kinesis-minimal-example.hcl) If you want to use this as a [failure target](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets), then use `failure_target` instead of `target`. Here is an example of every configuration option: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/kinesis-full-example.hcl) If you want to use this as a [failure target](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets), then use `failure_target` instead of `target`. --- # Configure Pub/Sub as a Snowbridge target > Configure PubSub target for Snowplow Snowbridge to write data to GCP Pub/Sub topics with service account authentication. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/targets/pubsub/ Authentication is done using a [GCP Service Account](https://cloud.google.com/docs/authentication/application-default-credentials#attached-sa). Create a service account credentials file, and provide the path to it via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. Snowbridge connects to PubSub using [Google's Go Pub/Sub SDK](https://cloud.google.com/go/pubsub), which establishes a gRPC connection with TLS encryption. ## Configuration options The PubSub target has only two required options, and no optional ones. ```hcl loading...
``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/pubsub-full-example.hcl) If you want to use this as a [failure target](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets), then use failure\_target instead of target. --- # Configure SQS as a Snowbridge target > Configure SQS target for Snowplow Snowbridge to write data to AWS SQS queues with IAM authentication and message attributes. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/targets/sqs/ Authentication is done via the [AWS authentication environment variables](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html). Optionally, you can use the `role_arn` option to specify an ARN to use on the queue. ## Configuration options Here is an example of the minimum required configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/sqs-minimal-example.hcl) If you want to use this as a [failure target](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets), then use failure\_target instead of target. Here is an example of every configuration option: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/targets/sqs-full-example.hcl) If you want to use this as a [failure target](/docs/api-reference/snowbridge/concepts/failure-model/#failure-targets), then use failure\_target instead of target. --- # Snowbridge telemetry configuration > Enable or disable telemetry for Snowbridge with user-provided identifiers and privacy controls. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/telemetry/ You can read about our telemetry principles [here](/docs/get-started/self-hosted/telemetry/). To enable telemetry: ```hcl # Optional. Set to true to disable telemetry. disable_telemetry = false # Optional. 
An identifier to associate with telemetry data. user_provided_id = "elmer.fudd@acme.com" ``` To disable telemetry: ```hcl # Optional. Set to true to disable telemetry. disable_telemetry = true ``` --- # Snowbridge base64Decode transformation > Base64 decode message data from base64 byte array to decoded byte array representation for Snowbridge. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/builtin/base64Decode/ Introduced in version 2.1.0 `base64Decode`: Base64 decodes the message's data. This transformation base64 decodes the message's data from a base64 byte array to a byte array representation of the decoded data. `base64Decode` has no options. ## Configuration options ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/builtin/base64Decode-minimal-example.hcl) --- # Snowbridge base64Encode transformation > Base64 encode message data to base64 byte array representation for Snowbridge. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/builtin/base64Encode/ Introduced in version 2.1.0 `base64Encode`: Base64 encodes the message's data. This transformation base64 encodes the message's data to a base64 byte array. `base64Encode` has no options. ## Configuration options ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/builtin/base64Encode-minimal-example.hcl) --- # Built-in Snowbridge transformations > Use built-in Snowbridge transformations for Snowplow data including enriched filtering, JSON conversion, and base64 encoding. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/builtin/ Snowbridge includes several configurable built-in transformations.
| Transformation | Functionality | Snowplow data only |
| ------------------------------- | -------------------------------------------------------------------------------------- | ------------------ |
| `base64Decode` | Base64-decodes the message's data. | |
| `base64Encode` | Base64-encodes the message's data. | |
| `jq` | Runs a `jq` command on the message data, and outputs the result of the command. | |
| `jqFilter` | Filters messages based on the output of a `jq` command. | |
| `spEnrichedFilter` | Filters messages based on a regex match against an atomic field. | ✅ |
| `spEnrichedFilterContext` | Filters messages based on a regex match against a field in an entity. | ✅ |
| `spEnrichedFilterUnstructEvent` | Filters messages based on a regex match against a field in a custom event. | ✅ |
| `spEnrichedSetPk` | Sets the message's destination partition key. | ✅ |
| `spEnrichedToJson` | Transforms a message's data from Snowplow enriched tsv string format to a JSON object. | ✅ |
| `spGtmssPreview` | Attaches a GTM SS preview mode header. | ✅ |

--- # Snowbridge jq transformation > Run jq commands on message data to transform JSON structures with custom queries and helper functions, with Snowbridge. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/builtin/jq/ > **Note:** This transformation was added in version 3.0.0. [jq](https://github.com/jqlang/jq) is a lightweight and flexible command-line JSON processor. Snowbridge's jq features utilise the [gojq](https://github.com/itchyny/gojq) package, which is a pure Go implementation of jq. jq is Turing complete, so these features allow you to configure arbitrary logic dealing with JSON data structures. jq supports formatting values, mathematical operations, boolean comparisons, regex matches, and many more useful features. To get started with the jq command, see the [tutorial](https://jqlang.github.io/jq/tutorial/) and the [full reference manual](https://jqlang.github.io/jq/manual/).
[This open-source jq playground tool](https://jqplay.org/) may also be helpful. For most use cases, you are unlikely to encounter them, but note that there are [some small differences](https://github.com/itchyny/gojq?tab=readme-ov-file#difference-to-jq) between jq and gojq. `jq` runs a jq command on the message data, and outputs the result of the command. While jq supports multi-element results, commands must output only a single element - this single element can be an array data type. If the provided jq command results in an error, the message will be considered invalid, and will be sent to the failure target. The minimal example here returns the input data as a single element array, and the full example maps the data to a new data structure. The jq transformation will remove any keys with null values from the data. ## Configuration options Minimal configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/builtin/jq-minimal-example.hcl) Every configuration option: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/builtin/jq-full-example.hcl) ## Helper functions In addition to the native functions available in the jq language, the following helper functions are available for use in a jq query: - `epoch` converts a Go `time.Time` timestamp to an epoch timestamp in seconds, as integer type. jq's native timestamp-based functions expect integer input, but the Snowplow Analytics SDK provides base level timestamps as `time.Time`. This function can be chained with jq native functions to get past this limitation. For example: ```text { foo: .collector_tstamp | epoch | todateiso8601 } ``` - `epochMillis` converts a Go `time.Time` timestamp to an epoch timestamp in milliseconds, as unsigned integer type. 
Because of how integers are handled in Go, unsigned integers aren't compatible with jq's native timestamp functions, so the `epoch` function truncates to seconds, and the `epochMillis` function exists in case milliseconds are needed. This function cannot be chained with native jq functions, but where milliseconds matter for a value, use this function. ```text { foo: .collector_tstamp | epochMillis } ``` - `hash(algorithm, salt)` hashes the input value. To use unsalted hash, pass an empty string for salt. Salt may be provided as an environment variable using hcl syntax. The following hash algorithms are supported: - `sha1` - SHA-1 hash (160 bits) - `sha256` - SHA-256 hash (256 bits) - `md5` - MD5 hash (128 bits) ```text { foo: .user_id | hash("sha1"; "${env.SHA1_SALT}") } ``` --- # Snowbridge jqFilter transformation > Filter messages using jq commands that return boolean results to keep or discard messages, with Snowbridge. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/builtin/jqFilter/ > **Note:** This transformation was added in version 3.0.0. [jq](https://github.com/jqlang/jq) is a lightweight and flexible command-line JSON processor. Snowbridge's jq features utilise the [gojq](https://github.com/itchyny/gojq) package, which is a pure Go implementation of jq. jq is Turing complete, so these features allow you to configure arbitrary logic dealing with JSON data structures. jq supports formatting values, mathematical operations, boolean comparisons, regex matches, and many more useful features. To get started with jq command, see the [tutorial](https://jqlang.github.io/jq/tutorial/), and [full reference manual](https://jqlang.github.io/jq/manual/). [This open-source jq playground tool](https://jqplay.org/) may also be helpful. 
For most use cases, you are unlikely to encounter them, but note that there are [some small differences](https://github.com/itchyny/gojq?tab=readme-ov-file#difference-to-jq) between jq and gojq. `jqFilter` filters messages based on the output of a jq command which is run against the data. The provided command must return a boolean result. `false` filters the message out, `true` keeps it. If the provided jq command returns a non-boolean value, or results in an error, then the message will be considered invalid, and will be sent to the failure target. ## Configuration options This example filters out all data that doesn't have an `app_id` key. Minimal configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/builtin/jqFilter-minimal-example.hcl) Every configuration option: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/builtin/jqFilter-full-example.hcl) ## Filtering examples The following examples demonstrate common filtering patterns for Snowplow enriched events. You can filter Snowplow enriched events based on any field in the data. Match an atomic field to any value from a list:

```hcl
transform {
  use "jqFilter" {
    # Keep only web and mobile data
    jq_command = ".platform == \"web\" or .platform == \"mobile\""
  }
}
```

--- # Snowbridge spEnrichedFilter transformation > Filter Snowplow enriched events based on regex matches against atomic fields with keep or drop actions, with Snowbridge. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/builtin/spEnrichedFilter/ > **Warning:** This transformation is deprecated and can result in unexpected behavior when matching integers. Use the [`jqFilter`](/docs/api-reference/snowbridge/configuration/transformations/builtin/jqFilter/) transformation instead, which provides more robust and flexible filtering capabilities. `spEnrichedFilter`: Specific to Snowplow data.
Filters messages based on a regex match against an atomic field. This transformation is for use on base-level atomic fields, rather than fields from contexts, or custom events — which can be achieved with `spEnrichedFilterContext` and `spEnrichedFilterUnstructEvent`. Filters can be used in one of two ways, which is determined by the `filter_action` option. `filter_action` determines the behavior of the app when the regex provided evaluates to `true`. If it's set to `"keep"`, the app will complete the remaining transformations and send the message to the destination (unless a subsequent filter determines otherwise). If it's set to `"drop"`, the message will be acked and discarded, without continuing to the next transformation or target. This example filters out all data whose `platform` value does not match either `web` or `mobile`. ## Configuration options Minimal configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/snowplow-builtin/spEnrichedFilter-minimal-example.hcl) Every configuration option: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/snowplow-builtin/spEnrichedFilter-full-example.hcl) --- # Snowbridge spEnrichedFilterContext transformation > Filter Snowplow enriched events based on regex matches against entity fields using jsonpath notation, with Snowbridge. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/builtin/spEnrichedFilterContext/ > **Warning:** This transformation is deprecated and can result in unexpected behavior when matching integers. Use the [`jqFilter`](/docs/api-reference/snowbridge/configuration/transformations/builtin/jqFilter/) transformation instead, which provides more robust and flexible filtering capabilities. `spEnrichedFilterContext`: Specific to Snowplow data. 
Filters messages based on a regex match against a field in an entity. This transformation is for use on fields from entities (contexts). Note that if the same context is present in the data more than once, one instance of a match is enough for the regex condition to be considered a match — and the message to be kept. The full parsed context name must be provided, in camel case, in the format returned by the Snowplow analytics SDK: `contexts_{vendor}_{name}_{major version}` — for example `contexts_nl_basjes_yauaa_context_1`. The path to the field to be matched must then be provided as a jsonpath (dot notation and square braces only) — for example `test1.test2[0].test3`. Filters can be used in one of two ways, which is determined by the `filter_action` option. `filter_action` determines the behavior of the app when the regex provided evaluates to `true`. If it's set to `"keep"`, the app will complete the remaining transformations and send the message to the destination (unless a subsequent filter determines otherwise). If it's set to `"drop"`, the message will be acked and discarded, without continuing to the next transformation or target. The below example keeps messages which contain `prod` in the `environment` field of the `contexts_com_acme_env_context_1` context. Note that if the `contexts_com_acme_env_context_1` context is attached more than once, the message will be kept if _any_ of the values at `environment` match `prod`. ## Configuration options Minimal configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/snowplow-builtin/spEnrichedFilterContext-minimal-example.hcl) Every configuration option: ```hcl loading...
``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/snowplow-builtin/spEnrichedFilterContext-full-example.hcl) --- # Snowbridge spEnrichedFilterUnstructEvent transformation > Filter Snowplow enriched events based on regex matches against self-describing event fields using jsonpath notation, with Snowbridge. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/builtin/spEnrichedFilterUnstructEvent/ > **Warning:** This transformation is deprecated and can result in unexpected behavior when matching integers. Use the [`jqFilter`](/docs/api-reference/snowbridge/configuration/transformations/builtin/jqFilter/) transformation instead, which provides more robust and flexible filtering capabilities. `spEnrichedFilterUnstructEvent`: Specific to Snowplow data. Filters messages based on a regex match against a field in a custom event. This transformation is for use on fields from custom events. The event name must be provided as it appears in the `event_name` field of the event (e.g. `add_to_cart`). Optionally, a regex can be provided to match against the stringified version of the event (e.g. `1-*-*`). The path to the field to match against must be provided as a jsonpath (dot notation and square braces only) — for example `test1.test2[0].test3`. Filters can be used in one of two ways, which is determined by the `filter_action` option. `filter_action` determines the behavior of the app when the regex provided evaluates to `true`. If it's set to `"keep"`, the app will complete the remaining transformations and send the message to the destination (unless a subsequent filter determines otherwise). If it's set to `"drop"`, the message will be acked and discarded, without continuing to the next transformation or target. ## Configuration options This example keeps all events whose `add_to_cart` event data at the `sku` field matches `test-data`.
Minimal configuration: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/snowplow-builtin/spEnrichedFilterUnstructEvent-minimal-example.hcl) Every configuration option: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/snowplow-builtin/spEnrichedFilterUnstructEvent-full-example.hcl) --- # Snowbridge spEnrichedSetPk transformation > Set destination partition key for Snowplow enriched events using atomic field values, with Snowbridge. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/builtin/spEnrichedSetPk/ `spEnrichedSetPk`: Specific to Snowplow data. Sets the message's destination partition key to an atomic field from a Snowplow Enriched tsv string. The input data must be a valid Snowplow enriched TSV. ## Configuration options `spEnrichedSetPk` only takes one option — the field to use for the partition key. ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/snowplow-builtin/spEnrichedSetPk-minimal-example.hcl) Note: currently, setting the partition key to fields in custom events and contexts is unsupported. --- # Snowbridge spEnrichedToJson transformation > Transform Snowplow enriched TSV data to JSON format using the Go Analytics SDK, with Snowbridge. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/builtin/spEnrichedToJson/ `spEnrichedToJson`: Specific to Snowplow data. Transforms a message's data from Snowplow Enriched tsv string format to a JSON object. The input data must be a valid Snowplow enriched TSV. `spEnrichedToJson` has no options. ## Configuration options ```hcl loading...
``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/snowplow-builtin/spEnrichedToJson-minimal-example.hcl) The transformation to JSON is done via the [analytics SDK](/docs/api-reference/analytics-sdk/) logic, specifically in this case the [Golang analytics SDK](/docs/api-reference/analytics-sdk/analytics-sdk-go/). In brief, the relevant logic here is that: - If a field is not populated in the original event, it won't have a key in the resulting JSON - In the TSV, there's a separate field for `contexts` (sent via tracker), and `derived_contexts` (attached during enrichment). In the analytics SDK, there is one key per context, regardless of which type. (Technically, it's one key per major version of a context. So if you had a 1.0.0 and a 2.0.0 of the same one, you'd have two keys). --- # Snowbridge spGtmssPreview transformation > Extract GTM Server Side preview mode header from Snowplow context for debugging with GTM preview mode, with Snowbridge. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/builtin/spGtmssPreview/ > **Note:** This transformation was added in version 2.3.0 `spGtmssPreview`: Specific to Snowplow data. Extracts a value from the `x-gtm-server-preview` field of a [preview mode context](https://github.com/snowplow/iglu-central/blob/master/schemas/com.google.tag-manager.server-side/preview_mode/jsonschema/1-0-0), and attaches it as the GTM SS preview mode header, to enable easier debugging using GTM SS preview mode. Only one preview mode context should be sent at a time. > **Note:** As of version 3.0.0: > > Invalid preview headers sent to GTM SS can result in requests failing, which may be problematic. There is insufficient information available about the values to allow us to confidently validate them, but we do two things to avoid this problem. > > First, we validate to ensure that the value is a valid base64 string. 
Second, we compare the age of the event (based on `collector_tstamp`) to ensure it is under a configurable timeout age. If either of these conditions fails, we treat the message as invalid and output it to the failure target. ## Configuration options ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/snowplow-builtin/spGtmssPreview-minimal-example.hcl) ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/snowplow-builtin/spGtmssPreview-full-example.hcl) --- # Snowbridge Script transformation examples > View example Snowbridge JavaScript transformations for Snowplow and non-Snowplow data including filtering, field modification, and partition keys. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/custom-scripts/examples/ Examples showing the transformation of Snowplow or non-Snowplow data. ## Non-Snowplow data For this example, the input data is a JSON string which looks like this: ```json { "name": "Bruce", "id": "b47m4n", "batmobileCount": 1 } ``` The script filters out any data with a `batmobileCount` less than 1; otherwise, it updates the Data's `name` field to "Bruce Wayne", and sets the PartitionKey to the value of `id`: ```js loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/custom-scripts/examples/js-non-snowplow-script-example.js) The configuration for this script is: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/custom-scripts/examples/js-non-snowplow-config-example.hcl) ## Snowplow data For this example, the input data is a valid Snowplow TSV event - so we can enable `snowplow_mode`, which will convert the data to JSON before passing it to the script as a JSON object.
The script below filters out non-web data based on the `platform` value; otherwise, it checks for a `user_id` value, setting a new `uid` field to that value if it's found, or `domain_userid` if not. It also sets the `PartitionKey` to the `app_id` value. ```js loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/custom-scripts/examples/js-snowplow-script-example.js) The configuration for this script is: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/custom-scripts/examples/js-snowplow-config-example.hcl) --- # Custom Snowbridge script transformations > Create custom JavaScript Snowbridge transformations to modify data, filter messages, set partition keys, and add HTTP headers. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/custom-scripts/ Custom transformation scripts may be defined in JavaScript and provided to Snowbridge. ## The scripting interface The script must define a main function with a single argument. Snowbridge will pass the `engineProtocol` data structure as the argument: ```go type engineProtocol struct { FilterOut bool PartitionKey string Data interface{} HTTPHeaders map[string]string } ``` This structure is represented as an object in the script engine, and serves as both the input and output of the script. Scripts must define a `main` function with a single input argument (JSDoc for type information is optional): ```js /** * @typedef {object} EngineProtocol * @property {boolean} FilterOut * @property {string} PartitionKey * @property {(string | Object.<string, any>)} Data * @property {Object.<string, string>} HTTPHeaders */ /** * @param {EngineProtocol} input * @return {Partial<EngineProtocol>} */ function main(input) { return input } ``` ## Accessing data Scripts can access the message Data at `input.Data`, and can return modified data by returning it in the `Data` field of the output.
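As a minimal illustration of this interface (a hedged sketch, not one of Snowbridge's own examples), a script working on plain string data could modify `Data` and pass the other fields through:

```javascript
// Sketch: read input.Data (assumed to be a plain string here) and
// return a modified copy. The returned object maps to engineProtocol.
function main(input) {
  return {
    FilterOut: false,
    PartitionKey: input.PartitionKey,
    Data: input.Data + " [processed]"
  };
}
```

Returning the mutated `input` object itself should be an equivalent style, since the engine only inspects the returned object.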
Likewise, the partition key to be used for the destination can be read from `input.PartitionKey` and set via the `PartitionKey` field of the output. By default, the input's `Data` field will be a string in [enriched TSV format](/docs/pipeline/enriched-tsv-format/). This can be changed with the [spEnrichedToJson](/docs/api-reference/snowbridge/configuration/transformations/builtin/spEnrichedToJson/) transform, or the JavaScript transformation itself has a `snowplow_mode` option, which transforms the data to an object first. The output of the script must be an object which maps to `engineProtocol`. ### `snowplow_mode` `snowplow_mode` uses the [Go Analytics SDK](/docs/api-reference/analytics-sdk/analytics-sdk-go/) to parse the TSV fields into an object suitable for working with in Go. The result of the [`ParsedEvent.ToMap()`](https://pkg.go.dev/github.com/snowplow/snowplow-golang-analytics-sdk/analytics#ParsedEvent.ToMap) method is the input to your transform function. The [keys of the resulting map](https://github.com/snowplow/snowplow-golang-analytics-sdk/blob/a3430fbe576483d615b713120cfb5e443897d572/analytics/mappings.go#L153) are defined by the Analytics SDK. Values are primitives (string, number, boolean) or, for timestamps (e.g. `derived_tstamp`, `collector_tstamp`), [`time.Time`](https://pkg.go.dev/time#Time) objects. > **Tip:** To work with native JavaScript [Date](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date) objects for timestamps, construct one via `new Date(ts.UnixMilli())` where `ts` is a `time.Time` instance. > > When returned as part of `Data`, `time.Time` instances will [serialize in JSON](https://pkg.go.dev/time#Time.MarshalJSON) as [RFC3339](https://www.rfc-editor.org/rfc/rfc3339.html) strings.
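To make the tip above concrete, here is a hedged sketch. In a real script, `ts` would come from a field such as `input.Data["collector_tstamp"]` with `snowplow_mode` enabled; here, a stand-in object with a `UnixMilli()` method mimics a Go `time.Time` so the snippet is self-contained:

```javascript
// Convert a Go time.Time (as exposed to the script engine) to a JS Date.
function tstampToDate(ts) {
  return new Date(ts.UnixMilli());
}

// Stand-in for a time.Time holding 2021-08-29T12:01:05.787Z;
// in Snowbridge this would be e.g. input.Data["collector_tstamp"].
var fakeTs = { UnixMilli: function () { return 1630238465787; } };
var d = tstampToDate(fakeTs); // a native Date, usable with getUTCHours() etc.
```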
Structured data ([Self Describing Event](/docs/fundamentals/events/#self-describing-events) payloads and [Entities](/docs/fundamentals/entities/)) will have keys with a prefix of `unstruct_event_` or `contexts_`, the vendor name converted to snake case, the event/entity name in snake case, and the [schema model version](/docs/api-reference/iglu/common-architecture/schemaver/). For example: | **Schema URI** | **Type** | **Key** | | --------------------------------------------------------------- | -------- | ---------------------------------------------------- | | `iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0` | Entity | `contexts_com_snowplowanalytics_snowplow_web_page_1` | | `iglu:org.w3/PerformanceNavigationTiming/jsonschema/1-0-0` | Entity | `contexts_org_w3_performance_navigation_timing_1` | | `iglu:com.urbanairship.connect/OPEN/jsonschema/1-0-0` | Event | `unstruct_event_com_urbanairship_connect_open_1` | Timestamps in structured data values will always have their primitive representation (string/number) and not be `time.Time` instances. ## Transforming Data For all the below examples, the input is a string representation of the below JSON object. For Snowplow data, using `snowplow_mode` will produce a JSON object input - see [the snowplow example](/docs/api-reference/snowbridge/configuration/transformations/custom-scripts/examples/). ```json { "name": "Bruce", "id": "b47m4n", "batmobileCount": 1 } ``` To modify the message data, return an object which conforms to EngineProtocol, with the `Data` field set to the modified data. The `Data` field may be returned as either a string, or an object. ```js loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/custom-scripts/create-a-script-modify-example.js) ## Filtering If the `FilterOut` field of the output is returned as `true`, the message will be acknowledged immediately and won't be sent to the target. 
This will be the behavior regardless of what is returned in the other fields of the protocol. ```js loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/custom-scripts/create-a-script-filter-example.js) ## Setting the Partition Key To set the Partition Key in the message, you can set the input's `PartitionKey` field and return it: ```js loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/custom-scripts/create-a-script-setpk-example.js) Or, if modifying the data as well, return the modified data and `PartitionKey` field: ```js loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/custom-scripts/create-a-script-setpk-modify-example.js) ## Setting an HTTP header For the `http` target only, you can specify a set of HTTP headers, which will be appended to the configured headers for the `http` target. Do so by providing an object in the `HTTPHeaders` field: ```js loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/custom-scripts/create-a-script-header-example.js) The headers will only be included if the target has the [`dynamic_headers = true` setting](/docs/api-reference/snowbridge/configuration/targets/http/#configuration-options) configured. ## Helper functions - `hash(input, algorithm)` hashes the input value. Salt is configured using the `hash_salt_secret` parameter in the hcl configuration.
If no value is provided, this function will perform an unsalted hash. The following hash algorithms are supported: - `sha1` - SHA-1 hash (160 bits) - `sha256` - SHA-256 hash (256 bits) - `md5` - MD5 hash (128 bits) ```text hash(input.Data["app_id"], "sha1") ``` ## Configuration Once your script is ready, you can configure it in the app by following the [JavaScript](/docs/api-reference/snowbridge/configuration/transformations/custom-scripts/javascript-configuration/) configuration page. You can also find some complete example use cases in [the examples section](/docs/api-reference/snowbridge/configuration/transformations/custom-scripts/examples/). --- # Snowbridge JavaScript transformation configuration > Configure Snowbridge JavaScript transformations using the goja embedded JavaScript engine with script paths and timeout settings. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/custom-scripts/javascript-configuration/ This section details how to configure the transformation, once a script is written. You can also find some complete example use cases in [the examples section](/docs/api-reference/snowbridge/configuration/transformations/custom-scripts/examples/). The custom JavaScript script transformation uses the [goja](https://pkg.go.dev/github.com/dop251/goja) embedded JavaScript engine to run scripts on the data. If a script errors or times out, a [transformation failure](/docs/api-reference/snowbridge/concepts/failure-model/#transformation-failure) occurs. Scripts must be available to the runtime of the application at the path provided in the `script_path` configuration option. For Docker, this means mounting the script to the container and providing that path. ## Configuration options Minimal configuration: ```hcl loading...
``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/custom-scripts/js-configuration-minimal-example.hcl) Every configuration option: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/custom-scripts/js-configuration-full-example.hcl) --- # Snowbridge transformation configuration > Configure transformations and filters to modify or exclude messages with built-in transformations or custom scripts. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/configuration/transformations/ You can configure any number of transformations to run on the data one after another - transformations will run in the order provided. (You can specify the same transformation more than once, if needed.) All transformations operate on a single message basis. If you're filtering the data, it's best to provide the filter first, for efficiency. If you're working with Snowplow enriched messages, you can configure scripting transformations, or any of the built-in transformations, which are specific to Snowplow data. If you're working with any other type of data, you can create transformations via scripting transformations. ## Transformations and filters Transformations modify messages in-flight. They might rename fields, perform computations, set partition keys, or modify data. For example, if you wanted to change a `snake_case` field name to `camelCase`, you would use a transformation to do this. Filters are a type of transformation which prevents Snowbridge from further processing data based on a condition. When data is filtered, Snowbridge will ack the message without sending it to the target. For example, if you only wanted to send page views to the destination, you would set up a filter with a condition where `event_name` matches the string `page_view`.
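As an illustration (a sketch only, assuming a custom JavaScript transformation with `snowplow_mode` enabled so that `input.Data` is an object keyed by atomic field names), that page view filter could look like:

```javascript
// Filter sketch: ack anything that is not a page view without sending it on.
function main(input) {
  if (input.Data["event_name"] !== "page_view") {
    return { FilterOut: true }; // message is acked, not sent to the target
  }
  return input; // page views pass through unchanged
}
```

In practice, a built-in Snowplow filter transformation can express this condition without a custom script.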
## Configuration To configure transformations, supply one or more `transform {}` blocks. Choose the transformation using `use "${transformation_name}"`. Example: the configuration below first filters out any `event_name` which does not match the regex `^page_view$`, then runs a custom JavaScript script to change the app\_id value to `"1"`: ```hcl loading... ``` [View on GitHub](https://github.com/snowplow/snowbridge/blob/v4.1.0/assets/docs/configuration/transformations/transformations-overview-example.hcl) --- # Getting started with Snowbridge > Install and configure Snowbridge using Docker or binaries to start replicating event streams between sources and targets. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/getting-started/ The fastest way to get started and experiment with Snowbridge is to run it via the command line: 1. Download the pre-compiled ZIP from the [releases](https://github.com/snowplow/snowbridge/releases/) 2. Unzip and run the binary with e.g. `echo "hello world" | ./snowbridge` The defaults for the app are stdin source, no transformations, and stdout target - so this should print the message 'hello world' along with some logging data to the console. Next, the app can be configured using HCL - simply create a configuration file, and provide the path to it using the `SNOWBRIDGE_CONFIG_FILE` environment variable. You can find a guide to configuration in the [configuration section](/docs/api-reference/snowbridge/configuration/). **Telemetry notice** By default, Snowplow collects telemetry data for Snowbridge (since version 1.0.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1).
If you wish to help us further, you can optionally provide your email (or just a UUID) in the `user_provided_id` configuration setting. If you wish to disable telemetry, you can do so by setting `disable_telemetry` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information. ## Distribution options There are two distributions of Snowbridge. **Default:** The default distribution contains everything except for the [Kinesis source](/docs/api-reference/snowbridge/configuration/sources/kinesis/), i.e. the ability to read from AWS Kinesis. This distribution is all licensed under the [Snowplow Limited Use License Agreement](/limited-use-license-1.0/). _(If you are uncertain how it applies to your use case, check our answers to [frequently asked questions](/docs/licensing/limited-use-license-faq/).)_ It’s available on Docker: ```bash docker pull snowplow/snowbridge:4.1.0 docker run snowplow/snowbridge:4.1.0 ``` **AWS-specific (includes Kinesis source):** The AWS-specific distribution contains everything, including the [Kinesis source](/docs/api-reference/snowbridge/configuration/sources/kinesis/), i.e. the ability to read from AWS Kinesis. Like the default distribution, it’s licensed under the [Snowplow Limited Use License Agreement](/limited-use-license-1.0/) ([frequently asked questions](/docs/licensing/limited-use-license-faq/)). However, this distribution has a dependency on [twitchscience/kinsumer](https://github.com/twitchscience/kinsumer), which is licensed by Twitch under the [Amazon Software License](https://github.com/twitchscience/kinsumer/blob/master/LICENSE). To comply with the [Amazon Software License](https://github.com/twitchscience/kinsumer/blob/master/LICENSE), you may only use this distribution of Snowbridge _“with the web services, computing platforms or applications provided by Amazon.com, Inc. 
or its affiliates, including Amazon Web Services, Inc.”_ It’s available on Docker: ```bash docker pull snowplow/snowbridge:4.1.0-aws-only docker run snowplow/snowbridge:4.1.0-aws-only ``` *** ## Deployment The app can be deployed via services like EC2, ECS, or Kubernetes using Docker. Configuration and authentication can be done by mounting the relevant files, and/or setting the relevant environment variables as per the standard authentication methods for cloud services. --- # Replicate event streams in real time with Snowbridge > Replicate Snowplow event streams to multiple destinations with Snowbridge, a configurable tool supporting Kinesis, PubSub, Kafka, HTTP, and more. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/ Snowbridge is a flexible, low-latency tool which can replicate streams of data of any type to external destinations, optionally filtering or transforming the data along the way. It can be used to consume, transform, and relay data to any third-party platform which supports HTTP or is listed as a target below — in real time. ## Features - [Kinesis](https://aws.amazon.com/kinesis), [SQS](https://aws.amazon.com/sqs/), [PubSub](https://cloud.google.com/pubsub), [Kafka](https://kafka.apache.org/), and stdin sources - [Kinesis](https://aws.amazon.com/kinesis), [SQS](https://aws.amazon.com/sqs/), [PubSub](https://cloud.google.com/pubsub), [Kafka](https://kafka.apache.org/), [Event Hubs](https://azure.microsoft.com/en-us/services/event-hubs/), HTTP (e.g. for an [integration with Google Tag Manager Server Side](/docs/destinations/forwarding-events/google-tag-manager-server-side/)), and stdout targets - Custom in-flight JS transformations - Low-latency Snowplow-specific data transformations - Statsd and Sentry reporting and monitoring interfaces Snowbridge is a generic tool, built to work on any type of data, developed by the Snowplow team.
It began life as a closed-source tool developed to deliver various requirements related to Snowplow data, and so some of the features are specific to that data. > **Note:** Version 4.0.0 includes breaking changes to response rule evaluation. See the [upgrade guide](/docs/api-reference/snowbridge/X-X-upgrade-guide/) for migration information. --- # Testing Snowbridge locally > Test Snowbridge configurations locally using stdin and stdout sources before deploying to production stream infrastructure. > Source: https://docs.snowplow.io/docs/api-reference/snowbridge/testing/ The easiest way to test Snowbridge configuration (e.g. transformations) is to run it locally. Ideally, you should also use a sample of data that is as close to your real-world data as possible. The sample file should contain the events/messages you’d like to test with, one per line. ## Snowplow data You can get started working with Snowplow data by [downloading this file](/assets/files/input-5ef65c98e57e93e452e9f6bd5a413fed.txt/) which contains a sample of web and mobile Snowplow events in TSV format. However, if you need events that match your actual events, generate your own events sample. To generate your own sample of Snowplow data, you can follow the [guide to use Snowplow Micro](/docs/testing/snowplow-micro/local/) to generate test data, using the `--output-tsv` option to get the data into a file, as per the [exporting to TSV section](/docs/testing/snowplow-micro/local/#exporting-events). For example, here we’re using a file named `data.tsv`: ```bash docker run -p 9090:9090 snowplow/snowplow-micro:4.1.1 --output-tsv > data.tsv ``` Point some test environment tracking to `localhost:9090`, and your events should land in `data.tsv`. ## Testing Snowbridge locally You can run Snowbridge locally via Docker: ```bash docker run --env ACCEPT_LIMITED_USE_LICENSE=yes snowplow/snowbridge:4.1.0 ``` The default configuration for Snowbridge uses the `stdin` source and the `stdout` target.
So, to test sending data through with no transformations, we can run the following command (where `data.tsv` is a file with Snowplow events in TSV format): ```bash cat data.tsv | docker run --env ACCEPT_LIMITED_USE_LICENSE=yes -i snowplow/snowbridge:4.1.0 ``` This will print the data to the terminal, along with logs. > **Note:** The metrics reported in the logs may state that no data has been processed. This is because the app reached the end of output and exited before the default reporting period of 1 second. You can safely ignore this. You can output the results to a file to make it easier to examine them: ```bash cat data.tsv | docker run -i snowplow/snowbridge:4.1.0 > output.txt ``` > **Tip:** The output (in `output.txt`) will contain more than the data itself. There will be additional fields called `PartitionKey`, `TimeCreated`, `TimePulled`, and `TimeTransformed`. The data that reaches the target in a production setup is under `Data`. ### Adding configuration To add a specific configuration to test, create a configuration file (`config.hcl`) and pass it to the Docker container. You will need to [mount the file](https://docs.docker.com/storage/bind-mounts/) to the default config path of `/tmp/config.hcl`: ```bash cat data.tsv | docker run -i \ --env ACCEPT_LIMITED_USE_LICENSE=yes \ --mount type=bind,source=$(pwd)/config.hcl,target=/tmp/config.hcl \ snowplow/snowbridge:4.1.0 > output.txt ``` Note that Docker expects absolute paths for mounted files - here we use `$(pwd)` but you can specify the absolute path manually too. To test transformations, you only need to add the `transform` block(s) to your configuration file. Don’t specify the `source` and `target` blocks to leave them on default (`stdin` and `stdout`). To test specific sources or targets, add the respective `source` or `target` blocks.
For example, see the [configuration](/docs/api-reference/snowbridge/configuration/targets/http/#example-for-google-tag-manager-server-side) for an HTTP target sending data to Google Tag Manager Server Side. ### Adding a custom transformation script You can add custom scripts by mounting a file, similarly to the above. Assuming the script is in `script.js`, that looks like this: ```bash cat data.tsv | docker run -i \ --env ACCEPT_LIMITED_USE_LICENSE=yes \ --mount type=bind,source=$(pwd)/config.hcl,target=/tmp/config.hcl \ --mount type=bind,source=$(pwd)/script.js,target=/tmp/script.js \ snowplow/snowbridge:4.1.0 > output.txt ``` The transformation config should point to the path of the script _inside_ the container (`/tmp/script.js` above). For example, the transformation block in the configuration might look like this: ```hcl transform { use "js" { script_path = "/tmp/script.js" } } ``` ### Adding an HTTP request template For the HTTP target, you can add a custom HTTP request template by mounting a file, similarly to the above. Assuming the template is in `template.tpl`, that looks like this: ```bash cat data.tsv | docker run -i \ --env ACCEPT_LIMITED_USE_LICENSE=yes \ --mount type=bind,source=$(pwd)/config.hcl,target=/tmp/config.hcl \ --mount type=bind,source=$(pwd)/template.tpl,target=/tmp/template.tpl \ snowplow/snowbridge:4.1.0 ``` The HTTP target config should point to the path of the template _inside_ the container (`/tmp/template.tpl` above). For example, the HTTP target block in the configuration might look like this: ```hcl target { use "http" { url = "http://myApi.com/events" template_file = "/tmp/template.tpl" } } ``` ## Using Docker Compose Instead of creating an intermediate output file with Snowplow Micro and then processing that file in Snowbridge, you can also use Docker Compose to run Micro and Snowbridge together (since Micro 3.0.0 and Snowbridge 3.6.2). This way, any events sent to Micro will immediately make it to Snowbridge. Here is an example setup.
Copy the code into `docker-compose.yml` and run with `docker-compose up`: ```yaml services: micro: image: snowplow/snowplow-micro:4.1.1 ports: - "9090:9090" command: --output-tsv --destination http://snowbridge:8080 snowbridge: image: snowplow/snowbridge:4.1.0 environment: - SNOWBRIDGE_CONFIG_FILE=/tmp/config.hcl configs: - source: snowbridge_config target: /tmp/config.hcl configs: snowbridge_config: content: | license { accept = true } source { use "http" { url = "0.0.0.0:8080" } } transform { use "spEnrichedToJson" { } } # any other Snowbridge configuration ``` ## Further testing You can use either method above (TSV file or Docker Compose) to test all aspects of the app from a local environment too, including sources, targets, failure targets, metrics endpoints etc. In some cases, you'll need to ensure that the local environment has access to any required resources and can authenticate, such as connecting from a laptop to a cloud account/local mock of cloud resources, or setting up a local metrics server for testing. Once that’s done, provide Snowbridge with an hcl file configuring it to connect to those resources, and run it the same way as in the examples above. --- # Snowplow Micro REST API > Snowplow Micro REST API endpoints for querying good events, bad events, and resetting cache. > Source: https://docs.snowplow.io/docs/api-reference/snowplow-micro/api/ This page documents the REST API of [Snowplow Micro](/docs/testing/snowplow-micro/). ## /micro/all > **Note:** This endpoint is not available when using Micro [through Snowplow Console](/docs/testing/snowplow-micro/console/). This endpoint responds with a summary JSON object of the number of total, good and bad events currently in the cache. ### HTTP method `GET`, `POST`, `OPTIONS` ### Response format Example: ```json { "total": 7, "good": 5, "bad": 2 } ``` ## /micro/good > **Note:** This endpoint is not available when using Micro [through Snowplow Console](/docs/testing/snowplow-micro/console/). 
This endpoint queries the good events, which are the events that have been successfully validated. ### HTTP method - `GET`: get _all_ the good events from the cache. - `POST`: get the good events with the possibility to filter. ### Response format JSON array of [GoodEvent](https://github.com/snowplow/snowplow-micro/blob/master/src/main/scala/com.snowplowanalytics.snowplow.micro/model.scala#L19)s. A `GoodEvent` contains 5 fields: - `rawEvent`: contains the [RawEvent](https://github.com/snowplow/enrich/blob/master/modules/common/src/main/scala/com.snowplowanalytics.snowplow.enrich/common/adapters/RawEvent.scala#L28). It corresponds to the format of a validated event just before being enriched. - `event`: contains the [canonical Snowplow Event](https://github.com/snowplow/snowplow-scala-analytics-sdk/blob/master/src/main/scala/com.snowplowanalytics.snowplow.analytics.scalasdk/Event.scala#L42). It is in the format of an event after enrichment, even if all the enrichments are deactivated. - `eventType`: type of the event. - `schema`: schema of the event in the case of an unstructured event. - `contexts`: contexts of the event.
An example of a response with one event can be found below: ```json [ { "rawEvent":{ "api":{ "vendor":"com.snowplowanalytics.snowplow", "version":"tp2" }, "parameters":{ "e":"pv", "duid":"36746fd2-8441-4ea2-8ad0-237d6f4c77cf", "vid":"1", "co":"{\"schema\":\"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0\",\"data\":[{\"schema\":\"iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0\",\"data\":{\"id\":\"5cea9899-10df-4ccf-bf66-8c36f4a4bba2\"}}]}", "eid":"bee0a6d7-fc17-4392-b2bc-2208e8e944f3", "url":"http://localhost:8000/", "refr":"http://localhost:8000/__/", "aid":"shop", "tna":"sp1", "cs":"UTF-8", "cd":"24", "stm":"1630238465752", "tz":"Europe/London", "tv":"js-3.1.3", "vp":"1000x660", "ds":"988x670", "res":"1920x1080", "cookie":"1", "p":"web", "dtm":"1630238465748", "uid":"tester", "lang":"en-US", "sid":"6d15a4fb-9623-4ba1-b876-5240e72e6970" }, "contentType":"application/json", "source":{ "name":"ssc-2.3.1-stdout$", "encoding":"UTF-8", "hostname":"0.0.0.0" }, "context":{ "timestamp":"2021-08-29T12:01:05.787Z", "ipAddress":"172.17.0.1", "useragent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0", "refererUri":"http://localhost:8000/", "headers":[ "Timeout-Access: ", "Connection: keep-alive", "Host: 0.0.0.0:9090", "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0", "Accept: */*", "Accept-Language: en-US, en;q=0.5", "Accept-Encoding: gzip", "Referer: http://localhost:8000/", "Origin: http://localhost:8000", "Cookie: micro=3734601f-5c3d-47c5-b367-0883e1ed74e6", "application/json" ], "userId":"3734601f-5c3d-47c5-b367-0883e1ed74e6" } }, "eventType":"page_view", "schema":"iglu:com.snowplowanalytics.snowplow/page_view/jsonschema/1-0-0", "contexts":[ "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0" ], "event":{ "app_id":"shop", "platform":"web", "etl_tstamp":"2021-08-29T12:01:05.792Z", "collector_tstamp":"2021-08-29T12:01:05.787Z", 
"dvce_created_tstamp":"2021-08-29T12:01:05.748Z", "event":"page_view", "event_id":"bee0a6d7-fc17-4392-b2bc-2208e8e944f3", "txn_id":null, "name_tracker":"sp1", "v_tracker":"js-3.1.3", "v_collector":"ssc-2.3.1-stdout$", "v_etl":"snowplow-micro-1.3.1-common-2.0.2", "user_id":"tester", "user_ipaddress":"172.17.0.1", "user_fingerprint":null, "domain_userid":"36746fd2-8441-4ea2-8ad0-237d6f4c77cf", "domain_sessionidx":1, "network_userid":"3734601f-5c3d-47c5-b367-0883e1ed74e6", "geo_country":null, "geo_region":null, "geo_city":null, "geo_zipcode":null, "geo_latitude":null, "geo_longitude":null, "geo_region_name":null, "ip_isp":null, "ip_organization":null, "ip_domain":null, "ip_netspeed":null, "page_url":"http://localhost:8000/", "page_title":null, "page_referrer":"http://localhost:8000/__/", "page_urlscheme":"http", "page_urlhost":"localhost", "page_urlport":8000, "page_urlpath":"/", "page_urlquery":null, "page_urlfragment":null, "refr_urlscheme":"http", "refr_urlhost":"localhost", "refr_urlport":8000, "refr_urlpath":"/__/", "refr_urlquery":null, "refr_urlfragment":null, "refr_medium":null, "refr_source":null, "refr_term":null, "mkt_medium":null, "mkt_source":null, "mkt_term":null, "mkt_content":null, "mkt_campaign":null, "contexts":{ "schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0", "data":[ { "schema":"iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0", "data":{ "id":"5cea9899-10df-4ccf-bf66-8c36f4a4bba2" } } ] }, "se_category":null, "se_action":null, "se_label":null, "se_property":null, "se_value":null, "unstruct_event":null, "tr_orderid":null, "tr_affiliation":null, "tr_total":null, "tr_tax":null, "tr_shipping":null, "tr_city":null, "tr_state":null, "tr_country":null, "ti_orderid":null, "ti_sku":null, "ti_name":null, "ti_category":null, "ti_price":null, "ti_quantity":null, "pp_xoffset_min":null, "pp_xoffset_max":null, "pp_yoffset_min":null, "pp_yoffset_max":null, "useragent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) 
Gecko/20100101 Firefox/91.0", "br_name":null, "br_family":null, "br_version":null, "br_type":null, "br_renderengine":null, "br_lang":"en-US", "br_features_pdf":null, "br_features_flash":null, "br_features_java":null, "br_features_director":null, "br_features_quicktime":null, "br_features_realplayer":null, "br_features_windowsmedia":null, "br_features_gears":null, "br_features_silverlight":null, "br_cookies":true, "br_colordepth":"24", "br_viewwidth":1000, "br_viewheight":660, "os_name":null, "os_family":null, "os_manufacturer":null, "os_timezone":"Europe/London", "dvce_type":null, "dvce_ismobile":null, "dvce_screenwidth":1920, "dvce_screenheight":1080, "doc_charset":"UTF-8", "doc_width":988, "doc_height":670, "tr_currency":null, "tr_total_base":null, "tr_tax_base":null, "tr_shipping_base":null, "ti_currency":null, "ti_price_base":null, "base_currency":null, "geo_timezone":null, "mkt_clickid":null, "mkt_network":null, "etl_tags":null, "dvce_sent_tstamp":"2021-08-29T12:01:05.752Z", "refr_domain_userid":null, "refr_dvce_tstamp":null, "derived_contexts":{}, "domain_sessionid":"6d15a4fb-9623-4ba1-b876-5240e72e6970", "derived_tstamp":"2021-08-29T12:01:05.783Z", "event_vendor":"com.snowplowanalytics.snowplow", "event_name":"page_view", "event_format":"jsonschema", "event_version":"1-0-0", "event_fingerprint":null, "true_tstamp":null } } ] ``` ### Filters When querying `/micro/good` with `POST` (`Content-Type: application/json` needs to be set in the headers of the request), it's possible to specify filters, thanks to a JSON in the data of the HTTP request. 
Example command to query the good events: ```bash curl -X POST -H 'Content-Type: application/json' /micro/good -d '' ``` An example filter JSON: ```json { "schema": "iglu:com.acme/example/jsonschema/1-0-0", "contexts": [ "com.snowplowanalytics.mobile/application/jsonschema/1-0-0", "com.snowplowanalytics.mobile/screen/jsonschema/1-0-0" ], "limit": 10 } ``` The filter supports the following fields: - `event_type`: type of the event (the `e` parameter). - `schema`: schema of a [self-describing event](/docs/fundamentals/events/#self-describing-events) (the schema of the self-describing JSON contained in `ue_pr` or `ue_px`). It automatically implies `event_type` = `ue`. - `contexts`: list of the schemas contained in the contexts of an event (the `co` or `cx` parameters). An event must contain **all** the contexts in the list to be returned; it can also contain more contexts than the ones specified in the request. - `limit`: limits the number of events in the response (the most recent events are returned). You only need to specify the fields you want to filter on. ## /micro/bad > **Note:** This endpoint is not available when using Micro [through Snowplow Console](/docs/testing/snowplow-micro/console/). This endpoint queries the bad events, i.e. the events that failed validation. ### HTTP method - `GET`: get _all_ the bad events from the cache. - `POST`: get the bad events, optionally filtered. ### Response format JSON array of [BadEvent](https://github.com/snowplow/snowplow-micro/blob/master/src/main/scala/com.snowplowanalytics.snowplow.micro/model.scala#L28)s. A `BadEvent` contains three fields: - `collectorPayload`: contains the [CollectorPayload](https://github.com/snowplow/enrich/blob/master/modules/common/src/main/scala/com.snowplowanalytics.snowplow.enrich/common/loaders/CollectorPayload.scala#L107) with all the raw information of the tracking event. 
This field can be empty if an error occurred before trying to validate a payload. - `rawEvent`: contains the [RawEvent](https://github.com/snowplow/enrich/blob/master/modules/common/src/main/scala/com.snowplowanalytics.snowplow.enrich/common/adapters/RawEvent.scala#L28). It corresponds to the format of a validated event just before being enriched. - `errors`: list of errors that occurred during the validation of the tracking event. An example of a response with one bad event can be found below: ```json [ { "collectorPayload":{ "api":{ "vendor":"com.snowplowanalytics.snowplow", "version":"tp2" }, "querystring":[], "contentType":"application/json", "body":"{\"schema\":\"iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4\",\"data\":[{\"e\":\"ue\",\"eid\":\"36c39024-7b1b-4c2c-ae85-e95a8cb8340a\",\"tv\":\"js-3.1.3\",\"tna\":\"spmicro\",\"aid\":\"sh0pspr33\",\"p\":\"web\",\"cookie\":\"1\",\"cs\":\"UTF-8\",\"lang\":\"en-US\",\"res\":\"1920x1080\",\"cd\":\"24\",\"tz\":\"Europe/London\",\"dtm\":\"1630234190717\",\"vp\":\"1000x660\",\"ds\":\"1003x2242\",\"vid\":\"1\",\"sid\":\"13c8f5ac-d999-4923-940d-b39f7b74aa94\",\"duid\":\"8a17bb29-e35c-4363-aec3-85b9b363f9bf\",\"uid\":\"tester\",\"refr\":\"http://localhost:8000/\",\"url\":\"http://localhost:8000/shop/\",\"ue_pr\":\"{\\\"schema\\\":\\\"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0\\\",\\\"data\\\":{\\\"schema\\\":\\\"iglu:test.example.iglu/cart_action_event/jsonschema/1-0-0\\\",\\\"data\\\":{\\\"type\\\":\\\"add\\\"}}}\",\"co\":\"{\\\"schema\\\":\\\"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0\\\",\\\"data\\\":[{\\\"schema\\\":\\\"iglu:test.example.iglu/product_entity/jsonschema/1-0-0\\\",\\\"data\\\":{\\\"sku\\\":\\\"hh456\\\",\\\"name\\\":\\\"One-size bucket 
hat\\\",\\\"price\\\":24.49,\\\"quantity\\\":\\\"2\\\"}},{\\\"schema\\\":\\\"iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0\\\",\\\"data\\\":{\\\"id\\\":\\\"fe0dd7c7-fb0b-43a2-b299-75d20baa94ec\\\"}}]}\",\"stm\":\"1630234190719\"}]}", "source":{ "name":"ssc-2.3.1-stdout$", "encoding":"UTF-8", "hostname":"0.0.0.0" }, "context":{ "timestamp":"2021-08-29T10:49:50.727Z", "ipAddress":"172.17.0.1", "useragent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0", "refererUri":"http://localhost:8000/", "headers":[ "Timeout-Access: ", "Connection: keep-alive", "Host: 0.0.0.0:9090", "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0", "Accept: */*", "Accept-Language: en-US, en;q=0.5", "Accept-Encoding: gzip", "Referer: http://localhost:8000/", "Origin: http://localhost:8000", "Cookie: micro=3734601f-5c3d-47c5-b367-0883e1ed74e6", "application/json" ], "userId":"3734601f-5c3d-47c5-b367-0883e1ed74e6" } }, "rawEvent":{ "api":{ "vendor":"com.snowplowanalytics.snowplow", "version":"tp2" }, "parameters":{ "e":"ue", "duid":"8a17bb29-e35c-4363-aec3-85b9b363f9bf", "vid":"1", "co":"{\"schema\":\"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0\",\"data\":[{\"schema\":\"iglu:test.example.iglu/product_entity/jsonschema/1-0-0\",\"data\":{\"sku\":\"hh456\",\"name\":\"One-size bucket hat\",\"price\":24.49,\"quantity\":\"2\"}},{\"schema\":\"iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0\",\"data\":{\"id\":\"fe0dd7c7-fb0b-43a2-b299-75d20baa94ec\"}}]}", "eid":"36c39024-7b1b-4c2c-ae85-e95a8cb8340a", "url":"http://localhost:8000/shop/", "refr":"http://localhost:8000/", "aid":"sh0pspr33", "tna":"spmicro", "cs":"UTF-8", "cd":"24", "stm":"1630234190719", "tz":"Europe/London", "tv":"js-3.1.3", "vp":"1000x660", "ds":"1003x2242", 
"ue_pr":"{\"schema\":\"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0\",\"data\":{\"schema\":\"iglu:test.example.iglu/cart_action_event/jsonschema/1-0-0\",\"data\":{\"type\":\"add\"}}}", "res":"1920x1080", "cookie":"1", "p":"web", "dtm":"1630234190717", "uid":"tester", "lang":"en-US", "sid":"13c8f5ac-d999-4923-940d-b39f7b74aa94" }, "contentType":"application/json", "source":{ "name":"ssc-2.3.1-stdout$", "encoding":"UTF-8", "hostname":"0.0.0.0" }, "context":{ "timestamp":"2021-08-29T10:49:50.727Z", "ipAddress":"172.17.0.1", "useragent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0", "refererUri":"http://localhost:8000/", "headers":[ "Timeout-Access: ", "Connection: keep-alive", "Host: 0.0.0.0:9090", "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0", "Accept: */*", "Accept-Language: en-US, en;q=0.5", "Accept-Encoding: gzip", "Referer: http://localhost:8000/", "Origin: http://localhost:8000", "Cookie: micro=3734601f-5c3d-47c5-b367-0883e1ed74e6", "application/json" ], "userId":"3734601f-5c3d-47c5-b367-0883e1ed74e6" } }, "errors":[ "Error while validating the event", "{\"schema\":\"iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-0\",\"data\":{\"processor\":{\"artifact\":\"snowplow-micro\",\"version\":\"1.2.1\"},\"failure\":{\"timestamp\":\"2021-08-29T10:49:50.739178Z\",\"messages\":[{\"schemaKey\":\"iglu:test.example.iglu/product_entity/jsonschema/1-0-0\",\"error\":{\"error\":\"ValidationError\",\"dataReports\":[{\"message\":\"$.quantity: string found, integer expected\",\"path\":\"$.quantity\",\"keyword\":\"type\",\"targets\":[\"string\",\"integer\"]}]}}]},\"payload\":{\"enriched\":{\"app_id\":\"sh0pspr33\",\"platform\":\"web\",\"etl_tstamp\":\"2021-08-29 10:49:50.731\",\"collector_tstamp\":\"2021-08-29 10:49:50.727\",\"dvce_created_tstamp\":\"2021-08-29 
10:49:50.717\",\"event\":\"unstruct\",\"event_id\":\"36c39024-7b1b-4c2c-ae85-e95a8cb8340a\",\"txn_id\":null,\"name_tracker\":\"spmicro\",\"v_tracker\":\"js-3.1.3\",\"v_collector\":\"ssc-2.3.1-stdout$\",\"v_etl\":\"snowplow-micro-1.2.1-common-2.0.2\",\"user_id\":\"tester\",\"user_ipaddress\":\"172.17.0.1\",\"user_fingerprint\":null,\"domain_userid\":\"8a17bb29-e35c-4363-aec3-85b9b363f9bf\",\"domain_sessionidx\":1,\"network_userid\":\"3734601f-5c3d-47c5-b367-0883e1ed74e6\",\"geo_country\":null,\"geo_region\":null,\"geo_city\":null,\"geo_zipcode\":null,\"geo_latitude\":null,\"geo_longitude\":null,\"geo_region_name\":null,\"ip_isp\":null,\"ip_organization\":null,\"ip_domain\":null,\"ip_netspeed\":null,\"page_url\":\"http://localhost:8000/shop/\",\"page_title\":null,\"page_referrer\":\"http://localhost:8000/\",\"page_urlscheme\":null,\"page_urlhost\":null,\"page_urlport\":null,\"page_urlpath\":null,\"page_urlquery\":null,\"page_urlfragment\":null,\"refr_urlscheme\":null,\"refr_urlhost\":null,\"refr_urlport\":null,\"refr_urlpath\":null,\"refr_urlquery\":null,\"refr_urlfragment\":null,\"refr_medium\":null,\"refr_source\":null,\"refr_term\":null,\"mkt_medium\":null,\"mkt_source\":null,\"mkt_term\":null,\"mkt_content\":null,\"mkt_campaign\":null,\"contexts\":\"{\\\"schema\\\":\\\"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0\\\",\\\"data\\\":[{\\\"schema\\\":\\\"iglu:test.example.iglu/product_entity/jsonschema/1-0-0\\\",\\\"data\\\":{\\\"sku\\\":\\\"hh456\\\",\\\"name\\\":\\\"One-size bucket 
hat\\\",\\\"price\\\":24.49,\\\"quantity\\\":\\\"2\\\"}},{\\\"schema\\\":\\\"iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0\\\",\\\"data\\\":{\\\"id\\\":\\\"fe0dd7c7-fb0b-43a2-b299-75d20baa94ec\\\"}}]}\",\"se_category\":null,\"se_action\":null,\"se_label\":null,\"se_property\":null,\"se_value\":null,\"unstruct_event\":\"{\\\"schema\\\":\\\"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0\\\",\\\"data\\\":{\\\"schema\\\":\\\"iglu:test.example.iglu/cart_action_event/jsonschema/1-0-0\\\",\\\"data\\\":{\\\"type\\\":\\\"add\\\"}}}\",\"tr_orderid\":null,\"tr_affiliation\":null,\"tr_total\":null,\"tr_tax\":null,\"tr_shipping\":null,\"tr_city\":null,\"tr_state\":null,\"tr_country\":null,\"ti_orderid\":null,\"ti_sku\":null,\"ti_name\":null,\"ti_category\":null,\"ti_price\":null,\"ti_quantity\":null,\"pp_xoffset_min\":null,\"pp_xoffset_max\":null,\"pp_yoffset_min\":null,\"pp_yoffset_max\":null,\"useragent\":\"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 
Firefox/91.0\",\"br_name\":null,\"br_family\":null,\"br_version\":null,\"br_type\":null,\"br_renderengine\":null,\"br_lang\":\"en-US\",\"br_features_pdf\":null,\"br_features_flash\":null,\"br_features_java\":null,\"br_features_director\":null,\"br_features_quicktime\":null,\"br_features_realplayer\":null,\"br_features_windowsmedia\":null,\"br_features_gears\":null,\"br_features_silverlight\":null,\"br_cookies\":1,\"br_colordepth\":\"24\",\"br_viewwidth\":1000,\"br_viewheight\":660,\"os_name\":null,\"os_family\":null,\"os_manufacturer\":null,\"os_timezone\":\"Europe/London\",\"dvce_type\":null,\"dvce_ismobile\":null,\"dvce_screenwidth\":1920,\"dvce_screenheight\":1080,\"doc_charset\":\"UTF-8\",\"doc_width\":1003,\"doc_height\":2242,\"tr_currency\":null,\"tr_total_base\":null,\"tr_tax_base\":null,\"tr_shipping_base\":null,\"ti_currency\":null,\"ti_price_base\":null,\"base_currency\":null,\"geo_timezone\":null,\"mkt_clickid\":null,\"mkt_network\":null,\"etl_tags\":null,\"dvce_sent_tstamp\":\"2021-08-29 10:49:50.719\",\"refr_domain_userid\":null,\"refr_dvce_tstamp\":null,\"derived_contexts\":null,\"domain_sessionid\":\"13c8f5ac-d999-4923-940d-b39f7b74aa94\",\"derived_tstamp\":null,\"event_vendor\":null,\"event_name\":null,\"event_format\":null,\"event_version\":null,\"event_fingerprint\":null,\"true_tstamp\":null},\"raw\":{\"vendor\":\"com.snowplowanalytics.snowplow\",\"version\":\"tp2\",\"parameters\":[{\"name\":\"e\",\"value\":\"ue\"},{\"name\":\"duid\",\"value\":\"8a17bb29-e35c-4363-aec3-85b9b363f9bf\"},{\"name\":\"vid\",\"value\":\"1\"},{\"name\":\"co\",\"value\":\"{\\\"schema\\\":\\\"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0\\\",\\\"data\\\":[{\\\"schema\\\":\\\"iglu:test.example.iglu/product_entity/jsonschema/1-0-0\\\",\\\"data\\\":{\\\"sku\\\":\\\"hh456\\\",\\\"name\\\":\\\"One-size bucket 
hat\\\",\\\"price\\\":24.49,\\\"quantity\\\":\\\"2\\\"}},{\\\"schema\\\":\\\"iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0\\\",\\\"data\\\":{\\\"id\\\":\\\"fe0dd7c7-fb0b-43a2-b299-75d20baa94ec\\\"}}]}\"},{\"name\":\"eid\",\"value\":\"36c39024-7b1b-4c2c-ae85-e95a8cb8340a\"},{\"name\":\"url\",\"value\":\"http://localhost:8000/shop/\"},{\"name\":\"refr\",\"value\":\"http://localhost:8000/\"},{\"name\":\"aid\",\"value\":\"sh0pspr33\"},{\"name\":\"tna\",\"value\":\"spmicro\"},{\"name\":\"cs\",\"value\":\"UTF-8\"},{\"name\":\"cd\",\"value\":\"24\"},{\"name\":\"stm\",\"value\":\"1630234190719\"},{\"name\":\"tz\",\"value\":\"Europe/London\"},{\"name\":\"tv\",\"value\":\"js-3.1.3\"},{\"name\":\"vp\",\"value\":\"1000x660\"},{\"name\":\"ds\",\"value\":\"1003x2242\"},{\"name\":\"ue_pr\",\"value\":\"{\\\"schema\\\":\\\"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0\\\",\\\"data\\\":{\\\"schema\\\":\\\"iglu:test.example.iglu/cart_action_event/jsonschema/1-0-0\\\",\\\"data\\\":{\\\"type\\\":\\\"add\\\"}}}\"},{\"name\":\"res\",\"value\":\"1920x1080\"},{\"name\":\"cookie\",\"value\":\"1\"},{\"name\":\"p\",\"value\":\"web\"},{\"name\":\"dtm\",\"value\":\"1630234190717\"},{\"name\":\"uid\",\"value\":\"tester\"},{\"name\":\"lang\",\"value\":\"en-US\"},{\"name\":\"sid\",\"value\":\"13c8f5ac-d999-4923-940d-b39f7b74aa94\"}],\"contentType\":\"application/json\",\"loaderName\":\"ssc-2.3.1-stdout$\",\"encoding\":\"UTF-8\",\"hostname\":\"0.0.0.0\",\"timestamp\":\"2021-08-29T10:49:50.727Z\",\"ipAddress\":\"172.17.0.1\",\"useragent\":\"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0\",\"refererUri\":\"http://localhost:8000/\",\"headers\":[\"Timeout-Access: \",\"Connection: keep-alive\",\"Host: 0.0.0.0:9090\",\"User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0\",\"Accept: */*\",\"Accept-Language: en-US, en;q=0.5\",\"Accept-Encoding: gzip\",\"Referer: http://localhost:8000/\",\"Origin: 
http://localhost:8000\",\"Cookie: micro=3734601f-5c3d-47c5-b367-0883e1ed74e6\",\"application/json\"],\"userId\":\"3734601f-5c3d-47c5-b367-0883e1ed74e6\"}}}}" ] } ] ``` ### Filters When querying `/micro/bad` with `POST` (the `Content-Type: application/json` header must be set on the request), you can filter the results by passing a JSON object in the request body. Example command to query the bad events: ```bash curl -X POST -H 'Content-Type: application/json' /micro/bad -d '' ``` An example filter JSON: ```json { "vendor":"com.snowplowanalytics.snowplow", "version":"tp2", "limit": 10 } ``` The filter supports the following fields: - `vendor`: vendor of the tracking event. - `version`: version of the tracking event's payload API (e.g. `tp2`). - `limit`: limits the number of events in the response (the most recent events are returned). You only need to specify the fields you want to filter on. ## /micro/reset Sending a request to this endpoint deletes all events stored by Micro. ### HTTP method `GET`, `POST` ### Response format Expected: ```json { "total": 0, "good": 0, "bad": 0 } ``` ## /micro/iglu > **Note:** This is available since version 1.2.0. The `/micro/iglu` endpoint lets you check whether a schema can be resolved. 
Schema lookups use the following format: ```text /micro/iglu/{vendor}/{schemaName}/jsonschema/{schemaVersion} ``` Or more specifically: ```text /micro/iglu/{vendor}/{schemaName}/jsonschema/{model}-{revision}-{addition} ``` For example, assuming Micro is running on localhost port `9090`: ```bash curl -X GET http://localhost:9090/micro/iglu/com.myvendor/myschema/jsonschema/1-0-0 ``` ### HTTP method `GET` ### Response format The JSON schema itself, if resolved: ```json { "$schema":"http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "description":"A template for a self-describing JSON Schema for use with Iglu", "self": { "vendor":"com.myvendor", "name":"myschema", "format":"jsonschema", "version":"1-0-0" }, "type":"object", "properties": { "myStringProperty": { "type":"string" }, "myNumberProperty":{ "type":"number" } }, "required": ["myStringProperty","myNumberProperty"], "additionalProperties":false } ``` If the schema cannot be resolved, the response is a JSON object indicating which Iglu repositories were searched: ```json { "value": { "Iglu Central": { "errors": [{"error":"NotFound"}], "attempts":1, "lastAttempt":"2021-08-26T13:41:06.905Z" }, "Iglu Client Embedded": { "errors": [{"error":"NotFound"}], "attempts":1, "lastAttempt":"2021-08-26T13:41:06.677Z" } } } ``` --- # Snowplow Micro API reference > Snowplow Micro arguments and environment variables. > Source: https://docs.snowplow.io/docs/api-reference/snowplow-micro/ See [this guide](/docs/testing/snowplow-micro/) for learning about Snowplow Micro and getting started with it. > **Note:** You can skip this reference page if you are running Micro [through Console](/docs/testing/snowplow-micro/console/). 
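The `contexts` filter described for `/micro/good` above returns an event only if the event carries every schema in the requested list; extra entities on the event are allowed. As an illustrative re-implementation of that matching rule in Python (a sketch, not Micro's actual code):

```python
def matches_contexts(event_context_schemas, requested_schemas):
    """Illustrative sketch of Micro's `contexts` filter rule: an event
    matches only if it carries *every* requested context schema.
    Additional schemas on the event do not prevent a match."""
    return set(requested_schemas).issubset(event_context_schemas)


# Hypothetical event carrying two entities
event_schemas = [
    "iglu:com.snowplowanalytics.mobile/application/jsonschema/1-0-0",
    "iglu:com.snowplowanalytics.mobile/screen/jsonschema/1-0-0",
]

# Matches: the single requested schema is present on the event
print(matches_contexts(
    event_schemas,
    ["iglu:com.snowplowanalytics.mobile/screen/jsonschema/1-0-0"],
))  # True

# No match: one of the requested schemas is missing from the event
print(matches_contexts(
    event_schemas,
    [
        "iglu:com.snowplowanalytics.mobile/screen/jsonschema/1-0-0",
        "iglu:com.acme/example/jsonschema/1-0-0",
    ],
))  # False
```

An empty `contexts` list matches every event, which is consistent with filter fields being optional.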
You can always run Micro with the `--help` argument to find out what is supported: ```bash docker run -p 9090:9090 snowplow/snowplow-micro:4.1.1 --help ``` ## Arguments | Argument | Description | | ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `--collector-config` | Configuration file for collector ([usage](/docs/testing/snowplow-micro/local/advanced-usage/#adding-custom-collector-configuration)) | | `--iglu` | Configuration file for Iglu Client ([usage](/docs/testing/snowplow-micro/local/advanced-usage/#adding-custom-iglu-resolver-configuration)) | | `-t`, `--output-tsv` _(since 1.4.0)_ | Print events in TSV format to standard output ([usage](/docs/testing/snowplow-micro/local/#exporting-events)) | | `-j`, `--output-json` _(since 2.4.0)_ | Print events in JSON format to standard output ([usage](/docs/testing/snowplow-micro/local/#exporting-events)) | | `-d`, `--destination` _(since 2.4.0)_ | Send data to an HTTP endpoint instead of outputting it via standard output. 
Requires either `--output-tsv` or `--output-json` ([usage](/docs/testing/snowplow-micro/local/#exporting-events)) | | `--yauaa` | Enable YAUAA user agent enrichment ([usage](/docs/testing/snowplow-micro/local/enrichments/#yauaa-yet-another-user-agent-analyzer)) | | `--no-storage` _(since 4.0.0)_ | Do not store the events anywhere and disable the API | | `--storage` _(since 4.0.0)_ | Enable PostgreSQL storage backend ([usage](/docs/testing/snowplow-micro/local/advanced-usage/#persisting-events-across-restarts)) | ## Environment variables | Variable | Version | Description | | ---------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | `MICRO_IGLU_REGISTRY_URL` | 1.5.0+ | The URL for an additional custom Iglu registry ([usage](/docs/testing/snowplow-micro/local/schemas/#pointing-micro-to-an-iglu-registry)) | | `MICRO_IGLU_API_KEY` | 1.5.0+ | An optional API key for an Iglu registry defined with `MICRO_IGLU_REGISTRY_URL` | | `MICRO_SSL_CERT_PASSWORD` | 1.7.0+ | The password for the optional SSL/TLS certificate in `/config/ssl-certificate.p12`. Enables HTTPS ([usage](/docs/testing/snowplow-micro/local/advanced-usage/#enabling-https)) | | `MICRO_POSTGRESQL_PASSWORD` | 4.0.0+ | The password for the optional PostgreSQL database ([usage](/docs/testing/snowplow-micro/local/advanced-usage/#persisting-events-across-restarts)) | | `MICRO_AZURE_BLOB_ACCOUNT` | 4.0.0+ | The Azure blob storage account name to use for downloading enrichment assets | | `MICRO_AZURE_BLOB_SAS_TOKEN` | 4.0.0+ | The Azure blob storage account token to use for downloading enrichment assets | --- # Snowplow Mini Control Plane API > Control and configure a Snowplow Mini instance through the Control Plane API with HTTP authentication. 
> Source: https://docs.snowplow.io/docs/api-reference/snowplow-mini/control-plane-api/ The Snowplow Mini Control Plane API lets you control and configure a Snowplow Mini instance without SSH-ing into it. You can use the control plane from the Snowplow Mini dashboard, or send requests to its endpoints with any HTTP client, e.g. curl. ### Authentication The Control Plane uses [HTTP basic access authentication](https://en.wikipedia.org/wiki/Basic_access_authentication). This means that you need to add `-u username:password` to all `curl` commands, where `username` and `password` are as you specified in the Snowplow Mini setup. ### Current Methods #### Service restart ```bash /control-plane/restart-services ``` Example using `curl`: ```bash $ curl -XPUT http://${snowplow_mini_ip}/control-plane/restart-services \ -u username:password ``` Restarts all the services running on the Snowplow Mini instance, including the Stream Collector, Stream Enrich, and the Elasticsearch Loader. This API call blocks until all the services have been restarted. A return status of 200 means that the services were restarted successfully. #### Resetting OpenSearch indices As of 0.13.0, it is possible to reset OpenSearch (previously Elasticsearch) indices, along with the corresponding index patterns in OpenSearch Dashboards, through the Control Plane API. ```bash curl -L \ -X POST '/control-plane/reset-service' \ -u ':' \ -H 'Content-Type: application/x-www-form-urlencoded' \ --data-urlencode 'service_name=elasticsearch' ``` Note that resetting deletes not only the indices and patterns but also all events stored so far. #### Restart services individually As of 0.13.0, it is possible to restart services one by one. 
```bash curl -L \ -X PUT '/control-plane/restart-service' \ -u ':' \ -H 'Content-Type: application/x-www-form-urlencoded' \ --data-urlencode 'service_name=' ``` where `service_name` can be one of the following: `collector`, `enrich`, `esLoaderGood`, `esLoaderBad`, `iglu`, `kibana`, `elasticsearch`. #### Configuring telemetry See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information on telemetry. Use HTTP `GET` to retrieve the current configuration: ```bash curl -L -X GET '/control-plane/telemetry' -u ':' ``` Use HTTP `PUT` to change it (set the `disable` key to `true` or `false` to turn telemetry off or on): ```bash curl -L -X PUT '/control-plane/telemetry' -u ':' -H 'Content-Type: application/x-www-form-urlencoded' --data-urlencode 'disable=false' ``` #### Adding external Iglu Server ```bash /control-plane/external-iglu ``` Example using `curl`: ```bash curl -XPOST http://${snowplow_mini_ip}/control-plane/external-iglu \ -d "uri=${external_iglu_uri}&apikey=${external_iglu_server_apikey}&vendor_prefix=${vendor_prefix}&name=${iglu_server_name}&priority=${priority}" \ -u username:password ``` Adds the given external Iglu Server details to the Iglu resolver JSON file so that Stream Enrich can use it. Note that a lower priority number means the repository is ranked higher. A return status of 200 means that the details were added to the Iglu resolver JSON file and Stream Enrich was restarted successfully. **Note**: The API key must follow the UUID format. #### Uploading custom enrichments ```bash /control-plane/enrichments ``` Example using `curl`: ```bash curl http://${snowplow_mini_ip}/control-plane/enrichments \ -F "enrichmentjson=@${path_of_the_custom_enrichment_dir}" \ -u username:password ``` Uploads a custom enrichment JSON file to the enrichments directory and restarts Stream Enrich to activate it. 
A return status of 200 means that the custom enrichment JSON file was placed in the enrichments directory and Stream Enrich was restarted successfully. #### Adding apikey for local Iglu Server ```bash /control-plane/local-iglu-apikey ``` Example using `curl`: ```bash curl -XPOST http://${snowplow_mini_ip}/control-plane/local-iglu-apikey \ -d "local_iglu_apikey=${new_local_iglu_apikey}" \ -u username:password ``` Adds the provided API key to the local Iglu Server section of the Iglu resolver JSON config, and restarts Stream Enrich to activate the change. A return status of 200 means that the API key was added and Stream Enrich was restarted successfully. **Note**: The API key must follow the UUID format. #### Changing credentials for basic HTTP authentication As of version 0.13.0, this endpoint doesn't accept new passwords shorter than 8 characters or with a score lower than 4 according to [zxcvbn](https://pkg.go.dev/github.com/trustelem/zxcvbn). ```bash /control-plane/credentials ``` Example using `curl`: ```bash curl -XPOST http://${snowplow_mini_ip}/control-plane/credentials \ -d "new_username=${new_username}&new_password=${new_password}" \ -u username:password ``` Changes the credentials for basic HTTP authentication. You will always get an empty reply from the server, because the Caddy server is restarted after the credentials are submitted and the connection is lost until it is back up. #### Add domain name ```bash /control-plane/domain-name ``` Example using `curl`: ```bash curl -XPOST http://${snowplow_mini_ip}/control-plane/domain-name \ -d "domain_name=${registered_domain_name}" \ -u username:password ``` Adds a domain name for the Snowplow Mini instance. After adding the domain name, your connection will be secured with TLS automatically. Make sure that the domain name resolves to the Snowplow Mini instance's IP address. 
You will always get an empty reply from the server, because the Caddy server is restarted after the domain name is submitted and the connection is lost until it is back up. #### Get Snowplow Mini version ```bash /control-plane/version ``` Example using `curl`: ```bash curl -XGET http://${snowplow_mini_ip}/control-plane/version \ -u username:password ``` Returns the version of the running Snowplow Mini instance. #### Uploading Iglu Server configuration ```bash /control-plane/iglu-config ``` Example using `curl`: ```bash curl http://${snowplow_mini_ip}/control-plane/iglu-config \ -F "igluserverhocon=@${path_of_the_iglu_server_config}" \ -u username:password ``` Uploads an Iglu Server configuration file and restarts the Iglu Server to activate it. A return status of 200 means that the configuration was uploaded and the Iglu Server was restarted successfully. --- # Introduction to Snowplow Mini > Snowplow Mini is a single-instance development environment for testing tracker updates and schema changes. > Source: https://docs.snowplow.io/docs/api-reference/snowplow-mini/ [Snowplow Mini](/docs/api-reference/snowplow-mini/) is a single-instance version of Snowplow that primarily serves as a development environment, giving you a quick way to debug tracker updates and changes to your schema and pipeline configuration. > **Tip:** For new testing environments, we recommend using [Snowplow Micro](/docs/testing/snowplow-micro/), which you can [deploy through Console](/docs/testing/snowplow-micro/console/) or [run locally](/docs/testing/snowplow-micro/local/). New Snowplow Mini deployments are no longer available through Console. 
You might use Snowplow Mini when: - You've updated a schema in your Development environment and wish to send some test events against it before promoting it to Production - You want to enable an Enrichment in a test environment before enabling it on Production ## Getting started New Snowplow Mini instances are no longer available through Console. For new development environments, use [Snowplow Micro](/docs/testing/snowplow-micro/console/) instead. For Snowplow Self-Hosted, see the setup guides for [AWS](/docs/api-reference/snowplow-mini/setup-guide-for-aws/) and [GCP](/docs/api-reference/snowplow-mini/setup-guide-for-gcp/). ## Conceptual diagram ![](/assets/images/image-72cd90387cfe692a867dd033688a5254.png) The diagram above illustrates how Snowplow Mini (top) works alongside your Production pipeline (bottom). By pointing your tracker(s) to the Collector on your Snowplow Mini, you can send events from your application's development and QA environments to Snowplow Mini for testing. Once you are happy with the changes you have made, you would then change the trackers in your application to point to the Collector on your Production pipeline. ## Features of Snowplow Mini - Data is tracked and processed in real time - Your Snowplow Mini speaks to your [Schema registries](/docs/fundamentals/schemas/#iglu-schema-repository) to allow events to be sent against your custom schemas - Data is validated during processing - Data is loaded into OpenSearch and can be queried directly or through the OpenSearch Dashboard - Successfully processed events and failed events are in distinct good and bad indexes ## Topology Snowplow Mini runs several distinct applications on the same box, all linked by NSQ topics. In a production deployment each instance could be an Autoscaling Group and each NSQ topic would be a distinct Kinesis Stream. 
- Scala Stream Collector: - Starts server listening on `http://<sp-mini-public-ip>/` which events can be sent to. - Sends "good" events to the `RawEvents` NSQ topic - Sends "bad" events to the `BadEvents` NSQ topic - Stream Enrich: - Reads events in from the `RawEvents` NSQ topic - Sends events which passed the enrichment process to the `EnrichedEvents` NSQ topic - Sends events which failed the enrichment process to the `BadEvents` NSQ topic - OpenSearch Sink Good: - Reads events from the `EnrichedEvents` NSQ topic - Sends those events to the `good` OpenSearch index - On failure to insert, writes errors to `BadElasticsearchEvents` NSQ topic - OpenSearch Sink Bad: - Reads events from the `BadEvents` NSQ topic - Sends those events to the `bad` OpenSearch index - On failure to insert, writes errors to `BadElasticsearchEvents` NSQ topic These events can then be viewed in Kibana at `http://<sp-mini-public-ip>/kibana`. --- # Set up Snowplow Mini on AWS > Deploy Snowplow Mini on AWS for a single-instance testing environment. > Source: https://docs.snowplow.io/docs/api-reference/snowplow-mini/setup-guide-for-aws/ Snowplow Mini is, in essence, the Snowplow real time stack inside of a single image. It is an easily-deployable, single instance version of Snowplow that serves three use cases: 1. Giving a Snowplow consumer (e.g. an analyst / data team / marketing team) a way to quickly understand what Snowplow "does" i.e. what you put it at one end and take out of the other 2. Giving developers new to Snowplow an easy way to start with Snowplow and understand how the different pieces fit together 3. Giving people running Snowplow a quick way to debug tracker updates All setup for Snowplow Mini is done within the AWS Console and will incur small amounts of running costs, depending on the size of the EC2 instance you select. We offer Snowplow Mini in 3 different sizes. To decide on which size of Snowplow Mini to choose, read on. 
## large & xlarge & xxlarge Mini is available in 3 different sizes: - `large`: OpenSearch has a `4g` heap and the Snowplow apps have a `0.5g` heap. Recommended machine RAM is `8g`. - `xlarge`: Double the large image. OpenSearch has an `8g` heap and the Snowplow apps have a `1.5g` heap. Recommended machine RAM is `16g`. - `xxlarge`: Double the xlarge image. OpenSearch has a `16g` heap and the Snowplow apps have a `3g` heap. Recommended machine RAM is `32g`. This service is available as an EC2 image within the AWS Community AMIs in the following regions: `ap-northeast-1`, `ap-northeast-2`, `ap-south-1`, `ap-southeast-1`, `ap-southeast-2`, `ca-central-1`, `eu-central-1`, `eu-west-1`, `eu-west-2`, `sa-east-1`, `us-east-1`, `us-east-2`, `us-west-1` and `us-west-2`. Version 0.25.1 (recommended) comes with: - Snowplow Collector NSQ 3.7.0 - Snowplow Enrich NSQ 6.7.1 - Snowplow Elasticsearch Loader 2.1.3 - Snowplow Iglu Server 0.14.0 - OpenSearch 3.3.0 - OpenSearch Dashboards 3.3.0 - PostgreSQL 16.10 - NSQ v1.3.0 Note: All services are configured to start automatically, so everything should happily survive restarts/shutdowns. To understand the flow of data please refer to the following diagram: ![Snowplow Mini topology](/assets/images/snowplow-mini-topology-95da73899375d477bfe132b2bcdb0e19.jpg) **IAM** Create a role with the following configuration: - Step 1: For `Select type of trusted entity`, select `EC2` - Step 2.1: For `Attach permissions policies`, create a policy with the following: ```json { "Version" : "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "logs:CreateLogStream", "logs:PutLogEvents" ], "Resource": ["*"] } ] } ``` - Step 2.2: In step 2 of role creation, select the policy created in the previous step - Step 3: Tags are optional - Step 4: Fill in the role name and create it. 
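For reference, selecting `EC2` as the trusted entity in Step 1 results in a trust relationship equivalent to the following policy document (the AWS console generates this automatically; it is shown here only to clarify what the role trusts):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```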
**CloudWatch**

Create a log group named `snowplow-mini` so that Mini can emit logs to it. Mini will not function properly if a log group with that name isn't found.

## Security Group

In the EC2 Console UI, select `Security Groups` from the panel on the left. Select the `Create Security Group` button and fill in the name, description, and the VPC you want to attach it to. You will then need to add the following inbound rules:

![snowplow-mini-security-group-setup](/assets/images/security-groups-setup-f8d299c2b2b111f0dbd2351e83ec119e.png)

- Custom TCP Rule | Port Range (80) - CIDR range `0.0.0.0/0`
- Custom TCP Rule | Port Range (443) - CIDR range `0.0.0.0/0`
- SSH (optional):
  - Custom TCP Rule | Port Range (22) - CIDR range `{{ YOUR IP HERE }}/32`

For outbound, you can leave the default to allow everything out.

## Choose AMI

In the EC2 Console UI, select the `Launch Instance` button, then select the `Community AMIs` button. In the search bar, enter `snowplow-mini-0.25.1` to find the needed AMI, then select it.

## Choose Instance Type

AMI names explicitly specify which instance type to use:

- `0.25.1-large` needs `t2.large`
- `0.25.1-xlarge` needs `t2.xlarge`
- `0.25.1-xxlarge` needs `t2.2xlarge`

## Configure Instance

- Select the IAM role created above.
- If you created your Security Group in a different VPC than the default, you will need to select the same VPC in the Network field.

**NOTE**: If you select a custom VPC, ensure that you select `Enable` for the Auto-assign Public IP option.

## Add Storage

Depending on how long you intend to run Snowplow Mini and how much data you intend to send/store, you will need to size the block store accordingly. For basic testing and debugging:

- 20-50 GB should suffice for `large`
- 50-100 GB should suffice for `xlarge`
- 100-200 GB should suffice for `xxlarge`

We also recommend changing the `Volume Type` from Magnetic to GP2 for a smoother experience.

## Tag Instance

Add any tags you like here.
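If you prefer scripting the console steps, the CloudWatch log group and security group rules above can also be expressed as AWS CLI calls. This is a sketch, not a definitive runbook: the group name, VPC ID, and security group ID are hypothetical, and `DRY_RUN=echo` makes the script print the commands instead of executing them (clear it, with AWS credentials configured, to run for real):

```shell
# Sketch of the console steps above as AWS CLI calls.
# Hypothetical IDs (vpc-..., sg-...); requires configured AWS credentials to run.
# DRY_RUN=echo prints each command instead of executing it.
DRY_RUN=echo

# CloudWatch log group that Mini expects to exist
$DRY_RUN aws logs create-log-group --log-group-name snowplow-mini

# Security group with the inbound rules listed above
$DRY_RUN aws ec2 create-security-group \
  --group-name snowplow-mini-sg \
  --description "Snowplow Mini" \
  --vpc-id vpc-0123456789abcdef0
$DRY_RUN aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 --protocol tcp --port 80 --cidr 0.0.0.0/0
$DRY_RUN aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 --protocol tcp --port 443 --cidr 0.0.0.0/0
```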
## Configure Security Group

Select the Security Group you created [above](#security-group).

## Review

Press the `Launch` button, then select an existing key pair, or create a new one, if you want to be able to SSH into the box.

**Telemetry notice**

By default, Snowplow collects telemetry data for Mini (since version 0.13.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1). If you wish to disable telemetry, you can do so via the [API](/docs/api-reference/snowplow-mini/control-plane-api/#configuring-telemetry). See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information.

---

# Set up Snowplow Mini on GCP

> Deploy Snowplow Mini on GCP for a single-instance testing environment.
> Source: https://docs.snowplow.io/docs/api-reference/snowplow-mini/setup-guide-for-gcp/

Snowplow Mini is, in essence, the Snowplow real-time stack inside of a single image. It is an easily deployable, single-instance version of Snowplow that serves three use cases:

1. Giving a Snowplow consumer (e.g. an analyst / data team / marketing team) a way to quickly understand what Snowplow "does", i.e. what you put in at one end and take out of the other
2. Giving developers new to Snowplow an easy way to get started with Snowplow and understand how the different pieces fit together
3. Giving people running Snowplow a quick way to debug tracker updates

Version 0.25.1 (recommended) comes with:

- Snowplow Collector NSQ 3.7.0
- Snowplow Enrich NSQ 6.7.1
- Snowplow Elasticsearch Loader 2.1.3
- Snowplow Iglu Server 0.14.0
- Opensearch 3.3.0
- Opensearch Dashboards 3.3.0
- Postgresql 16.10
- NSQ v1.3.0

Note: All services are configured to start automatically, so everything should happily survive restarts/shutdowns.

To understand the flow of data, please refer to the following diagram:

![](/assets/images/snowplow-mini-topology-95da73899375d477bfe132b2bcdb0e19.jpg)

## Importing public tarballs to a GCP project

Our offering for GCP is 3 compressed tarballs for the 3 different sizes of Snowplow Mini, produced through `gcloud`'s [`export`](https://cloud.google.com/sdk/gcloud/reference/compute/images/export) command. Simply put, they are virtual disk images exported in GCP format: a `disk.raw` file that has been tarred and gzipped. To use them within the GCP console, you need to import a tarball of your choice into your GCP project. You can use the `gcloud` CLI utility to do that; browse the [GCP docs](https://cloud.google.com/sdk/docs/quickstarts) to get started with `gcloud`. Assuming you have `gcloud` set up and configured for your GCP project, use `gcloud`'s [`create`](https://cloud.google.com/sdk/gcloud/reference/compute/images/create) command to import a tarball of your choice into your GCP project. A sample usage would be as follows:

```bash
gcloud compute images create \
  imported-sp-mini \
  --source-uri \
  https://storage.googleapis.com/snowplow-mini/snowplow-mini-0-25-1-large-1771418956.tar.gz
```

Note that `imported-sp-mini` is a name of your choice for the destination image, and the URI above is for the large image; change it to your preferred size and version of Snowplow Mini.
Version 0.25.1 (recommended):

| L / 2 vCPUs | XL / 4 vCPUs | XXL / 8 vCPUs |
| ----------- | ------------ | ------------- |
| [large](https://storage.googleapis.com/snowplow-mini/snowplow-mini-0-25-1-large-1771418956.tar.gz) | [xlarge](https://storage.googleapis.com/snowplow-mini/snowplow-mini-0-25-1-xlarge-1771418987.tar.gz) | [xxlarge](https://storage.googleapis.com/snowplow-mini/snowplow-mini-0-25-1-xxlarge-1771418969.tar.gz) |

You can find more about the `gcloud compute images create` command, including additional parameters, [here](https://cloud.google.com/sdk/gcloud/reference/compute/images/create). After importing the tarball of your choice into your project, you should see it under `Images` on `Compute Engine`. To decide which size of Snowplow Mini to choose, read on.

## large & xlarge & xxlarge

Mini is available in 3 different sizes:

- `large`: Opensearch has a `4g` heap size and the Snowplow apps have a `0.5g` heap size. Recommended machine RAM is `8g`.
- `xlarge`: Double the `large` image. Opensearch has an `8g` heap size and the Snowplow apps have a `1.5g` heap size. Recommended machine RAM is `16g`.
- `xxlarge`: Double the `xlarge` image. Opensearch has a `16g` heap size and the Snowplow apps have a `3g` heap size. Recommended machine RAM is `32g`.

## Create instance

Go to `Compute Engine` on the GCP console and select `Images` from the menu on the left. You should see your imported image in the list. Select it, then click the `CREATE INSTANCE` button at the top of the page.

![](/assets/images/create-instance-cc7dee2edb679cbcfab716d3f068aa49.png)

![](/assets/images/create-instance-2-b23317e188861463d0a697e7e6698441.png)

![](/assets/images/create-instance-3-7397ccd9528d0226acb3bf709bdccf5a.png)

Click `Create`.
These images show the setup for the `large` image. To set up `xlarge` or `xxlarge`, increase the memory per the explanation of the different sizes above.

**Telemetry notice**

By default, Snowplow collects telemetry data for Mini (since version 0.13.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1). If you wish to disable telemetry, you can do so via the [API](/docs/api-reference/snowplow-mini/control-plane-api/#configuring-telemetry). See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information.

---

# Snowplow Mini usage guide

> Learn how to use Snowplow Mini for testing, debugging trackers, and exploring Snowplow features.
> Source: https://docs.snowplow.io/docs/api-reference/snowplow-mini/usage-guide/

## Overview

Snowplow Mini is, in essence, the Snowplow real-time stack inside of a single image. It is an easily deployable, single-instance version of Snowplow that serves three use cases:

1. Giving a Snowplow consumer (e.g. an analyst / data team / marketing team) a way to quickly understand what Snowplow "does", i.e. what you put in at one end and take out of the other
2. Giving developers new to Snowplow an easy way to get started with Snowplow and understand how the different pieces fit together
3. Giving people running Snowplow a quick way to debug tracker updates

Jump to [First Time Usage](#first-time-usage) if it is your first time with a Mini.

## Upgrading

Until version 0.15.0, Snowplow data was loaded into Elasticsearch 6.x in the Mini. However, a [licensing change](https://www.elastic.co/blog/licensing-change) in Elasticsearch prevented us from upgrading it to more recent versions.
To make sure we stay up to date with important security fixes, we've decided to replace Elasticsearch with [Opensearch](https://opensearch.org/), and Kibana with [Opensearch Dashboards](https://opensearch.org/docs/latest/dashboards/index/). [Opensearch](https://opensearch.org/) is a fork of open source Elasticsearch 7.10, so this change shouldn't affect Mini users much. To minimize the impact further, we've tried to make as minimal a change as possible. In Mini, you can still access Opensearch via the `/elasticsearch` endpoint and Opensearch Dashboards via the `/kibana` endpoint.

The only breaking change this migration brings is the removal of mapping types. This means that you no longer have to provide a mapping type in your search queries when accessing your data in the good or bad indices. For example, the good event count could be found using the following endpoint in previous versions: `/elasticsearch/good/good/_count`. Starting with 0.15.0, it can be found using this endpoint: `/elasticsearch/good/_count`.

## First time usage

This section covers the steps to perform when accessing the Snowplow Mini instance for the first time.

### Connecting to the instance for the first time

You can access the Snowplow Mini instance at the `http://[public_dns]/home` address. While accessing Snowplow Mini services, HTTP authentication is required, so you will be prompted for credentials, which are `USERNAME_PLACEHOLDER` and `PASSWORD_PLACEHOLDER` by default. You **should** change these default credentials to something of your liking by going to the Control Plane tab (the last one) and filling in the "Change username and password for basic http authentication" form towards the bottom. **Note that only alphanumeric passwords are supported.** You will then be prompted for those new credentials.
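The Iglu registry API keys discussed in the following steps must be UUID v4 values. Besides `uuidgen` or an online generator, here is a minimal sketch that works on most Linux/macOS machines (it falls back to the Linux kernel's generator when `uuidgen` is absent):

```shell
# Generate a lowercase UUID v4 for use as an Iglu API key.
# Prefers uuidgen (util-linux / macOS); falls back to the Linux kernel generator.
UUID=$( (uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid) | tr 'A-Z' 'a-z')
echo "$UUID"
```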
### Changing the super API key for the local Iglu schema registry

As a second step, you should change the super API key for the Iglu schema registry that is bundled with Snowplow Mini. This API key can be changed via the Control Plane tab. Given that this API key must be a UUID v4, you will need to generate one by running `uuidgen` at the command line, or by using an online UUID generator like [this one](https://www.uuidgenerator.net/). Make a note of this UUID; you'll need it to upload your own event and context schemas to Snowplow Mini in the next subsection.

### Generating a pair of read/write API keys for the local Iglu schema registry

> **Note:** Mini 0.8.0 comes bundled with Iglu Server 0.6.1, which introduced a couple of changes relevant to this section:
>
> - The Swagger UI of Iglu Server is deprecated; however, Iglu Server still serves at the `/iglu-server` endpoint.
> - `POST /api/auth/keygen` no longer supports a query parameter to provide the vendor prefix. Use a POST raw data request instead.

To add schemas to the Iglu repository bundled with Snowplow Mini, you have to create a dedicated pair of API keys. There are 2 options:

- Use igluctl’s [server keygen](/docs/api-reference/iglu/igluctl-2/#server-keygen) subcommand
- Use any HTTP client, e.g. cURL

#### Option 1

First, [download igluctl](/docs/api-reference/iglu/igluctl-2/#downloading-and-running-igluctl). The following is a sample execution, where `com.acme` is the vendor prefix for which we'll upload our schemas, `mini-address` is the URL of our Mini, and `53b4c441-84f7-467e-af4c-074ced53eb20` is an example super API key you would have created in the previous steps.
```bash
/path/to/igluctl server keygen --vendor-prefix com.acme mini-address/iglu-server 53b4c441-84f7-467e-af4c-074ced53eb20
```

#### Option 2

You can also use `cURL` to interact with Iglu Server:

```bash
curl --location --request POST 'mini-address/iglu-server/api/auth/keygen' \
  --header 'apikey: 1b5d0459-3492-451c-aab1-7f74cbe12112' \
  --header 'Content-Type: application/json' \
  --data-raw '{"vendorPrefix":"com.acme"}'
```

This should return a read key and a write key:

```json
{
  "read": "bfa90866-ab14-4b92-b6ef-d421fd688b54",
  "write": "6175aa41-d3a7-4e4f-9fb4-3a170f3c6c16"
}
```

### Copying your Iglu repository to Snowplow Mini (optional)

To test and send non-standard Snowplow events such as your own custom contexts and unstructured events, you can load their schemas into the Iglu repository local to the Snowplow Mini instance.

1. Get a local copy of your Iglu repository which contains your schemas. This should be modelled after [this folder](https://github.com/snowplow/iglu-central/tree/master/schemas)
2. [Download igluctl](/docs/api-reference/iglu/igluctl-2/#downloading-and-running-igluctl).
3. Run the executable with the following input:
   - The address of the Iglu repository: `http://[public_dns]/iglu-server`
   - The Super API Key you created previously
   - The path to your schemas

   For example, to load the `iglu-central` repository into Iglu Server:

   ```bash
   /path/to/igluctl static push iglu-central/schemas http://[public_dns]/iglu-server 980ae3ab-3aba-4ffe-a3c2-3b2e24e2ffce --public
   ```

   Note: this example assumes the `iglu-central` repository has been cloned in the same directory as where the executable is run.
4. After uploading the schemas, you will need to clear the cache with the restart button under the Control Plane tab in the Snowplow Mini dashboard.

### Setting up HTTPS (optional)

If you want to use HTTPS to connect to Snowplow Mini, you need to submit a domain name via the Control Plane.
Make sure that the domain name you submit resolves to the IP of the server Snowplow Mini is running on.

## Sending events to Snowplow Mini

Now that the first time usage steps have been dealt with, you can send some events!

### Example events

An easy way to quickly send a few test events is to use our example web page:

1. Open up the Snowplow Mini UI at: `http://[public_dns]/home`
2. Log in with the username and password you chose in step 2.1
3. Select the `Example Events` tab
4. Press the event triggering buttons on the page!

### Events from tracker

You can instrument any other Snowplow tracker by specifying the collector URL as the public DNS of the Snowplow Mini instance.

## Accessing the Opensearch API

Snowplow Mini makes the Opensearch (previously Elasticsearch) HTTP API available at `http://[public_dns]/elasticsearch`. You can check it's working by:

- Checking the Opensearch API is available:
  - `curl --user username:password http://[public_dns]/elasticsearch`
  - You should see a `200 OK` response
- Checking the number of good events sent in step 3:
  - `curl --user username:password http://[public_dns]/elasticsearch/good/_count`
  - You should see the appropriate count of sent events

## Viewing the data in Opensearch Dashboards

Data sent to Snowplow Mini will be processed and loaded into Opensearch in real time. In turn, it will be made available in Opensearch Dashboards. To view the data in Opensearch Dashboards, navigate in your browser to `mini-public-address/kibana`.
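As a quick smoke test of the "Events from tracker" flow described above, you can also hit the collector's `/i` pixel endpoint directly with a hand-built tracker-protocol URL. A sketch (the `MINI_DNS` value is a placeholder; `e=pv` marks a page view, `p` the platform, `aid` the application id):

```shell
# Build a Snowplow tracker-protocol GET request for the Mini collector.
# MINI_DNS is a placeholder for your instance's public DNS.
MINI_DNS="your-mini-public-dns"
URL="http://${MINI_DNS}/i?e=pv&p=web&aid=mini-test&url=http%3A%2F%2Fexample.com"
echo "$URL"
# To actually send it (requires the instance to be reachable):
#   curl -s -o /dev/null -w '%{http_code}\n' "$URL"
```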
### Index patterns

Snowplow Mini comes with two index patterns:

- `good`: For good events, indexed on `collector_tstamp`
- `bad`: For bad events, indexed on `data.failure.timestamp`

### Discover your data

Browse to `mini-public-address/kibana`. Once Opensearch Dashboards is loaded, you should be able to view the most recently sent good events via the discover interface.

You can then inspect any individual event's data in the UI by unfolding a payload:

![](/assets/images/Screen-Shot-2020-04-13-at-13.20.22-8410d743b7a7a1261de9528d561c8aa7.jpg)

If you want to inspect bad events, click on `good`, towards the top left of the screen, and select `bad` from the drop-down list.

![](/assets/images/Screen-Shot-2020-04-13-at-13.32.26-1b9f1eaae978503d3f587585563148df.jpg)

Unfold any payload to inspect a bad event in detail.

![](/assets/images/Screen-Shot-2020-04-13-at-13.23.16-970d83b883ef507d69792cf0f65de9eb.jpg)

## Resetting Opensearch indices

As of 0.13.0, it is possible to reset Opensearch (previously Elasticsearch) indices, along with the corresponding index patterns in Opensearch Dashboards, through the Control Plane API:

```bash
curl -L \
  -X POST 'mini-address/control-plane/reset-service' \
  -u 'username:password' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'service_name=elasticsearch'
```

Note that resetting deletes not only the indices and index patterns but also all events stored so far.

## Restart services individually

As of 0.13.0, it is possible to restart services one by one:

```bash
curl -L \
  -X PUT 'mini-address/control-plane/restart-service' \
  -u 'username:password' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'service_name=<service_name>'
```

where `service_name` can be one of the following: `collector`, `enrich`, `esLoaderGood`, `esLoaderBad`, `iglu`, `kibana`, `elasticsearch`.

## Configuring telemetry

See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information on telemetry.
HTTP GET to get the current configuration:

```bash
curl -L -X GET 'mini-address/control-plane/telemetry' -u 'username:password'
```

HTTP PUT to set it (set the `disable` key to `true` to turn telemetry off, or `false` to turn it on):

```bash
curl -L -X PUT 'mini-address/control-plane/telemetry' -u 'username:password' -H 'Content-Type: application/x-www-form-urlencoded' --data-urlencode 'disable=false'
```

## Uploading custom enrichments

You can add new custom enrichments via the Control Plane tab. The only thing you have to do is submit the enrichment configuration file, which you created according to the documentation in [Available Enrichments](/docs/pipeline/enrichments/available-enrichments/). If the enrichment relies on additional schemas, these should be uploaded to the Iglu repository.

## Adding a custom schema

Since Mini 0.8.0 deprecated the Swagger UI of Iglu Server, there are 2 options:

- Use igluctl’s [static push](/docs/api-reference/iglu/igluctl-2/#static-push) subcommand to put a custom schema into the Iglu Server
- Use any HTTP client, e.g. cURL

#### Option 1

First, [download igluctl](/docs/api-reference/iglu/igluctl-2/#downloading-and-running-igluctl). The following is a sample execution, where `path-to-schema(s)` is the path to your custom schema(s), `mini-address` is the URL of our Mini, and `53b4c441-84f7-467e-af4c-074ced53eb20` is an example super API key you would have created in the previous steps.
```bash
/path/to/igluctl static push path-to-schema(s) mini-address/iglu-server 53b4c441-84f7-467e-af4c-074ced53eb20
```

#### Option 2

You can also use `cURL` to interact with Iglu Server:

```bash
curl mini-address/iglu-server/api/schemas -X POST \
  -H "apikey: YOUR_APIKEY" -d '{"json": YOUR_JSON}'
```

If no errors are encountered, the command will produce a response like this one:

```json
{
  "message": "Schema created",
  "updated": false,
  "location": "iglu:com.acme/ad_click/jsonschema/1-0-0",
  "status": 201
}
```

## Adding an external Iglu repository

If you already have an external Iglu repository available, instead of copying it into the Iglu repository bundled with the Snowplow Mini instance as shown in 2.4, you can add it directly with the Control Plane's `Add an external Iglu repository` form. Note that if you're using a static repository hosted on S3, you can omit providing an API key.

## Runtime metrics

Mini 0.12.0 introduced a `/metrics` endpoint powered by [cAdvisor](https://github.com/google/cadvisor). You can also find the link to metrics on the home page under the Quicklinks header. It has been possible to observe the runtime metrics of a Mini instance by looking at the AWS/GCP monitoring dashboards; however, the internal services' individual metrics weren't exposed, making it more difficult to diagnose issues. Exposing runtime metrics such as CPU, RAM and network usage of the internal services in real time makes Mini more transparent, hopefully making it easier to understand what's going on under the hood.

## Logs

As of Mini 0.12.0, application logs of the Mini sub-services are exported to Cloudwatch on AWS and Cloud Logging on GCP. On AWS, each individual service emits its logs under a specific log stream within the `snowplow-mini` log group. On GCP, you need to make use of filters to see the logs of a specific component.
The recommended approach is as follows:

- On the GCP console, go to Logging > Logs Viewer
- Under Query Builder, select the resource
- Under `VM instance`, select the instance Mini is running on
- Click on `Add`

Click on `Run Query` and you should see the logs of all services combined. To see the logs of a specific component, add the following filter to the query:

```text
jsonPayload.container.name="/service-name"
```

where `service-name` can be one of the following: `elasticsearch`, `kibana`, `elasticsearch-loader-good`, `elasticsearch-loader-bad`, `nsqlookupd`, `nsqd`, `nsqadmin`, `scala-stream-collector-nsq`, `stream-enrich-nsq`

An example query looks as follows:

```text
resource.type="gce_instance"
resource.labels.instance_id="3778299199368430127"
jsonPayload.container.name="/elasticsearch"
```

---

# Collector configuration reference

> Complete configuration reference for the Collector HOCON config file, including common options, sink-specific settings, cookie management, networking, and TLS configuration.
> Source: https://docs.snowplow.io/docs/api-reference/stream-collector/configure/

This is a complete list of the options that can be configured in the collector HOCON config file. The [example configs in github](https://github.com/snowplow/stream-collector/tree/master/examples) show how to prepare an input file. Some features are described in more detail at the bottom of this page.

### License

The collector is released under the [Snowplow Limited Use License](/limited-use-license-1.1/) ([FAQ](/docs/licensing/limited-use-license-faq/)). To accept the terms of the license and run the collector, set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable.
Alternatively, you can configure the `collector.license.accept` option, like this: ```hcl collector { license { accept = true } } ``` ### Common options | parameter | description | | --------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `collector.interface` | Required. E.g. `0.0.0.0`. The collector listens for http requests on this interface. | | `collector.port` | Required. E.g. `80`. The collector listens for http requests on this port. | | `collector.ssl.enable` | Optional. Default: `false`. The collector will also listen for https requests on a different port. | | `collector.ssl.port` | Optional. Default: `443`. The port on which to listen for https requests. | | `collector.ssl.redirect` | Optional. Default: `false`. If enabled, the collector redirects http requests to the https endpoint using a `301` status code. | | `collector.hsts.enable` _(since 3.1.0)_ | Default: `false`. Whether to send an [HSTS header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Strict-Transport-Security). | | `collector.hsts.maxAge` _(since 3.1.0)_ | Default: `365 days`. The maximum age for the HSTS header. | | `collector.paths` | Optional. More details about this feature below. This is for customising the collector's endpoints. You can also map any valid (ie, two-segment) path to one of the three default paths. | | `collector.p3p.policyRef` | Optional. Default: `/w3c/p3p.xml`. Configures the p3p http header. | | `collector.p3p.CP` | Optional. Default: `NOI DSP COR NID PSA OUR IND COM NAV STA`. Configures the p3p http header. | | `collector.crossDomain.enabled` | Optional. Default: `false`. 
If enabled, the `/crossdomain.xml` endpoint returns a cross domain policy file. | | `collector.crossDomain.domains` | Optional. Default: `[*]`, meaning the cross domain policy file allows all domains. This can also be changed to a list of domains. | | `collector.crossDomain.secure` | Optional. Default: `true`. Whether the cross domain policy file grants access to just HTTPS or both HTTP and HTTPS sources. | | `collector.cookie.enabled` | Optional. Default: `true`. The collector sets a cookie to set the user's network user id. Changing this to `false` disables setting cookies. Regardless of this setting, if the collector receives a request with the custom `SP-Anonymous:*` header, no cookie will be set. You can control whether this header is set or not in your tracking implementation. | | `collector.cookie.expiration` | Optional. Default: `365 days`. Expiry of the collector's cookie. | | `collector.cookie.name` | Optional. Default: `sp`. Name of the collector's cookie. | | `collector.cookie.domains` | Optional. Default: no domains. There are more details about this feature below. This is for fine control over the cookie's domain attribute. | | `collector.cookie.fallbackDomain` | Optional. If set, the fallback domain will be used for the cookie if none of the `Origin` header hosts matches the list of cookie domains. | | `collector.cookie.secure` | Optional. Default: `true`. Sets the `secure` property of the cookie. | | `collector.cookie.httpOnly` | Optional. Default: `true`. Sets the `httpOnly` property of the cookie. We recommend `true` because `httpOnly` cookies are allowed a longer expiry time by web browsers. | | `collector.cookie.sameSite` | Optional. Default: `None`. Sets the `sameSite` property of the cookie. Possible values: `Strict`, `Lax`, `None`. | | `collector.cookie.clientCookieName` (since 3.4.0) | Optional. Default: not set. If a name is specified (e.g.
`sp_client`), the collector sets an extra cookie with that name, `httpOnly=false` and the same value as the main cookie (network user id). This is useful if you need to access the network user id on the client side using JavaScript. | | `collector.doNotTrackCookie.enabled` | Optional. Default: `false`. If enabled, the collector respects a "do not track" cookie. If the cookie is present, it returns a `200` status code but it does not log the request to the output queue. | | `collector.doNotTrackCookie.name` | Required when the `doNotTrackCookie` feature is enabled. Configures the name of the cookie in which to check if tracking is disabled. | | `collector.doNotTrackCookie.value` | Required when the `doNotTrackCookie` feature is enabled. Can be a regular expression. The value of the cookie must match this expression in order for the collector to respect the cookie. | | `collector.cookieBounce.enabled` | Optional. Default: `false`. When enabled, if the cookie is missing, the collector performs a redirect to itself to check whether third-party cookies are blocked, using the request parameter with the specified name. If they are indeed blocked, `fallbackNetworkUserId` is used instead of generating a new random one. | | `collector.cookieBounce.name` | Optional. Default: `n3pc`. Name of the request parameter which will be used on redirects checking that the third-party cookies work. | | `collector.cookieBounce.fallbackNetworkUserId` | Optional. Default: `00000000-0000-4000-A000-000000000000`. Network user id to use when third-party cookies are blocked. | | `collector.cookieBounce.forwardedProtocolHeader` | Optional. E.g. `X-Forwarded-Proto`. The header containing the originating protocol for use in the bounce redirect location. Use this if behind a load balancer that performs SSL termination. | | `collector.enableDefaultRedirect` | Optional. Default: `false`. When enabled, the collector's `/r` endpoint returns a `302` status code with a redirect back to a url specified with the `?u=` query parameter.
| | `collector.redirectDomains` (since _2.5.0_) | Optional. Default: empty. Domains which are valid for collector redirects. If empty then redirects are allowed to any domain. Must be an exact match. | | `collector.redirectMacro.enabled` | Optional. Default: `false`. When enabled, the redirect url passed via the `u` query parameter is scanned for a `placeholder` token. All occurrences of the placeholder are substituted with the cookie's network user id. | | `collector.redirectMacro.placeholder` | Optional. Default: `${SP_NUID}`. | | `collector.rootResponse.enabled` | Optional. Default: `false`. Enable custom response handling for the root `"/"` path. | | `collector.rootResponse.statusCode` | Optional. Default: `302`. The http status code to use when root response is enabled. | | `collector.rootResponse.headers` | Optional. A map of key value pairs to include in the root response headers. | | `collector.rootResponse.body` | Optional. The http response body to use when root response is enabled. | | `collector.cors.accessControlMaxAge` | Optional. Default: `60 minutes`. Configures how long the results of a preflight request can be cached by the browser. `-1` seconds disables the cache. | | `collector.preTerminationPeriod` (since _2.5.0_) | Optional. Default: `10 seconds`. Configures how long the collector should pause after receiving a sigterm before starting the graceful shutdown. During this period the collector continues to accept new connections and respond to requests. | | `collector.prometheusMetrics.enabled` (deprecated since _2.6.0_) | Optional. Default: `false`. When enabled, all requests are logged as prometheus metrics and the `/metrics` endpoint returns the report about the metrics. | | `collector.prometheusMetrics.durationBucketsInSeconds` (deprecated since _2.6.0_) | Optional. E.g. `[0.1, 3, 10]`. Custom buckets for the `http_request_duration_seconds_bucket` duration prometheus metric. | | `collector.telemetry.disable` | Optional.
Set to `true` to disable [telemetry](/docs/get-started/self-hosted/telemetry/). | | `collector.telemetry.userProvidedId` | Optional. See [here](/docs/get-started/self-hosted/telemetry/#how-can-i-help) for more information. | | `collector.compression.enabled` (since _3.6.0_) | Optional. Default: `false`. Enable compression on the output. Compression should only be enabled with Enrich >=6.1.0. | | `collector.compression.type` (since _3.6.0_) | Optional. Default: `zstd`. Compression algorithm to use. | | `collector.compression.gzipCompressionLevel` (since _3.6.0_) | Optional. Default: `6`. The compression level for GZIP compression. It is between 1 and 9. Lower levels have faster compression speed, but worse compression ratio. | | `collector.compression.zstdCompressionLevel` (since _3.6.0_) | Optional. Default: `9`. The compression level for ZSTD compression. It is between 1 and 22. Lower levels have faster compression speed, but worse compression ratio. | ### Kinesis collector options | parameter | description | | ------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `collector.streams.good` | Required. Name of the output kinesis stream for successfully collected events. | | `collector.streams.bad` | Required. Name of the output kinesis stream for http requests which could not be written to the good stream. For example, if the event size exceeds the kinesis limit of 1MB. | | `collector.streams.useIpAddressAsPartitionKey` (deprecated since _3.5.0_) | Optional. Default: `false`. Whether to use the user's IP address as the kinesis partition key. | | `collector.streams.{good,bad}.region` | Optional. Default: `eu-central-1`. 
AWS region of the kinesis streams. | | `collector.streams.{good,bad}.customEndpoint` | Optional. Override the AWS Kinesis endpoint. Can be helpful when using LocalStack for testing. | | `collector.streams.{good,bad}.threadPoolSize` | Optional. Default: `10`. Thread pool size used by the collector sink for asynchronous operations. | | `collector.streams.good.sqsGoodBuffer` | Optional. Set to the name of an SQS queue to enable buffering of good output events. When messages cannot be sent to Kinesis (e.g. because of exceeding API limits), they get sent to SQS as a fallback. Helpful for smoothing over traffic spikes. | | `collector.streams.bad.sqsBadBuffer` | Optional. Like the `sqsGoodBuffer` but for failed events. | | `collector.streams.{good,bad}.aws.accessKey` | Required. Set to `default` to use the default provider chain; set to `iam` to use AWS IAM roles; or set to `env` to use `AWS_ACCESS_KEY_ID` environment variable. | | `collector.streams.{good,bad}.aws.secretKey` | Required. Set to `default` to use the default provider chain; set to `iam` to use AWS IAM roles; or set to `env` to use `AWS_SECRET_ACCESS_KEY` environment variable. | | `collector.streams.{good,bad}.maxBytes` (since _2.9.0_) | Optional. Default: `1000000` (1 MB). Maximum number of bytes that a single record can contain. If a record is bigger, a size violation failed event is emitted instead. If SQS buffer is activated, `sqsMaxBytes` is used instead. | | `collector.streams.{good,bad}.sqsMaxBytes` | Optional. Default: `192000` (192 kb). Maximum number of bytes that a single record can contain. If a record is bigger, a size violation failed event is emitted instead. 
SQS has a record size limit of 256 kb, but records are encoded with Base64, which adds approximately 33% of the size, so we set the limit to `256 kb * 3/4`. | | `collector.streams.{good,bad}.startupCheckInterval` (since _2.9.0_) | Optional. Default: `1 second`. When the collector starts, it checks if Kinesis streams exist with `describeStreamSummary` and if SQS buffers exist with `getQueueUrl` (if configured). This is the interval for the calls. `/sink-health` is made healthy as soon as requests are successful or records are successfully inserted. | | `collector.streams.backoffPolicy.minBackoff` | Optional. Default: `3000`. Minimum backoff period (in milliseconds) when retrying sending to Kinesis / SQS after failure. | | `collector.streams.backoffPolicy.maxBackoff` | Optional. Default: `600000`. Maximum backoff period (in milliseconds) when retrying sending to Kinesis / SQS after failure. | | `collector.streams.buffer.byteLimit` | Optional. Default: `3145728`. Incoming events are stored in an internal buffer before being sent to Kinesis. This configures the maximum total size of pending events. | | `collector.streams.buffer.recordLimit` | Optional. Default: `500`. Configures the maximum number of pending events before flushing to Kinesis. | | `collector.streams.buffer.timeLimit` | Optional. Default: `5000`. Configures the maximum time in milliseconds before flushing pending buffered events to Kinesis. | ### SQS collector options | parameter | description | | ------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `collector.streams.good` | Required. Name of the output SQS queue for successfully collected events. | | `collector.streams.bad` | Required. 
Name of the output SQS queue for http requests which could not be written to the good stream. For example, if the event size exceeds the SQS limit of 256KB. | | `collector.streams.useIpAddressAsPartitionKey` (deprecated since _3.5.0_) | Optional. Default: `false`. Whether to use the user's IP address as the Kinesis partition key. This is attached to the SQS message as an attribute, with the aim of using it if the events ultimately end up in Kinesis. | | `collector.streams.{good,bad}.region` | Optional. Default: `eu-central-1`. AWS region of the SQS queues. | | `collector.streams.{good,bad}.threadPoolSize` | Optional. Default: `10`. Thread pool size used by the collector sink for asynchronous operations. | | `collector.streams.{good,bad}.aws.accessKey` | Required. Set to `default` to use the default provider chain; set to `iam` to use AWS IAM roles; or set to `env` to use `AWS_ACCESS_KEY_ID` environment variable. | | `collector.streams.{good,bad}.aws.secretKey` | Required. Set to `default` to use the default provider chain; set to `iam` to use AWS IAM roles; or set to `env` to use `AWS_SECRET_ACCESS_KEY` environment variable. | | `collector.streams.{good,bad}.maxBytes` (since _2.9.0_) | Optional. Default: `192000` (192 kb). Maximum number of bytes that a single record can contain. If a record is bigger, a size violation failed event is emitted instead. SQS has a record size limit of 256 kb, but records are encoded with Base64, which adds approximately 33% of the size, so we set the limit to `256 kb * 3/4`. | | `collector.streams.{good,bad}.startupCheckInterval` (since _2.9.0_) | Optional. Default: `1 second`. When collector starts, it checks if SQS buffers exist with `getQueueUrl`. This is the interval for the calls. `/sink-health` is made healthy as soon as requests are successful or records are successfully inserted. | | `collector.streams.backoffPolicy.minBackoff` | Optional. Default: `3000`. Time (in milliseconds) for retrying sending to SQS after failure. 
| | `collector.streams.backoffPolicy.maxBackoff` | Optional. Default: `600000`. Time (in milliseconds) for retrying sending to SQS after failure. | | `collector.streams.buffer.byteLimit` | Optional. Default: `3145728`. Incoming events are stored in an internal buffer before being sent to SQS. This configures the maximum total size of pending events. | | `collector.streams.buffer.recordLimit` | Optional. Default: `500`. Configures the maximum number of pending events before flushing to SQS. | | `collector.streams.buffer.timeLimit` | Optional. Default: `5000`. Configures the maximum time in milliseconds before flushing pending buffered events to SQS. | ### Pubsub collector options | parameter | description | | ------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `collector.streams.good` | Required. Name of the output Pubsub topic for successfully collected events. | | `collector.streams.bad` | Required. Name of the output Pubsub topic for http requests which could not be written to the good stream. For example, if the event size exceeds the Pubsub limit of 10MB. | | `collector.streams.sink.{good,bad}.googleProjectId` | Required. GCP project name. | | `collector.streams.sink.{good,bad}.backoffPolicy.minBackoff` (deprecated since _3.5.0_) | Optional. Default: `1000`. Time (in milliseconds) for retrying sending to Pubsub after failure. | | `collector.streams.sink.{good,bad}.backoffPolicy.maxBackoff` (deprecated since _3.5.0_) | Optional. Default: `1000`. Time (in milliseconds) for retrying sending to Pubsub after failure. | | `collector.streams.sink.{good,bad}.backoffPolicy.totalBackoff` (deprecated since _3.5.0_) | Optional. Default: `9223372036854`. 
We set this to the maximum value so that we never give up on trying to send a message to Pubsub. | | `collector.streams.sink.{good,bad}.backoffPolicy.multipler` (deprecated since _3.5.0_) | Optional. Default: `2`. Multiplier applied to the backoff period between two retries. | | `collector.streams.sink.{good,bad}.backoffPolicy.initialRpcTimeout` (deprecated since _3.5.0_) | Optional. Default: `10000`. Time (in milliseconds) before an RPC call to Pubsub is aborted and retried. | | `collector.streams.sink.{good,bad}.backoffPolicy.maxRpcTimeout` (deprecated since _3.5.0_) | Optional. Default: `10000`. Maximum time (in milliseconds) before an RPC call to Pubsub is aborted and retried. | | `collector.streams.sink.{good,bad}.backoffPolicy.rpcTimeoutMultipler` (deprecated since _3.5.0_) | Optional. Default: `2`. How RPC timeouts increase as they are retried. | | `collector.streams.sink.{good,bad}.maxBytes` (since _2.9.0_) | Optional. Default: `10000000` (10 MB). Maximum number of bytes that a single record can contain. If a record is bigger, a size violation failed event is emitted instead. | | `collector.streams.sink.{good,bad}.startupCheckInterval` (since _2.9.0_) | Optional. Default: `1 second`. When the collector starts, it checks if PubSub topics exist with `listTopics`. This is the interval for the calls. `/sink-health` is made healthy as soon as requests are successful or records are successfully inserted. | | `collector.streams.sink.{good,bad}.retryInterval` (since _2.9.0_) | Optional. Default: `10 seconds`. The collector uses the built-in retry mechanism of the PubSub API. If these retries fail, the events are added to a buffer, and the collector retries sending them every `retryInterval`. | | `collector.streams.{good,bad}.buffer.byteLimit` | Optional. Default: `1000000`. Incoming events are stored in an internal buffer before being sent to Pubsub. This configures the maximum total size of pending events. | | `collector.streams.{good,bad}.buffer.recordLimit` | Optional. Default: `40`. 
Maximum number of pending events before flushing to Pubsub. | | `collector.streams.{good,bad}.buffer.timeLimit` | Optional. Default: `1000`. Maximum time (in milliseconds) before flushing pending buffered events to Pubsub. | ### Kafka collector options | parameter | description | | --------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `collector.streams.good` | Required. Name of the output Kafka topic for successfully collected events. | | `collector.streams.bad` | Required. Name of the output Kafka topic for http requests which could not be written to the good stream. | | `collector.streams.{good,bad}.brokers` | Required. A list of host:port pairs to use for establishing the initial connection to the Kafka cluster. | | `collector.streams.{good,bad}.producerConf` | Optional. Kafka producer configuration. See [the docs](https://kafka.apache.org/documentation/#producerconfigs) for all properties. | | `collector.streams.{good,bad}.maxBytes` | Optional. Default: `1000000` (1 MB). Maximum number of bytes that a single record can contain. If a record is bigger, a size violation failed event is emitted instead. | | `collector.streams.{good,bad}.startupCheckInterval` | Optional. Default: `10 second`. When collector starts, it checks if Kafka topics exist. This is the interval for the calls. `/sink-health` is made healthy as soon as requests are successful or records are successfully inserted. | | `collector.streams.{good,bad}.buffer.byteLimit` | Optional. Default: `3145728`. Incoming events are stored in an internal buffer before being sent to Kafka. This configures the maximum total size of pending events. | | `collector.streams.{good,bad}.buffer.recordLimit` | Optional. Default: `500`. 
Configures the maximum number of pending events before flushing to Kafka. | | `collector.streams.{good,bad}.buffer.timeLimit` | Optional. Default: `5000`. Configures the maximum time in milliseconds before flushing pending buffered events to Kafka. | | `collector.streams.{good,bad}.retryInterval` | Optional. Default: `10 seconds`. The collector uses the built-in retry mechanism of the Kafka API. If these retries fail, the events are added to a buffer, and the collector retries sending them every `retryInterval`. | ### Setting the domain name Set the cookie name using the `collector.cookie.name` setting. To maintain backward compatibility with earlier versions of the collector, use the string "sp" as the cookie name. The collector responds to valid requests with a `Set-Cookie` header, which may or may not specify a `domain` for the cookie. If no domain is specified, the cookie will be set against the full collector domain, for example `collector.snplow.com`. That means applications running elsewhere on `*.snplow.com` won't be able to access it. If you don't need to grant access to the cookie from other applications on the domain, you can ignore the `domains` and `fallbackDomain` settings. In earlier versions, you could specify a `domain` to tie the cookie to. For example, if set to `.snplow.com`, the cookie would have been accessible to other applications running on `*.snplow.com`. To do the same in this version, use the `fallbackDomain` setting but **make sure** that you no longer include a leading dot: ```properties fallbackDomain = "snplow.com" ``` The cookie set by the collector can be treated differently by browsers, depending on whether it's considered to be a first-party or a third-party cookie. In earlier versions (0.15.0 and earlier), if you had two collector endpoints, one on `collector.snplow.com` and one on `collector.snplow.net`, you could only specify one of those domains in the configuration. 
That meant that you were only able to set a first-party cookie server-side on either `.snplow.com` or `.snplow.net`, but not on both. From version 0.16.0, you can specify a list of domains to be used for the cookie (**note the lack of a leading dot**): ```properties domains = [ "snplow.com" "snplow.net" ] ``` The domain used in the `Set-Cookie` header is determined by matching the domain from the request's `Origin` header against the specified list. The first match is used. If no matches are found, the fallback domain will be used, if configured. If no `fallbackDomain` is configured, the cookie will be tied to the full collector domain. If you specify a main domain in the list, all subdomains on it will be matched. If you specify a subdomain, only that subdomain will be matched. Examples: - `domain.com` will match `Origin` headers like `domain.com`, `www.domain.com` and `secure.client.domain.com` - `client.domain.com` will match an `Origin` header like `secure.client.domain.com` but not `domain.com` or `www.domain.com`. ### Configuring custom paths The collector responds with a cookie to requests with a path that matches the `vendor/version` protocol. The expected values are: - `com.snowplowanalytics.snowplow/tp2` for Tracker Protocol 2 - `r/tp2` for redirects - `com.snowplowanalytics.iglu/v1` for the Iglu Webhook. You can also map any valid (i.e. two-segment) path to one of the three defaults via the `collector.paths` section of the configuration file. Your custom path must be the key and the value must be one of the corresponding default paths. Both must be full valid paths starting with a leading slash: ```properties paths { "/com.acme/track" = "/com.snowplowanalytics.snowplow/tp2" "/com.acme/redirect" = "/r/tp2" "/com.acme/iglu" = "/com.snowplowanalytics.iglu/v1" } ``` ### TLS port binding and certificate (2.4.0+) Since 2.4.0, TLS certificates are configured using JVM system parameters. 
The "`Customizing JSSE`" section in [Java 11 JSSE reference documentation](https://docs.oracle.com/en/java/javase/11/security/java-secure-socket-extension-jsse-reference-guide.html#GUID-A41282C3-19A3-400A-A40F-86F4DA22ABA9) explains all system properties in detail. The following JVM properties are the ones to be used most of the time. | System Property | Customized Item | Default | Notes | | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `javax.net.ssl.keyStore` | Default keystore; see [Customizing the Default Keystores and Truststores, Store Types, and Store Passwords](https://docs.oracle.com/en/java/javase/11/security/java-secure-socket-extension-jsse-reference-guide.html#GUID-7D9F43B8-AABF-4C5B-93E6-3AFB18B66150) | None | | | `javax.net.ssl.keyStorePassword` | Default keystore password; see [Customizing the 
Default Keystores and Truststores, Store Types, and Store Passwords](https://docs.oracle.com/en/java/javase/11/security/java-secure-socket-extension-jsse-reference-guide.html#GUID-7D9F43B8-AABF-4C5B-93E6-3AFB18B66150) | None | It is inadvisable to specify the password in a way that exposes it to discovery by other users. **Password can not be empty.** | | `javax.net.ssl.keyStoreType` | Default keystore type; see [Customizing the Default Keystores and Truststores, Store Types, and Store Passwords](https://docs.oracle.com/en/java/javase/11/security/java-secure-socket-extension-jsse-reference-guide.html#GUID-7D9F43B8-AABF-4C5B-93E6-3AFB18B66150) | `PKCS12` | | | `jdk.tls.server.cipherSuites` | Server-side default enabled cipher suites. See [Specifying Default Enabled Cipher Suites](https://docs.oracle.com/en/java/javase/11/security/java-secure-socket-extension-jsse-reference-guide.html#GUID-D61663E8-2405-4B2D-A1F1-B8C7EA2688DB) | See [SunJSSE Cipher Suites](https://docs.oracle.com/en/java/javase/11/security/oracle-providers.html#GUID-7093246A-31A3-4304-AC5F-5FB6400405E2__SUNJSSE_CIPHER_SUITES) to determine which cipher suites are enabled by default | Caution: These system properties can be used to configure weak cipher suites, or the configured cipher suites may be weak in the future. It is not recommended that you use these system properties without understanding the risks. | | `jdk.tls.server.protocols` | Default handshaking protocols for TLS/DTLS servers. See [The SunJSSE Provider](https://docs.oracle.com/en/java/javase/11/security/oracle-providers.html#GUID-7093246A-31A3-4304-AC5F-5FB6400405E2) | None | To configure the default enabled protocol suite in the server-side of a SunJSSE provider, specify the protocols in a comma-separated list within quotation marks. The protocols in this list are standard SSL protocol names as described in [Java Security Standard Algorithm Names](https://docs.oracle.com/en/java/javase/11/docs/specs/security/standard-names.html). 
Note that this System Property impacts only the default protocol suite (SSLContext of the algorithms SSL, TLS and DTLS). If an application uses a version-specific SSLContext (SSLv3, TLSv1, TLSv1.1, TLSv1.2, TLSv1.3, DTLSv1.0, or DTLSv1.2), or sets the enabled protocol version explicitly, this System Property has no impact. | A deployment needs a TLS certificate, preferably issued by a trusted CA. However, for test purposes, a TLS certificate can be generated as follows: ```bash ssl_dir=/opt/snowplow/ssl mkdir -p ${ssl_dir} sudo openssl req \ -x509 \ -newkey rsa:4096 \ -keyout ${ssl_dir}/collector_key.pem \ -out ${ssl_dir}/collector_cert.pem \ -days 3650 \ -nodes \ -subj "/C=UK/O=Acme/OU=DevOps/CN=*.acme.com" sudo openssl pkcs12 \ -export \ -out ${ssl_dir}/collector.p12 \ -inkey ${ssl_dir}/collector_key.pem \ -in ${ssl_dir}/collector_cert.pem \ -passout pass:changeme sudo chmod 644 ${ssl_dir}/collector.p12 ``` The collector (Kinesis flavour, as an example) can then be started as follows: ```bash config_dir=/opt/snowplow/config docker run \ -d \ --name scala-stream-collector \ --restart always \ --network host \ -v ${config_dir}:/snowplow/config \ -v ${ssl_dir}:/snowplow/ssl \ -p 8080:8080 \ -p 8443:8443 \ -e 'JAVA_OPTS=-Xms2g -Xmx2g -Djavax.net.ssl.keyStoreType=pkcs12 -Djavax.net.ssl.keyStorePassword=changeme -Djavax.net.ssl.keyStore=/snowplow/ssl/collector.p12 -Dorg.slf4j.simpleLogger.defaultLogLevel=warn -Dcom.amazonaws.sdk.disableCbor' \ snowplow/scala-stream-collector-kinesis:2.5.0 \ --config /snowplow/config/snowplow-stream-collector-kinesis-2.5.0.hocon ``` Note: If you don't have a verified certificate, you need to disable SSL verification on the client side, e.g. with `curl`'s `-k, --insecure` flag. ### Setting up an SQS buffer (2.0.0+) On AWS, the lack of auto-scaling in Kinesis results in throttled streams in case of traffic spikes, and the collector starts accumulating events to retry them later. 
If accumulation continues long enough, the collector will run out of memory. To prevent the collector from failing this way, you can configure an SQS buffer, which provides additional assurance during extreme traffic spikes. SQS is used to queue any message that the collector failed to send to Kinesis. The [Snowbridge application](/docs/api-reference/snowbridge/) can then read the messages from SQS and write them to Kinesis once Kinesis is ready. In the event of any AWS API glitches, there is a retry mechanism which retries sending to the SQS queue 10 times. The keys set up for the Kinesis stream are stored as SQS message attributes in order to preserve the information. > **Warning:** The SQS messages cannot be as big as Kinesis messages. The limit is 256kB per message, but we send the messages as Base64 encoded, so the limit goes down to 192kB for the original message. #### Setting up the SQS queues > **Note:** This section only applies to the case when SQS is used as a fallback sink when Kinesis is unavailable. If you are using SQS as the primary sink, then the settings below should be ignored and the `good` and `bad` streams should be configured as normal under `streams.good` and `streams.bad` respectively. To start using this feature, you will first need to set up the SQS queues. Two separate queues are required for good (raw) events and bad events. 
The collector then needs to be informed about the queue names, which can be done by adding these entries to `config.hocon`: ```properties sqsGoodBuffer = {good-sqs-queue-url} sqsBadBuffer = {bad-sqs-queue-url} ``` ### Networking Since version 3.0.0, networking settings are configured in their own `collector.networking` section: | parameter | description | | ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `collector.networking.maxConnections` (since _3.0.0_) | Optional. Default: `1024`. Maximum number of concurrent active connections. | | `collector.networking.idleTimeout` (since _3.0.0_) | Optional. Default: `610 seconds`. Maximum inactivity time for a network connection. If no data is sent within that time, the connection is closed. | | `collector.networking.bodyReadTimeout` (since _3.7.0_) | Optional. Default: `25 seconds`. Maximum time from receiving the request headers to receiving the end of the request body. If exceeded, returns a 408 Request Timeout. | | `collector.networking.responseHeaderTimeout` (since _3.2.0_) | Optional. Default: `30 seconds`. Time from when the request is made until a response line is generated before a 503 response is returned. It is recommended to make this slightly larger than `bodyReadTimeout`. | | `collector.networking.maxRequestLineLength` (since _3.2.0_) | Optional. Default: `20480`. Maximum length of the request line to parse. If exceeded, returns a 400 Bad Request. | | `collector.networking.maxHeadersLength` (since _3.2.0_) | Optional. Default: `40960`. Maximum total size of the request headers. If exceeded, returns a 400 Bad Request. | | `collector.networking.maxPayloadSize` (since _3.3.0_) | Optional. Default: `1048576` (1 MB). Maximum size of an event within the payload allowed before emitting a size violation failed event. Returns 200 OK. 
| | `collector.networking.dropPayloadSize` (since _3.3.0_) | Optional. Default: `2097152` (2 MB). Maximum body payload size allowed before rejecting the request. If exceeded returns a 413 Payload Too Large. | --- # Introduction to Snowplow Collector > The Snowplow event Collector receives raw Snowplow events from trackers and webhooks, serializes them, and writes them to supported sinks including Kinesis, PubSub, Kafka, NSQ, SQS, and stdout. > Source: https://docs.snowplow.io/docs/api-reference/stream-collector/ The collector receives raw Snowplow events sent over HTTP by [trackers](/docs/sources/) or [webhooks](/docs/sources/webhooks/). It serializes them, and then writes them to a sink. Currently supported sinks are: 1. [Amazon Kinesis](http://aws.amazon.com/kinesis/) 2. [Google PubSub](https://cloud.google.com/pubsub/) 3. [Apache Kafka](http://kafka.apache.org/) 4. [NSQ](http://nsq.io/) 5. [Amazon SQS](https://aws.amazon.com/sqs/) 6. `stdout` for a custom stream collection process The collector supports cross-domain Snowplow deployments, setting a `user_id` (used to identify unique visitors) server side to reliably identify the same user across domains. ## How it works ### User identification The collector allows the use of a third-party cookie, making user tracking across domains possible. In a nutshell: the collector receives events from a tracker, sets/updates a third-party user tracking cookie, and returns the pixel to the client. The ID in this third-party user tracking cookie is stored in the `network_userid` field in Snowplow events. 
In pseudocode terms: ```text if (request contains an "sp" cookie) { Record that cookie as the user identifier Set that cookie with a now+1 year cookie expiry Add the headers and payload to the output array } else { Set the "sp" cookie with a now+1 year cookie expiry Add the headers and payload to the output array } ``` ## Technical architecture The collector is written in Scala and built on top of [http4s](https://http4s.org). [GitHub repository](https://github.com/snowplow/stream-collector) --- # Set up and run Collector > Instructions for running the Snowplow event Collector using Docker images or jar files, including how to configure the application with HOCON files and perform health checks. > Source: https://docs.snowplow.io/docs/api-reference/stream-collector/setup/ A Terraform module is available which deploys the collector on an AWS EC2 instance without the need for this manual setup. ## Run the collector The collector is available on Docker Hub in several different flavours. Pull the image that matches the sink you are using: ```bash docker pull snowplow/scala-stream-collector-kinesis:3.7.0 docker pull snowplow/scala-stream-collector-pubsub:3.7.0 docker pull snowplow/scala-stream-collector-kafka:3.7.0 docker pull snowplow/scala-stream-collector-rabbitmq-experimental:3.7.0 docker pull snowplow/scala-stream-collector-nsq:3.7.0 docker pull snowplow/scala-stream-collector-sqs:3.7.0 docker pull snowplow/scala-stream-collector-stdout:3.7.0 ``` The application is configured by passing a HOCON file on the command line: ```bash docker run --rm \ -v $PWD/config.hocon:/snowplow/config.hocon \ -p 8080:8080 \ snowplow/scala-stream-collector-${flavour}:3.7.0 --config /snowplow/config.hocon ``` Alternatively, you can download and run [a jar file from the GitHub releases](https://github.com/snowplow/stream-collector/releases). 
```bash java -jar scala-stream-collector-kinesis-3.7.0.jar --config /path/to/config.hocon ``` **Telemetry notice** By default, Snowplow collects telemetry data for Collector (since version 2.4.0). Telemetry allows us to understand how our applications are used and helps us build a better product for our users (including you!). This data is anonymous and minimal, and since our code is open source, you can inspect [what’s collected](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1). If you wish to help us further, you can optionally provide your email (or just a UUID) in the `collector.telemetry.userProvidedId` configuration setting. If you wish to disable telemetry, you can do so by setting `collector.telemetry.disable` to `true`. See our [telemetry principles](/docs/get-started/self-hosted/telemetry/) for more information. ## Health check Pinging the collector on the /health path should return a 200 OK response: ```bash curl http://localhost:8080/health ``` --- # Collector 3.0.x upgrade guide > Upgrade guide for Collector 3.0.x covering breaking changes including the new license, migration from Akka HTTP to http4s, single endpoint port, and updated configuration requirements. > Source: https://docs.snowplow.io/docs/api-reference/stream-collector/upgrade-guides/3-0-x-upgrade-guide/ ## Breaking changes ### New license Since version 3.0.0, the collector has been migrated to use the [Snowplow Limited Use License](/limited-use-license-1.0/) ([FAQ](/docs/licensing/limited-use-license-faq/)). ### New HTTP stack Version 3.0.0 replaces the Akka HTTP stack with http4s. The same stack is already used in all other Snowplow microservices, so this makes the codebase more uniform and enables us to share more code between applications. 
Also, newer versions of Akka HTTP would have required a [commercial license from Lightbend](https://www.lightbend.com/blog/why-we-are-changing-the-license-for-akka) for many of Snowplow’s users. ### Single endpoint port Previously, the collector could expose both an HTTP and an HTTPS port. We found this impractical, so the new version exposes a single port, either HTTP or HTTPS. To enable HTTP→HTTPS upgrades, use a load balancer or proxy. ## Upgrading To ease the migration for existing installations, we’ve tried to keep the configurations mostly backwards-compatible. However, due to the changes above, some amendments are required. We’ve also introduced a few tweaks that should simplify configurations between pipeline services. Full configuration examples are available in the [snowplow/stream-collector](https://github.com/snowplow/stream-collector/tree/master/examples) repository along with extended commentary. If you previously used both HTTP and HTTPS ports in the collector, note that this is no longer possible. Since version 3.0.0 you can either: - enable TLS termination in the collector (use HTTPS between the load balancer and the collector) and enable HTTP upgrade in the load balancer running in front of the collector - disable TLS termination in the collector (use HTTP between the load balancer and the collector), and use TLS termination and HTTP upgrade in the load balancer ### License acceptance You have to explicitly accept the [Snowplow Limited Use License](/limited-use-license-1.0/) ([FAQ](/docs/licensing/limited-use-license-faq/)). To do so, either set the `ACCEPT_LIMITED_USE_LICENSE=yes` environment variable, or update the following section in the configuration: ```hcl collector { license { accept = true } ... } ``` ### Akka section removal The top-level `akka` section is no longer used. Keeping the section in your configuration will _not_ cause collector failures, but it should be removed for clarity. 
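To illustrate, a leftover `akka` block from a pre-3.0.0 configuration can be deleted wholesale. The snippet below is a hypothetical sketch of a typical 2.x setup (the exact keys and values vary by installation):

```hcl
# Entire top-level section is ignored since 3.0.0 and can be deleted
akka {
  loglevel = WARNING
  loggers = ["akka.event.slf4j.Slf4jLogger"]

  http.server {
    remote-address-header = on
    raw-request-uri-header = on
  }
}
```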
### Split sinks

Sink configurations can now be defined similarly to our other services. Each sink is configured separately to allow for individual config specifics. This change is especially useful for our Kafka collector running on Azure, where EventHubs needs separate `producerConf` sections to account for different EH settings. Example:

```hcl
collector {
  ...
  streams {
    good {
      name = "good"
      brokers = "localhost:9092,another.host:9092"
      producerConf {
        acks = all
        "key.serializer" = "org.apache.kafka.common.serialization.StringSerializer"
        "value.serializer" = "org.apache.kafka.common.serialization.StringSerializer"
      }
      buffer {
        byteLimit = 3145728
        recordLimit = 500
        timeLimit = 5000
      }
    }
    bad {
      name = "bad"
      brokers = "localhost:9092,another.host:9092"
      producerConf {
        acks = all
        "key.serializer" = "org.apache.kafka.common.serialization.StringSerializer"
        "value.serializer" = "org.apache.kafka.common.serialization.StringSerializer"
      }
      maxBytes = 1000000
      buffer {
        byteLimit = 3145728
        recordLimit = 500
        timeLimit = 5000
      }
    }
  }
  ...
}
```

### Networking

Optional networking settings were previously part of the Akka HTTP configuration section. With the removal of the framework, the relevant settings have moved into a dedicated `collector.networking` section:

- `collector.networking.maxConnections` - maximum number of concurrent active connections.
- `collector.networking.idleTimeout` - maximum inactivity time for a network connection. If no data is sent within that time, the connection is closed.

Example:

```hcl
collector {
  ...
  networking {
    maxConnections = 1024
    idleTimeout = 610 seconds
  }
  ...
}
```

---

# Collector 3.6.x upgrade guide with compression

> Upgrade guide for Collector 3.6.x introducing payload compression to reduce storage costs and improve throughput, requiring Enrich 6.1.x for compatibility.
> Source: https://docs.snowplow.io/docs/api-reference/stream-collector/upgrade-guides/3-6-x-upgrade-guide/

Collector 3.6.0 introduces payload compression, a new feature that significantly reduces the size (and therefore, cost) of data written to your raw output stream. The compression feature allows the collector to batch multiple individual collector payloads into a single compressed stream record. This provides several benefits:

- **Reduced storage costs**: compressed payloads take up less space in your output streams
- **Improved throughput**: fewer, larger records reduce the overhead of stream processing
- **Better performance**: downstream consumers can process batches more efficiently

## Enabling compression

> **Warning:** Before enabling compression, you must upgrade to Enrich 6.1.x first. This is because support for processing compressed payloads was added in Enrich 6.1.0, which can process both compressed and uncompressed payloads.
>
> Enrich is currently the only application compatible with compression. Setups with an [S3 loader](/docs/api-reference/loaders-storage-targets/s3-loader/) reading off the raw stream will not be supported.

When upgrading to Collector 3.6.0, compression is an optional feature that can be configured in your [collector settings](/docs/api-reference/stream-collector/configure/). If this feature is not enabled, there will be no changes to the data format or size. After upgrading Enrich, compression in Collector can be enabled by adding the following config section:

```hocon
compression {
  enabled = true
}
```

> **Tip:** Take note of the `streams.buffer.timeLimit` Collector configuration parameter, which already existed in previous versions. This controls how many events are batched (or, technically, for how long) before applying compression. Bigger values lead to higher compression rates (lower infrastructure costs), but also higher latency.
We recommend starting with a value around 300–500ms and fine-tuning it from there. ### Impact on metrics When compression is enabled, there will be a big decrease in the number of messages sent to the `raw` event stream, i.e. Kinesis, Pub/Sub or Event Hubs, depending on your cloud. You will notice this decrease if you monitor metrics on messages in the `raw` stream. This is perfectly normal and does not indicate any drop in event volumes. It happens because the compression feature batches together many Snowplow events into a single message sent to the `raw` stream. --- # Collector upgrade guides > Guides to help you upgrade the Snowplow event Collector to newer versions with information on breaking changes and migration steps. > Source: https://docs.snowplow.io/docs/api-reference/stream-collector/upgrade-guides/ This section contains information to help you upgrade to newer versions of Collector. ## [📄️ 3.6.x upgrade guide](/docs/api-reference/stream-collector/upgrade-guides/3-6-x-upgrade-guide/) [Upgrade guide for Collector 3.6.x introducing payload compression to reduce storage costs and improve throughput, requiring Enrich 6.1.x for compatibility.](/docs/api-reference/stream-collector/upgrade-guides/3-6-x-upgrade-guide/) ## [📄️ 3.0.x upgrade guide](/docs/api-reference/stream-collector/upgrade-guides/3-0-x-upgrade-guide/) [Upgrade guide for Collector 3.0.x covering breaking changes including the new license, migration from Akka HTTP to http4s, single endpoint port, and updated configuration requirements.](/docs/api-reference/stream-collector/upgrade-guides/3-0-x-upgrade-guide/) --- # Links to tracker API documentation > Generated API documentation for Snowplow tracker SDKs across all supported platforms and programming languages. > Source: https://docs.snowplow.io/docs/api-reference/trackers/ This section contains links to the generated API documentation for our tracker SDKs. 
--- # Snowplow component versions and compatibility matrix > Latest versions of Snowplow components including collectors, enrichment, loaders, trackers, Iglu, data models, and analytics SDKs with compatibility and upgrade information. > Source: https://docs.snowplow.io/docs/api-reference/versions/ This page lists the most recent versions of Snowplow components. Some information about components is relevant only for [Snowplow Self-Hosted](/docs/get-started/#self-hosted) users, as [Snowplow CDI](/docs/get-started/#customer-data-infrastructure) customers won't need to configure all their own components. In short, almost everything is compatible with almost everything. We rarely change the core protocols that various components use to communicate. You might encounter specific restrictions when following the documentation, for example, some of our [data models](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-models/) might call for a reasonably recent version of the [warehouse loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/). ## Upgrades and deprecation > **Info:** If you are a Snowplow CDI customer, rather than self-hosted, you don't need to deal with upgrading your pipeline. We'll perform upgrades for you. Some major upgrades might have breaking changes. In this case, we provide upgrade guides, such as the ones for [RDB Loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/upgrade-guides/). From time to time, we develop better applications for certain tasks and deprecate the old ones. Deprecations are announced on [Community](https://community.snowplow.io/). *** ## Latest versions ### Core pipeline > **Info:** If you are a Snowplow CDI customer, rather than self-hosted, you don't need to install any of the core pipeline components yourself. We'll deploy your pipeline and keep it up to date. 
**AWS:** | Component | Latest version | | ---------------------------------------------------------------------------------------------------------------- | -------------- | | [Stream Collector](/docs/api-reference/stream-collector/) | 3.7.0 | | [Enrich](/docs/api-reference/enrichment-components/) | 6.9.0 | | [RDB Loader (Redshift, Snowflake, Databricks)](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/) | 6.3.0 | | [Lake Loader](/docs/api-reference/loaders-storage-targets/lake-loader/) | 0.9.1 | | [Snowflake Streaming Loader](/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/) | 0.5.1 | | [Databricks Streaming Loader](/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/) | 0.4.0 | | [S3 Loader](/docs/api-reference/loaders-storage-targets/s3-loader/) | 3.1.0 | | [Snowbridge](/docs/api-reference/snowbridge/) | 4.1.0 | | [Elasticsearch Loader](/docs/api-reference/loaders-storage-targets/elasticsearch/) | 2.1.3 | | [Postgres Loader](/docs/api-reference/loaders-storage-targets/snowplow-postgres-loader/) | 0.3.3 | | [Dataflow Runner](/docs/api-reference/dataflow-runner/) | 0.7.6 | **GCP:** | Component | Latest version | | ------------------------------------------------------------------------------------------------------- | -------------- | | [Stream Collector](/docs/api-reference/stream-collector/) | 3.7.0 | | [Enrich](/docs/api-reference/enrichment-components/) | 6.9.0 | | [RDB Loader (Snowflake, Databricks)](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/) | 6.3.0 | | [BigQuery Loader](/docs/api-reference/loaders-storage-targets/bigquery-loader/) | 2.1.0 | | [Lake Loader](/docs/api-reference/loaders-storage-targets/lake-loader/) | 0.9.1 | | [Snowflake Streaming Loader](/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/) | 0.5.1 | | [Databricks Streaming Loader](/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/) | 0.4.0 | | [GCS 
Loader](/docs/api-reference/loaders-storage-targets/google-cloud-storage-loader/) | 0.5.6 | | [Snowbridge](/docs/api-reference/snowbridge/) | 4.1.0 | | [Postgres Loader](/docs/api-reference/loaders-storage-targets/snowplow-postgres-loader/) | 0.3.3 | **Azure:** | Component | Latest version | | ------------------------------------------------------------------------------------------------------- | -------------- | | [Stream Collector](/docs/api-reference/stream-collector/) | 3.7.0 | | [Enrich](/docs/api-reference/enrichment-components/) | 6.9.0 | | [RDB Loader (Snowflake)](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/) | 6.3.0 | | [Lake Loader](/docs/api-reference/loaders-storage-targets/lake-loader/) | 0.9.1 | | [Snowflake Streaming Loader](/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/) | 0.5.1 | | [Databricks Streaming Loader](/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/) | 0.4.0 | *** ### Iglu (schema registry) > **Info:** If you are a Snowplow CDI customer, rather than self-hosted, you don't need to install Iglu Server yourself. It's also unlikely that you need to use any of the other components in this section. You can manage your data structures [in the UI or via the API](/docs/event-studio/data-structures/). 
| Component | Latest version | | ------------------------------------------------------------------------------ | -------------- | | [Iglu Server](/docs/api-reference/iglu/iglu-repositories/iglu-server/) | 0.14.1 | | [`igluctl` utility](/docs/api-reference/iglu/igluctl-2/) | 0.13.0 | | [Iglu Scala client](/docs/api-reference/iglu/iglu-clients/scala-client-setup/) | 4.0.3 | | [Iglu Objective-C client](/docs/api-reference/iglu/iglu-clients/objc-client/) | 0.1.1 | ### Trackers | Tracker | Latest version | | ----------------------------------------------------------- | -------------- | | [JavaScript (Web and Node.js)](/docs/sources/web-trackers/) | 4.6.8 | | [iOS](/docs/sources/mobile-trackers/) | 6.2.1 | | [Android](/docs/sources/mobile-trackers/) | 6.2.0 | | [React Native](/docs/sources/react-native-tracker/) | 4.6.8 | | [Flutter](/docs/sources/flutter-tracker/) | 0.8.0 | | [WebView](/docs/sources/webview-tracker/) | 0.3.0 | | [Roku](/docs/sources/roku-tracker/) | 0.3.1 | | [Google AMP](/docs/sources/google-amp-tracker/) | 1.1.0 | | [Pixel](/docs/sources/pixel-tracker/) | 0.3.0 | | [Golang](/docs/sources/golang-tracker/) | 3.1.0 | | [.NET](/docs/sources/net-tracker/) | 1.3.0 | | [Java](/docs/sources/java-tracker/) | 2.1.0 | | [Python](/docs/sources/python-tracker/) | 1.0.3 | | [Scala](/docs/sources/scala-tracker/) | 2.0.0 | | [Ruby](/docs/sources/ruby-tracker/) | 0.8.0 | | [Rust](/docs/sources/rust-tracker/) | 0.2.0 | | [PHP](/docs/sources/php-tracker/) | 0.9.2 | | [C++](/docs/sources/c-tracker/) | 2.0.0 | | [Unity](/docs/sources/unity-tracker/) | 0.8.1 | | [Lua](/docs/sources/lua-tracker/) | 0.2.0 | ### Data models #### dbt [Modeling data with dbt](/docs/modeling-your-data/modeling-your-data-with-dbt/) is our recommended approach. 
**Snowplow Unified Digital:** | snowplow-unified version | dbt versions | BigQuery | Databricks | Redshift | Snowflake | Postgres | Spark | | ------------------------ | ------------------ | -------- | ---------- | -------- | --------- | -------- | ----- | | 1.0.0 | >=1.10.6 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | 0.4.5 | >=1.6.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | **Snowplow Media Player:** | snowplow-media-player version | snowplow-web version | dbt versions | BigQuery | Databricks | Redshift | Snowflake | Postgres | Spark | | ----------------------------- | -------------------- | ------------------ | -------- | ---------- | -------- | --------- | -------- | ----- | | 0.9.4 | N/A | >=1.4.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | 0.8.0 | N/A | >=1.4.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | | 0.5.3 | >=0.14.0 to <0.16.0 | >=1.4.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | | 0.4.2 | >=0.13.0 to <0.14.0 | >=1.3.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | | 0.4.1 | >=0.12.0 to <0.13.0 | >=1.3.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | | 0.3.4 | >=0.9.0 to <0.12.0 | >=1.0.0 to <1.3.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | | 0.1.0 | >=0.6.0 to <0.7.0 | >=0.20.0 to <1.1.0 | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | **Snowplow Normalize:** | snowplow-normalize version | dbt versions | BigQuery | Databricks | Redshift | Snowflake | Postgres | Spark | | -------------------------- | ----------------- | -------- | ---------- | -------- | --------- | -------- | ----- | | 0.4.1 | >=1.4.0 to <2.0.0 | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | | 0.3.5 | >=1.4.0 to <2.0.0 | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | | 0.2.3 | >=1.3.0 to <2.0.0 | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | | 0.1.0 | >=1.0.0 to <2.0.0 | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | **Snowplow Ecommerce:** | snowplow-ecommerce version | dbt versions | BigQuery | Databricks | Redshift | Snowflake | Postgres | Spark | | -------------------------- | ----------------- | -------- | ---------- | -------- | --------- | -------- | ----- | | 0.9.3 | >=1.4.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ⚠️ | ✅ | | 0.8.2 | >=1.4.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ⚠️ | 
❌ | | 0.3.0 | >=1.3.0 to <2.0.0 | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | | 0.2.1 | >=1.0.0 to <2.0.0 | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | Postgres is technically supported in the models within the package; however, one of the contexts’ names is too long to be loaded via the Postgres Loader. **Snowplow Attribution:** | snowplow-attribution version | dbt versions | BigQuery | Databricks | Redshift | Snowflake | Postgres | Spark | | ---------------------------- | ----------------- | -------- | ---------- | -------- | --------- | -------- | ----- | | 0.6.0 | >=1.6.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | | 0.3.0 | >=1.6.0 to <2.0.0 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | *** See also the [dbt version compatibility checker](/docs/modeling-your-data/modeling-your-data-with-dbt/#dbt-version-compatibility-checker). #### SQL Runner > **Note:** We recommend using the dbt models above, as they are more actively developed. The latest version of [SQL Runner](/docs/modeling-your-data/modeling-your-data-with-sql-runner/) itself is **0.10.1**. | Model | Redshift | BigQuery | Snowflake | | --------------------------------------------------------------------------------------------------- | -------- | -------- | --------- | | [Web](/docs/modeling-your-data/modeling-your-data-with-sql-runner/sql-runner-web-data-model/) | 1.3.1 | 1.0.4 | 1.0.2 | | [Mobile](/docs/modeling-your-data/modeling-your-data-with-sql-runner/sql-runner-mobile-data-model/) | 1.1.0 | 1.1.0 | 1.1.0 | ### Testing and debugging > **Info:** If you are a Snowplow CDI customer, rather than self-hosted, we recommend using [Snowplow Micro through Console](/docs/testing/snowplow-micro/console/) for testing and debugging. 
| Application | Latest version | | --------------------------------------------------------------- | -------------- | | [Snowplow Micro](/docs/testing/snowplow-micro/) | 4.1.1 | | [Snowplow Mini](/docs/api-reference/snowplow-mini/usage-guide/) | 0.25.1 | ### Analytics SDKs | SDK | Latest version | | ------------------------------------------------------------------------- | -------------- | | [Scala](/docs/api-reference/analytics-sdk/analytics-sdk-scala/) | 3.0.0 | | [JavaScript](/docs/api-reference/analytics-sdk/analytics-sdk-javascript/) | 0.3.1 | | [Python](/docs/api-reference/analytics-sdk/analytics-sdk-python/) | 0.2.3 | | [.NET](/docs/api-reference/analytics-sdk/analytics-sdk-net/) | 0.2.1 | | [Go](/docs/api-reference/analytics-sdk/analytics-sdk-go/) | 0.3.0 | --- # Create event forwarders in Console > Step-by-step guide to create a Snowplow event forwarder to send events to third-party destinations in real-time with connections, filters, and field mappings. > Source: https://docs.snowplow.io/docs/destinations/forwarding-events/creating-forwarders/ Follow the steps below to configure a new event forwarder. See [available integrations](/docs/destinations/forwarding-events/integrations/) for destination-specific guides. ## Step 1: Create a connection A **connection** is a resource that stores the credentials and endpoint details needed to send events to your destination. To create a connection from [Snowplow Console](https://console.snowplowanalytics.com), first go to **Destinations** > **Connections**, then select **Set up connection**. From the dropdown, choose **Loader connection**, then select the destination you want to forward events to. Each destination requires specific authentication and endpoint details. ![Console interface for creating a new destination connection with authentication and endpoint configuration fields](/assets/images/event-forwarding-connection-094b95e8dd4dee4353cd575fab0468b6.png) When finished, click **Deploy**. 
Once a connection is deployed, you can use it in one or more forwarders to connect to your destination. ## Step 2: Create a new forwarder 1. Go to **Destinations** > **Destination list**. 2. Navigate to the **Available** tab and select **Configure** on the destination card from the list of available integrations to start setting up the forwarder. 3. Give the forwarder a **name**, select the **pipeline** you want the forwarder to read events from, and choose the **connection** you created in step 1. 4. Optionally, you can choose to **Import configuration from** an existing forwarder. This is helpful when migrating a forwarder setup from development to production. 5. Click **Continue** to configure event filters and data mapping. ## Step 3: Configure event filters and field mapping Forwarders use JavaScript expressions to define which events to forward and how to map Snowplow data to your destination's required schema. ### Event filtering Use JavaScript expressions to select which events to forward. Only events that return `true` when evaluated against your filter will be sent to your destination. Use the `event` object to reference fields on your Snowplow payloads. For example: ```javascript // Forward page views from website event.app_id == "website" && event.event_name == "page_view" // Forward a list of custom events ["add_to_cart", "purchase"].includes(event.event_name) ``` Leave the filter blank to forward all events. ![Event filtering configuration panel showing JavaScript expression field for defining which events to forward](/assets/images/event-forwarding-filters-eb59897842efc2acb3dd85500b569aae.png) ### Field mapping Define how Snowplow data maps to your destination fields. For each mapping, **Destination Field** represents the property name and **Snowplow expression** is a JavaScript expression used to extract data from your Snowplow event. Snowplow provides default mappings based on common fields, but you can overwrite or delete them as needed. 
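Conceptually, each mapping pairs a destination field name with a JavaScript expression that is evaluated against the event, and the results are assembled into the output payload. A minimal sketch of that behavior (the destination fields and the `com_acme_product` entity here are hypothetical examples, not real Snowplow columns):

```javascript
// Hypothetical mappings: destination field -> expression over the Snowplow event
const mappings = {
  userId: (event) => event.user_id,
  pageUrl: (event) => event.page_url,
  // Entity data is an array of instances; take the first one (hypothetical schema)
  productName: (event) => event.contexts_com_acme_product_1?.[0]?.name,
};

// Assemble the destination payload by evaluating each mapping expression
function applyMappings(event) {
  const payload = {};
  for (const [field, expr] of Object.entries(mappings)) {
    payload[field] = expr(event);
  }
  return payload;
}

const sample = {
  user_id: 'u-123',
  page_url: 'https://example.com/products/1',
  contexts_com_acme_product_1: [{ name: 'Widget' }],
};
console.log(applyMappings(sample));
// → { userId: 'u-123', pageUrl: 'https://example.com/products/1', productName: 'Widget' }
```

In the Console UI you only supply the right-hand expressions; the assembly into a payload happens for you.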
![Field mapping configuration panel showing key/value pairs with JavaScript property selection expressions](/assets/images/event-forwarding-mapping-0811c90079f2c0d53d555f11294bad1a.png)

### Custom functions

You can also write JavaScript functions for complex data transformations. Examples include converting date formats, transforming enum values, combining multiple fields, or applying other business logic. You can then reference these functions in both the event filter and field mapping sections.

![Custom functions editor with JavaScript code for complex data transformations in event forwarding](/assets/images/event-forwarding-custom-functions-7a4098875609b9bcb07169204c459fbc.png)

> **Info:** To learn more about the supported filter and mapping expressions, check out the [filter and mapping reference](/docs/destinations/forwarding-events/reference/).

## Step 4: Test your transformations

Once you've defined your filter and mapping configuration, you can test it against a sample event and preview what the output JSON payload looks like. Choose an event from the **Select sample input event** dropdown and select **Run test**.

![Test transformation interface showing sample event input and JSON output preview with run test button](/assets/images/event-forwarding-test-transformations-54e02b5e0465eca443b8f05774912721.png)

Snowplow provides a few out-of-the-box sample events to test with, which you can edit as needed. You can also choose **Custom event** to paste in your own JSON-formatted Snowplow event. You can use [Snowplow Micro](/docs/testing/snowplow-micro/) with the `--output-json` flag to generate your own events to test with. Select **View generated code** to see the JavaScript function generated from your filters, field mappings, and custom functions.
This is exactly what will run when transforming events for your destination, and can be used directly in a [Snowbridge JavaScript transformation](/docs/api-reference/snowbridge/configuration/transformations/custom-scripts/javascript-configuration/) for local testing. If there is an error with your configuration, the generated code will show the line number that contains the error.

## Step 5: Deploy

When you're done, select **Deploy** to save your configuration and create the forwarder. This will deploy the underlying Snowbridge instance to your cloud account and begin forwarding events based on your configuration. It typically takes a few minutes to deploy a new forwarder.

---

# Build custom integrations for event streams

> Build custom consumers for Snowplow event streams using AWS Lambda, GCP Cloud Functions, KCL applications, or Pub/Sub client libraries to integrate with any third-party platform.

> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/custom-integrations/

Snowplow is underpinned by event streams: AWS Kinesis, GCP PubSub, or Apache Kafka. Before a Snowplow pipeline loads the events to a data warehouse, the enriched events are available on a stream. You can build a custom consumer to consume these events. Below we describe some high-level concepts that you can use to consume the enriched event streams.

## Transforming the Enriched Stream to JSON

The Snowplow events in the Enriched stream are in a tab-separated (TSV) format by default. Many downstream consumers will prefer this data in JSON format, and the [Snowplow Analytics SDKs](/docs/api-reference/analytics-sdk/) have been built to help with this.

## AWS Lambda and GCP Cloud Functions

[AWS Lambdas](https://aws.amazon.com/lambda/) and [GCP Cloud Functions](https://cloud.google.com/functions/) are serverless platforms that allow you to write applications that can be triggered by events from Kinesis and PubSub respectively.
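A function like this usually starts by turning each enriched TSV line back into an object, as described above. The sketch below hand-rolls that conversion for just the first few fields; the field names and positions are assumptions, and in practice you would use a Snowplow Analytics SDK, which handles the complete enriched event format:

```javascript
// Sketch: convert the leading fields of an enriched TSV line into an object.
// Field positions are illustrative assumptions; the Snowplow Analytics SDKs
// implement the full enriched event format and should be preferred.
const FIELDS = ['app_id', 'platform', 'etl_tstamp', 'collector_tstamp'];

function tsvToJson(line) {
  const values = line.split('\t');
  const event = {};
  FIELDS.forEach((name, i) => {
    // Empty TSV cells become null rather than empty strings
    event[name] = values[i] ? values[i] : null;
  });
  return event;
}

// Example enriched-style line (first four fields only)
console.log(tsvToJson('website\tweb\t2025-01-01 00:00:00\t2025-01-01 00:00:01'));
// → { app_id: 'website', platform: 'web',
//     etl_tstamp: '2025-01-01 00:00:00', collector_tstamp: '2025-01-01 00:00:01' }
```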
By configuring a function to be triggered by an event, it is possible to write applications that take the Snowplow events, perform transformations and other processing, then relay the events into another system. Serverless functions are an easy way to build real-time consumers of the event stream for use cases that require fast action or decisioning based on incoming events (for example, ad bidding, paywall optimization, or real-time reporting).

## Kinesis Client Library (KCL) applications

The Kinesis Client Library (KCL) allows you to build applications that consume from AWS Kinesis. It uses AWS DynamoDB to keep track of shards in the data stream, making it far easier to consume from Kinesis than would otherwise be possible. There is comprehensive documentation on building Amazon KCL apps within the [AWS Documentation](https://docs.aws.amazon.com/streams/latest/dev/shared-throughput-kcl-consumers.html).

## Pub/Sub client library applications

The Pub/Sub client libraries allow you to build applications that consume from GCP Pub/Sub, making it far easier to consume events from Pub/Sub than would otherwise be possible. There is comprehensive documentation on building GCP Pub/Sub client library apps within the [GCP Documentation](https://cloud.google.com/pubsub/docs/reference/libraries).

---

# Monitor and troubleshoot event forwarders

> Monitor event forwarder performance, debug failures, and understand retry logic with cloud metrics, failed event logs, and Console statistics.

> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/event-forwarding-monitoring-and-troubleshooting/

This page outlines how to monitor event forwarder performance and diagnose delivery issues. Snowplow provides both summary metrics and detailed failed event logs to help you understand failure patterns and troubleshoot specific problems.
## Failure types and handling

Snowplow handles event forwarding failures differently depending on the type of error.

### Data processing failures

These failures occur when there are issues with the event data itself or in how it's transformed before reaching the destination.

**Transformation failures** occur when Snowplow encounters an exception while applying your configured JavaScript transformation. While there are safeguards against deploying invalid JavaScript, transformations may still result in runtime errors. Snowplow treats transformation failures as invalid data and logs them as failed events in your cloud storage bucket without retrying.

**Oversized data failures** result from events exceeding the destination's size limits. Snowplow creates [size violation failed events](/docs/api-reference/failed-events/) for these events and logs them to your cloud storage bucket without retrying.

### Destination failures

These failures occur when the destination's API cannot accept or process the event data.

**Transient failures** are those that are expected to succeed on retry. This includes temporary network errors, HTTP 5xx server errors, or rate limiting. Transient failures are automatically retried.

**Setup failures** result from configuration issues that typically require human intervention to resolve, such as invalid API keys or insufficient permissions. When a setup error occurs, Snowplow will trigger email alerts to the configured list of users and occasionally retry the request to check if the issue has been resolved. For more information on alerting, see [configuring setup alerts](#configuring-setup-alerts) on this page.

**Other unrecoverable failures** are bad requests that won't succeed on retry, such as those with missing or invalid fields. These often map to HTTP 400 response codes. Snowplow will log them as failed events in your cloud storage bucket without retrying.

Failure types are defined per destination based on their expected HTTP response codes.
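As a rough illustration of that classification (this is not Snowplow's actual per-destination mapping, which varies by integration), the bucketing might look like:

```javascript
// Illustrative only: real mappings are defined per destination.
// Buckets an HTTP response status into the failure types described above.
function classifyFailure(status) {
  if (status === 401 || status === 403) return 'setup';      // bad credentials or permissions
  if (status === 429 || status >= 500) return 'transient';   // retried automatically
  if (status >= 400) return 'unrecoverable';                 // logged as failed events, no retry
  return 'success';
}
```

Under this sketch, rate-limited (429) and 5xx responses land in the transient bucket and are retried, while other 4xx responses are logged to the failure destination without retrying.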
See the list of [available destinations](/docs/destinations/forwarding-events/integrations/) for destination-specific details on retry policies and error handling.

### What happens when events fail

When any type of failure occurs, Snowplow can take one or more of the following actions:

- **Automatic retries**: transient failures are automatically retried according to each destination's retry policy. For all HTTP API destinations, Snowplow will retry up to 5 times with exponential backoff.
- **Failed event logging**: all non-retryable failures are routed to your configured failure destination, which is typically a cloud storage bucket, where you can inspect them further. This includes transformation failures, oversized data failures, unrecoverable failures, and transient failures that have exceeded their retry limit. For how to query these logs, see [Inspecting and debugging failures](#inspecting-and-debugging-failures) on this page.
- **Setup alerts**: just like warehouse loaders, setup failures trigger email alerts to notify configured users of authentication or configuration problems.

## Configuring setup alerts

Once a forwarder is deployed, you can configure one or more email addresses to receive alerts when setup failures occur. Follow the steps below to configure the alerts.

1. Navigate to **Destinations** > **Destinations list** from the navigation bar and click the **Details** button on a destination card to open the **Destination details** page.
2. On the table of forwarders, click the three dots next to the forwarder you want to configure alerting for and select **Alerts**.
3. You'll see a modal where you can enter the email addresses that should be alerted in case of setup errors. Click **Save Changes** to confirm.
![Setup alerts modal dialog for configuring email addresses to receive forwarder error notifications](/assets/images/setup-alerting-screenshot-a84253dd9889a28c38179557139d9425.png) ## Metrics and monitoring You can monitor forwarders in a few ways: - **Console metrics**: you can view high-level delivery statistics in Console. - **Cloud monitoring metrics**: forwarders emit a set of metrics to your cloud provider's observability service. - **Failed event logs**: for failed deliveries, Snowplow saves detailed logs to your cloud storage bucket. Consume these logs for automated monitoring in your observability platform of choice. ### Console metrics In Snowplow Console, you can see the number of filtered, failed, and successfully delivered events over the last seven days. To view these metrics, navigate to **Destinations** > **Destinations list** and select the destination you'd like to view. On the event forwarders overview table, you will see metrics for each forwarder configured for that destination. ![Console metrics dashboard showing forwarder statistics including filtered, failed, and delivered event counts](/assets/images/event-forwarding-console-metrics-a7c681591dad5e6ed07e55980b809c1c.png) ### Cloud monitoring metrics > **Info:** Forwarder cloud metrics are only available for [CDI Private Managed Cloud](/docs/get-started/#cdi-private-managed-cloud) customers. 
Forwarders emit the following metrics in your cloud provider's monitoring service:

- `target_success`: events successfully delivered to your destination
- `target_failed`: events that failed delivery but are eligible for retry
- `message_filtered`: events filtered out based on the forwarder's JavaScript filter expression
- `failure_target_success`: events that failed with unrecoverable errors, such as transformation errors, and were logged to your cloud storage bucket

You can find forwarder metrics in the following locations based on which cloud provider you use:

- **AWS**: CloudWatch metrics under the `snowplow/event-forwarding` namespace
- **GCP**: Cloud Monitoring metrics with the `snowplow_event_forwarding` prefix

To get notified of any issues, you can use these metrics to define [CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) or [Cloud Monitoring alerts](https://cloud.google.com/monitoring/alerts).

## Inspecting and debugging failures

This section explains how to find and query failed event logs.

### Finding failed event logs

To better understand why a failure has occurred, you can directly access and review detailed failed delivery logs in file storage. The logs are automatically saved as [failed events](/docs/monitoring/exploring-failed-events/file-storage/) in your Snowplow cloud storage bucket under the prefix: `/{pipeline_name}/partitioned/com.snowplowanalytics.snowplow.badrows.event_forwarding_errors/`

For more details on where to find failed events, see [Accessing failed events in file storage](/docs/monitoring/exploring-failed-events/file-storage/).
Failed event logs are formatted according to the [`event_forwarding_error`](https://iglucentral.com/?q=event_forwarding_error) schema and contain:

- **Original event data**: the complete Snowplow event that failed
- **Error details**: specific error type and message
- **Failure timestamp**: when the error occurred
- **Transformation state**: data state at the point of failure

### Querying failed event logs

You can query failed events using [Athena](https://aws.amazon.com/athena/) on AWS or [BigQuery external tables](https://cloud.google.com/bigquery/docs/external-tables) on GCP.

**AWS:**

**1. Create a table and load the data**

To make the logs easier to query, run the following query to create a table:

```sql
CREATE EXTERNAL TABLE event_forwarding_failures (
  data struct<
    failure: struct<
      errorCode: string,
      errorMessage: string,
      errorType: string,
      latestState: string,
      `timestamp`: string
    >,
    payload: string,
    processor: struct<
      artifact: string,
      version: string
    >
  >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://{BUCKET_NAME}/{PIPELINE_NAME}/partitioned/com.snowplowanalytics.snowplow.badrows.event_forwarding_error/'
```

If the table already exists, run the following query to pull in new data:

```sql
MSCK REPAIR TABLE event_forwarding_failures
```

**2. Explore failure records**

Use the query below to view a sample of failure records:

```sql
SELECT
  data.failure.timestamp,
  data.failure.errorType,
  data.failure.errorCode,
  data.failure.errorMessage,
  data.processor.artifact,
  data.processor.version,
  data.failure.latestState,
  -- these are last because they can be quite large
  data.payload
FROM event_forwarding_failures
LIMIT 10
```

**3. Example queries**

Summarize the most common types of errors:

```sql
SELECT
  data.failure.errorType,
  data.failure.errorCode,
  -- time range for each - is the issue still happening?
  MIN(data.failure.timestamp) AS minTstamp,
  MAX(data.failure.timestamp) AS maxTstamp,
  -- How many errors overall
  count(*) AS errorCount,
  -- There might just be lots of different messages for the same error
  -- If this is close to the error count, the messages for a single error might just have high cardinality - worth checking the messages themselves
  -- If it's a low number, we might have more than one issue
  -- If it's 1, we have only one issue and the below message is shared by all
  count(DISTINCT data.failure.errorMessage) AS distinctErrorMessages,
  -- a sample error message. You may need to look at them individually to get the full picture
  MIN(data.failure.errorMessage) AS sampleErrorMessage
FROM event_forwarding_failures
GROUP BY 1, 2
ORDER BY errorCount DESC -- Most errors first
LIMIT 10
```

View transformation errors:

```sql
SELECT
  data.failure.timestamp,
  data.failure.errorType,
  data.failure.errorCode,
  data.failure.errorMessage,
  data.processor.artifact,
  data.processor.version,
  data.failure.latestState,
  data.payload
FROM event_forwarding_failures
WHERE data.failure.errorType = 'transformation'
LIMIT 50
```

View API errors:

```sql
SELECT
  data.failure.timestamp,
  data.failure.errorType,
  data.failure.errorCode,
  data.failure.errorMessage,
  data.processor.artifact,
  data.processor.version,
  data.failure.latestState
FROM event_forwarding_failures
WHERE data.failure.errorType = 'api'
LIMIT 50
```

Filter based on a date and hour:

```sql
-- Note that the times in the paths are for the creation of the file, not the failure time
SELECT
  data.failure.timestamp,
  data.failure.errorType,
  data.failure.errorCode,
  data.failure.errorMessage,
  data.processor.artifact,
  data.processor.version,
  data.failure.latestState,
  -- these are last because they can be quite large
  data.payload
FROM event_forwarding_failures
WHERE "$path" LIKE '%2025-07-29-16%' -- File paths are timestamped like this, so we can limit our queries this way
LIMIT 50
```

Filter for a range of timestamps:

```sql
SELECT
  data.failure.timestamp,
  data.failure.errorType,
  data.failure.errorCode,
  data.failure.errorMessage,
  data.processor.artifact,
  data.processor.version,
  data.failure.latestState,
  -- these are last because they can be quite large
  data.payload
FROM event_forwarding_failures
-- Here we need the full path prefix
WHERE "$path" > 's3://{BUCKET_NAME}/{PIPELINE_NAME}/partitioned/com.snowplowanalytics.snowplow.badrows.event_forwarding_error/2025-07-29-16'
  AND "$path" < 's3://{BUCKET_NAME}/{PIPELINE_NAME}/partitioned/com.snowplowanalytics.snowplow.badrows.event_forwarding_error/2025-07-29-20'
```

**GCP:**

**1. Create a dataset**

First, create a dataset to organize your failed event tables:

```sql
CREATE SCHEMA snowplow_failed_events
OPTIONS (
  description = "Dataset for Snowplow failed event analysis",
  location = "EU" -- Should match the location of your bad rows bucket
);
```

**2. Create an external table and load the data**

To make the logs easier to query, run the following query to create an external table:

```sql
CREATE OR REPLACE EXTERNAL TABLE snowplow_failed_events.event_forwarding_failures (
  schema STRING,
  data STRUCT<
    failure STRUCT<
      errorCode STRING,
      errorMessage STRING,
      errorType STRING,
      latestState STRING,
      timestamp TIMESTAMP
    >,
    payload STRING,
    processor STRUCT<
      artifact STRING,
      version STRING
    >
  >
)
OPTIONS (
  description="Event Forwarding failures",
  format="NEWLINE_DELIMITED_JSON",
  ignore_unknown_values=true,
  uris=["gs://{BUCKET}/partitioned/com.snowplowanalytics.snowplow.badrows.event_forwarding_error/*"]
);
```

**3. Explore failure records**

Use the query below to view a sample of failure records:

```sql
SELECT
  data.failure.timestamp,
  data.failure.errorType,
  data.failure.errorCode,
  data.failure.errorMessage,
  data.processor.artifact,
  data.processor.version,
  data.failure.latestState,
  -- these are last because they can be quite large
  data.payload
FROM snowplow_failed_events.event_forwarding_failures
LIMIT 10
```

**4. Example queries**

Summarize the most common types of errors:

```sql
SELECT
  data.failure.errorType,
  data.failure.errorCode,
  -- time range for each - is the issue still happening?
  MIN(data.failure.timestamp) AS minTstamp,
  MAX(data.failure.timestamp) AS maxTstamp,
  -- How many errors overall
  COUNT(*) AS errorCount,
  -- There might just be lots of different messages for the same error
  -- If this is close to the error count, the messages for a single error might just have high cardinality - worth checking the messages themselves
  -- If it's a low number, we might have more than one issue
  -- If it's 1, we have only one issue and the below message is shared by all
  COUNT(DISTINCT data.failure.errorMessage) AS distinctErrorMessages,
  -- a sample error message. You may need to look at them individually to get the full picture
  MIN(data.failure.errorMessage) AS sampleErrorMessage
FROM snowplow_failed_events.event_forwarding_failures
GROUP BY 1, 2
ORDER BY errorCount DESC -- Most errors first
LIMIT 10
```

View transformation errors:

```sql
SELECT
  data.failure.timestamp,
  data.failure.errorType,
  data.failure.errorCode,
  data.failure.errorMessage,
  data.processor.artifact,
  data.processor.version,
  data.failure.latestState,
  data.payload
FROM snowplow_failed_events.event_forwarding_failures
WHERE data.failure.errorType = 'transformation'
LIMIT 50
```

View API errors:

```sql
SELECT
  data.failure.timestamp,
  data.failure.errorType,
  data.failure.errorCode,
  data.failure.errorMessage,
  data.processor.artifact,
  data.processor.version,
  data.failure.latestState
FROM snowplow_failed_events.event_forwarding_failures
WHERE data.failure.errorType = 'api'
LIMIT 50
```

Filter based on a date and hour:

```sql
-- Note that the times in the file paths are for the creation of the file, not the failure time
SELECT
  data.failure.timestamp,
  data.failure.errorType,
  data.failure.errorCode,
  data.failure.errorMessage,
  data.processor.artifact,
  data.processor.version,
  data.failure.latestState,
  -- these are last because they can be quite large
  data.payload
FROM snowplow_failed_events.event_forwarding_failures
WHERE _FILE_NAME LIKE '%2025/07/29/16%' -- File paths are timestamped like this, so we can limit our queries this way
LIMIT 50
```

Filter for a range of timestamps:

```sql
SELECT
  data.failure.timestamp,
  data.failure.errorType,
  data.failure.errorCode,
  data.failure.errorMessage,
  data.processor.artifact,
  data.processor.version,
  data.failure.latestState,
  -- these are last because they can be quite large
  data.payload
FROM snowplow_failed_events.event_forwarding_failures
-- Here we need the full path prefix
WHERE _FILE_NAME > 'gs://{BUCKET}/partitioned/com.snowplowanalytics.snowplow.badrows.event_forwarding_error/jsonschema-1/2025/07/29/16'
  AND _FILE_NAME < 'gs://{BUCKET}/partitioned/com.snowplowanalytics.snowplow.badrows.event_forwarding_error/jsonschema-1/2025/07/29/20'
```

---

# Configure Amplitude Tag for GTM Server Side

> Configure event mapping, user properties, entity rules, and session tracking for the Amplitude Tag in GTM Server Side.
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/amplitude-tag-for-gtm-ss/amplitude-tag-configuration/

> **Tip:** The [Session ID in Amplitude](https://help.amplitude.com/hc/en-us/articles/115002323627-Track-sessions-in-Amplitude) is the session's start time in milliseconds since epoch, so it cannot be derived directly from the `session_id` of your forwarded Snowplow events, which is a UUID. Therefore, in order to populate the Session ID so that your events are stitched into sessions correctly in Amplitude, your Snowplow events need to have the [`client_session` context entity](/docs/sources/web-trackers/tracking-events/session/) attached. Then the Amplitude Tag will automatically populate the Amplitude Session ID based on the `firstEventTimestamp` property of the session the event belongs to.

## Amplitude API Key (Required)

Set this to the API Key of your Amplitude HTTP API Data Source.
*(Screenshot: Amplitude HTTP API Data Source showing the API key)*

### Use Amplitude's EU servers

Enable this option to send the data to Amplitude's EU Residency Server [endpoint](https://www.docs.developers.amplitude.com/analytics/apis/http-v2-api/#endpoints), instead of the default standard server endpoint.

## Snowplow Event Mapping Options

### Include Self Describing event

Indicates if a Snowplow Self Describing event should be in the `event_properties` object of the Amplitude event.

### Snowplow Event Context Rules

This section describes how the Amplitude tag will use the context Entities attached to a Snowplow Event.

![](/assets/images/02-gtm-ss-amplitude-6f8f90af92e7fefe68e952297bed66cd.png)

#### Extract entity from Array if single element

Snowplow Entities are always in Arrays, as multiple of the same entity can be attached to an event. This option will pick the single element from the array if the array only contains a single element.

#### Include Snowplow Entities in event\_properties

Using this drop-down menu you can specify whether you want to include `All` or `None` of the Snowplow context entities in Amplitude's `event_properties`.

#### Snowplow Entities to Add/Edit mapping

Using this table you can specify in each row a specific mapping for a particular context entity. In the columns provided you can specify:

- The Entity name to add/edit-mapping (required).¹
- The key you would like to map it to (optional: leaving the mapped key blank keeps the same name).
- Whether to add in `event_properties` or `user_properties` of the Amplitude event (default value is `event_properties`).
- Whether you wish the mapping to apply to all versions of the entity (default value is `False`).¹ #### Snowplow Entities to Exclude Using this table (which is only available if `Include Snowplow Entities in event_properties` is set to `All`), you can specify the context entities you want to exclude from the Amplitude event. In its columns you can specify: - The Entity name (required).¹ - Whether the exclusion applies to all versions of the entity (default value is `False`).¹ > **Note:** ¹ How to specify the **Entity Name** and its relation to **Apply to all versions** option: > > Entity Names can be specified in 3 ways: > > 1. By their Iglu Schema tracking URI (e.g. `iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-0-2`) > > 2. By their enriched name (e.g. `contexts_com_snowplowanalytics_snowplow_client_session_1`) > > 3. By their key in the client event object, which is the GTM SS Snowplow prefix (`x-sp-`) followed by the enriched entity name (e.g. `x-sp-contexts_com_snowplowanalytics_snowplow_client_session_1`) > > Depending on the value set for the **Apply to all versions** column, the major version number from the 2nd and 3rd naming option above may be excluded. More specifically, this is only permitted if **Apply to all versions** is set to `True`. **pre-v0.2.0** #### Snowplow Event Context Rules ##### Extract entity from Array if single element Snowplow Entities are always in Arrays, as multiple of the same entity can be attached to an event. This option will pick the single element from the array if the array only contains a single element. ##### Include all Entities in event\_properties Leaving this option enabled ensures that all Entities on an event will be included within the Event Properties of the Amplitude event. If disabling this, individual entities can be selected for inclusion. These entities can also be remapped to have different names in the Amplitude event, and can be included in either event properties or user properties. 
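The note above lists three equivalent ways to name an entity. The relationship between the Iglu URI form and the enriched-name form can be sketched as follows (a simplified illustration only: real enrichment also snake-cases camelCase schema names, which this sketch does not handle):

```python
def iglu_to_enriched(iglu_uri: str, gtm_ss_prefix: bool = False) -> str:
    """Derive the enriched entity name from an Iglu schema URI, e.g.
    iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-0-2
      -> contexts_com_snowplowanalytics_snowplow_client_session_1
    """
    vendor, name, _, version = iglu_uri.removeprefix("iglu:").split("/")
    major = version.split("-")[0]  # keep only the major version number
    enriched = f"contexts_{vendor.replace('.', '_')}_{name}_{major}"
    # Optionally add the GTM SS Snowplow prefix used in the client event object
    return f"x-sp-{enriched}" if gtm_ss_prefix else enriched

print(iglu_to_enriched("iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-0-2"))
```

When **Apply to all versions** is `True`, the trailing major version number in the enriched form may be omitted.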
The entity can be specified in two different formats:

- Major version match: `x-sp-contexts_com_snowplowanalytics_snowplow_web_page_1`, where `com_snowplowanalytics_snowplow` is the event vendor, `web_page` is the schema name, and `1` is the major version number. `x-sp-` can also be omitted from this if desired.
- Full schema match: `iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0`

##### Include unmapped entities in event\_properties

If remapping or moving some entities to User Properties with the above customization, you may wish to ensure all unmapped entities are still included in the event. Enabling this option will ensure that all entities are mapped into the Amplitude event.

## Additional Event Mapping Options

If you wish to map other properties from a Client event into an Amplitude event, they can be specified in this section.

![](/assets/images/03-gtm-ss-amplitude-124150b9c2edaf9f44f0ae0067a19c0d.png)

### Event Property Rules

#### Include common event properties

Enabling this ensures properties from the [Common Event](https://developers.google.com/tag-platform/tag-manager/server-side/common-event-data) are automatically mapped to the Amplitude Event Properties.

#### Additional Event Property Mapping Rules

Specify the Property Key from the Client Event, and then the key you would like to map it to, or leave the mapped key blank to keep the same name. You can use Key Path notation here (e.g. `x-sp-tp2.p` for a Snowplow event's platform, or `x-sp-contexts.com_snowplowanalytics_snowplow_web_page_1.0.id` for a Snowplow event's page view ID, at array index 0), or pick non-Snowplow properties if using an alternative Client. These keys will populate the Amplitude `eventProperties` object.

### User Property Rules

#### Include common user properties

Enabling this ensures user\_data properties from the [Common Event](https://developers.google.com/tag-platform/tag-manager/server-side/common-event-data) are automatically mapped to the Amplitude User Properties.
#### Map Snowplow mkt fields (standard UTM parameters) to user properties

Enabling this option automatically maps all the marketing (`mkt_` prefixed) fields of the Snowplow event to the standard UTM parameters in Amplitude's user properties.

#### Additional User Property Mapping Rules

Specify the Property Key from the Client Event, and then the key you would like to map it to, or leave the mapped key blank to keep the same name. You can use Key Path notation here (e.g. `x-sp-tp2.p` for a Snowplow event's platform, or `x-sp-contexts.com_snowplowanalytics_snowplow_web_page_1.0.id` for a Snowplow event's page view ID, at array index 0), or pick non-Snowplow properties if using an alternative Client. These keys will populate the Amplitude `userProperties` object.

### Groups Property Rules

> **Note:** This configuration option is relevant **only if** you have set up [account-level reporting in Amplitude](https://help.amplitude.com/hc/en-us/articles/115001765532).

#### Groups Property Mapping Rules

Specify the Property Key from the GTM Event, and the key you would like to map it to, or leave the mapped key blank to keep the same name. You can use Key Path notation here (e.g. `x-sp-tp2.p` for a Snowplow event's platform, or `x-sp-contexts.com_snowplowanalytics_snowplow_web_page_1.0.id` for a Snowplow event's page view ID, at array index 0). These keys will populate the Amplitude `groups` object.
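Key Path notation as used above resolves each dot-separated segment against nested objects and arrays (numeric segments index into arrays). An illustrative Python sketch of the lookup — the tag's actual implementation is not shown in this documentation, so treat this purely as a model:

```python
def get_by_key_path(obj, path):
    """Resolve dotted key-path notation such as
    'x-sp-contexts.com_snowplowanalytics_snowplow_web_page_1.0.id'
    against a nested dict/list event object."""
    for part in path.split("."):
        if isinstance(obj, list):
            obj = obj[int(part)]  # numeric segment indexes into an array
        else:
            obj = obj[part]
    return obj

# Hypothetical client event fragment for illustration
event = {
    "x-sp-contexts": {
        "com_snowplowanalytics_snowplow_web_page_1": [{"id": "page-view-uuid"}]
    }
}
print(get_by_key_path(event, "x-sp-contexts.com_snowplowanalytics_snowplow_web_page_1.0.id"))
```

Note that in this model a trailing `.0` is what selects the first (and usually only) element of an entity array.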
### User Properties

Using the **Additional User Properties** table allows you to set additional **user** properties in the Amplitude payload. Similarly to the previous table in the section, add a row and specify the property name for Amplitude `user_properties` and then the value you would like to set it to.

### Groups Properties

> **Note:** This configuration option is relevant **only if** you have set up [account-level reporting in Amplitude](https://help.amplitude.com/hc/en-us/articles/115001765532).

Using the **Additional Groups Properties** table allows you to set additional **groups** properties in the Amplitude payload. Similarly to the previous tables in the section, add a row and specify the property name for the Amplitude `groups` object and then the value you would like to set it to.

## Advanced Event Settings

In this section you can find advanced configuration parameters.

![advanced event settings](/assets/images/04-gtm-ss-amplitude-5cf1cc255705dea76631842bde590aa6.png)

### Forward User IP address

Enabling this will forward the IP address to Amplitude; otherwise, Amplitude will not receive the user's IP address (default: `True`).

### Fallback platform identifier

If there is no Platform property on the Client event, this is the value which the Tag will forward to Amplitude (default: `Web`).

### Amplitude time setting

This option allows you to decide whether the event time of the Amplitude event will be set. The available options are:

- `Do not set` (default): this means the event time will be set automatically by Amplitude.
- `Set to current timestamp`: sets the Amplitude event time to the current timestamp.
- `Set from event property`: sets the Amplitude event time from the client event property.
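Amplitude expects event time as milliseconds since epoch. When using `Set from event property` with an ISO-8601 Snowplow timestamp such as `dvce_created_tstamp`, the conversion amounts to the following (a hedged sketch: the function name and format handling are illustrative, not the tag's code):

```python
from datetime import datetime

def to_amplitude_time(iso_tstamp: str) -> int:
    """Convert an ISO-8601 timestamp (e.g. a Snowplow dvce_created_tstamp)
    to the epoch-milliseconds integer Amplitude uses for event time."""
    # fromisoformat in older Python versions does not accept a trailing "Z"
    dt = datetime.fromisoformat(iso_tstamp.replace("Z", "+00:00"))
    return int(dt.timestamp() * 1000)

print(to_amplitude_time("2025-07-29T16:01:02.345Z"))
```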
For example, in the image below, the Amplitude event time will be set from the device created timestamp (`dvce_created_tstamp`) of the Snowplow event (`x-sp-` prefix in the client event):

![](/assets/images/07-gtm-ss-amplitude-dda2f8b6d3e860dc2e949e845b21440b.png)

### Device Identifier

#### Inherit Amplitude `device_id` from common event `client_id`

By default the Amplitude tag sets the `device_id` property of the Amplitude event from the `client_id` property of the common event. Unchecking this tick box allows you to override the value for `device_id` in the Amplitude event payload.

### User Identifier

#### Inherit Amplitude `user_id` from common event `user_id`

By default the Amplitude tag sets the `user_id` property of the Amplitude event from the `user_id` property of the common event. Unchecking this tick box allows you to override the value for `user_id` in the Amplitude event payload.

## Logs Settings

Through the Logs Settings you can control the logging behavior of the Amplitude Tag. The available options are:

- `Do not log`: This option allows you to completely disable logging. No logs will be generated by the Tag.
- `Log to console during debug and preview`: This option enables logging only in debug and preview containers. This is the **default** option.
- `Always`: This option enables logging regardless of container mode.

> **Note:** Please take into consideration that the logs generated may contain event data.

The logs generated by the Amplitude GTM SS Tag are standardized JSON strings.
The standard log properties are:

```json
{
  "Name": "Amplitude HTTP API V2", // the name of the tag
  "Type": "Message",               // the type of log (one of "Message", "Request", "Response")
  "TraceId": "xxx",                // the "trace-id" header, if present
  "EventName": "xxx"               // the name of the event the tag fired at
}
```

Depending on the type of log, additional properties are logged:

| Type of log | Additional information                                         |
| ----------- | -------------------------------------------------------------- |
| Message     | `Message`                                                      |
| Request     | `RequestMethod`, `RequestUrl`, `RequestHeaders`, `RequestBody` |
| Response    | `ResponseStatusCode`, `ResponseHeaders`, `ResponseBody`        |

---

# Amplitude Tag for GTM Server Side

> Forward Snowplow events to Amplitude from GTM Server Side using the Amplitude Tag with HTTP API v2 for product analytics and user tracking.
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/amplitude-tag-for-gtm-ss/

The [Amplitude Tag for GTM SS](https://tagmanager.google.com/gallery/#/owners/snowplow/templates/snowplow-gtm-server-side-amplitude-tag) allows events to be forwarded to Amplitude. This Tag works best with events from the Snowplow Client, but can also construct Amplitude events from other GTM SS Clients such as GAv4.

The tag is designed to work best with Self Describing Events and atomic events from a Snowplow Tracker, allowing events to be automatically merged into an Amplitude event's properties. Additionally, any other client event properties can be included within the event properties or user properties of the Amplitude event.

## Template Installation

> **Note:** The server Docker image must be 2.0.0 or later.

### Tag Manager Gallery

1. From the Templates tab in GTM Server Side, click “Search Gallery” in the Tag Templates section
2. Search for “Amplitude HTTP API V2” and select the official “By Snowplow” tag
3. Click Add to Workspace
4.
Accept the permissions dialog by clicking “Add”

## Amplitude Tag Setup

With the template installed, you can now add the Amplitude Tag to your GTM SS Container.

1. From the Tag tab, select “New”, then select the Amplitude Tag as your Tag Configuration
2. Select your desired Trigger for the events you wish to forward to Amplitude
3. Enter your Amplitude API Key for an HTTP API integration. This can be retrieved from Amplitude Data Sources within your Amplitude project.
4. Click Save

---

# Configure Braze Tag for GTM Server Side

> Configure authentication, user identifiers, event mapping, and entity rules for the Braze Tag in GTM Server Side.
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/braze-tag-for-gtm-ss/braze-tag-configuration/

## Configuration options

### Braze REST API Endpoint (required)

Set this to the URL of your Braze REST [endpoint](https://www.braze.com/docs/api/basics/#endpoints).

### Braze API Key (required)

Set this to your Braze [API Key](https://www.braze.com/docs/api/basics/#app-group-rest-api-keys) that will be included in each request. The minimum permission that you need to assign for this API Key is access to the `/users/track` endpoint.

![key permission](/assets/images/key_permission-0327a99d1c82d569bcce97d5ae5508e4.png)

### Identity settings

#### Braze User Identifier

This section allows you to select which Braze user identifier (external user ID (`external_id`) or [User Alias](https://www.braze.com/docs/api/objects_filters/user_alias_object#user-alias-object-specification) `user_alias`) will be used by the tag. The default value is `external_id`.

![identity settings dropdown](/assets/images/identity_settings-bf117a2d0fb59b9460f358912d794ac5.png)

##### Braze external\_id

This configuration section is enabled if you have selected `external_id` as the **Braze User Identifier**.
- **Set external\_id from:** Use this option to select how you want to set the `external_id`, either from the Event Property you specify or directly from the Value you provide. - **external\_id** Depending on the previous selection, here you can specify the value or the client event property (e.g. client\_id) that corresponds to the `external_id` user identifier for the Braze API. ##### Braze User Alias Object This configuration section is enabled if you have selected `user_alias` as the **Braze User Identifier**.
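To make the difference between the two identifier choices concrete, here is a hedged sketch of how each one shapes the identification field of a user object sent to Braze's `/users/track` endpoint. The attribute values are hypothetical; the surrounding structure follows Braze's documented Track Users API, where each entry in `attributes` carries either an `external_id` or a `user_alias` object:

```json
{
  "attributes": [
    { "external_id": "user-123", "is_premium": true },
    {
      "user_alias": { "alias_name": "jane.doe", "alias_label": "email" },
      "is_premium": true
    }
  ]
}
```

The first entry identifies the user by `external_id`; the second uses a `user_alias` object with its `alias_name` and `alias_label` fields.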
![braze user alias object](/assets/images/user_alias_object-b7392e429e51269631693e8083b3e47a.png) ###### Update existing users only When enabled (default), this option will only update existing users. Uncheck this box to allow creating alias-only users. ###### User Alias Name - **Set user alias name from:** Using this section you can select how you want to set the name of the user alias object: from an Event Property or directly from a Value you provide. - **Alias Name** Depending on the previous selection, here you can specify the value or the client event property that corresponds to the User Alias Name. ###### User Alias Label - **Set user alias label from:** Using this section you can select how you want to set the label of the user alias object: from an Event Property or directly from a Value you provide. - **Alias Label** Depending on the previous selection, here you can specify the value or the client event property that corresponds to the User Alias Label. ## Snowplow Event Mapping Options This section includes the mapping rules that concern a Snowplow event as claimed by the [Snowplow Client](/docs/destinations/forwarding-events/google-tag-manager-server-side/snowplow-client-for-gtm-ss/): ### Snowplow Self Describing Event ![snowplow event mapping options](/assets/images/snowplow_event_mapping_options-cfd9b92901b525b1cee1dcb7a6a2b004.png) #### Include Self Describing event This option indicates whether the Snowplow Self-Describing event data will be included in the event's properties object that will be sent to Braze. By default, this option is enabled. #### Self Describing Event Location This section is available only if the [Include Self Describing event](/docs/destinations/forwarding-events/google-tag-manager-server-side/braze-tag-for-gtm-ss/braze-tag-configuration/#snowplow-self-describing-event) option is enabled.
Using this drop-down menu you can indicate the location where Snowplow Self Describing event properties should be added under Braze event properties. The available options are: - **Nest under schema name** (default): The schema name will be used as a key in Braze event properties with the self-describing data as its value. - **Merge to root level**: The self-describing properties will be added directly as Braze event properties without nesting. ### Snowplow Event Context Rules This section describes how the Braze Tag will use the context Entities attached to a Snowplow Event. #### Extract entity from Array if single element Snowplow Entities are always in Arrays, as multiple of the same entity can be attached to an event. This option will pick the single element from the array if the array contains only a single element. #### Include Snowplow Entities in event object Using this drop-down menu you can specify whether you want to include `All` or `None` of the Snowplow context entities in Braze's `event_object`. #### Snowplow Entities to Add/Edit mapping Using this table you can specify in each row a specific mapping for a particular context entity. In the columns provided you can specify: - The Entity name to add/edit-mapping (required).¹ - The key you would like to map it to (optional: leaving the mapped key blank keeps the same name). - Whether to add it in the `event_object` or `user_attributes_object` of the Braze event (default value is `event_object`). - Whether you wish the mapping to apply to all versions of the entity (default value is `False`).¹ #### Snowplow Entities to Exclude Using this table (which is only available if `Include Snowplow Entities in event object` is set to `All`), you can specify the context entities you want to exclude from the Braze event.
In its columns you can specify: - The Entity name (required).¹ - Whether the exclusion applies to all versions of the entity (default value is `False`).¹ > **Note:** ¹ How to specify the **Entity Name** and its relation to the **Apply to all versions** option: > > Entity Names can be specified in 3 ways: > > 1. By their Iglu Schema tracking URI (e.g. `iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-0-2`) > > 2. By their enriched name (e.g. `contexts_com_snowplowanalytics_snowplow_client_session_1`) > > 3. By their key in the client event object, which is the GTM SS Snowplow prefix (`x-sp-`) followed by the enriched entity name (e.g. `x-sp-contexts_com_snowplowanalytics_snowplow_client_session_1`) > > Depending on the value set for the **Apply to all versions** column, the major version number from the 2nd and 3rd naming options above may be excluded. More specifically, this is only permitted if **Apply to all versions** is set to `True`. ## Additional Event Mapping Options If you wish to pick other properties from the Client event and map them onto the Braze event, these can be specified in this section. ### Event Property Rules #### Include common event properties Enabled by default, this option sets whether to automatically include the event properties from the [Common Event definition](https://developers.google.com/tag-platform/tag-manager/server-side/common-event-data) in the properties of the Braze event. #### Additional Event Property Mapping Rules Specify the Property Key from the Client Event, and then the properties object key you would like to map it to, or leave the mapped key blank to keep the same name. You can use Key Path notation here (e.g. `x-sp-tp2.p` for a Snowplow event's platform, or `x-sp-contexts_com_snowplowanalytics_snowplow_web_page_1.0.id` for a Snowplow event's page view ID, at array index 0), or pick non-Snowplow properties if using an alternative Client. These keys will populate the Braze event's properties object.
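As an illustration of the Key Path notation above, suppose a mapping row uses the Property Key `x-sp-contexts_com_snowplowanalytics_snowplow_web_page_1.0.id` with the mapped key `page_view_id` (a hypothetical name). The sketch below pairs an example client event fragment with the Braze event properties the mapping would produce; the two wrapper keys are labels for this example only, not part of any payload:

```json
{
  "client_event_fragment": {
    "x-sp-contexts_com_snowplowanalytics_snowplow_web_page_1": [
      { "id": "9fd8c3e4-1a2b-4c5d-8e6f-0a1b2c3d4e5f" }
    ]
  },
  "resulting_braze_event_properties": {
    "page_view_id": "9fd8c3e4-1a2b-4c5d-8e6f-0a1b2c3d4e5f"
  }
}
```

The `0` in the key path selects the first element of the entity array, mirroring the page view ID example in the text.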
#### Include common user properties Enabled by default, this option sets whether to include the `user_data` properties from the common event definition in the Braze User Attributes object. #### Additional User Property Mapping Rules Using this table, you can additionally specify the Property Key from the Client Event, and then the User Attribute key you could like to map it to (or leave the mapped key blank to keep the same name). You can use Key Path notation here (e.g. `x-sp-tp2.p` for a Snowplow events platform or `x-sp-contexts_com_snowplowanalytics_snowplow_web_page_1.0.id` for a Snowplow events page view id (note the array index 0) or pick non-Snowplow properties if using an alternative Client. ## Advanced Event Settings ![advanced event settings](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAbsAAADICAYAAABxqE/FAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dvz4AACAASURBVHic7d15fJTl3e/xT/aNTEISAgmEhJAAgQREwm4IKCISBAtSooD6KPB4bM9TT33KgVotPdalnmp92VN7jsW6tVoqFhSFAGUTDFvYCZCQBUMSwpJlQrZJZjl/8GRqyBAWEwI33/frNS+d+7rnun73PTDfue5lcKs0VzvsdjsAdrsdh8OB3W7HZrVis9uxWq3Oh8VioampieHD7kRERORW4d7ZBYiIiHQ0hZ2IiBiewk5ERAxPYSciIoansBMREcNT2ImIiOEp7ERExPAUdiIiYngKOxERMTyFnYiIGJ5nZxdwq1mzZg1btmzh4YcfZujQoW2uu3v3blasWMGCBQuIj4+/QRV+f1eqe/HixTT/xNx33X///UyYMOFGlNimnTt38s0331BRUUGXLl0YNmwY99xzDx4eHu02xocffkhNTQ1PP/10u/UpIh1HYXeNDh06BMCBAweuGHZGFhoaSnJycotlsbGxHT7u4sWLSUhI4LHHHnPZ/vXXX/Pll18SGxtLUlISpaWl/POf/8RsNjNr1qzrGuPkyZO8/fbbPPDAA6SkpABgMpnw9NRfH5FbxS33t/XkyZOcOHGCe++912X7hg0biI+PJyYmpt3HLioqoqKigrCwMHJzc6mvr8fPz6/dx7kVdO3alXvuuaezy2hl9+7dBAUFsXDhQtzdLx6lX7ZsGVlZWaSlpeHv798u4zz44IPt0o+I3Bi3XNht3LiRnJwcbDYbkydPbtG2du1aNm/ezKlTp3jiiSfafeyDBw/i4eHB9OnTeffddzly5AjDhw93ttfX1/Ppp5+Sk5NDaGgoUVFRzrbt27fzxRdfsHDhQuLi4gD4zW9+g5eXFz/96U+pqalh9erVHD9+HIB+/frxgx/8AH9/f8rKynjjjTdITU3l1KlTFBUV0aNHD2bPnk337t0BqKqqYtWqVeTl5eHr68vQoUOZPHmy89Dd0aNHycjIoKKigu7du5OWluacibVV97X64IMPyMnJ4Ve/+hVeXl6YzWZeeuklxo4dy/Tp02loaOCLL77gyJEjeHt7k5iYSFpaGl5eXmRmZrJq1SqmTZvGjh07MJvNxMfHk56eTn
l5OW+++SYA2dnZLF68mFdffbXV+FarlcbGRhoaGpzB9uCDD1JZWemciV2uhnPnzrUaIy0tjdWrVwOwevVqjh07xsKFC3nrrbeora1lyZIlACxdupSYmBj8/f05dOgQ/v7+TJs2jcTERAAaGxtZsWIFR48eJTg4mDvuuIP169czb948kpKSqKqq4vPPPyc/Px9PT0+SkpKYOnUqXl5e1/1eiMi/3HIXqMybN4+YmBg2bdpERkaGc3lz0MXExDB37tx2H9fhcHDo0CHi4uLo168fJpOJAwcOtFjnq6++4siRIwwaNIgBAwY4D3kCzg+9nJwcAMrLyykvLycpKQmAv/71r2RnZ3PPPfcwfvx4Dh8+zOeff96i/23bttGjRw9GjRpFSUkJX331lbO2Dz/8kPz8fFJTUxk4cCBbt27ln//8JwAFBQV88MEHBAYGMmnSJOx2O3/+85+prq6+Yt2XY7VaqaysdD6a+0pKSsJqtZKXl9die5u3/8MPP+Tw4cOkpKRwxx13sHPnzhbvI8CmTZsYPHgwcXFxZGdns3PnTkJCQnjsscdwc3MjKiqKRx991GVdY8eOpb6+njfeeIPNmzdjNpsJCwsjPj4eb2/vNmtwNcbAgQO5//77ARg+fDj33XffZffJ0aNHqaurIzU1lcbGRj799FNsNhtw8VzvgQMH6NevH4mJiWzfvr3Fa//2t79x8uRJpk6dysiRI9mxY0er/SIi1++Wm9l5e3szf/58li1bxqZNm4CL/w7fli1biImJYf78+c4PtfZUWFiI2Wxm4sSJuLm5MXDgQHbv3k1tbS0BAQFYrVb27t1LbGwsjzzyCABdunThyy+/BCA4OJioqChycnJIS0trFQLjxo3D39+f6OhoAHJzcykoKGhRw9ixY3nggQeAi4dUi4uLgYuHdouLi1ucU7pw4QKFhYUAbN26lYCAAObMmYOnpyexsbG89dZbHD58mJEjR7ZZ9+WcPHmSV155xfm8e/fuPPvssyQkJODh4UFOTg4JCQnk5OQQEBBAnz59KC4uJi8vj7S0NEaPHg1ARUUFe/fudW4XwEMPPcTAgQNpamriF7/4BcXFxaSmpjJo0CDc3NwwmUwMHDjQZV0pKSl4e3uzbt061q5dS0ZGBnfeeSfTp0/H19f3ijW4GqNPnz4A9OjRw/n+uNKzZ09nWFqtVjZv3kx5eTlhYWHs3r2b6OhoZ0h369aN5cuXO19bWlpKVFQUw4cPd47v4+PT5nsgIlfvlgs7cB14HRl0cPEQJlz8UC8vL6dXr17s3LmTQ4cOMXr0aCorK7HZbM4PRgBfX98WfSQmJrJ27VrMZrPzkGFERARw8YN0/fr1fPLJJ9TW1tLU1NRqW757vsnX15fGxkYAzp49C0CvXr2c7d+9gKO0tJSamhqWLl3aor/mWdmV6nYlIiKCSZMmOZ83fzD7+fnRt29fcnNzsdvt5OXlkZSUhLu7O6WlpcDFmWTzrLSZxWJptZ1eXl54eHg4t/NqjRw5kuTkZI4ePUpmZiZ79+7FbDazcOHCK9bwfQLGz88PNzc34F/7sLGxkaqqKqxWK3379nWue+l7O3bsWDZu3Mhrr71GQkICQ4YMaTNYReTa3JJhBy0DD+jQoLPb7Rw+fBiAt99+u0XbwYMHGT16tPNiiLYub09KSmLt2rUcPXqU/Px858wC4L333qO+vp4ZM2YQFhbGihUrOH369FXV1/wB25bQ0FBmz57dYpnJZMLhcFyxblcCAgIYNGiQy7akpCQ+++wz9u/fT319vXP22mzy5Mmtrtxsj3NTtbW15OTkEBERQUREBElJSSQlJfHuu++Sk5OD2Wzu8BpcaX5/2nqf7rvvPvr378+BAwfIzs5m+/btpKWlkZqa2iE1idxubtmwg38FXvP/d5T8/HxqamoYPXq08+ISgD179pCTk0N1dTXBwcF4enq2CCir1dqin7CwMHr06MGmTZtobGx0nq+rqamhrKyMMWPGkJCQAOA813
M1wsPDASgpKXHO0DIyMqitrWXmzJl0796d/Px8goODCQ4OBuDUqVOEhIRgs9muWPe1SkxMZOXKlWRkZODr6+u8V69Hjx7AxYtpmq+Wrauro76+3vll4Wo0B/SlrFYry5cvp1+/fjz55JPO5V26dAEuhs3V1uBqjMuNeyVBQUF4eXk5DytDy5ms2Wxm8+bNJCUl8eCDDzJt2jRef/11srKyFHYi7eSWDjvo2JBr1nwhyvjx4+natatzuZubG8ePH+fgwYOkpKSQnJzMrl272LBhA4GBgc4LRL4rKSmJDRs2YDKZnFc9+vv7ExAQwIEDBwgODqakpISioqKr3rbo6GiioqJYv3499fX11NXVkZmZ6fygvPvuu8nNzeWdd95hxIgRlJeXs3v3bp566in69OlzVXVfqqqqis2bN7dYFhMTQ58+fQgICCAmJoaCggLuuOMO56yxd+/exMXFsWvXLux2O926dWtxq8DVMJlMnDx5ko0bNzJhwoQWARUUFERycjJ79uxh2bJl9OnTh4qKCvbt20efPn0wmUyYTKYr1nDpGCaTCbj458Df37/V/YVX4u7uzvDhw8nMzOQvf/kLYWFh7Ny509keEBDAkSNHOHbsGBMnTqSpqQmz2XzZmbOIXLtb7mrMG81ms3HkyBF69OjRIujg4u0Bnp6ezvN5U6ZMYdCgQWzevJlt27YxePDgVv01z+YSExOdh7Xc3d159NFHMZlMbNy4EavVSmJiIo2NjdTW1l6xRjc3N+bNm0dsbCxbtmzhwIEDjBs3znnlYGxsLPPmzcPDw4N169aRm5tLWlqacxZ4NXVf6vz586xdu7bFIzc3t9V2Nv+32dy5cxk6dCiHDx9mw4YNhIaGMnPmzCuO16z5dpPt27e7nGnNnDmTtLQ0qqqq2LhxI7m5uYwYMaLF1ZtXquHSMUJCQhgzZgxnz57lyJEjV13rd02ZMoU77riDY8eOcfjwYeehXTc3Nzw9PVmwYAGhoaGsWrWK9evXk5iYyPTp069rLBFpza3SXO1o/uknu92Ow+HAbrdjs1qx2e1YrVbnw2Kx0NTUxPBhd3Zy2SK3lua/O82HVDds2MCGDRt45plniIyM7OTqRIzvlj+MKXIrWLduHdnZ2YwYMYKmpia2b99Or169nOcQRaRjKexEboBx48Zx4cIFtm7dioeHBwkJCUydOvWaLswRkeunw5giImJ4+lopIiKGp7ATERHDU9iJiIjhKexERMTwFHYiImJ4CjsRETE8hZ2IiBiewk5ERAxPYSciIoansBMREcNT2ImIiOEp7ERExPAUdiIiYngKOxERMTyFnYiIGJ7CTkREDE9hJyIihufZ2QWI3Kyqq6vJyc3DbK7G18+XPjHR9IyM6OyyROQ6KOxELlF94QKfLF/Bzl178PDwwGQKpK6unoaGBvrERDNvTjqxfWI6u0wRuQYKO7nlZWZmUlhYyJw5c753X2fPneM3//tNPDw8ePqp+QwdMhhPT08cDgeFhSf5x+erefk3r/PfFj7BsDuHtkP1InIjKOxcsNvtLF68uNXy9PR07rzzznYd69y5cxQUFDBy5MjL1jFt2jTuuusu5/JVq1bh6enJ1KlT27WWq1VUVMTq1aspLS0lKCiI8ePHM2LEiA4dc82aNfj7+zN+/PgOG8NqtfLm7/+IyRTIf/70P3DDjT+9+wGHjhwhvFs3Hpv7MM8+89/5698+5f8te48XnutOr56RHVaPiLQfhV0bnnvuOYKCgjp0jPPnz7Nr1y6XYdds/fr1DB48GJPJ1KG1XI2KigqWLVvG5MmTmTdvHqdPn+bvf/87vr6+DB48+Hv373A4cHNza/V81KhReHh4fO/+27J56zbKyyt49ddLCfD3551l75F7Io8fzvwB+w8e4ne/f5vXXn6ROemzKCw8yacrVvI/fvKjDq1JRNqHwu4affzxx/Tq1Ytx48YB8NFHHxEbG8vYsWMpKiris88+o7KykoSEBGbOnIm3tzeZmZ
nk5+djt9vJy8ujd+/ezJs3j/3795ORkYHFYuGVV15hyZIlLseMioriiy++YO7cuS7b8/Pz+eyzz6itraVPnz6kp6fj6+vLtm3bOHHiBDabjW+//Zbo6GgmT57Mp59+SkVFBQMHDiQ9PR13d3fsdjtfffUVe/bsoUuXLjzwwAMkJCS0GiszM5OBAwcyZswYAEwmE/fffz9ff/01gwcPvu79c+DAASwWCxEREYwfP5533nmH3r17c/r0aZYsWcLXX39NQEAA9957LwAbN25k+/bt+Pv707NnzxY1ZmVlsW7dOux2O2PGjOGee+65qvd25649jBk9kq5dgwEoKT3Nw7MfYsTwYQxPHsaPn/lPThUX0y8+jrT77+P//PEdamtrCQgIuKr+RaTz6NaDa5SUlMTRo0cBsNlsnDhxgsTERCwWC++//z733nsvzz//PDabjc2bNztfl5OTQ2pqKj//+c9paGhg//79jB49mvT0dCIjIy8bdABTpkwhPz+fnJycVm01NTW8//77zJkzh+eff57Gxka2bdvmbD916hRTpkxh8eLF1NTU8Ne//pXHHnuMRYsWcerUKY4fPw7A119/TUlJCUuWLGH27NksX76c+vr6VuOVlJQQGxvbYllMTAwlJSXfa/+cPXuWhx56iFmzZgFQW1tLUlISzz77bKsa8vLy+Oabb1i4cCELFiygvLzc2VZUVMSaNWtYsGABP/nJT9i7dy95eXmX3bffVVxSSnxcX+fzX73wc0YMHwbAtu2ZBHbpQq//Cta4uFjsdjuny85cVd8i0rkUdm146aWXWLRoEYsWLeJ3v/sdAAMGDKC0tJS6ujoKCwsJDw8nKCiI3NxcQkJCSExMxMvLi5SUFI4dO+bsKyEhgZiYGPz8/IiNjW3xAX0l/v7+TJ06lZUrV2K1Wlu0+fj48OMf/5iePXvi6elJ//79OXfunLM9Li6Onj170qVLF+Lj4xk4cCChoaGYTCaio6Oddezdu5cJEybg5+dHdHQ0PXv2pKCgoFUt9fX1rWYyAQEB2Gw2mpqarnv/xMfHExUV5TxUGRAQwLBhw/D29m5VQ3Z2NsnJyURERBAcHMywYcOcbfv27SM5OZnw8HBMJhPJycktxmmL1dqEt5dXq+VfrsngH5+v5j9+/BT+/n4AzroaGxuvqm8R6Vw6jNkGV+fsvLy8iI+P5/jx4xQXFzvPU5nNZkpKSnjhhRec6/r6+rrs183NDZvNdk21DBs2jKysLDZu3NiqntLSUlasWEFjY6PzUKYr7u4tv9u4ublht9uBi/eU/eUvf3GeL7Pb7S7Pwfn6+lJbW9tiWW1tLR4eHnh6euLm5va998+V1NTUtJpdNjObzeTm5rJz507g4jm/QYMGXVW/wcHBlJ1pPVP7au16pj8wpcWs78x/zehCQkKutXwR6QQKu+swePBgDh06xOnTp1mwYAEAgYGB9O3bl/nz53fYuDNmzOCtt94iKiqKyMiLVwGWlJSQkZHB008/TVBQENu2baOoqOia+w4MDGTWrFlER0e3uV5UVBT5+fktrr48efIkPXv2dAZlR++fwMDAVoHbzGQyMXHiRCZMmHDN/Q5OHMSOnbtJu/8+57Y4HA4en/cI/fvFt1j3m527CAsLpUf38GvfABG54XQY8zokJCRw4sQJfH196dq1K3DxMFxJSQnZ2dnY7XZyc3PZunXrFfvy9vampqbG5fmxS3Xr1o2UlJQW56BqampwOBw4HA7OnTvHwYMHnbO1azFkyBDWr19PXV0ddXV1rF69GrPZ3Gq90aNHc+zYMXbs2EFNTQ15eXlkZGSQkpLiXKc9948rgwYNYt++fdTV1WG1Wp3nCOFi0GZmZnLmzBmamprYvn07ubm5V9Xv5EkTKTtzltVfZTiX2R0O/vjOu5z8zheIvPwCNm3+mqlTJl9X/SJy42lm14aXXnqpxfMZM2YwatQovL29iYuLIyoqytnm7+/P448/zqpVq/jkk08IDw9n5syZVx
wjOjqawMBAXn75ZV588cUrrn/33Xdz4MAB5/N+/frRr18/Xn/9dbp27Ur//v05f/78NWzlRRMmTGDNmjX89re/xW63M3z4cJe3OoSEhPDkk0+yevVqVq9eTVBQEJMmTWLIkCHOddpz/7jSt29fhg0bxhtvvIGfnx/du3dv0TZx4kTee+89amtriYuLu+p7I7t3D2fenHQ++Ohjmpoamf5AGp6enry/7I/Odfbu28+y9z7kjiFJpKaMva76ReTGc6s0VzuaZwJ2ux2Hw4HdbsdmtWKz27Farc6HxWKhqamJ4cPa98ZqkZvJ9m928NHHy/Hz82XokMGEhIRQV1/H0aPHKTpVzPhxdzH3kdkdft+fiLQfhZ2IC2ZzNVu+3sax47lUV1fj6+dHn5je3DVmNH1i2j6vKSI3H4WdiIgYni5QERERw1PYiYiI4SnsRETE8BR2IiJieLrP7jIslkbKK6qorbvyzd4icvsK8PcjNCQYH5/Wv+MqNw+FnQsWSyNFxafpFhZCZIR+Dkqks53I/5b4vjfnLR9V5gsUFZ+md68IBd5NTIcxXSivqKJbWAjBQYGdXYqI3OSCgwLpFhZCeUVVZ5cibVDYuVBbV6+gE5GrFhwUqFMeNzmFnYiIGJ7CTkREDE9hJyIihqewExERw1PYiYiI4SnsRETE8BR2IiJieAo7ERExPIWdiIgYnsJOREQMT2EnIiKGp3/1QEQMZ8WKFTQ1NTmfz5o1C09PTxoaGli5cqVzuY+PDzNmzABgz5495OXlOduGDx9OXFzcjStaOpTCTkQMx2Kx0NjY2Gq5w+GgoaHB5WusVmuLNpvN1mH1yY2nsBMRQ6iurqaurg64GGrfdebMGTw8PLBYLC2W2+12ysrKAJyvbWY2m51t4eHhuLvrrM+tTGEnIoZw+PBhcnNzXbatX7/e5fLGxkbWrl3rsi07O5vs7GwAHnnkEXx8fNqnUOkU+qoiIredwMBAPD31Xf92orATkdtOamoqYWFhnV2G3EAKOxERMTzN40XktuDh4dHqefMyu93e6qIWMRaFnYjcFqZNm+a8yMTHx4fx48c7by/YvXs3BQUFnVmedDCFnYjcFr57M/nUqVPJyspy3logxqdzdiIiYngKOxERMTwdxhSR287atWux2+2dXYbcQAo7ETGE/v37ExkZ2SF9e3l5dUi/cuMo7ETEEMLCwnSjuFyWztmJiIjhKexERMTwFHYiImJ4CjsRETE8hZ2IiBiewk5ERAxPYSciIoansHMhwN+PKvOFzi5DRG4RVeYLBPj7dXYZ0gaFnQuhIcGcO1+hwBORK6oyX+Dc+QpCQ4I7uxRpg35BxQUfH29694qgvKKKc+crOrscEQFO5H/b2SW4FODvR+9eEfj4eHd2KdIGhd1l+Ph4ExkR3tlliIhIO9BhTBERMTyFnYiIGJ7CTkREDE9hJyIihqewExERw1PYiYiI4SnsRETE8BR2IiJieAo7ERExPIWdiIgYnsJOREQMT2EnIiKGp7ATERHDU9iJiIjhKexERMTwFHYiImJ4CjsRETE8hZ2IiBiewk5ERAxPYSciIobn2dkFiHSW0tNlFJ48SX1dAyZTIP36xREcFNTZZYlIB1DYyW3n22+L+Ojj5eTlF+Dr44N/gD/V1Rew2WyMSB7Gw+kPKfREDMat0lztsNvtANjtdhwOB3a7HZvVis1ux2q1Oh8Wi4WmpiaGD7uzk8sWuT4HDh7iD/93GfF9Y5nxg2n0je2Dm5sbVquVg4eO8Olnq7BYLPzP/3yGHj26d3a5ItJOFHYu2O12Fi9e3Gp5eno6d97Zvtt+7tw5CgoKGDlypMv2Xbt2ERsbS7du3QD4wx/+QFpaGjExMe1ax83s0n1wvU6XlbH0xVcZM2oEj859mNLTZfz5/Y8oKT3NgP79ePzRR/Dx9uH1N3/PhZoaXvzlL/D29mqnrRCRzqQLVNrw3HPP8dprrzkf7R10AOfPn2fXrl2Xbd+1axfnz593Pp
8+fTo9e/Zs9zquhcPhwOFw3LAxLt0H12vFPz6nR4/uzJuTjs1m483fv42HhwfpP5zJ6bIylv35A/z8fPmPH/071dXVbNy05XuPKSI3B52zu0Yff/wxvXr1Yty4cQB89NFHxMbGMnbsWIqKivjss8+orKwkISGBmTNn4u3tTWZmJvn5+djtdvLy8ujduzfz5s1j//79ZGRkYLFYeOWVV1iyZEmLsV599VWqqqr429/+xn333ceYMWNYvnw5Dz74IH379uWDDz7AZDJRUFBAZWUlo0aNIjo6ms8//xyr1crdd9/trLOmpoa///3vFBYWEhERwezZswkNDW21fUuXLmXIkCGcOHECm83GtGnTGDRokLNtwIABZGdn88wzzxAcHMyaNWvIysrC19eXlJQU7rrrLgAyMzM5cuQIVquViooKoqOjmT17Nt7e3gBkZWWxbt067HY7Y8aM4Z577nE5xp/+9KcW+8BisVBYWMgTTzwBQEZGBtXV1fzwhz9s831raGjgwMHDLJz/OO7u7pSdOYsbbvz7/H8jNDSELgEB/PGdd3E4HJhMJlLGjmHn7j3cP/ne6/2jIiI3Ec3srlFSUhJHjx4FwGazceLECRITE7FYLLz//vvce++9PP/889hsNjZv3ux8XU5ODqmpqfz85z+noaGB/fv3M3r0aNLT04mMjGwVdACLFy8mMjKS9PR0xowZ47KesrIyFixYwI9+9CN27NjB3r17efbZZ3niiSfIyMigrq4OgOXLlxMeHs4vf/lLkpKSWL58+WW3sWvXrvzsZz9j9uzZLF++nPr6emebn58fS5YsISQkhM2bN3Pq1CkWLVrE/Pnz2bZtG8eOHXOuW15ezrx581iyZAkWi8W5P4qKilizZg0LFizgJz/5CXv37iUvL8/lGJfug6SkJPLy8mhsbATg+PHjzjBuy+myM9hsNuLj+gIQGdGD1175X4SGhmC1WtmxazcD+sfj5uYGQFzfWEpKT3f4DFZEbgyFXRteeuklFi1axKJFi/jd734HwIABAygtLaWuro7CwkLCw8MJCgoiNzeXkJAQEhMT8fLyIiUlpcUHf0JCAjExMfj5+REbG0t5eXm71Dh06FBMJhMRERF0796d5ORk/Pz86N27NwEBAVRWVlJbW8uJEyeYNGkSnp6ejB07luLiYhoaGlz2OXDgQNzc3IiNjaV79+4UFBQ420aPHo2/vz9ubm7s27ePiRMnEhAQQLdu3UhJSSErK8u5bu/evQkMDMTDw4Nx48Y598e+fftITk4mPDwck8lEcnJyi3313TEuFRYWRmhoKDk5OVRXV3P+/Hn69et3xf3U1NQEgLeXd6u2N3//NqdOlfDUgiedy7x9vLHZbNgVdiKGoMOYbXjuuecIuuQSdC8vL+Lj4zl+/DjFxcUMHjwYALPZTElJCS+88IJzXV9fX5f9urm5YbPZ2r1ed3d3PDw8Wjx3OBxUV1fjcDj49a9/3WL9CxcuXLbGZl26dKGmpsZlW01NDV27dnU+79q1KwcOHHC5bmBgoLMfs9lMbm4uO3fuBC6en7ua2VmzpKQksrOzaWhoIC4uDi+vK19EEvJfdZadOUNcl1jn8qJTxRzJPsYvlvyMwMAuzuVlZWcIMpnwcNf3QREjUNhdh8GDB3Po0CFOnz7NggULgIsf5n379mX+/PmdXF1rgYGBeHl5sXTpUtyv8cO7qqqKLl26uGwLCgqiqqrKeZVkZWXlZdf9bpvJZGLixIlMmDDhmmpplpiYyDvvvIPFYrnqkAwLCyWiR3cyd+wiru+/wi44yMR/W/gkffrEOJc5HA527NxN4qCE66pPRG4++tp6ssdB5QAABd1JREFUHRISEjhx4gS+vr7OmU18fDwlJSVkZ2djt9vJzc1l69atV+zL29ubmpqaFufFLm0vLy//XjPBLl26EBUVRUZGBjabjbNnz7Jy5crLno/at28fVquVI0eOcP78eWJjY12ud+edd7Jx40bq6+uprKzkm2
++YejQoc72wsJCzp49S0NDA1u2bCEh4WJ4DB48mMzMTM6cOUNTUxPbt28nNzf3svVfug8iIyPx9fXl2LFjzj6vRtqUyWzdtp3jOf8a61RxKX98513nYU6AjPX/5FRxCVMmT7rqvkXk5qaZXRteeumlFs9nzJjBqFGj8Pb2Ji4ujqioKGebv78/jz/+OKtWreKTTz4hPDycmTNnXnGM6OhoAgMDefnll3nxxRdbtY8YMYKVK1dis9lITU297m1JT0/nH//4B7/61a/w8/Nj8uTJLs+JwcXDk6+88srFy/LT0/Hz83O53rhx46itreW3v/0tDoeDcePGMWTIEGd7165dWbFiBWVlZcTHxztncn379mXixIm899571NbWEhcX1+ZtHa72QWJiIkVFRZedSbpy15hRHDlylDffept/e3wuI4cnM2jgAN5f9kcArFYrX65Zx+erv+KR9FlERkZcdd8icnPTTeXSwtKlS3n66acJDw//Xv1kZmZSWFjInDlz2qmyljZs2IC3t/c1fwGw2e18snwFGzdtIapXTwYNTMDf35/KykoOHDpMTU0Nj6T/kPHj7uqQukWkc2hmJ7cUu91OZWUlWVlZPPXUU9f8eg93d+Y+/EPG3TWGr7dnknsij/qGBgIDu5Aydgzjx91F167BHVC5iHQmhZ3cUoqKili2bBmTJk1qcSXoteod1Yu5D7d9I7qIGIcOY4qIiOHpakwRETE8hZ2IiBiewk5ERAxPF6hchsXSSHlFFbV1rm/2FhEBCPD3IzQkGB+f1r+7KjcPhZ0LFksjRcWn6RYWQmTE97vfTES+vxP53xLfN7qzy3CpynyBouLT9O4VocC7iekwpgvlFVV0CwshOCiws0sRkZtccFAg3cJCKK+o6uxSpA0KOxdq6+oVdCJy1YKDAnXK4yansBMREcNT2ImIiOEp7ERExPAUdiIiYngKOxERMTyFnYiIGJ7CTkREDE9hJyIihqewExERw1PYiYiI4SnsRETE8PSvHoiI4axYsYKmpibn81mzZuHp6UlDQwMrV650Lvfx8WHGjBkA7Nmzh7y8PGfb8OHDiYuLu3FFS4dS2ImI4VgsFhobG1stdzgcNDQ0uHyN1Wpt0Waz2TqsPrnxFHYiYgjV1dXU1dUBF0Ptu86cOYOHhwcWi6XFcrvdTllZGYDztc3MZrOzLTw8HHd3nfW5lSnsRMQQDh8+TG5ursu29evXu1ze2NjI2rVrXbZlZ2eTnZ0NwCOPPIKPj0/7FCqdQl9VROS2ExgYiKenvuvfThR2InLbSU1NJSwsrLPLkBtIYSciIoanebyI3BY8PDxaPW9eZrfbW13UIsaisBOR28K0adOcF5n4+Pgwfvx45+0Fu3fvpqCgoDPLkw6msBOR28J3byafOnUqWVlZzlsLxPh0zk5ERAxPYSciIoanw5gicttZu3Ytdru9s8uQG0hhJyKG0L9/fyIjIzukby8vrw7pV24chZ2IGEJYWJhuFJfL0jk7ERExPIWdiIgYnsJOREQMT2EnIiKGp7ATERHDU9iJiIjhKexERMTwFHYuBPj7UWW+0NlliMgtosp8gQB/v84uQ9qgsHMhNCSYc+crFHgickVV5gucO19BaEhwZ5cibdAvqLjg4+NN714RlFdUce58RWeXIyLAifxvO7sElwL8/ejdKwIfH+/OLkXaoLC7DB8fbyIjwju7DBERaQc6jCkiIoansBMREcNT2ImIiOEp7ERExPAUdiIiYngKOxERMTyFnYiIGJ7CTkREDE9hJyIihqewExERw1PYiYiI4SnsRETE8BR2IiJieAo7ERExPIWdiIgYnsJOREQMT2EnIiKG9/8BVCP5Oaebrs0AAAAASUVORK5CYII=) This section offers additional configuration options on the Braze event object: ### Event Name Override You can use this 
option to override the name of the Braze event object or leave it blank to inherit from common event properties, which is the default. Please note that the `name` property of the [Braze event object](https://www.braze.com/docs/api/objects_filters/event_object/#event-object), which this option populates, is required by the Braze API. ### Event time property This option enables you to specify the client event property that populates the event time (in ISO-8601 format), or leave it empty to use the current time (the default behavior). ## Logs Settings Through the Logs Settings you can control the logging behavior of the Braze Tag. The available options are: - `Do not log`: This option allows you to completely disable logging. No logs will be generated by the Tag. - `Log to console during debug and preview`: This option enables logging only in debug and preview containers. This is the **default** option. - `Always`: This option enables logging regardless of container mode. > **Note:** Please take into consideration that the logs generated may contain event data. The logs generated by the Braze GTM SS Tag are standardized JSON strings. The standard log properties are: ```json { "Name": "Braze", // the name of the tag "Type": "Message", // the type of log (one of "Message", "Request", "Response") "TraceId": "xxx", // the "trace-id" header, if it exists "EventName": "xxx" // the name of the event the tag fired at } ``` Depending on the type of log, additional properties are logged: | Type of log | Additional information | | ----------- | -------------------------------------------------------------- | | Message | "Message" | | Request | "RequestMethod", "RequestUrl", "RequestHeaders", "RequestBody" | | Response | "ResponseStatusCode", "ResponseHeaders", "ResponseBody" | --- # Braze Tag for GTM Server Side > Forward Snowplow events to Braze from GTM Server Side using the Braze Tag with Track Users API for personalization and marketing automation.
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/braze-tag-for-gtm-ss/ The Braze Tag for GTM SS allows events to be forwarded to [Braze](https://www.braze.com/). This Tag works best with the [Snowplow Client](/docs/destinations/forwarding-events/google-tag-manager-server-side/snowplow-client-for-gtm-ss/), but can also work with other GTM SS Clients such as GAv4. ## Template installation > **Note:** The server Docker image must be 2.0.0 or later. ### Tag Manager Gallery 1. From the Templates tab in GTM Server Side, click "Search Gallery" in the Tag Templates section 2. Search for "Braze" and select the official "By Snowplow" tag 3. Click Add to Workspace 4. Accept the permissions dialog by clicking "Add" ### Manual Installation 1. Download [the template file](https://github.com/snowplow/snowplow-gtm-server-side-braze-tag/blob/main/template.tpl) `template.tpl` – Ctrl+S (Win) or Cmd+S (Mac) to save the file, or right-click the link on this page and select "Save Link As…" 2. Create a new Tag in the Templates section of a Google Tag Manager Server container 3. Click the More Actions menu, in the top right-hand corner, and select Import 4. Import `template.tpl` downloaded in Step 1 5. Click Save ## Braze Tag Setup With the template installed, you can now add the Braze Tag to your GTM SS Container. 1. From the Tag tab, select "New", then select the Braze Tag as your Tag Configuration 2. Select your desired Trigger for the events you wish to forward to Braze 3. Enter the required parameters 4. Optionally, [configure the tag](/docs/destinations/forwarding-events/google-tag-manager-server-side/braze-tag-for-gtm-ss/braze-tag-configuration/) to customize your Braze Tag 5. Click Save --- # Configure HTTP Request Tag for GTM Server Side > Configure JSON request body construction, entity mapping, custom headers, and post-processing for the HTTP Request Tag in GTM Server Side.
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/http-request-tag-for-gtm-ss/http-request-tag-configuration/ In the following short video a complete example configuration of the Snowplow GTM SS HTTP Request Tag is presented. Scenario: The example assumes that we want to send a POST HTTP Request to an example custom destination endpoint, where we would like the body of the request to have the following structure: ```json { "api-key": "myApiKey", "user_identifier": ..., "event_data": { ... }, "user_data": { ... } } ``` where, for this example: - Our endpoint expects the `api-key` inside the request body. - As our `user_identifier` we want to map the value of the `client_id` from the client event. - Inside `event_data` we want to include: - the common event data - the Self-Describing event data - the performance timing data from the Snowplow [Performance Timing Context](https://github.com/snowplow/iglu-central/blob/master/schemas/org.w3/PerformanceTiming/jsonschema/1-0-0), with `performance_timing` as the property name - the page view id from the Snowplow [web\_page context](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0), with `page_view_id` as the property name. - Under `user_data` we want to map the `user_data` from the client event. You can read on below for more details on each configuration option. ## Configuration options ### Destination URL (required) Set this to the URL of your custom HTTP endpoint. ### Wrap the request body inside an array By default, the JSON body is an object. For example: ```json { "myProperty": "myValue" } ``` This option allows you to wrap the resulting object of the request body inside an array: ```json [{ "myProperty": "myValue" }] ``` ### Include all event data in the request body This option allows you to relay the full client event into the body of the request.
Enabling this option disables both the Snowplow and the Additional Event Mapping Options, which allow you to cherry-pick event properties and customize the request body. ### Use alternative separator to dot notation Enable this option to use an alternative separator to dot notation when specifying possibly nested object paths. This setting **applies everywhere dot notation can be used** and is useful when you want to allow dots or special characters in key names. Enabling this option reveals a text box where you can specify the character you wish to use to denote nested paths. #### Example Consider this property name: `user_data.address.city` By default it is interpreted as a nested path, where each dot denotes a change in nesting level: ```json "user_data": { "address": { "city": "Foobar City" } } ``` If you instead want it to denote a nesting path where a key name may include a dot, for example: ```json "user_data.address": { "city": "Foobar City" } ``` then you can use this setting to set a different separator for nesting.
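In code terms, the separator setting only changes how a path string is split into nesting levels before the value is placed into the request body. A minimal Python sketch of the idea (an illustrative helper, not the tag's actual implementation):

```python
def set_nested(body, path, value, separator="."):
    # Split the path on the configured separator and walk/create the
    # intermediate objects (illustrative sketch only).
    keys = path.split(separator)
    node = body
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return body

# Default dot notation: three nesting levels
set_nested({}, "user_data.address.city", "Foobar City")
# {'user_data': {'address': {'city': 'Foobar City'}}}

# With "~" configured as the separator, the dot stays part of the key name
set_nested({}, "user_data.address~city", "Foobar City", separator="~")
# {'user_data.address': {'city': 'Foobar City'}}
```

With an alternative separator such as `~` configured, a dot inside a key name is preserved rather than treated as a nesting boundary.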
As an example, if you set: ![first alternative separator example](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZwAAAByCAYAAABneGd/AAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dvz4AABpySURBVHic7d15WFNn2gbwmzUSFlGhClRpwqJUFAvFKqIWRcUNW5e6daHiYGtbdUangFu134BLhbbTOlpq3abuiK2Oa91aa11QBwVEEawgagSBsIQlJHm+PxxOjUQIVRLE53dduTTvOec9T96ccJ+8ORATeWkZ4X+I7v9Xo9EI/xIRNBoN1CoV1BoNVCqVcKuurkZNTQ38/XzBGGOM1cfU2AUwxhh7NnDgMMYYMwgOHMYYYwbBgcMYY8wgOHAYY4wZhLmxC2isbclViNpZCqWKGl5ZB0tzEywdY4fx/q2ecGWMMcbq89S9w3mcsAEApYoQtbP0CVbEGGNMH09d4DxO2DzJPhhjjDXOUxc4jDHGnk7PROAEupkYuwTGGHvmtfjACXQzwbQ+ZrgQafGn+6ioqIBUKsXp06e12rOzsyGVSpGdnf24Zert8uXLkEqlKC4uNtg+GyM7OxsBAQG4e/eusUthj9C3b19s27bN2GU8EcY+3m7fkeHkqdM4fOQ4ziafh7ykxCh1PC1adODUhk1XJxPsTtXAya7lvdPZs2cP/Pz8ms3+O3TogPDwcLRt29ZoNRnT/PnzMX369CfWX0FBAaRSKTIzM59Yn43R1MdXY8eruRxvOTm5+MeSzzB3wWL8+/ut2HfwEBK+W4+/zonGqm++4+B5hKfusmh9PRw23/yqwZ1SvljgQWq1GmZmZk+0T2tra4SHhz/RPpuLphivB2k0GpiatuhzwCfOGMdbysVLWLl6DTzcpJgf/Xe4SSUwMTGBSqXCxUtp2LHzByz+v6WInDMLHTq0N2htzd1Tf3S7OdR912KssLl8+TLGjRuHrl27IigoCImJiVrL165di4CAAPj6+mLWrFmQy+U6+ykvL8fcuXPh6+uLHj16YObMmSgtrXsp97Rp0zBz5kwUFxdDKpXi6NGjAIDc3Fy8/fbb8Pb2xuDBg3HgwAFhm9DQUMyZMwdDhw7F+PHjhenCbdu2YejQoejatSveeecdFBUVCdscO3YMI0aMQJcuXRAcHCz0p2v/D075JSUloWfPnsJfHweA6dOnY968eQCAyspKzJ07Fz4+PujTpw++/PJLqNVqnWPyn//8B8HBwejatStGjx6NlJQUYVl9/cTGxiIsLAwffvghevTogYCAACQlJek11rVjExMTg4CAAMTFxdW7vo+PDzZv3owDBw5AKpWioqICAHD8+HEMGzYMXbp0QUhICI4fPy7sPzY2FuPHj8eUKVPg5eWF6upqYdm6devwyiuvAABCQkIQGxsr1BUdHY0ePXrAx8cHkZGRwr502bBhA/z9/dGzZ08sW7aszhg/qr5HHV8Pio2NxXvvvYdFixahe/fu6N27t9ZxX1+tjxqvP3u81UpISECfPn3g5eWFt956C7///ruwLDQ0FHFxcZgyZQpefPFFhISE4NKlS48cu4fdkcmwKmEtAgN64e+zZ8LKygr/WPIZ3vvwr/h61bdwc5Pgk/lRaNeuLT7/6l9QKmv07vuZIC8to9pbcUkpFZeUUmGxnAqL5VRQWET59wpJll9At27fody8W3T9Rg5lZmXT5StX6b8XL9HZc+fJkJxny4SbLL+Qvj9RQH2X3BXa3vhXPh25eI9k+YWUcKSA/Bbf1dqm9tYYCoWCJBIJnTp1Sqs9KyuLJBIJZWVlERFRQEAAffrpp5Sbm0vbt28nDw8PSklJISKijRs3Ut++fen06dOUmZlJEydOpBkzZujc38yZM2nMmDF09epVyszMpJEjR9L8+fOJiCg9PZ0kEgkVFRVRVVUVJSYmkq+v
LykUClKpVKRQKCgwMJCWLFlCN27coMTERPL09KTMzEwiIho5ciT5+/vT4cOHKScnR3hsQUFB9Ntvv9HZs2cpMDCQYmNjiYjo2rVr5O7uTlu3bqXbt2/Txo0bydPTk27fvq1z/w/WV1ZWRp07d6Zz584REZFSqSRvb2/69ddfiYho+vTp9Pbbb9PVq1fp1KlT1Lt3b9q0aVOd8cjOziYPDw/64YcfKCcnhxYuXEi+vr5UVVXVYD8xMTHk5eVFx44dI4VCQevXryc3Nze6ceNGg2NdOzZjx46l8+fPU35+foPrR0ZG0rRp00ihUBARUUpKCnl4eNC3335L2dnZtGrVKvLw8KBr164J9bm5udGGDRsoIyODNBqN8LhramooJyeHJBIJXbx4kZRKJRERTZ06lUaMGEEpKSl04cIFCgkJoVmzZuk8ls6ePUtubm6UkJBAGRkZFBsbSxKJhLZu3dpgfbqe34fV1r98+XK6cuUKLV++nDp37kxFRUUN1qprvB7neCMiWrNmDb300kt06NAhyszMpBkzZlD//v2FsRs5ciR5eXnRDz/8QOnp6RQWFkahoaE6x06Xf65cTQs/jSW1Wk01NTU0J2o+xSxdQcd+PkEfz11In8V/SUREJSUl9N6Hs2jf/kN69/0seGoDZ8QXd2lP8j2t0NE3bJoicBQKBUmlUjp+/LiwfNeuXXT16lUiIurbty/t3r1bWHbp0iVyd3enmpqaOvvLzMykgoIC4f53331HQ4cOJSKq8wLbvXs3+fr6Cuvu2LGDgoODtfqbMmUKxcfHE9H9F9yXX35Z57E9WPfy5ctp8uTJREQkl8spLS1Nq79u3brRwYMHde7/4fqmTp1KS5YsISKi48ePk6+vL6lUKrp58yZJpVK6e/eusG1CQgK98cYbdcbj6NGj5OnpSaWlpULNGzZsILlc3mA/MTEx9M4772j1N2rUKIqLi2twrGvH5syZM8Ly+tYnIpo3bx69//77wv2ZM2fStGnTtPYfHh5OH3/8sVDfpEmT6jzmWvn5+SSRSITj6Pr16ySRSOjy5cvCOqmpqSSVSikvL6/O9jNnzqSpU6dqtfXq1UsInIbqe/j5fVhMTAy9/vrrwv2qqiqSSqWUnJysV60Pj9fjHG8ajYZ69epFa9eu1arHz8+PkpKSiOj+8b9s2TJh+c8//0weHh6kVqsf+RhrVVZW0pSID+j02WQiIrp1+w79PWoB3btXSEREyecu0JSID4SThk1bttPCxTEN9vsseWo/w7lwk/DNSTUAMwR3vj8z6GRnYrTPbMRiMSIiIvD+++/j1VdfxYABAzBs2DCIxWKUlpYiLy8PkZGRiI6OBnD/y+7UajXu3LmDjh07avXl6OiI+Ph4/PLLLygqKoJKpYKzs7NedWRkZODGjRvw9vYW2pRKJdq0aSPct7S0rLOdnZ2d8H8rKytheqN169Y4d+4cFi1ahOvXr0OlUkGhUGhN/dRn+PDh+Oc//4moqCgcOXIEgwcPhpmZGS5fvgwiwoABA4R1VSoVHBwc6vQREBAAPz8/BAUFYciQIQgODsabb74JU1NTnDlzRu9+anl5eSE3NxeAfmNtYfHHFY6NfW4yMjIwduxYrTZ/f3+tac4H+29IRkYGxGIxvLy8hDZvb2+0atUK165dg4uLi9b6N27cQHBwsFabufkfL3t96mvIg8eOSCSCqakpFAoF8vPzG1Ur8HjHW3FxMe7evYuXX35Zq57u3btrXXTxYL1isRgqlQo1NTUQiUT19n9HdhdqtRoe7m4AAGenDli+5FMA94+5U2fOoktnD5iY3J/md3eT4ujxX0BEQtuz7qkNHEB36DRF2FhYWMDU1BQ1NdrzsUqlEgCEAzUyMhITJkzA4cOHsWnTJqxYsQLbt2+Hvb09ACA+Pl7rxQcATk5OdfYXFRUFlUqFLVu2wMnJCRs3bsTGjRv1rtfPzw/Lli3TarO2ttZ7+welpaVhxowZiI+Px4ABA2BhYdGoq5YG
DRqE6OhoXLt2DYcPH8bSpUuFZZaWlti7d6/W+ro+lBeJRNi8eTMuXLiAY8eOYcGCBXBxccGmTZsa1U+tB5/Hxo51Y9e3tLTU+gEP3L84oPbYaSxLS8s6FxbQ/76VV1efJiYmOk8wmqq+x6kVeLzjrfZxPvzcq9XqJ/J4ao8bS4u64/nFV/9Cfv49LJj78R/1iCyhVquhIYIZBw6AFnDRQG3oJOdQk72zsbCwQMeOHZGWlqbVnpaWBisrKzg7OyMrKwtxcXHo1KkTwsPDkZSUhNatW2P37t2ws7ODo6Mj8vLy4OrqKtwcHBzqvNgB4OTJkxg/frwQRvV9IPzwmZObmxuysrLQvn17YT+Ojo71nvHX58yZM5BIJBgyZAgsLCygUqm0XrwNnblZW1ujX79++Pzzz1FVVYWAgAAAgLu7O5RKJcrKyoQ6n3/+eZ11Hjt2DNu3b4evry9mz56NH3/8EcnJyUhJSdGrnwcvWgCA1NRUuLq6AmjcWOuz/sPj4eHhgQsXLmi1nTt3Dl26dKl3P/X1V15ejqysLKEtLS0N1dXVOvuUSCS4evWqVhvRH6+Phup7nDNzfWp9uP/HOd5sbGzQoUMH/Pe//xXaqqurkZaWpvd416ft/2YJZA/9zk/uzTykpWcgYmoYbG1thHaZ7C5a29nBjK88FLSIkagNnaacRps6dSpWrVqFbdu24cqVK9izZw+WL1+O8PBwmJqawsbGBmvWrMEXX3yBmzdv4uTJk5DJZJBKpQCAiIgIfPXVV9izZw9u3ryJlStXYuzYsVov/loSiQT//ve/kZKSgh07dmD16tWPPENr06YN5HI5Tp48Cblcjtdeew0WFhb429/+hszMTKSkpOD111/H/v37/9TjfuGFF5CZmYmkpCScP38eH330ERQKhVDPw/vXZfjw4Thw4AAGDRokBKxUKsXAgQMxe/ZsJCcnIzs7GzNmzMCSJUvqbK9UKvHJJ59g7969uHXrFvbu3Qtzc3N07NhRr35Onz6NxMRE5ObmIi4uDtevX8eYMWMaPdb6rG9vb4+MjAykpqZCpVLhL3/5Cw4ePIh169bh999/R0JCAk6cOIF3331Xr/G3s7ODqakpjh49CplMBldXV4SEhGDOnDm4dOkSLl68iKioKAwePBidOnWqs/2kSZOwb98+HDlyBJWVldi0aRNu374tLG+oPn2e30fRp9aHx+txj7f33nsPn3/+OY4cOYKsrCxERUVBJBJh+PDhjapdFweHdnDq0B6/nTqj1W7f2g7vR4RDInlBaCMinDp9Ft5dtWc0nnUtInCA+6HTlJ/ZTJo0CVFRUVi7di1GjRqF+Ph4hIeHY9asWQDu/wLamjVrcPToUQwaNAiRkZGIiIjAsGHDAADvvvsuwsPDERMTg0GDBuH48eOIi4vTecb22WefoaysDG+++SZ27dqFWbNmQSwW6/xB+Morr6B3796IiIjA+fPnIRaLsX79esjlcoSGhmLatGkYMmQIBg8e/Kce98CBAxEREYGYmBh89NFH6N69O4KCglBWVqZz/7oEBwejVatWGDp0qFb7ihUr0LVrV4SHh+O1116Dubm5MJ4PGjJkCD7++GMsW7YMAwcOxObNm7Fy5Uq0b99er366d++OU6dOYdSoUdi5c6fwTrSxY63P+mPGjIGJiQkmT56MsrIyeHl5YeXKldiyZQtCQkLw448/4ttvv9X6jK0+lpaWmDZtGr7++mts2LABALBs2TK4u7tj8uTJCAsLg4+PD+Lj43Vu//LLLyMyMhKRkZEIDAxEWloa3N3dheUN1afP81ufhmp9eLwe93h76623EBYWhnnz5iE0NBRFRUXYsmULxGJxo2vXZfiwEPx84ldcufrHZ0I3825jVcJ3WlO1Bw4dxs28WxgW8udedy2Viby0TPgpXXu2XTsFodFohDlXtUoFtUYDlUol3Kqrq1FTUwN/P1+DFSyJyn/sv/ZsaW6C35c+94QqYs1ZbGwsMjMzsX79emOX
wlqI1QlrkXLxEt4NexOv+L+stUylUuE/+w7ixz17MWnCOAwaGGSkKpunp+6igaVj7J7IF7Axxtif8ZepYdiyLRGrE9Zi776D6PqiF8RiMYqLi5FyKRXl5eV4561JeLVfoLFLbXaeusAZ79+Kv62TMWY0ZqameHPiG+gXGIBffv0NmdeyUFlVBVtbG/TtE4BX+wWiTRt7Y5fZLD11U2qMMcaeTi3mogHGGGPNGwcOY4wxg+DAYYwxZhAcOIwxxgyCA4cxxphBcOAwxhgzCA4cxhhjBsGBwxhjzCA4cBhjjBkEBw5jjDGD4MBhjDFmEBw4jDHGDIIDhzHGmEFw4DDGGDOIp+77cIwlOzsb+/btg0wmg62tLfr164eAgABjlwUAOHPmDKRSKRwdHY1dCmOMPRIHjh4qKiqwfv16jB07Fi+++CLy8/OxYcMG2Nraolu3bk26byKCiYlJveucOXMGdnZ2jQqc2u8+aqhvxhh7Ujhw9FBYWAhTU1P4+PgAAFxcXBASEgKFQgHg/hfV7d27F8nJybCxscHIkSPh5eUFmUyGtWvXwtXVFTk5ObC1tcWECROEYMjOzsbOnTuhUCggkUgwYcIEtGrVCjKZDAkJCejUqRPu3LmD6OjoR667dOlSyOVybN26FUOGDEFAQABycnKwa9cuFBYWwtXVFWPHjoW9/f1vIFy0aBG6dOmC9PR0zJo1C+3atTPOoDLGnjn8GY4enJycYGNjg8TERBQVFQEAfH190atXLwDAL7/8glu3biE6Ohrjx4/Htm3bUFlZCQAoKytDnz59MHfuXHh5eWHHjh0AgPLycqxfvx6TJ0/GggULoFQqceLECWGfCoUC3bp1w+zZs+tdNyoqCs7OzpgwYQICAgJQWVmJdevWISgoCJ988glcXV2xceNG4R0NAFhZWSE6Ohpt27Y1yPgxxhjAgaMXc3NzfPDBB2jVqhW+/vprrFy5EleuXBGWnz9/HkFBQbCysoKrqytcXFxw/fp1APd/uL/wwgsAgP79+yMnJwdVVVUQiUT48MMP4eLiAnNzc3Tu3BkFBQVCn9bW1vDz84OlpWWD6z7oypUrcHR0hI+PD8zNzTFw4EAUFxcjPz9fWKd3794Qi8U8ncYYMyieUtOTWCzGiBEjMGzYMKSnp2P79u0YPXo0vL29UVpaiu+//174Aa7RaNC9e/c601UWFhZo1aoVysvL4eDggNu3byMxMRFKpVKYKtPFwsJC73XLy8vRpk0b4b6pqSlat24NuVyO9u3bP6HRYIyxxuPA0cOlS5dQXFyM/v37w9TUFN26dcOtW7eQnp4Ob29v2NraYty4cXB1ddXaTiaTad2vqqpCVVUVbGxscOvWLRw4cADTp09H69atceLECeTm5urcf2PWtbe3R2pqqnBfo9GgpKQENjY2jzkKjDH2eHhKTQ9t2rTBkSNHcPnyZdTU1KCoqAhXr16Fk5MTAMDHxweHDh1CRUUFKioqsGfPHpSUlAC4f4Vbeno61Go1fvrpJ7i6ugrvcogIRISCggJcvHgRGo1G5/4bWtfS0hKFhYVQq9Xw9PTEvXv3kJaWBrVajZ9//hk2NjZwdnZu+oFijLF68DscPXTs2BETJ07ETz/9hO+//x7W1tZ46aWXEBgYCAAICgrCvn37sGLFCmg0Gvj7+8POzg6VlZUQiURITU3F9u3b0a5dO0ycOBEA4OnpCU9PT8TFxaFNmzbo3Lkz7t27p3P/Da3bs2dP7Nq1C2q1Gv3798eUKVOQlJSEbdu2wcXFBWFhYfx5DWPM6EzkpWXC5Uu1VzLVnj1rNBoQETQaDdQqFdQaDVQqlXCrrq5GTU0N/P18jVN9M1d7efPChQuNXQpjjBkdT6kxxhgzCA4cxhhjBsFTaowxxgyC3+EwxhgzCA4cxhhjBsGBwxhjzCD493D0VF2tRGGRHIqKSmOXwhhrxqzFVmjX1h4ikaWxS2l2OHD0UF2tRG7eHTg6tIWz03PGLoexZ9617Bx4
uLk2vKIRyEvKkJt3B52ed+LQeQhPqemhsEgOR4e2sG9ta+xSGGPNnH1rWzg6tEVhkdzYpTQ7HDh6UFRUctgwxvRm39qWp9914MBhjDFmEBw4jDHGDIIDhzHGmEFw4DDGGDMIDhzGGGMGwYHDGGPMIDhwGGOMGQQHDmOMMYPgwGGMMWYQHDiMMcYMggOHMcaYQfBfi2aMtXiJiYmoqakR7o8bNw7m5uaoqqrCrl27hHaRSITRo0cDAJKTk5GVlSUs8/f3h7u7u+GKboE4cBhjLV51dTWUSmWddiJCVVWVzm1UKpXWMrVa3WT1PSs4cBhjLVJpaSkqKioA3A+WB929exdmZmaorq7WatdoNJDJZAAgbFurpKREWPbcc8/B1JQ/kWgsDhzGWIuUmpqKzMxMncsOHTqks12pVGL//v06l6WnpyM9PR0AMGnSJIhEoidT6DOEI7oJpaenY/DgwfDz88OWLVuE9m+++QaLFy82YmWMsQfZ2trC3JzPv5saB04TioyMxKBBg7B48WIsXboUq1evRmlpKfbt2wdfX19jl8cY+5/+/fvDwcHB2GW0eBzpTaSiogJ5eXn44IMPIBaL0bFjR0RERGD58uXo168fhg0bZuwSGWPMoDhwmohYLEZKSopw38fHB7/99htkMhlcXFyMWBljDADMzMzq3K9t02g0dS40YI+PA8eAzMzMOGwYayZCQ0OFD/5FIhFeffVV4dLns2fP4vr168Ysr0XiwGGMPZMe/IXPESNG4Ny5c8Jlz6xp8EUDjDHGDIIDhzHGmEHwlBpj7Jm3f/9+aDQaY5fR4nHgMMZapM6dO8PZ2blJ+rawsGiSfls6DhzGWIvk4ODAv8zZzPBnOIwxxgyCA4cxxphBcOAwxhgzCA4cxhhjBsGBwxhjzCA4cBhjjBkEBw5jjDGD4MDRg7XYCvKSMmOXwRh7SshLymAttjJ2Gc0OB44e2rW1R8G9Ig4dxliD5CVlKLhXhHZt7Y1dSrPDf2lADyKRJTo974TCIjkK7hUZuxzGGIBr2TnGLkEna7EVOj3vBJHI0tilNDscOHoSiSzh7PScsctgjLGnFk+pMcYYMwgOHMYYYwbBgcMYY8wgOHAYY4wZBAcOY4wxg+DAYYwxZhAcOIwxxgyCA4cxxphBcOAwxhgzCA4cxhhjBsGBwxhjzCA4cBhjjBkEBw5jjDGD4MBhjDFmEBw4jDHGDIIDhzHGmEFw4DDGGDOI/wcoyo0tIkmfFQAAAABJRU5ErkJggg==) then the tilde character (`~`) denotes nesting everywhere "dot notation" can be used. You can now denote the example path above as `user_data.address~city`. 
It is also possible to use more than one character as alternative separator, for example: ![second alternative separator example](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZsAAABxCAYAAAADMA6oAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dvz4AABouSURBVHic7d15WFNn2gbwGwTCLipUgSombFJRLBSriFoUFTdsXerWhYqDrW3VGZ0CbnX6DbhUaDutY4daResuYqt1rVtrrQvqoIAoghXcEATCEpaQ5Pn+cDglEiWop0F8fteVS/Kec97z5M0J98mbgzEqLa8gIkKdup81Go3wLxFBo9FArVJBrdFApVIJt5qaGtTW1sLfzxeMMcaYLsaGLoAxxljLx2HDGGNMdBw2jDHGRMdhwxhjTHQcNowxxkTHYcMYY0x0JoYu4FFsSalG1PYyKFXU+Mo6mJkYYckYW4z3N3/ClTHGGNPlqXxn8zhBAwBKFSFqe9kTrIgxxtjDPJVh8zhB8yT7YIwxpp+nMmwYY4w9XZ6ZsAl0NTJ0CYwx9sx6JsIm0NUI0/q0wrlI00fuo7KyEjKZDCdPntRqz8nJgUwmQ05OzuOWqbeLFy9CJpOhpKTkT9tnU+Tk5CAgIAB37twxdCnsAfr27YstW7YYuownwtDH263b+Th+4iQOHjqK0ylnIS8tNUgdzV2LD5u6oOnqaISdaRo42ra8dzi7du2Cn59fs9l/hw4dEB4ejrZt2xqsJkOaP38+pk+f/sT6KywshEwmQ1ZW1hPrsynEPr6aOl7N5XjLzc3DPxd/irkL/oHv1m/Gnv0HkPBtIv46Jxor//Mth859nspLn/V1f9D851cNbpfxhQH1qdVqtGrV6on2aWVlhfDw8CfaZ3MhxnjVp9FoYGzc4s8BnyhDHG+p5y9gxder4O4qw/zov8NVJoWRkRFUKhXOX0jHtu3f4x//twSRc2ahQ4f2f2ptzVWLOKpd7Ru+WzFU0Fy8eBHjxo1D165dERQUhKSkJK3lq1evRkBAAHx9fTFr1izI5XKd/VRUVGDu3Lnw9fVFjx49MHPmTJSVNbxce9q0aZg5cyZKSkogk8lw+PBhAEBeXh7eeusteHt7Y/Dgwdi3b5+wTWhoKObMmYOhQ4di/PjxwhThli1bMHToUHTt2hVvv/02iouLhW2OHDmCESNGoEuXLggODhb607X/+tN8ycnJ6Nmzp/CVFQAwffp0zJs3DwBQVVWFuXPnwsfHB3369MEXX3wBtVqtc0x+/PFHBAcHo2vXrhg9ejRSU1OFZQ/rJzY2FmFhYfjggw/Qo0cPBAQEIDk5Wa+xrhubmJgYBAQEIC4u7qHr+/j4YOPGjdi3bx9kMhkqKysBAEePHsWwYcPQpUsXhISE4OjRo8L+Y2NjMX78eEyZMgVeXl6oqakRlq1ZswYvv/wyACAkJASxsbFCXdHR0ejRowd8fHwQGRkp7EuXtWvXwt/fHz179sTSpUsbjPGD6nvQ8VVfbGws3n33XSxatAjdu3dH7969tY77h9X6oPF61OOtTkJCAvr06QMvLy+8+eab+P3334VloaGhiIuLw5QpU/DCCy8gJCQEFy5ceODY3e92fj5WJqxGYEAv/H32TFhYWOCfiz/Fux/8FV+t/AaurlJ8PD8K7dq1xWdf/htKZa3efbdopeUVJC8rF24lpWVUUlpGRSVyKiqRU2FRMRXcLaL8gkK6ees25d24SVev5VJWdg5dvHSZ/nv+Ap0+c5b+TE6z84VbfkERrT9WSH0X3xHaXv93AR06f5fyC4oo4VAh+f3jjtY2dbemUCgUJJVK6cSJE1rt2dnZJJVKKTs7m4iIAgIC6JNPPqG8vDzaunUrubu7U2pqKhERrVu3jvr27UsnT56k
rKwsmjhxIs2YMUPn/mbOnEljxoyhy5cvU1ZWFo0cOZLmz59PREQZGRkklUqpuLiYqqurKSkpiXx9fUmhUJBKpSKFQkGBgYG0ePFiunbtGiUlJZGHhwdlZWUREdHIkSPJ39+fDh48SLm5ucJjCwoKot9++41Onz5NgYGBFBsbS0REV65cITc3N9q8eTPdunWL1q1bRx4eHnTr1i2d+69fX3l5OXl6etKZM2eIiEipVJK3tzf9+uuvREQ0ffp0euutt+jy5ct04sQJ6t27N23YsKHBeOTk5JC7uzt9//33lJubSwsXLiRfX1+qrq5utJ+YmBjy8vKiI0eOkEKhoMTERHJ1daVr1641OtZ1YzN27Fg6e/YsFRQUNLp+ZGQkTZs2jRQKBRERpaamkru7O33zzTeUk5NDK1euJHd3d7py5YpQn6urK61du5YyMzNJo9EIj7u2tpZyc3NJKpXS+fPnSalUEhHR1KlTacSIEZSamkrnzp2jkJAQmjVrls5j6fTp0+Tq6koJCQmUmZlJsbGxJJVKafPmzY3Wp+v5vV9d/cuWLaNLly7RsmXLyNPTk4qLixutVdd4Pc7xRkS0atUqevHFF+nAgQOUlZVFM2bMoP79+wtjN3LkSPLy8qLvv/+eMjIyKCwsjEJDQ3WOnS7/WvE1LfwkltRqNdXW1tKcqPkUs2Q5Hfn5GH00dyF9Gv8FERGVlpbSux/Moj17D+jdd0v2VIfNiM/v0K6Uu1qBo2/QiBE2CoWCZDIZHT16VFi+Y8cOunz5MhER9e3bl3bu3Cksu3DhArm5uVFtbW2D/WVlZVFhYaFw/9tvv6WhQ4cSETV4ce3cuZN8fX2Fdbdt20bBwcFa/U2ZMoXi4+OJ6N6L7Ysvvmjw2OrXvWzZMpo8eTIREcnlckpPT9fqr1u3brR//36d+7+/vqlTp9LixYuJiOjo0aPk6+tLKpWKrl+/TjKZjO7cuSNsm5CQQK+//nqD8Th8+DB5eHhQWVmZUPPatWtJLpc32k9MTAy9/fbbWv2NGjWK4uLiGh3rurE5deqUsPxh6xMRzZs3j9577z3h/syZM2natGla+w8PD6ePPvpIqG/SpEkNHnOdgoICkkqlwnF09epVkkqldPHiRWGdtLQ0kslkdOPGjQbbz5w5k6ZOnarV1qtXLyFsGqvv/uf3fjExMfTaa68J96urq0kmk1FKSopetd4/Xo9zvGk0GurVqxetXr1aqx4/Pz9KTk4monvH/9KlS4XlP//8M7m7u5NarX7gY6xTVVVFUyLep5OnU4iI6Oat2/T3qAV0924RERGlnDlHUyLeF04YNmzaSgv/EdNov8+Cp/ozm3PXCf85rgbQCsGe92YEHW2NDPYZjaWlJSIiIvDee+/hlVdewYABAzBs2DBYWlqirKwMN27cQGRkJKKjowHc+1ZUtVqN27dvo2PHjlp9OTg4ID4+Hr/88guKi4uhUqng5OSkVx2ZmZm4du0avL29hTalUok2bdoI983MzBpsZ2trK/xsYWEhTGm0bt0aZ86cwaJFi3D16lWoVCooFAqt6Z6HGT58OP71r38hKioKhw4dwuDBg9GqVStcvHgRRIQBAwYI66pUKtjb2zfoIyAgAH5+fggKCsKQIUMQHByMN954A8bGxjh16pTe/dTx8vJCXl4eAP3G2tT0jysZm/rcZGZmYuzYsVpt/v7+WlOb9ftvTGZmJiwtLeHl5SW0eXt7w9zcHFeuXIGzs7PW+teuXUNwcLBWm4nJHy99feprTP1jRyKRwNjYGAqFAgUFBU2qFXi8462kpAR37tzBSy+9pFVP9+7dtS6wqF+vpaUlVCoVamtrIZFIHtr/7fw7UKvVcHdzBQA4OXbAssWfALh3zJ04dRpdPN1hZHRvat/NVYbDR38BEQltz6qnOmwA3YEjRtCYmprC2NgYtbXa869KpRIAhIM0MjISEyZMwMGDB7FhwwYsX74cW7duhZ2dHQAgPj5e64UHAI6Ojg32FxUV
BZVKhU2bNsHR0RHr1q3DunXr9K7Xz88PS5cu1WqzsrLSe/v60tPTMWPGDMTHx2PAgAEwNTVt0tVJgwYNQnR0NK5cuYKDBw9iyZIlwjIzMzPs3r1ba31dH8BLJBJs3LgR586dw5EjR7BgwQI4Oztjw4YNTeqnTv3nsalj3dT1zczMtH65A/cuBKg7dprKzMyswUUE9L+vbtfVp5GRkc6TC7Hqe5xagcc73uoe5/3PvVqtfiKPp+64MTNtOJ6ff/lvFBTcxYK5H/1Rj8QMarUaGiK0esbDpkVcIFAXOCm5JNo7GlNTU3Ts2BHp6ela7enp6bCwsICTkxOys7MRFxeHTp06ITw8HMnJyWjdujV27twJW1tbODg44MaNG3BxcRFu9vb2DV7oAHD8+HGMHz9eCKKHffh7/xmTq6srsrOz0b59e2E/Dg4ODz3Tf5hTp05BKpViyJAhMDU1hUql0nrhNnbGZmVlhX79+uGzzz5DdXU1AgICAABubm5QKpUoLy8X6nz++ed11nnkyBFs3boVvr6+mD17Nn744QekpKQgNTVVr37qX6AAAGlpaXBxcQHQtLHWZ/37x8Pd3R3nzp3Tajtz5gy6dOny0P08rL+KigpkZ2cLbenp6aipqdHZp1QqxeXLl7XaiP54fTRW3+OcketT6/39P87xZm1tjQ4dOuC///2v0FZTU4P09HS9x/th2v5vdiD/vr/pybt+A+kZmYiYGgYbG2uhPT//Dlrb2qIVX2HYMsIG+CNwxJw6mzp1KlauXIktW7bg0qVL2LVrF5YtW4bw8HAYGxvD2toaq1atwueff47r16/j+PHjyM/Ph0wmAwBERETgyy+/xK5du3D9+nWsWLECY8eO1Xrh15FKpfjuu++QmpqKbdu24euvv37gmVmbNm0gl8tx/PhxyOVyvPrqqzA1NcXf/vY3ZGVlITU1Fa+99hr27t37SI+7c+fOyMrKQnJyMs6ePYsPP/wQCoVCqOf+/esyfPhw7Nu3D4MGDRLCVSaTYeDAgZg9ezZSUlKQk5ODGTNmYPHixQ22VyqV+Pjjj7F7927cvHkTu3fvhomJCTp27KhXPydPnkRSUhLy8vIQFxeHq1evYsyYMU0ea33Wt7OzQ2ZmJtLS0qBSqfCXv/wF+/fvx5o1a/D7778jISEBx44dwzvvvKPX+Nva2sLY2BiHDx9Gfn4+XFxcEBISgjlz5uDChQs4f/48oqKiMHjwYHTq1KnB9pMmTcKePXtw6NAhVFVVYcOGDbh165awvLH69Hl+H0SfWu8fr8c93t5991189tlnOHToELKzsxEVFQWJRILhw4c3qXZd7O3bwbFDe/x24pRWu11rW7wXEQ6ptLPQRkQ4cfI0vLtqz2Q8q1pM2AD3AkfMz2gmTZqEqKgorF69GqNGjUJ8fDzCw8Mxa9YsAPf+uGzVqlU4fPgwBg0ahMjISERERGDYsGEAgHfeeQfh4eGIiYnBoEGDcPToUcTFxek8U/v0009RXl6ON954Azt27MCsWbNgaWmp85fgyy+/jN69eyMiIgJnz56FpaUlEhMTIZfLERoaimnTpmHIkCEYPHjwIz3ugQMHIiIiAjExMfjwww/RvXt3BAUFoby8XOf+dQkODoa5uTmGDh2q1b58+XJ07doV4eHhePXVV2FiYiKMZ31DhgzBRx99hKVLl2LgwIHYuHEjVqxYgfbt2+vVT/fu3XHixAmMGjUK27dvF96BNnWs9Vl/zJgxMDIywuTJk1FeXg4vLy+sWLECmzZtQkhICH744Qd88803Wp+pPYyZmRmmTZuGr776CmvXrgUALF26FG5ubpg8eTLCwsLg4+OD+Ph4ndu/9NJLiIyMRGRkJAIDA5Geng43NzdheWP16fP8Pkxjtd4/Xo97vL355psICwvDvHnzEBoaiuLiYmzatAmWlpZNrl2X4cNC8POxX3Hp8h+fAV2/cQsrE77Vmp7dd+Agrt+4iWEhj/a6a2mMSssrqP6Zdd3PddMO
Go1GmGNVq1RQazRQqVTCraamBrW1tfD38/3TipZGFTz2/9psZmKE35c894QqYs1ZbGwssrKykJiYaOhSWAvxdcJqpJ6/gHfC3sDL/i9pLVOpVPhxz378sGs3Jk0Yh0EDgwxUZfPyVF4gsGSM7RP58jTGGHsUf5kahk1bkvB1wmrs3rMfXV/wgqWlJUpKSpB6IQ0VFRV4+81JeKVfoKFLbTaeyrAZ72/O37LJGDOYVsbGeGPi6+gXGIBffv0NWVeyUVVdDRsba/TtE4BX+gWiTRs7Q5fZrDyV02iMMcaeLi3qAgHGGGPNE4cNY4wx0XHYMMYYEx2HDWOMMdFx2DDGGBMdhw1jjDHRcdgwxhgTHYcNY4wx0XHYMMYYEx2HDWOMMdFx2DDGGBMdhw1jjDHRcdgwxhgTHYcNY4wx0T2V32djKDk5OdizZw/y8/NhY2ODfv36ISAgwNBlAQBOnToFmUwGBwcHQ5fCGGMNcNjoqbKyEomJiRg7dixeeOEFFBQUYO3atbCxsUG3bt1E3TcRwcjI6KHrnDp1Cra2tk0Km7rvLmqsb8YYe1wcNnoqKiqCsbExfHx8AADOzs4ICQmBQqEAcO9L5nbv3o2UlBRYW1tj5MiR8PLyQn5+PlavXg0XFxfk5ubCxsYGEyZMEEIhJycH27dvh0KhgFQqxYQJE2Bubo78/HwkJCSgU6dOuH37NqKjox+47pIlSyCXy7F582YMGTIEAQEByM3NxY4dO1BUVAQXFxeMHTsWdnb3vjlw0aJF6NKlCzIyMjBr1iy0a9fOMIPKGHtm8Gc2enJ0dIS1tTWSkpJQXFwMAPD19UWvXr0AAL/88gtu3ryJ6OhojB8/Hlu2bEFVVRUAoLy8HH369MHcuXPh5eWFbdu2AQAqKiqQmJiIyZMnY8GCBVAqlTh27JiwT4VCgW7dumH27NkPXTcqKgpOTk6YMGECAgICUFVVhTVr1iAoKAgff/wxXFxcsG7dOtT/RlYLCwtER0ejbdu2f8r4McaebRw2ejIxMcH7778Pc3NzfPXVV1ixYgUuXbokLD979iyCgoJgYWEBFxcXODs74+rVqwDu/WLv3LkzAKB///7Izc1FdXU1JBIJPvjgAzg7O8PExASenp4oLCwU+rSysoKfnx/MzMwaXbe+S5cuwcHBAT4+PjAxMcHAgQNRUlKCgoICYZ3evXvD0tKSp9AYY38KnkZrAktLS4wYMQLDhg1DRkYGtm7ditGjR8Pb2xtlZWVYv3698Mtbo9Gge/fuDaaoTE1NYW5ujoqKCtjb2+PWrVtISkqCUqkUpsd0MTU11XvdiooKtGnTRrhvbGyM1q1bQy6Xo3379k9oNBhjTH8cNnq6cOECSkpK0L9/fxgbG6Nbt264efMmMjIy4O3tDRsbG4wbNw4uLi5a2+Xn52vdr66uRnV1NaytrXHz5k3s27cP06dPR+vWrXHs2DHk5eXp3H9T1rWzs0NaWppwX6PRoLS0FNbW1o85Cowx9mh4Gk1Pbdq0waFDh3Dx4kXU1taiuLgYly9fhqOjIwDAx8cHBw4cQGVlJSorK7Fr1y6UlpYCuHclW0ZGBtRqNX766Se4uLgI726ICESEwsJCnD9/HhqNRuf+G1vXzMwMRUVFUKvV8PDwwN27d5Geng61Wo2ff/4Z1tbWcHJyEn+gGGNMB35no6eOHTti4sSJ+Omnn7B+/XpYWVnhxRdfRGBgIAAgKCgIe/bswfLly6HRaODv7w9bW1tUVVVBIpEgLS0NW7duRbt27TBx4kQAgIeHBzw8PBAXF4c2bdrA09MTd+/e1bn/xtbt2bMnduzYAbVajf79+2PKlClITk7Gli1b4OzsjLCwMP58hjFmMEal5RVU/yqlup/rzpo1Gg2ICBqNBmqVCmqNBiqVSrjV1NSgtrYW/n6+BnkAzV3dJcwLFy40dCmMMWYwPI3GGGNMdBw2jDHGRMfTaIwxxkTH72wYY4yJjsOGMcaY6DhsGGOMiY7/zqYJ
amqUKCqWQ1FZZehSGGPNmJWlBdq1tYNEYmboUpoNDhs91dQokXfjNhzs28LJ8TlDl8PYM+9KTi7cXV0aX9EA5KXlyLtxG52ed+TA+R+eRtNTUbEcDvZtYdfaxtClMMaaObvWNnCwb4uiYrmhS2k2OGz0pKis4qBhjOnNrrUNT7nXw2HDGGNMdBw2jDHGRMdhwxhjTHQcNowxxkTHYcMYY0x0HDaMMcZEx2HDGGNMdBw2jDHGRMdhwxhjTHQcNowxxkTHYcMYY0x0/L8+M8aeCUlJSaitrRXujxs3DiYmJqiursaOHTuEdolEgtGjRwMAUlJSkJ2dLSzz9/eHm5vbn1d0C8Jhwxh7JtTU1ECpVDZoJyJUV1fr3EalUmktU6vVotXX0nHYMMZarLKyMlRWVgK4Fyr13blzB61atUJNTY1Wu0ajQX5+PgAI29YpLS0Vlj333HMwNuZPIvTFYcMYa7HS0tKQlZWlc9mBAwd0tiuVSuzdu1fnsoyMDGRkZAAAJk2aBIlE8mQKfQZwLDPGGAAbGxuYmPD5t1g4bBhjDED//v1hb29v6DJaLA4bxhhjouP3jIyxZ1arVq0a3K9r02g0DS4qYI+Ow0ZkiYmJ6N27Nzw9PVFZWYlVq1YhLCwMtra2hi6NsWdeaGio8CG/RCLBK6+8IlzefPr0aVy9etWQ5bUoHDYi27hxI8zNzeHp6YmCggKsX78egwcP5rBhrBmo/8ecI0aMwJkzZ4RLm9mTxWEjsvqXV3bu3BmnT582YDWMMWYYfIEAY4wx0fE7G8YYA7B3715oNBpDl9FicdgwxlosT09PODk5idK3qampKP22VBw2jLEWy97env9Qs5ngz2wYY4yJjsOGMcaY6DhsGGOMiY7DhjHGmOg4bBhjjImOw4YxxpjoOGwYY4yJjsNGT1aWFpCXlhu6DMbYU0JeWg4rSwtDl9FscNjoqV1bOxTeLebAYYw1Sl5ajsK7xWjX1s7QpTQb/D8I6EkiMUOn5x1RVCxH4d1iQ5fDGANwJSfX0CXoZGVpgU7PO0IiMTN0Kc0Gh00TSCRmcHJ8ztBlMMbYU4en0RhjjImOw4YxxpjoOGwYY4yJjsOGMcaY6DhsGGOMiY7DhjHGmOg4bBhjjImOw4YxxpjoOGwYY4yJjsOGMcaY6DhsGGOMiY7DhjHGmOg4bBhjjImOw4YxxpjoOGwYY4yJ7v8B5QR/rDHF1T0AAAAASUVORK5CYII=) Now, you can denote nesting by using 2 dots, and the example path above can be denoted as `user_data.address..city`. ## Snowplow Event Mapping Options This section includes the mapping rules that concern a Snowplow event as claimed by the [Snowplow Client](/docs/destinations/forwarding-events/google-tag-manager-server-side/snowplow-client-for-gtm-ss/): ### Snowplow Atomic Properties Rules This option indicates if all Snowplow [atomic](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0) properties of the event should be included in the JSON body. By default this option is disabled. 
If enabled, an additional text field optionally allows you to specify a key under which those atomic properties will be nested. Leaving it blank adds those properties to the request body without nesting. Dot notation can also be used here. As an example, this section configured as: ![snowplow atomic properties rules](/assets/images/snowplow_atomic_nest-4975fe7246e9a491c69b3042e45b280c.png) will result in the following JSON structure: ```json { ..., "snowplow_atomic": { "app_id": "fooBar", "platform": "mobile", ... }, ... } ``` Please note that some of the Snowplow atomic properties are already mapped to [common event properties](https://developers.google.com/tag-platform/tag-manager/server-side/common-event-data) by the [Snowplow Client](/docs/destinations/forwarding-events/google-tag-manager-server-side/snowplow-client-for-gtm-ss/). ### Snowplow Self-Describing Event Rules This option indicates if the Snowplow Self-Describing data will be included in the request body; it is enabled by default. Similarly to the above section, you can also specify a key under which the self-describing data will be nested. Leaving it blank adds those properties to the request body without nesting. Dot notation can also be used here. As an example, this section configured as: ![snowplow self-describing event rules](/assets/images/snowplow_self_desc_no_nest-c7502209e3ea302074ec4427c13b5b6d.png) will result in the following JSON structure: ```json { ..., "self_describing_event_com_acme_test_foo_1": { "mySelfDescProp": "exampleValue", ... }, ... } ``` ### Snowplow Event Context Rules This section describes how the HTTP Request tag will use the context Entities attached to a Snowplow Event. ![snowplow event context rules](/assets/images/context_rules-9da59ffc2b9d28034d92e7d5a9013550.png) #### Extract entity from Array if single element Snowplow Entities are always in Arrays, as multiple of the same entity can be attached to an event.
This option will pick the single element from the array if the array only contains a single element. #### Include Snowplow Entities in request body Using this drop-down menu you can specify whether you want to include `All` (default) or `None` of the Snowplow context entities in the HTTP Request's body. #### Nest all unmapped Entities under key This option is available only if the previous option ([Include Snowplow Entities in request body](#include-snowplow-entities-in-request-body)) is set to `All`. It applies **only** to unmapped entities, i.e. all included entities whose mapping is not edited in the following ([Snowplow Entities to Add/Edit mapping](#snowplow-entities-to-addedit-mapping)) table. With this setting you can specify a key under which the Snowplow event's unmapped entities will be nested. Alternatively, leaving it blank adds the unmapped entities to the request body without nesting. You can also use dot notation in this value. #### Snowplow Entities to Add/Edit mapping Using this table you can specify in each row a specific mapping for a particular context entity. In the columns provided you can specify: - **Entity Name**: The name of the entity whose mapping you want to add or edit (required).¹ - **Destination Mapped Name**: The key you would like to map it to in the request body (optional: leaving the mapped key blank keeps the same name). You can use dot notation here as well to signify further nesting. This value is independent of the [nesting of unmapped entities](#nest-all-unmapped-entities-under-key) setting above. - **Apply to all versions**: Whether you wish the mapping to apply to all versions of the entity (default value is `False`).¹ #### Snowplow Entities to Exclude Using this table (which is only available if [Include Snowplow Entities in request body](#include-snowplow-entities-in-request-body) is set to `All`), you can specify the context entities you want to exclude from the HTTP Request body.
In its columns you can specify: - **Entity Name**: The Entity name (required).¹ - **Apply to all versions**: Whether the exclusion applies to all versions of the entity (default value is `False`).¹ > **Note:** ¹ How to specify the **Entity Name** and its relation to the **Apply to all versions** option: > > Entity Names can be specified in 3 ways: > > 1. By their Iglu Schema tracking URI (e.g. `iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-0-2`) > > 2. By their enriched name (e.g. `contexts_com_snowplowanalytics_snowplow_client_session_1`) > > 3. By their key in the client event object, which is the GTM SS Snowplow prefix (`x-sp-`) followed by the enriched entity name (e.g. `x-sp-contexts_com_snowplowanalytics_snowplow_client_session_1`) > > Depending on the value set for the **Apply to all versions** column, the major version number from the 2nd and 3rd naming option above may be omitted. More specifically, omitting it is only permitted if **Apply to all versions** is set to `True`. #### Snowplow Event Context Rules (pre-v0.2.0) This section describes how the HTTP Request tag will use the context Entities attached to a Snowplow Event. ##### Extract entity from Array if single element Snowplow Entities are always in Arrays, as multiple of the same entity can be attached to an event. This option will pick the single element from the array if the array only contains a single element. ##### Include all Entities in request body Leaving this option enabled (default) ensures that all Entities on an event will be included within the request data. Optionally, you can also specify a key under which the Snowplow event's contexts will be nested. Alternatively, leaving it blank adds all entities to the request body without nesting. Disabling this option reveals further options so that individual entities can be selected for inclusion. Using the "Snowplow Entity Mapping" table, these entities can also be remapped to have different names in the JSON body of the request.
The entity can be specified in two different formats: - Major version match: `x-sp-contexts_com_snowplowanalytics_snowplow_web_page_1` where `com_snowplowanalytics_snowplow` is the event vendor, `web_page` is the schema name and `1` is the Major version number. `x-sp-` can also be omitted from this if desired - Full schema match: `iglu:com.snowplowanalytics.snowplow/webPage/jsonschema/1-0-0` ##### Include unmapped entities in request body This option enables you to ensure that all unmapped entities (i.e. any entites not found in the "Snowplow Entity Mapping" rules above) will be included in the request body. Again, optionally, you can also specify a key under which the Snowplow event's unmapped entities will be nested. Alternatively, leaving it blank adds the unmapped entities in the request body without nesting. ## Additional Event Mapping Options If you wish to pick other properties from the Client event and map them into the request body, this can be specified in this section. ### Event Property Rules #### Include common event properties Enabled by default, this option sets whether to include the event properties from the [Common Event definition](https://developers.google.com/tag-platform/tag-manager/server-side/common-event-data) in the request body. Inclusion of the `user_data` property is not affected by this setting (see next option). Also, you can optionally specify a key under which the Common Event properties will be nested. Alternatively, leaving it blank adds the the common event properties in the request body without nesting. #### Include common user properties Disabled by default, this option sets whether to include the `user_data` properties from the common event definition in the request body. Again, you can optionally specify a key under which the `user_data` properties from the common event will be nested. Alternatively, leaving it blank adds the `user_data` in the request body without nesting. 
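The "nest under key" behavior described above can be illustrated with a short sketch. This is not the tag's actual implementation; the helper name and object shapes are purely illustrative. Each dot-notation segment creates one level of nesting, while a blank key merges the properties at the top level of the request body:

```javascript
// Illustrative sketch (not the tag's actual code) of how a "nest under key"
// setting with dot notation could shape the request body.
function nestUnderKey(body, keyPath, props) {
  if (!keyPath) {
    // Blank key: add the properties at the top level of the request body
    return Object.assign(body, props);
  }
  const parts = keyPath.split('.');
  let node = body;
  for (let i = 0; i < parts.length - 1; i++) {
    node[parts[i]] = node[parts[i]] || {};
    node = node[parts[i]];
  }
  node[parts[parts.length - 1]] = props;
  return body;
}

// Dot notation creates one level of nesting per segment
const nested = nestUnderKey({}, 'user.traits', { email_address: 'jane@example.com' });
// → { user: { traits: { email_address: 'jane@example.com' } } }

// A blank key keeps the properties un-nested
const flat = nestUnderKey({ event: 'page_view' }, '', { plan: 'pro' });
// → { event: 'page_view', plan: 'pro' }
```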
#### Additional Event Property Mapping Rules

Using this table, you can additionally specify the Property Key from the Client Event, and then the key you would like to map it to (or leave the mapped key blank to keep the same name). You can use Key Path notation here (e.g. `x-sp-tp2.p` for a Snowplow event's platform, or `x-sp-contexts_com_snowplowanalytics_snowplow_web_page_1.0.id` for a Snowplow event's page view id, noting the array index 0), or pick non-Snowplow properties if using an alternative Client.

## Additional Request Data

This section allows you to add custom properties to the request body that are "external" to the event. In other words, it provides the ability to add custom constant or variable request data.

## Post-processing

![post processing](/assets/images/post_processing-706f515d594c968ce8cfbd595f0bbd6e.png)

This section provides a way to easily configure some basic post-processing of values in the constructed HTTP request payload. The order of the subsections denotes the post-processing order. For more advanced use cases you can still use the **Additional Request Data** section above and provide values through GTM server-side variables.

### JSON Stringify

In this table you can specify the property names or paths of the HTTP request payload whose values you want to transform into JSON strings. Dot notation can also be used to denote nested paths.

### Encode base64url

In this table you can specify the property names or paths of the HTTP request payload whose values you want to encode to base64url. Encoding is only applied to string values. Dot notation can also be used to denote nested paths.

## Request Headers

Similarly to the above, this section allows you to add custom headers to the HTTP request towards your custom endpoint.

## Additional Options

Finally, this section offers two additional configuration options:

- Changing the HTTP request method from POST (default) to PUT.
- Changing the default request timeout (5000 milliseconds).

## Logs Settings

Through the Logs Settings you can control the logging behavior of the HTTP Request Tag. The available options are:

- `Do not log`: This option allows you to completely disable logging. No logs will be generated by the Tag.
- `Log to console during debug and preview`: This option enables logging only in debug and preview containers. This is the **default** option.
- `Always`: This option enables logging regardless of container mode.

> **Note:** Please take into consideration that the logs generated may contain event data.

The logs generated by the HTTP Request GTM SS Tag are standardized JSON strings. The standard log properties are:

```json
{
  "Name": "HTTP Request", // the name of the tag
  "Type": "Message", // the type of log (one of "Message", "Request", "Response")
  "TraceId": "xxx", // the "trace-id" header if it exists
  "EventName": "xxx" // the name of the event the tag fired at
}
```

Depending on the type of log, additional properties are logged:

| Type of log | Additional information |
| ----------- | -------------------------------------------------------------- |
| Message | "Message" |
| Request | "RequestMethod", "RequestUrl", "RequestHeaders", "RequestBody" |
| Response | "ResponseStatusCode", "ResponseHeaders", "ResponseBody" |

---

# HTTP Request Tag for GTM Server Side

> Send Snowplow events to custom HTTP endpoints from GTM Server Side using the HTTP Request Tag with flexible request configuration and authentication options.
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/http-request-tag-for-gtm-ss/

The HTTP Request Tag for GTM SS allows events to be forwarded to any JSON HTTP endpoint. This Tag works best with events from the Snowplow Client, but can also work with other GTM SS Clients such as GAv4.

## Template installation

> **Note:** The server Docker image must be 2.0.0 or later.

### Tag Manager Gallery

1. From the Templates tab in GTM Server Side, click "Search Gallery" in the Tag Templates section
2. Search for "HTTP Request" and select the official "By Snowplow" tag
3. Click "Add to Workspace"
4. Accept the permissions dialog by clicking "Add"

### Manual Installation

1. Download `template.tpl` – Ctrl+S (Win) or Cmd+S (Mac) to save the file, or right click the link on this page and select "Save Link As…"
2. Create a new Tag in the Templates section of a Google Tag Manager Server container
3. Click the More Actions menu, in the top right-hand corner, and select Import
4. Import `template.tpl` downloaded in Step 1
5. Click Save

## HTTP Request Tag Setup

With the template installed, you can now add the HTTP Request Tag to your GTM SS Container.

1. From the Tag tab, select "New", then select the HTTP Request Tag as your Tag Configuration
2. Select your desired Trigger for the events you wish to forward to your custom destination
3. Enter your destination URL
4. [Configure the tag](/docs/destinations/forwarding-events/google-tag-manager-server-side/http-request-tag-for-gtm-ss/http-request-tag-configuration/) to construct the desired JSON request body
5. Click Save

---

# Forward events with Google Tag Manager Server Side

> Send Snowplow events to multiple destinations using Google Tag Manager Server Side with Snowplow-authored tags and vendor/community tags for flexible event routing.
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/

To support sending events to adjacent destinations, Snowplow works with Google Tag Manager Server Side. There are both Snowplow Authored and Vendor/Community authored Tags which will allow event data to be forwarded to different destinations.

Before reading this documentation, we recommend you become familiar with [the fundamentals of Server Side tagging](https://developers.google.com/tag-platform/tag-manager/server-side/intro).
Combining Snowplow with GTM SS offers additional flexibility and control:

- Full visibility into all transformations on the data
- Ability to evolve sophistication over time
- All data remains in your private cloud until you choose to forward it
- Ease of setup due to rich libraries of tags and the familiar GTM UI

## Configuration Options

GTM SS with Snowplow can be set up in two different configurations.

![](/assets/images/gtmssoptions2-b962d1cc761a54c53f1c793e66d8ceec.png)

### Destinations Hub (Post-pipeline)

Use GTM SS to relay enriched events to destinations. Events are sent to GTM SS via Snowbridge after being processed by your Snowplow pipeline.

- For Snowplow CDI, you can [request setup](https://console.snowplowanalytics.com/destinations) through Console
- For Snowplow Self-Hosted, see [Snowbridge](/docs/api-reference/snowbridge/)

> **Note:** Destinations Hub is the recommended way to set up GTM Server-Side because it allows you to take full advantage of the Snowplow pipeline, and forward validated and enriched data to downstream destinations.

### Server Side Tag Manager (Pre-pipeline)

Use GTM SS to relay raw events to destinations before the Snowplow pipeline, including to your Snowplow pipeline. This is useful if, for example, your company uses GA but wants its data in Snowflake or Databricks rather than only in BigQuery.

### Principles for AWS deployment

> **Info:** GTM SS **should** be deployed into a different account from the Snowplow sub-account, to maintain full segmentation of the infrastructure that Snowplow manages from that which is managed by the Snowplow customer.
>
> It would be possible to set up a separate VPC within the Snowplow sub-account, but this is discouraged. VPC peering would be required to keep the traffic private; otherwise traffic would go over the public internet.
>
> GTM SS as a Destination Hub is normally intended to be public facing; however, as a Server Side Tag Manager it could use both public internet routes and VPC peering to fan out traffic via private routes.

## Deploying Google Tag Manager Server Side

GTM SS must first be deployed before it can be used. This can easily be achieved by deploying to GCP App Engine from directly within the GTM User Interface. Alternatively, Docker images are available for other deployment options, such as on AWS or with Kubernetes.

- [App Engine Setup Guide](https://developers.google.com/tag-platform/tag-manager/server-side/script-user-guide)
- [Manual Setup](https://developers.google.com/tag-platform/tag-manager/server-side/manual-setup-guide)

## Snowplow Client

To receive events in your GTM SS container, the Snowplow Client must be installed. This works both for events direct from the tracker and for enriched events from the pipeline. The Snowplow Client populates the common event data so that many GTM SS tags will just work; it also populates a set of additional properties to ensure the rich Snowplow event data is available to Tags which wish to take advantage of it, such as the Snowplow Authored Tags.

## Snowplow Tag

If using GTM SS as a Server Side Tag Manager for Snowplow JavaScript Tracker events, you will want to ensure you forward these events to your Snowplow Collector. The Snowplow Tag will automatically forward any events the Snowplow Client receives once it has been configured with your Collector URL. It can also construct Snowplow events from other GTM SS Clients such as GAv4.

## Snowplow Authored Tags

Snowplow have created a number of GTM SS Tags which work best with the Snowplow Client and make use of the rich data available from Snowplow tracker or Enriched events.
See the tags for:

- [Amplitude](/docs/destinations/forwarding-events/google-tag-manager-server-side/amplitude-tag-for-gtm-ss/)
- [Braze](/docs/destinations/forwarding-events/google-tag-manager-server-side/braze-tag-for-gtm-ss/)
- [Iterable](/docs/destinations/forwarding-events/google-tag-manager-server-side/iterable-tag-for-gtm-ss/)
- [LaunchDarkly](/docs/destinations/forwarding-events/google-tag-manager-server-side/launchdarkly-tag-for-gtm-ss/)

---

# Iterable Tag for GTM Server Side

> Forward Snowplow events to Iterable from GTM Server Side using the Iterable Tag for cross-channel marketing automation and customer engagement.
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/iterable-tag-for-gtm-ss/

The Iterable Tag for GTM SS allows events to be forwarded to Iterable. This Tag works best with events from the Snowplow Client, but can also construct Iterable events from other GTM SS Clients such as GAv4. The tag is designed to work best with Self Describing Events and atomic events from a Snowplow Tracker, allowing events to be automatically converted into Iterable events, including Iterable Identity events. Additionally, any other client event properties can be included within the event properties or user properties of the Iterable event.

## Template Installation

> **Note:** The server Docker image must be 2.0.0 or later.

There are two methods to install the Iterable Tag.

### Tag Manager Gallery

1. From the Templates tab in GTM Server Side, click "Search Gallery" in the Tag Templates section
2. Search for "Iterable" and select the official "By Snowplow" tag
3. Click Add to Workspace
4. Accept the permissions dialog by clicking "Add"

### Manual Installation

1. Download [template.tpl](https://raw.githubusercontent.com/snowplow/snowplow-gtm-server-side-iterable-tag/main/template.tpl) - Ctrl+S (Win) or Cmd+S (Mac) to save the file, or right click the link on this page and select "Save Link As…"
2. Create a new Tag in the Templates section of a Google Tag Manager Server container
3. Click the More Actions menu, in the top right-hand corner, and select Import
4. Import `template.tpl` downloaded in Step 1
5. Click Save

## Iterable Tag Setup

With the template installed, you can now add the Iterable Tag to your GTM SS Container.

1. From the Tag tab, select "New", then select the Iterable Tag as your Tag Configuration
2. Select your desired Trigger for the events you wish to forward to Iterable
3. Enter your Iterable API Key for a Standard Server Side integration. This can be generated from Iterable's "Integrations -> API Keys" settings page
4. Click Save

---

# Configure Iterable Tag for GTM Server Side

> Configure user identifiers, identity events, entity mapping, and event properties for the Iterable Tag in GTM Server Side.
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/iterable-tag-for-gtm-ss/iterable-tag-configuration/

## Iterable API Key (Required)

Set this to the API key of your Iterable HTTP API Data Source. Iterable provides four different types of API keys, each of which can access a different subset of Iterable's API endpoints. For the endpoints currently in use (`events/track` and `users/update`) the JavaScript type key has enough permissions. The Mobile and Standard key types have more permissions than the JavaScript type, so they can also be used.

## Identity Settings

Iterable requires users to be identified to work best. The options in this section configure how you wish to identify users to Iterable based on your Snowplow events.

### Identifiers

#### Use client\_id for anonymous users

Specify whether `client_id` is used to create a placeholder email for anonymous users. This is useful for implementations where there are no identifiers for a user besides device identifiers (such as Browser Cookies).
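As a purely illustrative sketch of the option above, a placeholder email can be derived from the `client_id`. The address format here assumes Iterable's `@placeholder.email` convention; the tag's actual format may differ:

```javascript
// Illustrative only: derive a placeholder email for an anonymous user from
// client_id. The "@placeholder.email" domain is an assumed convention, not
// necessarily what the tag produces.
function placeholderEmail(clientId) {
  return clientId + '@placeholder.email';
}

const email = placeholderEmail('f3f7ba4a-3bd1-4121-9f5c-11e6bd5cedf5');
// → 'f3f7ba4a-3bd1-4121-9f5c-11e6bd5cedf5@placeholder.email'
```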
#### email

##### Use email\_address from common user data

For Snowplow Tracking, the common user data can be populated by using the `iglu:com.google.tag-manager.server-side/user_data/jsonschema/1-0-0` context entity. This schema is available on [Iglu Central](https://github.com/snowplow/iglu-central/blob/853357452300b172ebc113d1d75d1997f595142a/schemas/com.google.tag-manager.server-side/user_data/jsonschema/1-0-0). This option is enabled by default. Disabling it allows any other property of the event to be selected for the `email` property on the Iterable event.

##### Specify email

As mentioned above, this table is revealed when disabling the "Use email\_address from common user data" configuration option. Using this table allows you to specify key paths to look for the `email` value. You can also set the search priority to denote the preference for the value to use. The columns of this table are:

- **Search Priority**: The priority of a key path when looking for `email` (a higher number means higher priority).
- **Property Name or Path**: The key path to look for in the server-side common event.

#### userId

##### Use user\_id from common user data

Iterable can also accept a User Id, rather than the preferred e-mail address. Enabling this property will use the `user_id` property from the server-side common event as the `userId` identifier of the user.

##### Specify userId

This table is revealed when disabling the "Use user\_id from common user data" configuration option. Using this table allows you to specify key paths to look for the `userId` value. You can also set the search priority to denote the preference for the value to use. The columns of this table are:

- **Search Priority**: The priority of a key path when looking for `userId` (a higher number means higher priority).
- **Property Name or Path**: The key path to look for in the server-side common event.

As an example of how Search Priority works: according to the following setup, in order to set the value for Iterable's `userId`, the Tag will first look for `user_id` in the common event. If that is not found, then it will use the value of `user_data.email_address`:

![userId identifier example](/assets/images/user_id_example-c2aa0f9095a8b50674d151370535eb2b.png)

### Identity Events

#### Use the default `identify` event

Iterable allows for user information to be updated once a user has identified themselves (for example, to update their placeholder email to their real email address). To identify a user to Iterable, you can send a Self Describing `identify` event. This schema is [available on Iglu Central](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/identify/jsonschema/1-0-0). For example, using the JavaScript Tracker v3, this would look like:

```javascript
window.snowplow('trackSelfDescribingEvent', {
  schema: 'iglu:com.snowplowanalytics.snowplow/identify/jsonschema/1-0-0',
  data: {
    id: '2c5ba856-ee07-47b5-a3a6-63100026ed63',
    email: 'john.doe@example.com'
  }
})
```

If you would like to specify your own event, disabling this option allows you to select your own event name and properties, which can be used to fire identity updates to Iterable.

#### Specify identity event(s) by event name

This multi-line text box is revealed when disabling the "Use the default `identify` event" configuration option above. In general, "identity events" are the event names which will make the Iterable Tag call the `/users/update` [API endpoint](https://api.iterable.com/api/docs#users_updateUser) (create or update a user), using the identifiers and the user\_data specified by the tag configuration. These events might be different from the default Snowplow Identify schema, for example sign\_up, login etc., from your own custom event schemas.
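The Search Priority lookup described above can be sketched as follows. The function and event shape are illustrative, not the tag's actual code: try each configured key path in priority order (highest first) and take the first value found in the server-side common event.

```javascript
// Minimal sketch of Search Priority resolution: sort rules by priority
// (highest first), then return the first value found at a rule's key path.
function resolveByPriority(event, rules) {
  const sorted = [...rules].sort((a, b) => b.priority - a.priority);
  for (const rule of sorted) {
    const value = rule.path
      .split('.')
      .reduce((node, key) => (node == null ? undefined : node[key]), event);
    if (value != null) return value;
  }
  return undefined;
}

// Mirrors the example setup: prefer user_id, fall back to user_data.email_address
const commonEvent = { user_data: { email_address: 'jane@example.com' } };
const resolved = resolveByPriority(commonEvent, [
  { priority: 2, path: 'user_id' },
  { priority: 1, path: 'user_data.email_address' }
]);
// → 'jane@example.com' here, since user_id is absent from the event
```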
## Snowplow Event Mapping Options

### Include Self Describing event

Indicates if a Snowplow Self Describing event should be included in the `dataFields` object.

### Snowplow Event Context Rules

This section describes how the Iterable tag will use the context Entities attached to a Snowplow Event.

![snowplow event context rules](/assets/images/context_rules-3b86d3660566214e58f2854ec459ac3c.png)

#### Extract entity from Array if single element

Snowplow Entities are always in Arrays, as multiple of the same entity can be attached to an event. This option will pick the single element from the array if the array only contains a single element.

#### Include Snowplow Entities in event properties

Using this drop-down menu you can specify whether you want to include `All` or `None` of the Snowplow context entities within the Event Data fields of the Iterable event. If disabling this, individual entities can be selected for inclusion. These entities can also be remapped to have different names in the Iterable event, and can be included in either event data or user data. The entity can be specified in two different formats:

- Major version match: `x-sp-contexts_com_snowplowanalytics_snowplow_webPage_1`, where `com_snowplowanalytics_snowplow` is the event vendor, `webPage` is the schema name and `1` is the Major version number. `x-sp-` can also be omitted from this if desired
- Full schema match: `iglu:com.snowplowanalytics.snowplow/webPage/jsonschema/1-0-0`

#### Snowplow Entities to Add/Edit mapping

Using this table you can specify in each row a specific mapping for a particular context entity. In the columns provided you can specify:

- The Entity name to add/edit mapping for (required).¹
- The key you would like to map it to (optional: leaving the mapped key blank keeps the same name).
- Whether to add it in the event data fields or user data fields of the Iterable event (default value is `event data`).
- Whether you wish the mapping to apply to all versions of the entity (default value is `False`).¹

#### Snowplow Entities to Exclude

Using this table (which is only available if `Include Snowplow Entities in event properties` is set to `All`), you can specify the context entities you want to exclude from the Iterable event. In its columns you can specify:

- The Entity name (required).¹
- Whether the exclusion applies to all versions of the entity (default value is `False`).¹

> **Note:** ¹ How to specify the **Entity Name** and its relation to the **Apply to all versions** option:
>
> Entity Names can be specified in 3 ways:
>
> 1. By their Iglu Schema tracking URI (e.g. `iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-0-2`)
> 2. By their enriched name (e.g. `contexts_com_snowplowanalytics_snowplow_client_session_1`)
> 3. By their key in the client event object, which is the GTM SS Snowplow prefix (`x-sp-`) followed by the enriched entity name (e.g. `x-sp-contexts_com_snowplowanalytics_snowplow_client_session_1`)
>
> Depending on the value set for the **Apply to all versions** column, the major version number may be omitted from the 2nd and 3rd naming options above. More specifically, this is only permitted if **Apply to all versions** is set to `True`.

**pre-v0.2.0**

#### Snowplow Event Context Rules

##### Extract entity from Array if single element

Snowplow Entities are always in Arrays, as multiple of the same entity can be attached to an event. This option will pick the single element from the array if the array only contains a single element.

##### Include all Entities in event\_properties

Leaving this option enabled ensures that all Entities on an event will be included within the Event Data of the Iterable event. If disabling this, individual entities can be selected for inclusion. These entities can also be remapped to have different names in the Iterable event, and can be included in either event data or user data. The entity can be specified in two different formats:

- Major version match: `x-sp-contexts_com_snowplowanalytics_snowplow_webPage_1`, where `com_snowplowanalytics_snowplow` is the event vendor, `webPage` is the schema name and `1` is the Major version number. `x-sp-` can also be omitted from this if desired
- Full schema match: `iglu:com.snowplowanalytics.snowplow/webPage/jsonschema/1-0-0`

##### Include unmapped entities in event\_properties

If remapping or moving some entities to User Data with the above customization, you may wish to ensure all unmapped entities are still included in the event. Enabling this option will ensure that all entities are mapped into the Iterable event.

### Additional Event Mapping Options

If you wish to map other properties from a Client event into an Iterable event, they can be specified in this section.

#### Event Property Rules

##### Include common event properties

Enabling this ensures properties from the [Common Event](https://developers.google.com/tag-platform/tag-manager/server-side/common-event-data) are automatically mapped to the Iterable Event Data.

##### Additional Event Property Mapping Rules

Specify the Property Key from the Client Event, and then the key you would like to map it to (or leave the mapped key blank to keep the same name). You can use Key Path notation here (e.g. `x-sp-tp2.p` for a Snowplow event's platform, or `x-sp-contexts.com_snowplowanalytics_snowplow_web_page_1.0.id` for a Snowplow event's page view id, in array index 0), or pick non-Snowplow properties if using an alternative Client.

#### User Property Rules

##### Include common user properties

Enabling this ensures user\_data properties from the [Common Event](https://developers.google.com/tag-platform/tag-manager/server-side/common-event-data) are automatically mapped to the Iterable Event Properties.
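The Key Path notation used by these mapping rules can be sketched as a walk over dot-separated segments, where a numeric segment (such as `0`) indexes into an array. The helper and the pared-down client event shape below are illustrative (the entity key shown uses the underscore-prefixed form), not the tag's actual implementation:

```javascript
// Sketch of Key Path notation lookup: walk dot-separated segments; a numeric
// segment (e.g. "0") indexes into an array. Illustrative only.
function getByKeyPath(obj, path) {
  return path
    .split('.')
    .reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// A pared-down client event shape as the Snowplow Client might populate it
const clientEvent = {
  'x-sp-tp2': { p: 'web' },
  'x-sp-contexts_com_snowplowanalytics_snowplow_web_page_1': [
    { id: 'a86c42e5-b831-45c8-b706-e214c26b4b3d' }
  ]
};

const platform = getByKeyPath(clientEvent, 'x-sp-tp2.p');
// → 'web'
const pageViewId = getByKeyPath(
  clientEvent,
  'x-sp-contexts_com_snowplowanalytics_snowplow_web_page_1.0.id'
);
// → 'a86c42e5-b831-45c8-b706-e214c26b4b3d' (array index 0, then "id")
```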
##### Additional User Property Mapping Rules

Specify the Property Key from the Client Event, and then the key you would like to map it to (or leave the mapped key blank to keep the same name). You can use Key Path notation here (e.g. `x-sp-tp2.p` for a Snowplow event's platform, or `x-sp-contexts.com_snowplowanalytics_snowplow_web_page_1.0.id` for a Snowplow event's page view id, in array index 0), or pick non-Snowplow properties if using an alternative Client.

## Advanced Event Settings

### Merge user dataFields

Enabling this option will merge the user dataFields when updating an Iterable user, instead of replacing them with the new user data (the default behavior).

## Logs Settings

_(Available since v0.2.0)_

Through the Logs Settings you can control the logging behavior of the Iterable Tag. The available options are:

- `Do not log`: This option allows you to completely disable logging. No logs will be generated by the Tag.
- `Log to console during debug and preview`: This option enables logging only in debug and preview containers. This is the **default** option.
- `Always`: This option enables logging regardless of container mode.

> **Note:** Please take into consideration that the logs generated may contain event data.

The logs generated by the Iterable GTM SS Tag are standardized JSON strings. The standard log properties are:

```json
{
  "Name": "Iterable", // the name of the tag
  "Type": "Message", // the type of log (one of "Message", "Request", "Response")
  "TraceId": "xxx", // the "trace-id" header if it exists
  "EventName": "xxx" // the name of the event the tag fired at
}
```

Depending on the type of log, additional properties are logged:

| Type of log | Additional information |
| ----------- | -------------------------------------------------------------- |
| Message | "Message" |
| Request | "RequestMethod", "RequestUrl", "RequestHeaders", "RequestBody" |
| Response | "ResponseStatusCode", "ResponseHeaders", "ResponseBody" |

---

# LaunchDarkly Tag for GTM Server Side

> Forward Snowplow events to LaunchDarkly from GTM Server Side using the LaunchDarkly Tag with metric import REST API for experiment tracking.
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/launchdarkly-tag-for-gtm-ss/

The [LaunchDarkly Tag for GTM SS](https://github.com/snowplow/snowplow-gtm-server-side-launchdarkly-tag) allows events to be forwarded to LaunchDarkly using its [metric import REST API](https://docs.launchdarkly.com/home/creating-experiments/import-metric-events). This Tag works best with events from the Snowplow Client, but can also work with events from other GTM SS Clients such as GAv4.

## Template Installation

> **Note:** The server Docker image must be 2.0.0 or later.

### Tag Manager Gallery

Coming soon!

## LaunchDarkly Tag Setup

With the template installed, you can now add the LaunchDarkly Tag to your GTM SS Container.

1. From the Tag tab, select "New", then select the LaunchDarkly Tag as your Tag Configuration.
2. Select your desired Trigger for the events you wish to use as metrics in LaunchDarkly experiments.
3. [Configure](/docs/destinations/forwarding-events/google-tag-manager-server-side/launchdarkly-tag-for-gtm-ss/launchdarkly-tag-configuration/) the Tag.
4. Click Save.
---

# Configure LaunchDarkly Tag for GTM Server Side

> Configure metric type, authentication, context keys, and event creation time for the LaunchDarkly Tag in GTM Server Side.
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/launchdarkly-tag-for-gtm-ss/launchdarkly-tag-configuration/

This page describes the configuration of the LaunchDarkly Tag.

## Event Name (Required)

This is the value that identifies your metric. When you create your metric in LaunchDarkly, its Event name must exactly match this value.

## Metric Type

![](/assets/images/01-metric-type-35a667b323c01a8c56eb17ec207d2432.png)

This section allows you to select the Metric Type for this Tag. The options are:

1. **Numeric Metric** (default): Choosing this metric type reveals the [Metric Options](#metric-options) configuration section, where you must specify a Metric Value.
2. **Conversion**: Conversion Metrics should just be triggered; the metric name will be sent as a conversion.

## Authentication
A0ggUAAAAA0wgWAAAAAExj8/Y/1NfbVVpWruqaWk+XAqANCwzwV5fwMFmtfp4uBQCANoVgoWuh4mzxBUVGhKt7dFdPlwN85xWc+kzJSW3z9sXlFVU6W3xBcbHRhAsAAG7AUihJpWXliowIV1hosKdLAdDGhYUGKzIiXKVl5Z4uBQCANoVgIam6ppZQAaDZwkKDWTYJAMA/IVgAAAAAMI1gAQAAAMA0ggUAAAAA0wgWAAAAAEwjWAAAAAAwjWABAAAAwDSCBQAAAADTCBYAAAAATCNYAAAAADCNYAEAAADANIIFAAAAANN8PF0AAJi1du1aORwO9/tp06bJx8dHdXV1Wr9+vbvdarVqypQpkqTs7GwVFha6j6Wlpclms317RQMA0MEQLAC0e/X19bLb7Te1u1wu1dXVNXlOQ0NDo2OGYbRafQAAfBcQLAC0S5WVlaqpqZF0LUDc6NKlS7JYLKqvr2/U7nQ6dfHiRUlyn3tdRUWF+1i3bt3k5eXVWqUDANAhESwAtEs5OTnKz89v8tj27dubbLfb7dqyZUuTx3Jzc5WbmytJmjlzpqxWa8sUCgDAdwSbtwF0eMHBwfLx4e8oAAC0JoIFgA5v1KhRioiI8HQZAAB0aAQLAAAAAKaxNgBAh2SxWNyvvby8ZLFY3G1Op/OmDd8AAMAcggWADumBBx5wb8C2Wq0aPXq0+5ayhw4dUlFRkSfLAwCgwyFYAOiQbnww3sSJE3X48GH37WQBAEDLY48FAAAAANMIFgAAAABMYykUgA5vy5Ytcjqdni4DAIAOjWABoF3q3bu3unfv3ipj+/r6tsq4AAB0ZAQLAO1SREQED70DAKANYY8FAAAAANMIFgAAAABMI1gAAAAAMI1gAQAAAMA0ggUAAAAA0wgWAAAAAEwjWAAAAAAwjWAhKTDAX+UVVZ4uA0A7UV5RpcAAf0+XAQBAm0KwkNQlPEyXr5QRLgDcUnlFlS5fKVOX8DBPlwIAQJvCk7clWa1+iouNVmlZuS5fKfN0OQAkFZz6zNMlNCkwwF9xsdGyWv08XQoAAG0KweIfrFY/dY/u6ukyAAAAgHaJpVAAAAAATCNYAAAAADCNYAEAAADANIIFAAAAANMIFgAAAABMI1gAAAAAMI1gAQAAAMA0ggUAAAAA0wgWAAAAAEwjWAAAAAAwjWABAAAAwDSCBQAAAADTCBYAAAAATCNYAAAAADCNYAEAAADANIIFAAAAANMIFgAAAABMI1gAAAAAMI1gAQAAAMA0ggUAAAAA03w8XQDQXPX19TqZX6DLV0rlY7EoOjpKtqREeXuTjwEAADyNYIE2zzCc2rBps7Zs2yG73aHQ0BAZDYaqrl5V57AwTZs6Wel33+XpMgEAAL7TCBZo0xwOh/70f19V0ekz+uGDkzRi+DAFBgZKkq5cKdXW7e/rr397U+fOndf0aVM8XC0AAMB3F8FCktPp1Lx5825qnzFjhgYPHnxbY77yyiuaMGGCEhISTFb37Tp48KASExMVGRl507Ft27apqqpKU6dOlSSVl5fr5Zdf1sSJE3XHHXe0Sj1vvb1an509p9/O+6ViYrpr/bsbtevD3fLz9dPkByfqkZnT1SvZptde/5uio7pp5IjhrVIHAAAAvh7B4gYLFixQaGhoi4z14IMPqlu3bi0y1rfp4MGDCgkJaTJY3Ki+vl5Lly7V3Xff3Wqh4uy5Yu3N2q///dN/UWxsjHbvydKmLds0+YGJqqqq0tJlyxXdrZvuTBuiM5+d1Zp1/6O770qTn59fq9QDAACAr0awuIWLFy9q2bJlGjhwoA4ePKigoCA9/PDDio6O1sqVKxUbG6uRI0dKkpYvX67ExEQNHz5cGRkZmjx5spKSkrRw4UL16dNHubm5+vnPf66wsDBt3rxZhw8fVqdOnTRixAh973vfkyTt27dPp06dktPpVGFhoeLi4jRr1ix16tRJe/bsUUFBgQzD0JkzZ5SQkKDx48drzZo1KisrU0pKimbMmC
Fvb285nU5t2rRJ2dnZCgoK0qRJk9S3b19J0sKFC/X9739fWVlZstvteuCBBzRo0CC98MILKi8v1+rVq3XfffcpPT29ye/E6XTq7bffVnR0tMaNG+duv3r1qt555x2dPn1a0dHRmj59usLCwvT888/rqaeeUo8ePWS327Vw4UL96le/UufOnb/2uz946LCiunXT4EEDJUmnPzurcWNHa9KE8ZKkE5/m6UTeSdlsifrB+Hu1dfv7yj2Rp0F3DDD3QwcAAMA3xu10mqGsrEyhoaFasGCBkpKSlJmZKUnq37+/Tpw4IUkyDEMFBQXq169fk2P4+/tr/vz5Cg8P165du3Tu3DnNnTtXc+bM0Z49e/Tpp5+6+548eVKjRo3Sb37zG9XV1eno0aPuY+fOndOECRM0f/58Xb16VW+//bZmz56tuXPn6ty5c8rLy5Mk7d69W+fPn9f8+fM1ffp0ZWRkqLa21j3O+fPn9cwzz+jBBx/Ue++9J0maN2+eunfvrhkzZnxlqJCk9957T7W1tZo2bVqj9oyMDHXt2lXPPfec+vfvr4yMDFksFneokqSCggJFRkbeMlRIUvH580pOTnK/n/3IQ3rox9eWYeUXFKrk8mX16Z0sSQoKClRUt64qPv/5LccFAABAyyNY3GDRokWaO3eu5s6dqz/96U/u9qCgIKWnp8vX11d9+vRRWVmZJKlPnz76/PPPVVNTo9OnT6tr165fuZRq2LBhCggIkJeXlz766CONGzdOgYGBioyM1IgRI3T48GF33759+yohIUH+/v5KTExUaWmp+5jNZlP37t0VFBSk5ORkpaSkqEuXLgoJCVF8fLy775EjRzRmzBj5+/srPj5eMTExKioqco8zduxYWa1WpaSkqLq6Wna7vVnf0YkTJ5SdnS1JjW7zWl1drYKCAt17773y8fHR8OHDVVxcrLq6ukYBLC8vT6mpqc26lsPRID9f35va807m6w//9d+aMvkB9Uq2udv9/PzkcDTvcwAAAKBlsRTqBs3ZY+Ht7S3DMCRJvr6+Sk5OVl5enoqLizVgQPOW4Fy9erXRX+w7d+6sY8eONdnXy8vLfb2mavnnvk6nU5JUWVmpFStWyMvLS9K1GZWm6rs+xlddo6n+zzzzjFasWKH3339f9957r/t6LpdLv//97xv1r6qqUu/evbV69WqVlZXp5MmTmj17drOuFd45TBcvXbqpffuOnerdK1nj7/1yGZbT6dTly1cUHh7erLEBAADQsggWJg0YMEDHjx/XhQsX9NRTTzXrnNDQUJWXl7s3SH/xxRcKCgpq0bqCg4M1bdo0xcfHt+i4ffr0UZcuXTRz5ky99NJLSk5OVs+ePRUcHCwfHx8tXLiwyQfW9erVSzt37pQkxcTENOta/ful6q9L31RZ2RcKD/8yiI0bO/qm7+vY8RzV1NYqNaWPiU8HAACA28VSKJP69u2rgoICderUqVn7BiRp8ODByszMVG1trb744gtlZWVp0KBBLVrXwIEDtX37dtXU1KimpkYbNmxQRUXFLc/z8/NTaWnpLWcwIiMjNXHiRK1atUq1tbUKCgpSXFyctm7dKsMwVFJSovXr18vlckmS+vXrp+zsbKWkpDT7MwwdMlhduoTrjTdXuGdiJGn9uxu1b/9B9/uqq1e1cvUa3ZU2VJEREc0eHwAAAC2HGYsbLFq0qNH7KVOm3PI5FH5+frLZbOrRo0ezrzNy5EhVV1dr8eLFcrlcGjlypAYOHHg7JX+lMWPGaPPmzVq8eLGcTqfS0tKadSvdO++8U+vXr5dhGBo1atQt+548eVJr167VrFmzNGPGDK1bt07PP/+8/P39NX78ePdSrJSUFHl7ezd7f4UkWSze+um/zNF/vLhY//3ya3ry8UcVEhysBfN+6e5TXHxer/6/JbJ4e2vWw9ObPTYAAABallfJlVKXYRgyDENOw5DhNORwNMhht8swDNXV1yttyO09JA64zuFw6MUXX9T8+fNlsVi+0bmnz3
yml197XVevXtUdAwcoJjpaDYah06fP6JMTn8qWlKh/+9enFRoa0krVAwAA4FYIFmh1DodDH3zwgSoqKtxP7b6dMfZm7dfHxz/RldIyWSzeio6K0p1pQzTojgHumREAAAB4BsECrW7JkiWqqqrSk08+qZAQZhUAAAA6IvZYoNXNmTPH0yUAAACglXFXKAAAAACmESwAAAAAmEawAAAAAGAaeyz+ob7ertKyclXX1Hq6FABtWGCAv7qEh8lq9fN0KQAAtCkEC10LFWeLLygyIlzdo7t6uhzgO6/g1GdKTor3dBlNKq+o0tniC4qLjSZcAABwA5ZCSSotK1dkRLjCQoM9XQqANi4sNFiREeEqLSv3dCkAALQpBAtJ1TW1hAoAzRYWGsyySQAA/gnBAgAAAIBpBAsAAAAAphEsAAAAAJhGsAAAAABgGsECAAAAgGkECwAAAACmESwAAAAAmEawAAAAAGAawQIAAACAaQQLAAAAAKYRLAAAAACY5uPpAgDArLVr18rhcLjfT5s2TT4+Pqqrq9P69evd7VarVVOmTJEkZWdnq7Cw0H0sLS1NNpvt2ysaAIAOhmABoN2rr6+X3W6/qd3lcqmurq7JcxoaGhodMwyj1eoDAOC7gGABoF2qrKxUTU2NpGsB4kaXLl2SxWJRfX19o3an06mLFy9Kkvvc6yoqKtzHunXrJi8vr9YqHQCADolgAaBdysnJUX5+fpPHtm/f3mS73W7Xli1bmjyWm5ur3NxcSdLMmTNltVpbplAAAL4j2LwNoMMLDg6Wjw9/RwEAoDURLAB0eKNGjVJERISnywAAoEMjWAAAAAAwjbUBADoki8Xifu3l5SWLxeJuczqdN234BgAA5hAsAHRIDzzwgHsDttVq1ejRo923lD106JCKioo8WR4AAB0OwQJAh3Tjg/EmTpyow4cPu28nCwAAWh57LAAAAACYRrAAAAAAYBpLoQB0eFu2bJHT6fR0GQAAdGgECwDtUu/evdW9e/dWGdvX17dVxgUAoCMjWABolyIiInjoHQAAbQh7LAAAAACYRrAAAAAAYBrBAgAAAIBpBAsAVmTd6QAAATNJREFUAAAAphEsAAAAAJhGsAAAAABgGsECAAAAgGkEC0mBAf4qr6jydBkA2onyiioFBvh7ugwAANoUgoWkLuFhunyljHAB4JbKK6p0+UqZuoSHeboUAADaFJ68Lclq9VNcbLRKy8p1+UqZp8sBIKng1GeeLqFJgQH+iouNltXq5+lSAABoUwgW/2C1+ql7dFdPlwEAAAC0SyyFAgAAAGAawQIAAACAaQQLAAAAAKYRLAAAAACYRrAAAAAAYBrBAgAAAIBpBAsAAAAAphEsAAAAAJhGsAAAAABgGsECAAAAgGkECwAAAACmESwAAAAAmEawAAAAAGAawQIAAACAaQQLAAAAAKYRLAAAAACYRrAAAAAAYBrBAgAAAIBpBAsAAAAAphEsAAAAAJhGsAAAAABg2v8Heg6qijPgOdIAAAAASUVORK5CYII=) ### Project Key (Required) In this text box you need to provide the key for the **project** your metric events pertain to. You can find it under Environments on the Projects tab on your LaunchDarkly Account settings page. ### Environment Key (Required) In this text box you need to provide the key for the **environment** your metric events pertain to. You can find it under Environments on the Projects tab on your LaunchDarkly Account settings page. 
## Authorization

### Access Token (Required)

In this text box you need to provide the access token to be used for authorizing the requests to the LaunchDarkly API. This can be either a personal or a service token. The access token must have a role that allows the `importEventData` environment action. It is strongly recommended to use a dedicated access token with this permission.
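As a quick way to confirm the token outside the Tag, you can call the LaunchDarkly REST API yourself. Note that LaunchDarkly expects the raw token value in the `Authorization` header, rather than a `Bearer` scheme. The helper below is an illustrative sketch, not part of the Tag:

```javascript
// Illustrative only: build request headers for a LaunchDarkly REST API call.
// LaunchDarkly expects the raw access token in the Authorization header,
// without a "Bearer" prefix.
function ldHeaders(accessToken) {
  return {
    Authorization: accessToken,
    "Content-Type": "application/json",
  };
}

// Example (requires network access): listing projects also shows the
// project keys referenced above.
// fetch("https://app.launchdarkly.com/api/v2/projects", { headers: ldHeaders(token) });
```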
## Metric Options

This configuration section is available only if your [Metric Type](#metric-type) is set to **Numeric Metric**. In that case, this configuration section is required.
### Event property for Metric Value

In this section you can specify the event property whose value will populate the LaunchDarkly `metricValue` object.

> **Warning:** Since the metric value needs to be a number (e.g. `10.0`), the Tag **will fail** if you specify an event property whose value is not a number.

To specify the event property of interest, you can use Key Path notation (e.g. `x-sp-contexts_com_acme_transaction_1.0.total` to select the `total` value from the entity at array index 0) as your metric value.

## Context Keys

### User Options

This section allows you to configure how to populate the value of the user property that uniquely identifies the context that the LaunchDarkly metric is about.

#### User Value

With this drop-down option you can select how to derive the value for the `user` in your `contextKeys`. The available options are:
1. **Common User ID** (default): Using this option the Tag will populate `user` from the `user_id` property of the common event.
2. **Custom**: This option allows you to specify an alternative property of the event to be used. Selecting this option reveals a text box in which to specify which event property to use.
3. **Do not populate**: Selecting this option will not populate `user` as a context key.

#### Event property for user context key

![](/assets/images/05-user-value-custom-ffce8f0a70afdfd5304fbbf846857140.png)

This text box is revealed if you have previously selected **Custom** as the option for the [User Value](#user-value). In it you can specify the Property Key from the GTM event to use. You can use Key Path notation here if you want to denote a nested key, e.g. `x-sp-contexts_com_snowplowanalytics_snowplow_client_session_1.0.userId` to use the `userId` from the Snowplow client session entity (at array index 0).

### Other Context Keys

#### Context Keys to Add

Using this table you can specify context keys depending on your experiment's [randomization units](https://docs.launchdarkly.com/home/creating-experiments/allocation#randomization-units).
## Advanced Event Settings

### LaunchDarkly event creation time

In this section you can specify how to derive the creation time of the event, which populates the `creationDate` in the LaunchDarkly payload. The available options are:

1. **Set to current time** (default): sets the event time to the current timestamp.
2. **Set from event**: sets the event time from a client event property.

#### Event property name

![](/assets/images/07-advanced-time-from-event-32eeb82cf4abf42f8829f122c934cf6d.png)

This text box is revealed if the [LaunchDarkly event creation time](#launchdarkly-event-creation-time) is set to "Set from event".
Here you can specify the event property to use in order to set the event time (in Unix milliseconds). For example, in the image above, the LaunchDarkly creation date will be set from the device created timestamp (`dvce_created_tstamp`) of the Snowplow event (prefixed with `x-sp-` in the client common event).

## Versioning

![](data:image/png;base64,
4x0D4tWfYHOod3GFwWcFtlOSN++TrFxERGYFW2FMl2UHT80EcCyopyzMHtnU30bDdRvGSAL6JAFFanlzFkzvBd3sVy5bdzx1zoO2p1dTuin3Y6GfN2rmGh1ZsIJQ8u+OzpxdQPM+L4zz/Jdh9PnKTIdqGX6cVom1PDJcvP4VhHcDEPaeIor9xnfMLIbu3nMob7bQ+v4WO5ODYJXdTfkUHDS8Gsf7aAPKJSvTFiMVO/CRO7kgm6IsN2dfXx4m9Cat3yGN66UskRhxbRERkNCiwp0q0k85jDtxu+webrO6DHMx04RpcgbXaG2nYZVJyTyWlPjfOS934bqqk3Bdjx0stjEbXjJHjp3SeG/N8n8iej286hN4MckpzSaiN3TEXPl9q4zqAw1dKsdf+1w88C3a3m+yjXRz8oHgT9+UO4t2dRFJyBjkXfT1hwgfDhLvfpfeUrB3jyMGBfQfeeY++obsSvRzpDhM+eIB3e08f78DBMOGDR4j1n//6RUREPoxaYlIlObjOmjFsm2F8MMmdu4JELi3AnzP0gQbeolLmtprEADsQ3dXA+vpthMIxbFM9FP1dBWWzBoJnx9M/4heHCgg4gmx+vYMemxPvgsUsmT+N1l89RM2ugVFXP9iKv/JhFvsMIrsaqK3fRigchYlu5i6oIFDgxAAiW1ax8nUP1d8swxXbxuoVTThv9dPzahPBcBzzyiIq7i7nRO6NBhtY/1wzoXAcMyefkkCA4itNiG1j9YoGjDluIruDWHPv58Evuodcp518Xy51m1oJxvz4TQCL0Bu7iV5aRP6lA0dZh1qo+20DLR0RsLvJ/6DWMI3//hgtk+aSfWAH+yYsZPnXizH3N1H7n1to7Y6C3UX+jYuomO/CwKLlV9+lzl7Fw3d6Bmr/WPM6uEKfARDHGrqcPtaApEX84/3FSAq9++Zm6oMjvEuVCPOnFzfxp5EeFOvgj/UdI+xIEG7dTH0rMPZy5lf8LVef91e0IiIiZ6YV9lRJxLGwYQyZ0XjcggywAWAROdyD4XDgGPZQI7eYRV/04wSs/XX84okWjILFLPtfy7i7ALY98Qvq9p9MitbeZnaOL2Hx1x+gstCkfdMmth018H/5Bzx6pxdjgp+qRx+mwmdAVwNrnmjBmFfJ8m8/yP3FdnZvrGFL+AzXkQyzbWsYzxeXUv3VAJ6jTax/MTRw3r80sKamlewbqli2vJpyd5jGdbUEj594bJSOcDal91RTdaPrtKEH2mLaad0zGKysfbTtiZ5sh4kFqX28ls6ccpZ+czlVNznY9/Ra6vd/cOWE34rgvvV+qu8qwGGF2LSunvD0O6he/iD3L3DSsWkNde2nN6l8/Hkd3Jlhw4BT2l9sBpAc/JELxFjMiZOwqz1dREQuIArsKWERfiPIwQku3CfSeDJC6+5ObC43zhOznAQM24eME6P1pe1EZ5VRcYMH52QnnpLFlHujbH/5ZK+0cXkJi27x4b7UhfemItwcpCsMGAa2TAbCZebgyv7kAhZ/fSlL5rlxTHTgnlfIzMyDdHadofM6w87c8gqKrnLhyi2keLaD2IEuolgEm5rpmR0gUODGOdWF/9Yy8q0gLScCcoad/LIA/lwXTvsIb97Y85kzPU57WzsxwHprJ7uPufDNHmiHie5sYoetiMCtPlxTnbivD1DqidD2pxOroAbOebdS5nPjmmrC8QiRYyauGV5cUx24rw+weFEpntNWQ89xXgGmuHGZYYJvhD843n6Fm+wjwWF3BZK0lmEnb/4CZk8dO9qViIiInDW1xKRAx7MrWfWHGL4llXgMgDCNP11JQ9hN+TL/B/3hNgOwPqSBIhmhMxzHef3QnnITt9tJ/PVOIgzeaSZz/Mn9Y20YGRbxMw1rZmPbU8/a37TRcaiHOGD1WXjPWIQNY/zJPwubzYAkxInQ2R0jGl7Dd18/ebRlWXiOxk8+9sNej2Bn5uxcap/bQftxL+PbdhPNKSJ/sH39YNdBrHCIVf97y8nxkxaGL4o
1eMW2jCEnmJDPTfO2sm7tD+n0zsQ7w0fBnCLcmXDKWngq5jXTS/ntXh57YiWPvfcAy29xwZULWXTtSlb/dAXR//4oi848qSIiIiIfmwJ7CrhuvJvFsRrWb26mc3YZrgwnRXdVEnvqSZr/EKLoDg8GBvaJ2VhvRYjAKXdEsfZvoz6YRdGCyYPtMyOn3o/TK23tq2PNhg68d1WxeJYTkxAbvr+Gno86UDIOSQPXDVUsvv7UW1Da7OZZt4XYZ80h99lNtO4OYe6K4ZrnP2UujOnlVN/pO/VDsJnZGCN+tNPE88UHeHBeiOCbQVpfXc/KF10E7q+icNitFs95XpOdNL/Yinn9YhYXTxvYdqiZhp3gv6OKMs/ZDCKjZaztxIr6WMZmGDB2LGNP/DkkEiTU1iQiImlMLTEpYNhd+Iv9TDsUon0wV5pOHyUFufT8OUjnYBhweT3Yu9to6R76aItQcwPNoQhkTMPptBF+u2PInVRidHSEsTldTPsYtUVCHfQ4fRT7nJgZQPx94h+ngyPDgXMqRN6JYp/qwDH4k23Pxp75Ecaxz2SOO06woZbWY9PwzT7Z0e9wOuBQmKj95PiOSdlkj9ReA9DdQsPzLUQdHvwl5VR+/X5KzBBNO4Y16KdiXsNBgoemUVjixzVhoJ7o3iCdWT6KCtycqURJA2MvY+6tFdy16C7uWrSQa6aa5BWf+L2Ckun6RKmIiKQ3xYxUybRhECM2JAybmQZYJ+8iYlx1E2VX/4TaJ2owy4vxTrII79pC3Rsw98sFODDILvbTsLqO9a+MZ+GsbHra6qkL2ilYenZf/mOMM7HFDtL+VphpOU7sUydj625h86tuCqa9T+ilOlpiFt7jcT7a02+SP7+QhsdrWfeCjVvn5kBXM7XPduH/2lKKxp/tOHZmzvZQ+9sgscvL8A9ZCXfMKcL70npqf+0kcLMPZ1+Izf+5md4bH6ByzghDZcboeKWBEDYCBTnYDrXQEbXhmDw8gBnkn+O8YsWIYZzS8hPvi4Pt1A8aSxpKHGB77VNsB8iYhG/hAibu3MgrXbrXuoiIXBgU2FNl6AdLh0oOXc52ULi4Gp6rZfOvV1EXA/ulHvx3VbNw1kDINHIDLF1Sx4b6dTz2bByb00PRkirKcs/yqcorovjytTQ+vorIkodZPOcOKt6uoe7Z1Wwb68BbWETxjG2E+6LwEe++blxVztIlBhueX8+qzXFsk3Px3xKgcDLwEb73yT7Lh+fpILFr/KfeMcfuZ8l9cWprN1PzWB1x04mnMEBgtgkj3aV+chGLvxxl43MbWbUlCqYTT+FiKq61w7CvMzrneU0C2E59TyphQYZxhkYbERERkdQY8+7hI/0A/f39JJNJEokEiUSC/mQCK5HAshLE+/qwEgn6+vq41j/SUufFY2+og6s87r9+4HBWkJrv1xC/42EqZw+GwP11/OjnXRR/eylFqfn+Hhklse1rWFGfTdV3FuHJALAIPrWCtX0BHq70n/8vnvoUO5t/kwe2PjXyfdiH+6gr7Gd5H/aP/f+GiIjIWdCb+alieCie56D9hVpawoOru1cWUTy9k6aNzXR+hBVoSS9WVzPrX+zANX/eB2E98kYtdbvtFBXnK6xfSJJH2P27jfyxW+0wIiJy4VBLTMoYuG5eytKpTQQPxcBpBxwU3VtN9ksthI+BS8nughQ7FGPaLdUsmXPifjZxIofHU/yVagov1z+hdDDpqs9z02XnIYSPGYdDX7IkIiKjTGkjpUxcc8s45Ts+DSe+BWWjVZCkgH12Kac+gyaekvJRqkZGYk69HPfUv36ciIjIhUgtMSIiIiIiaUyBXUREREQkjSmwi4iIiIikMQV2EREREZE0psAuIiIiIpLGFNhFRERERNKYAruIiIiISBpTYB9mvJk52iWIyAUkmUwyZsyY0S5DREQuYgrsw4zNyCByuGe0yxCRC0Q0eowJE/Q1xiIicv4osA9jvySLWCzG4SMK7SJyZslkkqNHo4TfiTB
58sTRLkdERC5ixmgXkG7s9kvo74f3jvXy3rFe3o8dH+2SRD7VsrLGszfUMdplnGbMmDFMmGByxRWXMT5z3GiXIyIiFzEF9hFkZ19CdvYlo12GiIiIiIhaYkRERERE0pkCu4iIiIhIGlNgFxERERFJYwrsIiIiIiJpTIFdRERERCSNKbCLiIiIiKQxBXYRERERkTSmwC4iIiIiksYU2EVERERE0pgCu4iIiIhIGlNgFxERERFJYwrsIiIiIiJpTIFdRERERCSNKbCLiIiIiKQxBXYRERERkTSmwC4iIiIiksb+P++HVVRtwz/DAAAAAElFTkSuQmCC) The LaunchDarkly event import REST API accepts a `User-Agent` header, which helps identify the source of traffic and debug issues. One of the components to construct this header is the `Version`, which can be any format. ### Version This configuration section allows you to define the version to be used. For example, you could use the `Container Version` as the version to use. If this is not provided, the Tag will use the string `"1"` instead. As an example using the default version, the User-Agent header could be like: ```text 'User-Agent: MetricImport-Snowplow-int/1' ``` ## Logs Settings ![](/assets/images/09-logs-settings-9d0b55515c253bf74027121c7524c6a0.png) Through the Logs Settings you can control the logging behavior of the LaunchDarkly Tag. The available options are: - `Do not log`: This option allows you to completely disable logging. No logs will be generated by the Tag. - `Log to console during debug and preview`: This option enables logging only in debug and preview containers. This is the **default** option. - `Always`: This option enables logging regardless of container mode. > **Note:** Please take into consideration that the logs generated may contain event data. The logs generated by the LaunchDarkly GTM SS Tag are standardized JSON strings. 
The standard log properties are: ```json { "Name": "LaunchDarkly Metric Events", // the name of the tag "Type": "Message", // the type of log (one of "Message", "Request", "Response") "TraceId": "xxx", // the "trace-id" header if it exists "EventName": "xxx" // the name of the event the tag fired at } ``` Depending on the type of log, additional properties are logged: | Type of log | Additional information | | ----------- | -------------------------------------------------------------- | | Message | "Message" | | Request | "RequestMethod", "RequestUrl", "RequestHeaders", "RequestBody" | | Response | "ResponseStatusCode", "ResponseHeaders", "ResponseBody" | --- # Snowplow Client for GTM Server Side > Receive Snowplow events in GTM Server Side containers with the Snowplow Client, which populates common event data and rich Snowplow properties for tags. > Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/snowplow-client-for-gtm-ss/ To receive events in your GTM SS container, the Snowplow Client must be installed. This works both for events sent directly from the tracker and for enriched events from the pipeline. The Snowplow Client populates the common event data so many GTM SS tags will just work. However, it also populates a set of additional properties to ensure the rich Snowplow event data is available to Tags which wish to take advantage of this, such as the Snowplow Authored Tags. ## Template Installation There are two methods to install the Snowplow Client. ### Tag Manager Gallery **Coming Soon.** The Gallery for Clients has not yet been made public. ### Manual Installation 1. Download [template.tpl](https://raw.githubusercontent.com/snowplow/snowplow-gtm-server-side-client/main/template.tpl) - Ctrl+S (Win) or Cmd+S (Mac) to save the file, or right click the link on this page and select "Save Link As..." 2. Create a new Client in the Templates section of a Google Tag Manager Server container 3.
Click the More Actions menu, in the top right hand corner, and select Import 4. Import `template.tpl` downloaded in Step 1 5. Click Save ![Installing Snowplow Client](/assets/images/manualclientinstall-93c4a4f45c4decdd25c20faf1d519013.gif) ## Snowplow Client Setup With the template installed, you can now add the Snowplow Client to your GTM SS Container. 1. From the Clients tab, select "New", then select the Snowplow Client as your Client Configuration 2. Click Save ![Adding Snowplow Client to GTM SS](/assets/images/clientsetup-a7275d01ad3c7e318d302a322af2c3f5.gif) ## Testing You can test your Snowplow Client setup by using GTM SS [preview mode](https://developers.google.com/tag-platform/tag-manager/server-side/debug). Once your debug container is running, obtain your preview header value: ![Opening the “send requests manually” popup](/assets/images/preview-mode-1-965b75ae77d43b5fa941eaed2e378f5e.png) ![The “send requests manually” popup](/assets/images/preview-mode-2-8913f55547ae5b9dbf3debea2c2ac704.png) Now you can use the cURL command below. Note: - Replace `{{your-gtm-ss-url}}` with the URL of your GTM SS container. - Replace `{{your-preview-header}}` with the value obtained above. 
```bash curl --request POST \ --url https://{{your-gtm-ss-url}}/com.snowplowanalytics.snowplow/enriched \ --header 'Content-Type: application/json' \ --header 'x-gtm-server-preview: {{your-preview-header}}' \ --data '{ "app_id": "example-website", "platform": "web", "etl_tstamp": "2021-11-26T00:01:25.292Z", "collector_tstamp": "2021-11-20T00:02:05Z", "dvce_created_tstamp": "2021-11-20T00:03:57.885Z", "event": "unstruct", "event_id": "c6ef3124-b53a-4b13-a233-0088f79dcbcb", "txn_id": null, "name_tracker": "sp1", "v_tracker": "js-3.1.6", "v_collector": "ssc-2.3.0-stdout$", "v_etl": "snowplow-micro-1.1.2-common-2.0.1", "user_id": "jon.doe@email.com", "user_ipaddress": "92.231.54.234", "user_fingerprint": null, "domain_userid": "de81d764-990c-4fdc-a37e-adf526909ea6", "domain_sessionidx": 3, "network_userid": "ecdff4d0-9175-40ac-a8bb-325c49733607", "geo_country": "US", "geo_region": "CA", "geo_city": "San Francisco", "geo_zipcode": "94109", "geo_latitude": 37.443604, "geo_longitude": -122.4124, "geo_location": "37.443604,-122.4124", "geo_region_name": "San Francisco", "ip_isp": "AT&T", "ip_organization": "AT&T", "ip_domain": "att.com", "ip_netspeed": "Cable/DSL", "page_url": "https://snowplowanalytics.com/use-cases/", "page_title": "Snowplow Analytics", "page_referrer": null, "page_urlscheme": "https", "page_urlhost": "snowplowanalytics.com", "page_urlport": 443, "page_urlpath": "/use-cases/", "page_urlquery": "", "page_urlfragment": "", "refr_urlscheme": null, "refr_urlhost": null, "refr_urlport": null, "refr_urlpath": null, "refr_urlquery": null, "refr_urlfragment": null, "refr_medium": null, "refr_source": null, "refr_term": null, "mkt_medium": null, "mkt_source": null, "mkt_term": null, "mkt_content": null, "mkt_campaign": null, "contexts_org_w3_performance_timing_1": [ { "navigationStart": 1415358089861, "unloadEventStart": 1415358090270, "unloadEventEnd": 1415358090287, "redirectStart": 0, "redirectEnd": 0, "fetchStart": 1415358089870, "domainLookupStart": 
1415358090102, "domainLookupEnd": 1415358090102, "connectStart": 1415358090103, "connectEnd": 1415358090183, "requestStart": 1415358090183, "responseStart": 1415358090265, "responseEnd": 1415358090265, "domLoading": 1415358090270, "domInteractive": 1415358090886, "domContentLoadedEventStart": 1415358090968, "domContentLoadedEventEnd": 1415358091309, "domComplete": 0, "loadEventStart": 0, "loadEventEnd": 0 } ], "se_category": null, "se_action": null, "se_label": null, "se_property": null, "se_value": null, "unstruct_event_com_snowplowanalytics_snowplow_link_click_1": { "targetUrl": "http://www.example.com", "elementClasses": [ "foreground" ], "elementId": "exampleLink" }, "tr_orderid": null, "tr_affiliation": null, "tr_total": null, "tr_tax": null, "tr_shipping": null, "tr_city": null, "tr_state": null, "tr_country": null, "ti_orderid": null, "ti_sku": null, "ti_name": null, "ti_category": null, "ti_price": null, "ti_quantity": null, "pp_xoffset_min": null, "pp_xoffset_max": null, "pp_yoffset_min": null, "pp_yoffset_max": null, "useragent": null, "br_name": null, "br_family": null, "br_version": null, "br_type": null, "br_renderengine": null, "br_lang": null, "br_features_pdf": true, "br_features_flash": false, "br_features_java": null, "br_features_director": null, "br_features_quicktime": null, "br_features_realplayer": null, "br_features_windowsmedia": null, "br_features_gears": null, "br_features_silverlight": null, "br_cookies": null, "br_colordepth": null, "br_viewwidth": null, "br_viewheight": null, "os_name": null, "os_family": null, "os_manufacturer": null, "os_timezone": null, "dvce_type": null, "dvce_ismobile": null, "dvce_screenwidth": null, "dvce_screenheight": null, "doc_charset": null, "doc_width": null, "doc_height": null, "tr_currency": null, "tr_total_base": null, "tr_tax_base": null, "tr_shipping_base": null, "ti_currency": null, "ti_price_base": null, "base_currency": null, "geo_timezone": null, "mkt_clickid": null, "mkt_network": null, 
"etl_tags": null, "dvce_sent_tstamp": null, "refr_domain_userid": null, "refr_dvce_tstamp": null, "contexts_com_snowplowanalytics_snowplow_ua_parser_context_1": [ { "useragentFamily": "IE", "useragentMajor": "7", "useragentMinor": "0", "useragentPatch": null, "useragentVersion": "IE 7.0", "osFamily": "Windows XP", "osMajor": null, "osMinor": null, "osPatch": null, "osPatchMinor": null, "osVersion": "Windows XP", "deviceFamily": "Other" } ], "domain_sessionid": "2b15e5c8-d3b1-11e4-b9d6-1681e6b88ec1", "derived_tstamp": "2021-11-20T00:03:57.886Z", "event_vendor": "com.snowplowanalytics.snowplow", "event_name": "link_click", "event_format": "jsonschema", "event_version": "1-0-0", "event_fingerprint": "e3dbfa9cca0412c3d4052863cefb547f", "true_tstamp": "2021-11-20T00:03:57.886Z" }' ``` --- # Configure Snowplow Client for GTM Server Side > Configure IP forwarding, sp.js hosting, custom paths, common event mapping, and entity merging for the Snowplow Client in GTM Server Side. > Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/snowplow-client-for-gtm-ss/snowplow-client-configuration/ > **Tip:** The [GTM Common Event](https://developers.google.com/tag-platform/tag-manager/server-side/common-event-data) has a `user_data` property. To populate this, you can attach a context Entity to your events of this schema: `iglu:com.google.tag-manager.server-side/user_data/jsonschema/1-0-0`, which can be found on [Iglu Central](https://github.com/snowplow/iglu-central/blob/853357452300b172ebc113d1d75d1997f595142a/schemas/com.google.tag-manager.server-side/user_data/jsonschema/1-0-0). ## Forward User IP address As the container sits between the website user and the Snowplow collector (or other downstream destinations), the users IP will be unknown to the destination. By enabling this option, the users IP address will be included in the events sent to Tags. 
By disabling this, you are able to use GTM SS as a proxy which can strip user IP addresses from requests. Many tags also offer this functionality at the tag level. ## Populate GAv4 Client Properties Enabled by default, this option will populate additional properties which the GAv4 Tag requires. This is useful if you want to forward your Snowplow events to the GAv4 Tag. ## sp.js settings This setting allows your GTM SS Container to serve your `sp.js` JavaScript Tracker file. This allows you to have first party hosting of your tracker without the need to set up separate hosting or use a third party CDN. It is recommended to rename `sp.js` if enabling this setting, as many adblockers will block requests to files named `sp.js`. A random string is the best option here. ![sp.js settings](/assets/images/spjssettings-905ab9716df666e83d799fa929e3bc40.png) You can request _any_ version of the Snowplow JavaScript Tracker with this setting enabled, e.g. `https://{{gtm-ss-url}}/3.1.6/776b5b25.js` will load v3.1.6, or `https://{{gtm-ss-url}}/2.18.2/776b5b25.js` will load v2.18.2. ## Additional Options ### Custom POST Path As many ad blockers will block the default `/com.snowplowanalytics.snowplow/tp2` POST path, it is recommended to change this and then update your tracker's initialization to use this custom POST path. ### Claim GET Requests The default Snowplow path for GET requests is `/i`. As this path is so short, there is a chance it could conflict with other Clients within your GTM SS Container. If you'd only like your Snowplow Client to listen for POST requests, you can disable this GET endpoint with this setting. ### Include Original `tp2` Event If using this Client to receive Snowplow Tracker events and then forward them to a Snowplow Collector with the Snowplow Tag, you should leave this option enabled, as it will allow the Snowplow Tag to forward the original tracker event with no extra processing.
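The Custom POST Path setting above requires a matching change on the tracker side. A minimal sketch of how the endpoint is composed, assuming the `postPath` option of the Snowplow JavaScript Tracker (the URLs and custom path below are illustrative, not prescribed values):

```javascript
// Sketch: the Client's POST path and the tracker's postPath must match.
// The collector URL and custom path here are illustrative examples.
function collectorEndpoint(collectorUrl, postPath) {
  // The JavaScript Tracker defaults to this path when postPath is not set
  const path = postPath || '/com.snowplowanalytics.snowplow/tp2';
  return collectorUrl.replace(/\/+$/, '') + path;
}

// Default path (frequently blocked by ad blockers):
collectorEndpoint('https://gtm-ss.example.com');
// → 'https://gtm-ss.example.com/com.snowplowanalytics.snowplow/tp2'

// Custom path, mirrored in the tracker initialization, e.g.:
// snowplow('newTracker', 'sp', 'gtm-ss.example.com', { postPath: '/custom-path' });
collectorEndpoint('https://gtm-ss.example.com', '/custom-path');
// → 'https://gtm-ss.example.com/custom-path'
```

If the two paths diverge, the Snowplow Client will not claim the request and the event will be lost, so change both together.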
### Include Original Self Describing Event By default, the self-describing event will be "shredded" into a key using the schema name as the key. This is a "lossy" transformation, as the Minor and Patch parts of the jsonschema version will be dropped. This flag populates the original, lossless, Self Describing Event as `x-sp-self_describing_event`. Let's assume we have a self-describing event following the schema `iglu:com.acme/foobar/jsonschema/1-0-0`. By default, the option to _Include Original Self Describing Event_ is disabled. So the Snowplow client by default will include it in the common event as: ```json "x-sp-self_describing_event_com_acme_foobar_1": { "foo": "bar" } ``` In case the option to _Include Original Self Describing Event_ is enabled, then the Snowplow client, if it finds the original event (see note below), will also include it in the common event, resulting in: ```json "x-sp-self_describing_event_com_acme_foobar_1": { "foo": "bar" }, "x-sp-self_describing_event": { "schema": "iglu:com.acme/foobar/jsonschema/1-0-0", "data": { "foo": "bar" } } ``` > **Note:** This option only makes sense when using GTM in a [**Server Side Tag Manager (pre-pipeline)**](/docs/destinations/forwarding-events/google-tag-manager-server-side/#configuration-options) architecture, because it only makes a difference when the input is a _raw_ Snowplow event. > > In a [**Destinations Hub (post-pipeline)**](/docs/destinations/forwarding-events/google-tag-manager-server-side/#configuration-options) architecture, this option **does not apply**. Effectively, it’s always disabled, regardless of the setting. In the example above, this would mean that the data will contain `x-sp-self_describing_event_com_acme_foobar_1`, but not `x-sp-self_describing_event`.
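The lossy key naming described above can be sketched as follows. This is an illustration inferred from the documented `iglu:com.acme/foobar/jsonschema/1-0-0` example, not the Client's actual implementation:

```javascript
// Sketch: derive the shredded common-event key from an Iglu URI.
// Only the major schema version survives — minor and patch are dropped.
function shreddedKey(igluUri) {
  // e.g. 'iglu:com.acme/foobar/jsonschema/1-0-0'
  const [vendor, name, , version] = igluUri.replace('iglu:', '').split('/');
  const major = version.split('-')[0]; // '1-0-0' → '1' (the lossy step)
  return ['x-sp-self_describing_event', vendor.replace(/\./g, '_'), name, major].join('_');
}

shreddedKey('iglu:com.acme/foobar/jsonschema/1-0-0');
// → 'x-sp-self_describing_event_com_acme_foobar_1'
shreddedKey('iglu:com.acme/foobar/jsonschema/1-2-3'); // same key: version detail is lost
```

Because `1-0-0` and `1-2-3` both map to the same key, enabling _Include Original Self Describing Event_ is the only way to recover the full schema version downstream.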
### Include Original Contexts Array By default, the contexts will be "shredded" into separate keys using the context name as the key. This is a "lossy" transformation, as the Minor and Patch parts of the jsonschema version will be dropped. If you would like to keep the original "lossless" contexts array (as `x-sp-contexts`), enable this option. ## Advanced common event options ![advanced common event options](/assets/images/advanced_common_options-87a6c18646fd2ad02e1d8ca7d6e8a49d.png) ### `client_id` #### Use default settings for client\_id mapping in common event By default the Snowplow Client sets the `client_id` as follows: if the Snowplow event has the `client_session` context entity attached, its `userId` property is used. Otherwise, the `domain_userid` atomic property is used. Disabling this option reveals the following table that allows you to override the default behavior. #### Specify client\_id You can use this table to specify the rules to set the `client_id` of the common event. For consistency downstream it is suggested to specify properties that apply to all Snowplow events (atomic or through global context entities). The columns of this table are: - **Priority**: Using this column you can set the priority (higher values mean higher priority) with which the Client will look into the Snowplow event to locate the value to set the `client_id`. - **Property name or path**: This column refers to the common event, so you can define alternative Snowplow properties using the `x-sp-` prefix before the enriched property name or nested path (using dot notation). Example values: `x-sp-network_userid` or `x-sp-contexts_com_acme_user_1.0.anonymous_identifier`. ### `user_id` #### Use default settings for user\_id mapping in common event By default the Snowplow Client sets the `user_id` from the corresponding `user_id` property of the Snowplow event. Disabling this option reveals the following table that allows you to override the default behavior.
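Both the `client_id` and `user_id` tables resolve a value the same way: rules are tried from highest to lowest priority, and each property name or path is looked up in the common event using dot notation. A minimal sketch of that lookup, assuming a hypothetical `x-sp-contexts_com_acme_user_1` entity (this is illustrative, not the Client's actual code):

```javascript
// Sketch: prioritized dot-notation lookup over the common event.
function resolveByPriority(commonEvent, rules) {
  const ordered = [...rules].sort((a, b) => b.priority - a.priority);
  for (const rule of ordered) {
    // 'a.0.b' walks objects and array indices alike
    const value = rule.path
      .split('.')
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), commonEvent);
    if (value != null) return value;
  }
  return undefined; // no rule matched; fall back to default behavior
}

const event = {
  'x-sp-network_userid': 'f4e42332-7bf5-4e5d-b2c0-1e171b982a4f',
  'x-sp-contexts_com_acme_user_1': [{ anonymous_identifier: 'anon-42' }],
};
resolveByPriority(event, [
  { priority: 2, path: 'x-sp-contexts_com_acme_user_1.0.anonymous_identifier' },
  { priority: 1, path: 'x-sp-network_userid' },
]);
// → 'anon-42'
```

If the higher-priority entity is absent from an event, the lookup falls through to the next rule, which is why properties that exist on all events make the most consistent identifiers.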
#### Specify user\_id You can use this table to specify the rules to set the `user_id` of the common event, which will override the default Snowplow Client behavior. For consistency downstream it is suggested to specify properties that apply to all Snowplow events (atomic or through global context entities). The columns of this table are: - **Priority**: Using this column you can set the priority (higher values mean higher priority) with which the Client will look into the Snowplow event to locate the value to set the `user_id`. - **Property name or path**: This column refers to the common event, so you can define alternative Snowplow properties using the `x-sp-` prefix before the enriched property name or nested path (using dot notation). For example: `x-sp-contexts_com_acme_user_entity_1.0.email`. ### Snowplow Entities Mapping > **Note:** This is an advanced feature and we recommend that you first explore the configuration options of your Tag(s) in case they are enough to cover your use case. #### Merge selected Snowplow entities Enable this option to allow merging of Snowplow context data to the Common Event. Checking this box reveals the following table: ![](/assets/images/context_merging_overview-583c306bceab0a6601d86760ed9ebb73.png) #### Context entities to merge Using this table you can specify the rules to merge Snowplow context entity data to the Common Event. The columns of this table are: ![](/assets/images/context_merging_new_row-95c55b188e68a938e0f393b772b1c3ab.png) 1. **Schema**: (Required) The schema of the context entity to merge. > **Info:** The **Schema** can be specified in 3 ways: > > - Iglu URI (e.g. `iglu:com.acme/test/jsonschema/1-0-0` ) > - Enriched name (e.g. `contexts_com_acme_test_1`) > - Common Event name (e.g. `x-sp-contexts_com_acme_test_1`) 2. **Apply to all versions**: (False/True) Whether the rule applies to all versions of the context entity schema. Default is False. 
> **Info:** When you set **Apply to all versions** to `True`, it will apply the same rule to all versions of the schema, independently of the one mentioned in the **Schema** field. 3. **Prefix**: (Optional) Specify a prefix to use for property names when merging. 4. **Merge to**: (Event Properties/Custom) Specify where to merge the context entity's properties. Default is Event Properties, i.e. to the root level of the common event. 5. **Custom path**: (Optional) The path for custom merging. > **Info:** The **Custom path** option applies only if the **Merge to** column is set to `Custom`, else the row is considered invalid. 6. **Keep original mapping**: (True/False) Whether to keep the original context mapping. Default is True. 7. **Custom transformation**: (Optional) Specify a Variable returning **a function** that represents a custom transformation of the context data to the desired object before merging. The default behaviour is the following: it first checks whether the context array contains a single entity object and, if so, it merges this single object. If the specified context contains multiple entity objects, the rule is not applied (in those cases you will need to provide a custom transformation function through a Variable). > **Info:** The function signature must be: `(contextArray, event) => Object` > > The Client guarantees that it will call this function providing as arguments the original context (Array) specified and the Common Event (Object) it has constructed. The event argument is provided in order to enable merging logic to optionally be based on event properties. Please note that it is not possible to modify the event inside the function you will define. The Client expects the function to return an Object, otherwise the value is ignored. #### Example: Custom transformation function for context mapping You can define and return the function in a **Variable Template**, which you can then use to create a **Variable** to reference.
The following is example code of such a **Variable Template**: ```javascript // Variable template code function selectFirst(contextArray, event) { // the function must return an object return contextArray[0]; } // The Variable must return the function return selectFirst; ``` ### Snowplow Self-describing Event Mapping > **Note:** This is an advanced feature and we recommend that you first explore the configuration options of your Tag(s) in case they are enough to cover your use case. #### Merge selected Snowplow self-describing event data Enable this option to allow merging of Snowplow self-describing event data to the Common Event. Checking this box reveals the following table: ![](/assets/images/selfdesc_merging_overview-b2be0321a3de3b7d0bdf95169cf9b5b7.png) #### Self-describing events to merge Using this table you can specify the rules to merge Snowplow self-describing event data to the Common Event. The columns of this table are: ![](/assets/images/selfdesc_merging_new_row-f75e06e030456a7f31905c40e6b6de93.png) **Schema**: (Required) The schema of the self-describing event. > **Info:** The **Schema** can be specified in 3 ways: > > - Iglu URI (e.g. `iglu:com.acme/myevent/jsonschema/1-0-0` ) > - Enriched name (e.g. `unstruct_event_com_acme_myevent_1`) > - Common Event name (e.g. `x-sp-self_describing_event_com_acme_myevent_1`) **Apply to all versions**: (False/True) Whether the rule applies to all versions of the self-describing event schema. Default is False. > **Info:** When you set **Apply to all versions** to `True`, it will apply the same rule to all versions of the schema, independently of the one mentioned in the **Schema** field. **Prefix**: (Optional) Specify a prefix to use for property names when merging. **Merge to**: (Event Properties/Custom) Specify where to merge the self-describing event’s properties. Default is Event Properties, i.e. to the root level of the common event. **Custom path**: (Optional) The path for custom merging.
> **Info:** The **Custom path** option applies only if the **Merge to** column is set to `Custom`, else the row is considered invalid. **Keep original mapping**: (True/False) Whether to keep the original self-describing event mapping. Default is True. **Custom transformation**: (Optional) Specify a Variable returning **a function** that represents a custom transformation of the self-describing data to the desired object before merging. The default behaviour is to merge the self-describing data object as is. > **Info:** The function signature must be: `(selfDescObject, event) => Object` > > The Client guarantees that it will call this function providing as arguments the original self-describing data (Object) specified and the Common Event (Object) it has constructed. The event argument is provided in order to enable merging logic to optionally be based on other event properties. Please note that it is not possible to modify the event inside the function you will define. The Client expects the function to return an Object, otherwise the value is ignored. #### Example: Custom transformation function for self-describing event mapping You can define and return the function in a **Variable Template**, which you can then use to create a **Variable** to reference. The following is example code of such a **Variable Template**: ```javascript // Variable template code const Object = require('Object'); function addKeySuffix(selfDescObject, event) { // the function must return an object return Object.keys(selfDescObject).reduce((acc, curr) => { acc[curr.concat('_suffix')] = selfDescObject[curr]; return acc; }, {}); } // The Variable must return the function return addKeySuffix; ``` --- # Snowplow Tag for GTM Server Side > Forward events to Snowplow Collector from GTM Server Side using the Snowplow Tag, supporting events from Snowplow JavaScript Tracker or other GTM SS clients like GA4.
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/snowplow-tag-for-gtm-ss/ The [Snowplow Tag for GTM SS](https://tagmanager.google.com/gallery/#/owners/snowplow/templates/snowplow-gtm-server-side-tag) is most useful if using GTM SS as a Server Side Tag Manager for Snowplow JavaScript Tracker events, as you will want to ensure you forward these events to your Snowplow Collector. The Snowplow Tag will automatically forward any events the Snowplow Client receives once it has been configured with your Collector URL. The Snowplow Tag can also construct Snowplow events from other GTM SS Clients such as GAv4. ## Template Installation > **Note:** The server Docker image must be 2.0.0 or later. ### Tag Manager Gallery 1. From the Templates tab in GTM SS, click "Search Gallery" in the Tag Templates section 2. Search for "Snowplow" and select the official "By Snowplow" tag 3. Click Add to Workspace 4. Accept the permissions dialog by clicking "Add" ## Snowplow Tag Setup With the template installed, you can now add the Snowplow Tag to your GTM SS Container. 1. From the Tag tab, select "New", then select the Snowplow Tag as your Tag Configuration 2. Select your desired Trigger - If using the Snowplow JavaScript Tracker and Snowplow Client, you want to select "All Events" 3. Enter your Snowplow Collector Endpoint, and confirm the Cookie Name matches that of your Collector 4. Click Save ![](/assets/images/tagsetup-3a2fbd7ede526d18786087b5b2b1dfbb.gif) --- # Event settings for Snowplow Tag in GTM SS > Configure structured events, self-describing events, and context entities for the Snowplow Tag in GTM Server Side. > Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/snowplow-tag-for-gtm-ss/snowplow-tag-configuration/advanced-event-settings/ This page describes the event settings available in the Snowplow Tag for GTM Server Side. 
## Structured events This section allows you to specify which incoming events will be tracked as custom Snowplow events. It only targets events that follow the event/category/label/value Universal Analytics model. ### Send selected events as Snowplow Structured Events Enabling this check-box reveals a multi-line text field to allow you to set custom Structured Events. ### Event name(s) selected Add the event names (in separate lines) to be tracked as custom structured Snowplow events. ## Self-describing events This section allows you to define [Snowplow self-describing events](/docs/sources/web-trackers/custom-tracking-using-schemas/#track-a-custom-event-self-describing). ![self describing events setup](/assets/images/self_describing_setup-5b8b1628a6630ae2a9c8d69ea3959bb0.png) ### Define events to be sent as Snowplow Self-Describing Events Enable this to allow custom self-describing event definitions. ### Event Name to Schema A table of the events (event names and corresponding schemas) to be tracked as custom self-describing Snowplow events. Add the events you would like to capture and convert into Self-Describing Events. The `Event Name` should equal the Client's `event_name` property. If a match is found when the Snowplow Tag fires, the tag will create a Self-Describing Event using the specified schema. ### Event Definitions A table of definitions for self-describing data properties. Each row maps a single data property of a custom self-describing Snowplow event to its value. For each Event, you can also read properties off the client event object and add them as properties to the Self-Describing Event. ## Context entities ### Apply context entities settings also to Snowplow events Enable this tick box to also apply the Context entities settings to raw Snowplow events. This may be helpful in use cases where you want to modify the entities already attached to Snowplow events **before** they are relayed to your Snowplow collector.
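For reference, a self-describing event built by the mappings above wraps the mapped data properties in Snowplow's standard `schema`/`data` envelope. A minimal sketch, where the schema URI and property values are hypothetical:

```javascript
// Illustrative only: the shape of a self-describing event the Tag would build
// for a client event named "tutorial_begin" mapped to a (hypothetical) schema.
const selfDescribingEvent = {
  schema: 'iglu:com.acme/tutorial_begin/jsonschema/1-0-0',
  data: {
    // properties read off the client event, per the Event Definitions table
    id: 'math101'
  }
};
```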
### Context entities settings This section allows you to define the context entities to be attached to your events using the following subsections. You can define both custom and global context entities: Custom Entities are attached where the event\_name matches the incoming event, whereas Global Entities will be applied to all events. Both subsections offer two ways to set your context entities: 1. Using the default tables in the UI: ![context entities settings](/assets/images/context_entities_settings-30d068889d497c4542ba61cf9f44e4aa.png) 2. Through GTM Server-side variables (best suited for advanced use-cases): ![context entities settings through variable](/assets/images/context_entities_settings_with_var-32d664fa3c5b3d2c3172337cd0b69b6f.png) #### Custom context entities > **Tip:** Custom context entities can be used to augment any standard Snowplow event, including self describing events, with additional data. The entities set in this subsection are attached _where the event name matches the incoming event_. ##### Use variables to define custom context entities Enabling this setting allows you to set custom context entities using variables that return the entities array to attach to the event. ##### Define custom context entities This is the way to define custom context entities through the table UI. The columns of this table are: - **Event Name**: The event that this entity will be attached to. - **Entity Schema**: The schema of the entity. - **Entity Property Name**: The key path to the entity's data. - **Type**: The type of the value. Available options are: - **Default**: This option means that the value retains its original type. - **String**: This option means that the value will be interpreted as a string. - **Boolean**: This option means that the value will be interpreted as a boolean. - **Number**: This option means that the value will be interpreted as a number. - **Reference**: What the value references.
Available options are: - **Client Event Property** (default): This means that what is set in the **Value** column corresponds to a property path in the client event. - **Constant or Variable**: This means that what is set in the **Value** column is the actual value to be used. - **Value**: The value to set the property name to. ##### Set custom context entities through variables This table is revealed if the **Use variables to define custom context entities** option is enabled. It provides an alternative way to set custom context entities for an event through a GTM Server-side variable. The variable specified must return an array of context entities. The columns of this table are: - **Event Name**: The event that this entity will be attached to. - **Custom Entities Array**: The variable containing the array of entities to be attached to the event. #### Global context entities > **Tip:** Global context entities are custom entities that apply globally. This lets you define your own context entities once and have them sent _with **all** events_. ##### Use a variable to define global context entities Enabling this setting allows you to set global context entities using a variable that returns the entities array to attach to the event. ##### Define global context entities This is the way to define global context entities through a table UI. The columns of this table are: - **Entity Schema**: The schema of the entity. - **Entity Property Name**: The key path to the entity's data. - **Type**: The type of the value. Available options are: - **Default**: This option means that the value retains its original type. - **String**: This option means that the value will be interpreted as a string. - **Boolean**: This option means that the value will be interpreted as a boolean. - **Number**: This option means that the value will be interpreted as a number. - **Reference**: What the value references.
Available options are: - **Client Event Property** (default): This means that what is set in the **Value** column corresponds to a property path in the client event. - **Constant or Variable**: This means that what is set in the **Value** column is the actual value to be used. - **Value**: The value to set the property name to. ##### Set global context entities through variable This text box is revealed if the **Use a variable to define global context entities** option is enabled. ### Examples In the following screenshots you can find examples of context entities settings using: #### 1. The default table UI _**Scenario**_: On a tutorial platform, we use GTM to send a `tutorial_begin` event type to our GTM Server-side container with parameters: ```javascript dataLayer.push({ 'event': 'tutorial_begin', 'tutorial.id': 'math101', 'tutorial.category_name': 'mathematics', 'user_data.email_address': 'foo@bar.baz', }); ``` We are using the Snowplow GTM SS Tag to forward it to our Snowplow pipeline, and we want to attach context entities. For our example, let's say: - our `tutorial` Snowplow entity corresponds to this schema ```json { "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "self": { "vendor": "com.acme", "name": "tutorial", "version": "1-0-0", "format": "jsonschema" }, "description": "Tutorial information", "type": "object", "properties": { "id": { "description": "The ID of the tutorial", "type": "string", "maxLength": 4096 }, "category": { "description": "The category of the tutorial", "type": ["string", "null"], "maxLength": 4096 } }, "required": ["id"], "additionalProperties": false } ``` - we also want user information, so we will use the `user_data` context entity ([available in Iglu Central](https://github.com/snowplow/iglu-central/blob/master/schemas/com.google.tag-manager.server-side/user_data/jsonschema/1-0-0)) attached to all events as well.
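Putting the scenario together, the context entities we aim to attach to the `tutorial_begin` event, expressed in Snowplow's `schema`/`data` envelope, would look like this sketch (values taken from the `dataLayer.push` call above):

```javascript
// Sketch of the target entities for the tutorial_begin event in the scenario:
// the tutorial entity (custom, event-specific) and user_data (global).
const entities = [
  {
    schema: 'iglu:com.acme/tutorial/jsonschema/1-0-0',
    data: { id: 'math101', category: 'mathematics' }
  },
  {
    schema: 'iglu:com.google.tag-manager.server-side/user_data/jsonschema/1-0-0',
    data: { email_address: 'foo@bar.baz' }
  }
];
```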
Then we can configure the Snowplow GTM SS Tag's Context entities settings as: ![context entities settings example A](/assets/images/context_entities_settings_example-b75aaf07d445a7e76d034b91aac9270c.png) Here we have configured the Context entities settings in order to: 1. Attach the `tutorial` entity only to the `tutorial_begin` event (using the **Custom context entities** section). Consequently we specify how to derive the data of the entity: - We set the `id` to the value found in the `tutorial.id` key path of the client event - We set the `category` property to the value found in the `tutorial.category_name` key path of the client event 2. Attach the `user_data` entity to all events (using the **Global context entities** section). In the row, we specify how to derive the `user_data` entity's data: - We set the `email_address` to the value found in the `user_data.email_address` key path of the client event > **Note:** As you can see in the images above, the schema for an entity can be written in 2 equally correct ways: > > 1. With the `iglu:` prefix, e.g. `iglu:com.acme/product/jsonschema/1-0-0` > 2. Without the `iglu:` prefix, e.g. `com.acme/product/jsonschema/1-0-0` #### 2. GTM SS Variables As mentioned, it is also possible to reference GTM Server-side Variables in order to set both the Custom and the Global context entities. As a simple example: ![context entities settings example B](/assets/images/context_entities_settings_example_var-a10e8e255f4ca4506f15accc03749520.png) Here: 1. We specify the Variable `product_entities` as the context entities to attach only to `purchase` events 2. We specify the Variable `global_entities` as the context entities to attach to all events The Variables referenced must return an array of context entities.
For example, the return value of such a Variable should look like: ```javascript [ { schema: "iglu:com.example_company/page/jsonschema/1-2-1", data: { pageType: 'test', lastUpdated: '2022-11-18T17:59:00', } }, { schema: "iglu:com.example_company/user/jsonschema/2-0-0", data: { userType: 'tester', } } ] ``` ## Additional event settings ### Base64 encoding Whether to encode the custom self-describing event data in base64. ### Platform identifier When a platform is not specified on the event, this value will be used. ### App ID This text box allows you to specify the `app_id` for the event. > **Note:** In case you are specifying the `app_id` through a GTM-SS Variable, please ensure that its return value is a string, otherwise it will be ignored. --- # Configure Snowplow Tag for GTM Server Side > Configure collector URL, cookie settings, advanced event settings, and logging for the Snowplow Tag in GTM Server Side. > Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/snowplow-tag-for-gtm-ss/snowplow-tag-configuration/ ## Collector URL (Required) Set this to the URL of the Snowplow Collector you wish to send events to. ## Cookies Settings If you have configured your Snowplow collector to have different cookie details, you should ensure they match here. ### Name (Required) This value must match your Collector's cookie name; this allows the Snowplow Tag to find and return your Collector cookie to your users' browsers. ### Override Cookie Properties If you'd like to overwrite your collector's cookie settings, you can do that here. #### Domain Override To return a cookie with a different domain value than your collector, you can override it to another string here. "auto" will ensure the value is unchanged. #### Path Override This will override the path value of the cookie. #### SameSite Override This will override the SameSite flag on the cookie.
#### Expiration Override in Seconds Allows the expiration time of the cookie to be altered. This value is in seconds, and defaults to 63072000 (2 years). #### HttpOnly Overwrites the HttpOnly flag on the cookie. Will be `true` if enabled (default), or `false` if disabled. #### Secure Overwrites the Secure flag on the cookie. Will be `true` if enabled (default), or `false` if disabled. Setting this to `false` and `SameSite` to `None` will prevent browsers from being able to store the cookie. ### Cookie Headers ![cookie headers](/assets/images/cookie_headers-acafd18e7933e0e939c4ff547256a52a.png) #### Forward Cookies in the request headers This option allows you to forward selected cookies in the request headers; it is disabled by default. Enabling it reveals the "Cookies to forward" table. #### Cookies to forward Using this table allows you to select the cookies to forward. Its columns are: - **Cookie Name**: The name of the cookie to forward - **Decode**: Whether to decode the cookie value(s) before forwarding (defaults to `No`) ## Advanced Event Settings ![advanced event settings overview](/assets/images/advanced_event_settings_overview-6567e2661a9a168f1271f37ac28335ae.png) This section allows you to: - Specify events to be tracked as Structured events - Configure Self-describing event definitions - Specify Context entities to be attached to events - Customize additional event settings You can find out more about all the available configuration options in the [Advanced Event Settings page](/docs/destinations/forwarding-events/google-tag-manager-server-side/snowplow-tag-for-gtm-ss/snowplow-tag-configuration/advanced-event-settings/). ## Logs Settings Through the Logs Settings you can control the logging behavior of the Snowplow Tag. The available options are: - `Do not log`: This option allows you to completely disable logging. No logs will be generated by the Tag.
- `Log to console during debug and preview`: This option enables logging only in debug and preview containers. This is the default option. - `Always`: This option enables logging regardless of container mode. > **Note:** Please take into consideration that the logs generated may contain event data. The logs generated by the Snowplow GTM SS Tag are standardized JSON strings. The standard log properties are: ```json { "Name": "Snowplow", // the name of the tag "Type": "Message", // the type of log (one of "Message", "Request", "Response") "TraceId": "xxx", // the "trace-id" header if exists "EventName": "xxx" // the name of the event the tag fired at } ``` Depending on the type of log, additional properties are logged: | Type of log | Additional information | | ----------- | -------------------------------------------------------------- | | Message | "Message" | | Request | "RequestMethod", "RequestUrl", "RequestHeaders", "RequestBody" | | Response | "ResponseStatusCode", "ResponseHeaders", "ResponseBody" | --- # Debug and test GTM Server Side tags > Test and debug Google Tag Manager Server Side tag configurations using Preview Mode before deploying to production environments. > Source: https://docs.snowplow.io/docs/destinations/forwarding-events/google-tag-manager-server-side/testing/ If you are working on some changes to the configuration of your Google Tag Manager tags and would like to test them before applying them in production, you can use GTM’s [Preview Mode](https://developers.google.com/tag-platform/tag-manager/server-side/debug) feature. It shows information about the events it receives, which tags get triggered, etc. You can direct some (or all) of your Snowplow events to the Preview Mode, instead of the production tags. > **Note:** To follow the steps below, you will need to be running [Snowbridge](/docs/api-reference/snowbridge/) 2.3+. 
You will also need to have the [`spGtmssPreview` transformation](/docs/api-reference/snowbridge/configuration/transformations/builtin/spGtmssPreview/) activated (this is the default for Snowplow customers using Snowbridge with GTM Server Side). ## Copy the preview header Once you enter Preview Mode in Google Tag Manager, click on the three dots in the top right corner of the screen and click “Send requests manually”. You will see a popup with a preview header, for example (not a real value): ```text sTjhMcUdkNldaM2RsOThwWTRvNzE3VkZtb1BwK0E9PQo= ``` Copy the header value (in the example above, `sTjhMcUdkNldaM2RsOThwWTRvNzE3VkZtb1BwK0E9PQo=`). ## Add the preview header to your events You can add the header value to all or some of your events as an [entity](/docs/fundamentals/entities/). For example, if you are using the [JavaScript tracker](/docs/sources/web-trackers/): ```javascript snowplow('trackPageView', { context: [{ schema: 'iglu:com.google.tag-manager.server-side/preview_mode/jsonschema/1-0-0', data: { // paste your header value here 'x-gtm-server-preview': 'sTjhMcUdkNldaM2RsOThwWTRvNzE3VkZtb1BwK0E9PQo=' } }] }); ``` You can also add it as a [global context](/docs/sources/web-trackers/custom-tracking-using-schemas/global-context/) for all events: ```javascript const gtmPreviewContext = { schema: 'iglu:com.google.tag-manager.server-side/preview_mode/jsonschema/1-0-0', data: { // paste your header value here 'x-gtm-server-preview': 'sTjhMcUdkNldaM2RsOThwWTRvNzE3VkZtb1BwK0E9PQo=' } }; snowplow('addGlobalContexts', [gtmPreviewContext]); ``` For trackers other than the JavaScript tracker, the approach is the same — you need to add the preview header as an entity. > **Tip:** Note that the header value can change over time as you make changes to your Google Tag Manager setup.
--- # Real-time event forwarding to third-party platforms > Send Snowplow events to third-party platforms in real-time using Snowplow's managed event forwarding solution with built-in filtering, field mapping, and JavaScript transformations. > Source: https://docs.snowplow.io/docs/destinations/forwarding-events/ Event forwarders let you filter, transform, and send Snowplow events to third-party platforms in real-time. They're deployed as fully managed apps that sit alongside warehouse and lake loaders in your Snowplow cloud account. You can configure forwarders through Snowplow Console. ![Event forwarding architecture showing data flow from Snowplow pipeline through forwarders to destination APIs](/assets/images/event-forwarding-diagram-2e83ef2253fd7137b01a0f5a81e2aab4.svg) Event forwarding uses [Snowbridge](/docs/api-reference/snowbridge/) under the hood, deployed within your existing Snowplow cloud account, to transform and deliver events reliably. For detailed setup guides and field mappings, check out the list of [available integrations](/docs/destinations/forwarding-events/integrations/). For complex requirements or unsupported destinations, [advanced alternatives](#alternative-approaches) are also available. ## Use cases Event forwarding works best for use cases where you need low-latency event delivery and don't require complex aggregations across multiple events. For more complex transformations or batch processing, consider using [reverse ETL](/docs/destinations/reverse-etl/) instead. 
Event forwarding is a good fit for use cases such as: - **Real-time personalization**: send events to marketing automation or customer engagement platforms for immediate campaign triggers - **Product analytics**: forward user actions to analytics tools for real-time product insights - **A/B testing**: send experiment events to testing platforms for real-time optimization and analytics - **Fraud detection**: forward security-relevant events to monitoring systems - **Customer support**: stream events to support platforms for context-aware assistance ## How it works Event forwarders are deployed as managed [Snowbridge](/docs/api-reference/snowbridge/) apps that consume events from your enriched event stream in near real-time. Each forwarder uses a [JavaScript transformation function](/docs/api-reference/snowbridge/configuration/transformations/custom-scripts/javascript-configuration/) generated from your configuration to filter and transform events. Here's how forwarders process events: 1. **Read events**: reads enriched events from your stream (Kinesis, Pub/Sub, or EventHub) as the Snowplow pipeline produces them 2. **Apply filters**: checks each event against your configured [JavaScript filters](/docs/destinations/forwarding-events/reference/#event-filtering) to decide whether to forward it 3. **Transform data**: transforms matching events using [field mapping expressions](/docs/destinations/forwarding-events/reference/#field-mapping) and custom JavaScript to convert Snowplow event data into your destination's API format 4. **Delivery handling**: sends transformed events to the destination via HTTP API calls.
Retries failures depending on the [failure type](/docs/destinations/forwarding-events/event-forwarding-monitoring-and-troubleshooting/#failure-types-and-handling) and [logs non-retryable failures](/docs/destinations/forwarding-events/event-forwarding-monitoring-and-troubleshooting/#what-happens-when-events-fail) to cloud storage The end-to-end latency from event collection to destination delivery is on the order of seconds. Latency depends on overall pipeline event volume, complexity of transformation logic, and destination rate limits. ## Getting started To set up a new event forwarder, you must first create a **connection**, which stores the credentials and endpoint details needed to send events to your destination, and then an **event forwarder** configuration, which defines which pipeline to read events from and the transformations to apply to your events. For a step-by-step guide, see [creating forwarders](/docs/destinations/forwarding-events/creating-forwarders/). Each destination has its own requirements for API credentials, configuration, and field mappings. See the [available integrations](/docs/destinations/forwarding-events/integrations/) for destination-specific guides. For detailed information on supported JavaScript expressions, field transformations, and mapping syntax, see the [filter and mapping reference](/docs/destinations/forwarding-events/reference/). ## Alternative approaches Using event forwarders is the recommended starting point for most real-time delivery use cases. For more complex requirements or unsupported destinations, consider these alternatives: - **[Snowbridge](/docs/api-reference/snowbridge/)**: flexible event routing with custom transformations and destinations (Kafka, Kinesis, HTTP APIs). Use when you need destinations not yet supported by event forwarders, complex custom transformations, non-HTTP destinations, or advanced batching and retry configurations.
- **[Google Tag Manager Server Side](/docs/destinations/forwarding-events/google-tag-manager-server-side/)**: use GTM SS to relay enriched events to destinations using rich libraries of tags. Best if your organization is heavily invested in GTM or if you need destinations not yet supported by event forwarders, but supported by GTM SS, such as Google Analytics. - **[Custom integrations](/docs/destinations/forwarding-events/custom-integrations/)**: build your own solutions using AWS Lambda, GCP Cloud Functions, or other stream processing systems for fully bespoke requirements. --- # Forward events to Amplitude > Send Snowplow events to Amplitude for product analytics and behavioral insights using the HTTP API v2 with support for event tracking and user properties. > Source: https://docs.snowplow.io/docs/destinations/forwarding-events/integrations/amplitude/ Send Snowplow events to Amplitude to power product and marketing analytics or guide and survey personalization using Amplitude's [HTTP API v2](https://www.docs.developers.amplitude.com/analytics/apis/http-v2-api/). ## Prerequisites Before setting up the forwarder in Console, you'll need an Amplitude API Key. To find your API key: 1. Log in to your Amplitude workspace 2. Click **Settings** in the top right, then click **Organization Settings** 3. From the sidebar, select **Projects**, then select your Project to view its details 4. Copy the **API Key** > **Tip:** To avoid introducing bad data in your production Amplitude project, we recommend using a test or development Amplitude project to test your transformations first. Then, create a new Connection in Console with your production API key, and a new forwarder that imports the configuration from your development forwarder. ## Getting started ### Configure the destination To create the connection and forwarder, follow the steps in [Creating forwarders](/docs/destinations/forwarding-events/creating-forwarders/). 
When configuring the connection, select **Amplitude** for the connection type, enter your API key, and select the **Server Location** where your Amplitude project is hosted. ### Validate the integration You can confirm events are reaching Amplitude by checking the **Ingestion Debugger** page in your Amplitude account: 1. From the left navigation bar, click **Data**, then select **Sources** from the sidebar. You will see a list of sources. 2. Select the **Ingestion Debugger** tab 3. Filter the graphs to show only events from the **HTTP API** to confirm data is flowing as expected from Snowplow. ## Sending custom properties You can send custom properties beyond the standard fields defined in the schema reference below. Amplitude supports three types of custom properties: - **event\_properties**: custom data associated with specific events (e.g., `event_properties.plan_type`, `event_properties.feature_flag`) - **user\_properties**: custom data tied to user profiles (e.g., `user_properties.subscription_tier`, `user_properties.account_age`) - **group\_properties**: custom data tied to groups when `event_type` is `$groupidentify` (requires Amplitude Accounts add-on) For property names containing spaces, use bracket notation (e.g., `event_properties["campaign source"]`). See Amplitude's [HTTP API v2 documentation](https://amplitude.com/docs/apis/analytics/http-v2#) for details on supported data types, property operations, and object depth limits. See [Creating forwarders](/docs/destinations/forwarding-events/creating-forwarders/) for details on configuring field mappings. ## Schema reference This section contains information on the fields you can send to Amplitude, including field names, data types, required fields, and default Snowplow mapping expressions. 
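Several of the default mappings below pass a Snowplow timestamp through a helper written as `spTstampToEpochMillis`. Assuming the input is a UTC enriched-event timestamp string such as `2024-01-15 09:30:00.000` (an assumption about the input format), the conversion can be sketched roughly as follows; the helper's exact implementation is not documented here, so treat this as an approximation:

```javascript
// Approximate sketch (not the documented helper): convert a Snowplow
// "YYYY-MM-DD HH:MM:SS.sss" UTC timestamp string to epoch milliseconds.
function tstampToEpochMillis(tstamp) {
  if (!tstamp) return undefined;
  // Normalize to ISO 8601 with an explicit UTC zone before parsing
  return new Date(tstamp.replace(' ', 'T') + 'Z').getTime();
}
```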
| Field | Details | | --- | --- | | `event_type` _string_ | _Required._ A unique identifier for your event. Default mapping: `event?.event_name` | | `user_id` _unknown_ | _Optional._ ID for the user. Required if `device_id` isn't provided. Default mapping: `event?.user_id` | | `device_id` _unknown_ | _Optional._ A device-specific identifier, such as the Identifier for Vendor on iOS. Required if `user_id` isn't provided. Default mapping: `event?.domain_userid ?? event?.contexts_com_snowplowanalytics_snowplow_client_session_1?.[0]?.userId` | | `event_id` _integer_ | _Optional._ An incrementing counter to distinguish events with the same user\_id and timestamp from each other. | | `session_id` _integer_ | _Optional._ The start time of the session in milliseconds since epoch (Unix timestamp). Necessary if you want to associate events with a particular session. Default mapping: `spTstampToEpochMillis(event?.contexts_com_snowplowanalytics_snowplow_client_session_1?.[0]?.firstEventTimestamp)` | | `insert_id` _string_ | _Optional._ A unique identifier for the event. Amplitude deduplicates subsequent events sent with the same device\_id and insert\_id within the past 7 days. Default mapping: `event?.event_id` | | `time` _integer_ | _Optional._ The timestamp of the event in milliseconds since epoch. If time isn't sent with the event, then it's set to the request upload time. Default mapping: `spTstampToEpochMillis(event?.derived_tstamp)` | | `event_properties` _object_ | _Optional._ Arbitrary key-value pairs assigned to the event. | | `user_properties` _object_ | _Optional._ Arbitrary key-value pairs assigned to the user.
| | `groups` _object_ | _Optional._ Arbitrary key-value pairs representing groups of users. | | `group_properties` _object_ | _Optional._ Arbitrary key-value pairs assigned to the groups listed in the `groups` field. | | `$skip_user_properties_sync` _boolean_ | _Optional._ When true, user properties aren't synced. Defaults to false. Default mapping: `false` | | `app_version` _string_ | _Optional._ The current version of your application. Default mapping: `event?.contexts_com_snowplowanalytics_mobile_application_1?.[0]?.version` | | `platform` _string_ | _Optional._ Platform of the device. Default mapping: `event?.platform` | | `os_name` _string_ | _Optional._ The name of the mobile operating system or browser that the user is using. Default mapping: `event?.os_name` | | `os_version` _string_ | _Optional._ The version of the mobile operating system or browser the user is using. Default mapping: `event?.contexts_nl_basjes_yauaa_context_1?.[0]?.operatingSystemVersion` | | `device_brand` _string_ | _Optional._ The device brand that the user is using. Default mapping: `event?.contexts_nl_basjes_yauaa_context_1?.[0]?.deviceBrand` | | `device_manufacturer` _string_ | _Optional._ The device manufacturer that the user is using. | | `device_model` _string_ | _Optional._ The device model that the user is using. Default mapping: `event?.contexts_nl_basjes_yauaa_context_1?.[0]?.deviceName` | | `carrier` _string_ | _Optional._ The carrier that the user is using. Default mapping: `event?.contexts_nl_basjes_yauaa_context_1?.[0]?.carrier` | | `country` _string_ | _Optional._ The current country of the user. Default mapping: `event?.geo_country` | | `region` _string_ | _Optional._ The current region of the user. Default mapping: `event?.geo_region` | | `city` _string_ | _Optional._ The current city of the user. Default mapping: `event?.geo_city` | | `dma` _string_ | _Optional._ The current Designated Market Area of the user.
| | `language` _string_ | _Optional._ The language set by the user. Default mapping: `event?.br_lang` | | `price` _number_ | _Optional._ The price of the item purchased. Required for revenue data if the revenue field isn't sent. You can use negative values for refunds. | | `quantity` _integer_ | _Optional._ The quantity of the item purchased. | | `revenue` _number_ | _Optional._ Revenue = (price x quantity). If you send all 3 fields of price, quantity, and revenue, then the revenue value is (price x quantity). Use negative values for refunds. | | `productId` _string_ | _Optional._ An identifier for the item purchased. You must send a price and quantity or revenue with this field. | | `revenueType` _string_ | _Optional._ The type of revenue for the item purchased. You must send a price and quantity or revenue with this field. | | `location_lat` _number_ | _Optional._ The current Latitude of the user. Default mapping: `event?.geo_latitude` | | `location_lng` _number_ | _Optional._ The current Longitude of the user. Default mapping: `event?.geo_longitude` | | `ip` _string_ | _Optional._ The IP address of the user. Default mapping: `event?.user_ipaddress` | | `idfa` _string_ | _Optional._ (iOS) Identifier for Advertiser. Default mapping: `event?.contexts_com_snowplowanalytics_snowplow_mobile_context_1?.[0]?.appleIdfa` | | `idfv` _string_ | _Optional._ (iOS) Identifier for Vendor. Default mapping: `event?.contexts_com_snowplowanalytics_snowplow_mobile_context_1?.[0]?.appleIdfv` | | `adid` _string_ | _Optional._ (Android) Google Play Services advertising ID. Default mapping: `event?.contexts_com_snowplowanalytics_snowplow_mobile_context_1?.[0]?.androidIdfa` | | `android_id` _string_ | _Optional._ (Android) Android ID (not the advertising ID). | | `plan` _object_ | _Optional._ Tracking plan properties. Properties: * `branch` (string, optional): The tracking plan branch name. * `source` (string, optional): The tracking plan source.
* `version` (string, optional): The tracking plan version. | --- # Forward events to Braze > Send Snowplow events to Braze for real-time personalization and campaign automation using the Track Users API with support for user attributes, custom events, and purchases. > Source: https://docs.snowplow.io/docs/destinations/forwarding-events/integrations/braze/ Send Snowplow events to Braze for real-time personalization, user tracking, and campaign automation using Braze's [Track Users API](https://www.braze.com/docs/api/endpoints/user_data/post_user_track). Snowplow supports the following Braze object types: - **[User attributes](https://www.braze.com/docs/api/objects_filters/user_attributes_object)**: Profile data and custom user properties - **[Custom events](https://www.braze.com/docs/api/objects_filters/event_object)**: User actions and behaviors - **[Purchases](https://www.braze.com/docs/api/objects_filters/purchase_object)**: Transaction data with product details ## Prerequisites Before setting up the forwarder in Console, you'll need the following from your Braze account: - Braze REST API key with these permissions: - `users.track` - `users.alias.new` - `users.identify` - `users.export.ids` - `users.merge` - `users.external_ids.rename` - `users.alias.update` - Braze REST API endpoint, found in Braze under **Settings** > **APIs and Identifiers** ## Getting started ### Configure the destination To create the forwarder, follow the steps in [Creating forwarders](/docs/destinations/forwarding-events/creating-forwarders/). When configuring the connection, select **Braze** for the connection type and enter your API key and endpoint. 
When configuring the forwarder, you can choose from the following **Braze object types** to map: - **[Attributes](https://www.braze.com/docs/api/objects_filters/user_attributes_object)**: update user profile data - **[Events](https://www.braze.com/docs/api/objects_filters/event_object)**: send custom user actions - **[Purchases](https://www.braze.com/docs/api/objects_filters/purchase_object)**: send transaction events ### Validate the integration You can confirm events are reaching Braze by checking the following pages in your Braze account: 1. Query Builder: in Braze, navigate to **Analytics** > **Query Builder**. You can write queries on the following tables to preview the data forwarded from Snowplow: `USER_BEHAVIORS_CUSTOMEVENT_SHARED`, `USERS_BEHAVIORS_PURCHASE_SHARED`. 2. API Usage Dashboard: in Braze, navigate to **Settings** > **API and Identifiers** to see a chart of API usage over time. You can filter specifically for the API key used by Snowplow and see both successes and failures. ## Sending custom properties You can send custom properties beyond the standard fields defined in the schema reference below. The structure depends on which Braze object type you're using: - **User attributes**: add as top-level fields (e.g., `subscription_tier`, `loyalty_points`) - **Event properties**: nest under `properties` object (e.g., `properties.plan_type`, `properties.feature_flag`) - **Purchase properties**: nest under `properties` object (e.g., `properties.color`, `properties.size`) For property names containing spaces, use bracket notation (e.g., `["account type"]` or `properties["campaign source"]`). See Braze's [Event Object documentation](https://www.braze.com/docs/api/objects_filters/event_object) for details on supported data types, property naming requirements, and payload size limits. See [Creating forwarders](/docs/destinations/forwarding-events/creating-forwarders/) for details on configuring field mappings. 
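To make the property structures above concrete, here is a sketch of how dot-notation mappings might assemble a Braze custom event payload from a Snowplow event. The entity `contexts_com_acme_user_profile_1` and its fields are hypothetical examples, not part of the integration:

```javascript
// Hedged sketch: assembling a Braze custom event object from a Snowplow event.
// The entity `contexts_com_acme_user_profile_1` and its fields are hypothetical.
function buildBrazeEvent(event) {
  return {
    name: event?.event_name,
    external_id: event?.user_id,
    // Mappings written as `properties.plan_type` nest under the `properties` object
    properties: {
      plan_type: event?.contexts_com_acme_user_profile_1?.[0]?.plan_type,
      // Bracket notation covers names with spaces, e.g. properties["campaign source"]
      "campaign source": event?.mkt_source,
    },
  };
}

// Example Snowplow event (abbreviated)
const sample = {
  event_name: "upgrade_clicked",
  user_id: "u-123",
  mkt_source: "newsletter",
  contexts_com_acme_user_profile_1: [{ plan_type: "pro" }],
};

console.log(JSON.stringify(buildBrazeEvent(sample)));
```

In the forwarder UI, each mapping key (e.g. `properties.plan_type`) is the destination path on the left, and the JavaScript expression on the right is evaluated against the incoming event.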
## Limitations **Rate limits:** Braze enforces a rate limit of 3,000 API calls every three seconds for the Track Users API. Because Snowplow does not currently support batching for event forwarders, this API rate limit also functions as the event rate limit. If your input throughput exceeds 3,000 events per three seconds, you will experience increased latency. ## Schema reference The sections below contain information on the fields you can send to Braze, including field names, data types, required fields, and default Snowplow mapping expressions. ### User attributes Use the user attributes object to update standard user profile fields or add your own custom attribute data to the user. | Field | Details | | --- | --- | | `_update_existing_only` _boolean_ | _Optional._ If `true`, API requests will only update existing user profiles in Braze. `false` recommended.Default mapping: `false` | | `user_alias` _object_ | _Optional._ Braze user alias object.Properties:* `alias_name` (string, required): The actual value of the alias identifier. * `alias_label` (string, required): Indicates the type of alias. E.g. `domain_userid`. | | `external_id` _string_ | _Optional._ A unique user identifier. Required if `user alias`, `braze_id`, `email`, or `phone` is not provided.Default mapping: `event?.user_id` | | `braze_id` _string_ | _Optional._ Identifier reserved for the Braze SDK. Required if `external_id`, `user alias`, `email`, or `phone` is not provided. Not recommended for use with the Snowplow integration. | | `country` _string_ | _Optional._ ISO-3166-1 alpha-2 standard country code. Where the value does not meet that standard, Braze attempts to map it to a country. 
Where it cannot, the value will be NULL.Default mapping: `event?.geo_country` | | `current_location` _object_ | _Optional._Properties:* `longitude` (number, optional): Longitude of the user's location. * `latitude` (number, optional): Latitude of the user's location. | | `date_of_first_session` _string_ | _Optional._ Date at which the user first used the app. Must be ISO 8601 format. | | `date_of_last_session` _string_ | _Optional._ Date at which the user most recently used the app. Must be ISO 8601 format. | | `dob` _string_ | _Optional._ The user's date of birth. | | `email` _string_ | _Optional._ The user's email address. | | `email_subscribe` _string_ | _Optional._ The user's email subscription status. Must be one of: `opted_in`, `unsubscribed`, `subscribed` | | `email_open_tracking_disabled` _boolean_ | _Optional._ Set to true to disable the open tracking pixel from being added to all future emails sent to this user. | | `email_click_tracking_disabled` _boolean_ | _Optional._ Set to true to disable the click tracking for all links within future emails sent to this user. | | `first_name` _string_ | _Optional._ User's first name. | | `gender` _string_ | _Optional._ The user's gender. Must be one of: `M`, `F`, `O`, `N`, `P`, `null` | | `home_city` _string_ | _Optional._ The user's city. | | `language` _string_ | _Optional._ The user's preferred language. Must be an ISO-639-1 standard language code.Default mapping: `event?.br_lang` | | `last_name` _string_ | _Optional._ User's last name. | | `marked_email_as_spam_at` _string_ | _Optional._ Date at which the user’s email was marked as spam. Must be ISO 8601 format. | | `phone` _string_ | _Optional._ The user's phone number. | | `push_subscribe` _string_ | _Optional._ The user's push message subscription status. Must be one of: `opted_in`, `unsubscribed`, `subscribed` | | `push_tokens` _array of object_ | _Optional._ Array of objects with app\_id and token string. 
| | `time_zone` _string_ | _Optional._ The user's time zone. Must be a valid IANA Time Zone.Default mapping: `event?.geo_timezone` | ### Events Each event object represents a single occurrence of a custom event by a particular user at a specific time. You can set and use custom event properties in messages, data collection, and personalization. | Field | Details | | --- | --- | | `time` _string_ | _Required._ Time of the event. Must be ISO 8601 format.Default mapping: `spTstampToJSDate(event?.collector_tstamp)?.toISOString()` | | `name` _string_ | _Required._ Name of the type of event.Default mapping: `event?.event_name` | | `external_id` _string_ | _Optional._ A unique user identifier. Required if `user alias`, `braze_id`, `email`, or `phone` is not provided.Default mapping: `event?.user_id` | | `braze_id` _string_ | _Optional._ Identifier reserved for the Braze SDK. Required if `external_id`, `user alias`, `email`, or `phone` is not provided. Not recommended for use with the Snowplow integration. | | `phone` _string_ | _Optional._ The user's phone number. | | `user_alias` _object_ | _Optional._ Braze user alias object.Properties:* `alias_name` (string, required): The actual value of the alias identifier. * `alias_label` (string, required): Indicates the type of alias. E.g. `domain_userid`. | | `_update_existing_only` _boolean_ | _Optional._ If `true`, API requests will only update existing user profiles in Braze. `false` recommended.Default mapping: `false` | | `email` _string_ | _Optional._ The user's email address. | | `app_id` _string_ | _Optional._ Associates activity with a specific app in your Braze workspace. If set, should match a Braze App Identifier, found in Braze console's API section. 
Can be omitted, but incorrect values may result in data loss in Braze. | | `properties` _object_ | _Optional._ Arbitrary key-value pairs assigned to the event in Braze. | ### Purchases The purchase object represents a single item purchased by a user at a particular time. Each purchase object is located within a purchase array, which can represent a transaction with multiple items. The purchase object has fields that allow the Braze back-end to store and use this information for messages, data collection, and personalization. | Field | Details | | --- | --- | | `time` _string_ | _Required._ Time of the purchase. Must be ISO 8601 format. | | `product_id` _string_ | _Required._ Identifier for the product purchased. | | `currency` _string_ | _Required._ ISO 4217 Alphabetic Currency Code. | | `price` _number_ | _Required._ Price per item. | | `external_id` _string_ | _Optional._ A unique user identifier. Required if `user alias`, `braze_id`, `email`, or `phone` is not provided. | | `braze_id` _string_ | _Optional._ Identifier reserved for the Braze SDK. Required if `external_id`, `user alias`, `email`, or `phone` is not provided. Not recommended for use with the Snowplow integration. | | `phone` _string_ | _Optional._ The user's phone number. | | `user_alias` _object_ | _Optional._ Braze user alias object.Properties:* `alias_name` (string, required): The actual value of the alias identifier. * `alias_label` (string, required): Indicates the type of alias. E.g. `domain_userid`. | | `_update_existing_only` _boolean_ | _Optional._ If `true`, API requests will only update existing user profiles in Braze. `false` recommended. | | `quantity` _integer_ | _Optional._ Quantity of the item purchased. 
Braze treats this as multiple individual purchases. | | `email` _string_ | _Optional._ The user's email address. | | `app_id` _string_ | _Optional._ Associates activity with a specific app in your Braze workspace. If set, should match a Braze App Identifier, found in Braze console's API section. Can be omitted, but incorrect values may result in data loss in Braze. | | `properties` _object_ | _Optional._ Arbitrary key-value pairs assigned to the purchase in Braze. | --- # Pre-built event forwarding integrations > Pre-built event forwarding integrations for third-party destinations with authentication, field mapping, and API-specific configurations included. > Source: https://docs.snowplow.io/docs/destinations/forwarding-events/integrations/ Event forwarding supports third-party destinations through pre-built integrations that handle authentication, field mapping, and API-specific requirements. ## Available destinations Snowplow event forwarding supports the following destinations: - [Amplitude](/docs/destinations/forwarding-events/integrations/amplitude/) - [Braze](/docs/destinations/forwarding-events/integrations/braze/) - [Mixpanel](/docs/destinations/forwarding-events/integrations/mixpanel/) --- # Forward events to Mixpanel > Send Snowplow events to Mixpanel for product analytics and user behavior insights using the Import API with support for event tracking and custom properties. > Source: https://docs.snowplow.io/docs/destinations/forwarding-events/integrations/mixpanel/ Send Snowplow events to Mixpanel to power product analytics, user behavior tracking, and funnel analysis using Mixpanel's [Import API](https://developer.mixpanel.com/reference/import-events). 
## Prerequisites Before setting up the forwarder in Console, you'll need the following from your Mixpanel account: - **Project ID**: found in Mixpanel under **Settings** > **Project Settings** - **Service Account Username**: create a service account in Mixpanel under **Settings** > **Organization Settings** > **Service Accounts**. The service account must have either **Admin** or **Owner** permissions. - **Service Account Password**: generated when you create the service account > **Tip:** To avoid introducing test data in your production Mixpanel project, we recommend using a test or development Mixpanel project to test your transformations first. Then, create a new Connection in Console with your production credentials, and a new forwarder that imports the configuration from your development forwarder. ## Getting started ### Configure the destination To create the connection and forwarder, follow the steps in [Creating forwarders](/docs/destinations/forwarding-events/creating-forwarders/). When configuring the connection, select **Mixpanel** for the connection type, enter your project ID, service account username, and service account password. You'll also need to select the **Server Location** where your Mixpanel project is hosted: - **United States**: `https://api.mixpanel.com/` - **European Union**: `https://api-eu.mixpanel.com/` - **India**: `https://api-in.mixpanel.com/` ### Validate the integration You can confirm events are reaching Mixpanel by checking the **Events** page in your Mixpanel account: 1. From the left navigation bar, click **Events** 2. You should see your Snowplow events appearing in the live view 3. You can also navigate to **Settings** > **Project Settings** > **Data & Exports** to view import statistics ## Identity management Mixpanel uses a combination of `distinct_id` (user identifier) and `$device_id` (device identifier) to track users across sessions. 
The Snowplow integration defaults to using `user_id` for `distinct_id` and a coalesce of `domain_userid` and the client session entity's `userId` for `$device_id`, which supports Mixpanel's [Simplified ID Merge system](https://docs.mixpanel.com/docs/tracking-methods/id-management#simplified-id-merge-system). When a user logs in and your event contains a `user_id` value, Mixpanel will automatically merge the user's anonymous activity (tracked via `$device_id`) with their identified profile (tracked via `distinct_id`). ## Sending custom properties You can send custom event properties beyond the standard fields defined in the schema reference below. Custom properties are nested under the `properties` object. When configuring your forwarder, add field mappings formatted as `properties.your_custom_field` (e.g., `properties.plan_type`, `properties.feature_flag`). For property names containing spaces, use bracket notation (e.g., `properties["referred by"]`). See Mixpanel's [Import Events API documentation](https://developer.mixpanel.com/reference/import-events) for details on supported data types and property requirements. See [Creating forwarders](/docs/destinations/forwarding-events/creating-forwarders/) for details on configuring field mappings. ## Schema reference This section contains information on the fields you can send to Mixpanel, including field names, data types, required fields, and default Snowplow mapping expressions. 
| Field | Details | | --- | --- | | `event` _string_ | _Required._ The name of the event.Default mapping: `event.event_name` | | `properties` _object_ | _Required._Properties:* `time` (integer, required): The time at which the event occurred, in milliseconds since epoch. 
Default mapping: `spTstampToEpochMillis(event.derived_tstamp)` * `distinct_id` (string, required): Identifies the user who performed the event. Default mapping: `event.user_id` * `$insert_id` (string, required): A unique identifier for the event, used for deduplication. Default mapping: `event.event_id` * `ip` (string, optional): The IP address of the user. Default mapping: `event.user_ipaddress` * `$city` (string, optional): The city of the event sender. Default mapping: `event.geo_city` * `$region` (string, optional): The region (state or province) of the event sender. Default mapping: `event.geo_region` * `mp_country_code` (string, optional): The country of the event sender. Default mapping: `event.geo_country` * `$browser` (string, optional): Name of the browser. Default mapping: `event.contexts_nl_basjes_yauaa_context_1?.[0]?.agentName` * `$browser_version` (string, optional): Version of the browser. Default mapping: `event.contexts_nl_basjes_yauaa_context_1?.[0]?.agentVersion` * `$current_url` (string, optional): The full URL of the webpage on which the event is triggered. Default mapping: `event.page_url` * `$referrer` (string, optional): Referring URL including your own domain. Default mapping: `event.page_referrer` * `$referring_domain` (string, optional): Referring domain including your own domain. Default mapping: `event.refr_urlhost` * `$device` (string, optional): Name of the device. Default mapping: `event.contexts_nl_basjes_yauaa_context_1?.[0]?.deviceName` * `$device_id` (string, optional): A unique device identifier. Used in Mixpanel's Simplified ID Merge API. Default mapping: `event.domain_userid ?? event.contexts_com_snowplowanalytics_snowplow_client_session_1?.[0]?.userId` * `$screen_height` (integer, optional): The height of the device screen in pixels. Default mapping: `event.dvce_screenheight` * `$screen_width` (integer, optional): The width of the device screen in pixels. 
Default mapping: `event.dvce_screenwidth` * `$os` (string, optional): Operating system of the device. Default mapping: `event.os_name` * `$os_version` (string, optional): Operating system version. Default mapping: `event.contexts_nl_basjes_yauaa_context_1?.[0]?.operatingSystemVersion` * `$manufacturer` (string, optional): Manufacturer of the device. Default mapping: `event.contexts_nl_basjes_yauaa_context_1?.[0]?.deviceBrand` * `$model` (string, optional): Model of the device. Default mapping: `event.contexts_nl_basjes_yauaa_context_1?.[0]?.deviceName` * `$app_version_string` (string, optional): The app version. Default mapping: `event.contexts_com_snowplowanalytics_mobile_application_1?.[0]?.version` * `$app_build_number` (string, optional): Build number for the mobile app. Default mapping: `event.contexts_com_snowplowanalytics_mobile_application_1?.[0]?.build` * `$carrier` (string, optional): Wireless carrier of the device owner. Default mapping: `event.contexts_nl_basjes_yauaa_context_1?.[0]?.carrier` * `$radio` (string, optional): The current cellular network communication standard (3G, 4G, LTE, etc). Default mapping: `event.contexts_com_snowplowanalytics_snowplow_mobile_context_1?.[0]?.networkTechnology` * `$wifi` (boolean, optional): Set to true if the user's device is connected to wifi. Default mapping: `event.contexts_com_snowplowanalytics_snowplow_mobile_context_1?.[0]?.networkType === 'wifi'` * `$user_id` (string, optional): The identified ID of the user. Used in Mixpanel's Simplified ID Merge API. Default mapping: `event.user_id` * `$lib_version` (string, optional): Tracker library version. Default mapping: `event.v_tracker` * `utm_source` (string, optional): UTM source parameter from the URL. Default mapping: `event.mkt_source` * `utm_medium` (string, optional): UTM medium parameter from the URL. Default mapping: `event.mkt_medium` * `utm_campaign` (string, optional): UTM campaign parameter from the URL. 
Default mapping: `event.mkt_campaign` * `utm_content` (string, optional): UTM content parameter from the URL. Default mapping: `event.mkt_content` * `utm_term` (string, optional): UTM term parameter from the URL. Default mapping: `event.mkt_term` * `mp_lib` (string, optional): Tracker library that sent the event. Default mapping: `'Snowplow: ' + event.v_tracker` * `mp_sent_by_lib_version` (string, optional): Mixpanel library version used to send data (not necessarily the same as the version which enqueued the data). * `$screen_dpi` (integer, optional): Pixel density of the device screen. * `$initial_referrer` (string, optional): Referring URL when the user first arrived on your site. Defaults to “$direct” if the user is not referred. * `$initial_referring_domain` (string, optional): Referring domain at first arrival. Defaults to “$direct” if the user is not referred. * `$search_engine` (string, optional): The search engine that the customer used when they arrived at your domain. * `mp_keyword` (string, optional): Search keywords detected on the referrer from a search engine to your domain. This property is only collected when search keywords are included in a URL. * `$watch_model` (string, optional): The model of the iOS watch. * `$bluetooth_enabled` (boolean, optional): Set to true if Bluetooth is enabled, false if not. * `$bluetooth_version` (string, optional): Set to “none”, “ble”, or “classic”. * `$has_nfc` (boolean, optional): The device supports Near Field Communication (NFC). Set to true if the device hardware supports NFC, false if not. * `$has_telephone` (boolean, optional): Set to true if this device has telephone functionality, false if not. | --- # Manage event forwarders in Console > Edit, clone, and delete Snowplow event forwarders in Console to update configurations, duplicate setups, or remove unused destinations. 
> Source: https://docs.snowplow.io/docs/destinations/forwarding-events/managing-forwarders/ This page explains how to edit, clone, and delete event forwarders. To start, go to **Destinations** > **Destination list** in [Snowplow Console](https://console.snowplowanalytics.com). ## Edit a forwarder To edit a forwarder: 1. Click **Details** under the destination you want to change to open the destination details page. 2. On the event forwarders overview table, click the three dots next to the forwarder you want to change and select **Edit**. You will see the forwarder configuration page. 3. Modify the forwarder configuration as needed. When you're done, select **Deploy** to re-deploy the forwarder with the updated configuration. The forwarder instances will be re-deployed on a rolling basis over the next few minutes. ## Rename a forwarder To rename a forwarder: 1. Click **Details** under the destination you want to change to open the destination details page. 2. On the event forwarders overview table, click the three dots next to the forwarder you want to change and select **Rename**. 3. Enter a new forwarder name and select **Rename** to save. ## Clone a forwarder When creating a new forwarder, you can import the configuration from an existing forwarder of the same type. This is especially helpful when migrating a forwarder setup from a development pipeline to production. To clone a forwarder: 1. Navigate to the **Available** tab and select **Configure** on the destination card from the list of available integrations to start setting up the forwarder. 2. Give the forwarder a **name**, select the **pipeline** you want the forwarder to read events from, and choose a **connection**. 3. From the **Import configuration from** dropdown, choose an existing forwarder. 4. Click **Continue**. The filters, mappings, and custom functions will be pre-populated with those of the existing forwarder you imported from. ## Delete a forwarder To permanently delete a forwarder: 1. 
Click **Details** under the destination you want to change to open the destination details page. 2. On the event forwarders overview table, click the three dots next to the forwarder you want to change and select **Delete**. 3. On the confirmation modal, select **Delete**. This will start the process of destroying the underlying forwarder infrastructure. --- # Event forwarding filter and mapping reference > Complete reference for Snowplow event forwarding JavaScript expressions, field mapping syntax, event filtering, data transformations, and custom functions. > Source: https://docs.snowplow.io/docs/destinations/forwarding-events/reference/ Event forwarders use JavaScript expressions for filtering events and mapping Snowplow data to destination fields. These expressions are entered during [forwarder setup](/docs/destinations/forwarding-events/#getting-started) in Console, specifically in the **Event filtering**, **Field mapping**, and **Custom functions** sections. This reference covers the syntax and available data for these operations. ## Available event fields You can reference any field in your Snowplow events for both filters and field mappings. ### Standard atomic fields Access [standard Snowplow fields](/docs/fundamentals/canonical-event/) in your filters and mappings using JavaScript dot notation: ```javascript // Standard atomic fields event.app_id event.event_name event.platform event.collector_tstamp event.event_id event.domain_userid event.user_id event.page_url event.page_title event.useragent event.network_userid ``` ### Custom events and entities You can also access fields in Snowplow or custom event and entity schemas. 
Forwarders transform Iglu schema URIs to JavaScript-safe field names: | Original schema | Transformed field name | | ------------------------------------------ | ------------------------------------ | | `com.acme/signup/jsonschema/1-0-0` | `unstruct_event_com_acme_signup_1` | | `com.acme/user_profile/jsonschema/2-1-0` | `contexts_com_acme_user_profile_2` | | `nl.basjes/yauaa_context/jsonschema/1-0-4` | `contexts_nl_basjes_yauaa_context_1` | Schema names follow these transformation rules: - Self-describing events: `unstruct_event_` prefix - Entities: `contexts_` prefix - Dots and slashes become underscores - Only major version number retained - Hyphens in vendor/name become underscores > **Info:** Always use [optional chaining](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Optional_chaining) (`?.`) when accessing custom events and entities to handle cases where they're not present. For a self-describing event with schema `com.acme/signup/jsonschema/1-0-0`: ```javascript // Access event properties event?.unstruct_event_com_acme_signup_1?.signup_method event?.unstruct_event_com_acme_signup_1?.user_type ``` For entities with schema `com.acme/user_profile/jsonschema/1-0-0`: ```javascript // Access entity properties (entities are arrays) event?.contexts_com_acme_user_profile_1?.[0]?.subscription_tier event?.contexts_com_acme_user_profile_1?.[0]?.account_created ``` ## Event filtering Event filters determine which events are forwarded to your destination. Only events matching your filter criteria (JavaScript expression evaluating to `true`) will be processed and sent. 
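As a rough illustration (this is a sketch of the naming rules above, not the forwarder's actual implementation), the Iglu URI transformation can be expressed as a small function:

```javascript
// Sketch of the Iglu schema URI -> field name rules described above.
// Illustrative only; the forwarder's internal implementation may differ.
function toFieldName(igluUri, isEntity) {
  // e.g. "com.acme/user_profile/jsonschema/2-1-0"
  const [vendor, name, , version] = igluUri.split("/");
  const major = version.split("-")[0]; // only the major version is retained
  const prefix = isEntity ? "contexts_" : "unstruct_event_";
  const safe = (s) => s.replace(/[.\-]/g, "_"); // dots and hyphens become underscores
  return `${prefix}${safe(vendor)}_${safe(name)}_${major}`;
}

console.log(toFieldName("com.acme/signup/jsonschema/1-0-0", false));
// → "unstruct_event_com_acme_signup_1"
console.log(toFieldName("nl.basjes/yauaa_context/jsonschema/1-0-4", true));
// → "contexts_nl_basjes_yauaa_context_1"
```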
### Basic filters Filter expressions can use standard JavaScript comparison operators: ```javascript // Single condition event.app_id == "website" // Multiple conditions with AND event.app_id == "website" && event.event == "page_view" // Multiple conditions with OR event.event_name == "add_to_cart" || event.event_name == "purchase" // Check if event is in a list ["page_view", "add_to_cart", "purchase"].includes(event.event_name) // Exclude events event.app_id == "website" && event.event_name != "link_click" ``` ### Advanced filtering patterns Regular expressions: ```javascript // Match multiple domains event.page_urlhost.match(/mysite\.(com|fr|de)/) // Match event name patterns event.event_name.match(/^purchase_/) // Match custom field patterns event?.unstruct_event_com_acme_product_view_1?.category.match(/electronics|computers/) ``` Define reusable logic in the Custom Functions section: ```javascript // Defined in the Custom Function editor panel function isHighValueUser(event) { const profile = event?.contexts_com_acme_user_profile_1?.[0]; return profile?.subscription_tier == "premium" || profile?.lifetime_value > 1000; } // Use in filter isHighValueUser(event) && event.event_name == "purchase" ``` ## Field mapping Field mapping defines how Snowplow event data is transformed and sent to destination APIs. Each mapping consists of: - A destination field name (key) - A JavaScript expression that extracts the value from your Snowplow event (value) > **Info:** The code snippets below contain JavaScript expressions that you can include in the **Snowplow expression** mapping field in the UI. 
### Basic mappings

Map standard event fields directly:

![](/assets/images/event-forwarding-basic-mapping-be50ef0162cd510af04538089519f6ee.png)

```json
// sample output
{
  "event_type": "page_view"
}
```

You can also apply fallback and conditional logic:

![](/assets/images/event-forwarding-conditional-mapping-b21729fb908ecd1cf784bb8f2952e6f9.png)

```json
// sample output
{
  "user_id": "a50d3dfe-ba21-432e-a165-1a1d2d633693",
  "source": "website"
}
```

You can also send static values:

![](/assets/images/event-forwarding-static-mapping-89a993963e026dc4e9fc4a6bc421757c.png)

```json
// sample output
{
  "source": "snowplow"
}
```

### Data transformation

Convert data types, such as strings, boolean values, and dates:

![](/assets/images/event-forwarding-type-conversions-cfc98aa6623258f0d8ec29cd32ef2576.png)

```json
// sample output
{
  "page_width": 720,
  "page_height": 600,
  "is_mobile": true,
  "timestamp": "2025-10-01T18:35:38.563Z"
}
```

Use standard [JavaScript String methods](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#instance_methods) to manipulate strings:

```javascript
// Case conversion
event.event_name.toLowerCase()
event.page_title.toUpperCase()

// String operations
event.page_url.replace("http://", "https://")
event.page_title.substring(0, 100)
event.page_urlpath.split('/')
```

Map to nested objects using dot notation in field names:

![](/assets/images/event-forwarding-nested-mapping-9f700b7f85d97a5edfe10bc78f9ae348.png)

```json
// sample output
{
  "user": {
    "id": "user123"
  },
  "properties": {
    "page_title": "Home Page",
    "page_url": "https://example.com",
    "referrer": "google.com"
  }
}
```

### Custom mapping functions

Define complex transformations as functions in the custom functions section. You can then reference these functions in filters and mappings.
Below are a few example transformations:

```javascript
// Event name formatting
function formatEventName(event) {
  const nameMap = {
    'page_view': 'Page Viewed',
    'add_to_cart': 'Product Added to Cart',
    'purchase': 'Purchase Completed'
  };
  return nameMap[event.event_name] || event.event_name;
}

// Extract product data
function extractProductInfo(event) {
  const product = event?.unstruct_event_com_acme_product_1;
  if (!product) return null;
  return {
    id: product.product_id,
    name: product.product_name,
    category: product.category,
    price: parseFloat(product.price),
    currency: product.currency || 'USD'
  };
}

// User profile enrichment
function buildUserProfile(event) {
  const session = event?.contexts_com_snowplowanalytics_snowplow_client_session_1?.[0];
  const geo = event?.contexts_com_snowplowanalytics_snowplow_geolocation_context_1?.[0];
  return {
    user_id: event.domain_userid,
    session_id: session?.sessionId,
    location: geo ? `${geo.latitude},${geo.longitude}` : null,
    user_agent: event.useragent,
    platform: event.platform
  };
}
```

## Other common patterns

Timestamp formatting:

```javascript
// ISO 8601 format
new Date(event.collector_tstamp).toISOString()

// Unix timestamp
Math.floor(new Date(event.collector_tstamp).getTime() / 1000)

// Readable format
new Date(event.collector_tstamp).toLocaleDateString()
```

Conditional field mapping:

```javascript
// Platform-specific mapping
event.platform == "web" ? event.page_url : event.screen_name

// Event-type specific properties
event.event_name == "purchase" ?
event.unstruct_event_com_acme_purchase_1?.total_value : null
```

Array handling:

```javascript
// Get first entity
event?.contexts_com_acme_product_1?.[0]?.product_name

// Map all entities
event?.contexts_com_acme_product_1?.map(p => p.product_id)

// Filter and transform entities
event?.contexts_com_acme_product_1
  ?.filter(p => p.price > 10)
  ?.map(p => ({ id: p.product_id, name: p.product_name }))
```

---

# Data destinations overview

> Send Snowplow data to warehouses, lakes, and third-party platforms with event forwarding, native integrations, and reverse ETL for data activation.
> Source: https://docs.snowplow.io/docs/destinations/

Read more about Snowplow data destinations [here](/docs/fundamentals/destinations/).

[Data warehouses and data lakes](/docs/destinations/warehouses-lakes/) are primary destinations for Snowplow data. Snowplow also supports many ways to [forward events](/docs/destinations/forwarding-events/) to a variety of platforms: Snowplow has native integrations, and also supports Snowplow-, vendor-, and community-authored destinations via Google Tag Manager Server Side. Finally, [reverse ETL](/docs/destinations/reverse-etl/) enables you to publish data from your warehouse directly to marketing platforms, where it can be activated.

---

# Reverse ETL for data activation

> Activate warehouse data by syncing it to marketing platforms with reverse ETL, enabling sophisticated audience targeting based on predictive models and behavioral insights.
> Source: https://docs.snowplow.io/docs/destinations/reverse-etl/

Snowplow and Reverse ETL together represent best-in-class tooling for companies executing more sophisticated use cases with their behavioral data. As one example of where this approach is beneficial, many organizations begin with marketing use cases by creating simple segments, but quickly want to target their ads more effectively by incorporating customer propensity to buy and predictive lifetime value.
That increase in sophistication can only come from building a deep understanding of users in a place like a data warehouse, with modeling tooling (AI/ML) and Reverse ETL. This can only be done repeatably and with confidence through excellent governance practices, which come from Snowplow’s compliance controls (i.e. controlling which data is sent to third parties), schematized workflows, and UI/API for management. Sophisticated targeting also ensures resources are allocated effectively (e.g. don’t target users who have already purchased).

Snowplow and Reverse ETL are for organizations that want to:

- Adapt to changes in customer behavior and the business questions being asked.
- Use rich, extensible behavioral data.
- Maintain high quality data due to validation and private cloud deployment.
- Activate very high value audiences based on propensity to convert.
- Execute well on dozens of other use cases.

![](/assets/images/reverseetl-70742060395d2f5b62046145cceaf1e2.png)

## What problem does Reverse ETL solve?

Organizations have invested in building a high quality data asset in their data warehouse to power numerous use cases, so naturally want to use this to effectively target their users. Reverse ETL enables organizations to take the output of the intelligence they've built using all their customer data (behavioral and non-behavioral) and publish that directly to marketing platforms where it can be activated.

## Reverse ETL Platforms

Reverse ETL helps organizations operationalize the data in their warehouse by syncing it with other SaaS solutions such as Salesforce and Google Ads. Snowplow partners with and recommends Census as a Reverse ETL platform to allow organizations to achieve the use cases described above.

- [Census](https://www.getcensus.com/)

---

# Load Snowplow data to BigQuery

> Send Snowplow data to BigQuery for analytics and data warehousing with automatic table creation, schema evolution, and cross-batch deduplication.
> Source: https://docs.snowplow.io/docs/destinations/warehouses-lakes/bigquery/

> **Info:** The BigQuery integration is available for Snowplow pipelines running on **AWS** and **GCP**.

The Snowplow BigQuery integration allows you to load enriched event data (as well as [failed events](/docs/fundamentals/failed-events/)) directly into your BigQuery datasets for analytics, data modeling, and more.

## What you will need

Connecting to a destination always involves configuring cloud resources and granting permissions. It's a good idea to make sure you have sufficient privileges before you begin the setup process.

> **Tip:** The list below is just a heads up. The Snowplow Console will guide you through the exact steps to set up the integration.

Keep in mind that you will need to be able to:

- Provide your Google Cloud Project ID and region
- Allow-list Snowplow IP addresses
- Specify the desired dataset name
- Create a service account with the `roles/bigquery.dataEditor` permission (more permissions will be required for loading failed events and setting up [Data Quality Dashboard](/docs/monitoring/#data-quality-dashboard))

## Getting started

You can add a BigQuery destination through the Snowplow Console. (For self-hosted customers, please refer to the [Loader API reference](/docs/api-reference/loaders-storage-targets/bigquery-loader/) instead.)

### Step 1: Create a connection

1. In Console, navigate to **Destinations** > **Connections**
2. Select **Set up connection**
3. Choose **Loader connection**, then **BigQuery**
4. Follow the steps to provide all the necessary values
5. Click **Complete setup** to create the connection

### Step 2: Create a loader

1. In Console, navigate to **Destinations** > **Destination list**. Switch to the **Available** tab and select **BigQuery**
2. **Select a pipeline**: choose the pipeline where you want to deploy the loader.
3. **Select your connection**: choose the connection you configured in step 1.
4.
**Select the type of events**: enriched events or failed events 5. Click **Continue** to deploy the loader You can review active destinations and loaders by navigating to **Destinations** > **Destination list**. ## How loading works The Snowplow data loading process is engineered for large volumes of data. In addition, our loader applications ensure the best representation of Snowplow events. That includes automatically adjusting the tables to account for your custom data, whether it's new event types or new fields. > **Tip:** For more details on the loading flow, see the [BigQuery Loader](/docs/api-reference/loaders-storage-targets/bigquery-loader/) reference page, where you will find additional information and diagrams. ## Snowplow data format in BigQuery All events are loaded into a single table (`events`). There are dedicated columns for [atomic fields](/docs/fundamentals/canonical-event/), such as `app_id`, `user_id` and so on: | app\_id | collector\_tstamp | ... | event\_id | ... | user\_id | ... | | ------- | ----------------------- | --- | ------------------------------------ | --- | ------------------------------------ | --- | | website | 2025-05-06 12:30:05.123 | ... | c6ef3124-b53a-4b13-a233-0088f79dcbcb | ... | c94f860b-1266-4dad-ae57-3a36a414a521 | ... | Snowplow data also includes customizable [self-describing events](/docs/fundamentals/events/#self-describing-events) and [entities](/docs/fundamentals/entities/). These use [schemas](/docs/fundamentals/schemas/) to define which fields should be present, and of what type (e.g. string, number). For self-describing events and entities, there are additional columns, like so: | app\_id | ... | unstruct\_event\_com\_acme\_button\_press\_1 | contexts\_com\_acme\_product\_1 | | ------- | --- | ------------------------------------------------------- | -------------------------------------------------------------- | | website | ... 
| data for your custom `button_press` event (as `RECORD`) | data for your custom `product` entities (as `REPEATED RECORD`) |

Note:

- "unstruct\[ured] event" and "context" are the legacy terms for self-describing events and entities, respectively
- the `_1` suffix represents the major version of the schema (e.g. `1-x-y`)

You can learn more [in the API reference section](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/).

> **Tip:** Check this [guide on querying](/docs/destinations/warehouses-lakes/querying-data/?warehouse=bigquery) Snowplow data.

---

# Load Snowplow data to Databricks

> Send Snowplow data to Databricks for analytics and data processing with Delta Lake tables, schema evolution, and lakehouse architecture support.
> Source: https://docs.snowplow.io/docs/destinations/warehouses-lakes/databricks/

> **Info:** The Databricks integration is available for Snowplow pipelines running on **AWS**, **Azure** and **GCP**.

The Snowplow Databricks integration allows you to load enriched event data (as well as [failed events](/docs/fundamentals/failed-events/)) into your Databricks environment for analytics, data modeling, and more. Depending on the cloud provider for your Snowplow pipeline, there are different options for this integration:

| Integration | AWS | Azure | GCP | Failed events support |
| --- | --- | --- | --- | --- |
| Direct, batch-based ([RDB Loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/)) | ✅ | ❌ | ❌ | ❌ |
| Via Delta Lake ([Lake Loader](/docs/api-reference/loaders-storage-targets/lake-loader/)) | ❌¹ | ✅² | ✅² | ✅ |
| Streaming / Lakeflow ([Streaming Loader](/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/)) | ✅ | ✅ | ✅ | ✅ |

_¹ Delta+Databricks combination is currently not supported for AWS pipelines.
The loader uses DynamoDB tables for mutually exclusive writes to S3, a feature of Delta. Databricks, however, does not support this (as of September 2025). This means that it’s not possible to alter the data via Databricks (e.g. to run `OPTIMIZE` or to delete PII)._

_² The lake must be in the same cloud as the pipeline._

## What you will need

Connecting to a destination always involves configuring cloud resources and granting permissions. It's a good idea to make sure you have sufficient privileges before you begin the setup process.

> **Tip:** The list below is just a heads up. The Snowplow Console will guide you through the exact steps to set up the integration.

Keep in mind that you will need to be able to do a few things.

**Batch-based (AWS):**

- Provide a Databricks cluster along with its URL
- Specify the Unity catalog name and schema name
- Create an access token with the following permissions:
  - `USE CATALOG` on the catalog
  - `USE SCHEMA` and `CREATE TABLE` on the schema
  - `CAN USE` on the SQL warehouse

**Via Delta Lake (Azure, GCP):**

See [Delta Lake](/docs/destinations/warehouses-lakes/delta/).

**Streaming:**

- Create an S3 or GCS bucket or ADLS storage container, located in the same cloud and region as your Databricks instance
- Create a storage credential to allow Databricks to access the bucket or container
- Create an external location and a volume within Databricks pointing to the above
- Provide a Databricks SQL warehouse URL, Unity catalog name and schema name
- Create a service principal and grant the following permissions:
  - `USE CATALOG` on the catalog
  - `USE SCHEMA` and `CREATE TABLE` on the schema
  - `READ VOLUME` and `WRITE VOLUME` on the volume
  - `CAN USE` on the SQL warehouse (for testing the connection and monitoring, e.g. as part of the [Data Quality Dashboard](/docs/monitoring/#data-quality-dashboard))

Note that Lakeflow features require a Premium Databricks account.
You might also need Databricks metastore admin privileges for some of the steps. *** ## Getting started You can add a Databricks destination through the Snowplow Console. **Batch-based (AWS):** (For self-hosted customers, please refer to the [Loader API reference](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/) instead.) ### Step 1: Create a connection 1. In Console, navigate to **Destinations** > **Connections** 2. Select **Set up connection** 3. Choose **Loader connection**, then **Databricks** 4. Follow the steps to provide all the necessary values 5. Click **Complete setup** to create the connection ### Step 2: Create a loader 1. In Console, navigate to **Destinations** > **Destination list**. Switch to the **Available** tab and select **Databricks** 2. **Select a pipeline**: choose the pipeline where you want to deploy the loader. 3. **Select your connection**: choose the connection you configured in step 1. 4. **Select the type of events**: enriched events or failed events 5. Click **Continue** to deploy the loader You can review active destinations and loaders by navigating to **Destinations** > **Destination list**. **Via Delta Lake (Azure, GCP):** Follow the instructions for [Delta Lake](/docs/destinations/warehouses-lakes/delta/#getting-started). Then create an external table in Databricks pointing to the Delta Lake location. **Streaming:** (For self-hosted customers, please refer to the [Loader API reference](/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/) instead.) ### Step 1: Create a connection 1. In Console, navigate to **Destinations** > **Connections** 2. Select **Set up connection** 3. Choose **Loader connection**, then **Databricks Streaming** 4. Follow the steps to provide all the necessary values 5. Click **Complete setup** to create the connection ### Step 2: Create a loader 1. In Console, navigate to **Destinations** > **Destination list**. 
Switch to the **Available** tab and select **Databricks** 2. **Select a pipeline**: choose the pipeline where you want to deploy the loader. 3. **Select your connection**: choose the connection you configured in step 1. 4. **Select the type of events**: enriched events or failed events 5. Click **Continue** to deploy the loader You can review active destinations and loaders by navigating to **Destinations** > **Destination list**. Once the loader is up and running, click on the “...” button in the **Loaders** table and select **Databricks setup instructions**. Follow the outlined steps to create a Lakeflow pipeline within Databricks. *** ## How loading works The Snowplow data loading process is engineered for large volumes of data. In addition, our loader applications ensure the best representation of Snowplow events. That includes automatically adjusting the tables to account for your custom data, whether it's new event types or new fields. **Batch-based (AWS):** For more details on the loading flow, see the [RDB Loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/) reference page, where you will find additional information and diagrams. **Via Delta Lake (Azure, GCP):** For more details on the loading flow, see the [Lake Loader](/docs/api-reference/loaders-storage-targets/lake-loader/) reference page, where you will find additional information and diagrams. **Streaming:** For more details on the loading flow, see the [Databricks Streaming Loader](/docs/api-reference/loaders-storage-targets/databricks-streaming-loader/) reference page, where you will find additional information and diagrams. *** ## Snowplow data format in Databricks All events are loaded into a single table (`events`). There are dedicated columns for [atomic fields](/docs/fundamentals/canonical-event/), such as `app_id`, `user_id` and so on: | app\_id | collector\_tstamp | ... | event\_id | ... | user\_id | ... 
| | ------- | ----------------------- | --- | ------------------------------------ | --- | ------------------------------------ | --- | | website | 2025-05-06 12:30:05.123 | ... | c6ef3124-b53a-4b13-a233-0088f79dcbcb | ... | c94f860b-1266-4dad-ae57-3a36a414a521 | ... | Snowplow data also includes customizable [self-describing events](/docs/fundamentals/events/#self-describing-events) and [entities](/docs/fundamentals/entities/). These use [schemas](/docs/fundamentals/schemas/) to define which fields should be present, and of what type (e.g. string, number). For self-describing events and entities, there are additional columns, like so: | app\_id | ... | unstruct\_event\_com\_acme\_button\_press\_1 | contexts\_com\_acme\_product\_1 | | ------- | --- | ------------------------------------------------------- | ---------------------------------------------------------------- | | website | ... | data for your custom `button_press` event (as `STRUCT`) | data for your custom `product` entities (as `ARRAY` of `STRUCT`) | Note: - "unstruct\[ured] event" and "context" are the legacy terms for self-describing events and entities, respectively - the `_1` suffix represents the major version of the schema (e.g. `1-x-y`) You can learn more [in the API reference section](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/). > **Tip:** Check this [guide on querying](/docs/destinations/warehouses-lakes/querying-data/?warehouse=databricks) Snowplow data. --- # Load Snowplow data to Delta Lake > Send Snowplow data to Delta Lake for analytics and data processing with ACID transactions, schema evolution, and time travel capabilities. > Source: https://docs.snowplow.io/docs/destinations/warehouses-lakes/delta/ > **Info:** The Delta Lake integration is available for Snowplow pipelines running on **AWS**, **Azure** and **GCP**. Delta Lake is an open table format for data lake architectures. 
The Snowplow Delta integration allows you to load enriched event data (as well as [failed events](/docs/fundamentals/failed-events/)) into Delta tables in your data lake for analytics, data modeling, and more.

Data in Delta Lake can be consumed using various tools and products, for example:

- Amazon Athena
- Apache Spark or Amazon EMR
- Databricks¹
- Microsoft Synapse Analytics
- Microsoft Fabric

_¹ Delta+Databricks combination is currently not supported for AWS pipelines. The loader uses DynamoDB tables for mutually exclusive writes to S3, a feature of Delta. Databricks, however, does not support this (as of September 2025). This means that it’s not possible to alter the data via Databricks (e.g. to run `OPTIMIZE` or to delete PII)._

> **Note:** Currently, we only support loading to a lake in the same cloud as your Snowplow pipeline.

## What you will need

Connecting to a destination always involves configuring cloud resources and granting permissions. It's a good idea to make sure you have sufficient privileges before you begin the setup process.

> **Tip:** The list below is just a heads up. The Snowplow Console will guide you through the exact steps to set up the integration.
Keep in mind that you will need to be able to: **AWS:** - Provide an S3 bucket - Create a DynamoDB table (required for file locking) - Create an IAM role with the following permissions: - For the S3 bucket: - `s3:ListBucket` - `s3:GetObject` - `s3:PutObject` - `s3:DeleteObject` - `s3:ListBucketMultipartUploads` - `s3:AbortMultipartUpload` - For the DynamoDB table: - `dynamodb:DescribeTable` - `dynamodb:Query` - `dynamodb:Scan` - `dynamodb:GetItem` - `dynamodb:PutItem` - `dynamodb:UpdateItem` - `dynamodb:DeleteItem` - Schedule a regular job to optimize the lake **GCP:** - Provide a GCS bucket - Create a service account with the `roles/storage.objectUser` role on the bucket - Create and provide a service account key **Azure:** - Provide an ADLS storage container - Create a new App Registration with the `Storage Blob Data Contributor` permission - Provide the registration tenant ID, client ID and client secret *** ## Getting started You can add a Delta Lake destination through the Snowplow Console. (For self-hosted customers, please refer to the [Loader API reference](/docs/api-reference/loaders-storage-targets/lake-loader/) instead.) ### Step 1: Create a connection 1. In Console, navigate to **Destinations** > **Connections** 2. Select **Set up connection** 3. Choose **Loader connection**, then **Delta** 4. Follow the steps to provide all the necessary values 5. Click **Complete setup** to create the connection ### Step 2: Create a loader 1. In Console, navigate to **Destinations** > **Destination list**. Switch to the **Available** tab and select **Delta** 2. **Select a pipeline**: choose the pipeline where you want to deploy the loader. 3. **Select your connection**: choose the connection you configured in step 1. 4. **Select the type of events**: enriched events or failed events 5. Click **Continue** to deploy the loader You can review active destinations and loaders by navigating to **Destinations** > **Destination list**. 
We recommend scheduling regular [lake maintenance jobs](/docs/api-reference/loaders-storage-targets/lake-loader/maintenance/?lake-format=delta) to ensure the best long-term performance. ## How loading works The Snowplow data loading process is engineered for large volumes of data. In addition, our loader applications ensure the best representation of Snowplow events. That includes automatically adjusting the tables to account for your custom data, whether it's new event types or new fields. For more details on the loading flow, see the [Lake Loader](/docs/api-reference/loaders-storage-targets/lake-loader/) reference page, where you will find additional information and diagrams. ## Snowplow data format in Delta Lake All events are loaded into a single table (`events`). There are dedicated columns for [atomic fields](/docs/fundamentals/canonical-event/), such as `app_id`, `user_id` and so on: | app\_id | collector\_tstamp | ... | event\_id | ... | user\_id | ... | | ------- | ----------------------- | --- | ------------------------------------ | --- | ------------------------------------ | --- | | website | 2025-05-06 12:30:05.123 | ... | c6ef3124-b53a-4b13-a233-0088f79dcbcb | ... | c94f860b-1266-4dad-ae57-3a36a414a521 | ... | Snowplow data also includes customizable [self-describing events](/docs/fundamentals/events/#self-describing-events) and [entities](/docs/fundamentals/entities/). These use [schemas](/docs/fundamentals/schemas/) to define which fields should be present, and of what type (e.g. string, number). For self-describing events and entities, there are additional columns, like so: | app\_id | ... | unstruct\_event\_com\_acme\_button\_press\_1 | contexts\_com\_acme\_product\_1 | | ------- | --- | ------------------------------------------------------- | ---------------------------------------------------------------- | | website | ... 
| data for your custom `button_press` event (as `STRUCT`) | data for your custom `product` entities (as `ARRAY` of `STRUCT`) |

Note:

- "unstruct\[ured] event" and "context" are the legacy terms for self-describing events and entities, respectively
- the `_1` suffix represents the major version of the schema (e.g. `1-x-y`)

You can learn more [in the API reference section](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/).

> **Tip:** Check this [guide on querying](/docs/destinations/warehouses-lakes/querying-data/?warehouse=databricks) Snowplow data. (You will need a query engine such as Spark SQL or Databricks to query Delta tables.)

---

# Load Snowplow data to Apache Iceberg

> Send Snowplow data to Apache Iceberg data lakes for analytics and data processing with open table format, schema evolution, and cross-engine compatibility.
> Source: https://docs.snowplow.io/docs/destinations/warehouses-lakes/iceberg/

> **Info:** The Iceberg integration is available for Snowplow pipelines running on **AWS** and **GCP** only.

Apache Iceberg is an open table format for data lake architectures. The Snowplow Iceberg integration allows you to load enriched event data (as well as [failed events](/docs/fundamentals/failed-events/)) into Iceberg tables in your data lake for analytics, data modeling, and more.

Iceberg data can be consumed using various tools and products, for example:

- Amazon Athena
- Amazon Redshift Spectrum
- Apache Spark or Amazon EMR
- Snowflake
- ClickHouse

We currently support the following catalogs:

| Catalog | AWS | GCP |
| ------- | --- | --- |
| Glue | ✅ | ❌ |
| REST¹ | ✅ | ✅ |

_¹ The REST catalog has only been tested with the Snowflake Open Catalog implementation._

## What you will need

Connecting to a destination always involves configuring cloud resources and granting permissions. It's a good idea to make sure you have sufficient privileges before you begin the setup process.

> **Tip:** The list below is just a heads up.
The Snowplow Console will guide you through the exact steps to set up the integration. Keep in mind that you will need to be able to: **REST:** - Specify your Snowflake Open Catalog account id and region, as well as namespace - Create a service connection to the catalog and provide the client id and client secret **AWS Glue:** - Specify your AWS account ID - Provide an S3 bucket and an AWS Glue database - Create an IAM role with the following permissions: - For the S3 bucket: - `s3:ListBucket` - `s3:GetObject` - `s3:PutObject` - `s3:DeleteObject` - For the Glue database: - `glue:CreateTable` - `glue:GetTable` - `glue:UpdateTable` - Schedule a regular job to optimize the lake *** ## Getting started You can add an Iceberg destination through the Snowplow Console. (For self-hosted customers, please refer to the [Loader API reference](/docs/api-reference/loaders-storage-targets/lake-loader/) instead.) ### Step 1: Create a connection 1. In Console, navigate to **Destinations** > **Connections** 2. Select **Set up connection** 3. Choose **Loader connection**, then **Iceberg** 4. Follow the steps to provide all the necessary values 5. Click **Complete setup** to create the connection ### Step 2: Create a loader 1. In Console, navigate to **Destinations** > **Destination list**. Switch to the **Available** tab and select **Iceberg** 2. **Select a pipeline**: choose the pipeline where you want to deploy the loader. 3. **Select your connection**: choose the connection you configured in step 1. 4. **Select the type of events**: enriched events or failed events 5. Click **Continue** to deploy the loader You can review active destinations and loaders by navigating to **Destinations** > **Destination list**. For AWS Glue, we recommend scheduling regular [lake maintenance jobs](/docs/api-reference/loaders-storage-targets/lake-loader/maintenance/?lake-format=iceberg) to ensure the best long-term performance. 
## How loading works The Snowplow data loading process is engineered for large volumes of data. In addition, our loader applications ensure the best representation of Snowplow events. That includes automatically adjusting the tables to account for your custom data, whether it's new event types or new fields. For more details on the loading flow, see the [Lake Loader](/docs/api-reference/loaders-storage-targets/lake-loader/) reference page, where you will find additional information and diagrams. ## Snowplow data format in Iceberg All events are loaded into a single table (`events`). There are dedicated columns for [atomic fields](/docs/fundamentals/canonical-event/), such as `app_id`, `user_id` and so on: | app\_id | collector\_tstamp | ... | event\_id | ... | user\_id | ... | | ------- | ----------------------- | --- | ------------------------------------ | --- | ------------------------------------ | --- | | website | 2025-05-06 12:30:05.123 | ... | c6ef3124-b53a-4b13-a233-0088f79dcbcb | ... | c94f860b-1266-4dad-ae57-3a36a414a521 | ... | Snowplow data also includes customizable [self-describing events](/docs/fundamentals/events/#self-describing-events) and [entities](/docs/fundamentals/entities/). These use [schemas](/docs/fundamentals/schemas/) to define which fields should be present, and of what type (e.g. string, number). For self-describing events and entities, there are additional columns, like so: | app\_id | ... | unstruct\_event\_com\_acme\_button\_press\_1 | contexts\_com\_acme\_product\_1 | | ------- | --- | ------------------------------------------------------- | ---------------------------------------------------------------- | | website | ... 
| data for your custom `button_press` event (as `STRUCT`) | data for your custom `product` entities (as `ARRAY` of `STRUCT`) | Note: - "unstruct\[ured] event" and "context" are the legacy terms for self-describing events and entities, respectively - the `_1` suffix represents the major version of the schema (e.g. `1-x-y`) You can learn more [in the API reference section](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/). > **Tip:** Check this [guide on querying](/docs/destinations/warehouses-lakes/querying-data/?warehouse=databricks) Snowplow data. (You will need a query engine such as Spark SQL or Snowflake to query Iceberg tables.) --- # Supported warehouse and data lake destinations > An overview of the available options for storing Snowplow data in data warehouses and lakes > Source: https://docs.snowplow.io/docs/destinations/warehouses-lakes/ Data warehouses and data lakes are primary destinations for Snowplow data. For other options, see the [destinations overview](/docs/fundamentals/destinations/) page. ### Data warehouses - [Snowflake](/docs/destinations/warehouses-lakes/snowflake/) - [Databricks](/docs/destinations/warehouses-lakes/databricks/) - [BigQuery](/docs/destinations/warehouses-lakes/bigquery/) - [Redshift](/docs/destinations/warehouses-lakes/redshift/) ### Data lakes - [Iceberg](/docs/destinations/warehouses-lakes/iceberg/) - [Delta Lake](/docs/destinations/warehouses-lakes/delta/) --- # How to query Snowplow data in the warehouse > Introduction to querying Snowplow data in warehouses including self-describing events, entities, and handling duplicate events with SQL techniques. > Source: https://docs.snowplow.io/docs/destinations/warehouses-lakes/querying-data/ You will typically find most of your Snowplow data in the `events` table. 
If you are using Redshift, there will be extra tables for [self-describing events](/docs/fundamentals/events/#self-describing-events) and [entities](/docs/fundamentals/entities/) — see [below](#self-describing-events). Please refer to [the structure of Snowplow data](/docs/fundamentals/canonical-event/) for the principles behind our approach, as well as the descriptions of the various standard columns.

> **Tip:** Querying the `events` table directly can be useful for exploring your events or building custom analytics. However, for many common use cases it's much easier to use our [data models](/docs/modeling-your-data/modeling-your-data-with-dbt/), which provide a pre-aggregated view of your data.

The simplest query could look like this:

```sql
SELECT * FROM <events_table> WHERE event_name = 'page_view'
```

You will need to replace `<events_table>` with the appropriate location — the database, schema and table name will depend on your configuration.

> **Warning:** With large data volumes (read: any production system), you should always include a filter on the partition key (normally, `collector_tstamp`), for example:
>
> ```sql
> WHERE ... AND collector_tstamp between timestamp '2023-10-23' and timestamp '2023-11-23'
> ```
>
> This ensures that you read from the minimum number of (micro-)partitions necessary, making the query run much faster and reducing compute cost (where applicable).

## Self-describing events

[Self-describing events](/docs/fundamentals/events/#self-describing-events) can contain their own set of fields, defined by their [schema](/docs/fundamentals/schemas/).

**Redshift:** For Redshift users, self-describing events are not part of the standard `events` table. Instead, each type of event is in its own table. The table name and the fields in the table will be determined by the event's schema. See [how schemas translate to the warehouse](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/) for more details.
You can query just the table for that particular self-describing event, if that's all that's required for your analysis, or join that table back to the `events` table:

```sql
SELECT
  ...
FROM <schema>.<events_table> ev
LEFT JOIN <schema>.my_example_event_table sde
  ON sde.root_id = ev.event_id AND sde.root_tstamp = ev.collector_tstamp
```

> **Warning:** You may need to take care of [duplicate events](#dealing-with-duplicates).

**BigQuery:** Each type of self-describing event is in a dedicated `RECORD`-type column. The column name and the fields in the record will be determined by the event's schema. See [how schemas translate to the warehouse](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/) for more details.

You can query fields in the self-describing event like so:

```sql
SELECT
  ...
  unstruct_event_my_example_event_1.my_field,
  ...
FROM <events_table>
```

> **Note:** The column name produced by previous versions of the BigQuery Loader (<2.0.0) would contain the full schema version, e.g. `unstruct_event_my_example_event_1_0_0`. The [BigQuery Loader upgrade guide](/docs/api-reference/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/) describes how to enable the legacy column names in the 2.0.0 loader.

**Snowflake:** Each type of self-describing event is in a dedicated `OBJECT`-type column. The column name will be determined by the event's schema. See [how schemas translate to the warehouse](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/) for more details.

You can query fields in the self-describing event like so:

```sql
SELECT
  ...
  unstruct_event_my_example_event_1:myField::varchar, -- field will be variant type so important to cast
  ...
FROM <events_table>
```

**Databricks, Spark SQL:** Each type of self-describing event is in a dedicated `STRUCT`-type column. The column name and the fields in the `STRUCT` will be determined by the event's schema.
See [how schemas translate to the warehouse](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/) for more details. You can query fields in the self-describing event by extracting them like so:

```sql
SELECT
  ...
  unstruct_event_my_example_event_1.my_field,
  ...
FROM <events_table>
```

**Synapse Analytics:** Each type of self-describing event is in a dedicated column in JSON format. The column name will be determined by the event's schema. See [how schemas translate to the warehouse](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/) for more details.

You can query fields in the self-describing event like so:

```sql
SELECT
  ...
  JSON_VALUE(unstruct_event_my_example_event_1, '$.my_field')
  ...
FROM OPENROWSET(BULK 'events', DATA_SOURCE = '<data_source>', FORMAT = 'DELTA') AS events
```

***

## Entities

[Entities](/docs/fundamentals/entities/) (also known as contexts) provide extra information about the event, such as data describing a product or a user.

**Redshift:** For Redshift users, entities are not part of the standard `events` table. Instead, each type of entity is in its own table. The table name and the fields in the table will be determined by the entity's schema. See [how schemas translate to the warehouse](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/) for more details.

The entities can be joined back to the core `events` table by the following, which is a one-to-one join (for a single record entity) or a one-to-many join (for a multi-record entity), assuming no duplicates.

```sql
SELECT
  ...
FROM <schema>.<events_table> ev
LEFT JOIN
  -- assumes no duplicates, and will return all events regardless of if they have this entity
  <schema>.my_entity ent
  ON ent.root_id = ev.event_id AND ent.root_tstamp = ev.collector_tstamp
```

> **Warning:** You may need to take care of [duplicate events](#dealing-with-duplicates).

**BigQuery:** Each type of entity is in a dedicated `REPEATED RECORD`-type column.
The column name and the fields in the record will be determined by the entity's schema. See [how schemas translate to the warehouse](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/) for more details.

You can query a single entity's fields by extracting them like so:

```sql
SELECT
  ...
  contexts_my_entity_1[SAFE_OFFSET(0)].my_field AS my_field,
  ...
FROM <events_table>
```

Alternatively, you can use the [`unnest`](https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening_arrays) function to explode out the array into one row per entity value.

```sql
SELECT
  ...
  my_ent.my_field AS my_field,
  ...
FROM <events_table>
LEFT JOIN unnest(contexts_my_entity_1) AS my_ent -- left join to avoid discarding events without values in this entity
```

> **Note:** The column name produced by previous versions of the BigQuery Loader (<2.0.0) would contain the full schema version, e.g. `contexts_my_entity_1_0_0`.

**Snowflake:** Each type of entity is in a dedicated `ARRAY`-type column. The column name will be determined by the entity's schema. See [how schemas translate to the warehouse](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/) for more details.

You can query a single entity's fields by extracting them like so:

```sql
SELECT
  ...
  contexts_my_entity_1[0]:myField::varchar, -- field will be variant type so important to cast
  ...
FROM <events_table>
```

Alternatively, you can use the [`lateral flatten`](https://docs.snowflake.com/en/sql-reference/functions/flatten) function to explode out the array into one row per entity value.

```sql
SELECT
  ...
  r.value:myField::varchar, -- field will be variant type so important to cast
  ...
FROM <events_table> AS t,
  LATERAL FLATTEN(input => t.contexts_my_entity_1) r
```

**Databricks, Spark SQL:** Each type of entity is in a dedicated `ARRAY`-type column. The column name and the fields in the `STRUCT` will be determined by the entity's schema.
See [how schemas translate to the warehouse](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/) for more details.

You can query a single entity's fields by extracting them like so:

```sql
SELECT
  ...
  contexts_my_entity_1[0].my_field,
  ...
FROM <events_table>
```

Alternatively, you can use the [`LATERAL VIEW`](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-lateral-view.html) clause combined with [`EXPLODE`](https://docs.databricks.com/sql/language-manual/functions/explode.html) to explode out the array into one row per entity value.

```sql
SELECT
  ...
  my_ent.my_field,
  ...
FROM <events_table>
LATERAL VIEW EXPLODE(contexts_my_entity_1) AS my_ent
```

**Synapse Analytics:** Each type of entity is in a dedicated column in JSON format. The column name will be determined by the entity's schema. See [how schemas translate to the warehouse](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/) for more details.

You can query a single entity's fields by extracting them like so:

```sql
SELECT
  ...
  JSON_VALUE(contexts_my_entity_1, '$[0].my_field')
  ...
FROM OPENROWSET(BULK 'events', DATA_SOURCE = '<data_source>', FORMAT = 'DELTA') AS events
```

Alternatively, you can use the [`CROSS APPLY` clause combined with `OPENJSON`](https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-parquet-nested-types#project-values-from-repeated-columns) to explode out the array into one row per entity value.

```sql
SELECT
  ...
  JSON_VALUE(my_ent.[value], '$.my_field')
  ...
FROM OPENROWSET(BULK 'events', DATA_SOURCE = '<data_source>', FORMAT = 'DELTA') AS events
CROSS APPLY OPENJSON(contexts_my_entity_1) AS my_ent
```

***

## Failed events

See [Exploring failed events](/docs/monitoring/exploring-failed-events/).
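As a language-neutral illustration of the entity queries above: BigQuery `unnest`, Snowflake `LATERAL FLATTEN`, Databricks `EXPLODE`, and Synapse `OPENJSON` all produce one row per entity value while keeping events that have no entity values. The sketch below emulates that transformation in plain Python; the event shape and field names are hypothetical examples, not a Snowplow API.

```python
# Illustrative sketch only: emulates the "one row per entity value" explosion
# used in the warehouse-specific queries above. Column/field names are made up.

events = [
    {"event_id": "e1", "contexts_my_entity_1": [{"my_field": "a"}, {"my_field": "b"}]},
    {"event_id": "e2", "contexts_my_entity_1": []},  # event with no entity values
]

def explode_entities(rows, entity_column, field):
    """One output row per entity value; events without entities are kept
    with a None value, mirroring the LEFT JOIN variants in the SQL above."""
    out = []
    for row in rows:
        entities = row[entity_column] or [None]
        for entity in entities:
            out.append({
                "event_id": row["event_id"],
                field: entity[field] if entity else None,
            })
    return out

rows = explode_entities(events, "contexts_my_entity_1", "my_field")
print(rows[0])  # {'event_id': 'e1', 'my_field': 'a'}
```

Note how `e2`, which has no entity values, still yields a row with `None`, matching the left-join behavior of the SQL examples.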
## Dealing with duplicates

In some cases, your data might contain duplicate events (full deduplication _before_ the data lands in the warehouse is optionally available for [Redshift, Snowflake and Databricks on AWS](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/deduplication/)). While our [data models](/docs/modeling-your-data/modeling-your-data-with-dbt/) deal with duplicates for you, there may be cases where you need to de-duplicate the events table yourself.

**Redshift:** In Redshift, you must first generate a `ROW_NUMBER()` on your events and use this to de-duplicate.

```sql
WITH unique_events AS (
  SELECT
    ...
    ROW_NUMBER() OVER (PARTITION BY a.event_id ORDER BY a.collector_tstamp) AS event_id_dedupe_index
  FROM <events_table> a
)

SELECT
  ...
FROM unique_events
WHERE event_id_dedupe_index = 1
```

Things get a little more complicated if you want to join your event data with a table containing [entities](#entities). Suppose your entity is called `my_entity`. If you know that each of your events has at most 1 such entity attached, the de-duplication requires the use of a row number over `event_id` to get each unique event:

```sql
WITH unique_events AS (
  SELECT
    ev.*,
    ROW_NUMBER() OVER (PARTITION BY ev.event_id ORDER BY ev.collector_tstamp) AS event_id_dedupe_index
  FROM <schema>.<events_table> ev
),

unique_my_entity AS (
  SELECT
    ent.*,
    ROW_NUMBER() OVER (PARTITION BY ent.root_id ORDER BY ent.root_tstamp) AS my_entity_index
  FROM <schema>.my_entity_1 ent
)

SELECT
  ...
FROM unique_events u_ev
LEFT JOIN unique_my_entity u_ent
  ON u_ent.root_id = u_ev.event_id
  AND u_ent.root_tstamp = u_ev.collector_tstamp
  AND u_ent.my_entity_index = 1
WHERE u_ev.event_id_dedupe_index = 1
```

If your events might have more than one `my_entity` attached, the logic is slightly more complex.

**Details**

First, de-duplicate the events table in the same way as above, but also keep track of the number of duplicates (see `event_id_dedupe_count` below). In the entity table, generate a row number per unique combination of _all_ fields in the record. Then join on `root_id` and `root_tstamp` as before, but with an _additional_ clause that the row number is a multiple of the number of duplicates, to support the 1-to-many join. This ensures all duplicates are removed while retaining all original records of the entity. This may look like a weird join condition, but it works. Unfortunately, listing all fields manually can be quite tedious, but we have added support for this in the [de-duplication logic](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/deduplication/) of our dbt packages.

```sql
WITH unique_events AS (
  SELECT
    ev.*,
    ROW_NUMBER() OVER (PARTITION BY ev.event_id ORDER BY ev.collector_tstamp) AS event_id_dedupe_index,
    COUNT(*) OVER (PARTITION BY ev.event_id) AS event_id_dedupe_count
  FROM <schema>.<events_table> ev
),

unique_my_entity AS (
  SELECT
    ent.*,
    ROW_NUMBER() OVER (
      PARTITION BY ent.root_id, ent.root_tstamp, ... /* all columns listed here for your entity */
      ORDER BY ent.root_tstamp
    ) AS my_entity_index
  FROM <schema>.my_entity_1 ent
)

SELECT
  ...
FROM unique_events u_ev
LEFT JOIN unique_my_entity u_ent
  ON u_ent.root_id = u_ev.event_id
  AND u_ent.root_tstamp = u_ev.collector_tstamp
  AND mod(u_ent.my_entity_index, u_ev.event_id_dedupe_count) = 0
WHERE u_ev.event_id_dedupe_index = 1
```

**BigQuery:** In BigQuery it is as simple as using a `QUALIFY` statement over your initial query:

```sql
SELECT
  ...
FROM <events_table> a
QUALIFY ROW_NUMBER() OVER (PARTITION BY a.event_id ORDER BY a.collector_tstamp) = 1
```

**Snowflake:** In Snowflake it is as simple as using a `QUALIFY` statement over your initial query:

```sql
SELECT
  ...
FROM <events_table> a
QUALIFY ROW_NUMBER() OVER (PARTITION BY a.event_id ORDER BY a.collector_tstamp) = 1
```

**Databricks, Spark SQL:** In Databricks it is as simple as using a `QUALIFY` statement over your initial query:

```sql
SELECT
  ...
FROM <events_table> a
QUALIFY ROW_NUMBER() OVER (PARTITION BY a.event_id ORDER BY a.collector_tstamp) = 1
```

**Synapse Analytics:** In Synapse you must first generate a `ROW_NUMBER()` on your events and use this to de-duplicate.

```sql
WITH unique_events AS (
  SELECT
    ...
    ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY collector_tstamp) AS event_id_dedupe_index
  FROM OPENROWSET(BULK 'events', DATA_SOURCE = '<data_source>', FORMAT = 'DELTA') AS events
)

SELECT
  ...
FROM unique_events
WHERE event_id_dedupe_index = 1
```

***

---

# Load Snowplow data to Amazon Redshift

> Send Snowplow data to Amazon Redshift for analytics and data warehousing with automatic table creation, schema evolution, and optimized batch loading from S3.
> Source: https://docs.snowplow.io/docs/destinations/warehouses-lakes/redshift/

> **Info:** The Redshift integration is available for Snowplow pipelines running on **AWS** only.

The Snowplow Redshift integration allows you to load enriched event data directly into your Redshift cluster (including Redshift serverless) for analytics, data modeling, and more.

## What you will need

Connecting to a destination always involves configuring cloud resources and granting permissions. It's a good idea to make sure you have sufficient privileges before you begin the setup process.

> **Tip:** The list below is just a heads-up. The Snowplow Console will guide you through the exact steps to set up the integration.

Keep in mind that you will need to be able to:

- Provide your Redshift cluster endpoint and connection details
- Allow-list Snowplow IP addresses
- Specify the desired database and schema names
- Create a user and a role with the following permissions:
  - Schema ownership (`CREATE SCHEMA ... AUTHORIZATION`)
  - `SELECT` on system tables (`svv_table_info`, `svv_interleaved_columns`, `stv_interleaved_counts`) — this is required for maintenance jobs

## Getting started

You can add a Redshift destination through the Snowplow Console.
(For self-hosted customers, please refer to the [Loader API reference](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/) instead.)

### Step 1: Create a connection

1. In Console, navigate to **Destinations** > **Connections**
2. Select **Set up connection**
3. Choose **Loader connection**, then **Redshift**
4. Follow the steps to provide all the necessary values
5. Click **Complete setup** to create the connection

### Step 2: Create a loader

1. In Console, navigate to **Destinations** > **Destination list**. Switch to the **Available** tab and select **Redshift**
2. **Select a pipeline**: choose the pipeline where you want to deploy the loader.
3. **Select your connection**: choose the connection you configured in step 1.
4. Click **Continue** to deploy the loader

You can review active destinations and loaders by navigating to **Destinations** > **Destination list**.

## How loading works

The Snowplow data loading process is engineered for large volumes of data. In addition, our loader applications ensure the best representation of Snowplow events. That includes automatically adjusting the tables to account for your custom data, whether it's new event types or new fields.

For more details on the loading flow, see the [RDB Loader](/docs/api-reference/loaders-storage-targets/snowplow-rdb-loader/) reference page, where you will find additional information and diagrams.

## Snowplow data format in Redshift

The event data is split across multiple tables. The main table (`events`) contains the [atomic fields](/docs/fundamentals/canonical-event/), such as `app_id`, `user_id` and so on:

| app\_id | collector\_tstamp | ... | event\_id | ... | user\_id | ... |
| ------- | ----------------------- | --- | ------------------------------------ | --- | ------------------------------------ | --- |
| website | 2025-05-06 12:30:05.123 | ... | c6ef3124-b53a-4b13-a233-0088f79dcbcb | ... | c94f860b-1266-4dad-ae57-3a36a414a521 | ... |

Snowplow data also includes customizable [self-describing events](/docs/fundamentals/events/#self-describing-events) and [entities](/docs/fundamentals/entities/). These use [schemas](/docs/fundamentals/schemas/) to define which fields should be present, and of what type (e.g. string, number). For each type of self-describing event and entity, there are additional tables that can be joined with the main table:

**unstruct\_event\_com\_acme\_button\_press\_1**

| root\_id | root\_tstamp | button\_name | button\_color | ... |
| ------------------------------------ | ----------------------- | ------------ | ------------- | --- |
| c6ef3124-b53a-4b13-a233-0088f79dcbcb | 2025-05-06 12:30:05.123 | Cancel | red | ... |

**contexts\_com\_acme\_product\_1**

| root\_id | root\_tstamp | name | price | ... |
| ------------------------------------ | ----------------------- | ------ | ----- | --- |
| c6ef3124-b53a-4b13-a233-0088f79dcbcb | 2025-05-06 12:30:05.123 | Salt | 2.60 | ... |
| c6ef3124-b53a-4b13-a233-0088f79dcbcb | 2025-05-06 12:30:05.123 | Pepper | 3.10 | ... |

Note:

- "unstruct\[ured] event" and "context" are the legacy terms for self-describing events and entities, respectively
- the `_1` suffix represents the major version of the schema (e.g. `1-x-y`)

You can learn more [in the API reference section](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/).

> **Tip:** Check this [guide on querying](/docs/destinations/warehouses-lakes/querying-data/?warehouse=redshift) Snowplow data.

---

# Load Snowplow data to Snowflake

> Send Snowplow data to Snowflake for analytics and data warehousing with automatic table management, schema evolution, and efficient data loading via Snowpipe or batch.
> Source: https://docs.snowplow.io/docs/destinations/warehouses-lakes/snowflake/

> **Info:** The Snowflake integration is available for Snowplow pipelines running on **AWS**, **Azure** and **GCP**.
The Snowplow Snowflake integration allows you to load enriched event data (as well as [failed events](/docs/fundamentals/failed-events/)) directly into your Snowflake warehouse for analytics, data modeling, and more.

## What you will need

Connecting to a destination always involves configuring cloud resources and granting permissions. It's a good idea to make sure you have sufficient privileges before you begin the setup process.

> **Tip:** The list below is just a heads-up. The Snowplow Console will guide you through the exact steps to set up the integration.

Keep in mind that you will need to be able to:

- Provide your Snowflake account locator URL, cloud provider and region
- Allow-list Snowplow IP addresses
- Generate a key pair for key-based authentication
- Specify the desired database and schema names, as well as a warehouse name
- Create a role with the following permissions:
  - `USAGE`, `OPERATE` on warehouse (for testing the connection and monitoring, e.g. as part of the [Data Quality Dashboard](/docs/monitoring/#data-quality-dashboard))
  - `USAGE` on database
  - `ALL` privileges on the target schema

## Getting started

You can add a Snowflake destination through the Snowplow Console. (For self-hosted customers, please refer to the [Loader API reference](/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/) instead.)

### Step 1: Create a connection

1. In Console, navigate to **Destinations** > **Connections**
2. Select **Set up connection**
3. Choose **Loader connection**, then **Snowflake**
4. Follow the steps to provide all the necessary values
5. Click **Complete setup** to create the connection

### Step 2: Create a loader

1. In Console, navigate to **Destinations** > **Destination list**. Switch to the **Available** tab and select **Snowflake**
2. **Select a pipeline**: choose the pipeline where you want to deploy the loader.
3. **Select your connection**: choose the connection you configured in step 1.
4.
**Select the type of events**: enriched events or failed events
5. Click **Continue** to deploy the loader

You can review active destinations and loaders by navigating to **Destinations** > **Destination list**.

## How loading works

The Snowplow data loading process is engineered for large volumes of data. In addition, our loader applications ensure the best representation of Snowplow events. That includes automatically adjusting the tables to account for your custom data, whether it's new event types or new fields.

For more details on the loading flow, see the [Snowflake Streaming Loader](/docs/api-reference/loaders-storage-targets/snowflake-streaming-loader/) reference page, where you will find additional information and diagrams.

## Snowplow data format in Snowflake

All events are loaded into a single table (`events`). There are dedicated columns for [atomic fields](/docs/fundamentals/canonical-event/), such as `app_id`, `user_id` and so on:

| app\_id | collector\_tstamp | ... | event\_id | ... | user\_id | ... |
| ------- | ----------------------- | --- | ------------------------------------ | --- | ------------------------------------ | --- |
| website | 2025-05-06 12:30:05.123 | ... | c6ef3124-b53a-4b13-a233-0088f79dcbcb | ... | c94f860b-1266-4dad-ae57-3a36a414a521 | ... |

Snowplow data also includes customizable [self-describing events](/docs/fundamentals/events/#self-describing-events) and [entities](/docs/fundamentals/entities/). These use [schemas](/docs/fundamentals/schemas/) to define which fields should be present, and of what type (e.g. string, number). For self-describing events and entities, there are additional columns, like so:

| app\_id | ... | unstruct\_event\_com\_acme\_button\_press\_1 | contexts\_com\_acme\_product\_1 |
| ------- | --- | ----------------------------------------------------------------- | -------------------------------------------------------------- |
| website | ... | data for your custom `button_press` event (as a `VARIANT` object) | data for your custom `product` entities (as a `VARIANT` array) |

Note:

- "unstruct\[ured] event" and "context" are the legacy terms for self-describing events and entities, respectively
- the `_1` suffix represents the major version of the schema (e.g. `1-x-y`)

You can learn more [in the API reference section](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/).

> **Tip:** Check this [guide on querying](/docs/destinations/warehouses-lakes/querying-data/?warehouse=snowflake) Snowplow data.

---

# Data structure management overview

> Create, manage, and update data structures (schemas) using Snowplow Console UI, Data Structures API, Snowplow CLI, or Iglu for community users.
> Source: https://docs.snowplow.io/docs/event-studio/data-structures/

This section explains how to create, manage, and update [data structures](/docs/fundamentals/schemas/) (schemas). Snowplow provides different options for data structure management:

- Snowplow Console UI
- [Data structures API](/docs/event-studio/programmatic-management/data-structures-api/)
- [Snowplow CLI](/docs/event-studio/programmatic-management/snowplow-cli/)
- For Community users: [Iglu](/docs/api-reference/iglu/iglu-repositories/iglu-server/)

## Create a data structure

To create a data structure within Console, go to **Data collection** > **Data structures** and click the **Create data structure** button. You can use the builder to create simple data structures with basic types and validation rules, or the JSON editor for more complex data structures.

> **Info:** The data structure builder supports the following types:
>
> - String
> - Enumerated list
> - Integer
> - Decimal
> - Boolean
>
> For more complex data structures that require nesting or more advanced data types, use the [JSON editor](/docs/event-studio/data-structures/json-editor/).
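For orientation, a simple data structure built from these basic types corresponds to a self-describing JSON schema. Below is a minimal sketch, written as a Python dict for illustration: the vendor, name, and properties are hypothetical, while the `self` section and `$schema` URI follow the standard self-describing schema format.

```python
import json

# Hypothetical example of the kind of self-describing JSON schema a simple
# data structure corresponds to. Vendor, name, and fields are made up.
button_press_schema = {
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "description": "Schema for a button press event",
    "self": {
        "vendor": "com.acme",  # vendors let you organize data structures, e.g. by team
        "name": "button_press",
        "format": "jsonschema",
        "version": "1-0-0",    # SchemaVer: model-revision-addition
    },
    "type": "object",
    "properties": {
        "button_name": {"type": "string", "maxLength": 255},
        "press_count": {"type": ["integer", "null"], "minimum": 0},
    },
    "required": ["button_name"],
    "additionalProperties": False,
}

print(json.dumps(button_press_schema, indent=2))
```

The `version` field uses SchemaVer, which is covered in the versioning section further down.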
To understand all available JSON Schema validation options, see the [JSON Schema reference](/docs/api-reference/json-schema-reference/).

Populate the general information, such as the name, a description, and the vendor. The vendor allows you to organize your data structures, for example, by teams. Snowplow will automatically generate the tracking URL to be referenced in your tracking code.

![](/assets/images/data-structures-1-5b794bc4236fadba42000ce5d714b43f.png)

When creating a new [data structure](/docs/fundamentals/schemas/), you can add one or multiple properties. For each property, you can set a name, description, its type and a possible enumeration of allowed values (for type `string`). You can also set additional constraints to define if this property should be optional or mandatory, and if `null` values are allowed.

![](/assets/images/data-structures-2-b23aa8f4043bced0dfd2ac52eb0267d5.png)

Click **Save** on the Property dialog box to save your property changes. Clicking **Save** on the data structure page will save your data structure as a draft. At this point, your data structure is not yet deployed to your development environment and cannot be used for event validation. When you're ready to test your data structure, you'll need to deploy it from the draft state to your development environment.

## Working with drafts

When you create a new data structure, your changes are initially saved as a **draft**. Drafts allow you to:

- Make multiple changes without worrying about version numbers
- Experiment freely before committing to a final version
- Review and refine your data structure before deployment

**Important**: Draft data structures are not deployed to your development environment and will not be available for event validation. You must deploy your draft to the development environment when you're ready to test it.
This workflow gives you the flexibility to iterate on your data structure design without the overhead of managing version increments for every small change.

## Edit a data structure

To edit an existing data structure, navigate to **Data Structures** and locate the data structure you wish to edit. You can more easily find your data structure by:

- Using the search facility to search by name or vendor
- Ordering the Name column alphabetically
- Filtering the listing by Type and / or Vendor

Once located, click on the name to view the data structure. You can then select from two options to edit the data structure: **Edit with builder** or **Edit with JSON editor**.

> **Note:** The **Edit with builder** option will be unavailable if the data structure you're viewing is not supported by the builder. More complex data structures must be edited with the **JSON Editor**.

On the edit page, under the General Information panel, you can update the data structure type or its description. To add a new property, click the **Add Property** button. To edit or delete an existing property, click the three dots next to the property name to open the action menu, and then select the appropriate option.

![](/assets/images/edit-data-structure-a751c564691dd2386c85aae6a9702fb0.png)

When you modify the data structure, the builder will mark your changes in yellow, and automatically determine the new version of your data structure based on these modifications. You can reset the data structure and erase all changes at any moment by clicking the **Clear Changes** button found in the alert beneath the properties. If you are satisfied with your changes, click **Save** and make sure to note the newly updated tracking URL.

![](/assets/images/data-structure-version-e2f76da4a575ceedae49575b78954288.png)

## Promote a data structure

When you're ready to use your data structure, you need to publish it from draft status to your development environment for testing.
Once you are happy with your changes in the development environment, you will want to migrate these changes to your production environment. > **Note:** The action of migrating data structures to production is only available to Admin users. Navigate to **Data structures** and locate the data structure you wish to migrate. You can more easily find your data structure by: - Using the search facility to search by name or vendor - Ordering the Name column alphabetically - Filtering the listing by Type and / or Vendor Once located, either click on the name to view the data structure and then click the "Migrate to production" button, or click the three dots to bring up the action menu where you can select "Migrate to production". ![](/assets/images/image-8-887f63361cd518905ef3ba0aa4294c88.png) At this stage you will see the publish dialog, and depending on how you versioned your edits you will see one of two messages: If you are **publishing a new schema**, or **have incremented** the version whilst editing then you will see a confirmation of the action. Click **Migrate to Production** to migrate the data structure. If you **have patched** the version whilst editing then you will see a warning that you must increment before publishing. Patching the version on Production is not a permitted action. [Increment the version number according to the changes you have made](/docs/event-studio/data-structures/versioning/) and click **Migrate to production** to migrate the latest version of your data structure to your production environment. Your data structure will now be available in your production environment to send events against. ## Hide a data structure Sometimes you will make errors when creating a data structure, or simply be creating new data structures as part of a quick experiment. On these occasions you may wish to hide the schema to clean up the listing in Console. Navigate to **Data structures** and locate the data structure you wish to hide. 
You can more easily find your data structure by:

- Using the search facility to search by name
- Ordering the Name column alphabetically
- Filtering the listing by Type and / or Vendor

Once located, either click on the name to view the data structure and then click the **Hide** button, or click the three dots to bring up the action menu where you can select **Hide data structure**. Follow the dialog instructions to confirm the action.

> **Note:** Hiding a data structure will not remove it from the registry; it simply hides it from the Console listing. This means:
>
> 1. events can still be sent against this structure
> 2. you cannot create a new structure of the same name

### Restore a hidden data structure

If you have hidden a data structure and wish to restore it, navigate to the bottom of the list of data structures and locate the 'View hidden data structures' link.

![](/assets/images/image-9-aaac318a1695105606bcf8e362e3314a.png)

This will take you to a list of hidden data structures. Locate the one you wish to restore and click **Restore data structure** to show it in the main listing.

## Externally managed data structures

Data structures can be managed from an external repository using [Snowplow CLI](/docs/event-studio/programmatic-management/snowplow-cli/data-structures/). When a data structure is managed this way, it becomes locked in the UI, disabling all editing. You will see a banner explaining the situation and giving users with the 'publish to production' capability (the default for Admin users) the ability to unlock it.

![](/assets/images/locked-ds-5318a0b425e253f84213ba78ba9743de.png)

> **Warning:** Having a single source of truth for a data structure is a good idea. If your source of truth is an external repository, then unlocking and editing will cause conflicts.

---

# Create complex data structures with the JSON editor

> Define complex data structures with heavy nesting and advanced data types using the JSON Editor for full JSON Schema support.
> Source: https://docs.snowplow.io/docs/event-studio/data-structures/json-editor/

> **Info:** The JSON editor is ideal for more complex data structures that require nesting or more advanced data types. For simple data structures, use the [Data Structures Builder](/docs/event-studio/data-structures/).

## Creating a new data structure

Select whether you'd like to create an [Event](/docs/fundamentals/events/) or an [Entity](/docs/fundamentals/entities/). You can always change this selection at a later date.

![Choice between builder and JSON editor options](/assets/images/image-2-bad6698ed68cad8d2bd005ce7ff2a082.png)

You can now write the first version of your JSON schema for this data structure. Some template JSON is provided in the code window to start you off. For comprehensive guidance on all supported JSON Schema features and validation options, see the [JSON Schema reference](/docs/api-reference/json-schema-reference/).

![](/assets/images/json-template-406dba60e0fcee3f5c03ed6e579372d4.png)

Once you are done, click the **Validate** button and we'll validate that your schema is valid JSON markup. Assuming it passes validation, you can save your data structure as a draft. See the [Working with drafts](/docs/event-studio/data-structures/#working-with-drafts) section for more information about the draft workflow.

Click **Save as draft** to save your data structure as a draft. As this is the first version of your data structure, it will be created as version `1-0-0` when you later deploy it to your development environment.

## Editing a data structure

Make the required edits to the JSON schema. You can use the 'Difference' toggle above the editor to see a 'diff' view against the latest Production version of your data structure. In the example below we have changed the `maxLength` of `example_field_1`.

![](/assets/images/image-5-64bfbf5b5861f6d058d055bf896f5192.png)

Once you are happy with your changes, click **Validate** to ensure you have valid JSON markup.
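As a concrete illustration, the schemas you author and validate in the JSON editor are self-describing JSON Schemas, carrying their own vendor, name, format, and version metadata in a `self` block. A minimal sketch of such a schema (the vendor, name, and field names here are placeholders, not the actual Console template):

```json
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for an example entity",
  "self": {
    "vendor": "com.example",
    "name": "example_entity",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "example_field_1": {
      "type": "string",
      "description": "An example free-text field",
      "maxLength": 255
    }
  },
  "additionalProperties": false
}
```

The `version` inside the `self` block is what corresponds to the `1-0-0` SchemaVer mentioned above.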
Then click **Publish to development environment** to save your changes to your development environment.

![](/assets/images/image-7-9423f8ff7c3adcfd4e7d36858e60a934.png)

The versioning dialog will appear. At this point you have three options:

- Increment a minor version to indicate a non-breaking change to the schema. In our example, this would increment the schema from `1-0-1` to `1-0-2`.
- Increment a major version to indicate a breaking change to the schema. In our example, this would increment the schema from `1-0-1` to `2-0-0`.
- [Patch the current version](/docs/event-studio/data-structures/versioning/#patch-a-schema); this will overwrite the existing schema without increasing the version. In our example, this would leave the schema at `1-0-1`.

For more information see [Versioning your data structures](/docs/event-studio/data-structures/versioning/).

Once you have selected the appropriate version, click **Deploy to development environment** and your data structure will be deployed to your development environment ready [for you to test](/docs/testing/).

You can identify data structures where the Development version is ahead of the Production version by the yellow background on the version number. In this example both `user` and `alert` have been edited on development.

***

---

# Version and amend data structures

> Evolve your tracking design safely with backwards-compatible data structure versioning using JSON schema version numbers to control warehouse loader behavior.

> Source: https://docs.snowplow.io/docs/event-studio/data-structures/versioning/

Every data structure is based on a [versioned schema](/docs/fundamentals/schemas/versioning/). There are two kinds of schema changes:

- **Non-breaking** - a non-breaking change is backward compatible with historical data and increments the `patch` number, i.e. `1-0-0` -> `1-0-1`, or the middle digit, i.e. `1-0-0` -> `1-1-0`.
- **Breaking** - a breaking change is not backward compatible with historical data and increments the `model` number, i.e. `1-0-0` -> `2-0-0`.

Different data warehouses handle schema evolution slightly differently. Use the table below as a guide for incrementing the schema version appropriately.

| | Redshift | Snowflake, BigQuery, Databricks |
| -------------------------------------------- | ------------ | ------------------------------- |
| **Add / remove / rename an optional field** | Non-breaking | Non-breaking |
| **Add / remove / rename a required field** | Breaking | Breaking |
| **Change a field from optional to required** | Breaking | Breaking |
| **Change a field from required to optional** | Breaking | Non-breaking |
| **Change the type of an existing field** | Breaking | Breaking |
| **Change the size of an existing field** | Non-breaking | Non-breaking |

> **Warning:** In Redshift and Databricks, changing the _size_ of a field may also imply a _type_ change: for example, changing the `maximum` integer from `30000` to `100000`. See our documentation on [how schemas translate to database types](/docs/api-reference/loaders-storage-targets/schemas-in-warehouse/).

## Automatic versioning with the data structure builder

Versioning is automated when using the data structure builder to create or edit your custom data structures. It will automatically select how to version up your data structure depending on the changes you have just made.

In this example, a new required property has been added to the data structure. This is a breaking change, so the builder will increment the first digit:

![](/assets/images/data-structures-2-1133e290be8b44d1ed1d0f74dcef1c3a.png)

In this example, an additional enum option has been added to `category`.
This is a non-breaking change, so the builder is incrementing the middle digit:

![](/assets/images/data-structures-1-984215ae8aba92b496bfca0378c9e0be.png)

## Versioning with the JSON editor

When using the JSON editor, at the point of publishing a data structure you'll be asked to select which version you'd like to create.

![](/assets/images/json_editor_version_options-bf9596ad2af60a2639de1ee5107f87cd.png)

## Patch a schema

To [patch a schema](/docs/fundamentals/schemas/versioning/#patch-a-schema), i.e. apply changes to it without updating the version, select the **Patch** option when saving the schema.

Note that various pipeline components, most importantly Enrich (including Enrich embedded in Snowplow Mini and Snowplow Micro), cache schemas to improve performance. The default caching time is 10 minutes (it's controlled by the [Iglu Resolver configuration](/docs/api-reference/iglu/iglu-resolver/)). This means that the effect of patching a schema will not be immediate.

> **Note:** If you are using Snowplow Self-Hosted, to patch a schema, don't increment the schema version when [uploading it with `igluctl`](/docs/api-reference/iglu/manage-schemas/).
>
> You'll need to explicitly enable patching in the [Iglu Server configuration](/docs/api-reference/iglu/iglu-repositories/iglu-server/reference/) (`patchesAllowed`) at your own risk.

## Mark a schema as superseded

To [mark a schema as superseded](/docs/fundamentals/schemas/versioning/#mark-a-schema-as-superseded), use the JSON editor and add a `$supersedes` field.

---

# Discover and manage events with the Event Catalog

> Browse, search, and manage all event specifications across tracking plans from a single location to improve event discoverability and governance.

> Source: https://docs.snowplow.io/docs/event-studio/event-catalog/

The Event Catalog provides a centralized location to discover and manage all event specifications across your [tracking plans](/docs/event-studio/tracking-plans/).
Instead of navigating through individual tracking plans to find specific events, you can browse, search, and filter all event specifications from a single view.

When your organization has multiple tracking plans across different teams and domains, finding specific event specifications can become challenging. The Event Catalog addresses this by:

- **Centralizing discovery**: browse all event specifications in one place rather than searching through individual tracking plans
- **Improving governance**: maintain oversight of all event specifications and their status across your organization
- **Streamlining onboarding**: help new team members understand what events are available and how they're organized
- **Enabling cross-team collaboration**: see how different teams have defined similar events and share best practices

## Access the Event Catalog

Navigate to **Event Catalog** in the main navigation.

![Event Catalog overview](/assets/images/event-catalog-overview-0239dfa65e17f58839465c6c9b30533a.png)

## Browse event specifications

The Event Catalog provides a comprehensive list of all event specifications defined across your tracking plans. Each row displays:

| Column | Description |
| ------------------------ | ------------------------------------------------------------------ |
| Event specification name | The name and schema identifier of the event specification |
| Entities | The [entities](/docs/fundamentals/entities/) attached to the event |
| Tracking plan | The tracking plan containing the event specification |
| Volume | The number of events collected |
| Last seen | When the event was last received |
| Status | The status of the event specification |

### Filter and search

You can filter and search the list to find specific event specifications.
Use the controls at the top of the list:

- **Search**: enter text to filter by event specification name
- **Status filter**: show all specifications or filter by Draft or Published status
- **Entity filter**: filter specifications by attached entities

## Create event specifications

From the Event Catalog, you can create new event specifications without first navigating to a tracking plan. To create an event specification:

1. Click **Create event specification**
2. Enter a name that describes the event
3. Optionally add a description
4. Select the tracking plan this event specification belongs to
5. Select the source applications where this event is tracked
6. Click **Confirm**

![Create event specification dialog](/assets/images/create-event-specification-7ba523f45dd227ae96a8434ee0d0ee4d.png)

After creation, you can add event and entity data structures, triggers, and property instructions.

> **Tip:** Use descriptive names that reflect the business action being tracked. For example, "Add to cart" is clearer than "cart\_event\_1".

---

# Implement tracking in your applications

> Generate and implement tracking code from your event specifications using Snowtype, Console code snippets, or manual SDK integration.

> Source: https://docs.snowplow.io/docs/event-studio/implement-tracking/

Once you've defined your [tracking plans](/docs/event-studio/tracking-plans/) and [event specifications](/docs/event-studio/tracking-plans/event-specifications/), the next step is implementing the tracking code in your applications.
Snowplow provides several approaches to generate and implement tracking code:

- Ready-to-use code snippets in [Console](https://console.snowplowanalytics.com) (web only)
- [Snowtype](/docs/event-studio/implement-tracking/snowtype/) code generation tool
- Manual integration with [Snowplow trackers](/docs/sources/)

## Console code snippets

When viewing an [event specification](/docs/event-studio/tracking-plans/event-specifications/) in Console, the **Working with this event** section provides ready-to-use code snippets. The snippets in the **Implementation** tab show the exact tracking calls needed for each event, including all required properties and entities.

> **Note:** Code snippets are available for the JavaScript tracker only, for event specifications with custom event data structures.

Here's an example snippet for the JavaScript tracker. It provides a `trackSelfDescribingEvent` call for the event specification, with the correct schema references and properties. The example event specification has an event data structure named `article_click`, and one entity data structure, `article`. The snippet also includes an autogenerated `event_specification` entity. This helps with analysis, as it's a direct link between the tracked event and the tracking plan.

To use your snippet, paste it into your application code, and provide the appropriate property values.

```javascript
window.snowplow("trackSelfDescribingEvent", {
  "event": {
    "schema": "iglu:com.example/article_click/jsonschema/1-0-1",
    "data": {
      "name": "", // string - Required - maxLength: 1000
      "location": "", // string - Nullable - maxLength: 1000
    }
  },
  "context": [
    // Entity: article (min: 0)
    {
      "schema": "iglu:com.example/article/jsonschema/3-0-0",
      "data": {
        "publish_date": "", // string - Nullable
        "content_id": "", // string - Nullable
      }
    },
    // System entity. Please do not edit it.
    {
      "schema": "iglu:com.snowplowanalytics.snowplow/event_specification/jsonschema/1-0-3",
      "data": {
        "id": "0a0ef8bb-314c-4973-8988-f192e8714d68",
        "name": "Article Click",
        "data_product_id": "28a6316a-47fd-473b-b5a1-00c555ba25e4",
        "data_product_name": "Article Performance",
        "data_product_domain": "Marketing"
      }
    }
  ]
});
```

Use the **Show Snowtype code** toggle to display the specific Snowtype function name to call for tracking implementation.

![Show snowtype code](/assets/images/show-snowtype-code-f0c0e475e699e14327af5e306eee2121.png)

---

# Client-side schema validation with Snowtype

> Enable real-time schema validation in the browser for JavaScript and TypeScript trackers to catch tracking errors before events are sent.

> Source: https://docs.snowplow.io/docs/event-studio/implement-tracking/snowtype/client-side-validation/

> **Info:** This feature is available since version 0.2.8 of Snowtype for the [Browser Tracker](/docs/sources/web-trackers/quick-start-guide/?platform=browser) in both JavaScript and TypeScript.

## Schema validation right on your browser

Using Snowtype you can get notified, at runtime, about schema validation errors and fix them before they slip into production. To opt in to client-side validation, include the `--validations` flag when generating your code.

```sh
npx @snowplow/snowtype@latest generate --validations
```

For validations to work, you will also need to install `ajv@8`, `ajv-formats@2` and `ajv-draft-04@1`.

```sh
# Example using npm
npm install ajv@8 ajv-formats@2 ajv-draft-04@1
```

This command will generate your code as expected, but behind the scenes it will run all the required validations whenever an event is sent from the generated code.

## Schema validation example

Below is an example of how validations will show up in your environment.
Suppose we are tracking against a custom schema for button clicks:

```json
{
  type: 'object',
  description: 'Data structure for custom button clicks',
  properties: {
    label: {
      type: 'string',
      description: 'The text on the button, or a user-provided override'
    },
    id: {
      type: 'string',
      description: 'The identifier of the button'
    },
  },
  /* Other attributes... */
}
```

When the Snowtype method that handles tracking of this event fires, validation happens at runtime for all schema attributes. Below is an example of how the schema validation shows up in the browser console when the event responsible for tracking against the custom button click schema fires.

![validation example](/assets/images/validation-0a7cf374318205444a962c9a69fbc40d.png)

As we can observe, the value passed as the `id` attribute violates the schema rules. The erroneous value can be found under `errors[n].data`, which in this case is the number `1`. Currently, the validation information includes attributes that help pinpoint the issue at the schema level, plus the stack trace revealing the caller of the function.

## Entity cardinality rules validation example

> **Info:** This feature is available since version 0.3.1 of Snowtype for the [Browser Tracker](/docs/sources/web-trackers/quick-start-guide/?platform=browser) in both TypeScript and JavaScript.

Cardinality rules let you specify how many instances of an entity are expected to take part in an Event Specification. Use this capability to ensure the correct number of entities is sent alongside your event. E.g.

- `Exactly 1`
- `At least 1`
- `Between 1 and 2`

By using Snowtype client-side validations you will be notified right in your browser when there is a violation of cardinality rules for an Event Specification. This is particularly useful during development and testing.
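Conceptually, a cardinality check just counts how many attached entities match the rule's schema and compares the count against the rule's bounds. Here is a simplified sketch in plain JavaScript — an illustration only, not Snowtype's actual implementation; the field names mirror the `minCardinality`, `maxCardinality`, and `currentCardinality` values that Snowtype reports in its warnings:

```javascript
// Simplified illustration of a cardinality check (not Snowtype's real code).
// A rule carries the entity schema URI and the allowed bounds.
function checkCardinality(contexts, rule) {
  const currentCardinality = contexts.filter(
    (entity) => entity.schema === rule.schema
  ).length;
  const ok =
    currentCardinality >= rule.minCardinality &&
    currentCardinality <= rule.maxCardinality;
  return { ok, currentCardinality };
}

// "Exactly 1" product entity expected alongside the event:
const rule = {
  schema: "iglu:com.example/product/jsonschema/1-0-0",
  minCardinality: 1,
  maxCardinality: 1,
};

const contexts = [
  { schema: "iglu:com.example/product/jsonschema/1-0-0", data: { name: "Product" } },
  { schema: "iglu:com.example/product/jsonschema/1-0-0", data: { name: "Product 2" } },
];

// Two matching entities violate "Exactly 1":
console.log(checkCardinality(contexts, rule)); // { ok: false, currentCardinality: 2 }
```

The same counting logic also covers the "missing entity" case: zero matching contexts fall below `minCardinality` and trigger a warning.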
Below is an example of how entity cardinality rule validations show up in your environment. In this example there is a `product` entity that is expected to have a cardinality of `Exactly 1`.

![cardinality validation example](/assets/images/cardinality-validation-7387c2d3b0d509881ac5973741cce55a.png)

The code generated for the Event Specification can be used as follows without violating the cardinality rules:

```ts
trackButtonClickSpec({
  label: "Product click",
  context: [createProduct({ name: "Product", price: 1, quantity: 1 })],
});
```

If a rule is violated, for example by adding more than one product context to the event, you will see a validation warning in your browser:

```ts
trackButtonClickSpec({
  label: "Product click",
  context: [
    createProduct({ name: "Product", price: 1, quantity: 1 }),
    // This violates the cardinality rule of Exactly 1
    createProduct({ name: "Product 2", price: 1, quantity: 1 })
  ],
});
```

![cardinality validation browser example](/assets/images/cardinality-browser-6e2ea31e9b212020037906f5da74e21d.png)

The warning includes the expected `minCardinality` and `maxCardinality` alongside the `currentCardinality`, which is the number of contexts currently included in the event. Together with the stack trace information, you can use these to trace back the violating function.

A similar warning will occur when there is a cardinality rule set for an entity, and this entity does not exist as context in the event:

![empty cardinality validation browser example](/assets/images/cardinality-empty-fed6ca67310fc3b06dd24ea53a735b26.png)

## Property rules validation example

> **Info:** This feature is available since version 0.10.0 of Snowtype for the [Browser Tracker](/docs/sources/web-trackers/quick-start-guide/?platform=browser) in both TypeScript and JavaScript.
Property rules are [specific instructions](/docs/event-studio/tracking-plans/) you can add to every schema that takes part in an Event Specification. This capability allows you to adjust the expected values specifically for that event. E.g.

- The `category` attribute of the `product` entity is expected to take the values "related" or "cross-sell" for this Event Specification

![property rules browser example](/assets/images/property-rules-browser-0895b2997955f456740a1bf4db235428.png)

The code generated for the Event Specification can be used as follows without violating the property rules:

```ts
trackRelatedSpec({
  label: "Related product",
  context: [
    /* This is a method to create this specific product entity for the `Related` Event Specification */
    createProductRelated({
      /* Category can only be `cross-sell` or `related` based on the type generated */
      category: "cross-sell",
      name: "product",
      quantity: 1,
      price: 10,
    })
  ],
});
```

If a rule is violated, for example by adding an unintended `category` value, you will see a validation warning in your browser:

```ts
trackRelatedSpec({
  label: "Related product",
  context: [
    createProductRelated({
      /* `cross-sells` is not a valid category based on the set instructions. */
      category: "cross-sells",
      name: "product",
      quantity: 1,
      price: 10,
    })
  ],
});
```

![property rules enum error](/assets/images/property-rules-enum-error-1efab496e8e3d763cad18c269380dc1b.png)

## Custom Violation Handler

By default, when a JSON Schema or Tracking Plan rule is violated, Snowtype prints a warning using `console.log`, displaying this information in the browser's developer tools panel. While this is useful for debugging events in the browser, it can be adjusted for different environments to better suit your needs. For example:

- When unit testing with a library such as Jest, you might prefer each violation to throw a new `Error` so that relevant tests automatically fail.
- In staging or production environments, you might want to report the violation to a third-party error monitoring solution such as Sentry.

To accommodate custom violation handling use cases, Snowtype provides an option to set the `violationsHandler`. Using the `snowtype.setOptions` API, you can configure the `violationsHandler` to be called whenever a violation is detected.

```ts
import { snowtype } from "{{outpath}}/snowplow";

function myViolationsHandler(error) {
  // Custom violation handling logic
}

snowtype.setOptions({ violationsHandler: myViolationsHandler });
```

The `error` attribute is typed as follows:

```ts
type ErrorType = {
  /* Specific error code number e.g. 100, 200, 201 ... */
  code: number;
  /* Error message */
  message: string;
  /* Description of the violation */
  description: string;
  /* Violations occurred */
  errors: (ErrorObject | Record<string, unknown>)[];
};
```

> **Info:** When Snowtype detects the `NODE_ENV` environment variable being set to `test`, as is done by many testing libraries, it will automatically default to throwing an `Error` when a violation is detected.

## Caveats

### Bundle size consideration

Since the validation capability depends on a set of additional libraries that can increase the application bundle size, we advise using this feature in development and test environments only. In the future, we may provide a validation capability with minimal overhead in both runtime performance and bundle size.

### Divergence with pipeline validation

Due to differences between environments, there are a few cases where validation results might diverge between the client and the pipeline. These differences arise where regular expressions are included in the schema. For JSON Schema, these kinds of formats are mostly included in the [pattern](https://json-schema.org/understanding-json-schema/reference/string#regexp) attribute.
For that reason, when Snowtype detects a `pattern` key in string type attributes, it will warn accordingly during generation.

---

# Snowtype CLI command reference

> Summary of Snowtype CLI commands and options for code generation, with details on usage scenarios and configuration.

> Source: https://docs.snowplow.io/docs/event-studio/implement-tracking/snowtype/commands/

> **Info:** We originally called tracking plans "data products". You'll still find the old term used in some existing APIs and CLI commands.

> **Info:** This page only summarizes the CLI commands and the options for each command. For details on the scenarios in which they can be used, go to the [Working with the CLI page](/docs/event-studio/implement-tracking/snowtype/using-the-cli/).

## Usage

`snowtype [COMMAND] [OPTIONS] [CONTEXT-SPECIFIC-OPTIONS]`

## Available CLI commands

### `snowtype init`

Initialize the setup of Snowtype code generation in a project. Creates the configuration file.

**Options**

- `-i, --organizationId` Organization ID.
- `-t, --tracker` Tracker to use. [See available](/docs/event-studio/implement-tracking/snowtype/using-the-cli/#available-trackerslanguages)
- `-l, --language` Language to use. [See available](/docs/event-studio/implement-tracking/snowtype/using-the-cli/#available-trackerslanguages)
- `-o, --outpath` Output path.

### `snowtype generate`

Generates tracking code based on the configuration file. Can generate/modify the `.snowtype-lock.json` file.

**Options**

- `-c, --config` Config file path.
- `--instructions` Generate event specification instructions.
- `--no-instructions` Generate without instructions.
- `--validations` Add runtime validation on events. _Currently available for the Browser tracker_.
- `--no-validations` Do not add runtime validation on events.
- `--disallowDevSchemas` Disallow generation of code using schemas deployed on DEV environment.
_Sending events using schemas deployed on DEV will result in failed events in production pipelines._ (default: false)

- `--deprecateOnlyOnProdAvailableUpdates` Show deprecation warnings only when there are PROD available schema updates. (default: false)

### `snowtype update`

Checks for the latest version updates in Data Structures and Event Specifications.

**Options**

- `-c, --config` Config file path.
- `-y, --yes` Updates all to the latest version without prompting. (default: false)
- `-m, --maximumBump` The maximum SchemaVer update to show an available update notification for. Possible values are 'patch', 'minor' and 'major', and they work like regular SemVer bumps. (default: 'major')

### `snowtype patch`

Adds new Data Structures and Event Specifications to the `snowtype.config.json` file without needing to modify the file by hand.

**Options**

- `-c, --config` Config file path.
- `-e, --eventSpecificationIds` Event Specification ID/s.
- `-p, --dataProductIds` Tracking Plan ID/s.
- `-d, --dataStructures` Data structure schema URI/s.
- `-i, --igluCentralSchemas` Iglu central schema URI/s.
- `-r, --repositories` Local Data Structure repositories generated from the [snowplow-cli](/docs/event-studio/programmatic-management/snowplow-cli/data-structures/).

### `snowtype help`

Shows a helpful message and brief instructions for Snowtype CLI usage.

### Global options

- `-h, --help` Shows helpful instructions for the command.
- `-V, --version` Output the package version number.
- `-k, --apiKey` Provide the Snowplow Console API key as a CLI option.
- `-v, --verbose` Enable verbose logging.

---

# Generate tracking code with Snowtype

> Automatically generate type-safe tracking code from data structures and event specifications with compile-time validation, reducing implementation time and maintenance overhead.
> Source: https://docs.snowplow.io/docs/event-studio/implement-tracking/snowtype/

**Snowtype** is a code generation tool that automates the creation of type-safe tracking code for Snowplow SDKs. Snowtype connects directly to your data structures and event specifications. This eliminates manual instrumentation work and ensures that your tracking code is compliant with the schemas and produces high-quality data.

Snowtype streamlines the development workflow by providing several key advantages:

- **Type safety enforcement:** Generates strongly-typed code that validates events and entities at compile time, preventing schema violations before data reaches your pipeline.
- **Automated code generation:** Converts event specifications into production-ready SDK code, reducing implementation time from weeks to days.
- **Integrated documentation:** Syncs inline code documentation with your data structures and products, maintaining consistency between design and implementation.
- **Development workflow integration:** Fits seamlessly into CI/CD processes, enabling GitOps-style tracking plan management and automated updates when schemas evolve.
- **Reduced maintenance overhead:** Automatically updates tracking code when data structures change, eliminating the need for manual synchronization across multiple codebases.

## Supported trackers

| **Tracker** | **Language/s** |
| -------------------------------- | ---------------------- |
| `@snowplow/browser-tracker` | javascript, typescript |
| `@snowplow/node-tracker` | javascript, typescript |
| `@snowplow/react-native-tracker` | typescript |
| `@snowplow/javascript-tracker` | javascript |
| `snowplow-golang-tracker` | go |
| `snowplow-ios-tracker` | swift |
| `snowplow-android-tracker` | kotlin |
| `snowplow-flutter-tracker` | dart |
| `snowplow-java-tracker` | java |

## Prerequisites

To use Snowtype, you must have [Node.js](https://nodejs.org/en/) (>=18) installed.
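If you want to make this requirement explicit for everyone working on your project, you could record it in the standard npm `engines` field of your `package.json`. This is a hedged example of a general npm convention, not a Snowtype requirement:

```json
{
  "engines": {
    "node": ">=18"
  }
}
```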
## Installation

Navigate to your project and install Snowtype using your favorite package manager:

**npm:**

```bash
npm install --save-dev @snowplow/snowtype@latest
```

**Yarn:**

```bash
yarn add --dev @snowplow/snowtype@latest
```

**pnpm:**

```bash
pnpm add --save-dev @snowplow/snowtype@latest
```

***

## Executing commands

Installing Snowtype will also create a local executable `snowtype`, which you can use with `npx`, `yarn` or `pnpm` directly from your project's directory.

**npm:**

```bash
npx @snowplow/snowtype@latest init # Same as npx snowtype init
```

**Yarn:**

```bash
yarn @snowplow/snowtype@latest init # Same as yarn snowtype init
```

**pnpm:**

```bash
pnpm @snowplow/snowtype@latest init # Same as pnpm snowtype init
```

***

_We will show example commands using `npm/npx`, but they should work the same with any other package manager._

---

# Snowtype configuration options

> Configure Snowtype code generation with options for output paths, tracker selection, language settings, and custom templates.

> Source: https://docs.snowplow.io/docs/event-studio/implement-tracking/snowtype/snowtype-config/

> **Info:** We originally called tracking plans "data products". You'll still find the old term used in some existing APIs and CLI commands.

The Snowtype CLI configuration can be saved in a `.json`, `.js`, or `.ts` file after initialization. For example: `snowtype.config.json`, `snowtype.config.js`, or `snowtype.config.ts`. **We highly recommend you keep this file in the root of your project folder.**

## Attributes in your configuration file

### `igluCentralSchemas`

The schema tracking URLs for schemas available in [Iglu Central](https://iglucentral.com/).

### `repositories`

Local Data Structure repositories generated from the [snowplow-cli](/docs/event-studio/programmatic-management/snowplow-cli/data-structures/).

### `dataStructures`

The schema tracking URLs for Data Structures published in the Console.
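Taken together, the three schema-source attributes above might appear in a configuration file like this (the URIs are illustrative, borrowed from the full example configuration later on this page):

```json
{
  "igluCentralSchemas": ["iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0"],
  "repositories": ["../data-structures"],
  "dataStructures": ["iglu:com.myorg/custom_web_page/jsonschema/1-1-0"]
}
```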
### `eventSpecificationIds` The Event Specification IDs you wish to generate tracking code for. The Event Specification ID is a UUID that can be retrieved as the final part of the URL when visiting an event specification's main page. ### `dataProductIds` The Tracking Plan IDs you wish to generate tracking code for. By providing the Tracking Plan ID, Snowtype will fetch all the event specifications for the Tracking Plan and generate code for all of them. The Tracking Plan ID is a UUID that can be retrieved as the final part of the URL when visiting a tracking plan's main page. ### `organizationId` The Organization ID for your Snowplow account. The Organization ID is a UUID that can be retrieved from the URL immediately following the `.com` when visiting Console. ### `tracker` The target tracker to generate the required code for. [See list of available trackers](/docs/event-studio/implement-tracking/snowtype/using-the-cli/#available-trackerslanguages). ### `language` The target language to generate the required code for. [See list of available languages](/docs/event-studio/implement-tracking/snowtype/using-the-cli/#available-trackerslanguages). ### `outpath` The output path relative to the current working directory when running the script. ### `options` Options related to Snowtype behavior, described by the following TypeScript type: ```ts options?: { /* Command related options. */ commands: { generate?: { /* Generate implementation instructions. */ instructions?: boolean; /* Add runtime validations. */ validations?: boolean; /* Disallow generation of code using schemas only deployed on DEV environment. */ disallowDevSchemas?: boolean; /* Show deprecation warnings only when there are PROD available schema updates. */ deprecateOnlyOnProdAvailableUpdates?: boolean; } update?: { /* Update your configuration file automatically and regenerate the code of the latest available update.
*/ regenerateOnUpdate?: boolean; /* The maximum SchemaVer update to show an available update notification for. */ maximumBump?: "major" | "minor" | "patch"; /* The `update` command will only display updates for Data Structures that have been deployed to production environment. */ showOnlyProdUpdates?: boolean; } patch?: { /* Automatically regenerate the code after a successful patch operation. */ regenerateOnPatch?: boolean; } } } ``` ### `namespace` > **Info:** This option only applies when generating Swift code. The namespace for the generated code. All classes generated will be included in this namespace, which can be used to avoid naming conflicts. For example, setting `namespace` to `Snowtype` will result in classes being accessed with the `Snowtype` prefix: ```swift let data = Snowtype.AccountConfirmed(companyCountry: "", companyName: "", ...) ``` _Keep in mind that CLI flags take precedence over configuration file options._ ## Example configuration file **JSON:** ```json { "igluCentralSchemas": ["iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0"], "repositories": ["../data-structures"], "dataStructures": ["iglu:com.myorg/custom_web_page/jsonschema/1-1-0"], "eventSpecificationIds": [ "a123456b-c222-11d1-e123-1f123456789g" ], "dataProductIds": [ "a123456b-c222-11d1-e123-1f12345678dp" ], "organizationId": "a654321b-c111-33d3-e321-1f123456789g", "tracker": "@snowplow/browser-tracker", "language": "typescript", "outpath": "./src/snowtype" } ``` **JavaScript:** ```javascript const config = { "igluCentralSchemas": ["iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0"], "repositories": ["../data-structures"], "dataStructures": ["iglu:com.myorg/custom_web_page/jsonschema/1-1-0"], "eventSpecificationIds": [ "a123456b-c222-11d1-e123-1f123456789g" ], "dataProductIds": [ "a123456b-c222-11d1-e123-1f12345678dp" ], "organizationId": "a654321b-c111-33d3-e321-1f123456789g", "tracker": "@snowplow/browser-tracker", "language": "typescript", "outpath": 
"./src/snowtype" } module.exports = config; ``` **TypeScript:** ```typescript type SnowtypeConfig = { tracker: | "@snowplow/browser-tracker" | "@snowplow/javascript-tracker" | "snowplow-android-tracker" | "snowplow-ios-tracker" | "@snowplow/node-tracker" | "snowplow-golang-tracker" | "@snowplow/react-native-tracker" | "snowplow-flutter-tracker"; language: "typescript" | "javascript" | "kotlin" | "swift" | "go" | "dart"; outpath: string; organizationId?: string; igluCentralSchemas?: string[]; repositories?: string[]; dataStructures?: string[]; eventSpecificationIds?: string[]; dataProductIds?: string[]; options?: { commands: { generate?: { instructions?: boolean; validations?: boolean; disallowDevSchemas?: boolean; deprecateOnlyOnProdAvailableUpdates?: boolean; } update?: { regenerateOnUpdate?: boolean; maximumBump?: "major" | "minor" | "patch"; showOnlyProdUpdates?: boolean; } patch?: { regenerateOnPatch?: boolean } } } }; const config: SnowtypeConfig = { "igluCentralSchemas": ["iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0"], "repositories": ["../data-structures"], "dataStructures": ["iglu:com.myorg/custom_web_page/jsonschema/1-1-0"], "eventSpecificationIds": [ "a123456b-c222-11d1-e123-1f123456789g" ], "dataProductIds": [ "a123456b-c222-11d1-e123-1f12345678dp" ], "organizationId": "a654321b-c111-33d3-e321-1f123456789g", "tracker": "@snowplow/browser-tracker", "language": "typescript", "outpath": "./src/snowtype" }; export default config; ``` *** --- # Work with the Snowtype CLI > Use the Snowtype CLI to initialize projects, generate tracking code, and configure code generation for Snowplow tracking SDKs. > Source: https://docs.snowplow.io/docs/event-studio/implement-tracking/snowtype/using-the-cli/ > **Info:** We originally called tracking plans "data products". You'll still find the old term used in some existing APIs and CLI commands. 
The Snowtype CLI is a tool that aims to speed up tracking implementations, provide type safety and inline documentation for developers, and ultimately reduce the number of erroneous events. By integrating this tool into the development workflow, you can connect the additions and updates made in a Snowplow implementation with the corresponding tracking code of the project. ## Authenticating with the Console A Console API key is required for the Snowtype CLI to authenticate with your account. You can find your own or create one in the Console [API key management](https://console.snowplowanalytics.com/credentials). > **Note:** Both the API key and API key ID variables are required for versions > `0.9.0`. The CLI can read the credentials either through the global `-k, --apiKey` and `-s, --apiKeyId` options or through the `SNOWPLOW_CONSOLE_API_KEY` and `SNOWPLOW_CONSOLE_API_KEY_ID` environment variables. Additionally, the Snowtype CLI automatically reads from a `.env` file at the root of your project. **.env file:** ```bash SNOWPLOW_CONSOLE_API_KEY=MY-API-KEY SNOWPLOW_CONSOLE_API_KEY_ID=MY-API-KEY-ID ``` **Shell variable:** ```bash # The required command will depend on your shell export SNOWPLOW_CONSOLE_API_KEY=MY-API-KEY export SNOWPLOW_CONSOLE_API_KEY_ID=MY-API-KEY-ID ``` **CLI parameter:** ```bash npx @snowplow/snowtype@latest generate --apiKey MY-API-KEY --apiKeyId MY-API-KEY-ID ``` *** **Recommended:** We recommend that you use the `SNOWPLOW_CONSOLE_API_KEY` and `SNOWPLOW_CONSOLE_API_KEY_ID` environment variables. ## Initializing Snowtype for your project For the Snowtype CLI to work properly, it requires a [configuration file](/docs/event-studio/implement-tracking/snowtype/snowtype-config/) to be initialized and present in your project's root folder. This file will be automatically generated by the `snowtype init` command, after you provide the required input.
```bash # Start prompting for configuration inputs npx @snowplow/snowtype@latest init ``` The input required for the initialization to work is the following: - The organization ID from Snowplow Console. - The [tracker](/docs/event-studio/implement-tracking/snowtype/using-the-cli/#available-trackerslanguages) you wish to generate code for. - _If applicable,_ the language for that tracker. - The output path you wish the CLI to generate the code to. You will be prompted for all of these by default, but if needed you can call the `snowtype init` command with any or all of the attributes passed as [optional flags](/docs/event-studio/implement-tracking/snowtype/commands/#snowtype-init) so that prompting is not required. ## Generate tracking code The CLI will generate tracking code using a valid Snowtype configuration file with the `snowtype generate` command. Depending on the language, the generated code will include all the required types used in schemas and Event Specifications, together with methods/classes that allow you to track them. ```bash # Code will be generated to the outpath configuration npx @snowplow/snowtype@latest generate ``` The code generated by the CLI is not minified and contains inline documentation for methods, classes and types. If needed, you can modify it in any way that suits your project. ### Contents The contents of a generated file from the Snowtype CLI will be: - Types/Interfaces/Classes for each schema that relates to the Data Structures, Iglu Central Schemas and Event Specifications selected. - For each Event Specification [instruction set](/docs/event-studio/tracking-plans/), a type for the adjusted schema is generated as well. The type/class will contain the Event Specification name as a suffix to avoid conflicts. - For each schema: - A method/class to instantiate the structure as a Self Describing JSON.
_This is particularly useful for adding entities as extra context on events._ - A method that sends a Self Describing Event with the schema as the main event entity. - For each Event Specification, a method/class to track the event specification with the set event and context entity schemas. > **Warning:** The Snowtype CLI does not automatically install the required Snowplow tracking libraries. It generates code that uses the tracking libraries, which are expected to be already installed in the project. ### Available Trackers/Languages The following is the set of available trackers and languages the Snowtype CLI can currently work with. This list is also the source of truth for valid keys in the `tracker` and `language` attributes of the Snowtype configuration file.

| **Tracker**                      | **Language/s**         |
| -------------------------------- | ---------------------- |
| `@snowplow/browser-tracker`      | javascript, typescript |
| `@snowplow/node-tracker`         | javascript, typescript |
| `@snowplow/react-native-tracker` | typescript             |
| `@snowplow/javascript-tracker`   | javascript             |
| `snowplow-golang-tracker`        | go                     |
| `snowplow-ios-tracker`           | swift                  |
| `snowplow-android-tracker`       | kotlin                 |
| `snowplow-flutter-tracker`       | dart                   |
| `snowplow-java-tracker`          | java                   |

### Example Usage Below we show example usage of the generated code. For demonstration, we assume the code was generated for the [web\_page](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0) and [product](https://github.com/snowplow/iglu-central/tree/master/schemas/com.snowplowanalytics.snowplow.ecommerce/product/jsonschema/1-0-0) schemas.
**@snowplow/browser-tracker TypeScript:** ```tsx import { trackWebPage, createProduct, WebPage, Product, createWebPage, } from "./{outpath}/snowplow"; /* Track a WebPage event */ trackWebPage({ id: "212a9b63-1af7-4e96-9f35-e2fca110ff43" }); /* Track a WebPage event with a Product context entity */ const product = createProduct({ id: "Product id", name: "Snowplow product", currency: "EUR", price: 10, category: "Snowplow/Shoes", }); trackWebPage({ id: "212a9b63-1af7-4e96-9f35-e2fca110ff43", context: [product], }); /* You can enforce specific context entities on any `track` function using type arguments */ const webPage = createWebPage({ id: "212a9b63-1af7-4e96-9f35-e2fca110ff43" }); trackWebPage({ id: "212a9b63-1af7-4e96-9f35-e2fca110ff43", context: [product, webPage], }); ``` **@snowplow/node-tracker TypeScript:** ```tsx import { trackWebPage, createProduct, WebPage, Product, createWebPage, } from "./{outpath}/snowplow"; /* `t` is the tracker instance created by the `tracker` function of the @snowplow/node-tracker package. 
*/ /* Track a WebPage event */ trackWebPage(t, { id: "212a9b63-1af7-4e96-9f35-e2fca110ff43" }); /* Track a WebPage event with a Product context entity */ const product = createProduct({ id: "Product id", name: "Snowplow product", currency: "EUR", price: 10, category: "Snowplow/Shoes", }); trackWebPage(t, { id: "212a9b63-1af7-4e96-9f35-e2fca110ff43", context: [product], }); /* You can enforce specific context entities on any `track` function using type arguments */ const webPage = createWebPage({ id: "212a9b63-1af7-4e96-9f35-e2fca110ff43" }); trackWebPage(t, { id: "212a9b63-1af7-4e96-9f35-e2fca110ff43", context: [product, webPage], }); ``` **snowplow-android-tracker:** ```kotlin import {{ specified package }}.Product import {{ specified package }}.WebPage /* Track a WebPage event */ tracker.track(WebPage(id = "212a9b63-1af7-4e96-9f35-e2fca110ff43").toEvent()) /* Track a WebPage event with a Product context entity */ val product = Product( id = "Product id", name = "Snowplow product", currency = "EUR", price = 10.0, category = "Snowplow/Shoes", ) val event = WebPage(id = "212a9b63-1af7-4e96-9f35-e2fca110ff43").toEvent() event.entities.add(product.toEntity()) tracker.track(event) ``` **snowplow-ios-tracker:** ```swift import SnowplowTracker /* Track a WebPage event */ _ = tracker.track(WebPage(id: "212a9b63-1af7-4e96-9f35-e2fca110ff43").toEvent()) /* Track a WebPage event with a Product context entity */ let product = Product( category: "Snowplow/Shoes", currency: "EUR", id: "Product id", name: "Snowplow product", price: 10 ) let event = WebPage(id: "212a9b63-1af7-4e96-9f35-e2fca110ff43").toEvent() event.entities.append(product.toEntity()) _ = tracker.track(event) ``` **snowplow-golang-tracker:** ```go // Track a WebPage event TrackWebPage(tracker, WebPage{ID: "212a9b63-1af7-4e96-9f35-e2fca110ff43"}) // Track a WebPage event with a Product context entity productName := "Snowplow product" product := Product{ ID: "Product_id", Currency: "EUR", Price: 10, Category: 
"Snowplow/Shoes", Name: &productName, } TrackWebPage( tracker, WebPage{ID: "212a9b63-1af7-4e96-9f35-e2fca110ff43"}, WithContexts(product), ) ``` **@snowplow/react-native-tracker TypeScript:** ```tsx import { trackWebPage, createProduct, WebPage, Product, createWebPage, } from "./{outpath}/snowplow"; /* `t` is the tracker instance created by the `createTracker` function of the @snowplow/react-native-tracker package. */ /* Track a WebPage event */ trackWebPage(t, { id: "212a9b63-1af7-4e96-9f35-e2fca110ff43" }); /* Track a WebPage event with a Product context entity */ const product = createProduct({ id: "Product id", name: "Snowplow product", currency: "EUR", price: 10, category: "Snowplow/Shoes", }); trackWebPage(t, { id: "212a9b63-1af7-4e96-9f35-e2fca110ff43", context: [product], }); /* You can enforce specific context entities on any `track` function using type arguments */ const webPage = createWebPage({ id: "212a9b63-1af7-4e96-9f35-e2fca110ff43" }); trackWebPage(t, { id: "212a9b63-1af7-4e96-9f35-e2fca110ff43", context: [product, webPage], }); ``` **snowplow-flutter-tracker:** ```dart import './{outpath}/snowplow.dart'; /* Track a WebPage event */ await tracker.track(const WebPage(id: "212a9b63-1af7-4e96-9f35-e2fca110ff43")); /* Track a WebPage event with a Product context entity */ const product = Product( category: "Snowplow/Shoes", currency: "EUR", id: "Product id", price: 10.0, name: "Snowplow product" ); const event = WebPage(id: "212a9b63-1af7-4e96-9f35-e2fca110ff43"); await tracker.track(event, contexts: [product]); ``` **snowplow-java-tracker:** ```java package test; import com.snowplowanalytics.snowplow.tracker.*; import com.snowplowanalytics.snowplow.tracker.Tracker; import com.snowplowanalytics.snowplow.snowtype.*; import com.snowplowanalytics.snowplow.tracker.events.SelfDescribing; import java.util.Collections; public class SnowtypeTest { public static void main(String[] args) { Tracker tracker = Snowplow.createTracker("asdf", "asdf", "asdf"); /* 
Track a WebPage event */ tracker.track(SelfDescribing.builder().eventData(new WebPage.Builder().setId("212a9b63-1af7-4e96-9f35-e2fca110ff43").build().toSelfDescribingJson()).build()); /* Track a WebPage event with a Product context entity */ Product product = new Product.Builder().setId("Product id").setName("Snowplow product").setCurrency("EUR").setPrice(10.0).setCategory("Snowplow/Shoes").build(); WebPage webPage = new WebPage.Builder().setId("212a9b63-1af7-4e96-9f35-e2fca110ff43").build(); SelfDescribing event = SelfDescribing.builder().eventData(webPage.toSelfDescribingJson()).customContext(Collections.singletonList(product.toSelfDescribingJson())).build(); tracker.track(event); } } ``` *** ### Tracking Plans To generate code for the whole set of Event Specifications of a Tracking Plan, either manually or through the `snowtype patch` command, you will need the ID of the Tracking Plan. You can get it either by clicking on the `Implement tracking` button on the Tracking Plans main page to get the command directly: ![tracking plan track](/assets/images/dp-track-544c276d1535553cf5f930f697af85ad.png) Or retrieve the ID from the URL bar and then add it to the `dataProductIds` array: ![tracking plan id](/assets/images/dp-id-6c8bd5e400425daa6c524a48efdc129e.png) ### Event Specifications To add an Event Specification to the code generation, either manually or through the `snowtype patch` command, you will need the ID of the Event Specification. You can find the Event Specification ID on the main page of the Event Specification as shown below: ![event specification id](/assets/images/es-id-0a3367afcd71c89c002445c58795d483.png) Then you should add this ID to your configuration file's `eventSpecificationIds` array. ### Data Structures To add a Data Structure to the code generation, either manually or through the `snowtype patch` command, you will need the Data Structure `Schema tracking URL`.
You can find the Data Structure tracking URL on the Data Structure page in the Console, under the **Overview** tab as shown below: ![data structure url](/assets/images/ds-url-c93e327627928eba0a66750da09b833a.png) Then you should add this Data Structure tracking URL to your configuration file's `dataStructures` array. ### Iglu Central Schemas To add an Iglu Central schema to the code generation, either manually or through the `snowtype patch` command, you will need the `Schema tracking URL`. You can find the Schema tracking URL on [Iglu Central](http://iglucentral.com/) by searching for the schema; under the **General Information** tab you can find the URL as shown below: ![iglu central tracking url](/assets/images/iglu-url-99a7c2aae4d2cd5f818c9dfa1605804a.png) Then you should add this Schema tracking URL to your configuration file's `igluCentralSchemas` array. ### Local Data Structure Repositories To add a local Data Structure repository to the code generation, either manually or through the `snowtype patch` command, you only need the path(s) to the repositories in which you have generated schemas using the [snowplow-cli](/docs/event-studio/programmatic-management/snowplow-cli/data-structures/). Then you should add the path(s) to your configuration file's `repositories` array. ## Generating event specification instructions When generating code for event specifications, you have the option of delivering the implementation instructions and triggers for each specification right in the developer's environment. By using the `--instructions` option on the `snowtype generate` command, you can generate a markdown file with all the required information about tracking an event specification. This includes: - Trigger description. - Implementation rules. - Images uploaded on your Event Specification triggers. - App identifiers and URLs this event should be triggered on. - Direct links to the code for this Event Specification.
## Keeping up with latest updates It is important to keep the tracking code up to date with the latest versions of the Data Structures and Event Specifications tracked in a project. The Snowtype CLI gives engineers the ability to check whether updates are available for the Data Structures and Event Specifications used in the project, via the `snowtype update` command. ```bash npx @snowplow/snowtype@latest update ``` The above command will output a _diff_ showing the available version updates, similar to what you can see below: ![patch command version diff](/assets/images/patch-diff-bf39da50d673e493a76a07d201ed72ef.png) You can then choose to update to the latest versions and regenerate the tracking code. To automatically update and regenerate the tracking code reflecting the latest updates, you can use the `--yes` flag. ### Adjust the level of update notifications For possible Data Structure updates, you can set the maximum level of update you want to be notified about using the `--maximumBump` flag. This value is the maximum bump to be notified about, and to update to if available. It defaults to `major`, meaning the `update` command will notify you about all updates up to and including major updates to the schema model. An example showcasing the flag's behavior: ```js // Data Structure version added to the snowtype config is 1-0-0. { // Other options... dataStructures: ["iglu:com.acme_company/page_unload/jsonschema/1-0-0"] } ``` This Data Structure has other deployed versions such as `1-0-1`, `1-1-0` and `2-0-0`. The `update` command will show available updates as follows: ```bash npx @snowplow/snowtype@latest update --maximumBump=major # Will prompt an update to 2-0-0 or any other available update. npx @snowplow/snowtype@latest update --maximumBump=minor # Will prompt an update to 1-1-0, or to 1-0-1 if that is the only available update.
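# Note (added for clarity): Data Structure versions use SchemaVer
# (MODEL-REVISION-ADDITION), so "major" corresponds to a MODEL bump
# (1-0-0 -> 2-0-0), "minor" to a REVISION bump (1-0-0 -> 1-1-0),
# and "patch" to an ADDITION bump (1-0-0 -> 1-0-1).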
npx @snowplow/snowtype@latest update --maximumBump=patch # Will prompt an update to 1-0-1, or any other patch type update, if available. ``` ## Disallow generating code using schemas not deployed to the production environment While developing or testing, it might be useful to use [Snowplow Micro](/docs/testing/snowplow-micro/) to validate against your new schemas in your development environment. In this, and any other case where you develop a schema and eventually publish the tracking to production, you need to make sure all the schemas you are using are deployed to the production environment for the pipeline to use. Failing to do so will result in failed events. By default, Snowtype will print a warning when code is generated using schemas only published to the development environment. To make sure that no schemas are missing from production, you can use the `--disallowDevSchemas` flag or option with the `generate` command. With this flag, any generation attempt that uses schemas not yet deployed to the production environment will fail, indicating which schemas are affected. --- # Use Snowtype with Google Tag Manager > Generate tracking code specifically formatted for Google Tag Manager custom JavaScript with Snowtype's GTM target for easier tag implementation. > Source: https://docs.snowplow.io/docs/event-studio/implement-tracking/snowtype/working-with-gtm/ > **Info:** This feature is available from version 0.5.0. To make working with Google Tag Manager and event tracking easier, we created a specific target for Snowtype fitting the way Google Tag Manager handles custom JavaScript code. A few extra benefits: - Simple initialization and maintenance. - Snowtype generated functions are available in `window.__snowtype` for all tags to use. - Fully typed code documentation using JSDoc.
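To illustrate the `window.__snowtype` pattern mentioned above, here is a minimal, simulated sketch. The `trackExample` function and its payload shape are hypothetical placeholders, not real generated names:

```javascript
// Simulated sketch of the Snowtype GTM pattern: the generated Custom
// JavaScript variable attaches tracking functions to a shared global object,
// so that any tag running afterwards can call them.
var globalScope = typeof window !== "undefined" ? window : globalThis;
globalScope.__snowtype = globalScope.__snowtype || {};
globalScope.__snowtype.trackExample = function (payload) {
  // Real generated code would build and send a Snowplow event here.
  return { event: "example", payload: payload };
};

// A later tag can then call the shared function:
var result = globalScope.__snowtype.trackExample({ id: "abc" });
```

The real generated file follows this shape, but with fully typed and documented functions for your schemas and event specifications.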
## Getting Started without a code repository **Already using Snowtype?** To generate code for usage in Google Tag Manager, you should use the option `Google Tag Manager` as the tracker option in your [init](/docs/event-studio/implement-tracking/snowtype/commands/#snowtype-init) flow or replace your `tracker` and `language` attributes with the following values: ```json { // Rest of the attributes... "tracker": "google-tag-manager", "language": "javascript-gtm" } ``` What we are going to showcase here is how you can set up a separate project that uses Snowtype to generate code for your Google Tag Manager tracking needs. _You can also use version control to better understand any changes you make across time._ The only requirement is to first have [Node.js](https://nodejs.org/en/download/package-manager) installed on your computer. After that, open your terminal and run the following commands: ```bash # Navigate to a directory that you wish to create your project in. cd ./directory/to/setup/the/project # Create a folder for the project. Here you can replace # 'container-id' with the GTM container the code will be used in. mkdir snowtype-gtm-container-id # Change directory to the newly created folder. cd snowtype-gtm-container-id # This will create a few needed files for the project. npm init -y # This will install the latest version of Snowtype. npm install @snowplow/snowtype@latest # This will start the Snowtype init flow, # in which you should select 'Google Tag Manager' when # prompted to select a tracker. npx @snowplow/snowtype@latest init ``` After you have completed the `init` flow and have added your desired configuration, you can go ahead and generate the code you need to use: ```bash npx @snowplow/snowtype@latest generate ``` After that you can find the code in the specified `outpath` attribute of your configuration file. 
## Using in Google Tag Manager The code that Snowtype generates is included in a `snowplow.js` file and, as stated in the file itself, is meant to be **copied and pasted** into a Google Tag Manager [Custom JavaScript variable](https://support.google.com/tagmanager/answer/7683362?hl=en#custom_javascript). Below you can see the steps needed to create the variable using the contents generated by Snowtype: ![](/assets/images/gtm-var-a0b2e5014dc5e14da1288ff2172b2037.gif) > **Warning:** If the generated code is too large for Google Tag Manager, you will receive a warning from Snowtype. You can choose to use the `.minified.js` version created by Snowtype to reduce the file size. Alternatively, you can split the code into multiple variables and include them in the order they are generated, or place them directly into a Custom HTML tag. ### Naming and calling the Custom JavaScript After selecting a name for the Custom JavaScript variable, you need to include it in a [Custom HTML](https://support.google.com/tagmanager/answer/6107167?hl=en#CustomHTML) tag so that it is executed. Depending on your Google Tag Manager setup, there are several places where the variable can be used. An example is a type of Custom HTML tag that runs during page initialization: ```html ``` ### Using the Snowtype generated code Once the variable's function has been executed, all the generated functions are available through the `window.__snowtype` object. For example: ```js // Example Data Structure window.__snowtype.trackExample({ ... }); // Or for an Example Event Specification window.__snowtype.trackExampleSpec({ ... }); ``` Keep in mind that at the bottom of the generated file, there are all the JSDoc type definitions required for you to use the functions correctly. --- # Event Studio > Design and implement behavioral data tracking with schema management, governance, code generation, and tracking plans in Snowplow Console.
> Source: https://docs.snowplow.io/docs/event-studio/ Event Studio is a comprehensive set of tools for designing and implementing behavioral data event tracking. It provides: - **Schema management**: define and version data structures for events and entities - **Ownership and governance**: assign ownership and establish data contracts - **Observability**: monitor data quality and tracking implementation - **Code generation**: automatically generate tracking code from your designs, using [Snowtype](/docs/event-studio/implement-tracking/snowtype/) - **Tracking plans**: document and manage your tracking implementation The Event Studio UI is included in [Snowplow Console](https://console.snowplowanalytics.com). These tools help organizations move from ad-hoc tracking implementations to a structured, governed, collaborative approach. > **Tip:** New to tracking design? Start with our [best practice](/docs/fundamentals/tracking-design-best-practice/) guide to learn how to approach designing your tracking implementation. ## Key concepts To use Event Studio effectively, you should understand these core concepts: - **[Events](/docs/fundamentals/events/)**: actions that occur in your systems - **[Entities](/docs/fundamentals/entities/)**: the objects and context associated with events - **[Event specifications](/docs/event-studio/tracking-plans/event-specifications/)**: documentation of business events you're tracking - **[Tracking plans](/docs/event-studio/tracking-plans/)**: logical groupings of related business events with defined ownership Each tracking plan is associated with one or more [source applications](/docs/event-studio/source-applications/). The events and entities are defined by their [data structures](/docs/event-studio/data-structures/). 
This diagram illustrates how these concepts relate to each other within Event Studio: ![Tracking plan overview showing the relationship between tracking plans, event specifications, data structures](/assets/images/tracking-plan-overview-9f48fdeef0c5d4c593b416f9313c174c.png) This example `Ecommerce Checkout Flow` tracking plan groups two event specifications for ecommerce checkout behavior: - `Checkout Started` describes a `checkout_started` event, with an associated `cart` entity - `Product Add To Cart` describes an `add_to_cart` event, with `cart` and `product` entities The individual event and entity data structures can also be used in other event specifications and tracking plans. --- # Manage data structures via the API > Programmatically manage data structures through the API with endpoints for retrieving, validating, and deploying schemas to development and production registries. > Source: https://docs.snowplow.io/docs/event-studio/programmatic-management/data-structures-api/ The data structures Console API endpoints focus on the main operations in the workflow around: 1. Retrieving existing data structures and their associated schemas 2. Creating or editing new or existing data structures 3. Validating a data structure 4. Deploying a data structure to a registry ## Retrieving data structures The following `GET` requests allow you to retrieve data structures from both your development and production environment registries. ### Retrieve a list of all data structures Use this request to: - Retrieve a list of all data structures - Retrieve a list of data structures filtered by `vendor` or `name` query parameters `**GET** /api/msc/v1/organizations/{organizationId}/data-structures/v1` ### Retrieve a specific data structure Use this request to retrieve a specific data structure by its hash (see 'Generating a data structure hash' below), which is generated on creation. 
`**GET** /api/msc/v1/organizations/{organizationId}/data-structures/v1/{dataStructureHash}` ### Retrieve a specific version of a specific data structure Use this request to retrieve a specific version of a specific data structure by its hash (see 'Generating a data structure hash' below). `**GET** /api/msc/v1/organizations/{organizationId}/data-structures/v1/{dataStructureHash}/versions/{versionNumber}` See the [detailed API documentation](https://console.snowplowanalytics.com/api/msc/v1/docs) for all options. #### Generating a data structure hash To use the commands that retrieve information about a specific Data Structure, you need to concatenate its identifying parameters (`organization ID`, `vendor`, `name` and `format`) and hash them with SHA-256. **Example:**

| Parameter       | Value                                  |
| --------------- | -------------------------------------- |
| Organization ID | `38e97db9-f3cb-404d-8250-cd227506e544` |
| Vendor          | `com.acme.event`                       |
| Schema name     | `search`                               |
| Format          | `jsonschema`                           |

First concatenate the information with a dash (-) as the separator: `38e97db9-f3cb-404d-8250-cd227506e544-com.acme.event-search-jsonschema` And then hash them with SHA-256 to receive: `a41ef92847476c1caaf5342c893b51089a596d8ecd28a54d3f22d922422a6700` ## Validation To validate that your schema is in proper JSON format and complies with warehouse loading requirements, you can use the validation `POST` requests.
`**POST** /api/msc/v1/organizations/{organizationId}/data-structures/v1/validation-requests` ### Example ```bash curl 'https://console.snowplowanalytics.com/api/msc/v1/organizations/cad39ca5-3e1e-4e88-91af-87d977a4acd8/data-structures/v1/validation-requests' \ -H 'authorization: Bearer YOUR_TOKEN' \ -H 'content-type: application/json' \ --data-binary '{ "meta": { "hidden": false, "schemaType": "event", "customData": {} }, "data": { "description": "Schema for an example event", "properties": { "example_field_1": { "type": "string", "description": "the example_field_1 means x", "maxLength": 128 } }, "additionalProperties": false, "type": "object", "required": [ "example_field_1" ], "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "self": { "vendor": "com.acme", "name": "example_schema_name", "format": "jsonschema", "version": "1-0-0" } } }' ``` Please note: - the request's body has two parts: - one for data structure metadata as value to the `meta` key - one for the schema itself as value to the `data` key - this example uses the synchronous version of validation that responds with the result immediately. There is also an asynchronous version available that returns a request ID that you can later poll to get the result. - you can add metadata specific to your organization to the schema as key/value pairs in the `customData` object. See '[Managing Meta Data](#managing-meta-data)' for more information. ## Deployments The deployment endpoints deal with getting a new or edited version of your data structure into your development and production environments. 
`**GET** /api/msc/v1/organizations/{organizationId}/data-structures/v1/{dataStructureHash}/deployments` `**POST** /api/msc/v1/organizations/{organizationId}/data-structures/v1/deployment-requests` ### Example ```bash curl 'https://console.snowplowanalytics.com/api/msc/v1/organizations/cad39ca5-3e1e-4e88-91af-87d977a4acd8/data-structures/v1/deployment-requests' \ -H 'authorization: Bearer MY_TOKEN' \ -H 'content-type: application/json' \ --data-binary '{ "message": "", "source": "VALIDATED", "target": "DEV", "vendor": "com.acme", "name": "example_schema_name", "format": "jsonschema", "version": "1-0-0" }' ``` Please note: - This example demonstrates deployment from `VALIDATED` to `DEV`. The method is the same for production, but you would change the variables to `"source": "DEV"` and `"target": "PROD"` - The API enforces a workflow of validating, testing on development and then deploying to production. To achieve this you deploy from one environment to another: from (virtual environment) `VALIDATED` to `DEV`, then `DEV` to `PROD`. - Only users designated as "admin" in the Console have permission to promote from `DEV` to `PROD`. - There is a sync option that will return the response of the deployment request directly. Otherwise you can poll for deployment responses using the deployment ID. - A `message` property can be sent with a deployment to capture any change log notes, which will be stored against the deployment. (Note that this specific property is a bolt-on feature and might not be available for your account.) ## Managing meta data Meta data is used to add additional information to a Data Structure. ```json "meta": { "hidden": false, "schemaType": "event", "customData": {} } ``` The `hidden` property controls whether the data structure is hidden (`true`) or visible (`false`) in the Console. The `schemaType` property can be set as null | "event" | "entity".
The `customData` property is a map of string keys to string values and can be used to send across any key/value pairs you'd like to associate with the schema. For example, if you wanted to specify departmental ownership through metadata: ```json "customData": { "department": "marketing" } ``` You can update the metadata for a data structure using the PUT endpoint: `**PUT** /api/msc/v1/organizations/{organizationId}/data-structures/v1/{dataStructureHash}/meta` ## Migrating from the legacy API The API described above supersedes the legacy one, which was available under `/api/schemas`. It offers under-the-hood improvements and a clear path forward to adding further value. To make the transition easier, we have kept the same models for the data being exchanged; it is therefore simply a matter of 1) updating the authentication method, and 2) switching to the new endpoints listed above. The legacy API is currently tunneled to the new one, so there are no data differences to be expected after the upgrade. We will maintain this facade while supporting customers in upgrading. Your Customer Success Manager and our Engineers will be glad to assist you during this transition. --- # Manage event specifications via the API > Programmatically retrieve, create, edit, publish, deprecate, and delete event specifications using the Event Specifications API endpoints. > Source: https://docs.snowplow.io/docs/event-studio/programmatic-management/event-specifications-api/ The event specifications Console API endpoints allow you to retrieve, create, edit, publish, deprecate, or delete event specifications. > **Note:** In the previous API version (V1), event specifications were referred to as "tracking scenarios". ## Response Format The event specifications API follows a specific response format for successful cases (`2xx`) and for scenarios where critical and non-critical errors may occur, such as (`422`) **Unprocessable Entity**.
```json { "data": [ // Event specifications ], "includes": [ // Additional event specifications info ], "errors": [ // Warnings or errors ] } ``` - `data`: Contains the event specification(s) returned by the request - `includes`: Provides additional information, such as the history of event specification changes - `errors`: Holds a list of errors, which could be of type `Error` or `Warning`. **If the array field contains at least one error of type `Error`, the request will also return a `4xx` status code, indicating that it cannot perform the store operation. Any severity other than `Error` will result in a `2xx` status code.** ## Compatibility Checks Certain endpoints conduct a validation to verify the compatibility of a specific event specification event schema, `event.schema`, with the source data structure version referenced by `event.source`. When both `event.schema` and `event.source` are defined in the event specification, the compatibility checks will be executed. ```json ... "event": { "source": "iglu:com.example/ui_actions/jsonschema/1-0-0", "schema": { "$schema": "http://json-schema.org/draft-04/schema#", "description": "Event to capture search submit", "type": "object", "required": [ "label", "action" ], "properties": { "action": { "type": "string", "enum": [ "click" ] }, "label": { "type": "string", "enum": [ "Search" ] } }, "additionalProperties": false } } ... ``` However, the compatibility check is not only performed against the version specified by the source data structure (the `event.source` field, e.g., `1-0-0`), which we will refer to as the **current** version. It is also conducted against the latest version available in Iglu, referred to as the **latest** version. This is because it's common for a new event specification to use the latest version of the source data structure.
However, as this data structure may evolve over time and become incompatible with the `event.schema` defined in the event specification, we provide a method to detect these compatibility issues. Consequently, customers can update the event specification to ensure compatibility. In cases where an event specification is incompatible, or compatibility cannot be determined, errors will be provided in the `errors` field of the [response](#response-format). Errors alerting you to compatibility issues between the event specification and the source data structure take a shape similar to the one below: ```json ... "errors": [ { "type":"Warning", "code":"SchemaIncompatible", "title":"Event specification with id: 59b5e250-91c4-45af-a63d-5f8cd39f4b67, event schema is INCOMPATIBLE with schema with name: test_event, vendor: com.snplow.msc.aws, version: 1-0-13", "source":"event.schema" } ] ... ``` Compatibility checks can result in three possible values: **Compatible**, **SchemaIncompatible**, or **SchemaUndecidable**. - If **Compatible**, the event specification is compatible, and no errors will be appended to the `errors` response field - If **SchemaIncompatible**, the event specification is incompatible with some version. If the check for the **current** version is incompatible, the `type` will be `Error`. For incompatibility with the **latest** version, the `type` will be `Warning`. If the requested operation involves persisting the event specification (create/update), an error of type `Error` will be appended to the response, the status code will be **422 Unprocessable Entity**, and the store operation will not be performed. When fetching an event specification, the checks will run for both the **current** and **latest** versions, and if incompatible, the error type will always be `Warning`, with status code **200 OK**.
- If **SchemaUndecidable**, it is indeterminable whether the event specification is compatible with a specific version due to the use of some advanced JSON-Schema features and the high computational cost of checking compatibility. The `type` will always be `Warning`, and the user is responsible for ensuring that the event specification is compatible with the source data structure. A warning will be attached to the `errors` response field. > **Info:** The algorithm used to perform the compatibility check is based on the [Finding Data Compatibility Bugs with JSON Subschema Checking](https://dl.acm.org/doi/pdf/10.1145/3460319.3464796) paper, authored by Andrew Habib, Avraham Shinnar, and Michael Pradel. ## Retrieve a List of Event Specifications Use this request to retrieve a list of event specifications within an organization, which will be wrapped into the `data` field of the [response](#response-format). `GET /api/msc/v1/organizations/{organizationId}/event-specs/v1` The `organizationId` parameter is required. ### Query Parameters and Filters You can filter the results based on the following query parameters: - `dataProductId`: Filters the event specifications associated with a particular tracking plan - `sourceId`: Filters the event specifications associated with a particular data structure, inferred from the `event.source` field - `sourceVersion`: Filters the event specifications associated with a specific data structure version when used with `dataStructureId`. - `withLatestHistory`: When `true`, it will return a list of event specifications, with the latest change per event specification attached to the `includes` array field. The relation between event specifications in `data` and history in `includes` can be determined by `id = eventSpecId`. - `status`: Filters the event specifications that match the specified status > **Info:** If no query parameters are provided, it will return all the event specifications for an organization. 
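The filters above are passed as ordinary query parameters on the list endpoint. A sketch using only the Python standard library (the organization ID placeholder and parameter values are illustrative):

```python
from urllib.parse import urlencode

# Build a filtered list request for GET .../event-specs/v1 (sketch).
base = "https://console.snowplowanalytics.com/api/msc/v1"
org_id = "ORGANIZATION_ID"  # placeholder: your organization ID from Console

params = {
    "withLatestHistory": "true",  # attach latest change per spec to `includes`
    "status": "published",        # only specs with this status
}
url = f"{base}/organizations/{org_id}/event-specs/v1?{urlencode(params)}"
```

The resulting URL would then be requested with the usual `Authorization: Bearer <token>` header; with no parameters at all, the endpoint returns every event specification in the organization.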
## Retrieve a Specific Event Specification Use this request to retrieve a specific event specification within an organization. The retrieved event specification will be wrapped into the `data` field of the response. `GET /api/msc/v1/organizations/{organizationId}/event-specs/v1/{eventSpecId}` > **Info:** This endpoint will trigger [**compatibility checking**](#compatibility-checks) if `event.source` and `event.schema` are defined. The `organizationId` and `eventSpecId` path parameters are required. Optional query parameters: - `withHistory`: When `true`, returns a list with the history for the event specification in the `includes` array field of the response, related to the event specification by its id - `status`: Filters the event specifications that match the specified status. ## Creating an Event Specification Use this request to create an event specification within an organization. `POST /api/msc/v1/organizations/{organizationId}/event-specs/v1` The `organizationId` path parameter is required. Request body example: ```json { "spec": { "name": "Search", "description": "Tracking the use of the search box", "event": { "source": "iglu:com.example/ui_actions/jsonschema/1-0-0" } }, "message": "update" } ``` The creation form has two fields at the top level, as shown in the example above: - `message`: An optional field to provide a message - `spec`: The definition of the event specification, which should comply with the [validations](#validations). By default, the event specification will be created with `spec.status` set to `draft` and `spec.version` set to `0` if not provided. These values can be changed and managed after creation.
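The defaults just described can be captured in a small helper. This is an illustrative sketch (the function name `new_event_spec` is ours), applying the documented defaults of `status: draft` and `version: 0` when not provided:

```python
# Illustrative: assemble a creation body with the documented defaults.
def new_event_spec(name, description=None, source=None,
                   status=None, version=None, message=None):
    spec = {
        "name": name,
        "status": status or "draft",              # default documented above
        "version": 0 if version is None else version,  # default documented above
    }
    if description is not None:
        spec["description"] = description
    if source is not None:
        spec["event"] = {"source": source}
    body = {"spec": spec}
    if message is not None:
        body["message"] = message
    return body

body = new_event_spec("Search",
                      description="Tracking the use of the search box",
                      source="iglu:com.example/ui_actions/jsonschema/1-0-0",
                      message="update")
```

Serializing `body` as JSON reproduces the request body example above, with the defaults filled in.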
Here is an example response: ```json { "data": [ { "id": "5a203ef8-939b-4fd1-914e-f12a3dd1a869", "version": 0, "status": "draft", "name": "Search", "description": "Tracking the use of the search box", "event": { "source": "iglu:com.example/ui_actions/jsonschema/1-0-0" } } ], "includes": [ { "author": "39b81015-1bd5-4b37-96c7-3296cabaa36f", "message": "initial draft", "date": "2023-04-26T14:41:48.708191Z", "eventSpecId": "5a203ef8-939b-4fd1-914e-f12a3dd1a869", "version": 0, "status": "draft", "type": "History" } ], "errors": [] } ``` ### Validations - `spec.event.source`: If provided, it should match a valid and existing Iglu URI - `spec.name`: It validates that the `spec.name` of an event specification is unique within the data structure context, inferred from the source data structure `spec.event.source` if provided - `spec.version`: If provided, it should be equal to or greater than zero - `spec.status`: If provided, it should match one of `draft`, `published`, or `deprecated` - `spec.entities`: If provided, it will validate that the entities, `spec.entities.tracked` and `spec.entities.enriched`, are not duplicated and that they exist - `spec.dataProductId`: If provided, it will validate that the tracking plan exists. (Coming soon) > **Info:** This endpoint will trigger [**compatibility checking**](#compatibility-checks) if `event.source` and `event.schema` are defined. ## Editing an Event Specification Use this request to edit an event specification within an organization. The format of the request and response is the same as during creation. The `organizationId` and `eventSpecId` parameters are required. `PUT /api/msc/v1/organizations/{organizationId}/event-specs/v1/{eventSpecId}` ### Publishing an Event Specification When editing an event specification, it can be published by setting the `status` to `published`.
Currently, this will indicate to the event specification consumers (for instance, front-end developers) that the tracking design is ready to be implemented or consumed. By default, when an event specification is created and no value is provided for `spec.status`, it will be set to `draft`. With this, we suggest an event specification lifecycle that we recommend following, but we allow a certain degree of flexibility to accommodate unique customer use cases. Here is the suggested lifecycle: ```mermaid graph LR style Start fill:#633EB5, stroke:#000000, stroke-width:0px; style Draft color:#724512, fill:#FEEEBD, stroke:#724512, stroke-width:1px; style Published color:#3F2874, fill:#F0EBF8, stroke:#3F2874, stroke-width:1px; style Deprecated color:#89251F, fill:#FDF3F2, stroke:#89251F, stroke-width:1px; style Deleted fill:#D63A31, stroke:#000000, stroke-width:1px; Start(( )) -->|Create| Draft Draft -->|Publish| Published Draft -->|Delete| Deleted Published -->|Deprecate| Deprecated Deprecated -->|Undeprecate| Draft Deleted((Deleted)) ``` In addition to this lifecycle, and in conjunction with versioning, we enforce that when an event specification is **published**, the versions between two published versions are **discarded**. 
For example: Publish new version, before squash: ```mermaid graph LR style A color:#3F2874, fill:#F0EBF8, stroke:#3F2874, stroke-width:1px; style B color:#724512, fill:#FEEEBD, stroke:#724512, stroke-width:1px; style C color:#724512, fill:#FEEEBD, stroke:#724512, stroke-width:1px; style D color:#724512, fill:#FEEEBD, stroke:#724512, stroke-width:1px; style E color:#3F2874, fill:#F0EBF8, stroke:#3F2874, stroke-width:1px; linkStyle default stroke-width:2px,fill:#F2F4F7,stroke:#633EB5,color:#633EB5 A[Published 1] --> B[Draft 2] B[Draft 2] --> C[Draft 3] C[Draft 3] --> D[Draft 4] D[Draft 4] --> E[Published 5] ``` After discarding intermediate versions: ```mermaid graph LR style A color:#3F2874, fill:#F0EBF8, stroke:#3F2874, stroke-width:1px; style B color:#3F2874, fill:#F0EBF8, stroke:#3F2874, stroke-width:1px; linkStyle default stroke-width:2px,fill:#F2F4F7,stroke:#633EB5,color:#633EB5 A[Published 1] --> B[Published 5] ``` ### Deprecating an Event Specification When editing an event specification, it can be deprecated by setting the `status` to `deprecated`. This is a way of informing event specifications consumers (for instance, developers) not to rely on the tracking anymore. 
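The version-squashing behavior illustrated above can be restated as a short sketch. This is not Snowplow's implementation, just the rule as code: drafts lying between two published versions are discarded, while drafts before the first or after the last published version survive:

```python
# Sketch of squashing: discard drafts strictly between two published versions.
def squash(history):
    """history: list of (version, status) tuples in chronological order."""
    published = [i for i, (_, status) in enumerate(history) if status == "published"]
    kept = []
    for i, entry in enumerate(history):
        # An entry is discarded only if it sits between two published entries.
        between = any(lo < i < hi for lo, hi in zip(published, published[1:]))
        if not between:
            kept.append(entry)
    return kept

history = [(1, "published"), (2, "draft"), (3, "draft"), (4, "draft"), (5, "published")]
```

Applying `squash` to the example history above keeps only versions 1 and 5, matching the "after" diagram.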
### Validations - `spec.event.source`: If provided, it should match a valid and existing Iglu URI - `spec.name`: It validates that the `spec.name` of an event specification is unique within the data structure context, inferred from the source data structure `spec.event.source` if provided - `spec.version`: If provided, it should be equal to or greater than zero, must not already exist, and must be greater than the last published version - `spec.status`: If provided, it should match one of `draft`, `published`, or `deprecated` - `spec.entities`: If provided, it will validate that the entities, `spec.entities.tracked` and `spec.entities.enriched`, are not duplicated and that they exist - `spec.dataProductId`: If provided, it will validate that the tracking plan exists > **Info:** This endpoint will trigger [**compatibility checking**](#compatibility-checks) if `event.source` and `event.schema` are defined. ## Deleting an Event Specification Use this request to delete an event specification within an organization. `DELETE /api/msc/v1/organizations/{organizationId}/event-specs/v1/{eventSpecId}` > **Warning:** Please note that this action is irreversible and will permanently delete the event specification. --- # Programmatic management using CLI or API > Automate Event Studio workflows using Snowplow CLI for git-ops integration or the APIs for custom tooling and CI/CD pipelines. > Source: https://docs.snowplow.io/docs/event-studio/programmatic-management/ Data structures and tracking plans can be managed programmatically, enabling git-ops workflows, CI/CD integration, and custom tooling. Combined with other tools, such as the data structures [CI tool](/docs/testing/data-structures-ci-tool/) and the [Snowplow Micro](/docs/testing/snowplow-micro/) testing pipeline, it's possible to build a robust, automated data structure workflow that ensures data quality upstream, before data reaches your pipeline.
## Snowplow CLI The [Snowplow CLI](/docs/event-studio/programmatic-management/snowplow-cli/) provides file-based workflows for git-ops integration. It has commands to: - Download and upload [data structures](/docs/event-studio/programmatic-management/snowplow-cli/data-structures/) as local YAML/JSON files - Manage [tracking plans](/docs/event-studio/programmatic-management/snowplow-cli/tracking-plans/), event specifications, and source applications - Validate resources before publishing - Integrate with version control and code review workflows The [Snowplow CLI MCP server](/docs/llms-support/mcp-server/) enables AI assistants to interact with your tracking plan resources through natural language. ## Console API Use the Console REST API for direct programmatic access to tracking plans and data structures. This is useful for integration with other tools, or when you need more control than file-based workflows allow. It has endpoints for three resource types: - [Data structures](/docs/event-studio/programmatic-management/data-structures-api/) (`/data-structures/v1`): retrieve, validate, and deploy schemas to development and production registries - [Tracking plans](/docs/event-studio/programmatic-management/tracking-plans-api/) (`/data-products/v2`): create and update tracking plans, view change history, and manage subscriptions - [Event specifications](/docs/event-studio/programmatic-management/event-specifications-api/) (`/event-specs/v1`): create, publish, deprecate, and delete event specifications within tracking plans You can explore all available endpoints in the [Swagger API documentation](https://console.snowplowanalytics.com/api/msc/v1/docs). > **Note:** By default, Snowplow pipelines use Iglu Server schema registries. Each pipeline has a development and a production Iglu Server instance. > > The Console API only works with these registries. If you're using a custom static S3 registry instead, you'll need to update your registry manually. 
### Authentication Each request requires your organization ID and an authorization token. You can find your organization ID [on the **Manage organization** page](https://console.snowplowanalytics.com/settings) in Console. Follow the instructions in the [Account management](/docs/account-management/) section to obtain an access token for API authentication. To post sample requests in the [Swagger API documentation](https://console.snowplowanalytics.com/api/msc/v1/docs), click the `Authorize` button at the top of the document and authorize with your token. The value for the token field in each individual request is overwritten by this authorization. --- # Manage data structures via Snowplow CLI > Use the Snowplow CLI data-structures command to generate, download, validate, and publish data structures with git-ops workflows and JSON Schema validation. > Source: https://docs.snowplow.io/docs/event-studio/programmatic-management/snowplow-cli/data-structures/ The `data-structures` subcommand of [Snowplow CLI](/docs/event-studio/programmatic-management/snowplow-cli/) provides a collection of functionality to ease the integration of custom development and publishing workflows. ## Snowplow CLI Prerequisites Installed and configured [Snowplow CLI](/docs/event-studio/programmatic-management/snowplow-cli/) ## Available commands ### Creating data structures ```bash snowplow-cli ds generate login_click ./folder-name ``` Will create a minimal data structure template in a new file `./folder-name/login_click.yaml`. Note that you will need to add a vendor name to the template before it will pass validation. Alternatively, supply a vendor at creation time with the `--vendor com.acme` flag. ### Downloading data structures ```bash snowplow-cli ds download ``` This command will retrieve all organization data structures. By default, it will create a folder named `data-structures` in the current working directory to put them in.
It uses a combination of vendor and name to further break things down. Given a data structure with `vendor: com.acme` and `name: link_click` and assuming the default format of yaml the resulting folder structure will be `./data-structures/com.acme/link_click.yaml`. > **Note:** The CLI download command only retrieves data structures that have been deployed to at least the development environment. **Draft data structures** that haven't been deployed yet will not be included in the download. ### Validating data structures ```bash snowplow-cli ds validate ./folder-name ``` This command will find all files under `./folder-name` (if omitted then `./data-structures`) and attempt to validate them using Snowplow Console. It will assert the following 1. Is each file a valid format (yaml/json) with expected fields 2. Does the schema in the file conform to [snowplow expectations](/docs/fundamentals/schemas/#self-describing-json-schema-anatomy) 3. Given the organization's [loading configuration](/docs/destinations/warehouses-lakes/) will any schema version number choices have a potentially negative effect on data loading If any validations fail the command will report the problems to stdout and exit with status code 1. ### Publishing data structures ```bash snowplow-cli ds publish dev ./folder-name ``` This command will find all files under `./folder-name` (if omitted then `./data-structures`) and attempt to publish them to Snowplow Console in the environment provided (`dev` or `prod`). Publishing to `dev` will also cause data structures to be validated with the `validate` command before upload. Publishing to `prod` will not validate but requires all data structures referenced to be present on `dev`. --- # Snowplow CLI for data management > Command-line tool for downloading and syncing data structures and tracking plans to Snowplow Console, enabling git-ops workflows with reviews and branching. 
> Source: https://docs.snowplow.io/docs/event-studio/programmatic-management/snowplow-cli/ > **Info:** We originally called tracking plans "data products". You'll still find the old term used in some existing APIs and CLI commands. Snowplow CLI brings data management elements of Snowplow Console into the command line. It allows you to download your data structures and tracking plans to YAML/JSON files and publish them back to Console. This enables git-ops-like workflows, where your tracking design lives in your repository alongside your application code. Changes are reviewed and deployed through your standard development process. ## Install Snowplow CLI can be installed with [homebrew](https://brew.sh/): ```bash brew install snowplow/taps/snowplow-cli ``` Verify the installation with ```bash snowplow-cli --help ``` For systems where homebrew is not available, binaries for multiple platforms can be found in [releases](https://github.com/snowplow/snowplow-cli/releases). Example installation for `linux_x86_64` using `curl`: ```bash curl -L -o snowplow-cli https://github.com/snowplow/snowplow-cli/releases/latest/download/snowplow-cli_linux_x86_64 chmod u+x snowplow-cli ``` Verify the installation with ```bash snowplow-cli --help ``` ## Configure ### Automated Setup The easiest way to configure Snowplow CLI is using the built-in `setup` command: ```bash snowplow-cli setup ``` This command will: - Guide you through device authentication with Snowplow Console - Automatically create and configure your API credentials - Set up your organization ID **Prerequisites:** Your Snowplow Console account must have sufficient permissions to create API keys.
You can also use optional flags: - `--read-only`: Create a read-only API key - `--dotenv`: Store configuration as a .env file in the current working directory ### Manual Configuration If you prefer manual configuration, you will need these values: - An API Key ID and the corresponding API Key (secret), which are generated from the [credentials section](https://console.snowplowanalytics.com/credentials) in Console. - Your Organization ID, which you can find [on the _Manage organization_ page](https://console.snowplowanalytics.com/settings) in Console. Snowplow CLI can take its configuration from a variety of sources. More details are available from `snowplow-cli data-structures --help`. Variations on these three examples should serve most cases. **env variables or .dotenv file:** ```bash SNOWPLOW_CONSOLE_API_KEY_ID=********-****-****-****-************ SNOWPLOW_CONSOLE_API_KEY=********-****-****-****-************ SNOWPLOW_CONSOLE_ORG_ID=********-****-****-****-************ ``` **$HOME/.config/snowplow/snowplow.yml:** ```yaml console: api-key-id: ********-****-****-****-************ api-key: ********-****-****-****-************ org-id: ********-****-****-****-************ ``` **inline arguments:** ```bash snowplow-cli data-structures --api-key-id ********-****-****-****-************ --api-key ********-****-****-****-************ --org-id ********-****-****-****-************ ``` *** Snowplow CLI defaults to yaml format. It can be changed to json by either providing a `--output-format json` flag or setting the `output-format: json` config value. This works for all commands where the format matters, not only `generate`.
### Verify Configuration After configuration, you can verify that everything is working correctly using the `status` command: ```bash snowplow-cli status ``` This command will: - Check that your API credentials are properly configured - Verify connectivity to Snowplow Console - Confirm your organization access - Provide helpful troubleshooting information if issues are found If the status check fails, the command will suggest next steps, such as running `snowplow-cli setup` to reconfigure your credentials. ### Configure YAML language server (optional) During the tracking plan authoring process, it's convenient to have the editor highlight errors, suggest possible values, and show examples. All files generated by Snowplow CLI's `generate` and `download` commands contain a special comment that can be interpreted by a YAML language server. For VS Code, you can get this authoring functionality using this [YAML extension](https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml). Instructions for other editors can be found [here](https://github.com/redhat-developer/yaml-language-server?tab=readme-ov-file#clients). After that, running `snowplow-cli ds generate test` and opening the generated file in your editor of choice should show something like the following: ![](/assets/images/lspValidation-21bfbfb84fe78d4b98d42b0ef09eda5b.png) ## MCP server The Snowplow CLI includes a local Model Context Protocol (MCP) server that enables natural language interaction with AI assistants like Claude Desktop, Cursor, GitHub Copilot, and other MCP-compatible clients for creating, validating, and managing your Snowplow tracking plans.
This allows you to: - Create and validate data structures through conversation - Analyze tracking requirements and suggest implementations - Validate tracking plans and source applications For setup instructions and configuration examples for different MCP clients, see our [MCP tutorial](/tutorials/snowplow-cli-mcp/introduction/). ## Use cases - [Manage your data structures with snowplow-cli](/docs/event-studio/programmatic-management/snowplow-cli/data-structures/) - [Set up a GitHub CI/CD pipeline to manage data structures and tracking plans](/tutorials/data-structures-in-git/introduction/) --- # Snowplow CLI command reference > Complete reference for Snowplow CLI commands including data-products and data-structures subcommands with options and usage examples. > Source: https://docs.snowplow.io/docs/event-studio/programmatic-management/snowplow-cli/reference/ > **Info:** We originally called tracking plans "data products". You'll still find the old term used in some existing APIs and CLI commands. This page contains the complete reference for the Snowplow CLI commands. ## Data-Products Work with Snowplow tracking plans ### Examples ```text $ snowplow-cli data-products validate ``` ### Options ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id -h, --help help for data-products -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id ``` ### Options inherited from parent commands ```text --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). 
Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env --json-output Log output as json -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Products Add-Event-Spec Add new event spec to an existing tracking plan ### Synopsis Adds one or more event specifications to an existing tracking plan file, or prints them to stdout. The command takes the path to a tracking plan file and adds the specified event specifications to it. When run without the path argument, it will print the generated event specs to stdout. The command will attempt to keep the formatting and comments of the original file intact, but this is a best-effort approach. Some comments might be deleted, and some formatting changes might occur. ```text snowplow-cli data-products add-event-spec {path} [flags] ``` ### Examples ```text $ snowplow-cli dp add-event-spec --event-spec user_login --event-spec page_view ./my-data-product.yaml $ snowplow-cli dp add-es ./data-products/analytics.yaml -e "checkout_completed" -e "item_purchased" ``` ### Options ```text -e, --event-spec stringArray Name of event spec to add -h, --help help for add-event-spec -f, --output-format string Format of stdout output. Only applicable when the file is not specified. Json or yaml are supported (default "yaml") ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env).
Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Products Download Download all tracking plans, event specs and source apps from Snowplow Console ### Synopsis Downloads the latest versions of all tracking plans, event specs and source apps from Snowplow Console. If no directory is provided then defaults to 'data-products' in the current directory. Source apps are stored in the nested 'source-apps' directory ```text snowplow-cli data-products download {directory ./data-products} [flags] ``` ### Examples ```text $ snowplow-cli dp download $ snowplow-cli dp download ./my-data-products ``` ### Options ```text -h, --help help for download -f, --output-format string Format of the files to read/write. json or yaml are supported (default "yaml") --plain Don't include any comments in yaml files ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). 
Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Products Generate Generate new tracking plans and source applications locally ### Synopsis Will write new tracking plans and/or source application to file based on the arguments provided. Example: $ snowplow-cli dp gen --source-app "Mobile app" Will result in a new source application getting written to './data-products/source-applications/mobile-app.yaml' $ snowplow-cli dp gen --data-product "Ad tracking" --output-format json --data-products-directory dir1 Will result in a new tracking plan getting written to './dir1/ad-tracking.json' ```text snowplow-cli data-products generate [paths...] [flags] ``` ### Examples ```text $ snowplow-cli dp generate --source-app "Mobile app" --source-app "Web app" --data-product "Signup flow" ``` ### Options ```text --data-product stringArray Name of tracking plan to generate --data-products-directory string Directory to write tracking plans to (default "data-products") -h, --help help for generate --output-format string File format (yaml|json) (default "yaml") --plain Don't include any comments in yaml files --source-app stringArray Name of source app to generate --source-apps-directory string Directory to write source apps to (default "data-products/source-apps") ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. 
Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Products Purge Purges (permanently removes) all remote tracking plans and source apps that do not exist locally ### Synopsis Purges (permanently removes) all remote tracking plans and source apps that do not exist locally. If no directory is provided then defaults to 'data-products' in the current directory. Source apps are stored in the nested 'source-apps' directory ```text snowplow-cli data-products purge {directory ./data-products} [flags] ``` ### Examples ```text $ snowplow-cli dp purge $ snowplow-cli dp purge ./my-data-products ``` ### Options ```text -h, --help help for purge -y, --yes commit to purge ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). 
Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Products Release Sync tracking plans, event specs and source apps to Snowplow Console, then release event specs to be available within your pipeline ### Synopsis Sync tracking plans, event specs and source apps to Snowplow Console, then release event specs. This command runs 'sync' first, then releases the event specs. Releasing sets the event spec status to 'published' and pushes them to the pipeline, enabling event spec inference. Only event specs that exist locally will be released. Each event spec must have an event defined, and all referenced events and entities must be published to the production environment. If no directory is provided then defaults to 'data-products' in the current directory. Source apps are stored in the nested 'source-apps' directory ```text snowplow-cli data-products release {directory ./data-products} [flags] ``` ### Examples ```text $ snowplow-cli dp release $ snowplow-cli dp release ./my-data-products ``` ### Options ```text -c, --concurrency int The number of validation requests to perform at once (maximum 10) (default 3) -d, --dry-run Only print planned changes without performing them --gh-annotate Output suitable for github workflow annotation (ignores -s) -h, --help help for release ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. 
Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Products Sync Sync tracking plans, event specs and source apps to Snowplow Console ### Synopsis Sync tracking plans, event specs and source apps to Snowplow Console. This command syncs local files with Snowplow Console. Tracking plans, event specs and source apps are created or updated as needed. Tracking plans and source apps that exist in Snowplow Console are updated in place. Structural changes to event specs (name, event, entities) will instead create a new draft version of the event spec. Use 'release' to also release event specs, which changes the status in Snowplow Console to "published" and enables event spec inference. If no directory is provided then defaults to 'data-products' in the current directory. 
Source apps are stored in the nested 'source-apps' directory ```text snowplow-cli data-products sync {directory ./data-products} [flags] ``` ### Examples ```text $ snowplow-cli dp sync $ snowplow-cli dp sync ./my-data-products ``` ### Options ```text -c, --concurrency int The number of validation requests to perform at once (maximum 10) (default 3) -d, --dry-run Only print planned changes without performing them --gh-annotate Output suitable for github workflow annotation (ignores -s) -h, --help help for sync ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Products Validate Validate tracking plans and source applications with Snowplow Console ### Synopsis Sends all tracking plans and source applications from \ for validation by Snowplow Console. ```text snowplow-cli data-products validate [paths...] 
[flags] ``` ### Examples ```text $ snowplow-cli dp validate ./data-products ./source-applications $ snowplow-cli dp validate ./src ``` ### Options ```text -c, --concurrency int The number of validation requests to perform at once (maximum 10) (default 3) --full Perform compatibility check on all files, not only the ones that were changed --gh-annotate Output suitable for github workflow annotation (ignores -s) -h, --help help for validate ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Structures Work with Snowplow data structures ### Examples ```text $ snowplow-cli data-structures generate my_new_data_structure $ snowplow-cli ds validate $ snowplow-cli ds publish dev ``` ### Options ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id -h, --help help for data-structures -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id ``` ### Options inherited from parent commands
```text --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env --json-output Log output as json -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Structures Download Download all data structures from Snowplow Console ### Synopsis Downloads the latest versions of all data structures from Snowplow Console. Will retrieve schema contents from your development environment. If no directory is provided then defaults to 'data-structures' in the current directory. By default, data structures with empty schemaType (legacy format) are skipped. Use --include-legacy to include them (they will be set to 'entity' schemaType). ```text snowplow-cli data-structures download {directory ./data-structures} [flags] ``` ### Examples ```text $ snowplow-cli ds download Download data structures matching com.example/event_name* or com.example.subdomain* $ snowplow-cli ds download --match com.example/event_name --match com.example.subdomain Download with custom output format and directory $ snowplow-cli ds download --output-format json ./my-data-structures Include legacy data structures with empty schemaType $ snowplow-cli ds download --include-legacy ``` ### Options ```text -h, --help help for download --include-drafts Include drafts data structures --include-legacy Include legacy data structures with empty schemaType (will be set to 'entity') --match stringArray Match for specific data structure to download (eg. --match com.example/event_name or --match com.example) -f, --output-format string Format of the files to read/write.
json or yaml are supported (default "yaml") --plain Don't include any comments in yaml files ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Structures Generate Generate a new data structure locally ### Synopsis Will write a new data structure to file based on the arguments provided. Example: $ snowplow-cli ds gen login_click --vendor com.example Will result in a new data structure getting written to './data-structures/com.example/login_click.yaml' The directory 'com.example' will be created automatically. $ snowplow-cli ds gen login_click Will result in a new data structure getting written to './data-structures/login_click.yaml' with an empty vendor field. Note that vendor is a required field and will cause a validation error if not completed.
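To make the synopsis above concrete, the generated file is a YAML wrapper around a self-describing JSON schema. The sketch below shows the general shape only; field layout is illustrative and the exact output of `ds gen` may vary between CLI versions:

```yaml
# Rough shape of a file produced by: snowplow-cli ds gen login_click --vendor com.example
# (illustrative sketch, not an exact reproduction of the tool's output)
apiVersion: v1
resourceType: data-structure
meta:
  hidden: false
  schemaType: event
  customData: {}
data:
  $schema: http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#
  self:
    vendor: com.example
    name: login_click
    format: jsonschema
    version: 1-0-0
  type: object
  properties: {}
  additionalProperties: false
```

Fill in `data.properties` with your event's fields before validating or publishing.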
```text snowplow-cli data-structures generate login_click {directory ./data-structures} [flags] ``` ### Examples ```text $ snowplow-cli ds generate my-ds $ snowplow-cli ds generate my-ds ./my-data-structures ``` ### Options ```text --entity Generate data structure as an entity --event Generate data structure as an event (default true) -h, --help help for generate --output-format string Format for the file (yaml|json) (default "yaml") --plain Don't include any comments in yaml files --vendor string A vendor for the data structure. Must conform to the regex pattern [a-zA-Z0-9-_.]+ ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Structures Publish Publishing commands for data structures ### Synopsis Publishing commands for data structures Publish local data structures to Snowplow Console. ### Options ```text -h, --help help for publish ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. 
Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Structures Publish Dev Publish data structures to your development environment ### Synopsis Publish modified data structures to Snowplow Console and your development environment The 'meta' section of a data structure is not versioned within Snowplow Console. Changes to it will be published by this command. ```text snowplow-cli data-structures publish dev [paths...] default: [./data-structures] [flags] ``` ### Examples ```text $ snowplow-cli ds publish dev $ snowplow-cli ds publish dev --dry-run $ snowplow-cli ds publish dev --dry-run ./my-data-structures ./my-other-data-structures ``` ### Options ```text -d, --dry-run Only print planned changes without performing them --gh-annotate Output suitable for github workflow annotation (ignores -s) -h, --help help for dev ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. 
Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Structures Publish Prod Publish data structures to your production environment ### Synopsis Publish data structures from your development to your production environment Data structures found on \ which are deployed to your development environment will be published to your production environment. ```text snowplow-cli data-structures publish prod [paths...] default: [./data-structures] [flags] ``` ### Examples ```text $ snowplow-cli ds publish prod $ snowplow-cli ds publish prod --dry-run $ snowplow-cli ds publish prod --dry-run ./my-data-structures ./my-other-data-structures ``` ### Options ```text -d, --dry-run Only print planned changes without performing them -h, --help help for prod ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). 
Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Data-Structures Validate Validate data structures with Snowplow Console ### Synopsis Sends all data structures from \ for validation by Snowplow Console. ```text snowplow-cli data-structures validate [paths...] default: [./data-structures] [flags] ``` ### Examples ```text $ snowplow-cli ds validate $ snowplow-cli ds validate ./my-data-structures ./my-other-data-structures ``` ### Options ```text --gh-annotate Output suitable for github workflow annotation (ignores -s) -h, --help help for validate ``` ### Options inherited from parent commands ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). 
Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") --json-output Log output as json -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Mcp Start an MCP (Model Context Protocol) stdio server for Snowplow validation and context ### Synopsis Start an MCP (Model Context Protocol) stdio server that provides tools for: - Validating Snowplow files (data-structures, data-products, source-applications) - Retrieving the built-in schema and rules that define how Snowplow data structures, tracking plans, and source applications should be structured ```text snowplow-cli mcp [flags] ``` ### Examples ```text Claude Desktop config: { "mcpServers": { ... "snowplow-cli": { "command": "snowplow-cli", "args": ["mcp"] } } } VS Code '\/.vscode/mcp.json': { "servers": { ... "snowplow-cli": { "type": "stdio", "command": "snowplow-cli", "args": ["mcp"] } } } Cursor '\/.cursor/mcp.json': { "mcpServers": { ... "snowplow-cli": { "command": "snowplow-cli", "args": ["mcp", "--base-directory", "."] } } } Note: This server's validation tools require filesystem paths to validate assets. For full functionality, your MCP client needs filesystem write access so created assets can be saved as files and then validated. Setup options: - Enable filesystem access in your MCP client, or - Run alongside an MCP filesystem server (e.g., @modelcontextprotocol/server-filesystem) ``` ### Options ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --base-directory string The base path to use for relative file lookups. Useful for clients that pass in relative file paths. 
--dump-context Dumps the result of the get_context tool to stdout and exits. -h, --help help for mcp -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id ``` ### Options inherited from parent commands ```text --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env --json-output Log output as json -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Setup Set up Snowplow CLI with device authentication ### Synopsis Authenticate with Snowplow Console using device authentication flow and create an API key ```text snowplow-cli setup [flags] ``` ### Examples ```text $ snowplow-cli setup $ snowplow-cli setup --read-only ``` ### Options ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id --auth0-domain string Auth0 domain (default "id.snowplowanalytics.com") --client-id string Auth0 Client ID for device auth (default "EXQ3csSDr6D7wTIiebNPhXpgkSsOzCzi") --dotenv Store as .env file in current working directory -h, --help help for setup -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id --read-only Create a read-only API key ``` ### Options inherited from parent commands ```text --config string Config file. 
Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env --json-output Log output as json -q, --quiet Log output level to Warn -s, --silent Disable output ``` ## Status Check Snowplow CLI configuration and connectivity ### Synopsis Verify that the CLI is properly configured and can connect to Snowplow Console ```text snowplow-cli status [flags] ``` ### Examples ```text $ snowplow-cli status ``` ### Options ```text -S, --api-key string Snowplow Console api key -a, --api-key-id string Snowplow Console api key id -h, --help help for status -H, --host string Snowplow Console host (default "https://console.snowplowanalytics.com") -m, --managed-from string Link to a github repo where the data structure is managed -o, --org-id string Your organization id ``` ### Options inherited from parent commands ```text --config string Config file. Defaults to $HOME/.config/snowplow/snowplow.yml Then on: Unix $XDG_CONFIG_HOME/snowplow/snowplow.yml Darwin $HOME/Library/Application Support/snowplow/snowplow.yml Windows %AppData%\snowplow\snowplow.yml --debug Log output level to Debug --env-file string Environment file (.env). 
Defaults to .env in current directory Then on: Unix $HOME/.config/snowplow/.env Darwin $HOME/Library/Application Support/snowplow/.env Windows %AppData%\snowplow\.env --json-output Log output as json -q, --quiet Log output level to Warn -s, --silent Disable output ```

---

# Managing tracking plans via the CLI

> Use the Snowplow CLI data-products command to create, download, validate, sync, and release tracking plans, event specifications, and source applications with git-ops workflows.
> Source: https://docs.snowplow.io/docs/event-studio/programmatic-management/snowplow-cli/tracking-plans/

> **Info:** We originally called tracking plans "data products". You'll still find the old term used in some existing APIs and CLI commands.

The `data-products` subcommand of [Snowplow CLI](/docs/event-studio/programmatic-management/snowplow-cli/) provides commands that make it easier to integrate tracking plan management into custom development and publishing workflows.

## Snowplow CLI Prerequisites

You'll need an installed and configured [Snowplow CLI](/docs/event-studio/programmatic-management/snowplow-cli/).

## Available commands

### Creating a tracking plan

```bash
snowplow-cli dp generate --data-product my-data-product
```

This command creates a minimal tracking plan template in a new file `./data-products/my-data-product.yaml`.

### Creating a source application

```bash
snowplow-cli dp generate --source-app my-source-app
```

This command creates a minimal source application template in a new file `./data-products/source-apps/my-source-app.yaml`.

### Creating an event specification

To create an event specification, modify an existing data-product file and add an event specification object.
Here's a minimal example:

```yaml
apiVersion: v1
resourceType: data-product
resourceName: 3d3059c4-d29b-4979-a973-43f7070e1dd0
data:
  name: test-cli
  sourceApplications: []
  eventSpecifications:
    - resourceName: 11d881cd-316e-4286-b5d4-fe7aebf56fca
      name: test
      event:
        source: iglu:com.snowplowanalytics.snowplow/button_click/jsonschema/1-0-0
```

> **Warning:** The `source` fields of events and entities must refer to a deployed data structure. Referring to a locally created data structure is not yet supported.

### Linking a tracking plan to a source application

To link a tracking plan to a source application, provide a list of references to the source application files in the `data.sourceApplications` field. Here's an example:

```yaml
apiVersion: v1
resourceType: data-product
resourceName: 3d3059c4-d29b-4979-a973-43f7070e1dd0
data:
  name: test-cli
  sourceApplications:
    - $ref: ./source-apps/my-source-app.yaml
```

### Modifying an event specification's source applications

By default, event specifications inherit all the source applications of the tracking plan. To customise this, use the `excludedSourceApplications` field in an event specification to remove a given source application from it.
```yaml
apiVersion: v1
resourceType: data-product
resourceName: 3d3059c4-d29b-4979-a973-43f7070e1dd0
data:
  name: test-cli
  sourceApplications:
    - $ref: ./source-apps/generic.yaml
    - $ref: ./source-apps/specific.yaml
  eventSpecifications:
    - resourceName: 11d881cd-316e-4286-b5d4-fe7aebf56fca
      name: All source apps
      event:
        source: iglu:com.snowplowanalytics.snowplow/button_click/jsonschema/1-0-0
    - resourceName: b9c994a0-03b2-479c-b1cf-7d25c3adc572
      name: Not quite everything
      excludedSourceApplications:
        - $ref: ./source-apps/specific.yaml
      event:
        source: iglu:com.snowplowanalytics.snowplow/button_click/jsonschema/1-0-0
```

In this example, the event specification `All source apps` is related to both the `generic` and `specific` source apps, while `Not quite everything` is related only to the `generic` source application.

### Downloading tracking plans, event specifications and source apps

```bash
snowplow-cli dp download
```

This command retrieves all organization tracking plans, event specifications, and source applications. By default, it creates a folder named `data-products` in your current working directory. You can specify a different folder name as an argument if needed. The command creates the following structure:

- A main `data-products` folder containing your tracking plan files
- A `source-apps` subfolder containing source application definitions
- Event specifications embedded within their related tracking plan files

### Validating tracking plans, event specifications and source applications

```bash
snowplow-cli dp validate
```

This command scans all files under `./data-products` and validates them using Snowplow Console. It checks:

1. Whether each file is in a valid format (YAML/JSON) with correctly formatted fields
2. Whether all source application references in the tracking plan files are valid
3.
Whether event specification rules are compatible with their schemas

If validation fails, the command displays the errors in the console and exits with status code 1.

### Syncing tracking plans, event specifications and source applications

```bash
snowplow-cli dp sync
```

This command locates all files under `./data-products`, validates them, and pushes local changes to Console. Tracking plans and source applications are updated in place. For event specifications, a new version in draft status may be created if the change is structural (name change, event change, or rules change).

> **Warning:** The old `publish` command has been renamed to `sync`. Running `dp publish` still works as an alias for backward compatibility, but it may be removed in a future release. Update your scripts to use `dp sync` instead.

### Releasing event specifications

```bash
snowplow-cli dp release
```

This command first syncs local files with Console (like `sync`), then releases any draft event specifications. Releasing marks event specifications as published and enables event spec inference. Only event specifications that are part of the local tracking plan files are affected; other event specifications in Console are left unchanged. Event specifications without an event are skipped. Use `sync` if you only want to push changes without changing event specification status.

---

# Managing tracking plans via the API

> Programmatically manage tracking plans through the API with endpoints for creating, updating, retrieving, and managing subscriptions for automated workflows and version control integration.
> Source: https://docs.snowplow.io/docs/event-studio/programmatic-management/tracking-plans-api/

> **Info:** We originally called tracking plans "data products". You'll still find the old term used in some existing APIs and CLI commands.

Use the tracking plans Console API endpoints to programmatically manage your tracking plans.
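Every endpoint below requires a JWT obtained through the Credentials API, passed as a Bearer token. The shell sketch puts the two calls together; the endpoint paths are those documented here, while the helper names, variables, and use of `jq` are illustrative assumptions:

```shell
# Sketch: exchange an API key pair for a JWT, then list tracking plans.
# Helper names and the jq usage are illustrative, not part of the API.
CONSOLE="https://console.snowplowanalytics.com/api/msc/v1"

# v3 token-exchange endpoint for an organization
token_url() { echo "$CONSOLE/organizations/$1/credentials/v3/token"; }

# tracking plans (data products) listing endpoint
plans_url() { echo "$CONSOLE/organizations/$1/data-products/v2"; }

# Exchange the API key for a JWT, then list all tracking plans
list_tracking_plans() {
  org="$1"; key_id="$2"; key="$3"
  jwt=$(curl -s \
    --header "X-API-Key-ID: $key_id" \
    --header "X-API-Key: $key" \
    "$(token_url "$org")" | jq -r .accessToken)
  curl -s --header "Authorization: Bearer $jwt" "$(plans_url "$org")"
}
```

Call it as `list_tracking_plans <organizationId> <apiKeyId> <apiKey>`; the JWT is short-lived, so scripts typically re-run the exchange on each invocation.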
## Retrieving Information about Tracking Plans

The following `GET` requests let you access information about tracking plans.

### Retrieve a List of All Tracking Plans

To retrieve a list of all tracking plans in your organization, use the following GET request:

**GET** `/api/msc/v1/organizations/{organizationId}/data-products/v2`

Path parameter `organizationId` is required.

### Retrieving Information about a Specific Tracking Plan

**GET** `/api/msc/v1/organizations/{organizationId}/data-products/v2/{dataProductId}`

Path parameters `organizationId` and `dataProductId` are required.

The response may also contain an array field `data[].eventSpecs` with the `id` and `url` of the associated event specifications. For example:

```json
"data": [
  ...
  "eventSpecs": [
    {
      "id": "d1336abc-1b60-46f7-be2d-2105f2daf283",
      "url": "https://console.snowplowanalytics.com/api/msc/v1/organizations/f51dada7-4f11-4b6a-bbbd-2cf6a3673035/event-specs/v1/d1336abc-1b60-46f7-be2d-2105f2daf283"
    }
  ]
  ...
]
```

Under the JSON path `includes.eventSpecs`, the API will also attach the associated event specifications in their entirety:

```json
"includes": {
  ...
  "eventSpecs": [
    {
      "id": "d1336abc-1b60-46f7-be2d-2105f2daf283",
      ...
    }
  ]
  ...
}
```

### Retrieve History Information for a Tracking Plan

To retrieve the change log of a specific tracking plan, use the following GET request:

**GET** `/api/msc/v1/organizations/{organizationId}/data-products/v2/{dataProductId}/history`

You can pass several parameters to control the response:

- **before**: return records with timestamps equal to or earlier than the given ISO-8601 timestamp
- **limit**: limit the number of records returned
- **offset**: skip the first N results
- **order**: order of returned records, `asc` or `desc` (defaults to `desc`)

Path parameter `organizationId` is required.
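The history parameters combine into an ordinary query string. As a sketch, a hypothetical helper for paging through the change log (the endpoint path is from this page; the helper name and parameter values are examples):

```shell
# Illustrative only: build a history request URL with paging parameters.
history_url() {
  org="$1"; dp="$2"; limit="$3"; offset="$4"
  echo "https://console.snowplowanalytics.com/api/msc/v1/organizations/$org/data-products/v2/$dp/history?limit=$limit&offset=$offset&order=desc"
}
# Fetch a page with: curl --header "Authorization: Bearer $JWT" "$(history_url "$ORG_ID" "$DP_ID" 20 0)"
```

Increase `offset` by `limit` on each request to walk back through older changes.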
## Creating and updating Tracking Plans

### Creating a Tracking Plan

This `POST` request allows you to create a new tracking plan within an organization.

`**POST** /api/msc/v1/organizations/{organizationId}/data-products/v2`

The request body is mandatory and should be in JSON format. The minimum payload is a JSON object with only the `name` of the tracking plan. The remaining fields are optional and not required on creation. Example:

```json
{
  "name": "Performance tracking",
  "description": "Tracks performance",
  "domain": "Marketing",
  "owner": "IT department",
  "accessInstructions": "The data can be accessed in the warehouse, in the atomic.events table"
}
```

> **Note:** The name of your tracking plan must be unique to ensure proper identification and avoid conflicts.

### Updating a Tracking Plan

Use this request to update a tracking plan. The `dataProductId` is required, along with a valid request body. The minimum payload on update is the same as on creation, plus the required `status` field (on creation, `status` defaults to `draft`).

`**POST** /api/msc/v1/organizations/{organizationId}/data-products/v2/{dataProductId}`

> **Note:** The name of your tracking plan must be unique to ensure proper identification and avoid conflicts.

See the [detailed API documentation](https://console.snowplowanalytics.com/api/msc/v1/docs) for all options.

### Delete a Tracking Plan

Use this request to delete a tracking plan. The `dataProductId` and `organizationId` are both required.

`**DELETE** /api/msc/v1/organizations/{organizationId}/data-products/v2/{dataProductId}`

## Subscription Management for Tracking Plans

### Retrieve All Subscriptions for a Tracking Plan

To retrieve all subscriptions for a tracking plan, use the following request. The `organizationId` and `dataProductId` are required.
`**GET** /api/msc/v1/organizations/{organizationId}/data-products/v1/{dataProductId}/subscriptions`

### Add a Subscription

To add a subscription for a tracking plan, use the following request. The `organizationId`, `dataProductId`, and a valid request body are required.

`**POST** /api/msc/v1/organizations/{organizationId}/data-products/v1/{dataProductId}/subscriptions`

The following is the minimum accepted payload. It will create a subscription for the user who issues the request, as inferred from the JWT in the request headers.

```json
{
  "reason": "Get notified on breaking changes",
  "receiveNotifications": true
}
```

If you want to subscribe a different user, populate an additional field, `recipient`, with that user's email address.

When a subscription is created, a confirmation email is sent to the recipient (the requesting user by default, or the `recipient` if specified). Clicking the confirmation link in that email will direct the recipient to the following URL and mark the subscription as confirmed:

`**POST** /organizations/{organizationId}/data-products/v1/{dataProductId}/subscriptions/{subscriptionId}/actions/confirm`

Once a subscription is created and the email has been confirmed, the subscriber will start receiving a daily email digest referencing all the tracking plans that had changes in the last 24 hours.

### Update a Subscription

To update a subscription for a specific tracking plan, use the following request. Path parameters `organizationId`, `dataProductId`, and `subscriptionId`, along with a valid request body, are required.

`**PUT** /api/msc/v1/organizations/{organizationId}/data-products/v1/{dataProductId}/subscriptions/{subscriptionId}`

### Delete a Subscription

To delete a subscription for a specific tracking plan (unsubscribe action), use the following request. Path parameters `organizationId`, `dataProductId`, and `subscriptionId` are required.
`**DELETE** /api/msc/v1/organizations/{organizationId}/data-products/v1/{dataProductId}/subscriptions/{subscriptionId}`

### Resend a Subscription Confirmation Email

To resend a subscription confirmation email, use the following request. Path parameters `organizationId`, `dataProductId`, and `subscriptionId` are required.

`**POST** /api/msc/v1/organizations/{organizationId}/data-products/v1/{dataProductId}/subscriptions/{subscriptionId}/actions/resend-confirmation`

### Integration with the SDK Generator

To send emails with instructions for the SDK generator, use the following request. Path parameters `organizationId` and `dataProductId`, along with a valid request body, are required.

`**POST** /organizations/{organizationId}/data-products/v2/{dataProductId}/share-instructions`

---

# Query your data

> Use example SQL queries from Console to retrieve and analyze your event data in your data warehouse.

> Source: https://docs.snowplow.io/docs/event-studio/query-sql/

When viewing an [event specification](/docs/event-studio/tracking-plans/event-specifications/) in Console, the **Working with this event** section provides ready-to-use code snippets. The **Querying** tab provides example queries to help you retrieve and analyze your event data. Choose your warehouse to see appropriately optimized SQL.

![Querying SQL examples](/assets/images/sql-example-d64be1abac9a42e1162f0ed311e2db80.png)

---

# Organize data sources with source applications

> Document and manage your tracking implementation across different applications with source applications in Event Studio.

> Source: https://docs.snowplow.io/docs/event-studio/source-applications/

For data collection, you will often have different sources of information that correspond to applications designed for a particular purpose. These are what we refer to as source applications.

> **Tip:** A guideline, which will address your needs most of the time, is to think of a source application as an independently deployable application system.
For example:
>
> - An Android mobile application
> - An iOS mobile application
> - A web application
>
> This will let you best manage the changes you make to the available application entities, and make sure your documentation reflects the current state of your data as closely as possible.

To illustrate, consider Snowplow. We can identify several applications designed for distinct purposes, each serving as a separate data source for behavioral data, or in other words, a source application:

- The Snowplow website that corresponds to the application served under `www.snowplow.io`
- The Console application that is served under `console.snowplowanalytics.com`
- The documentation website serving as our information hub for all things related to our product, served under `docs.snowplow.io`

Source applications are a foundational component that enables you to establish the overarching relationships that connect application IDs, [application entities](/docs/sources/web-trackers/custom-tracking-using-schemas/global-context/), and [tracking plans](/docs/event-studio/tracking-plans/).

## Application IDs

Each source application should have a unique application ID, set via the [`app_id`](/docs/events/ootb-data/app-information/#application-atomic-event-properties) field, so that you can distinguish it later on in analysis.

> **Tip:** We often see, and recommend as a best practice, setting up a unique application ID for each deployment environment you are using. For example, `${appId}-qa` for staging and `${appId}-dev` for development environments.

## Application entities

Application entities, also referred to as [global context](/docs/sources/web-trackers/custom-tracking-using-schemas/global-context/), are a set of entities that can be sent with every event recorded in the application. Using source applications you can document which application entities are expected.
This is useful for tracking implementation, data discovery, and preventing information duplication in tracking plans. > **Info:** Since application entities can also be set conditionally, you can mark any of them as optional with a note to better understand the condition or any extra information required. The method for conditionally adding an application entity is through [rulesets](/docs/sources/web-trackers/custom-tracking-using-schemas/global-context/#rulesets), [filter functions](/docs/sources/web-trackers/custom-tracking-using-schemas/global-context/#filter-functions) and [context generators](/docs/sources/web-trackers/custom-tracking-using-schemas/global-context/#context-generators). ## Initialize tracking with the tracker configuration Once you have created a source application, you can use the **Set up tracking** tab to configure tracking and generate ready-to-use code snippets. This guided configuration simplifies the instrumentation process and helps you start receiving events. > **Info:** The Set up tracking tab currently supports the JavaScript tracker. You can always customize your tracking further by referring to the [tracker documentation](/docs/sources/). ### Configure tracker settings The Set up tracking tab provides a visual interface to configure your tracker and generates code snippets based on your selections. 
#### Initialize tracker

Configure the basic tracker settings:

- **Collector URL**: the endpoint where your events will be sent
- **App ID**: select one of the application IDs associated with your source application

![A screenshot from Console showing the initialize tracker section, with dropdowns for Collector URL and App ID](/assets/images/initialize-tracker-96764a0c520edf581afece137c0282dd.png)

#### Automatic tracking

Enable out-of-the-box tracking features to capture common user interactions without additional code:

- **Page views**: automatically track when pages are viewed
- **Link clicks**: capture clicks on links
- **Form interactions**: track form submissions and field interactions
- **Page pings**: monitor user engagement with periodic activity pings

Toggle these features based on your tracking requirements.

![A screenshot from Console showing automatic tracking configuration](/assets/images/automatic-tracking-8a40a05374e35316c3710f66a6ca9759.png)

#### Implementation

The code snippet at the bottom of the Set up tracking tab updates in real time as you modify settings. Copy the code snippet and integrate it into your application to begin tracking. Choose your implementation method: **JavaScript (tag):** add the generated `<script>` snippet to your page.

---

# AMP tracker

> **Note:** The tracker utilises AMP linker functionality to ensure that user identification can be done where a user may visit via the Google domain or the site's own domain. In order to function, this requires that linkers are enabled in the amp-analytics configuration. Not doing so can result in changing AMP client IDs (which are the primary user identifier).

## Standard variables

`collectorHost` and `appId` must be provided in the `"vars"` section of the tag:

```javascript
"vars": {
  ...
},
```

The rest are optional.

### `collectorHost`

Specify the host of your collector like so:

```javascript
"vars": {
  "collectorHost": "snowplow-collector.acme.com",
  ...
``` Notes: - Do _not_ include the protocol aka schema (`http(s)://`) - Do _not_ include a trailing slash - Use of HTTPS is mandatory in AMP, so your Snowplow collector **must** support HTTPS ### `appId` You must set the application ID for the website you are tracking via AMP: ```javascript "vars": { "appId": "campaign-microsite", ... ``` Notes: - You do not have to use the `appId` to distinguish AMP traffic from other web traffic (unless you want to) - see the [Analytics](#analytics) section for an alternative approach. ### `userId` Specify the optional `"userId"` var to set the uid/user\_id [Snowplow Tracker Protocol](/docs/fundamentals/canonical-event/#user-fields) field. ```javascript "vars": { "userId": "someUserId", ... ``` ### `nameTracker` Specify the optional "nameTracker" var to set the tna/name\_tracker [Snowplow Tracker Protocol](/docs/fundamentals/canonical-event/#application-fields) field. ```javascript "vars": { "nameTracker": "someTrackerName", ... ``` ### `customContexts` Custom contexts may be added by including Self-Describing JSON as a `"customContexts"` var, with `"` characters escaped: ```javascript "vars": { "customContexts": "{\"schema\":\"iglu:com.acme/first_context/jsonschema/1-0-0\",\"data\":{\"someKey\":\"someValue\"}}" ... ``` Multiple custom contexts may be added by separating each self-describing JSON with a comma: ```javascript "vars": { "customContexts": "{\"schema\":\"iglu:com.acme/first_context/jsonschema/1-0-0\",\"data\":{\"someKey\":\"someValue\"}},{\"schema\":\"iglu:com.acme/second_context/jsonschema/1-0-0\",\"data\":{\"someOtherKey\":\"someOtherValue\"}}" ... ``` Custom contexts may either be set globally for all events (as a top-level var, see above) or per-event trigger: ```javascript ... "triggers": { ... "defaultPageview": { "on": "click", "selector": "visible", "request": "pageView", "vars": { "customContexts": "{\"schema\":\"iglu:com.acme/first_context/jsonschema/1-0-0\",\"data\":{\"someKey\":\"someValue\"}}" } } } ... 
```

These approaches may not currently be mixed, however.

## Tracking events

The following trigger request values are supported for the Snowplow Analytics configuration:

- `pageView` for page view tracking
- `structEvent` for structured event tracking
- `ampPagePing` for page ping tracking
- `selfDescribingEvent` for custom event tracking

All event tracking is disabled by default; you can enable it on an event-by-event basis as follows:

### Page view

Enable page view tracking like so (the trigger name is illustrative):

```javascript
"triggers": {
  "defaultPageview": {
    "on": "visible",
    "request": "pageView"
  }
}
```

### Structured events

Structured events are user interactions with content that can be tracked independently from a web page or a screen load. "Structured" refers to the Google Analytics-style structure of having up to five fields (with only the first two required). Events can be sent by setting the AMP trigger request value to `structEvent` and setting the required event category and action fields. The following example uses the selector attribute of the trigger to send an event when a particular element is clicked (the trigger name, selector, and field values are illustrative):

```javascript
"triggers": {
  "headerClick": {
    "on": "click",
    "selector": "#header",
    "request": "structEvent",
    "vars": {
      "structEventCategory": "ui-components",
      "structEventAction": "header-click"
    }
  }
}
```

You can set key-value pairs for the following event fields in the vars attribute of the trigger:

| **Argument** | **Description** | **Required?** | **Validation** |
| --- | --- | --- | --- |
| `structEventCategory` | The grouping of structured events which this `action` belongs to | Yes | String |
| `structEventAction` | Defines the type of user interaction which this event involves | Yes | String |
| `structEventLabel` | A string to provide additional dimensions to the event data | No | String |
| `structEventProperty` | A string describing the object or the action performed on it | No | String |
| `structEventValue` | A value to provide numerical data about the event | No | Int or Float |

### Page pings

Enable page ping tracking like so (the timer settings are illustrative; `timerSpec` is standard amp-analytics trigger configuration):

```javascript
"triggers": {
  "trackPagePings": {
    "on": "timer",
    "timerSpec": {
      "interval": 30,
      "maxTimerLength": 1800
    },
    "request": "ampPagePing"
  }
}
```

AMP page ping events will be sent as AMP-specific page ping events (rather than
JavaScript tracker page pings), against the [AMP page ping schema](https://github.com/snowplow/iglu-central/blob/master/schemas/dev.amp.snowplow/amp_page_ping/jsonschema/1-0-0), since the data available on AMP is defined differently to JavaScript tracker page pings. All events are sent with an [AMP web page](https://github.com/snowplow/iglu-central/blob/master/schemas/dev.amp.snowplow/amp_web_page/jsonschema/1-0-0) context, for aggregation of pings by page view id.

### Custom events

Custom events may be sent via the AMP tracker by passing the schema vendor, name, and version, and an escaped JSON of the desired data, as follows:

```javascript
```

## Analytics

v2 of the tracker brings with it significant improvements in our ability to model events and gain insights.

### Standard fields

All events sent via this tracker will have:

- `v_tracker` set to `amp-1.1.0`
- `platform` set to `web`

If you want to analyze events sent via this tracker, you may prefer to query for `v_tracker LIKE 'amp-%'` to future-proof your query against future releases of this tracker (which may change the version number).

### Page view and ping aggregation

By default, the [AMP web page context](https://github.com/snowplow/iglu-central/blob/master/schemas/dev.amp.snowplow/amp_web_page/jsonschema/1-0-0) is attached to every event. This will contain the AMP-defined [PAGE\_VIEW\_ID\_64](https://github.com/ampproject/amphtml/blob/master/spec/amp-var-substitutions.md#page-view-id-64), which is defined as "intended to be random with a high entropy and likely to be unique per URL, user and day". Users can aggregate page views, page pings and other events on-page by this ID to aggregate engaged time, and model events to a page view level, by combining it with the url, AMP client ID, and date.
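The aggregation described above can be sketched as follows. The rows, the ping interval, and the field names here are invented for illustration; substitute your own event table and the interval configured in your amp-analytics timer:

```python
# Sketch: rolling AMP page pings up to engaged time per page view id.
# The events list, the 20s interval, and the field names are assumptions
# made for this illustration, not part of the tracker itself.
from collections import Counter

PING_INTERVAL_SECONDS = 20  # whatever interval your amp-analytics timer uses

events = [
    {"page_view_id": "pv-1", "event_name": "amp_page_ping"},
    {"page_view_id": "pv-1", "event_name": "amp_page_ping"},
    {"page_view_id": "pv-2", "event_name": "amp_page_ping"},
]

# Count pings per PAGE_VIEW_ID_64, then convert counts to seconds.
pings_per_view = Counter(
    e["page_view_id"] for e in events if e["event_name"] == "amp_page_ping"
)
engaged_seconds = {pv: n * PING_INTERVAL_SECONDS for pv, n in pings_per_view.items()}
print(engaged_seconds)  # {'pv-1': 40, 'pv-2': 20}
```

In a warehouse the same logic is a `GROUP BY` on the page view ID (plus url, AMP client ID, and date) with a count of ping events.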
Note that page pings and the page view ID itself are not defined by Snowplow's logic, but by what's made available by AMP - therefore applying the same logic to this data as that produced by the Javascript tracker is liable to produce different results. ### Session information > **Note:** Available from version 1.1.0. By default the [AMP Session](https://github.com/snowplow/iglu-central/blob/master/schemas/dev.amp.snowplow/amp_session/jsonschema/) context is attached to every event. This context allows for tracking information related to session analytics capabilities, as implemented in the [AMP framework](https://github.com/ampproject/issues/3399). The attributes included are the following: - `ampSessionId`: An identifier for the AMP session. - `ampSessionIndex`: The index of the current session for this user. - `sessionEngaged`: If there has been any kind of user engagement in the AMP session. Engagement in this context means if the page is visible, has focus and is in the foreground. - `sessionCreationTimestamp`: Timestamp at which the session was created in milliseconds elapsed since the UNIX epoch. - `lastSessionEventTimestamp`: Timestamp at which the last event took place in the session in milliseconds elapsed since the UNIX epoch. ### User Identification By default, the [AMP ID](https://github.com/snowplow/iglu-central/blob/master/schemas/dev.amp.snowplow/amp_id/jsonschema/1-0-0) context is attached to every event. This contains the [AMP Client ID](https://github.com/ampproject/amphtml/blob/master/spec/amp-var-substitutions.md#client-id), the `user_id` (if set via the `userId` var), and the domain\_userid (if passed to an AMP page via cross-domain linking - more detail below). This provides a map between the main relevant identifiers, which can be used to model user journeys across platforms. Users can choose to instrument further user identification methods using custom contexts. 
In order for the AMP Client ID to behave as expected, the `linkers` parameter of the configuration JSON must have `"enabled": true` set. Failing to do so can result in changing AMP client IDs, even where a user remains on AMP pages for the whole journey. (This can happen because AMP pages can be served from Google's AMP cache.)

The tracker is designed to handle user journeys as follows:

#### JS-tracker page to AMP page

Where a user moves from a standard web page, tracked by the JavaScript tracker, to an AMP page, the domain userid from the JavaScript tracker can be passed to the AMP tracker by enabling the [JavaScript tracker's crossDomainLinker](/docs/sources/web-trackers/tracker-setup/initialization-options/). The AMP tracker will parse the value from the querystring, and attach it to all events, along with the AMP client ID, via the AMP ID context.

The AMP tracker uses a combination of cookies and the AMP linker to attempt to retain the value; however, due to the nature of AMP pages, there is no guarantee that the value will be retained across sessions. To ensure the best possible retention of the value within the session, make sure the tracker config has linkers enabled for your AMP domains:

```javascript
...
"linkers": {
  "enabled": true,
  "proxyOnly": false,
  "destinationDomains": ["ampdomain"]
},
...
```

Data models should be designed for an 'at least once' identification structure - in other words, configuring the cross-domain linker correctly will ensure at least one event contains both AMP ID and domain User ID. Models should aim to map this across all events. Of course, this method of identification depends on the user traveling directly from a JS-tracked page to an AMP page at least once.
#### AMP page to JS-tracker page

Where a user moves from an AMP page to a standard web page which is tracked by the JavaScript tracker, the AMP tracker will use AMP's linker functionality to append the AMP Client ID to the querystring, as long as linkers are enabled for the destination domain:

```javascript
...
"linkers": {
  "enabled": true,
  "proxyOnly": false,
  "destinationDomains": ["destDomain"]
},
...
```

This will add a querystring parameter `sp_amp_linker=` to the destination url, which contains the amp\_id value, base64-encoded. This will look something like this: `?sp_amp_linker=1*1c1wx43*amp_id*amp-a1b23cDEfGhIjkl4mnoPqr`. The structure of this param is explained in the [AMP documentation](https://github.com/ampproject/amphtml/blob/master/extensions/amp-analytics/linker-id-receiving.md) - models can extract the base64-encoded AMP Client ID, decode it, and map it to the domain userid (or any other user value) from the JavaScript tracker.

## Reporting Issues and Contributing

A fork of the AMP project can be found in the [Snowplow Incubator GitHub](https://github.com/snowplow-incubator/amphtml) repo. Please submit issues, bug reports and PRs to this repo.

---

# Configure the Snowplow Ecommerce Tag

> Configure the Snowplow Ecommerce Tag Template in GTM using native Snowplow Ecommerce API or transitional GA4/UA adapter APIs. Set tracking parameters, custom contexts, and product data for ecommerce event tracking.

> Source: https://docs.snowplow.io/docs/sources/google-tag-manager/ecommerce-tag-template/configuration/

Use the native Snowplow Ecommerce API or [transitional GA4/UA ecommerce adapter APIs](/docs/sources/web-trackers/tracking-events/ecommerce/#ga4ua-ecommerce-transitional-api) for existing dataLayer implementations using those formats. To get full value from the [Snowplow Ecommerce plugin](/docs/sources/web-trackers/tracking-events/ecommerce/) we recommend using the native API when possible.
![](/assets/images/01_ecommerce_api-5ccc2f11a8b43b04dbc188a27da29ea1.png) ## Tracking Parameters **Snowplow Ecommerce:** ![](/assets/images/02_sp_tracking_parameters-79d633f456f8c14286ec3ed73e2fd3fb.png) #### Snowplow Ecommerce Function In this section you can select the [Snowplow Ecommerce function](/docs/sources/web-trackers/tracking-events/ecommerce/) to use. #### Snowplow Ecommerce Argument In this textbox you can specify the argument to the ecommerce function. This can be a Variable that evaluates to a corresponding object. #### Additional Tracking Parameters ##### Add Custom Context Entities Use this table to attach [custom context entities](/docs/sources/web-trackers/custom-tracking-using-schemas/#track-a-custom-entity) to the Snowplow event. Each row can be set to a Google Tag Manager variable that returns an array of custom contexts to add to the event hit. ##### Set Custom Timestamp Set this to a UNIX timestamp in case you want to [override the default timestamp](/docs/sources/web-trackers/tracking-events/#custom-timestamp) used by Snowplow. **GA4 Ecommerce:** ![](/assets/images/02_ga4_tracking_parameters-62661b55514ba47b36edf5fbc50c3282.png) #### GA4 Ecommerce Function In this section you can select the [Google Analytics 4 Ecommerce function](/docs/sources/web-trackers/tracking-events/ecommerce/) to use. #### GA4 Ecommerce Arguments ##### DataLayer ecommerce Here you can specify the dataLayer ecommerce variable to use, i.e. a variable that returns the `ecommerce` object itself. ##### Options object Here you can specify a variable returning an object holding additional information for the ecommerce event (e.g. including `currency`, `finalCartValue`, `step`, etc). 
**Universal Analytics Enhanced Ecommerce:**

![](/assets/images/02_ua_tracking_parameters-1c48c247c61794f48eac816ad435b34c.png)

#### Universal Analytics Enhanced Ecommerce Function

In this section you can select the [Universal Analytics Enhanced Ecommerce function](/docs/sources/web-trackers/tracking-events/ecommerce/) to use.

#### Universal Analytics Enhanced Ecommerce Arguments

##### DataLayer ecommerce

Here you can specify the dataLayer ecommerce variable to use.

##### Options object

Here you can specify a variable returning an object holding additional information for the ecommerce event (e.g. including `currency`, `finalCartValue`, `step`, etc.).

***

## Snowplow Tracker and Ecommerce Plugin Settings

![](/assets/images/04_tracker_plugin_settings-383e984e9035913a2f284e466f2723fe.png)

### Tracker Settings

The Snowplow Ecommerce tag template **requires** a Snowplow Settings Variable to be set up. In this section you can select the Google Tag Manager variable of type [Snowplow Settings](/docs/sources/google-tag-manager/quick-start/) to use.

### Plugin Settings

In this section you can select how the plugin will be added. The available options are:

- `jsDelivr`: To get the plugin URL from the jsDelivr CDN. Choosing this option allows you to specify the plugin version to be used.
- `unpkg`: To get the plugin URL from the unpkg CDN. Choosing this option allows you to specify the plugin version to be used.
- `Self-hosted`: To get the plugin library from a specified URL. This option requires a [Permission](https://developers.google.com/tag-platform/tag-manager/templates/permissions) change to allow injecting the plugin script from the specified URL.
- `Do not add`: To not add the plugin (e.g. when using a [Custom Bundle](/docs/sources/web-trackers/plugins/configuring-tracker-plugins/) with the plugin already included).

The default plugins bundled with the JavaScript tracker have changed from v3 to v4. Ensure that all plugins that you require are included.
A list of the default plugins is available [here](/docs/sources/web-trackers/plugins/).

---

# Snowplow GTM Ecommerce template

> Install and configure the Snowplow Ecommerce Tag Template in Google Tag Manager to track product views, transactions, cart actions, and checkout events using the Snowplow Ecommerce plugin for v3 and v4 trackers.

> Source: https://docs.snowplow.io/docs/sources/google-tag-manager/ecommerce-tag-template/

The Ecommerce Template is a separate Tag Template that can be added to your GTM workspace to track ecommerce events. This was done to keep the main Snowplow Tag Template a bit lighter, and for ease of tag management. This template implements the [Snowplow Ecommerce plugin](/docs/sources/web-trackers/tracking-events/ecommerce/) for the Snowplow JavaScript tracker and can be used alongside either [v3](/docs/sources/google-tag-manager/previous-versions/) or [v4](/docs/sources/google-tag-manager/) of the Snowplow tag.

## Template Installation

### Tag Manager Template Gallery

Search for "Snowplow Ecommerce v3" in the [Tag Manager Template Gallery](https://tagmanager.google.com/gallery/#/owners/snowplow/templates/snowplow-gtm-tag-template-ecommerce-v3) and click `Add to Workspace`.

### Manual Installation

1. Download [template.tpl](https://github.com/snowplow/snowplow-gtm-tag-template-ecommerce-v3)
2. Create a new Tag template in the Templates section of your GTM container
3. Click the More Actions menu and select Import
4. Import the `template.tpl` file downloaded in Step 1
5. Click Save

## Tag Setup

With the template installed, you can now add the Snowplow Ecommerce Tag to your GTM Container.

1. From the Tag tab, select `New`, then select the Snowplow Ecommerce Tag as your tag type
2. Select your desired Trigger for the ecommerce events you want to track
3. [Configure the Tag](/docs/sources/google-tag-manager/ecommerce-tag-template/configuration/)
4.
Click Save

---

# Snowplow Google Tag Manager templates

> Deploy the Snowplow JavaScript tracker through Google Tag Manager using custom templates for v4. Configure event tracking, ecommerce, and custom variables with server-side and client-side GTM tags.

> Source: https://docs.snowplow.io/docs/sources/google-tag-manager/

Using the Snowplow GTM custom templates you can deploy, implement, and configure the Snowplow [JavaScript tracker](/docs/sources/web-trackers/) directly on the website using Google Tag Manager. The main Tag template that you will need to use when setting up the JavaScript Tracker v4 in GTM is available in the [Tag Manager Template Gallery](https://tagmanager.google.com/gallery/#/owners/snowplow/templates/snowplow-gtm-tag-template-v4). To set up the Snowplow v4 Tag, you will also need the Snowplow v4 Settings Variable template.

The templates you will need are:

1. [Snowplow v4](https://tagmanager.google.com/gallery/#/owners/snowplow/templates/snowplow-gtm-tag-template-v4): Load, configure, and deploy the Snowplow JavaScript tracker library. It supports the full functionality of the JavaScript SDK.
2. [Snowplow v4 Settings](https://tagmanager.google.com/gallery/#/owners/snowplow/templates/snowplow-gtm-variable-template-v4): A variable template which can be used to easily apply a set of tracker configuration parameters to tags created with the Snowplow v4 tag template.

For Ecommerce tracking, the [Snowplow Ecommerce Tag](https://github.com/snowplow/snowplow-gtm-tag-template-ecommerce-v3) is available on GitHub.

---

# Quick start guide for using Snowplow in Google Tag Manager

> Add Snowplow v4 custom templates to your Google Tag Manager workspace and configure basic tracking. Import templates from the GTM gallery and set up your collector endpoint for event tracking.

> Source: https://docs.snowplow.io/docs/sources/google-tag-manager/quick-start/

This guide will walk you through the initial setup for Snowplow in Google Tag Manager.
## Adding the Templates

To get started with Snowplow in Google Tag Manager, you will need to add the Snowplow Tag Template and the Snowplow Settings Variable Template to your GTM workspace. Links to both templates can be found [here](/docs/sources/google-tag-manager/).

### Snowplow Tag Template

1. Navigate to the `Templates` tab in your GTM workspace and click the `Search Gallery` button in the `Tag Templates` section. ![](/assets/images/tag-template-search-0ec39a0b52c7894a89449492ad0308ce.png)
2. Search for "Snowplow v4" ![](/assets/images/search-27677eb5bfee0e80f905e1f996b157ce.png)
3. Click on the template, and then click `Add to Workspace` in the next screen. Review the permissions and click `Add` to finalize the import.

### Snowplow Settings Variable Template

The Snowplow Settings Variable template is used to configure the Snowplow tracker, such as the collector endpoint, privacy options, and tracker version. Although it's possible to use the Snowplow tag without the settings variable, we highly recommend using it for ease of configuration and to keep the tracker configuration separate from the tag.

1. Again in the `Templates` tab in your GTM workspace, click the `Search Gallery` button in the `Variable Templates` section. ![](/assets/images/variable-template-search-2d20f633b21d310f8664a688d210f0d3.png)
2. Search for "Snowplow v4 Settings"
3. Click on the template, and then click `Add to Workspace` in the next screen. No permissions are required, so click `Add` to finalize the import.

## Configuring the Settings Variable

1. Navigate to the `Variables` tab in your GTM workspace and click `New` in `User-Defined Variables`. ![](/assets/images/variables-new-c32b98d0e9bac37687e92d6234177414.png)
2. Select `Snowplow v4 Settings` from the list of available variables.
3. Under `Tracker Options`, enter the Snowplow collector endpoint you set up when [configuring your collector](/docs/pipeline/collector/).
![](/assets/images/variable-9147f9ce6c3ee4fe6ed7770f3eee1fa7.png) > **Info:** You might consider using conditional variables to set the Collector endpoint based on the environment, e.g. sending data to [Snowplow Micro](/docs/testing/snowplow-micro/) during development. 4. Under `JavaScript Tracker`, choose a hosting option. To get started quickly, select either `unpkg` or `jsDelivr` and enter a library version. 5. Give your variable a name and click `Save`. ## Implementing the Snowplow Tag In this section, we will create a simple tag to fire a page view event. 1. Navigate to the `Tags` tab in your GTM workspace and click `New`. ![](/assets/images/new-tag-e29e5a0837ec41a24d07c0bf2c89e5af.png) 2. Click on the `Tag Configuration` section and select `Snowplow v4`. 3. Set the `Tag Type` to `Page View`, if it is not already selected. 4. Under `Tracker Initialisation`, select the Snowplow Settings variable we created earlier. 5. Add a trigger to the tag. This will determine when the tag is fired. For a page view tag, you can use the built-in `All Pages` trigger. 6. Give your tag a name and click `Save`. ## Testing the Tag To test the tag, you can use the GTM preview mode. Click the `Preview` button in the top right of the GTM interface. This will open a new tab with your website and the GTM preview console. Ensure that you see the Page View event in your Snowplow pipeline. If you don't have a full pipeline set up yet, you can use [Snowplow Micro](/docs/testing/snowplow-micro/) or [Snowplow Inspector](/docs/testing/snowplow-inspector/) to check that the event is sent correctly. --- # Snowplow GTM Settings template > Configure the Snowplow Settings Variable template for GTM with tracker options, collector endpoints, privacy settings, cookies, dispatching methods, and predefined context entities for consistent tracking configuration. 
> Source: https://docs.snowplow.io/docs/sources/google-tag-manager/settings-template/ This page describes the settings available in the Snowplow Settings Variable template for Google Tag Manager. ## Tracker Options ### Tracker Name It is important to set the tracker name. You might need more than one tracker on the site if you want to send commands using different configuration objects, or to different collector endpoints. When the tag runs, it first checks whether a tracker with this name has already been initialized. If it has, the command is sent to that tracker. If a tracker with this name has _not_ been initialized, a new tracker is initialized with the tracker configuration from this settings variable. This means that a tracker configuration is applied **only once** to the tracker. Thus if you have more than one tag running on the site, each with the same tracker name but different tracker configurations, only the configuration of the tag that fires _first_ will be applied to the tracker. ### Collector Endpoint Hostname This needs to be set to the hostname/domain (e.g. `sp.domain.com`) on which you’ve configured your [Snowplow Collector](/docs/pipeline/collector/). ## JavaScript Tracker ### Snowplow JavaScript Tracker Library This determines the source of the Snowplow JavaScript tracker library. You can choose to load the tracker from a CDN or host it on your own server. For production usage, we recommend [self-hosting the tracker](/docs/sources/web-trackers/tracker-setup/hosting-the-javascript-tracker/). ### Self-Hosted Library URL This field is required if you choose to load the Snowplow JavaScript tracker from your own server. Enter the URL of the tracker library here. > **Warning:** The default Tag doesn't have the permission to inject scripts from a custom URL. > > You will need to update the `Injects Scripts` permission to reflect the new location, by editing the `Snowplow Analytics v3/v4 Tag` template. 
Delete the content of the `Allowed URL Match Patterns` field, and type the full URL to the library there. Again, it must match what you input into the tag itself when creating it. > > ![modifying permissions](/assets/images/modifying_permissions-d485193d56ce3212e8c83daea275800b.png) > > Modifying permissions **breaks the gallery link** and you will no longer be notified about updates to the template. > > ![modifying permissions breaks gallery link](/assets/images/modifying_breaks_gallery_link-7870229aea74cb186cc082ab6b9b2c36.png) > **Note:** Since v1.1.0, an alternative that avoids breaking the gallery update link is to use the `Do not load` option from the corresponding drop-down menu: > > ![library host drop down 'Do not load' option](/assets/images/host_drop_down_no_load-06e54c13559cb4115d3d5c4577540b8f.png) ### Library Version This field is required if you choose to load the Snowplow JavaScript tracker from a CDN. Enter the version of the tracker library you want to load here. ## Application Settings ### Application ID This is the unique identifier for your application. It is used to distinguish different applications in your Snowplow pipeline. ### Platform This is the platform on which your application is running. This is used to distinguish different platforms in your Snowplow pipeline. ## Privacy Settings ### Respect 'Do Not Track' This setting allows you to respect the Do Not Track setting in the user's browser. When enabled, the Snowplow JavaScript tracker will not track users who have enabled the Do Not Track setting in their browser, and will not set any cookies. ### Anonymous Tracking Read more about anonymous tracking in the [overview page](/docs/events/anonymous-tracking/). #### Server Anonymisation Server-side anonymisation affects user identifiers set server-side. In particular, these are the `network_userid` property set in the server-side cookie and the user's IP address. 
Setting the flag will add an `SP-Anonymous` HTTP header to requests sent to the Snowplow collector. The Snowplow pipeline will take care of anonymising the identifiers. #### Anonymous Session Tracking This setting disables client-side user identifiers but tracks session information. In practice, this means that events track the Session context entity but the `userId` property is a null UUID (00000000-0000-0000-0000-000000000000). If the Platform context is enabled, the IDFA identifiers will not be present. #### Cookie Lifetime Extension Service This allows you to set the endpoint for the [Cookie Lifetime Extension Service](/docs/sources/web-trackers/cookies-and-local-storage/cookie-extension/). ## Cookie Settings ### State Storage Strategy This setting allows you to choose the strategy for storing the Snowplow tracker state. The available options are: - `Cookie and Local Storage`: The Snowplow tracker will store the state in both cookies and local storage. - `Cookie`: The Snowplow tracker will store the state in cookies only. - `Local Storage`: The Snowplow tracker will store the state in local storage only. - `None`: The Snowplow tracker will not store the state. ### Cookie Domain This setting allows you to specify the domain for which the Snowplow tracker cookies will be set. This is useful when you want to track users across subdomains. By default, `auto` will be used, which will set the domain to the root domain. See [here](/docs/sources/web-trackers/cookies-and-local-storage/configuring-cookies/#cookie-domain) for more information. ### Cookie Lifetime This setting allows you to specify the lifetime of the Snowplow tracker cookies. By default, the cookies will be active for 2 years. If the lifetime is set to `0`, the cookie will expire at the end of the session (when the browser closes). If set to `-1`, the first-party cookies will be disabled. ### Cookie SameSite This setting allows you to specify the SameSite attribute for the Snowplow tracker cookies. 
The SameSite attribute is used to prevent CSRF attacks. ### Session Cookie Timeout This setting allows you to specify the timeout for the session cookie. By default, the session cookie will expire after 30 minutes of inactivity. ### Synchronously Write Cookies This setting allows you to specify whether the Snowplow tracker should [write cookies synchronously](/docs/sources/web-trackers/configuring-how-events-sent/#synchronous-cookie-writes). By default, the tracker will write cookies asynchronously. ## Dispatching ### Common #### Dispatch Method This setting allows you to choose the method for sending events to the Snowplow collector. The available options are: - `POST` - `GET` It is recommended to use the default `POST` method for sending events to the collector as it allows for larger payloads, unless you have specific requirements for using the `GET` method. #### Encode Into Base64 This setting allows you to encode the payload into Base64 before sending it to the collector. If you are using the `GET` method for sending events to the collector, it is recommended to enable this setting, as it will help prevent issues with special characters in the payload. Otherwise, it is recommended to leave this setting disabled to reduce the payload size. #### Connection Timeout This setting allows you to specify the timeout for the connection to the Snowplow collector. By default, the timeout is set to 5000 milliseconds. ### `POST` Specific #### Buffer Size This setting allows you to specify the number of events to buffer before sending them to the collector. By default, the buffer size is set to 1. If you set the buffer size to a value greater than 1, the tracker will buffer events and send them in batches to the collector. Although this can help reduce the number of requests made to the collector, it comes at the expense of potential data loss for non-returning visitors. #### POST Path This setting allows you to specify the path to which the events will be sent. 
By default, the events will be sent to the `/com.snowplowanalytics.snowplow/tp2` path. #### Maximum POST Payload Size This setting allows you to specify the maximum size of the payload that will be sent to the collector. By default, the maximum payload size is set to 40000 bytes. If an event is generated that is over the maximum payload size, the event will bypass the buffer and be sent immediately to the collector. This means that if it fails, it will not be retried. #### Enable keepalive This setting allows you to enable or disable the [keepalive](/docs/sources/web-trackers/configuring-how-events-sent/#keepalive-option-for-collector-requests) feature. This will enable requests to continue to be sent, even if the user navigates away from the page that sent the request. Defaults to `false`. ## Predefined Contexts Predefined contexts provide additional metadata for your Snowplow events. By including these contexts, you can capture common data points like device information, session details, or geolocation without having to define them manually. Available predefined contexts are: | Name | Description | Source Plugin | | ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ | | `webPage` | Information about the web page where the event occurred. | [Web Page tracking](/docs/sources/web-trackers/tracking-events/page-views/#page-view-id-and-web_page-entity) | | `gaCookies` | Information about the Google Analytics cookies. | [Google Analytics Cookies Plugin](/docs/sources/web-trackers/tracking-events/ga-cookies/) | | `clientHints` | Information about the client's device. | [Client Hints Plugin](/docs/sources/web-trackers/tracking-events/client-hints/) | | `geolocation` | Information about the client's geolocation. 
| [Geolocation](/docs/events/ootb-data/geolocation/) | | `session` | Information about the user session. | [Session](/docs/events/ootb-data/user-and-session-identification/#session-entity) | | `performanceNavigationTiming` | Retrieves data from the [PerformanceNavigationTiming](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceNavigationTiming) API. | [Performance Navigation Timing](/docs/sources/web-trackers/tracking-events/timings/) | --- # Snowplow GTM template > Configure Snowplow v4 tag types in Google Tag Manager including ad tracking, button clicks, cart events, site search, timing, enhanced consent, ecommerce, error tracking, page views, link clicks, and form tracking with custom commands. > Source: https://docs.snowplow.io/docs/sources/google-tag-manager/snowplow-template/ This template implements the [Snowplow JavaScript tracker](/docs/sources/web-trackers/) for Google Tag Manager. It allows for the sending of [Snowplow events](/docs/events/) from your website to your Snowplow collector. Tag Types are the kinds of events that can be tracked with the Snowplow v4 Tag Template. Each tag type has its own set of options and parameters that can be configured. You can also configure [plugins](/docs/sources/google-tag-manager/snowplow-template/plugins/) to use with this template. ## Ad Tracking The Ad Tracking tag is used to track impressions and ad clicks. This can be used by, for example, ad networks to identify which sites and web pages users visit across a network, so that they can be segmented. **Ad Tracking Parameters** All ad tracking events take the following common parameters: | Name | Required? | Description | Example | | -------------- | --------- | ---------------------------------------------------------------------- | ------- | | `advertiserId` | No | The advertiser ID | 201 | | `campaignId` | No | The campaign ID | 12 | | `cost` | No | The cost of the ad | 5.5 | | `costModel` | No | The cost model for the campaign. 
Must be one of `cpc`, `cpm`, or `cpa` | cpc | The ad tracking tag includes three event types, each with its own set of additional parameters: ### Impression Event | Name | Required? | Description | Example | | -------------- | --------- | ---------------------------------------------------------------- | ------------------------- | | `impressionId` | No | Identifier for the particular impression instance | 67965967893 | | `targetUrl` | No | The destination URL | | | `bannerId` | No | Adserver identifier for the ad banner (creative) being displayed | 23 | | `zoneId` | No | Adserver identifier for the zone where the ad banner is located | 7 | ### Click Event | Name | Required? | Description | Example | | ----------- | --------- | ---------------------------------------------------------------- | ------------------------- | | `targetUrl` | Yes | The destination URL | | | `clickId` | No | Identifier for the particular click instance | 12243253 | | `bannerId` | No | Adserver identifier for the ad banner (creative) being displayed | 23 | | `zoneId` | No | Adserver identifier for the zone where the ad banner is located | 7 | ### Conversion Event | Name | Required? 
| Description | Example | | -------------- | --------- | ------------------------------------------------- | --------- | | `conversionId` | No | Identifier for the particular conversion instance | 743560297 | | `category` | No | Conversion category | ecommerce | | `action` | No | The type of user interaction | purchase | | `property` | No | Describes the object of the conversion | shoes | | `initialValue` | No | How much the conversion is initially worth | 99 | ## Button Click Tracking This tag will enable the tracking of clicks on buttons, covering both `<button>` elements and `<input>` elements with `type="button"`. You can override the label that is tracked for a button by setting the `data-sp-button-label` attribute on the element: ```html <button data-sp-button-label="My custom label">Click me</button> ``` This will result in the following event: ```json { "schema": "iglu:com.snowplowanalytics.snowplow/button_click/jsonschema/1-0-0", "data": { // Note the label is "My custom label", not "Click me" "label": "My custom label", } } ``` This can also be useful in the case of icon buttons, where there is no text on the button. ## Full example **JavaScript (tag):** Suppose we have the following button on our page: ```html <button id="home-btn" class="nav-btn blue-btn outlined" name="home">Home</button> ``` We can configure the plugin to only track this button class: ```javascript window.snowplow('enableButtonClickTracking', { filter: { allowlist: ['nav-btn'], } }); ``` On click, this will result in the following event: ```json { "schema": "iglu:com.snowplowanalytics.snowplow/button_click/jsonschema/1-0-0", "data": { "label": "Home", "id": "home-btn", "classes": ["nav-btn", "blue-btn", "outlined"], "name": "home" } } ``` **Browser (npm):** Suppose we have the following button on our page: ```html <button id="home-btn" class="nav-btn blue-btn outlined" name="home">Home</button> ``` We can configure the plugin to only track this button class: ```javascript import { enableButtonClickTracking } from '@snowplow/browser-plugin-button-click-tracking'; enableButtonClickTracking({ filter: { allowlist: ['nav-btn'], } }); ``` On click, this will result in the following event: ```json { "schema": "iglu:com.snowplowanalytics.snowplow/button_click/jsonschema/1-0-0", "data": { "label": "Home", "id": "home-btn", "classes": ["nav-btn", "blue-btn", "outlined"], "name": "home" } } ``` *** --- # 
Track campaigns and UTMs on web > Identify traffic sources from paid and organic campaigns using UTM parameters and referrer analysis for marketing attribution. > Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/campaigns-utms/ Campaign tracking is used to identify the source of traffic coming to a website. At the highest level, we can distinguish **paid** traffic (that derives from ad spend) from **non-paid** traffic: visitors who come to the website by entering the URL directly, clicking on a link from a referrer site, or clicking on an organic link returned in search results, for example. In order to identify **paid** traffic, Snowplow users need to set five query parameters on the links used in ads. Snowplow checks for the presence of these query parameters on the web pages that users load: if it finds them, it knows that the user came from a paid source, and stores the values of those parameters so that it is possible to identify the paid source of traffic exactly. If the query parameters are not present, Snowplow reasons that the user is from a **non-paid** source of traffic. It then checks the page referrer (the URL of the web page the user was on before visiting our website), and uses that to deduce the source of traffic: 1. If the URL is identified as a search engine, the traffic medium is set to "organic" and Snowplow tries to derive the search engine name from the referrer URL domain and the keywords from the query string. 2. If the URL is a non-search 3rd party website, the medium is set to "referrer". Snowplow derives the source from the referrer URL domain. Campaign information is **automatically tracked**. ### Identifying paid sources Your different ad campaigns (PPC campaigns, display ads, email marketing messages, Facebook campaigns etc.) will include one or more links to your website e.g.: ```html <a href="http://mysite.com/myproduct.html">Visit website</a> ``` We want to be able to identify people who've clicked on ads e.g. 
in a marketing email as having come to the site by clicking a link in that particular marketing email. To do that, we modify the link in the marketing email with query parameters, like so: ```html <a href="http://mysite.com/myproduct.html?utm_source=newsletter-october&utm_medium=email&utm_campaign=cn0201">Visit website</a> ``` For the prospective customer clicking on the link, adding the query parameters does not change the user experience. (The user is still directed to the webpage at `http://mysite.com/myproduct.html`.) But Snowplow then has access to the fields given in the query string, and uses them to identify this user as originating from the October Newsletter, an email marketing campaign with campaign id = cn0201. ### Anatomy of the query parameters Snowplow uses the same query parameters used by Google Analytics. Because of this, Snowplow users who are also using GA do not need to do any additional work to make their campaigns trackable in Snowplow as well as GA. Those parameters are: | **Parameter** | **Name** | **Description** | | -------------- | ---------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | | `utm_source` | Campaign source | Identify the advertiser driving traffic to your site e.g. Google, Facebook, autumn-newsletter etc. | | `utm_medium` | Campaign medium | The advertising / marketing medium e.g. cpc, banner, email newsletter, in-app ad, cpa | | `utm_campaign` | Campaign id | A unique campaign id. This can be a descriptive name or a number / string that is then looked up against a campaign table as part of the analysis | | `utm_term` | Campaign term(s) | Used for search marketing in particular, this field is used to identify the search terms that triggered the ad being displayed in the search results. | | `utm_content` | Campaign content | Used either to differentiate similar content or two links in the same ad. (So that it is possible to identify which is generating more traffic.) 
| The parameters are described in the [Google Analytics help page](https://support.google.com/analytics/answer/1033863). Google also provides a [URL builder](https://support.google.com/analytics/answer/1033867?hl=en) which can be used to construct the URL, including query parameters, to use in your campaigns. --- # Track User-Agent Client Hints on web > Capture browser information through Client Hints as an alternative to user-agent strings with basic and high-entropy options. > Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/client-hints/ User-Agent [Client Hints](https://www.chromium.org/updates/ua-ch) are being rolled out across a number of browsers and are an alternative to tracking the user-agent string, which is particularly useful in those browsers that are freezing the user-agent string. See [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-CH#Browser_compatibility) for browser support. This is useful data to capture as browsers are moving away from high entropy user-agent strings. Client Hints offer useful information to understand browser usage without the potential to infringe on a user's privacy, as is often the case with the user-agent string. This entity can be configured in two ways: 1. `clientHints: true`: captures the "basic" client hints `isMobile` and `brands`. 2. `clientHints: { includeHighEntropy: true }`: captures the "basic" client hints as well as hints that are deemed "High Entropy" and could be used to fingerprint users. Browsers may choose to prompt the user before making this data available. The high entropy properties are `architecture`, `model`, `platform`, `platformVersion`, and `uaFullVersion`. For the full schema details see the [device and browser tracking](/docs/events/ootb-data/device-and-browser/#client-hints-entity) overview page. The Client Hints entity is **automatically tracked** once configured. 
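As an illustration, the two configuration forms described above correspond to the tracker's `contexts` configuration. This is a minimal sketch showing them as plain configuration objects (the `appId` value is a placeholder):

```javascript
// Sketch of the two Client Hints configuration forms (appId is a placeholder).
// These objects would be passed as the tracker configuration, e.g.
// window.snowplow('newTracker', 'sp1', '{{collector_url}}', basicConfig);

// 1. Basic: captures only the isMobile and brands hints
const basicConfig = {
  appId: 'my-app-id',
  contexts: { clientHints: true },
};

// 2. High entropy: additionally captures architecture, model, platform,
// platformVersion, and uaFullVersion; browsers may prompt the user first
const highEntropyConfig = {
  appId: 'my-app-id',
  contexts: { clientHints: { includeHighEntropy: true } },
};
```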
## Install plugin **JavaScript (tag):** | Tracker Distribution | Included | | -------------------- | -------- | | `sp.js` | ❌ | | `sp.lite.js` | ❌ | **Download:** | | | | ------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- | | Download from GitHub Releases (Recommended) | [Github Releases (plugins.umd.zip)](https://github.com/snowplow/snowplow-javascript-tracker/releases) | | Available on jsDelivr | [jsDelivr](https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-client-hints@latest/dist/index.umd.min.js) (latest) | | Available on unpkg | [unpkg](https://unpkg.com/@snowplow/browser-plugin-client-hints@latest/dist/index.umd.min.js) (latest) | **Note:** The links to the CDNs above point to the current latest version. You should pin to a specific version when integrating this plugin on your website if you are using a third party CDN in production. ```javascript // Basic window.snowplow('addPlugin', "https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-client-hints@latest/dist/index.umd.min.js", ["snowplowClientHints", "ClientHintsPlugin"] ); // High Entropy window.snowplow('addPlugin', "https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-client-hints@latest/dist/index.umd.min.js", ["snowplowClientHints", "ClientHintsPlugin"], { includeHighEntropy: true } ); ``` **Browser (npm):** - `npm install @snowplow/browser-plugin-client-hints` - `yarn add @snowplow/browser-plugin-client-hints` - `pnpm add @snowplow/browser-plugin-client-hints` ```javascript import { newTracker } from '@snowplow/browser-tracker'; import { ClientHintsPlugin } from '@snowplow/browser-plugin-client-hints'; // Basic newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ ClientHintsPlugin() ], }); // High Entropy newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ ClientHintsPlugin(true) ], }); ``` *** --- # Track consent and GDPR on web > Track user 
consent preferences and GDPR compliance with enhanced consent events for acceptance, selection, denial, expiration, and withdrawal. > Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/consent-gdpr/ Track user consent preferences selection events using the Enhanced Consent plugin. Enhanced consent events must be **manually tracked**. ## Install plugin **JavaScript (tag):** | Tracker Distribution | Included | | -------------------- | -------- | | `sp.js` | ✅ | | `sp.lite.js` | ❌ | **Download:** | | | | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ | | Download from GitHub Releases (Recommended) | [Github Releases (plugins.umd.zip)](https://github.com/snowplow/snowplow-javascript-tracker/releases) | | Available on jsDelivr | [jsDelivr](https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-enhanced-consent@latest/dist/index.umd.min.js) (latest) | | Available on unpkg | [unpkg](https://unpkg.com/@snowplow/browser-plugin-enhanced-consent@latest/dist/index.umd.min.js) (latest) | ```javascript window.snowplow( 'addPlugin', 'https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-enhanced-consent@latest/dist/index.umd.min.js', ['snowplowEnhancedConsentTracking', 'EnhancedConsentPlugin'] ); ``` **Browser (npm):** - `npm install @snowplow/browser-plugin-enhanced-consent` - `yarn add @snowplow/browser-plugin-enhanced-consent` - `pnpm add @snowplow/browser-plugin-enhanced-consent` ```javascript import { newTracker } from '@snowplow/browser-tracker'; import { EnhancedConsentPlugin } from '@snowplow/browser-plugin-enhanced-consent'; newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ EnhancedConsentPlugin() ], }); ``` *** > **Note:** The plugin is available since version 3.8 of the tracker. 
## Events | API | To track | | ----------------------- | ------------------------------------------------------- | | `trackConsentAllow` | Acceptance of user consent | | `trackConsentSelected` | A specific selection of consented scopes | | `trackConsentPending` | The unconfirmed selection about user consent | | `trackConsentImplicit` | The implicit consent on user consent preferences | | `trackConsentDeny` | A denial of user consent | | `trackConsentExpired` | The expiration of a consent selection | | `trackConsentWithdrawn` | The withdrawal of user consent | | `trackCmpVisible` | The render time of a consent management platform banner | With the exception of the CMP visible event, these methods use the same [`consent_preferences`](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/consent_preferences/jsonschema/1-0-0) event schema. ### Consent allow To track the complete acceptance of a user consent prompt, you can use the `trackConsentAllow` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackConsentAllow:{trackerName}", { consentScopes: ["necessary", "marketing", "personalization"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` **Browser (npm):** ```js import { trackConsentAllow } from "@snowplow/browser-plugin-enhanced-consent"; trackConsentAllow({ consentScopes: ["necessary", "marketing", "personalization"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` *** ### Consent selected To track the selection of specific scopes in a user's consent preferences, you can use the `trackConsentSelected` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackConsentSelected:{trackerName}", { consentScopes: ["necessary", "marketing", 
"personalization"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` **Browser (npm):** ```js import { trackConsentSelected } from "@snowplow/browser-plugin-enhanced-consent"; trackConsentSelected({ consentScopes: ["necessary", "marketing", "personalization"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` *** ### Consent pending Some consent management platform installations, allow the user to take website actions or/and navigating to other pages without accepting the consent prompt. To track the unconfirmed selection of user consent, you can use the `trackConsentPending` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackConsentPending:{trackerName}", { consentScopes: ["necessary", "marketing", "personalization"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` **Browser (npm):** ```js import { trackConsentPending } from "@snowplow/browser-plugin-enhanced-consent"; trackConsentPending({ consentScopes: ["necessary", "marketing", "personalization"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` *** ### Consent implicit Some consent management platforms have a configuration which allows the setting of consent implicitly after a set of user interactions like clicks, scroll etc. 
To track the implicit selection of a user consent, you can use the `trackConsentImplicit` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackConsentImplicit:{trackerName}", { consentScopes: ["necessary", "marketing", "personalization"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` **Browser (npm):** ```js import { trackConsentImplicit } from "@snowplow/browser-plugin-enhanced-consent"; trackConsentImplicit({ consentScopes: ["necessary", "marketing", "personalization"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` *** ### Consent deny To track the complete denial of a user consent, you can use the `trackConsentDeny` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackConsentDeny:{trackerName}", { consentScopes: ["necessary"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` **Browser (npm):** ```js import { trackConsentDeny } from "@snowplow/browser-plugin-enhanced-consent"; trackConsentDeny({ consentScopes: ["necessary"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` *** ### Consent expired To track the expiration of a user consent, you can use the `trackConsentExpired` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackConsentExpired:{trackerName}", { consentScopes: ["necessary", "marketing", "personalization"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` **Browser 
(npm):** ```js import { trackConsentExpired } from "@snowplow/browser-plugin-enhanced-consent"; trackConsentExpired({ consentScopes: ["necessary", "marketing", "personalization"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` *** ### Consent withdrawn To track the withdrawal of a user's consent, you can use the `trackConsentWithdrawn` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackConsentWithdrawn:{trackerName}", { consentScopes: ["necessary", "marketing", "personalization"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` **Browser (npm):** ```js import { trackConsentWithdrawn } from "@snowplow/browser-plugin-enhanced-consent"; trackConsentWithdrawn({ consentScopes: ["necessary", "marketing", "personalization"], basisForProcessing: "consent", consentUrl: "https://www.example.com/", consentVersion: "1.0", domainsApplied: ["https://www.example.com/"], gdprApplies: true }); ``` *** ### CMP visible Consent management platform banners are an important part of a website’s first impression and performance. Snowplow provides a way to track what we call `elapsedTime`: the time elapsed from page navigation until the consent management platform banner becomes visible on the user’s screen. **JavaScript (tag):** ```js window.snowplow("trackCmpVisible:{trackerName}", { /* Using the performance.now API to retrieve the elapsed time from the page navigation. */ elapsedTime: performance.now(), }); ``` **Browser (npm):** ```js import { trackCmpVisible } from "@snowplow/browser-plugin-enhanced-consent"; trackCmpVisible({ /* Using the performance.now API to retrieve the elapsed time from the page navigation. 
*/ elapsedTime: performance.now(), }); ``` *** The CMP visible event uses [this](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/cmp_visible/jsonschema/1-0-0) schema. --- # Legacy enhanced ecommerce plugin for web > Legacy plugin based on Google Analytics Enhanced Ecommerce that has been superseded by the newer Snowplow ecommerce plugin. > Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/ecommerce/enhanced/ > **Warning:** This plugin has been deprecated and superseded by the [Snowplow ecommerce plugin](/docs/sources/web-trackers/tracking-events/ecommerce/). We highly recommend using this newer plugin, which is more fully featured and allows you to use the [Snowplow Ecommerce](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-models/dbt-ecommerce-data-model/) dbt model. This plugin is based on Google Analytics' Enhanced Ecommerce package. For more information on the Enhanced Ecommerce functions please see the Google Analytics [documentation](https://developers.google.com/analytics/devguides/collection/analyticsjs/enhanced-ecommerce). Enhanced ecommerce events must be **manually tracked**. 
## Install plugin **JavaScript (tag):** | Tracker Distribution | Included | | -------------------- | -------- | | `sp.js` | ❌ | | `sp.lite.js` | ❌ | **Download:** | | | | ------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- | | Download from GitHub Releases (Recommended) | [Github Releases (plugins.umd.zip)](https://github.com/snowplow/snowplow-javascript-tracker/releases) | | Available on jsDelivr | [jsDelivr](https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-enhanced-ecommerce@latest/dist/index.umd.min.js) (latest) | | Available on unpkg | [unpkg](https://unpkg.com/@snowplow/browser-plugin-enhanced-ecommerce@latest/dist/index.umd.min.js) (latest) | **Note:** The links to the CDNs above point to the current latest version. You should pin to a specific version when integrating this plugin on your website if you are using a third party CDN in production. ```javascript window.snowplow('addPlugin', "https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-enhanced-ecommerce@latest/dist/index.umd.min.js", ["snowplowEnhancedEcommerce", "EnhancedEcommercePlugin"] ); ``` **Browser (npm):** - `npm install @snowplow/browser-plugin-enhanced-ecommerce` - `yarn add @snowplow/browser-plugin-enhanced-ecommerce` - `pnpm add @snowplow/browser-plugin-enhanced-ecommerce` ```javascript import { newTracker, trackPageView } from '@snowplow/browser-tracker'; import { EnhancedEcommercePlugin, trackEnhancedEcommerceAction } from '@snowplow/browser-plugin-enhanced-ecommerce'; newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ EnhancedEcommercePlugin() ], }); ``` *** ## Event The enhanced ecommerce plugin is based around the `EnhancedEcommerceAction` event, to which can be added the Action, Impression, Product and Promo context entities. The context entities must be added first, before the event is tracked. 
Use the `trackEnhancedEcommerceAction` method to track a GA Enhanced Ecommerce Action. When this function is called all of the added Ecommerce context entities are attached to this action and flushed from the tracker. | **Name** | **Required?** | **Type** | | -------- | ------------- | -------- | | `action` | Yes | string | The allowed actions: - `click` - `detail` - `add` - `remove` - `checkout` - `checkout_option` - `purchase` - `refund` - `promo_click` - `view` Adding an action using Google Analytics: ```javascript ga('ec:setAction', 'refund', { 'id': 'T12345' }); ``` Adding an action using Snowplow: **JavaScript (tag):** ```javascript snowplow('addEnhancedEcommerceActionContext', { id: 'T12345' }); snowplow('trackEnhancedEcommerceAction', { action: 'refund' }); ``` **Browser (npm):** ```javascript addEnhancedEcommerceActionContext({ id: 'T12345' }); trackEnhancedEcommerceAction({ action: 'refund' }); ``` *** ## Context entities The enhanced ecommerce context entities are specific to this plugin, and cannot be added to any other event types. ### Action Use the `addEnhancedEcommerceActionContext` method to add a GA Enhanced Ecommerce Action Context to the Tracker: | **Name** | **Required?** | **Type** | | ------------- | ------------- | ----------------- | | `id` | Yes | string | | `affiliation` | No | string | | `revenue` | No | number OR string | | `tax` | No | number OR string | | `shipping` | No | number OR string | | `coupon` | No | string | | `list` | No | string | | `step` | No | integer OR string | | `option` | No | string | | `currency` | No | string | Adding an action using Google Analytics: ```javascript ga('ec:setAction', 'purchase', { 'id': 'T12345', 'affiliation': 'Google Store - Online', 'revenue': '37.39', 'tax': '2.85', 'shipping': '5.34', 'coupon': 'SUMMER2013' }); ``` > **Note:** The action type is passed with the action context in the Google Analytics example. 
We have separated this by asking you to call the `trackEnhancedEcommerceAction` function to actually send the context and the action. Adding an action using Snowplow: **JavaScript (tag):** ```javascript snowplow('addEnhancedEcommerceActionContext', { id: 'T12345', affiliation: 'Google Store - Online', revenue: '37.39', // Can also pass as number tax: '2.85', // Can also pass as number shipping: '5.34', // Can also pass as number coupon: 'WINTER2016' }); ``` **Browser (npm):** ```javascript addEnhancedEcommerceActionContext({ id: 'T12345', affiliation: 'Google Store - Online', revenue: '37.39', // Can also pass as number tax: '2.85', // Can also pass as number shipping: '5.34', // Can also pass as number coupon: 'WINTER2016' }); ``` *** ### Impression Use the `addEnhancedEcommerceImpressionContext` method to add a GA Enhanced Ecommerce Impression Context to the Tracker: | **Name** | **Required?** | **Type** | | ---------- | ------------- | ----------------- | | `id` | Yes | string | | `name` | No | string | | `list` | No | string | | `brand` | No | string | | `category` | No | string | | `variant` | No | string | | `position` | No | integer OR string | | `price` | No | number OR string | | `currency` | No | string | Adding an impression using Google Analytics: ```javascript ga('ec:addImpression', { 'id': 'P12345', 'name': 'Android Warhol T-Shirt', 'list': 'Search Results', 'brand': 'Google', 'category': 'Apparel/T-Shirts', 'variant': 'Black', 'position': 1 }); ``` Adding an impression using Snowplow: **JavaScript (tag):** ```javascript snowplow('addEnhancedEcommerceImpressionContext', { id: 'P12345', name: 'Android Warhol T-Shirt', list: 'Search Results', brand: 'Google', category: 'Apparel/T-Shirts', variant: 'Black', position: 1 }); ``` **Browser (npm):** ```javascript addEnhancedEcommerceImpressionContext({ id: 'P12345', name: 'Android Warhol T-Shirt', list: 'Search Results', brand: 'Google', category: 'Apparel/T-Shirts', variant: 'Black', position: 1 }); ``` *** ### Product Use 
the `addEnhancedEcommerceProductContext` method to add a GA Enhanced Ecommerce Product Field Context: | **Name** | **Required?** | **Type** | | ---------- | ------------- | ----------------- | | `id` | Yes | string | | `name` | No | string | | `list` | No | string | | `brand` | No | string | | `category` | No | string | | `variant` | No | string | | `price` | No | number OR string | | `quantity` | No | integer OR string | | `coupon` | No | string | | `position` | No | integer OR string | | `currency` | No | string | Adding a product using Google Analytics: ```javascript ga('ec:addProduct', { 'id': 'P12345', 'name': 'Android Warhol T-Shirt', 'brand': 'Google', 'category': 'Apparel/T-Shirts', 'variant': 'Black', 'position': 1 }); ``` Adding a product using Snowplow: **JavaScript (tag):** ```javascript snowplow('addEnhancedEcommerceProductContext', { id: 'P12345', name: 'Android Warhol T-Shirt', list: 'Search Results', brand: 'Google', category: 'Apparel/T-Shirts', variant: 'Black', quantity: 1 }); ``` **Browser (npm):** ```javascript addEnhancedEcommerceProductContext({ id: 'P12345', name: 'Android Warhol T-Shirt', list: 'Search Results', brand: 'Google', category: 'Apparel/T-Shirts', variant: 'Black', quantity: 1 }); ``` *** ### Promo Use the `addEnhancedEcommercePromoContext` method to add a GA Enhanced Ecommerce Promotion Field Context: | **Name** | **Required?** | **Type** | | ---------- | ------------- | -------- | | `id` | Yes | string | | `name` | No | string | | `creative` | No | string | | `position` | No | string | | `currency` | No | string | Adding a promotion using Google Analytics: ```javascript ga('ec:addPromo', { 'id': 'PROMO_1234', 'name': 'Summer Sale', 'creative': 'summer_banner2', 'position': 'banner_slot1' }); ``` Adding a promotion using Snowplow: **JavaScript (tag):** ```javascript snowplow('addEnhancedEcommercePromoContext', { id: 'PROMO_1234', // The Promotion ID name: 'Summer Sale', // The name creative: 'summer_banner2', // The name of the 
creative position: 'banner_slot1' // The position }); ``` **Browser (npm):** ```javascript addEnhancedEcommercePromoContext({ id: 'PROMO_1234', // The Promotion ID name: 'Summer Sale', // The name creative: 'summer_banner2', // The name of the creative position: 'banner_slot1' // The position }); ``` *** --- # Track ecommerce events on web > Track comprehensive ecommerce interactions including product views, cart actions, checkout steps, transactions, and refunds with standardized event schemas. > Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/ecommerce/ This plugin helps you track ecommerce activity. See the full details for all ecommerce schemas in the [ecommerce tracking overview](/docs/events/ootb-data/ecommerce-events/) page. It provides several `trackX` methods, which each create a Snowplow ecommerce action event with the appropriate entities attached. The event itself has only one property, an enum describing the ecommerce action taken e.g. `add_to_cart`. There are also two entities that can be globally configured: ecommerce page and ecommerce user. This plugin is supported by the [Snowplow Ecommerce](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-models/dbt-ecommerce-data-model/) dbt model. > **Note:** The plugin is available since version 3.8 of the tracker. Snowplow ecommerce events and entities must be **manually tracked**. 
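As a mental model for the above, each `trackX` call can be thought of as producing a single action event whose only property is the action `type`, with the relevant product, cart, or promotion entities attached. The sketch below is purely illustrative (the field and entity names are ours, not the exact self-describing JSON the tracker emits):

```javascript
// Illustrative only: conceptual shape of what a trackAddToCart call produces.
// One action event (type enum) plus attached entities. Not the wire format.
function buildAddToCartPayload(products, totalValue, currency) {
  return {
    action: { type: "add_to_cart" }, // the single enum property on the event
    entities: [
      // one product entity per product involved in the action
      ...products.map((p) => ({ entity: "product", data: p })),
      // a cart entity describing the cart after the action
      { entity: "cart", data: { total_value: totalValue, currency } },
    ],
  };
}

const payload = buildAddToCartPayload(
  [{ id: "P125", name: "Baseball T", price: 200, currency: "USD" }],
  200,
  "USD"
);
// payload.action.type === "add_to_cart", with 2 entities attached
```

The dbt ecommerce model works from exactly this split: it reads the action `type` from the event and joins on the attached entities.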
## Install plugin **JavaScript (tag):** | Tracker Distribution | Included | | -------------------- | -------- | | `sp.js` | ✅ | | `sp.lite.js` | ❌ | **Download:** | | | | ------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- | | Download from GitHub Releases (Recommended) | [Github Releases (plugins.umd.zip)](https://github.com/snowplow/snowplow-javascript-tracker/releases) | | Available on jsDelivr | [jsDelivr](https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-snowplow-ecommerce@latest/dist/index.umd.min.js) (latest) | | Available on unpkg | [unpkg](https://unpkg.com/@snowplow/browser-plugin-snowplow-ecommerce@latest/dist/index.umd.min.js) (latest) | ```javascript window.snowplow( 'addPlugin', 'https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-snowplow-ecommerce@latest/dist/index.umd.min.js', ['snowplowEcommerceAccelerator', 'SnowplowEcommercePlugin'] ); ``` **Browser (npm):** - `npm install @snowplow/browser-plugin-snowplow-ecommerce` - `yarn add @snowplow/browser-plugin-snowplow-ecommerce` - `pnpm add @snowplow/browser-plugin-snowplow-ecommerce` ```javascript import { newTracker } from '@snowplow/browser-tracker'; import { SnowplowEcommercePlugin } from '@snowplow/browser-plugin-snowplow-ecommerce'; newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ SnowplowEcommercePlugin() ], }); ``` *** ## Events | API | Used for: | | ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | | `trackProductView` | Tracking a visit to a product page. Known also as product detail view. | | `trackAddToCart` | Track an addition to cart. | | `trackRemoveFromCart` | Track a removal from cart. | | `trackProductListView` | Track an impression of a product list. 
The list could be a search results page, recommended products, upsells etc. | | `trackProductListClick` | Track the click/selection of a product from a product list. | | `trackPromotionView` | Track an impression for an internal promotion banner or slider or any other type of content that showcases internal products/categories. | | `trackPromotionClick` | Track the click/selection of an internal promotion. | | `trackCheckoutStep` | Track a checkout step completion in the checkout process together with common step attributes for user choices throughout the checkout funnel. | | `trackTransaction` | Track a transaction/purchase completion. | | `trackRefund` | Track a transaction partial or complete refund. | | `trackTransactionError` | Track an error happening during a transaction process. | ### Product view To track a product view, use the `trackProductView` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackProductView:{trackerName}", { id: "12345", name: "Baseball T", brand: "Snowplow", category: "apparel", price: 200, currency: "USD", }); ``` **Browser (npm):** ```js import { trackProductView } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackProductView({ id: "12345", name: "Baseball T", brand: "Snowplow", category: "apparel", price: 200, currency: "USD", }); ``` *** ### Add to cart To track products being added to the cart, use the `trackAddToCart` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackAddToCart:{trackerName}", { products: [ { id: "P125", name: "Baseball T", brand: "Snowplow", category: "Mens/Apparel", price: 200, currency: "USD", }, ], total_value: 200, currency: "USD", }); ``` **Browser (npm):** ```js import { trackAddToCart } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackAddToCart({ products: [ { id: "P125", name: "Baseball T", brand: "Snowplow", category: "Mens/Apparel", price: 200, currency: "USD", }, ], total_value: 200, currency: "USD", }); ``` *** - 
Where `products` is an array with the products added to cart. - Where `total_value` is the value of the cart after the addition. - Where `currency` is the currency of the cart. ### Remove from cart To track products being removed from the cart, use the `trackRemoveFromCart` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackRemoveFromCart:{trackerName}", { products: [ { id: "P125", name: "Baseball T", brand: "Snowplow", category: "Mens/Apparel", price: 200, currency: "USD", }, ], total_value: 0, currency: "USD", }); ``` **Browser (npm):** ```js import { trackRemoveFromCart } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackRemoveFromCart({ products: [ { id: "P125", name: "Baseball T", brand: "Snowplow", category: "Mens/Apparel", price: 200, currency: "USD", }, ], total_value: 0, currency: "USD", }); ``` *** - Where `products` is an array with the products removed from the cart. - Where `total_value` is the value of the cart after the removal of the products. - Where `currency` is the currency of the cart. 
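Because `total_value` must reflect the cart *after* each addition or removal, it helps to derive it from a single source of truth rather than computing it ad hoc at every call site. A minimal sketch, assuming products carry a numeric `price` (the `Cart` helper and its method names are hypothetical, not part of the plugin):

```javascript
// Hypothetical helper for keeping `total_value` consistent across
// trackAddToCart / trackRemoveFromCart calls. Not part of the plugin.
class Cart {
  constructor(currency) {
    this.currency = currency;
    this.items = new Map(); // product id -> { product, quantity }
  }
  add(product, quantity = 1) {
    const entry = this.items.get(product.id) || { product, quantity: 0 };
    entry.quantity += quantity;
    this.items.set(product.id, entry);
  }
  remove(productId, quantity = 1) {
    const entry = this.items.get(productId);
    if (!entry) return;
    entry.quantity -= quantity;
    if (entry.quantity <= 0) this.items.delete(productId);
  }
  totalValue() {
    let total = 0;
    for (const { product, quantity } of this.items.values()) {
      total += product.price * quantity;
    }
    return total;
  }
}

const cart = new Cart("USD");
cart.add({ id: "P125", name: "Baseball T", price: 200, currency: "USD" });
// trackAddToCart({ products: [...], total_value: cart.totalValue(), currency: cart.currency });
cart.remove("P125");
// cart.totalValue() is now 0, matching the `total_value: 0` in the removal example above
```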
### Product list view To track a product list view, use the `trackProductListView` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackProductListView:{trackerName}", { products: [ { id: "P123", name: "Fashion red", brand: "Snowplow", category: "Mens/Apparel", price: 100, inventory_status: "in stock", currency: "USD", position: 1, }, { id: "P124", name: "Fashion green", brand: "Snowplow", category: "Mens/Apparel", price: 119, inventory_status: "in stock", currency: "USD", position: 2, }, { id: "P125", name: "Baseball T", brand: "Snowplow", category: "Mens/Apparel", price: 200, inventory_status: "in stock", currency: "USD", position: 3, }, ], name: "Recommended Products", }); ``` **Browser (npm):** ```js import { trackProductListView } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackProductListView({ products: [ { id: "P123", name: "Fashion red", brand: "Snowplow", category: "Mens/Apparel", price: 100, inventory_status: "in stock", currency: "USD", position: 1, }, { id: "P124", name: "Fashion green", brand: "Snowplow", category: "Mens/Apparel", price: 119, inventory_status: "in stock", currency: "USD", position: 2, }, { id: "P125", name: "Baseball T", brand: "Snowplow", category: "Mens/Apparel", price: 200, inventory_status: "in stock", currency: "USD", position: 3, }, ], name: "Recommended Products", }); ``` *** - Where `products` is an array of products being viewed from the list. - Where `name` is the name of the list being viewed. For the list names, you can use any kind of friendly name or a codified language to express the labeling of the list. E.g. `Shoes - Men - Sneakers`, `Search results: "unisex shoes"`, or `Product page upsells`. 
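Since each product in a list view carries a 1-based `position`, you can derive positions from array order instead of hard-coding them. A small sketch (the `withPositions` helper is ours, not part of the plugin):

```javascript
// Hypothetical helper: annotate products with their 1-based list position
// before passing them to trackProductListView. Not part of the plugin.
function withPositions(products) {
  return products.map((product, index) => ({ ...product, position: index + 1 }));
}

const listed = withPositions([
  { id: "P123", name: "Fashion red", price: 100, currency: "USD" },
  { id: "P124", name: "Fashion green", price: 119, currency: "USD" },
]);
// listed[0].position === 1, listed[1].position === 2
// trackProductListView({ products: listed, name: "Recommended Products" });
```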
### Product list click **JavaScript (tag):** ```js window.snowplow("trackProductListClick:{trackerName}", { product: { id: "P124", name: "Fashion green", brand: "Snowplow", category: "Mens/Apparel", price: 119, inventory_status: "in stock", currency: "USD", position: 2, }, name: "Recommended Products", }); ``` **Browser (npm):** ```js import { trackProductListClick } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackProductListClick({ product: { id: "P124", name: "Fashion green", brand: "Snowplow", category: "Mens/Apparel", price: 119, inventory_status: "in stock", currency: "USD", position: 2, }, name: "Recommended Products", }); ``` *** - Where `product` is the product being clicked or selected from the list. - Where `name` is the name of the list the product is in. For the list names, you can use any kind of friendly name or a codified language to express the labeling of the list. E.g. `Shoes - Men - Sneakers`, `Search results: "unisex shoes"`, or `Product page upsells`. ### Promotion view **JavaScript (tag):** ```js /* Carousel slide 1 viewed */ window.snowplow("trackPromotionView:{trackerName}", { id: 'IP1234', name: 'promo_winter', type: 'carousel', position: 1, product_ids: ['P1234'], }); /* On carousel slide 2 view */ window.snowplow("trackPromotionView:{trackerName}", { id: 'IP1234', name: 'promo_winter', type: 'carousel', position: 2, product_ids: ['P1235'], }); ``` **Browser (npm):** ```js import { trackPromotionView } from '@snowplow/browser-plugin-snowplow-ecommerce'; /* Carousel slide 1 viewed */ trackPromotionView({ id: 'IP1234', name: 'promo_winter', type: 'carousel', position: 1, product_ids: ['P1234'], }); /* On carousel slide 2 view */ trackPromotionView({ id: 'IP1234', name: 'promo_winter', type: 'carousel', position: 2, product_ids: ['P1235'], }); ``` *** ### Promotion click **JavaScript (tag):** ```js window.snowplow("trackPromotionClick:{trackerName}", { id: 'IP1234', name: 'promo_winter', type: 'carousel', position: 1, product_ids: 
['P1234'], }); ``` **Browser (npm):** ```js import { trackPromotionClick } from "@snowplow/browser-plugin-snowplow-ecommerce"; trackPromotionClick({ id: 'IP1234', name: 'promo_winter', type: 'carousel', position: 1, product_ids: ['P1234'], }); ``` *** ### Checkout step To track a checkout step, use the `trackCheckoutStep` method with the following attributes: **JavaScript (tag):** ```js /* Step 1 - Account type selection */ window.snowplow("trackCheckoutStep:{trackerName}", { step: 1, account_type: "guest checkout", }); /* Step 2 - Billing options selection */ window.snowplow("trackCheckoutStep:{trackerName}", { step: 2, payment_method: "credit card", proof_of_payment: "invoice", }); ``` **Browser (npm):** ```js import { trackCheckoutStep } from '@snowplow/browser-plugin-snowplow-ecommerce'; /* Step 1 - Account type selection */ trackCheckoutStep({ step: 1, account_type: "guest checkout", }); /* Step 2 - Billing options selection */ trackCheckoutStep({ step: 2, payment_method: "credit card", proof_of_payment: "invoice", }); ``` *** ### Transaction To track a completed transaction, use the `trackTransaction` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackTransaction:{trackerName}", { transaction_id: "T12345", revenue: 230, currency: "USD", payment_method: "credit_card", total_quantity: 1, tax: 20, shipping: 10, products: [ { id: "P125", name: "Baseball T", brand: "Snowplow", category: "Mens/Apparel", price: 200, inventory_status: "in stock", currency: "USD", quantity: 1, }, ], }); ``` **Browser (npm):** ```js import { trackTransaction } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackTransaction({ transaction_id: "T12345", revenue: 230, currency: "USD", payment_method: "credit_card", total_quantity: 1, tax: 20, shipping: 10, products: [ { id: "P125", name: "Baseball T", brand: "Snowplow", category: "Mens/Apparel", price: 200, inventory_status: "in stock", currency: "USD", quantity: 1, }, ], }); ``` *** - Where 
`products` is an array with the products taking part in the transaction. ### Refund > **Note:** Available from version 3.10. To track a complete or partial refund you can use the `trackRefund` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackRefund:{trackerName}", { transaction_id: "T12345", currency: "USD", refund_amount: 200, refund_reason: "return", products: [ { id: "P125", name: "Baseball T", brand: "Snowplow", category: "Mens/Apparel", price: 200, inventory_status: "in stock", currency: "USD", quantity: 1, }, ], }); ``` **Browser (npm):** ```js import { trackRefund } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackRefund({ transaction_id: "T12345", currency: "USD", refund_amount: 200, refund_reason: "return", products: [ { id: "P125", name: "Baseball T", brand: "Snowplow", category: "Mens/Apparel", price: 200, inventory_status: "in stock", currency: "USD", quantity: 1, }, ], }); ``` *** - Where `products` is an array with the products taking part in the refund. ### Transaction error > **Note:** Available from version 3.13. 
To track an error happening during a transaction process, use the `trackTransactionError` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackTransactionError:{trackerName}", { resolution: "rejection", error_code: "E123", error_shortcode: "CARD_DECLINE", error_description: "Card has been declined by the issuing bank.", error_type: "hard", transaction: { revenue: 45, currency: "EUR", transaction_id: "T12345", payment_method: "card", total_quantity: 1 } }); ``` **Browser (npm):** ```js import { trackTransactionError } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackTransactionError({ resolution: "rejection", error_code: "E123", error_shortcode: "CARD_DECLINE", error_description: "Card has been declined by the issuing bank.", error_type: "hard", transaction: { revenue: 45, currency: "EUR", transaction_id: "T12345", payment_method: "card", total_quantity: 1 } }); ``` *** - Where `transaction` is the transaction entity being processed. ## Page and user entities Once set, the [ecommerce page or user entity](/docs/events/ootb-data/ecommerce-events/#global-ecommerce-entities) will be attached to **all** subsequent Snowplow events. There's no way to unset these entities. **JavaScript (tag):** ```js window.snowplow("setPageType:{trackerName}", { type, language, locale }); window.snowplow("setEcommerceUser:{trackerName}", { id, is_guest, email }); ``` **Browser (npm):** ```js import { setPageType, setEcommerceUser } from '@snowplow/browser-plugin-snowplow-ecommerce'; setPageType({ type, language, locale }); setEcommerceUser({ id, is_guest, email }); ``` *** ## GA4/UA Ecommerce transitional API > **Note:** Available from version 3.10. If you already use Google Analytics 4 ecommerce or Universal Analytics Enhanced Ecommerce to collect information about the shopping behavior of your users, we've prepared a way to quickly implement Snowplow Ecommerce without making many changes to your current setup. 
This transitional API depends on the standardized [dataLayer](https://developers.google.com/tag-platform/tag-manager/web/datalayer) structure for both Google Analytics ecommerce implementations. This makes the transition straightforward whether it happens through Google Tag Manager, which has direct control over the dataLayer, or through custom code that uses the standard ecommerce structures. ### Universal Analytics Enhanced Ecommerce The standard Universal Analytics Enhanced Ecommerce implementation is based on the official [guide reference](https://developers.google.com/analytics/devguides/collection/ua/gtm/enhanced-ecommerce). **Important:** The `dataLayer.currencyCode` attribute must be available for all product interactions. If it is not, almost all methods accept an `Options` object that can include the currency code as follows: ```ts method({{dataLayer.ecommerce reference}} , { currency: "currency code" }); ``` #### trackEnhancedEcommerceProductListView To track an Enhanced Ecommerce product list view, you can use the `trackEnhancedEcommerceProductListView` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackEnhancedEcommerceProductListView:{trackerName}", {{dataLayer.ecommerce reference}}); ``` **Browser (npm):** ```js import { trackEnhancedEcommerceProductListView } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackEnhancedEcommerceProductListView({{dataLayer.ecommerce reference}}); ``` *** #### trackEnhancedEcommerceProductListClick To track an Enhanced Ecommerce product list click, you can use the `trackEnhancedEcommerceProductListClick` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackEnhancedEcommerceProductListClick:{trackerName}", {{dataLayer.ecommerce reference}}); ``` **Browser (npm):** ```js import { trackEnhancedEcommerceProductListClick } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackEnhancedEcommerceProductListClick({{dataLayer.ecommerce reference}}); ``` *** 
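For orientation, a UA Enhanced Ecommerce `dataLayer.ecommerce` object for a list impression looks roughly like this (field names follow the UA guide's structure; the exact payload in your setup may differ, so treat this as a sketch):

```javascript
// Sketch of a UA Enhanced Ecommerce dataLayer.ecommerce payload for a
// product list impression, per the UA guide's structure.
const ecommerce = {
  currencyCode: "USD", // needed for product interactions
  impressions: [
    {
      id: "P12345",
      name: "Android Warhol T-Shirt",
      list: "Search Results",
      position: 1,
      price: "15.25",
    },
  ],
};
// In GTM (or custom code), pass the ecommerce object straight in:
// trackEnhancedEcommerceProductListView(ecommerce);
// If currencyCode is missing from the payload, supply it via Options:
// trackEnhancedEcommerceProductListView(ecommerce, { currency: "USD" });
```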
#### trackEnhancedEcommerceProductDetail To track an Enhanced Ecommerce product detail view, you can use the `trackEnhancedEcommerceProductDetail` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackEnhancedEcommerceProductDetail:{trackerName}", {{dataLayer.ecommerce reference}}); ``` **Browser (npm):** ```js import { trackEnhancedEcommerceProductDetail } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackEnhancedEcommerceProductDetail({{dataLayer.ecommerce reference}}); ``` *** #### trackEnhancedEcommercePromoView To track an Enhanced Ecommerce internal promotion view, you can use the `trackEnhancedEcommercePromoView` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackEnhancedEcommercePromoView:{trackerName}", {{dataLayer.ecommerce reference}}); ``` **Browser (npm):** ```js import { trackEnhancedEcommercePromoView } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackEnhancedEcommercePromoView({{dataLayer.ecommerce reference}}); ``` *** #### trackEnhancedEcommercePromoClick To track an Enhanced Ecommerce internal promotion click, you can use the `trackEnhancedEcommercePromoClick` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackEnhancedEcommercePromoClick:{trackerName}", {{dataLayer.ecommerce reference}}); ``` **Browser (npm):** ```js import { trackEnhancedEcommercePromoClick } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackEnhancedEcommercePromoClick({{dataLayer.ecommerce reference}}); ``` *** #### trackEnhancedEcommerceAddToCart To track an Enhanced Ecommerce add to cart event, you can use the `trackEnhancedEcommerceAddToCart` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackEnhancedEcommerceAddToCart:{trackerName}", {{dataLayer.ecommerce reference}}, { finalCartValue: 20, }); ``` **Browser (npm):** ```js import { trackEnhancedEcommerceAddToCart } from 
'@snowplow/browser-plugin-snowplow-ecommerce'; trackEnhancedEcommerceAddToCart( {{dataLayer.ecommerce reference}}, { finalCartValue: 20, }); ``` *** - Where `finalCartValue` is the value of the cart after the addition. #### trackEnhancedEcommerceRemoveFromCart To track an Enhanced Ecommerce remove from cart event, you can use the `trackEnhancedEcommerceRemoveFromCart` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackEnhancedEcommerceRemoveFromCart:{trackerName}", {{dataLayer.ecommerce reference}}, { finalCartValue: 20, }); ``` **Browser (npm):** ```js import { trackEnhancedEcommerceRemoveFromCart } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackEnhancedEcommerceRemoveFromCart( {{dataLayer.ecommerce reference}}, { finalCartValue: 20, }); ``` *** - Where `finalCartValue` is the value of the cart after the removal. #### trackEnhancedEcommerceCheckoutStep To track an Enhanced Ecommerce checkout step, you can use the `trackEnhancedEcommerceCheckoutStep` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackEnhancedEcommerceCheckoutStep:{trackerName}", {{dataLayer.ecommerce reference}}, { checkoutOption: { delivery_method: "express_delivery" }, }); ``` **Browser (npm):** ```js import { trackEnhancedEcommerceCheckoutStep } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackEnhancedEcommerceCheckoutStep( {{dataLayer.ecommerce reference}}, { checkoutOption: { delivery_method: "express_delivery" }, }); ``` *** - Where `checkoutOption` is a key value pair object of available [Snowplow checkout options](https://github.com/snowplow/iglu-central/tree/master/schemas/com.snowplowanalytics.snowplow.ecommerce/checkout_step), except `step` which is retrieved from the dataLayer directly. 
#### trackEnhancedEcommercePurchase To track an Enhanced Ecommerce purchase, you can use the `trackEnhancedEcommercePurchase` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackEnhancedEcommercePurchase:{trackerName}", {{dataLayer.ecommerce reference}}, { paymentMethod: "bank_transfer", }); ``` **Browser (npm):** ```js import { trackEnhancedEcommercePurchase } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackEnhancedEcommercePurchase( {{dataLayer.ecommerce reference}}, { paymentMethod: "bank_transfer", }); ``` *** - Where `paymentMethod` is the payment method selected in this transaction. This attribute corresponds to the `payment_method` of the [transaction schema](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.ecommerce/transaction/jsonschema/1-0-0#L30). Defaults to `unknown`. ### Google Analytics 4 Ecommerce The Google Analytics 4 ecommerce implementation is based on the official [guide reference](https://developers.google.com/analytics/devguides/collection/ga4/ecommerce?client_type=gtm). **Important:** The `dataLayer.ecommerce.currency` attribute must be available for all product interactions. 
If it is not, almost all methods accept an `Options` object that can include the currency code as follows: ```ts method( {{dataLayer.ecommerce reference}} , { currency: "currency code" }); ``` #### trackGA4ViewItemList To track a GA4 Ecommerce item list view, you can use the `trackGA4ViewItemList` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackGA4ViewItemList:{trackerName}", {{dataLayer.ecommerce reference}}); ``` **Browser (npm):** ```js import { trackGA4ViewItemList } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackGA4ViewItemList({{dataLayer.ecommerce reference}}); ``` *** #### trackGA4SelectItem To track a GA4 Ecommerce item selection from a list, you can use the `trackGA4SelectItem` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackGA4SelectItem:{trackerName}", {{dataLayer.ecommerce reference}}); ``` **Browser (npm):** ```js import { trackGA4SelectItem } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackGA4SelectItem({{dataLayer.ecommerce reference}}); ``` *** #### trackGA4ViewItem To track a GA4 Ecommerce item view, you can use the `trackGA4ViewItem` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackGA4ViewItem:{trackerName}", {{dataLayer.ecommerce reference}}); ``` **Browser (npm):** ```js import { trackGA4ViewItem } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackGA4ViewItem({{dataLayer.ecommerce reference}}); ``` *** #### trackGA4ViewPromotion To track a GA4 Ecommerce internal promotion view, you can use the `trackGA4ViewPromotion` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackGA4ViewPromotion:{trackerName}", {{dataLayer.ecommerce reference}}); ``` **Browser (npm):** ```js import { trackGA4ViewPromotion } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackGA4ViewPromotion({{dataLayer.ecommerce reference}}); ``` *** #### trackGA4SelectPromotion To track a GA4 
Ecommerce internal promotion selection, you can use the `trackGA4SelectPromotion` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackGA4SelectPromotion:{trackerName}", {{dataLayer.ecommerce reference}}); ``` **Browser (npm):** ```js import { trackGA4SelectPromotion } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackGA4SelectPromotion( {{dataLayer.ecommerce reference}}); ``` *** #### trackGA4AddToCart To track a GA4 Ecommerce add to cart event, you can use the `trackGA4AddToCart` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackGA4AddToCart:{trackerName}", {{dataLayer.ecommerce reference}}, { finalCartValue: 20, }); ``` **Browser (npm):** ```js import { trackGA4AddToCart } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackGA4AddToCart( {{dataLayer.ecommerce reference}}, { finalCartValue: 20, }); ``` *** - Where `finalCartValue` is the value of the cart after the addition. #### trackGA4RemoveFromCart To track a GA4 Ecommerce remove from cart event, you can use the `trackGA4RemoveFromCart` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackGA4RemoveFromCart:{trackerName}", {{dataLayer.ecommerce reference}}, { finalCartValue: 20, }); ``` **Browser (npm):** ```js import { trackGA4RemoveFromCart } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackGA4RemoveFromCart( {{dataLayer.ecommerce reference}}, { finalCartValue: 20, }); ``` *** - Where `finalCartValue` is the value of the cart after the removal. 
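If your application doesn't already maintain a running cart total, you can derive `finalCartValue` from the cart contents before calling `trackGA4AddToCart` or `trackGA4RemoveFromCart`. A minimal sketch; the `cartTotal` helper and the item shape (`price`, `quantity`) are illustrative assumptions, not part of the plugin:

```javascript
// Hypothetical helper: sums item prices to get the cart value after a change.
// Assumes each item has numeric `price` and `quantity` fields (illustrative shape).
function cartTotal(items) {
  return items.reduce((sum, item) => sum + item.price * (item.quantity || 1), 0);
}

// Cart after removing an item: one $15.00 item plus two $2.50 items
const remainingItems = [
  { item_id: 'SKU-001', price: 15, quantity: 1 },
  { item_id: 'SKU-002', price: 2.5, quantity: 2 },
];

const finalCartValue = cartTotal(remainingItems); // 20
// trackGA4RemoveFromCart({{dataLayer.ecommerce reference}}, { finalCartValue });
```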
#### trackGA4BeginCheckout To track a GA4 Ecommerce checkout beginning, you can use the `trackGA4BeginCheckout` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackGA4BeginCheckout:{trackerName}", { step: 1, }); ``` **Browser (npm):** ```js import { trackGA4BeginCheckout } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackGA4BeginCheckout({ step: 1, }); ``` *** - Where `step` is a number representing the step of the checkout funnel. Defaults to 1, mimicking the `begin_checkout` GA4 event. #### trackGA4AddShippingInfo To track a GA4 Ecommerce checkout shipping info step completion, you can use the `trackGA4AddShippingInfo` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackGA4AddShippingInfo:{trackerName}", { step: 1, }); ``` **Browser (npm):** ```js import { trackGA4AddShippingInfo } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackGA4AddShippingInfo({ step: 1, }); ``` *** - Where `step` is a number representing the step of the checkout funnel. #### trackGA4AddPaymentOptions To track a GA4 Ecommerce checkout payment option step completion, you can use the `trackGA4AddPaymentOptions` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackGA4AddPaymentOptions:{trackerName}", { step: 1, }); ``` **Browser (npm):** ```js import { trackGA4AddPaymentOptions } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackGA4AddPaymentOptions({ step: 1, }); ``` *** - Where `step` is a number representing the step of the checkout funnel. 
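The three checkout examples above all pass `step: 1`; in a real funnel each stage would typically carry its own step number. One possible numbering is sketched below; the stage-to-step mapping is an assumption for illustration, not something mandated by the plugin:

```javascript
// Illustrative mapping from checkout stage to funnel step number.
const CHECKOUT_STEP = {
  begin_checkout: 1,
  add_shipping_info: 2,
  add_payment_info: 3,
};

// e.g. when the shipping form is submitted:
// trackGA4AddShippingInfo({ step: CHECKOUT_STEP.add_shipping_info });
// and when a payment option is chosen:
// trackGA4AddPaymentOptions({ step: CHECKOUT_STEP.add_payment_info });
```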
#### trackGA4Transaction To track a GA4 Ecommerce transaction, you can use the `trackGA4Transaction` method with the following attributes: **JavaScript (tag):** ```js window.snowplow("trackGA4Transaction:{trackerName}", { paymentMethod: "bank_transfer", }); ``` **Browser (npm):** ```js import { trackGA4Transaction } from '@snowplow/browser-plugin-snowplow-ecommerce'; trackGA4Transaction({ paymentMethod: "bank_transfer", }); ``` *** - Where `paymentMethod` is the payment method selected in this transaction. This attribute corresponds to the `payment_method` property of the [transaction schema](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow.ecommerce/transaction/jsonschema/1-0-0#L30). Defaults to `unknown`. --- # Track page element visibility and lifecycle on web > Declaratively track page element visibility and lifecycle events as they are created, destroyed, scrolled into view, or scrolled out of view with configurable rules. > Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/element-tracking/ Element visibility tracking enables declarative tracking of page elements as they appear on web pages and scroll into view. This is useful for impression tracking, including: - Funnel steps e.g. form on page > form in view > [form tracking events](/docs/sources/web-trackers/tracking-events/form-tracking/) - List impression tracking e.g. product impressions - Component performance e.g. recommendations performance, newsletter sign-up forms, modal popups - Product usage e.g. 
elements that appear on-hover, labeling or grouping events related to specific features - Advertisement impression tracking Once you call `startElementTracking`, the plugin watches the DOM and automatically fires events whenever: - Elements appear on the page: tracks `create_element` - Elements scroll into view: tracks `expose_element` - Elements scroll out of view: tracks `obscure_element` - Elements are removed from the page: tracks `destroy_element` You can define rules for which elements to track, and can also trigger events when elements change to match or no longer match a rule. An entity containing details about the element is attached to each event, and you can also configure other entities. Element lifecycle events are **automatically tracked** once configured. ## Install plugin **JavaScript (tag):** | Tracker Distribution | Included | | -------------------- | -------- | | `sp.js` | ❌ | | `sp.lite.js` | ❌ | **Download:** | | | | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ | | Download from GitHub Releases (Recommended) | [GitHub Releases (plugins.umd.zip)](https://github.com/snowplow/snowplow-javascript-tracker/releases) | | Available on jsDelivr | [jsDelivr](https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-element-tracking@latest/dist/index.umd.min.js) (latest) | | Available on unpkg | [unpkg](https://unpkg.com/@snowplow/browser-plugin-element-tracking@latest/dist/index.umd.min.js) (latest) | > **Note:** The links to the CDNs point to the current latest version. You should pin to a specific version when integrating this plugin on your website if you are using a third-party CDN in production. 
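For example, you can build a pinned jsDelivr URL instead of using `@latest`. The version number below is purely illustrative; check the GitHub releases page for the current one:

```javascript
// Pin an explicit plugin version rather than @latest in production.
// '4.0.0' is an illustrative version number, not a recommendation.
const version = '4.0.0';
const pluginUrl = `https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-element-tracking@${version}/dist/index.umd.min.js`;

// window.snowplow('addPlugin', pluginUrl, ['snowplowElementTracking', 'SnowplowElementTrackingPlugin']);
```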
**Browser (npm):** - `npm install @snowplow/browser-plugin-element-tracking` - `yarn add @snowplow/browser-plugin-element-tracking` - `pnpm add @snowplow/browser-plugin-element-tracking` *** ## Start element tracking Begin tracking elements by providing configuration to the plugin's `startElementTracking` method: **JavaScript (tag):** ```javascript window.snowplow('addPlugin', "https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-element-tracking@latest/dist/index.umd.min.js", ["snowplowElementTracking", "SnowplowElementTrackingPlugin"] ); snowplow('startElementTracking', { elements: [/* configuration */] }); ``` **Browser (npm):** First, add the plugin when initializing the tracker. ```javascript import { newTracker } from '@snowplow/browser-tracker'; import { SnowplowElementTrackingPlugin, startElementTracking } from '@snowplow/browser-plugin-element-tracking'; newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ SnowplowElementTrackingPlugin() ], }); startElementTracking({ elements: [/* configuration */] }); ``` *** The `elements` configuration can take a single rule, or an array of rules. You can call `startElementTracking` multiple times to add more rules as needed. ## Events and entities The plugin can generate four events: - `create_element`: when a matching element is added to the page - `expose_element`: when a matching element scrolls into view - `obscure_element`: when a matching element scrolls out of view - `destroy_element`: when a matching element is removed from the page Each of these events has only one property, `element_name`. Check out the [page element tracking overview](/docs/events/ootb-data/page-elements/#page-element-visibility-and-lifecycle) page to see the schema details. Every element event includes an `element` entity with details about the element that triggered the event. The attributes tracked depend on your `detail` configuration. By default, only the `expose_element` event is tracked. 
Configure which event types to track using booleans, or provide objects for more fine-grained control. Check out the configuration options on this page for details. **JavaScript (tag):** ```javascript // This minimal example tracks expose events for all `.product-card` elements snowplow('startElementTracking', { elements: { selector: '.product-card' } }); // It's equivalent to this more explicit configuration snowplow('startElementTracking', { elements: { selector: '.product-card', create: false, // won't fire when element added to DOM expose: true, // WILL fire when element scrolls into view obscure: false, // won't fire when element scrolls out of view destroy: false // won't fire when element removed from DOM } }); ``` **Browser (npm):** ```javascript import { startElementTracking } from '@snowplow/browser-plugin-element-tracking'; // This minimal example tracks expose events for all `.product-card` elements startElementTracking({ elements: { selector: '.product-card' } }); // It's equivalent to this more explicit configuration startElementTracking({ elements: { selector: '.product-card', create: false, // won't fire when element added to DOM expose: true, // WILL fire when element scrolls into view obscure: false, // won't fire when element scrolls out of view destroy: false // won't fire when element removed from DOM } }); ``` *** ### Example event This example shows how to track an `expose_element` event as users scroll through a web page. All event types are configured similarly. The example uses the `details` data selector option to specify what data to capture. **JavaScript (tag):** ```javascript snowplow('startElementTracking', { elements: { selector: "section", // Matches all <section> elements expose: { when: "element" }, // Fires when element becomes visible, once per element details: { child_text: { title: "h2" } } // Captures the main section header } }); ``` **Browser (npm):** ```javascript import { SnowplowElementTrackingPlugin, startElementTracking } from '@snowplow/browser-plugin-element-tracking'; startElementTracking({ elements: { selector: "section", // Matches all <section> elements expose: { when: "element" }, // Fires when element becomes visible, once per element details: { child_text: { title: "h2" } } // Captures the main section header } }); ``` *** In this example, the page has several sections. As a user scrolls down the page and each section becomes visible, an `expose_element` event is generated for each one. All events will have `"element_name": "section"`. Example `element` entity for the first section's `expose_element` event. The section title is "Why Data Teams Choose Snowplow": ```json { "schema": "iglu:com.snowplowanalytics.snowplow/element/jsonschema/1-0-0", "data": { "element_name": "section", "width": 1920, "height": 1111.7333984375, "position_x": 0, "position_y": 716.4500122070312, "doc_position_x": 0, "doc_position_y": 716.4500122070312, "element_index": 2, "element_matches": 10, "originating_page_view": "06dbb0a2-9acf-4ae4-9562-1469b6d12c5d", "attributes": [ { "source": "child_text", "attribute": "title", "value": "Why Data Teams Choose Snowplow" } ] } } ``` Example `element` entity for the second section's `expose_element` event. The section title is "How Does Snowplow Work?": ```json { "schema": "iglu:com.snowplowanalytics.snowplow/element/jsonschema/1-0-0", "data": { "element_name": "section", "width": 1920, "height": 2880, "position_x": 0, "position_y": 896.683349609375, "doc_position_x": 0, "doc_position_y": 1828.183349609375, "element_index": 3, "element_matches": 10, "originating_page_view": "06dbb0a2-9acf-4ae4-9562-1469b6d12c5d", "attributes": [ { "source": "child_text", "attribute": "title", "value": "How Does Snowplow Work?" } ] } } ``` ## Stop element tracking To turn off tracking, use `endElementTracking`. You can remove all configured rules, or selectively remove specific rules. 
**JavaScript (tag):** ```javascript // Remove all configured rules and listeners snowplow('endElementTracking'); // Removes based on `name` matching // Multiple rules may share a name snowplow('endElementTracking', { elements: ['name1', 'name2'] }); // Removes rules based on `id` matching // At most one rule can have the same `id` snowplow('endElementTracking', { elementIds: ['id1'] }); // More complicated matching // Rules where the `filter` function returns true will be removed snowplow('endElementTracking', { filter: (rule) => /recommendations/i.test(rule.name) }); // Passing an empty object removes no rules snowplow('endElementTracking', {}); ``` **Browser (npm):** ```javascript import { SnowplowElementTrackingPlugin, endElementTracking } from '@snowplow/browser-plugin-element-tracking'; // Remove all configured rules and listeners endElementTracking(); // Removes based on `name` matching // Multiple rules may share a name endElementTracking({ elements: ['name1', 'name2'] }); // Removes rules based on `id` matching // At most one rule can have the same `id` endElementTracking({ elementIds: ['id1'] }); // More complicated matching // Rules where the `filter` function returns true will be removed endElementTracking({ filter: (rule) => /recommendations/i.test(rule.name) }); // Passing an empty object removes no rules endElementTracking({}); ``` *** If you specify more than one of the `elementIds`, `elements`, and `filter` options, they get evaluated in that order. ## Configure entities You can configure additional element tracking or custom entities by modifying the `startElementTracking` call. 
Additional entities can be attached depending on configuration: - `element_statistics`: visibility and scroll depth statistics for the element - `element_content`: information about nested elements within the matched element - `component_parents`: the component hierarchy that the element belongs to - Custom entities Check out the [page element tracking overview](/docs/events/ootb-data/page-elements/#page-element-visibility-and-lifecycle) page to see the schema details. The configuration is per-rule, so different rules can have different settings. ### Element statistics Use the `includeStats` option to attach the `element_statistics` entity to specified events, including those not generated by this plugin. This example will add the `element_statistics` entity to `expose_element` and `page_ping` events: **JavaScript (tag):** ```javascript snowplow('startElementTracking', { elements: { selector: 'main.article', name: 'article_content', includeStats: ['expose_element', 'page_ping'] } }); ``` **Browser (npm):** ```javascript import { SnowplowElementTrackingPlugin, startElementTracking } from '@snowplow/browser-plugin-element-tracking'; startElementTracking({ elements: { selector: 'main.article', name: 'article_content', includeStats: ['expose_element', 'page_ping'] } }); ``` *** Adding element statistics to page pings can be useful to understand how a user moves through the content. It'll show scroll depth increasing over time, backtracking behavior, and total engagement duration. For [baked-in events](/docs/fundamentals/events/#baked-in-events), use the following names: - Page view: `page_view` - Page ping: `page_ping` - Structured: `event` Be cautious with the `selector`. If it matches a lot of elements, this can enlarge event payload sizes. ### Element content Add the `element_content` entity by setting `contents`. It captures data about specified nested elements within the matched parent element. 
In this example, the plugin will track an `expose_element` event when a `.product-grid` element scrolls into view. This event will have an `element` entity for the grid itself, and multiple `element_content` entities for each `.product-card` within the grid. **JavaScript (tag):** ```javascript snowplow('startElementTracking', { elements: { selector: '.product-grid', name: 'product_list', expose: { when: 'element' }, contents: [ { selector: '.product-card', name: 'product_item', details: [ { dataset: ['productId', 'price'] }, { child_text: { name: 'h3', brand: '.brand-name' } } ] } ] } }); ``` **Browser (npm):** ```javascript import { SnowplowElementTrackingPlugin, startElementTracking } from '@snowplow/browser-plugin-element-tracking'; startElementTracking({ elements: { selector: '.product-grid', name: 'product_list', expose: { when: 'element' }, contents: [ { selector: '.product-card', name: 'product_item', details: [ { dataset: ['productId', 'price'] }, { child_text: { name: 'h3', brand: '.brand-name' } } ] } ] } }); ``` *** The `details` configuration sets which element `attributes` to capture. **Example entities for this configuration** The `expose_element` event will have `"element_name": "product_list"`. 
One `element` entity: ```json { "schema": "iglu:com.snowplowanalytics.snowplow/element/jsonschema/1-0-0", "data": { "element_name": "product_list", "element_index": 1, "element_matches": 1, "width": 1200, "height": 400, "attributes": [] } } ``` Multiple `element_content` entities: ```json { "schema": "iglu:com.snowplowanalytics.snowplow/element_content/jsonschema/1-0-0", "data": { "element_name": "product_item", "parent_name": "product_list", // element_name of parent element "parent_position": 1, // element_index of parent element "position": 1, "attributes": [ { "source": "dataset", "attribute": "productId", "value": "SKU-001" }, { "source": "dataset", "attribute": "price", "value": "29.99" }, { "source": "child_text", "attribute": "name", "value": "Wireless Mouse" }, { "source": "child_text", "attribute": "brand", "value": "Logitech" } ] } } ``` ```json { "schema": "iglu:com.snowplowanalytics.snowplow/element_content/jsonschema/1-0-0", "data": { "element_name": "product_item", "parent_name": "product_list", // element_name of parent element "parent_position": 1, // element_index of parent element "position": 2, "attributes": [ { "source": "dataset", "attribute": "productId", "value": "SKU-002" }, { "source": "dataset", "attribute": "price", "value": "49.99" }, { "source": "child_text", "attribute": "name", "value": "Mechanical Keyboard" }, { "source": "child_text", "attribute": "brand", "value": "Keychron" } ] } } ``` ### Component parents You can mark elements as components to track hierarchy, using `component` rules. Events for child elements include a `component_parents` entity listing their ancestor components. This is useful when you have the same component appearing in multiple places on your site. Without component tracking, all those events look identical. 
**JavaScript (tag):** ```javascript snowplow('startElementTracking', { elements: [ // Define components (containers) { selector: 'header', // Mark the header as a component name: 'site_header', component: true, expose: false // Don't track expose events for the component itself }, { selector: 'footer', // Mark the footer as a component name: 'site_footer', component: true, expose: false // Don't track expose events for the component itself }, // Track elements - events will include component_parents { selector: '.newsletter-form', name: 'newsletter_signup', create: true, // Fire create_element events expose: { when: 'element' } // Fire expose_element events } ] }); ``` **Browser (npm):** ```javascript import { SnowplowElementTrackingPlugin, startElementTracking } from '@snowplow/browser-plugin-element-tracking'; startElementTracking({ elements: [ // Define components (containers) { selector: 'header', // Mark the header as a component name: 'site_header', component: true, expose: false // Don't track expose events for the component itself }, { selector: 'footer', // Mark the footer as a component name: 'site_footer', component: true, expose: false // Don't track expose events for the component itself }, // Track elements - events will include component_parents { selector: '.newsletter-form', name: 'newsletter_signup', create: true, // Fire create_element events expose: { when: 'element' } // Fire expose_element events } ] }); ``` *** For this example, imagine a page has two `.newsletter-form` elements: one in the page sidebar, and one in the footer. 
The `component_parents` entity for the sidebar form, which isn't within either of the defined component containers, could look like this: ```json { "schema": "iglu:com.snowplowanalytics.snowplow/component_parents/jsonschema/1-0-0", "data": { "element_name": "newsletter_signup", "component_list": [] } } ``` The `component_parents` entity for the footer form, which is within the `footer` component, could look like this: ```json { "schema": "iglu:com.snowplowanalytics.snowplow/component_parents/jsonschema/1-0-0", "data": { "element_name": "newsletter_signup", "component_list": ["site_footer"] } } ``` #### Generate entities for other events The plugin also exposes a `getComponentListGenerator` utility function for attaching component hierarchy information to custom events, or to events generated by other plugins like the [form](/docs/sources/web-trackers/tracking-events/form-tracking/) or [link](/docs/sources/web-trackers/tracking-events/link-click/) tracking plugins. This function returns two entity generator functions that determine component hierarchy for a given element: - `componentGenerator`: returns a single `component_parents` entity - `componentGeneratorWithDetail`: returns a `component_parents` entity plus an `element` entity **JavaScript (tag):** The JavaScript tracker uses a callback pattern to access the generators asynchronously: ```javascript // This snippet assumes you've already defined component rules in startElementTracking snowplow('getComponentListGenerator', function (componentGenerator, componentGeneratorWithDetail) { // attach the component_parents entity to events from these plugins snowplow('enableLinkClickTracking', { context: [componentGenerator] }); snowplow('enableFormTracking', { context: [componentGenerator] }); }); ``` **Browser (npm):** The Browser tracker returns the generators directly as an array: ```javascript // This snippet assumes you've already defined component rules in startElementTracking import { getComponentListGenerator } 
from '@snowplow/browser-plugin-element-tracking'; import { enableLinkClickTracking } from '@snowplow/browser-plugin-link-click-tracking'; import { enableFormTracking } from '@snowplow/browser-plugin-form-tracking'; const [componentGenerator, componentGeneratorWithDetail] = getComponentListGenerator(); // attach the component_parents entity to events from these plugins enableLinkClickTracking({ context: [componentGenerator] }); enableFormTracking({ context: [componentGenerator] }); ``` *** > **Note:** `componentGeneratorWithDetail` returns multiple entities and isn't directly compatible with the `context` arrays used by the link and form tracking plugins. ### Custom entities There are two ways you can add custom entities to element tracking events: - Plugin `context` option, applies to all rules - Per-rule `context` option, applies only to events from that specific rule **JavaScript (tag):** ```javascript // Configure at plugin level to apply to all rules snowplow('startElementTracking', { elements: [/* rules */], context: [/* entities attached to ALL events */] }); // Configure at rule level to apply to a specific rule snowplow('startElementTracking', { elements: { selector: '.promo-banner', name: 'promotion', context: [/* entities attached only to this rule's events */] } }); ``` **Browser (npm):** ```javascript // Configure at plugin level to apply to all rules startElementTracking({ elements: [/* rules */], context: [/* entities attached to ALL events */] }); // Configure at rule level to apply to a specific rule startElementTracking({ elements: { selector: '.promo-banner', name: 'promotion', context: [/* entities attached only to this rule's events */] } }); ``` *** You can configure static or dynamic entities: - Use static entities when the same data should be attached to every event, e.g. 
A/B test variant ```javascript context: [ { schema: 'iglu:com.example/campaign/jsonschema/1-0-0', data: { campaign_id: 'summer_2025', variant: 'A' } } ] ``` - Use callbacks to generate dynamic entities when the data depends on the specific element that triggered the event ```javascript context: [ (element, rule) => ({ schema: 'iglu:com.example/promotion/jsonschema/1-0-0', data: { promo_id: element.dataset.promoId, position: element.dataset.position, rule_name: rule.name } }) ] ``` ## Configure the plugin As well as configuring the `element_statistics`, `element_content`, and `component_parents` entities, you can customize how element visibility tracking works using the options below. The core options are explained in this table: | Property | Type | Description | Status | | ---------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- | | `selector` | `string` | A CSS selector string that matches one or more elements on the page that should trigger events from this rule. | **Required** | | `name` | `string` | A label to name this rule. Allows you to keep a stable name for events generated by this rule, even if the `selector` changes, so the data produced remains consistent. You can share a single `name` between many rules to have different configurations for different selectors. If not supplied, the `selector` value becomes the `name`. | _Recommended_ | | `id` | `string` | A specific identifier for this rule. Useful if you share a `name` between many rules and need to specifically remove individual rules within that group. | | You'll see `selector` and `name` in the examples on this page. 
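For instance, two rules can share a `name` so their events stay consistently labeled, while distinct `id`s let you remove either rule on its own later. A sketch using the JavaScript tag syntax; the selectors, names, and ids are illustrative:

```javascript
snowplow('startElementTracking', {
  elements: [
    // Both rules produce events named "promo_banner"...
    { selector: '.hero-promo', name: 'promo_banner', id: 'promo_hero' },
    { selector: '.sidebar-promo', name: 'promo_banner', id: 'promo_sidebar' },
  ]
});

// ...but each can be removed individually by id later
snowplow('endElementTracking', { elementIds: ['promo_sidebar'] });
```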
### Event frequency with `when` The `when` option controls how often events fire. The default is `always`. This example shows the options: **JavaScript (tag):** ```javascript // "always" - fires every time (e.g., every scroll in/out of view) // Boolean shorthand - expose: true snowplow('startElementTracking', { elements: { selector: '.ad-banner', name: 'ad_impression', expose: { when: 'always' } // fires each time banner scrolls into view } }); // "element" - fires once per matched element snowplow('startElementTracking', { elements: { selector: '.product-card', name: 'product_impression', expose: { when: 'element' } // fires once per card, even if user scrolls back } }); // "pageview" - fires once per element, resets on new page view (useful for SPAs) snowplow('startElementTracking', { elements: { selector: '.hero-section', name: 'hero_viewed', expose: { when: 'pageview' } // resets when tracker fires next page_view event } }); // "once" - fires exactly once for the entire rule, regardless of how many elements match snowplow('startElementTracking', { elements: { selector: '.newsletter-form', name: 'newsletter_form_exists', expose: { when: 'once' } // fires once even if multiple forms exist } }); // "never": never track this event for this rule // This is useful for defining components // Boolean shorthand - expose: false snowplow('startElementTracking', { elements: { selector: 'section', expose: { when: 'never' } // never fires } }); ``` **Browser (npm):** ```javascript import { startElementTracking } from '@snowplow/browser-plugin-element-tracking'; // "always" - fires every time (e.g., every scroll in/out of view) // Boolean shorthand - expose: true startElementTracking({ elements: { selector: '.ad-banner', name: 'ad_impression', expose: { when: 'always' } // fires each time banner scrolls into view } }); // "element" - fires once per matched element startElementTracking({ elements: { selector: '.product-card', name: 'product_impression', expose: { when: 
'element' } // fires once per card, even if user scrolls back } }); // "pageview" - fires once per element, resets on new page view (useful for SPAs) startElementTracking({ elements: { selector: '.hero-section', name: 'hero_viewed', expose: { when: 'pageview' } // resets when tracker fires next page_view event } }); // "once" - fires exactly once for the entire rule, regardless of how many elements match startElementTracking({ elements: { selector: '.newsletter-form', name: 'newsletter_form_exists', expose: { when: 'once' } // fires once even if multiple forms exist } }); // "never": never track this event for this rule // This is useful for defining components // Boolean shorthand - expose: false startElementTracking({ elements: { selector: 'section', expose: { when: 'never' } // never fires } }); ``` *** If you're using `when: pageview`, ensure that the tracker is firing page view events appropriately for your needs, especially if it's a single page application (SPA). The plugin assumes that you'll call `startElementTracking()` before `trackPageView()`. The first page view doesn't reset the element visibility state, because the plugin sets `ignoreNextPageView: true` by default internally. If your site tracks page views before calling `startElementTracking()`, you can disable this behavior by passing `ignoreNextPageView: false` in the plugin options when adding it to the tracker. **JavaScript (tag):** ```javascript window.snowplow('addPlugin', "https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-element-tracking@latest/dist/index.umd.min.js", ["snowplowElementTracking", "SnowplowElementTrackingPlugin"], [{ ignoreNextPageView: false }] ); snowplow('startElementTracking', { elements: [/* configuration */] }); ``` **Browser (npm):** First, add the plugin when initializing the tracker. 
```javascript import { newTracker } from '@snowplow/browser-tracker'; import { SnowplowElementTrackingPlugin, startElementTracking } from '@snowplow/browser-plugin-element-tracking'; newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ SnowplowElementTrackingPlugin({ ignoreNextPageView: false }) ], }); startElementTracking({ elements: [/* configuration */] }); ``` *** ### Visibility thresholds for `expose` Control what counts as "visible" for `expose_element` events: **JavaScript (tag):** ```javascript snowplow('startElementTracking', { elements: { selector: '.video-player', name: 'video_impression', expose: { when: 'element', minPercentage: 0.5, // at least 50% of element must be visible minTimeMillis: 2000, // must be visible for 2 seconds cumulative minSize: 10000, // element must be at least 10,000px² (e.g., 100x100) boundaryPixels: 50 // adds 50px padding when calculating visibility } } }); // boundaryPixels accepts different formats: expose: { when: 'element', boundaryPixels: 20 // 20px all sides // or: [10, 20] // 10px vertical, 20px horizontal // or: [10, 20, 30, 40] // top, right, bottom, left } ``` **Browser (npm):** ```javascript import { startElementTracking } from '@snowplow/browser-plugin-element-tracking'; startElementTracking({ elements: { selector: '.video-player', name: 'video_impression', expose: { when: 'element', minPercentage: 0.5, // at least 50% of element must be visible minTimeMillis: 2000, // must be visible for 2 seconds cumulative minSize: 10000, // element must be at least 10,000px² (e.g., 100x100) boundaryPixels: 50 // adds 50px padding when calculating visibility } } }); // boundaryPixels accepts different formats: expose: { when: 'element', boundaryPixels: 20 // 20px all sides // or: [10, 20] // 10px vertical, 20px horizontal // or: [10, 20, 30, 40] // top, right, bottom, left } ``` *** ### Data selectors using `details` The plugin uses data selectors when deciding if an element should trigger an event using 
`condition`, or when building the `element` entity's `attributes` property. **JavaScript (tag):** ```javascript snowplow('startElementTracking', { elements: { selector: '.product-card', name: 'product', expose: { when: 'element' }, details: [ // HTML attributes (from getAttribute) { attributes: ['id', 'data-category'] }, // Element properties (may differ from HTML attributes) { properties: ['className', 'tagName'] }, // Dataset values (data-* attributes, camelCase) //
{ dataset: ['productId', 'price'] }, // Text content from child elements { child_text: { name: 'h3', // text from first h3 element
brand: '.brand-name' // text from first .brand-name }}, // Regex extraction from element's textContent { content: { sku: /SKU-(\d+)/ // captures first group }}, // Include the selector that matched { selector: true }, // Validate collected attributes - discards results if no match // Useful for filtering in `condition` { match: { category: 'electronics', // exact value match price: (val) => parseFloat(val) > 0 // function match }}, // Custom callback function (element) => ({ isOnSale: element.classList.contains('on-sale') ? 'true' : 'false', position: element.dataset.position }) ] } }); ``` **Browser (npm):** ```javascript import { startElementTracking } from '@snowplow/browser-plugin-element-tracking'; startElementTracking({ elements: { selector: '.product-card', name: 'product', expose: { when: 'element' }, details: [ // HTML attributes (from getAttribute) { attributes: ['id', 'data-category'] }, // Element properties (may differ from HTML attributes) { properties: ['className', 'tagName'] }, // Dataset values (data-* attributes, camelCase) //
{ dataset: ['productId', 'price'] }, // Text content from child elements { child_text: { name: 'h3', // text from first h3 element
brand: '.brand-name' // text from first .brand-name }}, // Regex extraction from element's textContent { content: { sku: /SKU-(\d+)/ // captures first group }}, // Include the selector that matched { selector: true }, // Validate collected attributes - discards results if no match // Useful for filtering in `condition` { match: { category: 'electronics', // exact value match price: (val) => parseFloat(val) > 0 // function match }}, // Custom callback function (element) => ({ isOnSale: element.classList.contains('on-sale') ? 'true' : 'false', position: element.dataset.position }) ] } }); ``` *** ### Conditional event firing with `condition` Only fire events when elements match certain criteria. Use data selectors to define the conditions: **JavaScript (tag):** ```javascript // Example: only track visible notifications snowplow('startElementTracking', { elements: { selector: '.notification', name: 'notification_shown', create: { when: 'element', condition: [ // Only fire if notification has data-visible="true" { dataset: ['visible'] }, { match: { visible: 'true' } } ] } } }); // Example: only track products that are in stock snowplow('startElementTracking', { elements: { selector: '.product-card', name: 'in_stock_product', expose: { when: 'element', condition: [ { dataset: ['stockStatus'] }, { match: { stockStatus: (val) => val !== 'out-of-stock' } } ] } } }); ``` **Browser (npm):** ```javascript import { startElementTracking } from '@snowplow/browser-plugin-element-tracking'; // Example: only track visible notifications startElementTracking({ elements: { selector: '.notification', name: 'notification_shown', create: { when: 'element', condition: [ // Only fire if notification has data-visible="true" { dataset: ['visible'] }, { match: { visible: 'true' } } ] } } }); // Example: only track products that are in stock startElementTracking({ elements: { selector: '.product-card', name: 'in_stock_product', expose: { when: 'element', condition: [ { dataset: ['stockStatus'] 
}, { match: { stockStatus: (val) => val !== 'out-of-stock' } } ] } } }); ``` *** ### Shadow DOM tracking If the elements you want to track exist within [shadow DOM](https://developer.mozilla.org/en-US/docs/Web/API/Web_components/Using_shadow_DOM) trees, the plugin might not identify them automatically. Use these settings to notify the plugin that it should descend into shadow hosts to identify elements to match the rule against. By default, the plugin matches specified elements both outside and inside `shadowSelector` shadow hosts. Set `shadowOnly` to `true` to only match elements within those shadow hosts. **JavaScript (tag):** ```javascript snowplow('startElementTracking', { elements: { selector: 'button.submit', name: 'submit_button', shadowSelector: 'my-custom-form', // CSS selector for elements that are shadow hosts containing the targeted elements shadowOnly: true, // only match elements inside shadow DOM, not elsewhere expose: { when: 'element' } } }); ``` **Browser (npm):** ```javascript import { startElementTracking } from '@snowplow/browser-plugin-element-tracking'; startElementTracking({ elements: { selector: 'button.submit', name: 'submit_button', shadowSelector: 'my-custom-form', // CSS selector for elements that are shadow hosts containing the targeted elements shadowOnly: true, // only match elements inside shadow DOM, not elsewhere expose: { when: 'element' } } }); ``` *** ### Send to specific trackers If you have multiple trackers loaded on the same page, you can specify which trackers should receive events using the `tracker` option. Provide a list of tracker namespaces. If omitted, events go to all trackers the plugin has been activated for. 
**JavaScript (tag):** ```javascript snowplow('startElementTracking', { elements: { selector: '.promo-banner' } }, ['tracker1', 'tracker2']); ``` **Browser (npm):** ```javascript import { startElementTracking } from '@snowplow/browser-plugin-element-tracking'; startElementTracking({ elements: { selector: '.promo-banner' } }, ['tracker1', 'tracker2']); ``` *** ## Further examples These examples are based on [a snapshot](https://web.archive.org/web/20250422013533/https://snowplow.io/) of the [Snowplow website](https://snowplow.io/). ### Content depth The blog posts have longer-form content. Snowplow's page ping events track scroll depth by pixels, but those measurements become inconsistent between devices and pages. To see how much content gets consumed, you can generate stats based on the paragraphs in the content. You can also get periodic stats based on the entire article in page pings. **JavaScript (tag):** ```javascript snowplow('startElementTracking', { elements: [ { selector: ".blogs_blog-post-body_content", name: "blog content", expose: false, includeStats: ["page_ping"] }, { selector: ".blogs_blog-post-body_content p", name: "blog paragraphs" } ] }); ``` **Browser (npm):** ```javascript import { startElementTracking } from '@snowplow/browser-plugin-element-tracking'; startElementTracking({ elements: [ { selector: ".blogs_blog-post-body_content", name: "blog content", expose: false, includeStats: ["page_ping"] }, { selector: ".blogs_blog-post-body_content p", name: "blog paragraphs" } ] }); ``` *** Because the expose event contains the `element_index` and `element_matches`, you can easily query the largest `element_index` by page view ID. The result tells you consumption statistics for individual views of each article. You can then summarize that metric at the content or category level, or convert it to a percentage by comparing with `element_matches`.
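As a minimal sketch of that roll-up, assuming you've already extracted the `element` entity fields (like the example payload that follows) into plain objects — `contentDepthByView` is a hypothetical helper, not part of the plugin:

```javascript
// Hypothetical roll-up: for each page view, find the deepest paragraph
// exposed (largest element_index) and convert it to a consumption ratio
// using element_matches. Input shape mirrors the element entity fields.
function contentDepthByView(elementEntities) {
  const deepest = {};
  for (const e of elementEntities) {
    const view = e.originating_page_view;
    // keep only the largest paragraph index seen for each page view
    if (!deepest[view] || e.element_index > deepest[view].element_index) {
      deepest[view] = e;
    }
  }
  const ratios = {};
  for (const [view, e] of Object.entries(deepest)) {
    // fraction of the article's paragraphs the reader reached
    ratios[view] = e.element_index / e.element_matches;
  }
  return ratios;
}
```

For example, a view whose deepest entity has `element_index: 6` of `element_matches: 24` yields a ratio of 0.25. In practice you'd run the equivalent aggregation in your warehouse rather than in the browser.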
```json { "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0", "data": [ { "schema": "iglu:com.snowplowanalytics.snowplow/element/jsonschema/1-0-0", "data": { "element_name": "blog paragraphs", "width": 800, "height": 48, "position_x": 320, "position_y": 533.25, "doc_position_x": 320, "doc_position_y": 1373, "element_index": 6, "element_matches": 24, "originating_page_view": "f390bec5-f63c-48af-b3ad-a03f0511af7f", "attributes": [] } }, { "schema": "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0", "data": { "id": "f390bec5-f63c-48af-b3ad-a03f0511af7f" } } ] } ``` The periodic page ping events also give you a summary of the total progress in the `max_y_depth_ratio`/`max_y_depth` values. With `y_depth_ratio` you can also see when users backtrack up the page. ```json { "schema": "iglu:com.snowplowanalytics.snowplow/element_statistics/jsonschema/1-0-0", "data": { "element_name": "blog content", "element_index": 1, "element_matches": 1, "current_state": "unknown", "min_size": "800x3928", "current_size": "800x3928", "max_size": "800x3928", "y_depth_ratio": 0.20302953156822812, "max_y_depth_ratio": 0.4931262729124236, "max_y_depth": "1937/3928", "element_age_ms": 298379, "times_in_view": 0, "total_time_visible_ms": 0 } } ``` ### Simple funnels A newsletter sign-up form exists at the bottom of the page. Performance measurement becomes difficult because many visitors don't even see it. To test this you first need to know: - When the form exists on a page - When the form is actually seen - When people actually interact with the form - When the form is finally submitted The form tracking plugin can only do the last two steps, but the element tracker gives you the earlier ones. If you end up adding more forms in the future, you'll want to know which is which, so mark the footer as a component, which will let you split it out later.
**JavaScript (tag):** ```javascript snowplow('startElementTracking', { elements: [ { selector: ".hbspt-form", name: "newsletter signup", create: true, }, { selector: "footer", component: true, expose: false } ] }); ``` **Browser (npm):** ```javascript import { startElementTracking } from '@snowplow/browser-plugin-element-tracking'; startElementTracking({ elements: [ { selector: ".hbspt-form", name: "newsletter signup", create: true, }, { selector: "footer", component: true, expose: false } ] }); ``` *** If you try this on a blog page, you actually get two `create_element` events. Blog posts have a second newsletter sign-up in a sidebar next to the content. Because only the second form is a member of the `footer` component, you can easily see which one you are trying to measure when you query the data later. ```json { "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0", "data": [ { "schema": "iglu:com.snowplowanalytics.snowplow/element/jsonschema/1-0-0", "data": { "element_name": "newsletter signup", "width": 336, "height": 161, "position_x": 1232, "position_y": 238.88333129882812, "doc_position_x": 1232, "doc_position_y": 3677.883331298828, "element_index": 1, "element_matches": 2, "originating_page_view": "02e30714-a84a-42f8-8b07-df106d669db0", "attributes": [] } }, { "schema": "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0", "data": { "id": "02e30714-a84a-42f8-8b07-df106d669db0" } } ] } ``` ```json { "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0", "data": [ { "schema": "iglu:com.snowplowanalytics.snowplow/element/jsonschema/1-0-0", "data": { "element_name": "newsletter signup", "width": 560, "height": 137, "position_x": 320, "position_y": 1953.5, "doc_position_x": 320, "doc_position_y": 5392.5, "element_index": 2, "element_matches": 2, "originating_page_view": "02e30714-a84a-42f8-8b07-df106d669db0", "attributes": [] } }, { "schema": 
"iglu:com.snowplowanalytics.snowplow/component_parents/jsonschema/1-0-0", "data": { "element_name": "newsletter signup", "component_list": [ "footer" ] } }, { "schema": "iglu:com.snowplowanalytics.snowplow/element/jsonschema/1-0-0", "data": { "element_name": "footer", "width": 1920, "height": 1071.5, "position_x": 0, "position_y": 1212, "doc_position_x": 0, "doc_position_y": 4651, "originating_page_view": "", "attributes": [] } }, { "schema": "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0", "data": { "id": "02e30714-a84a-42f8-8b07-df106d669db0" } } ] } ``` ### Recommendations performance The homepage contains a section for the "Latest Blogs from Snowplow." This could represent recommendations or some other form of personalization. If it did, one might want to optimize it. Link tracking could tell you when a recommendation worked and a visitor clicked it, but how would you identify the recommendations that aren't encouraging clicks? If you track when the widget becomes visible and include the items that got recommended, you could correlate that with the clicks to measure performance. For fairer measurement of visibility, you can configure it so that visibility only counts if at least 50% of the widget is in view, and it has to be on screen for at least 1.5 seconds. You'll also collect the post title and author information.
**JavaScript (tag):** ```javascript snowplow('startElementTracking', { elements: [ { selector: ".blog_list-header_list-wrapper", name: "recommended_posts", create: true, expose: { when: "element", minTimeMillis: 1500, minPercentage: 0.5 }, contents: [ { selector: ".collection-item", name: "recommended_item", details: { child_text: { title: "h3", author: ".blog_list-header_author-text > p" } } } ] } ] }); ``` **Browser (npm):** ```javascript import { startElementTracking } from '@snowplow/browser-plugin-element-tracking'; startElementTracking({ elements: [ { selector: ".blog_list-header_list-wrapper", name: "recommended_posts", create: true, expose: { when: "element", minTimeMillis: 1500, minPercentage: 0.5 }, contents: [ { selector: ".collection-item", name: "recommended_item", details: { child_text: { title: "h3", author: ".blog_list-header_author-text > p" } } } ] } ] }); ``` *** Scroll down to the widget and you'll see the items that get served to the visitor: ```json { "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0", "data": [ { "schema": "iglu:com.snowplowanalytics.snowplow/element/jsonschema/1-0-0", "data": { "element_name": "recommended_posts", "width": 1280, "height": 680.7666625976562, "position_x": 320, "position_y": 437.70001220703125, "doc_position_x": 320, "doc_position_y": 6261.066711425781, "element_index": 1, "element_matches": 1, "originating_page_view": "034db1d6-1d60-42ca-8fe1-9aafc0442a22", "attributes": [] } }, { "schema": "iglu:com.snowplowanalytics.snowplow/element_content/jsonschema/1-0-0", "data": { "element_name": "recommended_item", "parent_name": "recommended_posts", "parent_position": 1, "position": 1, "attributes": [ { "source": "child_text", "attribute": "title", "value": "Data Pipeline Architecture Patterns for AI: Choosing the Right Approach" }, { "source": "child_text", "attribute": "author", "value": "Matus Tomlein" } ] } }, { "schema":
"iglu:com.snowplowanalytics.snowplow/element_content/jsonschema/1-0-0", "data": { "element_name": "recommended_item", "parent_name": "recommended_posts", "parent_position": 1, "position": 2, "attributes": [ { "source": "child_text", "attribute": "title", "value": "Data Pipeline Architecture For AI: Why Traditional Approaches Fall Short" }, { "source": "child_text", "attribute": "author", "value": "Matus Tomlein" } ] } }, { "schema": "iglu:com.snowplowanalytics.snowplow/element_content/jsonschema/1-0-0", "data": { "element_name": "recommended_item", "parent_name": "recommended_posts", "parent_position": 1, "position": 3, "attributes": [ { "source": "child_text", "attribute": "title", "value": "Agentic AI Applications: How They Will Turn the Web Upside Down" }, { "source": "child_text", "attribute": "author", "value": "Yali Sassoon" } ] } }, { "schema": "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0", "data": { "id": "034db1d6-1d60-42ca-8fe1-9aafc0442a22" } } ] } ``` --- # Track errors on web > Track handled and unhandled JavaScript exceptions with manual error tracking and automatic error tracking capabilities. > Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/errors/ The Errors tracker plugin provides two ways of tracking exceptions: manual tracking of handled exceptions using `trackError` and automatic tracking of unhandled exceptions using `enableErrorTracking`. Error events can be **manually tracked** and/or **automatically tracked**.
## Install plugin **JavaScript (tag):** | Tracker Distribution | Included | | -------------------- | -------- | | `sp.js` | ✅ | | `sp.lite.js` | ❌ | **Download:** | | | | ------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- | | Download from GitHub Releases (Recommended) | [Github Releases (plugins.umd.zip)](https://github.com/snowplow/snowplow-javascript-tracker/releases) | | Available on jsDelivr | [jsDelivr](https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-error-tracking@latest/dist/index.umd.min.js) (latest) | | Available on unpkg | [unpkg](https://unpkg.com/@snowplow/browser-plugin-error-tracking@latest/dist/index.umd.min.js) (latest) | **Note:** The links to the CDNs above point to the current latest version. You should pin to a specific version when integrating this plugin on your website if you are using a third party CDN in production. ```javascript window.snowplow('addPlugin', "https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-error-tracking@latest/dist/index.umd.min.js", ["snowplowErrorTracking", "ErrorTrackingPlugin"] ); ``` **Browser (npm):** - `npm install @snowplow/browser-plugin-error-tracking` - `yarn add @snowplow/browser-plugin-error-tracking` - `pnpm add @snowplow/browser-plugin-error-tracking` ```javascript import { newTracker } from '@snowplow/browser-tracker'; import { ErrorTrackingPlugin, enableErrorTracking } from '@snowplow/browser-plugin-error-tracking'; newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ ErrorTrackingPlugin() ], }); enableErrorTracking(); ``` *** ## Manual error tracking Use the `trackError` method to track handled exceptions (application errors) in your JS code. 
This is its signature: **JavaScript (tag):** ```javascript snowplow('trackError', { /** The error message */ message: string; /** The filename where the error occurred */ filename?: string; /** The line number which the error occurred on */ lineno?: number; /** The column number which the error occurred on */ colno?: number; /** The error object */ error?: Error; }); ``` **Browser (npm):** ```javascript trackError({ /** The error message */ message: string; /** The filename where the error occurred */ filename?: string; /** The line number which the error occurred on */ lineno?: number; /** The column number which the error occurred on */ colno?: number; /** The error object */ error?: Error; }); ``` *** | **Name** | **Required?** | **Description** | **Type** | | ---------- | ------------- | ----------------------------------- | ---------- | | `message` | Yes | Error message | string | | `filename` | No | Filename or URL | string | | `lineno` | No | Line number of problem code chunk | number | | `colno` | No | Column number of problem code chunk | number | | `error` | No | JS `ErrorEvent` | ErrorEvent | Of these arguments, only `message` is required. The signature of this method is defined to match the `window.onerror` callback in modern browsers. **JavaScript (tag):** ```javascript try { var user = getUser() } catch(e) { snowplow('trackError', { message: 'Cannot get user object', filename: 'shop.js', error: e }); } ``` **Browser (npm):** ```javascript try { var user = getUser() } catch(e) { trackError({ message: 'Cannot get user object', filename: 'shop.js', error: e }); } ``` *** Using `trackError` assumes that the developer knows where errors could happen, which is often not the case. Therefore, it's recommended to use `enableErrorTracking`, as it allows you to discover errors that weren't expected. ## Automatic error tracking Use the `enableErrorTracking` method to track unhandled exceptions (application errors) in your JS code.
This is its signature: **JavaScript (tag):** ```javascript snowplow('enableErrorTracking', { /** A callback which allows only certain errors to be tracked */ filter?: (error: ErrorEvent) => boolean; /** A callback to dynamically add extra context based on the error */ contextAdder?: (error: ErrorEvent) => Array; /** Context to be added to every error */ context?: Array; }); ``` **Browser (npm):** ```javascript enableErrorTracking({ /** A callback which allows only certain errors to be tracked */ filter?: (error: ErrorEvent) => boolean; /** A callback to dynamically add extra context based on the error */ contextAdder?: (error: ErrorEvent) => Array; /** Context to be added to every error */ context?: Array; }); ``` *** | **Name** | **Required?** | **Description** | **Type** | | -------------- | ------------- | ------------------------------- | ------------------------------------------- | | `filter` | No | Predicate to filter exceptions | `(ErrorEvent) => Boolean` | | `contextAdder` | No | Function to get dynamic context | `(ErrorEvent) => Array` | | `context` | No | Additional custom context | `Array` | Unlike `trackError`, you only need to enable error tracking once: **JavaScript (tag):** ```javascript snowplow('enableErrorTracking') ``` **Browser (npm):** ```javascript enableErrorTracking(); ``` *** Application error events are implemented as Snowplow self-describing events. [Here](https://raw.githubusercontent.com/snowplow/iglu-central/master/schemas/com.snowplowanalytics.snowplow/application_error/jsonschema/1-0-1) is the schema for an `application_error` event.
> Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/event-specifications/ This plugin allows you to integrate with [Media Web](/docs/event-studio/tracking-plans/templates/#media-web) event specifications. The plugin will add an event specification entity to the matching [Snowplow media](/docs/events/ootb-data/media-events/) events. Retrieve the configuration directly from your [tracking plan](https://docs.snowplow.io/docs/fundamentals/tracking-plans/) in [Snowplow Console](https://console.snowplowanalytics.com). > **Note:** The plugin is available since version 3.23 of the tracker. It's only available for tracking plans created using the [Media Web template](/docs/event-studio/tracking-plans/templates/#media-web). The event specification entity is **automatically tracked** once configured. ## Install plugin **JavaScript (tag):** | Tracker Distribution | Included | | -------------------- | -------- | | `sp.js` | ❌ | | `sp.lite.js` | ❌ | **Download:** | | | | ------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- | | Download from GitHub Releases (Recommended) | [Github Releases (plugins.umd.zip)](https://github.com/snowplow/snowplow-javascript-tracker/releases) | | Available on jsDelivr | [jsDelivr](https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-event-specifications@latest/dist/index.umd.min.js) (latest) | | Available on unpkg | [unpkg](https://unpkg.com/@snowplow/browser-plugin-event-specifications@latest/dist/index.umd.min.js) (latest) | ```javascript window.snowplow( 'addPlugin', 'https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-event-specifications@latest/dist/index.umd.min.js', ['eventSpecifications', 'EventSpecificationsPlugin'] ); ``` **Browser (npm):** - `npm install @snowplow/browser-plugin-event-specifications` - `yarn add @snowplow/browser-plugin-event-specifications` - `pnpm add 
@snowplow/browser-plugin-event-specifications` ```javascript import { newTracker } from '@snowplow/browser-tracker'; import { EventSpecificationsPlugin } from '@snowplow/browser-plugin-event-specifications'; newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ EventSpecificationsPlugin(/* plugin configuration */) ], }); ``` *** ## Configuration You can retrieve the configuration for your event specifications directly from your tracking plan after clicking on the `Implement tracking` button. ![implement tracking button](/assets/images/implement_tracking-237d544d543211d13699e36aac03fc1c.png) Configure the plugin by mapping each tracked event to the event specification ID from your tracking plan. For example: **JavaScript (tag):** ```javascript // Initialize tracker window.snowplow('newTracker', 'sp1', '{{collector_url}}', { appId: 'my-app-id' }); // Add the Media plugin window.snowplow( 'addPlugin', 'https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-media@latest/dist/index.umd.min.js', ['snowplowMedia', 'SnowplowMediaPlugin'] ); // Add the Event Specifications plugin with configuration window.snowplow( 'addPlugin', 'https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-event-specifications@latest/dist/index.umd.min.js', ['eventSpecifications', 'EventSpecificationsPlugin'], [ { SnowplowMediaPlugin: { // Map event names to your event specification IDs from Console "ad_break_end_event": "abb057a9-eb05-41b8-8d13-0a020f5f9960", "ad_break_start_event": "896fc117-aad9-4ef1-ab52-13fcf9156a08", "ad_click_event": "df27c1dd-b7e6-4d4a-8ba1-613d859594c4", "ad_complete_event": "a382f36b-39ed-46b4-9e6f-ac9bd1d65360", "ad_pause_event": "6bd62180-37ab-4a9c-9aa4-580aa39d7888", "ad_quartile_event": "7d946906-80eb-4ca0-bf7d-4a0f04ae3598", "ad_resume_event": "d5ae264a-3983-478b-b9d5-bcf46c66cab1", "ad_skip_event": "5f79e53b-9318-4644-b4e8-8bf7804c244b", "ad_start_event": "442a8e75-4884-434e-8e1d-d80bc35c4157", "buffer_end_event": 
"8951ce07-b497-45b0-81c3-49962d36fa6a", "buffer_start_event": "c53f4ea7-e7c3-44d9-97a8-1edf5a61d898", "error_event": "594bd013-8e6b-4ec4-9828-e5e609b4297c", "fullscreen_change_event": "3780c3e5-17ed-4e39-b22c-c0568c486bf3", "ping_event": "a4870ad5-e028-42ae-bfca-603a3d6837f1", "pause_event": "bf90af15-840d-4a76-a7f0-ccc8865a9c5c", "percent_progress_event": "6684eea3-82e6-4c2e-98db-ab0be61fdf0d", "picture_in_picture_change_event": "2e8be82e-11fb-4aa3-a5a2-7f49efc29abb", "end_event": "aaac78f1-8ee4-42a6-8e3c-46f660c32709", "play_event": "1094455c-4e99-4e1f-8445-f4fb12b4eccc", "quality_change_event": "bdf91319-c5f0-476f-922f-9215b76186af", "ready_event": "c1f9f850-dc68-47d8-9fd1-0db10328858c", "seek_end_event": "d748d09f-a361-4620-8651-f883b1502a23", "seek_start_event": "5fb03fae-3f0f-4538-908b-a55a6f7e69cb", "volume_change_event": "972836c4-73b8-45c3-abcf-22e3bd7eae6c" } } ] ); ``` **Browser (npm):** ```javascript import { newTracker } from '@snowplow/browser-tracker'; import { SnowplowMediaPlugin, enableMediaTracking } from '@snowplow/browser-plugin-media'; import { EventSpecificationsPlugin } from '@snowplow/browser-plugin-event-specifications'; // Initialize tracker with both plugins newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ SnowplowMediaPlugin(), EventSpecificationsPlugin({ SnowplowMediaPlugin: { // Map event names to your event specification IDs from Console "ad_break_end_event": "abb057a9-eb05-41b8-8d13-0a020f5f9960", "ad_break_start_event": "896fc117-aad9-4ef1-ab52-13fcf9156a08", "ad_click_event": "df27c1dd-b7e6-4d4a-8ba1-613d859594c4", "ad_complete_event": "a382f36b-39ed-46b4-9e6f-ac9bd1d65360", "ad_pause_event": "6bd62180-37ab-4a9c-9aa4-580aa39d7888", "ad_quartile_event": "7d946906-80eb-4ca0-bf7d-4a0f04ae3598", "ad_resume_event": "d5ae264a-3983-478b-b9d5-bcf46c66cab1", "ad_skip_event": "5f79e53b-9318-4644-b4e8-8bf7804c244b", "ad_start_event": "442a8e75-4884-434e-8e1d-d80bc35c4157", "buffer_end_event": 
"8951ce07-b497-45b0-81c3-49962d36fa6a", "buffer_start_event": "c53f4ea7-e7c3-44d9-97a8-1edf5a61d898", "error_event": "594bd013-8e6b-4ec4-9828-e5e609b4297c", "fullscreen_change_event": "3780c3e5-17ed-4e39-b22c-c0568c486bf3", "ping_event": "a4870ad5-e028-42ae-bfca-603a3d6837f1", "pause_event": "bf90af15-840d-4a76-a7f0-ccc8865a9c5c", "percent_progress_event": "6684eea3-82e6-4c2e-98db-ab0be61fdf0d", "picture_in_picture_change_event": "2e8be82e-11fb-4aa3-a5a2-7f49efc29abb", "end_event": "aaac78f1-8ee4-42a6-8e3c-46f660c32709", "play_event": "1094455c-4e99-4e1f-8445-f4fb12b4eccc", "quality_change_event": "bdf91319-c5f0-476f-922f-9215b76186af", "ready_event": "c1f9f850-dc68-47d8-9fd1-0db10328858c", "seek_end_event": "d748d09f-a361-4620-8651-f883b1502a23", "seek_start_event": "5fb03fae-3f0f-4538-908b-a55a6f7e69cb", "volume_change_event": "972836c4-73b8-45c3-abcf-22e3bd7eae6c" } }) ] }); ``` *** ## Event specification entity When an event is tracked that matches one of the configured event names, the plugin will automatically add an event specification entity to it. ### `event_specification` **Type:** Entity Entity schema for referencing an event specification **Schema:** `iglu:com.snowplowanalytics.snowplow/event_specification/jsonschema/1-0-0` **Example:** ```json { "id": "abb057a9-eb05-41b8-8d13-0a020f5f9960" } ``` **Properties:** | Property | Description | | ------------- | ---------------------------------------------------------------------------- | | `id` _string_ | _Required._ Identifier for the event specification that the event adheres to | --- # Track Kantar Focal Meter events on web > Integrate with Kantar Focal Meter router meters to measure content audience by sending domain user IDs to Focal Meter endpoints. > Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/focalmeter/ This plugin provides integration with [Focal Meter by Kantar](https://www.virtualmeter.co.uk/focalmeter). 
Focal Meter is a box that connects directly to the broadband router and collects viewing information for the devices on your network. This integration enables measuring the audience of content through the Focal Meter router meter. The plugin has the ability to send the [domain user ID](/docs/fundamentals/canonical-event/#user-fields) to a [Kantar Focal Meter](https://www.virtualmeter.co.uk/focalmeter) endpoint. A request is made when the first event with a new user ID is tracked. The plugin inspects the domain user ID property in tracked events. Whenever it changes from the previously recorded value, it makes an HTTP GET request to the `kantarEndpoint` URL with the ID as a query parameter. Optionally, the tracker may store the last published domain user ID value in local storage in order to prevent it from making the same request on the next page load. If local storage is not used, the request is made on each page load. > **Note:** The plugin is available since version 3.16 of the tracker. The Focal Meter integration is **automatic** once configured. 
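As an illustrative sketch of that change-detection logic (this is not the plugin's implementation; `maybeBuildKantarRequest` and its in-memory state are hypothetical, and the query parameters follow the request format documented below):

```javascript
// Illustrative sketch, NOT the plugin source: report the domain user ID
// only when it differs from the last value reported, and return the GET
// URL that would be requested (or null when the request is skipped).
let lastReportedUserId = null; // the real plugin can persist this in local storage

function maybeBuildKantarRequest(kantarEndpoint, domainUserId) {
  if (domainUserId === lastReportedUserId) {
    return null; // same ID as before: no request needed
  }
  lastReportedUserId = domainUserId;
  const params = new URLSearchParams({
    vendor: 'snowplow',
    cs_fpid: domainUserId, // the domain user ID as a query parameter
    c12: 'not_set',
  });
  return `${kantarEndpoint}?${params.toString()}`;
}
```

With `useLocalStorage: false`, the in-memory value above is lost on navigation, which is why the request repeats on each page load.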
## Install plugin **JavaScript (tag):** | Tracker Distribution | Included | | -------------------- | -------- | | `sp.js` | ❌ | | `sp.lite.js` | ❌ | **Download:** | | | | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | | Download from GitHub Releases (Recommended) | [Github Releases (plugins.umd.zip)](https://github.com/snowplow/snowplow-javascript-tracker/releases) | | Available on jsDelivr | [jsDelivr](https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-focalmeter@latest/dist/index.umd.min.js) (latest) | | Available on unpkg | [unpkg](https://unpkg.com/@snowplow/browser-plugin-focalmeter@latest/dist/index.umd.min.js) (latest) | **Browser (npm):** - `npm install @snowplow/browser-plugin-focalmeter` - `yarn add @snowplow/browser-plugin-focalmeter` - `pnpm add @snowplow/browser-plugin-focalmeter` *** ## Enable integration **JavaScript (tag):** To integrate with the Kantar FocalMeter, use the snippet below after [setting up your tracker](/docs/sources/web-trackers/quick-start-guide/): ```javascript window.snowplow( 'addPlugin', 'https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-focalmeter@latest/dist/index.umd.min.js', ['snowplowFocalMeter', 'FocalMeterPlugin'] ); window.snowplow('enableFocalMeterIntegration', { kantarEndpoint: '{{kantar_url}}', useLocalStorage: false // optional, defaults to false }); ``` **Browser (npm):** ```javascript import { newTracker } from '@snowplow/browser-tracker'; import { FocalMeterPlugin, enableFocalMeterIntegration } from '@snowplow/browser-plugin-focalmeter'; newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ FocalMeterPlugin() ], }); enableFocalMeterIntegration({ kantarEndpoint: '{{kantar_url}}', useLocalStorage: false // optional, defaults to false }); ``` *** The `enableFocalMeterIntegration` function has the following arguments: | Parameter | Type | Default | Description | Required | | 
----------------- | ---------------------------- | ------- | ----------- | -------- | | `kantarEndpoint` | `string` | - | URL of the Kantar endpoint to send the requests to (including protocol) | Yes | | `processUserId` | `(userId: string) => string` | - | Callback to process the user ID before sending it in a request. This may be used to apply hashing to the value. | No | | `useLocalStorage` | `boolean` | `false` | Whether to store information about the last submitted user ID in local storage to prevent sending it again on the next load (defaults to not using local storage) | No | If you choose to store the last submitted user ID in local storage, the plugin will use the key `sp-fclmtr-{trackerId}`. The `trackerId` is your tracker namespace. ### Processing the user ID By default, the plugin sends the domain user ID as a GET parameter in requests to Kantar without modifying it.
If you want to apply a transformation to the value, such as hashing, you can provide the `processUserId` callback in the `enableFocalMeterIntegration` call: **JavaScript (tag):** ```javascript window.snowplow('enableFocalMeterIntegration', { kantarEndpoint: "https://kantar.example.com", processUserId: (userId) => md5(userId).toString(), // apply the custom hashing here }); ``` **Browser (npm):** ```javascript import md5 from 'crypto-js/md5'; enableFocalMeterIntegration({ kantarEndpoint: "https://kantar.example.com", processUserId: (userId) => md5(userId).toString(), // apply the custom hashing here }); ``` *** ### Configure multiple trackers If you have multiple trackers loaded on the same page, you can enable the Focal Meter integration for each of them by passing an array of tracker namespaces as an additional argument to the `enableFocalMeterIntegration` call: **JavaScript (tag):** ```javascript window.snowplow( 'enableFocalMeterIntegration', { kantarEndpoint: 'https://kantar.example.com' }, ['sp1', 'sp2'] // Only these tracker namespaces will send to Kantar ); ``` **Browser (npm):** ```javascript enableFocalMeterIntegration( { kantarEndpoint: 'https://kantar.example.com' }, ['sp1', 'sp2'] // Only these tracker namespaces will send to Kantar ); ``` *** ## Request format The tracker will send requests with this format: ```text GET https://your-kantar-endpoint.com?vendor=snowplow&cs_fpid=d5c4f9a2-3b7e-4d1f-8c6a-9e2b5f0a3c8d&c12=not_set ``` Where: - `vendor` is always `snowplow` - `cs_fpid` is the domain user ID, or the processed version if a `processUserId` callback is provided - `c12` is always `not_set` --- # Track form interactions on web > Automatically track form changes, submissions, and focus events with configurable allowlists, denylists, and transform functions for field values.
> Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/form-tracking/ Snowplow form tracking creates three event types: `change_form`, `submit_form` and `focus_form`. The `enableFormTracking` method adds event listeners to the document that listen for events from form elements and their interactive fields (that is, all `input`, `textarea`, and `select` elements). > **Note:** Events on password fields will not be tracked. Form events are **automatically tracked** once configured. ## Installation **JavaScript (tag):** | Tracker Distribution | Included | | -------------------- | -------- | | `sp.js` | ✅ | | `sp.lite.js` | ❌ | **Download:** | | | | --- | --- | | Download from GitHub Releases (Recommended) | [Github Releases (plugins.umd.zip)](https://github.com/snowplow/snowplow-javascript-tracker/releases) | | Available on jsDelivr | [jsDelivr](https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-form-tracking@latest/dist/index.umd.min.js) (latest) | | Available on unpkg | [unpkg](https://unpkg.com/@snowplow/browser-plugin-form-tracking@latest/dist/index.umd.min.js) (latest) | **Note:** The links to the CDNs above point to the current latest version. You should pin to a specific version when integrating this plugin on your website if you are using a third party CDN in production.
**Browser (npm):** - `npm install @snowplow/browser-plugin-form-tracking` - `yarn add @snowplow/browser-plugin-form-tracking` - `pnpm add @snowplow/browser-plugin-form-tracking` *** ## Toggle form tracking Start tracking form events by enabling the plugin: **JavaScript (tag):** ```javascript window.snowplow('addPlugin', "https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-form-tracking@latest/dist/index.umd.min.js", ["snowplowFormTracking", "FormTrackingPlugin"] ); snowplow('enableFormTracking'); ``` **Browser (npm):** Initialize your tracker with the plugin. ```javascript import { newTracker, trackPageView } from '@snowplow/browser-tracker'; import { FormTrackingPlugin, enableFormTracking } from '@snowplow/browser-plugin-form-tracking'; newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ FormTrackingPlugin() ], }); enableFormTracking(); ``` *** To stop form tracking, call `disableFormTracking`: **JavaScript (tag):** ```javascript snowplow('disableFormTracking'); ``` **Browser (npm):** ```javascript import { FormTrackingPlugin, disableFormTracking } from '@snowplow/browser-plugin-form-tracking'; disableFormTracking(); ``` *** ## Events By default, all three event types are tracked. However, it is possible to subscribe only to specific event types using the `options.events` option when enabling form tracking: **JavaScript (tag):** ```javascript // subscribing to specific event types snowplow('enableFormTracking', { options: { events: ['submit_form', 'focus_form', 'change_form'] }, }); ``` **Browser (npm):** ```javascript // subscribing to specific event types enableFormTracking({ options: { events: ['submit_form', 'focus_form', 'change_form'] }, }); ``` *** Check out the [form tracking overview](/docs/events/ootb-data/page-elements/#form-interactions) page to see the schema details. 
### Change form When a user changes the value of a `textarea`, `input`, or `select` element inside a form, a [`change_form`](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/change_form/jsonschema/1-0-0) event will be fired. It will capture the name, type, and new value of the element, and the id of the parent form. ### Submit form When a user submits a form, a [`submit_form`](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/submit_form/jsonschema/1-0-0) event will be fired. It will capture the id and classes of the form and the name, type, and value of all `textarea`, `input`, and `select` elements inside the form. Note that this will only work if the original form submission event is actually fired. If you prevent it from firing, for example by using a jQuery event handler which returns `false` to handle clicks on the form's submission button, the Snowplow `submit_form` event will not be fired. ### Focus form When a user focuses on a form element, a [`focus_form`](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/focus_form/jsonschema/1-0-0) event will be fired. It will capture the id and classes of the form and the name, type, and value of the `textarea`, `input`, or `select` element inside the form that received focus. ## Configuration It may be that you do not want to track every field in a form, or every form on a page. You can customize form tracking by passing a configuration argument to the `enableFormTracking` method. This argument should be an object with two elements named "forms" and "fields". The "forms" element determines which forms will be tracked; the "fields" element determines which fields inside the tracked forms will be tracked. As with link click tracking, there are three ways to configure each of these: a denylist, an allowlist, or a filter function. You do not have to use the same method for forms and fields.
**Denylists** This is an array of strings used to prevent certain elements from being tracked. Any form with a CSS class in the array will be ignored. Any field whose name property is in the array will be ignored. All other elements will be tracked. **Allowlists** This is an array of strings used to turn on tracking. Any form with a CSS class in the array will be tracked. Any field in a tracked form whose name property is in the array will be tracked. All other elements will be ignored. **Filter functions** This is a function used to determine which elements are tracked. The element is passed as the argument to the function and is tracked if and only if the value returned by the function is truthy. **Event phase** From v4 onwards, this plugin uses [capture-phase](https://developer.mozilla.org/en-US/docs/Web/API/Event/eventPhase#value) event listeners to detect form events. The capture phase is the earliest phase of event handlers, so events might be tracked before other code executes (e.g. form validation). If your filter or transform functions rely on other event handlers having executed first, they may not behave as expected with capture-phase event listeners. From v4.6.8 onwards, the plugin supports a `useCapture` option, which you can set to `false` (default is `true`) to revert to the v3 behavior of using bubble-phase event handlers. This allows other event handlers time to execute before the event is detected and your filter/transform functions are executed. When using the bubble phase, other event handlers may [cancel the event's propagation](https://developer.mozilla.org/en-US/docs/Web/API/Event/stopPropagation); in that case, the plugin will not receive the event and nothing will be tracked. This may be desirable if you want to wait for the form to validate before tracking a "form\_submit" event, for example.
Native HTML form validation automatically prevents the "submit" event from firing until the form is valid, so only validation code that doesn't integrate with native APIs should require explicitly using the bubble phase. The `focus` event for form fields [does not bubble](https://developer.mozilla.org/en-US/docs/Web/API/Element/focus_event), so this setting is ignored for `form_focus` tracking, which will always use capture-phase event listeners; only "change" and "submit" handlers will use the bubble phase when setting `useCapture: false`. ### Transform functions This is a function used to transform data in each form field. The value and element are passed as arguments to the function and the tracked value is replaced by the value returned. The transform function receives three arguments: 1. The value of the element. 2. Either the HTML element (for `change_form` and `focus_form` events) or an instance of `ElementData` (for `submit_form` events). 3. The HTML element (in all form tracking events). The function signature is: ```typescript type transformFn = ( elementValue: string | null, elementInfo: ElementData | TrackedHTMLElement, elt: TrackedHTMLElement ) => string | null; ``` This means you can write a single transform function that applies the same logic to `submit_form`, `change_form`, and `focus_form` events alike, using the third argument to access the element whenever the logic depends on its attributes.
For example: **JavaScript (tag):** ```javascript function redactPII(eltValue, _, elt) { if (elt.id === 'pid') { return 'redacted'; } return eltValue; } snowplow('enableFormTracking', { options: { fields: { transform: redactPII }, }, }); ``` **Browser (npm):** ```javascript import { enableFormTracking } from '@snowplow/browser-plugin-form-tracking'; function redactPII(eltValue, _, elt) { if (elt.id === 'pid') { return 'redacted'; } return eltValue; } enableFormTracking({ options: { fields: { transform: redactPII }, }, }); ``` *** ### Examples To track every form element and every field except those fields named "password": **JavaScript (tag):** ```javascript var opts = { forms: { denylist: [] }, fields: { denylist: ['password'] } }; snowplow('enableFormTracking', { options: opts }); ``` **Browser (npm):** ```javascript import { enableFormTracking } from '@snowplow/browser-plugin-form-tracking'; var options = { forms: { denylist: [] }, fields: { denylist: ['password'] } }; enableFormTracking({ options }); ``` *** To track only the forms with CSS class "tracked", and only those fields whose ID is not "private": **JavaScript (tag):** ```javascript var opts = { forms: { allowlist: ["tracked"] }, fields: { filter: function (elt) { return elt.id !== "private"; } } }; snowplow('enableFormTracking', { options: opts }); ``` **Browser (npm):** ```javascript import { enableFormTracking } from '@snowplow/browser-plugin-form-tracking'; var opts = { forms: { allowlist: ["tracked"] }, fields: { filter: function (elt) { return elt.id !== "private"; } } }; enableFormTracking({ options: opts }); ``` *** To transform the form fields with an MD5 hashing function: **JavaScript (tag):** ```javascript function hashMD5(value, _, elt) { // can use elt to make transformation decisions return MD5(value); } var opts = { forms: { allowlist: ["tracked"] }, fields: { filter: function (elt) { return elt.id !== "private"; }, transform: hashMD5 } }; snowplow('enableFormTracking', { options: opts }); 
``` **Browser (npm):** ```javascript import { enableFormTracking } from '@snowplow/browser-plugin-form-tracking'; function hashMD5(value, _, elt) { // can use elt to make transformation decisions return MD5(value); } var options = { forms: { allowlist: ["tracked"] }, fields: { filter: function (elt) { return elt.id !== "private"; }, transform: hashMD5 } }; enableFormTracking({ options }); ``` *** To use the bubble-phase event listeners: **JavaScript (tag):** ```javascript snowplow('enableFormTracking', { options: { useCapture: false } }); ``` **Browser (npm):** ```javascript import { enableFormTracking } from '@snowplow/browser-plugin-form-tracking'; enableFormTracking({ options: { useCapture: false } }); ``` *** ## Tracking forms embedded inside iframes The options for tracking forms inside of iframes are limited – browsers block access to contents of iframes that are from different domains than the parent page. We are not able to provide a solution to track events using trackers initialized on the parent page in such cases. It is possible to track events from forms embedded in iframes loaded from the same domain as the parent page or iframes created using JavaScript on the parent page (e.g. HubSpot forms). In case you are able to access form elements inside an iframe, you can pass them in the `options.forms` argument when calling `enableFormTracking` on the parent page. This will enable form tracking for the specific form elements. The feature may also be used for forms not embedded in iframes, but it's most useful in this particular case. 
The following example shows how to identify the form elements inside an iframe and pass them to the `enableFormTracking` function: **JavaScript (tag):** ```javascript let iframe = document.getElementById('form_iframe'); // find the element for the iframe let forms = iframe.contentWindow.document.getElementsByTagName('form'); // find form elements inside the iframe snowplow('enableFormTracking', { options: { forms: forms // pass the embedded forms when enabling form tracking }, }); ``` **Browser (npm):** ```javascript let iframe = document.getElementById('form_iframe'); // find the element for the iframe let forms = iframe.contentWindow.document.getElementsByTagName('form'); // find form elements inside the iframe enableFormTracking({ options: { forms: forms // pass the embedded forms when enabling form tracking }, }); ``` *** Alternatively, you can specify the iframe's `document` as a `target` directly; this will enable form tracking for all forms within the iframe's document: **JavaScript (tag):** ```javascript let iframe = document.getElementById('form_iframe'); // find the element for the iframe let formDoc = iframe.contentWindow.document; // find iframe document that contains forms snowplow('enableFormTracking', { options: { targets: [document, formDoc] // pass the embedded document when enabling form tracking }, }); ``` **Browser (npm):** ```javascript let iframe = document.getElementById('form_iframe'); // find the element for the iframe let formDoc = iframe.contentWindow.document; // find iframe document that contains forms enableFormTracking({ options: { targets: [document, formDoc] // pass the embedded document when enabling form tracking }, }); ``` *** `targets` can also be used to only track subsets of a document by passing a parent element directly. ## Tracking forms from inside shadow trees Forms created within [shadow trees](https://developer.mozilla.org/en-US/docs/Glossary/Shadow_tree) (e.g. 
within custom [Web Components](https://developer.mozilla.org/en-US/docs/Web/API/Web_components)) can only be tracked once the user first focuses a field. The plugin relies on composed events to detect the form interactions at the document level. Only `focus` events are considered composed; `change` and `submit` events are not composed, and so are not automatically detected by the plugin. When the user focuses a field in a form that is detected as being inside a shadow tree, the event listeners are added directly to the `form` element within the shadow tree in addition to the document-level event listeners in order to track future `change` and `submit` events correctly. If the form has no interactive field elements to first trigger the `focus` event, any `change` or `submit` events that fire will not be tracked. If the shadow root is attached in "closed" mode, no events will be tracked for elements in that shadow tree; only "open" mode is supported. ## Custom context entities Context entities can be sent with all form tracking events by supplying them in an array in the `context` argument. **JavaScript (tag):** ```javascript snowplow('enableFormTracking', { options: {}, context: [] }); ``` **Browser (npm):** ```javascript import { enableFormTracking } from '@snowplow/browser-plugin-form-tracking'; enableFormTracking({ options: {}, context: [] }); ``` *** These context entities can be dynamic, i.e. they can be traditional self-describing JSON objects, or callbacks that generate valid self-describing JSON objects. For form change events, context generators are passed `(elt, type, value)`, and form submission events are passed `(elt, innerElements)`.
A dynamic context could therefore look something like this for form change events: **JavaScript (tag):** ```javascript let dynamicContext = function (elt, type, value) { // perform operations here to construct the context entity return context; }; snowplow('enableFormTracking', { options: {}, context: [dynamicContext] }); ``` **Browser (npm):** ```javascript import { enableFormTracking } from '@snowplow/browser-plugin-form-tracking'; var dynamicContext = function (elt, type, value) { // perform operations here to construct the context entity return context; }; enableFormTracking({ options: {}, context: [dynamicContext] }); ``` *** --- # Track Google Analytics cookies with the web trackers > Automatically capture Google Analytics cookie values including GA4 and Universal Analytics cookies as context entities on every event. > Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/ga-cookies/ If this plugin is used, the tracker will look for Google Analytics cookies (GA4/Universal Analytics and "classic" GA; specifically the `_ga` cookie and older `__utma`, `__utmb`, `__utmc`, `__utmv`, `__utmz`) and combine their values into event context entities that get sent with every event. GA cookies information is **automatically tracked** once configured. 
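As an illustration of the kind of value involved, here is a sketch of how a GA client ID could be read out of a `_ga` cookie value. This is not part of the plugin (which simply forwards the cookie values in the entity); `parseGaClientId` is a hypothetical helper, and it assumes the common `GA1.2.<random>.<timestamp>` cookie format.

```javascript
// Hypothetical helper (not plugin code): extract the client ID from a
// `_ga` cookie value such as "GA1.2.1234567890.1616161616".
// The last two segments together form the GA client ID.
function parseGaClientId(gaCookieValue) {
  const parts = gaCookieValue.split('.');
  if (parts.length < 4) return null; // unexpected format
  return parts.slice(-2).join('.');
}
```

For example, `parseGaClientId('GA1.2.1234567890.1616161616')` returns `'1234567890.1616161616'`.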
## Install plugin **JavaScript (tag):** | Tracker Distribution | Included | | -------------------- | -------- | | `sp.js` | ✅ | | `sp.lite.js` | ❌ | **Download:** | | | | --- | --- | | Download from GitHub Releases (Recommended) | [Github Releases (plugins.umd.zip)](https://github.com/snowplow/snowplow-javascript-tracker/releases) | | Available on jsDelivr | [jsDelivr](https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-ga-cookies@latest/dist/index.umd.min.js) (latest) | | Available on unpkg | [unpkg](https://unpkg.com/@snowplow/browser-plugin-ga-cookies@latest/dist/index.umd.min.js) (latest) | **Note:** The links to the CDNs above point to the current latest version. You should pin to a specific version when integrating this plugin on your website if you are using a third party CDN in production. **Browser (npm):** - `npm install @snowplow/browser-plugin-ga-cookies` - `yarn add @snowplow/browser-plugin-ga-cookies` - `pnpm add @snowplow/browser-plugin-ga-cookies` *** ## Initialization **JavaScript (tag):** ```javascript window.snowplow('addPlugin', "https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-ga-cookies@latest/dist/index.umd.min.js", ["snowplowGaCookies", "GaCookiesPlugin"], [pluginOptions] // note: for sp.js, pluginOptions can also be specified when calling `newTracker`; e.g. `contexts: { gaCookies: { ua: true, ga4: false } }` ); ``` **Browser (npm):** ```javascript import { newTracker, trackPageView } from '@snowplow/browser-tracker'; import { GaCookiesPlugin } from '@snowplow/browser-plugin-ga-cookies'; newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ GaCookiesPlugin(pluginOptions) ], }); ``` *** The `pluginOptions` parameter lets you configure the plugin.
Its type is: ```typescript interface GACookiesPluginOptions { ua?: boolean; ga4?: boolean; ga4MeasurementId?: string | string[]; cookiePrefix?: string | string[]; } ``` | Name | Default | Description | | --- | --- | --- | | ua | `false` | Send Universal Analytics specific cookie values. | | ga4 | `true` | Send Google Analytics 4 specific cookie values. | | ga4MeasurementId | `""` | Measurement id(s) used to look up the Google Analytics 4 session cookie. Can be a single measurement id as a string or an array of measurement id strings. The cookie has the form `<prefix>_ga_<container-id>`, where `<container-id>` is the data stream container id and `<prefix>` is the optional `cookie_prefix` option of the gtag.js tracker. | | cookiePrefix | `[]` | Cookie prefix set on the Google Analytics 4 cookies using the `cookie_prefix` option of the gtag.js tracker. | ## Context entities Adding this plugin will automatically capture the following entities: 1. For GA4 cookies: `iglu:com.google.ga4/cookies/jsonschema/1-0-0` (default) ```json { "_ga": "G-1234", "cookie_prefix": "prefix", "session_cookies": [ { "measurement_id": "G-1234", "session_cookie": "567" } ] } ``` 2. For Universal Analytics cookies: `iglu:com.google.analytics/cookies/jsonschema/1-0-0` (if enabled) ```json { "_ga": "GA1.2.3.4" } ``` --- # Track data out-of-the-box with the web trackers > Track page views, structured events, and self-describing events with automatic context entities and custom timestamps using the web trackers.
> Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/ To track an event, the API differs slightly depending on whether you're using the JavaScript or Browser version of our web tracker. The main built-in events are [page views](/docs/sources/web-trackers/tracking-events/page-views/) and [page pings](/docs/sources/web-trackers/tracking-events/activity-page-pings/). Here's how to track them: **JavaScript (tag):** ```javascript window.snowplow('newTracker', 'sp', '{{collector_url_here}}', { appId: 'my-app-id', }); window.snowplow('enableActivityTracking', { minimumVisitLength: 30, heartbeatDelay: 10 }); window.snowplow('trackPageView'); ``` **Browser (npm):** ```javascript import { newTracker, trackPageView, enableActivityTracking } from '@snowplow/browser-tracker'; newTracker('sp', '{{collector_url_here}}', { appId: 'my-app-id', }); enableActivityTracking({ minimumVisitLength: 30, heartbeatDelay: 10 }); trackPageView(); ``` *** As well as page views and activity tracking, you can track [custom events](/docs/sources/web-trackers/custom-tracking-using-schemas/), or use [plugins](/docs/sources/web-trackers/plugins/) to track a wide range of other events and entities. ## Add contextual data with entities The tracker can be set up to automatically add [entities](/docs/fundamentals/entities/) to every event sent. Most entity autotracking is specifically configured using plugins, which are imported, enabled, and configured individually. However, you can configure some entities directly when instrumenting the tracker, using the [configuration object](/docs/sources/web-trackers/tracker-setup/initialization-options/).
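For instance, a sketch of toggling built-in entities via the `contexts` section of the configuration object. The `webPage`, `session`, and `browser` keys are the ones assumed here; check the initialization options page for the full set of keys and their defaults.

```javascript
import { newTracker } from '@snowplow/browser-tracker';

// Sketch: turning built-in context entities on or off at tracker
// initialization via the `contexts` configuration object.
newTracker('sp1', '{{collector_url}}', {
  appId: 'my-app-id',
  contexts: {
    webPage: true, // page view UUID entity (on by default)
    session: true, // client session entity
    browser: true, // browser properties entity
  },
});
```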
| Entity | Usage | Added by default | JavaScript (tag) tracker | Browser (npm) tracker | | ---------------------------------------------------------------------------------------------------- | -------------------------------- | ---------------- | ------------------------ | --------------------- | | [`webPage`](/docs/sources/web-trackers/tracking-events/page-views/#page-view-id-and-web_page-entity) | UUID for the page view | ✅ | `contexts` config | `contexts` config | | [`session`](/docs/sources/web-trackers/tracking-events/session/) | Data about the current session | ❌ | `contexts` config | `contexts` config | | [`browser`](/docs/sources/web-trackers/tracking-events/browsers/) | Properties of the user's browser | ❌ | `contexts` config | `contexts` config | | [`performanceTiming`](/docs/sources/web-trackers/tracking-events/timings/) | Performance timing metrics | ❌ | `contexts` config | Plugin | | [`gaCookies`](/docs/sources/web-trackers/tracking-events/ga-cookies/) | Extract GA cookie values | ❌ | `contexts` config | Plugin | | [`geolocation`](/docs/sources/web-trackers/tracking-events/timezone-geolocation/) | User's geolocation | ❌ | `contexts` config | Plugin | If you're using the `sp.lite.js` JavaScript tracker distribution, only the `webPage`, `session`, and `browser` entities are available out of the box, as the others require plugins that aren't included in that distribution. You can also attach your own [custom entities](/docs/sources/web-trackers/custom-tracking-using-schemas/) to events. 
For example, here is a page view with an additional custom entity: **JavaScript (tag):** ```javascript snowplow('trackPageView', { context: [{ schema: "iglu:com.example_company/page/jsonschema/1-2-1", data: { pageType: 'test', lastUpdated: new Date(2021, 4, 1) } }] }); ``` **Browser (npm):** ```javascript trackPageView({ context: [{ schema: 'iglu:com.example_company/page/jsonschema/1-2-1', data: { pageType: 'test', lastUpdated: new Date(2021, 4, 1) } }] }); ``` *** > **Note:** Tracker methods available through plugins do not necessarily support adding custom entities; please refer to the corresponding plugin documentation for details. ## Set event properties Certain event properties, including `domain_userid` or `application_id`, can be set as [atomic properties](/docs/fundamentals/canonical-event/) in the raw event. ### Application ID Set the application ID using the `appId` field of the [tracker configuration object](/docs/sources/web-trackers/tracker-setup/initialization-options/). This will be attached to every event the tracker fires. You can set different application IDs on different parts of your site. You can then distinguish events that occur on different applications by grouping results based on `application_id`. ### Application version > **Info:** The option to track the application version was introduced in version 4.1 of the JavaScript tracker. Set the application version using the `appVersion` field of the [tracker configuration object](/docs/sources/web-trackers/tracker-setup/initialization-options/). This will be attached to every event the tracker fires using the [application entity](/docs/events/ootb-data/app-information/#entity-definitions). The version can be a semver-like string (e.g. 1.1.0) or a Git commit SHA hash. ### Application platform Set the application platform using the `platform` field of the [tracker configuration object](/docs/sources/web-trackers/tracker-setup/initialization-options/).
This will be attached to every event the tracker fires. Its default value is `web`. For a list of supported platforms, please see the [Snowplow Tracker Protocol](/docs/fundamentals/canonical-event/#application-fields). ### Business user ID The JavaScript Tracker automatically sets a `domain_userid` based on a first party cookie. Read more about cookies [here](/docs/sources/web-trackers/cookies-and-local-storage/). There are many situations, however, when you will want to identify a specific user using an ID generated by one of your business systems. To do this, you use one of the methods described in this section: `setUserId`, `setUserIdFromLocation`, `setUserIdFromReferrer`, and `setUserIdFromCookie`. Typically, companies do this at points in the customer journey where users identify themselves e.g. if they log in. > **Note:** This will only set the user ID on further events fired while the user is on this page; if you want events on another page to record this user ID too, you must call `setUserId` on the other page as well. #### `setUserId` `setUserId` is the simplest of the four methods. It sets the business user ID to a string of your choice: **JavaScript (tag):** ```javascript snowplow('setUserId', 'joe.blogs@email.com'); ``` **Browser (npm):** ```javascript setUserId('joe.blogs@email.com'); ``` *** > **Note:** `setUserId` can also be called using the alias `identifyUser`. #### `setUserIdFromLocation` `setUserIdFromLocation` lets you set the user ID based on a querystring field of your choice. 
For example, if the URL is `http://www.mysite.com/home?id=user345`, then the following code would set the user ID to “user345”: **JavaScript (tag):** ```javascript snowplow('setUserIdFromLocation', 'id'); ``` **Browser (npm):** ```javascript setUserIdFromLocation('id'); ``` *** #### `setUserIdFromReferrer` `setUserIdFromReferrer` functions in the same way as `setUserIdFromLocation`, except that it uses the referrer querystring rather than the querystring of the current page. **JavaScript (tag):** ```javascript snowplow('setUserIdFromReferrer', 'id'); ``` **Browser (npm):** ```javascript setUserIdFromReferrer('id'); ``` *** #### `setUserIdFromCookie` Use `setUserIdFromCookie` to set the value of a cookie as the user ID. For example, if you have a cookie called “cookieid” whose value is “user123”, the following code would set the user ID to “user123”: **JavaScript (tag):** ```javascript snowplow('setUserIdFromCookie', 'cookieid'); ``` **Browser (npm):** ```javascript setUserIdFromCookie('cookieid'); ``` *** ### Custom page URL and referrer URL The Snowplow JavaScript Tracker automatically tracks the page URL and referrer URL on any event tracked. However, in certain situations, you may want to override one or both of these URLs with a custom value. For example, this might be desirable if your CMS spits out particularly ugly URLs that are hard to unpick at analysis time. To set a custom page URL, use the `setCustomUrl` method: **JavaScript (tag):** ```javascript snowplow('setCustomUrl', 'http://mysite.com/checkout-page'); ``` **Browser (npm):** ```javascript setCustomUrl('http://mysite.com/checkout-page'); ``` *** To set a custom referrer, use the `setReferrerUrl` method: **JavaScript (tag):** ```javascript snowplow('setReferrerUrl', 'http://custom-referrer.com'); ``` **Browser (npm):** ```javascript setReferrerUrl('http://custom-referrer.com'); ``` *** > **Tip:** On an SPA, the page URL might change without the page being reloaded.
Whenever an event is fired, the Tracker checks whether the page URL has changed since the last event. If it has, the page URL is updated and the URL at the time of the last event is used as the referrer. If you use `setCustomUrl`, the page URL will no longer be updated in this way. Similarly, if you use `setReferrerUrl`, the referrer URL will no longer be updated in this way. > > To use `setCustomUrl` within an SPA, call it before all `trackPageView` calls. > > If you want to ensure that the original referrer is preserved even though your page URL can change without the page being reloaded, use `setReferrerUrl` like this before sending any events: > > **JavaScript (tag):** > > ```javascript > snowplow('setReferrerUrl', document.referrer); > ``` > > **Browser (npm):** > > ```javascript > setReferrerUrl(document.referrer); > ``` > > *** ### Custom timestamp Snowplow events have several [timestamps](/docs/events/timestamps/). Every `trackX...()` method in the tracker allows a custom timestamp, called `trueTimestamp`, to be set. In certain circumstances you might want to set the timestamp yourself, e.g. if the JS tracker is being used to process historical event data, rather than tracking the events live. In this case you can set the `true_timestamp` for the event. To set the true timestamp, add an extra argument to your track method: `{type: 'ttm', value: unixTimestampInMs}`. This example shows how to set a true timestamp for a page view event: **JavaScript (tag):** ```javascript snowplow('trackPageView', { timestamp: { type: 'ttm', value: 1361553733371 } }); ``` **Browser (npm):** ```javascript trackPageView({ timestamp: { type: 'ttm', value: 1361553733371 } }); ``` *** E.g.
to set a true timestamp for a self-describing event: **JavaScript (tag):** ```javascript snowplow('trackSelfDescribingEvent', { event: { schema: 'iglu:com.acme_company/viewed_product/jsonschema/2-0-0', data: { productId: 'ASO01043', category: 'Dresses', brand: 'ACME', returning: true, price: 49.95, sizes: ['xs', 's', 'l', 'xl', 'xxl'], availableSince: new Date(2013,3,7) } }, timestamp: { type: 'ttm', value: 1361553733371 } }); ``` **Browser (npm):** ```javascript trackSelfDescribingEvent({ event: { schema: 'iglu:com.acme_company/viewed_product/jsonschema/2-0-0', data: { productId: 'ASO01043', category: 'Dresses', brand: 'ACME', returning: true, price: 49.95, sizes: ['xs', 's', 'l', 'xl', 'xxl'], availableSince: new Date(2013,3,7) } }, timestamp: { type: 'ttm', value: 1361553733371 } }); ``` *** ## Get event properties It's possible to retrieve certain identifiers and properties for use in your code. You'll need to use a callback for the JavaScript tracker. **JavaScript (tag):** If you call `snowplow` with a function as the argument, the function will be executed when `sp.js` loads: ```javascript snowplow(function () { console.log("sp.js has loaded"); }); ``` Or equivalently: ```javascript snowplow(function (x) { console.log(x); }, "sp.js has loaded"); ``` The callback you provide is executed as a method on the internal `trackerDictionary` object. You can access the `trackerDictionary` using `this`. ```javascript // Configure a tracker instance named "sp" snowplow('newTracker', 'sp', '{{COLLECTOR_URL}}', { appId: 'snowplowExampleApp' }); // Access the tracker instance inside a callback snowplow(function () { var sp = this.sp; var domainUserId = sp.getDomainUserId(); console.log(domainUserId); }); ``` The callback function shouldn't be a method: ```javascript // TypeError: Illegal invocation snowplow(console.log, "sp.js has loaded"); ``` This won't work because the value of `this` in the `console.log` function will be the `trackerDictionary`, rather than `console`.
You can get around this problem using `Function.prototype.bind` as follows: ```javascript snowplow(console.log.bind(console), "sp.js has loaded"); ``` For more on execution context in JavaScript, see the [MDN page](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/this). **Browser (npm):** When initializing a tracker, you can use the returned `tracker` instance to access various properties of that tracker. ```javascript // Configure a tracker instance named "sp" const sp = newTracker('sp', '{{COLLECTOR_URL}}', { appId: 'snowplowExampleApp' }); // Access the tracker properties const domainUserId = sp.getDomainUserId(); ``` *** ### Cookie values You can [retrieve cookie values](/docs/sources/web-trackers/cookies-and-local-storage/getting-cookie-values/) using `getDomainUserInfo` and other getters, or from the cookies directly. ### Page view ID When the JavaScript Tracker loads on a page, it generates a new [page view UUID](/docs/sources/web-trackers/tracking-events/page-views/). To get this page view ID, use the `getPageViewId` method: **JavaScript (tag):** ```javascript // Access the tracker instance inside a callback snowplow(function () { var sp = this.sp; var pageViewId = sp.getPageViewId(); console.log(pageViewId); }); ``` **Browser (npm):** ```javascript const pageViewId = sp.getPageViewId(); console.log(pageViewId); ``` *** ### Business user ID The `getUserId` method returns the user ID which you configured using `setUserId()`: **JavaScript (tag):** ```javascript // Access the tracker instance inside a callback snowplow(function () { var sp = this.sp; var userId = sp.getUserId(); console.log(userId); }); ``` **Browser (npm):** ```javascript const userId = sp.getUserId(); console.log(userId); ``` *** ### Tab ID If you've enabled the [`browser` entity](/docs/sources/web-trackers/tracking-events/browsers/), you can get the tab ID using the `getTabId` method.
It's a UUID identifier for the specific browser tab the event is sent from. **JavaScript (tag):** ```javascript // Access the tracker instance inside a callback snowplow(function () { var sp = this.sp; var tabId = sp.getTabId(); console.log(tabId); }); ``` **Browser (npm):** ```javascript const tabId = sp.getTabId(); console.log(tabId); ``` *** --- # Track link clicks on web > Automatically track clicks on anchor elements with configurable filters, pseudo-click support, and optional content capture for href destinations. > Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/link-click/ Link click tracking enables click tracking for all anchor/link elements (HTML `<a>` elements). Link clicks are tracked as self-describing events with [the link\_click schema](https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1). Each link click event captures the link’s `href` attribute. The event also has fields for the link’s `id`, classes, and `target` (where the linked document is opened, such as a new tab or new window). Link click events are **automatically tracked** once configured.
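To make those schema fields concrete, the sketch below shows roughly the self-describing JSON a `link_click` event carries, built from an anchor element. `buildLinkClickPayload` is a hypothetical helper for illustration only, not part of the tracker API; the field names follow the `link_click` 1-0-1 schema linked above.

```javascript
// Hypothetical helper (not part of the tracker API) showing the shape of
// the self-describing JSON a link_click event carries. `anchor` can be a
// real <a> element or any object with href/id/classList/target properties.
function buildLinkClickPayload(anchor) {
  return {
    schema: 'iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1',
    data: {
      targetUrl: anchor.href,                        // required: the link's destination
      elementId: anchor.id || undefined,             // the anchor's id, if present
      elementClasses: Array.from(anchor.classList),  // class names on the anchor
      elementTarget: anchor.target || undefined      // e.g. '_blank' for a new tab
    }
  };
}
```

In practice the plugin assembles and sends this payload for you; the sketch is only meant to make the captured fields concrete.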
## Install plugin **JavaScript (tag):** | Tracker Distribution | Included | | -------------------- | -------- | | `sp.js` | ✅ | | `sp.lite.js` | ❌ | **Download:** | | | | ------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- | | Download from GitHub Releases (Recommended) | [GitHub Releases (plugins.umd.zip)](https://github.com/snowplow/snowplow-javascript-tracker/releases) | | Available on jsDelivr | [jsDelivr](https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-link-click-tracking@latest/dist/index.umd.min.js) (latest) | | Available on unpkg | [unpkg](https://unpkg.com/@snowplow/browser-plugin-link-click-tracking@latest/dist/index.umd.min.js) (latest) | **Note:** The links to the CDNs above point to the current latest version. You should pin to a specific version when integrating this plugin on your website if you are using a third-party CDN in production. **Browser (npm):** - `npm install @snowplow/browser-plugin-link-click-tracking` - `yarn add @snowplow/browser-plugin-link-click-tracking` - `pnpm add @snowplow/browser-plugin-link-click-tracking` *** ## Toggle link click tracking Turn on link click tracking like this: **JavaScript (tag):** ```javascript window.snowplow('addPlugin', "https://cdn.jsdelivr.net/npm/@snowplow/browser-plugin-link-click-tracking@latest/dist/index.umd.min.js", ["snowplowLinkClickTracking", "LinkClickTrackingPlugin"] ); snowplow('enableLinkClickTracking'); ``` **Browser (npm):** Initialize your tracker with the plugin.
```javascript import { newTracker } from '@snowplow/browser-tracker'; import { LinkClickTrackingPlugin, enableLinkClickTracking } from '@snowplow/browser-plugin-link-click-tracking'; newTracker('sp1', '{{collector_url}}', { appId: 'my-app-id', plugins: [ LinkClickTrackingPlugin() ], }); enableLinkClickTracking(); ``` *** Use this method once and the tracker will add click event listeners to the document to detect clicks on anchor elements. An optional, but recommended, parameter is `pseudoClicks`. If this isn't turned on, Firefox won't recognize middle clicks. However, when enabled, there is a small possibility of false positives (click events firing when they shouldn't). **JavaScript (tag):** ```javascript snowplow('enableLinkClickTracking', { pseudoClicks: true }); ``` **Browser (npm):** ```javascript import { enableLinkClickTracking } from '@snowplow/browser-plugin-link-click-tracking'; enableLinkClickTracking({ pseudoClicks: true }); ``` *** This is its signature (where `?` marks an optional property): **JavaScript (tag):** ```javascript snowplow('enableLinkClickTracking', { options?: FilterCriterion, pseudoClicks?: boolean, trackContent?: boolean, context?: SelfDescribingJson[] }); ``` **Browser (npm):** ```javascript import { enableLinkClickTracking } from '@snowplow/browser-plugin-link-click-tracking'; enableLinkClickTracking({ options?: FilterCriterion, pseudoClicks?: boolean, trackContent?: boolean, context?: SelfDescribingJson[] }); ``` *** To stop tracking link events, call `disableLinkClickTracking`: **JavaScript (tag):** ```javascript snowplow('disableLinkClickTracking'); ``` **Browser (npm):** ```javascript import { disableLinkClickTracking } from '@snowplow/browser-plugin-link-click-tracking'; disableLinkClickTracking(); ``` *** ## Refresh link click tracking In previous versions, the `enableLinkClickTracking` method only tracked clicks on links that existed in the document at the time it was called.
If new links were added to the page after that, you had to use `refreshLinkClickTracking` to add Snowplow click listeners to any new links. From v4, this method is **deprecated** and has no effect; event listeners are now added directly to the document rather than to individual link elements, and new links are automatically tracked with no action required. ## Configuration Control which links to track using the `FilterCriterion` object: ```javascript interface FilterCriterion { /** A collection of class names to include */ allowlist?: string[]; /** A collection of class names to exclude */ denylist?: string[]; /** A callback which returns a boolean as to whether the element should be included */ filter?: (elt: HTMLElement) => boolean; } ``` You can control which links are tracked using the `options` property. There are three ways to do this: a denylist, an allowlist, and a filter function. ### Denylist This is an array of CSS classes which should be ignored by link click tracking. For example, the code below will stop link click events firing for links with the class "barred" or "untracked", but will fire link click events for all other links: **JavaScript (tag):** ```javascript snowplow('enableLinkClickTracking', { options: { denylist: ['barred', 'untracked'] } }); // If there is only one class name you wish to deny, // you should still put it in an array snowplow('enableLinkClickTracking', { options: { 'denylist': ['barred'] } }); ``` **Browser (npm):** ```javascript import { enableLinkClickTracking } from '@snowplow/browser-plugin-link-click-tracking'; enableLinkClickTracking({ options: { denylist: ['barred', 'untracked'] } }); // If there is only one class name you wish to deny, // you should still put it in an array enableLinkClickTracking({ options: { 'denylist': ['barred'] } }); ``` *** ### Allowlist The opposite of a denylist. This is an array of the CSS classes of links which you do want to be tracked.
Only clicks on links with a class in the list will be tracked. **JavaScript (tag):** ```javascript snowplow('enableLinkClickTracking', { options: { 'allowlist': ['unbarred', 'tracked'] } }); // If there is only one class name you wish to allow, // you should still put it in an array snowplow('enableLinkClickTracking', { options: { 'allowlist': ['unbarred'] } }); ``` **Browser (npm):** ```javascript import { enableLinkClickTracking } from '@snowplow/browser-plugin-link-click-tracking'; enableLinkClickTracking({ options: { 'allowlist': ['unbarred', 'tracked'] } }); // If there is only one class name you wish to allow, // you should still put it in an array enableLinkClickTracking({ options: { 'allowlist': ['unbarred'] } }); ``` *** ### Filter function You can provide a filter function which determines which links should be tracked. The function should take one argument, the link element, and return either `true` (in which case clicks on the link will be tracked) or `false` (in which case they won't). The following code will track clicks only on links whose id contains the string "interesting": **JavaScript (tag):** ```javascript function myFilter (linkElement) { return linkElement.id.indexOf('interesting') > -1; } snowplow('enableLinkClickTracking', { options: { 'filter': myFilter } }); ``` **Browser (npm):** ```javascript import { enableLinkClickTracking } from '@snowplow/browser-plugin-link-click-tracking'; function myFilter (linkElement) { return linkElement.id.indexOf('interesting') > -1; } enableLinkClickTracking({ options: { 'filter': myFilter } }); ``` *** Another optional parameter is `trackContent`.
Set it to `true` if you want link click events to capture the `innerHTML` of the clicked link: **JavaScript (tag):** ```javascript snowplow('enableLinkClickTracking', { trackContent: true }); ``` **Browser (npm):** ```javascript import { enableLinkClickTracking } from '@snowplow/browser-plugin-link-click-tracking'; enableLinkClickTracking({ trackContent: true }); ``` *** The `innerHTML` of a link is all the markup between the opening and closing `a` tags. Note that if you use a base64-encoded image as a link, the entire base64 string will be included in the event. Each link click event will include (if available) the destination URL, id, classes, and target of the clicked link. (The target attribute of a link specifies a window or frame where the linked document will be loaded.) **Context** `enableLinkClickTracking` can also be passed an array of custom context entities to attach to every link click event as an additional final parameter. Link click tracking supports dynamic context entities. Callbacks passed in the context argument will be evaluated with the source element passed as the only argument. The self-describing JSON context object returned by the callback will be sent with the link click event. A dynamic context could therefore look something like this for link click events: **JavaScript (tag):** ```javascript let dynamicContext = function (element) { // perform operations here to construct the context return context; }; snowplow('enableLinkClickTracking', { context: [dynamicContext] }); ``` **Browser (npm):** ```javascript import { enableLinkClickTracking } from '@snowplow/browser-plugin-link-click-tracking'; let dynamicContext = function (element) { // perform operations here to construct the context return context; }; enableLinkClickTracking({ context: [ dynamicContext ] }); ``` *** See [this page](/docs/sources/web-trackers/custom-tracking-using-schemas/) for more information about tracking context entities.
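To make the dynamic context idea concrete, here is one possible callback. The `link_position` schema URI and the `data-section` attribute are illustrative assumptions, not part of Snowplow; you would register a schema of your own in Iglu.

```javascript
// Illustrative dynamic context callback: receives the clicked element and
// returns a self-describing JSON entity. The schema URI and data-section
// attribute below are assumptions for this sketch, not Snowplow APIs.
function linkPositionContext(element) {
  // Walk up to the nearest ancestor carrying a (hypothetical) data-section
  // attribute, falling back to 'unknown' if there isn't one.
  const sectionEl = element.closest && element.closest('[data-section]');
  return {
    schema: 'iglu:com.example/link_position/jsonschema/1-0-0',
    data: {
      section: sectionEl ? sectionEl.dataset.section : 'unknown',
      linkText: (element.textContent || '').trim().slice(0, 64)
    }
  };
}

// Evaluated once per link click, with the clicked element as the argument:
// snowplow('enableLinkClickTracking', { context: [linkPositionContext] });
```

Because the callback runs on every click, keep it cheap: reading nearby DOM attributes is fine, but avoid network calls or heavy computation inside it.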
## Manual link click tracking You can manually track individual link click events with the `trackLinkClick` method. You do not need to call `enableLinkClickTracking` before using this method. This is its signature: **JavaScript (tag):** ```javascript snowplow('trackLinkClick', { /** The target URL of the link */ targetUrl: string; /** The ID of the element clicked if present */ elementId?: string; /** An array of class names from the element clicked */ elementClasses?: Array<string>; /** The target value of the element if present */ elementTarget?: string; /** The content of the element if present and enabled */ elementContent?: string; }); ``` **Browser (npm):** ```javascript import { trackLinkClick } from '@snowplow/browser-plugin-link-click-tracking'; trackLinkClick({ /** The target URL of the link */ targetUrl: string; /** The ID of the element clicked if present */ elementId?: string; /** An array of class names from the element clicked */ elementClasses?: Array<string>; /** The target value of the element if present */ elementTarget?: string; /** The content of the element if present and enabled */ elementContent?: string; }); ``` *** Of these arguments, only `targetUrl` is required.
This is how to use `trackLinkClick`: **JavaScript (tag):** ```javascript snowplow('trackLinkClick', { targetUrl: 'http://www.example.com', elementId: 'first-link', elementClasses: ['class-1', 'class-2'], elementTarget: '', elementContent: 'this page' }); ``` **Browser (npm):** ```javascript import { trackLinkClick } from '@snowplow/browser-plugin-link-click-tracking'; trackLinkClick({ targetUrl: 'http://www.example.com', elementId: 'first-link', elementClasses: ['class-1', 'class-2'], elementTarget: '', elementContent: 'this page' }); ``` *** Rather than specify the values explicitly, you may also supply the link element directly (and optionally control whether to include the content or not): **JavaScript (tag):** ```javascript snowplow('trackLinkClick', { element: document.links[0], trackContent: false }); ``` **Browser (npm):** ```javascript import { trackLinkClick } from '@snowplow/browser-plugin-link-click-tracking'; trackLinkClick({ element: document.links[0], trackContent: false }); ``` *** --- # HTML5 media tracking on web > Automatically track HTML5 video and audio elements with media events including play, pause, seek, buffer, and progress milestones. > Source: https://docs.snowplow.io/docs/sources/web-trackers/tracking-events/media/html5/ This plugin enables the automatic tracking of HTML5 media elements (`