Custom SQL query enrichment
The SQL query enrichment lets you add data to a Snowplow event via your own MySQL or PostgreSQL relational database.
Supported database types:
- MySQL, plus variants which speak MySQL e.g. MariaDB, Amazon Aurora
- PostgreSQL, plus variants which speak PostgreSQL
We don't recommend using this enrichment with analytical databases such as Redshift. These databases are optimized for large analytical queries, not the high-frequency, low-latency per-event lookups this enrichment requires.
Configuration
For historical reasons, the configuration uses terms that are no longer used elsewhere in Snowplow.
The enrichment takes these parameters:
| Parameter | Required | Description |
|---|---|---|
inputs | ✅ | Event values to substitute in SQL placeholders. |
database | ✅ | Defines access to the database. |
query | ✅ | The SQL statement to query your database. |
output | ✅ | How to convert the returned rows into self-describing JSON entities. |
cache | ✅ | Whether to store retrieved rows. |
ignoreOnError | ❌ | Whether to make the event fail if the API request fails. |
- Console
- Self-Hosted
Configure the parameters in the Console enrichment editor. For example:
{
"inputs": [
{
"placeholder": 1,
"pojo": {
"field": "user_id"
}
},
{
"placeholder": 1,
"json": {
"field": "contexts",
"schemaCriterion": "iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-*-*",
"jsonPath": "$.userId"
}
},
{
"placeholder": 2,
"pojo": {
"field": "app_id"
}
}
],
"database": {
"postgresql": {
"host": "cluster01.redshift.acme.com",
"port": 5439,
"sslMode": true,
"username": "snowplow_enrich_ro",
"password": "1asIkJed",
"database": "crm"
}
},
"query": {
"sql": "SELECT username, email_address, date_of_birth FROM tbl_users WHERE user = ? AND client = ? LIMIT 1"
},
"output": {
"expectedRows": "AT_MOST_ONE",
"json": {
"schema": "iglu:com.acme/user/jsonschema/1-0-0",
"describes": "ALL_ROWS",
"propertyNames": "CAMEL_CASE"
}
},
"cache": {
"size": 3000,
"ttl": 60
},
"ignoreOnError": false
}
For Self-Hosted, provide a complete JSON. For example:
{
"schema": "iglu:com.snowplowanalytics.snowplow.enrichments/sql_query_enrichment_config/jsonschema/1-0-1",
"data": {
"vendor": "com.snowplowanalytics.snowplow.enrichments",
"name": "sql_query_enrichment_config",
"enabled": false,
"parameters": {
"inputs": [
{
"placeholder": 1,
"pojo": {
"field": "user_id"
}
},
{
"placeholder": 1,
"json": {
"field": "contexts",
"schemaCriterion": "iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-*-*",
"jsonPath": "$.userId"
}
},
{
"placeholder": 2,
"pojo": {
"field": "app_id"
}
}
],
"database": {
"postgresql": {
"host": "cluster01.redshift.acme.com",
"port": 5439,
"sslMode": true,
"username": "snowplow_enrich_ro",
"password": "1asIkJed",
"database": "crm"
}
},
"query": {
"sql": "SELECT username, email_address, date_of_birth FROM tbl_users WHERE user = ? AND client = ? LIMIT 1"
},
"output": {
"expectedRows": "AT_MOST_ONE",
"json": {
"schema": "iglu:com.acme/user/jsonschema/1-0-0",
"describes": "ALL_ROWS",
"propertyNames": "CAMEL_CASE"
}
},
"cache": {
"size": 3000,
"ttl": 60
},
"ignoreOnError": false
}
}
}
Unsure if your enrichment configuration is correct or works as expected? You can easily test it using Snowplow Micro, either through Console or on your machine.
inputs
The SQL statement you provide can include placeholders. These will be replaced with actual values from the event, called inputs.
The enrichment can use any property in the event as input data. The values can be extracted from:
- Atomic event properties such as
user_id - Self-describing event fields
- Entities attached by tracker SDKs
- Entities attached by other enrichments
Each input consists of a placeholder index and a source: pojo for atomic event fields, or json for JSON fields, whether event or entity.
The placeholder is the index of the ? SQL placeholder which this input will populate. It must be an integer greater than or equal to 1.
For json, specify the field name as either unstruct_event for self-describing event fields, contexts for fields in entities added during tracking, or derived_contexts for fields in enrichment entities. Add two additional fields:
schemaCriterionis the Iglu schema URI. You can specify all versions of the schema (*-*-*), or a specific major version (e.g.1-*-*), major plus minor (e.g.1-1-*) or a full major-minor-patch version (e.g.1-1-1)jsonPathis the JSON Path statement to navigate to the field inside the JSON that you want to use as the input.
The resolved values should be primitive types (string, number, or boolean).
As shown in the example configuration, you can have multiple inputs with the same placeholder index. The enrichment will use the last configured input for that index that resolves to a non-null value. This allows you to specify fallback inputs.
database
Configure how the enrichment should access your database in the database section. Populate all properties in the postgresql or mysql configuration object:
host: the hostname or IP address of the database server or clusterport: the port to connect to the database onsslMode: whether the database requires connections to use SSLusername: the username for the database loginpassword: the password for this usernamedatabase: the name of the specific database to run the query against
We strongly recommend restricting the username to minimal read-only permissions on just the tables required to execute the SQL query.
If your database server has additional authentication in place, such as IP allowlisting, configure it to permit access from the NAT gateway associated with the Snowplow pipeline.
query
Consists of a single sql key with a SQL query in the form of prepared statement. It will run on your relational database to return rows to attach to this event as entities. Use ? placeholders in the query to indicate where the input values should be substituted. Values will be inserted according to their type: strings will be quoted, numbers won’t.
If a placeholder index required in the sql prepared statement isn't found in any of the inputs, then the lookup won't proceed, but this will not be flagged as a failure.
A final ; is optional.
This enrichment makes no attempt to sanitize the SQL statement, nor to verify that the SQL statement doesn't have harmful side effects such as SQL injection.
output
The enrichment adds the returned rows to the event as one or more entities. You'll need to specify the schema or data structure that the enrichment should use to define the retrieved data.
The output object has two keys: expectedRows and json. The expectedRows enum defines the expected SQL output:
EXACTLY_ONE: exactly one row is expected. 0 or 2+ rows will throw an error, causing the entire event to fail processing.AT_MOST_ONE: either one or zero rows is expected. 2+ rows will throw an error.AT_LEAST_ZERO: between 0 and N rows are expected, i.e. an array of results.AT_LEAST_ONE: between 1 and N rows are expected, i.e. an array of results. 0 rows will throw an error.
Within the json key, use schema to specify the URI of the schema you want to attach to the event.
The describes field configures whether to add one entity containing all the returned rows, or one entity per returned row. This will depend on your schema definition. Supported options are:
ALL_ROWSmeans that the added entity will contain all the returned rowsEVERY_ROWmeans that one entity will be added per returned row
The propertyNames field supports reformatting of the returned columns to fit the schema. Supported options are:
AS_IS: preserve the column names exactly as they areCAMEL_CASE: e.g.date_of_birthbecomesdateOfBirthPASCAL_CASE: e.g.date_of_birthbecomesDateOfBirthSNAKE_CASE: e.g.dateOfBirthbecomesdate_of_birthLOWER_CASE: changes all characters to lowercaseUPPER_CASE: changes all characters to uppercase
You could also use column aliasing in your SQL statement to change individual column names.
cache
An enrichment can run many millions of time per hour, effectively launching a DoS attack on a data source. The cache configuration attempts to minimize the number of lookups performed.
The cache is an LRU (least-recently used) cache, where less frequently accessed values are evicted to make space for new values. Configure the cache as follows:
sizeis the maximum number of entries to hold in the cache at any one time. The minimum value is1.ttlis the number of seconds that an entry can stay in the cache before it is forcibly evicted. This is useful to prevent stale values from being retrieved in the case that your database can return different values for the same key over time.
ignoreOnError
When set to true, if the enrichment fails for any reason, the event is still considered successfully enriched. It'll be loaded as usual, except without the entities added by the enrichment.
When set to false, the event will become a failed event if the SQL query fails.
Edge case handling
This enrichment uses a relational database to fetch data in JSON format. This table describes what will happen under different conditions:
| Scenario | Outcome |
|---|---|
| A provided JSONPath is invalid. | Failed event, unless ignoreOnError is set to true. |
More than one entity in the event matches the schemaCriterion. | The first matching entity is used. |
| Multiple inputs share the same placeholder index. | The last configured input is used. |
| Any placeholder index has no value. | The SQL query won't run and no entities will be added. The event is otherwise processed as usual — not failed. |
An input value is non-primitive (object or array), or null. | Treated as not found. The SQL query won't run and no entities will be added. The event is otherwise processed as usual — not failed. |
| The database query failed for any reason, or timed out. | Failed event, unless ignoreOnError is set to true. |
| The database connection was lost. | Failed event, unless ignoreOnError is set to true. |
If the database connection is lost, the enrichment will attempt to reestablish it on the next event.
Output
This enrichment adds entities based on your configuration.
Single row
With this configuration:
...
"query": {
"sql": "SELECT username, date_of_birth FROM tbl_users WHERE user = ?"
},
"output": {
"expectedRows": "AT_MOST_ONE",
"json": {
"schema": "iglu:com.acme/user/jsonschema/1-0-0",
"describes": "ALL_ROWS",
"propertyNames": "CAMEL_CASE"
}
},
...
And this query result:
SELECT username, date_of_birth FROM tbl_users WHERE user = 123;
| username | date_of_birth |
| -------- | ------------- |
| karl | 1980-06-12 |
The enrichment would add this entity to your event:
{
"schema": "iglu:com.acme/user/jsonschema/1-0-0",
"data": {
"username": "karl",
"dateOfBirth": "1980-06-12T00:00:00"
}
}
If the query returned no rows, e.g.:
SELECT username, date_of_birth FROM tbl_users WHERE user = 123;
| username | date_of_birth |
| -------- | ------------- |
The enrichment won't add an entity, but the event would continue processing.
If the query returned more than one row, e.g.:
SELECT username, date_of_birth FROM tbl_users WHERE user = 123;
| username | date_of_birth |
| -------- | ------------- |
| karl | 1980-06-12 |
| mary | 1975-03-22 |
An error would be triggered, on account of the AT_MOST_ONE setting, and the event would become a failed event.
Multiple rows
With this ALL_ROWS configuration:
...
"query": {
"sql": "SELECT * FROM product WHERE category = ?"
},
"output": {
"expectedRows": "AT_LEAST_ZERO",
"json": {
"schema": "iglu:com.acme/products/jsonschema/1-0-0",
"describes": "ALL_ROWS",
"propertyNames": "AS_IS"
}
},
...
And this query result:
SELECT * FROM product WHERE category = 'homeware';
| SKU | prod_name |
| --- | --------- |
| 123 | iPad |
| 456 | Ray-Bans |
The enrichment would add this entity to your event:
{
"schema": "iglu:com.acme/products/jsonschema/1-0-0",
"data": [
{
"SKU": "123",
"prod_name": "iPad"
},
{
"SKU": "456",
"prod_name": "Ray-Bans"
}
]
}
If you change the configuration to EVERY_ROW:
...
"output": {
"expectedRows": "AT_LEAST_ZERO",
"json": {
"schema": "iglu:com.acme/product/jsonschema/1-0-0",
"describes": "EVERY_ROW",
"propertyNames": "LOWER_CASE"
}
}
...
Then two entities would be added to your event, one per row:
{
"schema": "iglu:com.acme/product/jsonschema/1-0-0",
"data": {
"sku": "123",
"prod_name": "iPad"
}
},
{
"schema": "iglu:com.acme/product/jsonschema/1-0-0",
"data": {
"sku": "456",
"prod_name": "Ray-Bans"
}
}