Iglu Resolver configuration for Snowplow applications
If you're a Snowplow CDI customer, you don't need to manage Iglu directly — it's included in your pipeline. Use Event Studio or the Snowplow CLI to manage your schemas. This page is for Self-Hosted customers.
Iglu Resolver is a component embedded into many Snowplow applications, including Enrich and loaders. It's responsible for fetching schemas from Iglu registries and validating data against these schemas.
Configuration
Most of the time, configuring Iglu Resolver means providing a JSON file like this:
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
  "data": {
    "cacheSize": 500,
    "cacheTtl": 600,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "https://iglucentral.com"
          }
        }
      },
      {
        "name": "Custom Iglu Server",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "https://${iglu_server_hostname}/api",
            "apikey": "${iglu_server_apikey}"
          }
        }
      }
    ]
  }
}
The above configuration assumes Snowplow-authored schemas (Iglu Central) will be used in a pipeline, and that you have your own Iglu Server registry hosted at https://${iglu_server_hostname}/ with a read-rights API key ${iglu_server_apikey}.
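This page doesn't say how the ${iglu_server_hostname} and ${iglu_server_apikey} placeholders get filled in; that depends on your deployment tooling. As a minimal sketch, assuming you render the file yourself before starting the application, Python's standard string.Template understands the same ${...} syntax. The function name render_resolver and the example hostname and key below are hypothetical.

```python
import json
from string import Template

# Resolver config using the same ${...} placeholders as on this page.
# Only the custom registry is shown to keep the sketch short.
RESOLVER_TEMPLATE = Template("""
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
  "data": {
    "cacheSize": 500,
    "cacheTtl": 600,
    "repositories": [
      {
        "name": "Custom Iglu Server",
        "priority": 0,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": {
            "uri": "https://${iglu_server_hostname}/api",
            "apikey": "${iglu_server_apikey}"
          }
        }
      }
    ]
  }
}
""")


def render_resolver(hostname: str, apikey: str) -> dict:
    """Fill in the placeholders and confirm the result parses as JSON."""
    rendered = RESOLVER_TEMPLATE.substitute(
        iglu_server_hostname=hostname,
        iglu_server_apikey=apikey,
    )
    # json.loads raises an error if the rendered text is not valid JSON.
    return json.loads(rendered)


# Hypothetical values, for illustration only.
config = render_resolver("iglu.example.com", "00000000-0000-0000-0000-000000000000")
print(config["data"]["repositories"][0]["connection"]["http"]["uri"])
```

Note that substitute() inserts values verbatim, so a value containing a double quote or backslash would need JSON escaping first. substitute() also raises KeyError if a placeholder is left unfilled, which is usually what you want for a config file.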
Configuration parameters
- cacheSize determines how many individual schemas the resolver keeps cached.
- cacheTtl determines how long a schema can live in the cache before being reloaded, in seconds.
- repositories is a JSON array of registries to look up schemas in.
- priority and vendorPrefixes help the resolver decide which registry to check first for a given schema. See Registry priority.
How schemas are resolved
When the resolver is asked for a schema, it checks its cache first. On a miss, it queries the configured registries in priority order until a registry returns the schema.
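The lookup flow described above can be sketched in a few lines of Python. This is an illustration, not the resolver's actual implementation: the names resolve and make_registry are hypothetical, the cache is reduced to a plain dictionary, and the registries are assumed to be already sorted in priority order.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A registry lookup returns the schema as a dict, or None if it has no such schema.
Lookup = Callable[[str], Optional[dict]]


def resolve(schema_key: str,
            registries: List[Tuple[str, Lookup]],
            cache: Dict[str, dict]) -> Optional[dict]:
    """Check the cache first, then query registries in the order given."""
    if schema_key in cache:
        return cache[schema_key]
    for _name, lookup in registries:  # assumed to be sorted by priority already
        schema = lookup(schema_key)
        if schema is not None:
            cache[schema_key] = schema
            return schema
    return None  # no registry returned the schema


queried = []  # records which registries were actually contacted


def make_registry(name: str, schemas: Dict[str, dict]) -> Tuple[str, Lookup]:
    """Build an in-memory stand-in for a registry (hypothetical helper)."""
    def lookup(schema_key: str) -> Optional[dict]:
        queried.append(name)
        return schemas.get(schema_key)
    return name, lookup


key = "iglu:com.acme/checkout/jsonschema/1-0-0"  # hypothetical schema
registries = [
    make_registry("Iglu Central", {}),
    make_registry("Custom Iglu Server", {key: {"type": "object"}}),
]
cache: Dict[str, dict] = {}

resolve(key, registries, cache)  # miss: both registries are queried
resolve(key, registries, cache)  # hit: served from the cache
print(queried)  # ['Iglu Central', 'Custom Iglu Server']
```

The second call adds nothing to queried, which is the point of the cache: once a schema has been found, the registries aren't contacted again for it while the entry remains cached.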
Caching
The resolver caches both successful and failed lookups, keyed per-registry:
- A schema fetched successfully is cached until its entry is evicted, either by the LRU algorithm when the cache reaches cacheSize or once cacheTtl seconds have passed.
- If a registry responds with "not found", that result is also cached, so the registry won't be queried again for that schema until the entry is evicted.
- If a registry responds with another error (timeout, network error, server fault), the resolver retries that registry up to three more times before marking the schema as missing for that registry.
If cacheTtl is set, cache entries (both successful fetches and "not found" results) are re-resolved after the TTL expires. This lets you patch schemas without restarting the pipeline (though patching production schemas isn't recommended). For real-time pipelines, cacheTtl also prevents stale "not found" results from persisting for too long — for example, if a schema was missing when first looked up but has since been uploaded.
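To make the caching rules concrete, here is a small Python model of them. It is a sketch under stated assumptions, not the resolver's code: LookupCache, fetch and NOT_FOUND are hypothetical names, entries are keyed by (registry, schema) pairs with cacheSize counting those pairs, and a registry that keeps failing is recorded the same way as a "not found" result.

```python
import time
from collections import OrderedDict

NOT_FOUND = object()  # sentinel: the registry has no such schema


class LookupCache:
    """LRU cache with an optional TTL, keyed by (registry, schema_key)."""

    def __init__(self, cache_size, cache_ttl=None, clock=time.monotonic):
        self.cache_size = cache_size
        self.cache_ttl = cache_ttl  # seconds, or None for no expiry
        self.clock = clock
        self._entries = OrderedDict()  # key -> (value, stored_at)

    def get(self, registry, schema_key):
        """Return the cached value, NOT_FOUND, or None on a miss or expiry."""
        key = (registry, schema_key)
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.cache_ttl is not None and self.clock() - stored_at >= self.cache_ttl:
            del self._entries[key]  # expired: force a fresh lookup
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, registry, schema_key, value):
        key = (registry, schema_key)
        self._entries[key] = (value, self.clock())
        self._entries.move_to_end(key)
        while len(self._entries) > self.cache_size:
            self._entries.popitem(last=False)  # evict least recently used


def fetch(cache, registry, schema_key, lookup, max_retries=3):
    """Return the schema from one registry, or None if it is missing there."""
    cached = cache.get(registry, schema_key)
    if cached is not None:
        return None if cached is NOT_FOUND else cached
    for _attempt in range(1 + max_retries):  # first try plus up to 3 retries
        try:
            schema = lookup(schema_key)
        except OSError:  # covers TimeoutError and ConnectionError
            continue
        cache.put(registry, schema_key, NOT_FOUND if schema is None else schema)
        return schema
    # Assumption: a registry that keeps failing is recorded like "not found".
    cache.put(registry, schema_key, NOT_FOUND)
    return None


now = [0.0]  # fake clock so the example doesn't need to sleep
cache = LookupCache(cache_size=500, cache_ttl=600, clock=lambda: now[0])
calls = []


def registry_lookup(schema_key):
    calls.append(schema_key)
    return None  # the schema has not been uploaded yet


key = "iglu:com.acme/checkout/jsonschema/1-0-0"  # hypothetical schema
fetch(cache, "Custom Iglu Server", key, registry_lookup)  # queries the registry
fetch(cache, "Custom Iglu Server", key, registry_lookup)  # cached "not found"
now[0] = 601.0                                            # cacheTtl has passed
fetch(cache, "Custom Iglu Server", key, registry_lookup)  # queries again
print(len(calls))  # 2
```

The third call is the case the paragraph above describes: after the TTL expires, a stale "not found" entry is dropped and the registry gets another chance to return a schema that has since been uploaded.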
Registry priority
For each schema lookup, registries are sorted by:
1. vendorPrefixes: the resolver checks registries with a matching vendorPrefix first. Other registries aren't skipped, just queried later.
2. Class priority: a hardcoded value per registry type. Embedded registries (which read schemas from the local filesystem or from resources bundled with the application, rather than over HTTP) are always checked before HTTP registries within the same vendorPrefix match.
3. priority: the user-defined value in your config. Only affects ordering within the same class priority.
Lower numbers mean higher priority: registries with priorities 0, 1, 2 and 3 are checked in that order.
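The three sorting criteria can be expressed as a single composite sort key. The Python below is an illustrative sketch: Registry, lookup_order and the numeric values in CLASS_PRIORITY are hypothetical (this page only says that embedded registries come before HTTP ones), and a vendor prefix is assumed to match when the schema's vendor starts with it.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed values: only the relative order (embedded before HTTP) comes from this page.
CLASS_PRIORITY = {"embedded": 0, "http": 1}


@dataclass
class Registry:
    name: str
    kind: str      # "embedded" or "http"
    priority: int  # the user-defined value from the config
    vendor_prefixes: List[str] = field(default_factory=list)


def lookup_order(registries: List[Registry], vendor: str) -> List[Registry]:
    """Sort registries for one lookup; lower key tuples are checked first."""
    def sort_key(registry: Registry):
        matches = any(vendor.startswith(p) for p in registry.vendor_prefixes)
        return (
            0 if matches else 1,            # 1. vendorPrefixes match first
            CLASS_PRIORITY[registry.kind],  # 2. class priority
            registry.priority,              # 3. user-defined priority
        )
    return sorted(registries, key=sort_key)


registries = [
    Registry("Iglu Central", "http", 0, ["com.snowplowanalytics"]),
    Registry("Custom Iglu Server", "http", 1, ["com.acme"]),
    Registry("Embedded", "embedded", 5),
]

order = lookup_order(registries, "com.acme.checkout")
print([r.name for r in order])
# ['Custom Iglu Server', 'Embedded', 'Iglu Central']
```

In this example the custom server is checked first despite its higher priority number, because its vendor prefix matches. Among the non-matching registries, the embedded one precedes Iglu Central even with priority 5, since the user-defined priority only breaks ties within the same class.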