Iglu Resolver configuration for Snowplow applications

CDI customers

If you're a Snowplow CDI customer, you don't need to manage Iglu directly — it's included in your pipeline. Use Event Studio or the Snowplow CLI to manage your schemas. This page is for Self-Hosted customers.

Iglu Resolver is a component embedded into many Snowplow applications, including Enrich and loaders. It's responsible for fetching schemas from Iglu registries and validating data against these schemas.

Configuration

Most of the time, configuring Iglu Resolver means providing a JSON file like this:

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
  "data": {
    "cacheSize": 500,
    "cacheTtl": 600,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "https://iglucentral.com"
          }
        }
      },
      {
        "name": "Custom Iglu Server",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "https://${iglu_server_hostname}/api",
            "apikey": "${iglu_server_apikey}"
          }
        }
      }
    ]
  }
}
```

The above configuration assumes that Snowplow-authored schemas from Iglu Central will be used in the pipeline. It also assumes that you host your own Iglu Server registry at https://${iglu_server_hostname}/, with an API key ${iglu_server_apikey} that has read rights.
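How the ${iglu_server_hostname} and ${iglu_server_apikey} placeholders get filled in depends on your deployment tooling. As one possibility, they happen to match the syntax of Python's string.Template. The sketch below renders an abbreviated copy of the file that keeps only the custom registry. The function name and the example host and key are hypothetical.

```python
import json
from string import Template

# Abbreviated copy of the resolver config above (custom registry only).
RESOLVER_TEMPLATE = """{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
  "data": {
    "cacheSize": 500,
    "cacheTtl": 600,
    "repositories": [
      {
        "name": "Custom Iglu Server",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "https://${iglu_server_hostname}/api",
            "apikey": "${iglu_server_apikey}"
          }
        }
      }
    ]
  }
}"""


def render_resolver_config(hostname: str, apikey: str) -> dict:
    """Hypothetical helper: fill the ${...} placeholders, then parse as JSON."""
    # substitute() raises KeyError if a placeholder has no value.
    # Values are inserted verbatim, so they must not need JSON escaping.
    rendered = Template(RESOLVER_TEMPLATE).substitute(
        iglu_server_hostname=hostname,
        iglu_server_apikey=apikey,
    )
    return json.loads(rendered)
```

For example, render_resolver_config("iglu.example.com", "my-api-key") yields a config whose registry URI is https://iglu.example.com/api.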

Configuration parameters

cacheSize determines how many individual schemas the resolver keeps cached.
cacheTtl determines how long, in seconds, a schema can stay in the cache before it's reloaded.
repositories is a JSON array of registries to look up schemas in.
priority and vendorPrefixes help the resolver decide which registry to check first for a given schema. See Registry priority.
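The sketch below makes these parameters concrete by reading them out of a resolver-config document into typed objects. The class and function names are hypothetical. It only parses HTTP connections, and it treats cacheTtl as optional because the caching behaviour depends on whether that field is set.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Repository:
    name: str
    priority: int                     # lower numbers are checked first
    vendor_prefixes: Tuple[str, ...]
    uri: Optional[str]                # None for non-HTTP connections in this sketch


@dataclass(frozen=True)
class ResolverConfig:
    cache_size: int                   # how many schemas the cache holds
    cache_ttl: Optional[int]          # seconds before re-resolving; None if unset
    repositories: Tuple[Repository, ...]


def parse_resolver_config(text: str) -> ResolverConfig:
    """Hypothetical helper: extract cacheSize, cacheTtl and repositories."""
    data = json.loads(text)["data"]
    repos = tuple(
        Repository(
            name=r["name"],
            priority=r["priority"],
            vendor_prefixes=tuple(r.get("vendorPrefixes", [])),
            uri=r["connection"].get("http", {}).get("uri"),
        )
        for r in data["repositories"]
    )
    return ResolverConfig(
        cache_size=data["cacheSize"],
        cache_ttl=data.get("cacheTtl"),
        repositories=repos,
    )
```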

How schemas are resolved

When the resolver is asked for a schema, it checks its cache first. On a miss, it queries the configured registries in priority order until a registry returns the schema.
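The sketch below shows this lookup as a cache check followed by a loop over registries. It is a simplification under stated assumptions. Each registry is modelled as a hypothetical function that returns a schema or None. The list is assumed to be already sorted by priority. Per-registry caching of failures is left out.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model of a registry: schema key in, schema (or None) out.
Registry = Callable[[str], Optional[dict]]


def resolve(key: str, cache: Dict[str, dict], registries: List[Registry]) -> Optional[dict]:
    """Check the cache, then query registries in order until one has the schema."""
    if key in cache:
        return cache[key]
    for lookup in registries:  # assumed to be pre-sorted by priority
        schema = lookup(key)
        if schema is not None:
            cache[key] = schema  # later lookups for this key skip the registries
            return schema
    return None
```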

Caching

The resolver caches both successful and failed lookups, keyed per registry:

A schema fetched successfully stays cached until it's evicted, either by the LRU algorithm once the cache reaches cacheSize, or when it expires after cacheTtl seconds.
If a registry responds with "not found", that result is also cached, so the registry won't be queried again for that schema until the entry is evicted.
If a registry responds with another error (timeout, network error, server fault), the resolver retries that registry up to three more times before marking the schema as missing for that registry.
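These per-registry rules can be sketched with an LRU cache keyed by (registry, schema) that stores both hits and a NOT_FOUND marker. This is an illustration, not the Iglu client's implementation. The exception classes and function names are hypothetical, and TTL expiry is omitted.

```python
from collections import OrderedDict
from typing import Callable, Optional, Tuple

NOT_FOUND = object()  # sentinel cached for "not found" / "missing" results


class NotFound(Exception):
    """Hypothetical: the registry answered that it doesn't have the schema."""


class TransientError(Exception):
    """Hypothetical: timeout, network error or server fault."""


class RegistryCache:
    """LRU cache keyed by (registry name, schema key), storing hits and misses."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._entries: "OrderedDict[Tuple[str, str], object]" = OrderedDict()

    def get(self, registry: str, key: str) -> Optional[object]:
        entry = self._entries.get((registry, key))
        if entry is not None:
            self._entries.move_to_end((registry, key))  # mark as recently used
        return entry

    def put(self, registry: str, key: str, value: object) -> None:
        self._entries[(registry, key)] = value
        self._entries.move_to_end((registry, key))
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used


def fetch_from_registry(
    cache: RegistryCache,
    registry: str,
    key: str,
    fetch: Callable[[str], dict],
    extra_retries: int = 3,
) -> Optional[dict]:
    """Return the schema from one registry, or None if it's missing there."""
    cached = cache.get(registry, key)
    if cached is not None:
        return None if cached is NOT_FOUND else cached
    for _ in range(1 + extra_retries):  # first attempt plus up to 3 retries
        try:
            schema = fetch(key)
        except NotFound:
            break  # definitive "not found": cache it without retrying
        except TransientError:
            continue  # timeout/network/server error: try again
        cache.put(registry, key, schema)
        return schema
    cache.put(registry, key, NOT_FOUND)  # marked missing for this registry
    return None
```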

If cacheTtl is set, cache entries (both successful fetches and "not found" results) are re-resolved after the TTL expires. This lets you patch schemas without restarting the pipeline (though patching production schemas isn't recommended). For real-time pipelines, cacheTtl also prevents stale "not found" results from persisting for too long — for example, if a schema was missing when first looked up but has since been uploaded.
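The TTL behaviour can be sketched with an injectable clock, which makes the stale "not found" scenario easy to reproduce. A schema that was missing at the first lookup keeps resolving as missing until its cache entry is older than cacheTtl. The class is hypothetical and ignores cacheSize. A cached value of None stands for "not found".

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlSchemaCache:
    """Entries (schemas, or None for "not found") expire after ttl seconds."""

    def __init__(self, ttl: Optional[float], clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl        # None: entries never expire by age
        self.clock = clock    # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Optional[dict]]] = {}

    def lookup(self, key: str, resolve: Callable[[str], Optional[dict]]) -> Optional[dict]:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None:
            stored_at, value = entry
            if self.ttl is None or now - stored_at < self.ttl:
                return value  # still fresh, even if it's a cached "not found"
        value = resolve(key)  # absent or expired: ask the registry again
        self._entries[key] = (now, value)
        return value
```

For example, with ttl=600 and a failed lookup at t=0, a schema uploaded at t=300 becomes visible on the first lookup at or after t=600.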

Registry priority

For each schema lookup, registries are sorted by:

vendorPrefixes — the resolver checks registries with a matching vendorPrefix first. Other registries aren't skipped, just queried later.
Class priority — a hardcoded value per registry type. Embedded registries (which read schemas from the local filesystem or from resources bundled with the application, rather than over HTTP) are always checked before HTTP registries within the same vendorPrefix match.
priority — the user-defined value in your config. Only affects ordering within the same class priority.

Lower numbers mean higher priority: registries with priorities [0, 1, 2, 3] are checked from left to right.
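The three sort criteria can be expressed as a single tuple sort key. This is a minimal sketch under two assumptions. First, the numeric class priorities are hypothetical; only their order (embedded before HTTP) comes from the text. Second, a vendor prefix is treated as matching when the schema's vendor starts with it.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical class-priority values: only the order matters (embedded first).
CLASS_PRIORITY = {"embedded": 0, "http": 100}


@dataclass(frozen=True)
class Registry:
    name: str
    kind: str                         # "embedded" or "http"
    priority: int                     # user-defined; lower is checked earlier
    vendor_prefixes: Tuple[str, ...]


def lookup_order(registries: List[Registry], vendor: str) -> List[Registry]:
    """Sort by vendorPrefixes match, then class priority, then priority."""
    def sort_key(r: Registry) -> Tuple[int, int, int]:
        # Assumption: prefix match means the vendor string starts with the prefix.
        matches = any(vendor.startswith(p) for p in r.vendor_prefixes)
        return (0 if matches else 1, CLASS_PRIORITY[r.kind], r.priority)
    return sorted(registries, key=sort_key)
```

Because Python's sorted is stable, registries that tie on all three criteria keep their configured order.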
