Referrer parser enrichment

This enrichment uses the Snowplow referer-parser library to extract attribution data from referrer URLs.

Knowing which sites refer users to your website is a staple of analytics and helps you understand traffic patterns. This enrichment takes the referring URL and matches it to the company or site it belongs to.

This is particularly useful when looking for traffic from specific search engines or social networks. Rather than leaving you to scour a full list of referrer URLs, the enrichment adds fields that name the referrer, so reports can combine the many subdomains of the bigger referrers into a single entry.

Configuration example

Testing with Micro

Unsure if your enrichment configuration is correct or works as expected? You can easily test it using Snowplow Micro, either through Console or on your machine.

Internal domains

Snowplow has several subdomains, such as community.snowplow.io and docs.snowplow.io. As users move from these subdomains to our main snowplow.io domain, we would like to capture that traffic as being referred internally. To do so, we would set internalDomains in the example configuration as follows:

```json
"internalDomains": [
  "community.snowplow.io",
  "docs.snowplow.io"
],
```

Enabling this enrichment with the above configuration fills the refr_medium column in the data warehouse with "Internal" (rather than "Unknown") when the host of the referring URL matches one of the subdomains above.

note

The enrichment also classifies refr_medium as Internal when an event's page_urlhost matches its refr_urlhost, regardless of the configured internalDomains. This behavior is not configurable. To change it, you need to handle it in your data models or in a JavaScript enrichment.
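
The two rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the enrichment's actual code: the function name `is_internal_referrer` is hypothetical, and exact, case-insensitive host matching is an assumption.

```python
def is_internal_referrer(page_urlhost, refr_urlhost, internal_domains=()):
    """Return True when a referrer host should be treated as internal.

    Hypothetical helper: assumes exact, case-insensitive host comparison.
    """
    if not refr_urlhost:
        return False
    refr = refr_urlhost.lower()
    # Rule from the note: a referrer on the same host as the page is
    # always internal, whatever internalDomains contains.
    if page_urlhost and refr == page_urlhost.lower():
        return True
    # Rule from the configuration: the host is listed in internalDomains.
    return refr in {domain.lower() for domain in internal_domains}


internal = ["community.snowplow.io", "docs.snowplow.io"]
print(is_internal_referrer("snowplow.io", "docs.snowplow.io", internal))
print(is_internal_referrer("snowplow.io", "snowplow.io"))
```

A helper like this can be handy in a data model when you want to flag which rows were marked internal only because the page and referrer hosts matched.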

Custom referrer mappings

Availability

This feature is available in Enrich version 6.9.0 and later.

By default, the enrichment classifies referrers using a hosted database of known sources. You can add your own referrer-to-category mappings directly in the enrichment configuration using the optional referrers parameter. This is useful when you need to classify new traffic sources (such as internal tools, niche search engines, or AI chatbots) without waiting for changes to the upstream database.

Custom mappings take precedence over the default database. If a domain appears in both your custom mappings and the default database, the custom mapping is used.
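
As a rough illustration of this precedence rule, the sketch below resolves a host against two lookup tables and checks the custom one first. The table contents, the flat `host -> (medium, source)` layout, and the function name `resolve` are all hypothetical and chosen only to keep the example short. They do not reflect how the enrichment stores its data.

```python
# Hypothetical entries, for illustration only.
default_db = {
    "chat.example.com": ("unknown", "Example Chat"),
    "www.example.org": ("search", "Example Search"),
}
custom_mappings = {
    "chat.example.com": ("search", "Example Chat"),
}


def resolve(host, custom, default):
    """Look a host up in the custom mappings first, then in the default database."""
    if host in custom:
        return custom[host]
    return default.get(host)


print(resolve("chat.example.com", custom_mappings, default_db))
print(resolve("www.example.org", custom_mappings, default_db))
```

The first lookup returns the custom classification even though the default table also knows the host. The second falls through to the default table.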

The referrers parameter is a nested object structured as follows:

```json
"referrers": {
  "<medium>": {
    "<source name>": {
      "domains": ["<domain1>", "<domain2>"],
      "parameters": ["<param1>"]
    }
  }
}
```

| Field | Description |
| --- | --- |
| <medium> | The referrer category (e.g., search, social, email). This value populates refr_medium. |
| <source name> | A human-readable name for the source (e.g., "Google", "Internal Search"). This value populates refr_source. |
| domains | An array of hostnames to match against the referrer URL. At least one domain is required. |
| parameters | An optional array of URL query parameter names to extract search terms from. Matched values populate refr_term. |
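
Before deploying a configuration, it can help to check a referrers object against the rules in the table above. The following is a minimal sketch of such a check, assuming the object has already been loaded into a Python dict. The function `validate_referrers` is hypothetical and is not the validation that Enrich itself performs.

```python
def validate_referrers(referrers):
    """Return a list of problems found in a `referrers` mapping (empty if none)."""
    problems = []
    for medium, sources in referrers.items():
        for source, spec in sources.items():
            where = f"{medium} / {source}"
            # "domains" is required and must list at least one hostname.
            domains = spec.get("domains")
            if not isinstance(domains, list) or not domains:
                problems.append(f"{where}: 'domains' must be a non-empty array")
            # "parameters" is optional, but must be an array when present.
            if not isinstance(spec.get("parameters", []), list):
                problems.append(f"{where}: 'parameters' must be an array")
    return problems


good = {"search": {"Corporate Search": {"domains": ["search.example.com"],
                                        "parameters": ["q", "query"]}}}
bad = {"social": {"Internal Forum": {"domains": []}}}

print(validate_referrers(good))
print(validate_referrers(bad))
```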

For example, to classify a custom search engine and a social network:

```json
"referrers": {
  "search": {
    "Corporate Search": {
      "domains": ["search.example.com"],
      "parameters": ["q", "query"]
    }
  },
  "social": {
    "Internal Forum": {
      "domains": ["forum.example.com"]
    }
  }
}
```

With this configuration, a referrer URL of https://search.example.com/?q=snowplow would produce the following:

| Field | Value |
| --- | --- |
| refr_medium | search |
| refr_source | Corporate Search |
| refr_term | snowplow |

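
To make the mapping from URL to fields concrete, here is a small Python sketch that applies the example configuration to a referrer URL. It is an approximation for illustration only: the function `classify` is hypothetical, and exact host matching with first-matching-parameter extraction is an assumption about behavior rather than a description of the referer-parser library.

```python
from urllib.parse import urlparse, parse_qs

# The example configuration from above, as a Python dict.
REFERRERS = {
    "search": {
        "Corporate Search": {
            "domains": ["search.example.com"],
            "parameters": ["q", "query"],
        }
    },
    "social": {
        "Internal Forum": {"domains": ["forum.example.com"]}
    },
}


def classify(referrer_url, referrers):
    """Return refr_* fields for a referrer URL, or None if no mapping matches."""
    parsed = urlparse(referrer_url)
    host = (parsed.hostname or "").lower()
    query = parse_qs(parsed.query)
    for medium, sources in referrers.items():
        for source, spec in sources.items():
            if host not in spec["domains"]:
                continue
            # Take the first configured parameter present in the query string.
            term = None
            for name in spec.get("parameters", []):
                if name in query:
                    term = query[name][0]
                    break
            return {"refr_medium": medium, "refr_source": source, "refr_term": term}
    return None


print(classify("https://search.example.com/?q=snowplow", REFERRERS))
```

For a source without parameters, such as the forum entry, the sketch leaves refr_term empty.
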
Contributing mappings upstream

You can use custom referrer mappings to immediately test new categorizations in your pipeline. Once validated, consider contributing your mappings back to the upstream referer-parser database via a pull request.

Output

This enrichment populates the following fields of the atomic event:

| Field | Purpose |
| --- | --- |
| refr_medium | Type of referrer. Examples: Search, Internal, Unknown, Social, Email |
| refr_source | Name of the referrer, if recognised. Examples: Google, Facebook |
| refr_term | Keywords, if the source is a search engine |

With this information in the data warehouse, you can produce reports such as the following:

| refr_medium | Number of sessions |
| --- | --- |
| Search | 272,699 |
| Internal | 142,555 |
| Unknown | 127,335 |
| Social | 14,525 |
| Email | 5,345 |
