IP Lookup enrichment

Summary

This enrichment uses MaxMind databases to look up useful data based on the IP address collected by your Snowplow tracker(s).

Overview

When a user browses your site or app, their IP address is collected. MaxMind maintains databases of additional information publicly associated with IP addresses, such as geographic location, second-level domain name (e.g. acme.com), Internet Service Provider (ISP), organization name, and several other data points.

The IP Lookup enrichment looks up the collected IP address in these MaxMind databases and adds the resulting data points to every event sent from that IP address.

Some of the databases MaxMind maintains require a commercial subscription with MaxMind.

Setting up this Enrichment

1. Decide which databases you’d like to use and download them

MaxMind offers five databases that can be used with Snowplow. One is free:

  • GeoLite2 City, which contains geographic information about the IP address

And four require a paid subscription:

  • GeoIP2 City, which also contains geographic information, but with much greater precision and coverage than the free GeoLite2 City database
  • GeoIP2 ISP, which contains information about the ISP serving that IP address
  • GeoIP2 Domain, which contains information about the domain at that IP address
  • GeoIP2 Connection Type, which contains information about the connection type at that IP address

Decide which of the MaxMind databases listed above you want to enrich your data with, download the corresponding .mmdb files, and then set up the enrichment configuration accordingly.

2. Upload the databases to a location on your cloud

Once downloaded, take the .mmdb file(s) and upload them to a location on your cloud:

  • Amazon S3 (if running Snowplow on AWS) e.g. s3://my-private-bucket/third-party/maxmind
  • Google Cloud Storage (if running Snowplow on GCP) e.g. gs://my-private-bucket/third-party/maxmind

When a database needs updating in the future, download the latest version and overwrite the existing file in your storage.

MaxMind also offers a way to download and update its databases programmatically.

3. Configure the enrichment for your pipeline

Testing with Micro

Unsure if your enrichment configuration is correct or works as expected? You can easily test it using Snowplow Micro on your machine. Follow the Micro usage guide to set up Micro and configure it to use your enrichment.

Note that to test this enrichment, you will need events with realistic IP addresses (not local ones like 192.168.0.42).

If you are using a web browser to test your site or app, you can spoof a specific IP address using a browser plugin that sets an X-Forwarded-For header. For example, here are plugins for Chrome and Firefox. Install the plugin and set the IP address to your liking.

Alternatively, you can set up Micro to receive external IP addresses.
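If you would rather script a test than use a browser plugin, the same idea (setting an X-Forwarded-For header) can be applied to a hand-built request. The sketch below assumes Micro is listening on its default local port 9090 and accepts GET requests on the /i pixel endpoint; the tracker-protocol parameters (e=pv, p=web, tv, url) are assumed to be a minimal page view, and the tv value "manual-test" is a hypothetical label. It only builds the request, so it does not need Micro to be running.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Micro runs locally on its default port and serves the /i endpoint.
MICRO_ENDPOINT = "http://localhost:9090/i"


def build_test_request(spoofed_ip, page_url="https://example.com/"):
    """Build (but do not send) a GET request for a minimal page view event.

    The X-Forwarded-For header carries the IP address that the IP Lookup
    enrichment should look up instead of your local address.
    """
    params = {
        "e": "pv",            # event type: page view
        "p": "web",           # platform
        "tv": "manual-test",  # hypothetical tracker version label
        "url": page_url,      # page URL of the event
    }
    return Request(
        f"{MICRO_ENDPOINT}?{urlencode(params)}",
        headers={"X-Forwarded-For": spoofed_ip},
        method="GET",
    )


req = build_test_request("37.157.33.178")
```

To actually send it, pass the request to urllib.request.urlopen while Micro is running, then inspect the geo_ fields of the resulting event among Micro's good events (for example via its /micro/good endpoint).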

There are four possible fields you can add to the “parameters” section of the enrichment configuration JSON: “geo”, “isp”, “domain”, and “connectionType”. Each one takes an object with two subfields:

  • database: the name of the MaxMind database file.
  • uri: the URI of the bucket where the database file is stored. The scheme must be http:, s3:, or gs:, and the URI must not end with a trailing slash.
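The two rules for the uri subfield can be checked before deploying a configuration. The following is a minimal sketch, not Snowplow's own validation: it encodes exactly the three schemes and the trailing-slash rule stated above, and the function name uri_problems is hypothetical.

```python
from urllib.parse import urlparse

# Exactly the schemes listed above for the "uri" subfield.
ALLOWED_SCHEMES = {"http", "s3", "gs"}


def uri_problems(uri):
    """Return a list of problems with an enrichment "uri" value (empty if none)."""
    problems = []
    scheme = urlparse(uri).scheme
    if scheme not in ALLOWED_SCHEMES:
        problems.append(f"unsupported scheme {scheme!r}")
    if uri.endswith("/"):
        problems.append("trailing slash")
    return problems
```

For example, uri_problems("s3://my-private-bucket/third-party/maxmind") returns an empty list, while the same URI with a trailing "/" is reported.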

Note that the database subfield only accepts the filenames listed below for each parameter. If you provide any other filename, the enrichment JSON will fail validation.

| Enrichment parameter | Valid database names |
|---|---|
| geo | "GeoLite2-City.mmdb" (free), "GeoIP2-City.mmdb" (paid) |
| isp | "GeoIP2-ISP.mmdb" |
| domain | "GeoIP2-Domain.mmdb" |
| connectionType | "GeoIP2-Connection-Type.mmdb" |
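The table above can be turned into a quick pre-flight check on the "parameters" object. This is a minimal sketch that mirrors the table; it is not the enrichment's schema validation, and the names VALID_DATABASES and invalid_databases are hypothetical.

```python
# Accepted filenames per enrichment parameter, copied from the table above.
VALID_DATABASES = {
    "geo": {"GeoLite2-City.mmdb", "GeoIP2-City.mmdb"},
    "isp": {"GeoIP2-ISP.mmdb"},
    "domain": {"GeoIP2-Domain.mmdb"},
    "connectionType": {"GeoIP2-Connection-Type.mmdb"},
}


def invalid_databases(parameters):
    """Map each parameter whose "database" value is not accepted to that value.

    Unknown parameter names have no accepted filenames, so they are reported too.
    """
    bad = {}
    for name, settings in parameters.items():
        allowed = VALID_DATABASES.get(name, set())
        database = settings.get("database")
        if database not in allowed:
            bad[name] = database
    return bad
```

An empty result means every configured database filename appears in the table.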

Configuration

Example minimal configuration

On AWS
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoLite2-City.mmdb",
        "uri": "s3://my-private-bucket/third-party/maxmind"
      }
    }
  }
}
On GCS
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoLite2-City.mmdb",
        "uri": "gs://my-private-bucket/third-party/maxmind"
      }
    }
  }
}

In the configurations above, the enrichment takes the IP address from each event and looks it up in the GeoLite2-City.mmdb database.

The parameters start with the type of MaxMind database being accessed (in this case “geo”), followed by the name of the database file and the URI where it is stored.

When configuring the enrichment, replace my-private-bucket/third-party/maxmind with the path to your hosted database files.
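Because the AWS and GCP examples differ only in the URI scheme, the minimal configuration can be generated from your own bucket path. This is a sketch for illustration; the function name minimal_ip_lookups_config is hypothetical, and it simply reproduces the JSON shown above with your path substituted.

```python
import json


def minimal_ip_lookups_config(base_uri, database="GeoLite2-City.mmdb"):
    """Build the minimal ip_lookups configuration for a given bucket path.

    base_uri is e.g. "s3://..." on AWS or "gs://..." on GCP. Any trailing
    slash is removed, because the uri must not end with one.
    """
    return {
        "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
        "data": {
            "name": "ip_lookups",
            "vendor": "com.snowplowanalytics.snowplow",
            "enabled": True,
            "parameters": {
                "geo": {"database": database, "uri": base_uri.rstrip("/")},
            },
        },
    }


# Usage: print the configuration JSON for a GCS-hosted database.
print(json.dumps(minimal_ip_lookups_config("gs://my-private-bucket/third-party/maxmind"), indent=2))
```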

If this enrichment were enabled as shown, the following columns in your data warehouse would be populated for a user with the IP address 37.157.33.178:

| Column name | Sample data | Purpose |
|---|---|---|
| geo_country | GB | Country code of IP origin |
| geo_region | ENG | Region code of IP origin |
| geo_city | London | City of IP origin |
| geo_zipcode | EC2A | Zip (postal) code of IP origin |
| geo_latitude | 51.5237 | Approximate latitude |
| geo_longitude | -0.089 | Approximate longitude |
| geo_region_name | England | Region name of IP origin |
| geo_timezone | Europe/London | Time zone of IP origin |
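To see where these values come from, the sketch below flattens a decoded City record into the geo_ columns. It assumes the record is a plain dict with MaxMind's City structure (country.iso_code, subdivisions, city.names, postal.code, location.latitude/longitude/time_zone), such as the maxminddb Python package returns from Reader.get(ip). It is not Snowplow's implementation: using English ("en") names and the first (top-level) subdivision for geo_region and geo_region_name are assumptions chosen to match the sample row above.

```python
def geo_columns(record):
    """Flatten a decoded GeoIP2/GeoLite2 City record into geo_* column values.

    Assumptions: English ("en") names are used, and the first (top-level)
    subdivision supplies geo_region and geo_region_name.
    """
    subdivisions = record.get("subdivisions") or [{}]
    region = subdivisions[0]
    location = record.get("location", {})
    return {
        "geo_country": record.get("country", {}).get("iso_code"),
        "geo_region": region.get("iso_code"),
        "geo_city": record.get("city", {}).get("names", {}).get("en"),
        "geo_zipcode": record.get("postal", {}).get("code"),
        "geo_latitude": location.get("latitude"),
        "geo_longitude": location.get("longitude"),
        "geo_region_name": region.get("names", {}).get("en"),
        "geo_timezone": location.get("time_zone"),
    }
```

Fields missing from the record come back as None, which corresponds to an empty column.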

Example full configuration

To use the additional databases offered by MaxMind, add a parameter entry for each one:

On AWS
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoIP2-City.mmdb",
        "uri": "s3://my-private-bucket/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIP2-ISP.mmdb",
        "uri": "s3://my-private-bucket/third-party/maxmind"
      },
      "domain": {
        "database": "GeoIP2-Domain.mmdb",
        "uri": "s3://my-private-bucket/third-party/maxmind"
      },
      "connectionType": {
        "database": "GeoIP2-Connection-Type.mmdb",
        "uri": "s3://my-private-bucket/third-party/maxmind"
      }
    }
  }
}
On GCS
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoIP2-City.mmdb",
        "uri": "gs://my-private-bucket/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIP2-ISP.mmdb",
        "uri": "gs://my-private-bucket/third-party/maxmind"
      },
      "domain": {
        "database": "GeoIP2-Domain.mmdb",
        "uri": "gs://my-private-bucket/third-party/maxmind"
      },
      "connectionType": {
        "database": "GeoIP2-Connection-Type.mmdb",
        "uri": "gs://my-private-bucket/third-party/maxmind"
      }
    }
  }
}
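With four databases configured, it helps to list the exact files you need to upload. The sketch below assumes each file is fetched from its uri joined to its database name with a single "/" (which would explain why the uri must not end with a slash); treat that joining rule as an assumption rather than a documented guarantee. The function name database_locations is hypothetical.

```python
import json


def database_locations(config_json):
    """Return {parameter: "<uri>/<database>"} for each configured MaxMind database.

    Assumption: the enrichment locates each file by joining its uri and
    database name with a single "/".
    """
    parameters = json.loads(config_json)["data"]["parameters"]
    return {
        name: f"{settings['uri']}/{settings['database']}"
        for name, settings in parameters.items()
    }
```

Each returned location should point at a .mmdb file that exists in your bucket.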

The data from these databases would then be loaded into the following columns:

| Column name | Purpose |
|---|---|
| ip_isp | ISP name |
| ip_organization | Organization name for larger networks |
| ip_domain | Second-level domain name |
| ip_netspeed | Indication of connection type (dial-up, cellular, cable/DSL) |
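The mapping from the paid databases to these columns can be sketched the same way. The record keys used here (isp and organization in ISP records, domain in Domain records, connection_type in Connection-Type records) follow MaxMind's record structure; the column mapping follows the table above. This is an illustrative sketch with a hypothetical function name, not Snowplow's code.

```python
def ip_columns(isp_record=None, domain_record=None, connection_record=None):
    """Flatten decoded GeoIP2 ISP, Domain and Connection-Type records into ip_* columns.

    Any record that is missing (for example, the database is not configured)
    leaves its columns as None.
    """
    isp_record = isp_record or {}
    return {
        "ip_isp": isp_record.get("isp"),
        "ip_organization": isp_record.get("organization"),
        "ip_domain": (domain_record or {}).get("domain"),
        "ip_netspeed": (connection_record or {}).get("connection_type"),
    }
```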

Output

This enrichment populates the atomic table fields prefixed with geo_ and ip_.