IP Lookup enrichment
Summary
This enrichment uses MaxMind databases to look up useful data based on the IP address collected by your Snowplow tracker(s).
Overview
When a user browses your site or app, their IP address is collected. MaxMind maintains databases of information publicly associated with IP addresses, such as geographic location, second-level domain name (e.g. acme.com), Internet Service Provider (ISP), organization name and several other data points.
The IP lookup enrichment uses these MaxMind databases to look up the collected IP address and add the matching data points to every event sent from that address.
Some of these databases require a commercial subscription with MaxMind.
Setting up this Enrichment
1. Decide which databases you’d like to use and download them
MaxMind offers five databases with IP address information that can be used with Snowplow. One is free:
- GeoLite2 City, which contains geographic information (e.g. country) by IP address
The other four require a paid subscription:
- GeoIP2 City, which also contains geographic information, but with much greater precision and coverage than GeoLite2 City
- GeoIP2 ISP, which contains information about the ISP serving that IP address
- GeoIP2 Domain, which contains information about the domain at that IP address
- GeoIP2 Connection Type, which contains information about the connection type at that IP address
Decide which of these MaxMind databases you want to enrich your data with, download the .mmdb files, and then set up the enrichment configuration accordingly.
2. Upload the databases to a location on your cloud
Once downloaded, upload the .mmdb file(s) to a location in your cloud storage:
- Amazon S3 (if running Snowplow on AWS), e.g. s3://my-private-bucket/third-party/maxmind
- Google Cloud Storage (if running Snowplow on GCP), e.g. gs://my-private-bucket/third-party/maxmind
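The enrichment configuration in step 3 points at this location with a "uri" (the folder) and a "database" (the filename). The Python sketch below shows how the two might combine into the bucket and object key you upload to. It assumes the enrichment appends "/" plus the filename to the uri, which is consistent with the rule that the uri must not end with a trailing slash. The function name database_location is hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlparse


def database_location(uri: str, database: str) -> tuple[str, str, str]:
    """Split an enrichment "uri" plus "database" into (scheme, bucket, object key).

    Assumption: the file is resolved as uri + "/" + database, which is why the
    uri itself must not end with a trailing slash.
    """
    if uri.endswith("/"):
        raise ValueError("uri must not end with a trailing slash")
    parsed = urlparse(f"{uri}/{database}")
    # netloc is the bucket for s3:// and gs:// URIs (the host for http://)
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")
```

For example, database_location("s3://my-private-bucket/third-party/maxmind", "GeoLite2-City.mmdb") returns ("s3", "my-private-bucket", "third-party/maxmind/GeoLite2-City.mmdb").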
When the database(s) need updating in the future, download the latest version and overwrite the existing file(s) in your storage.
MaxMind also offers a way to download and update its databases programmatically.
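MaxMind typically distributes each database as a .tar.gz archive in which the .mmdb file sits inside a dated folder (for example GeoLite2-City_20240101/GeoLite2-City.mmdb). If you script the update, the minimal Python sketch below pulls the .mmdb out of such an archive. It assumes you have already downloaded the archive bytes. The function extract_mmdb is hypothetical, and downloading and uploading the file are left to your own tooling.

```python
import io
import tarfile


def extract_mmdb(archive_bytes: bytes, database: str) -> bytes:
    """Return the contents of `database` (e.g. "GeoLite2-City.mmdb") from a .tar.gz archive.

    Members are matched by basename because the file is assumed to live inside
    a dated directory whose name changes with every release.
    """
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.rsplit("/", 1)[-1] == database:
                extracted = tar.extractfile(member)
                if extracted is not None:
                    return extracted.read()
    raise FileNotFoundError(f"{database} not found in archive")
```

The returned bytes can then be written over the existing object at the same path in your bucket, so the enrichment configuration does not need to change.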
3. Configure the enrichment for your pipeline
Unsure if your enrichment configuration is correct or works as expected? You can easily test it using Snowplow Micro on your machine. Follow the Micro usage guide to set up Micro and configure it to use your enrichment.
Note that to test this enrichment, you will need events with realistic IP addresses (not local ones like 192.168.0.42).
If you are using a web browser to test your site or app, you can spoof a specific IP address using a browser plugin that sets an X-Forwarded-For header. Such plugins are available for Chrome and Firefox; install one and set the IP address you want to test with.
Alternatively, you can set up Micro to receive external IP addresses.
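If you prefer to script test events instead of using a browser plugin, the Python sketch below builds a tracker-protocol page view request with a spoofed X-Forwarded-For header. It assumes Micro is listening on its default port, localhost:9090, and accepts GET requests on the collector's /i endpoint. The function name and the tracker version value are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_test_event(ip: str, collector: str = "http://localhost:9090") -> Request:
    """Build (but do not send) a page view GET request that spoofs the client IP.

    Assumes the collector (here Snowplow Micro) honours X-Forwarded-For
    and accepts tracker-protocol parameters on /i.
    """
    params = {
        "e": "pv",            # event type: page view
        "p": "web",           # platform
        "tv": "manual-test",  # tracker version (hypothetical value)
        "url": "https://example.com/",
    }
    return Request(
        f"{collector}/i?{urlencode(params)}",
        headers={"X-Forwarded-For": ip},
        method="GET",
    )
```

To send it, pass the request to urllib.request.urlopen. Then check in Micro whether the event's geo_ fields were populated for the address you chose, for example 37.157.33.178.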
There are four possible fields you can add to the “parameters” section of the enrichment configuration JSON: “geo”, “isp”, “domain”, and “connectionType”. Each one is an object with two subfields:
- The “database” field contains the name of the MaxMind database file.
- The “uri” field contains the URI of the bucket in which the database file is found. It can use "http:", "s3:" or "gs:" as the scheme and must not end with a trailing slash.
Only the filenames listed in the table below are accepted in the “database” subfield. If you provide any other filename, the enrichment JSON will fail validation.
ENRICHMENT PARAMETER | VALID DATABASE NAMES |
---|---|
geo | "GeoLite2-City.mmdb" (free) or "GeoIP2-City.mmdb" (paid) |
isp | "GeoIP2-ISP.mmdb" |
domain | "GeoIP2-Domain.mmdb" |
connectionType | "GeoIP2-Connection-Type.mmdb" |
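To catch invalid filenames and URIs before deploying, you could check the configuration locally. The Python sketch below validates an ip_lookups configuration against the table above and the “uri” rules. It is a hypothetical helper, not part of Snowplow, and does not replace validation against the Iglu schema.

```python
from __future__ import annotations

import json

# Valid filenames per enrichment parameter, copied from the table above.
VALID_DATABASES = {
    "geo": {"GeoLite2-City.mmdb", "GeoIP2-City.mmdb"},
    "isp": {"GeoIP2-ISP.mmdb"},
    "domain": {"GeoIP2-Domain.mmdb"},
    "connectionType": {"GeoIP2-Connection-Type.mmdb"},
}
# URI schemes listed for the "uri" field.
ALLOWED_SCHEMES = ("http:", "s3:", "gs:")


def check_ip_lookups(config_json: str) -> list[str]:
    """Return a list of problems found in an ip_lookups configuration (empty if none)."""
    params = json.loads(config_json)["data"]["parameters"]
    problems = []
    for name, entry in params.items():
        if name not in VALID_DATABASES:
            problems.append(f"unknown parameter: {name}")
            continue
        database = entry.get("database")
        if database not in VALID_DATABASES[name]:
            problems.append(f"{name}: invalid database {database!r}")
        uri = entry.get("uri", "")
        if not uri.startswith(ALLOWED_SCHEMES):
            problems.append(f"{name}: unsupported uri scheme in {uri!r}")
        if uri.endswith("/"):
            problems.append(f"{name}: uri must not end with a trailing slash")
    return problems
```

Run it over the file contents before uploading the configuration. An empty list means the filenames and URIs pass these local checks.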
Configuration
Example minimal configuration
On AWS
{
"schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
"data": {
"name": "ip_lookups",
"vendor": "com.snowplowanalytics.snowplow",
"enabled": true,
"parameters": {
"geo": {
"database": "GeoLite2-City.mmdb",
"uri": "s3://my-private-bucket/third-party/maxmind"
}
}
}
}
On GCS
{
"schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
"data": {
"name": "ip_lookups",
"vendor": "com.snowplowanalytics.snowplow",
"enabled": true,
"parameters": {
"geo": {
"database": "GeoLite2-City.mmdb",
"uri": "gs://my-private-bucket/third-party/maxmind"
}
}
}
}
In the configurations above, we enable the enrichment to look up the IP address of every event in the GeoLite2-City.mmdb database.
The “parameters” object is keyed by the type of MaxMind database being accessed (in this case “geo”). Inside it, we specify the name of the database file and the URI where it is available.
When configuring the enrichment, replace my-private-bucket/third-party/maxmind with the path to your hosted database(s).
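Only the bucket path and the filenames vary between setups, so you could generate the configuration instead of editing it by hand. The Python sketch below builds the same structure as the JSON examples above for any base URI and set of databases. The helper ip_lookups_config is hypothetical.

```python
from __future__ import annotations

SCHEMA = "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0"


def ip_lookups_config(base_uri: str, databases: dict[str, str]) -> dict:
    """Build an ip_lookups enrichment configuration.

    base_uri:  folder holding the .mmdb files, e.g. "s3://my-private-bucket/third-party/maxmind"
    databases: parameter name -> filename, e.g. {"geo": "GeoLite2-City.mmdb"}
    """
    base_uri = base_uri.rstrip("/")  # strip any trailing slash, which the uri must not have
    return {
        "schema": SCHEMA,
        "data": {
            "name": "ip_lookups",
            "vendor": "com.snowplowanalytics.snowplow",
            "enabled": True,
            "parameters": {
                param: {"database": db, "uri": base_uri}
                for param, db in databases.items()
            },
        },
    }
```

For instance, ip_lookups_config("gs://my-private-bucket/third-party/maxmind", {"geo": "GeoLite2-City.mmdb"}) yields the On GCS example above. Serialize it with json.dumps(config, indent=2) to get the file contents.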
If we enabled this enrichment as shown, the following columns in the data warehouse would be populated for a user with the IP address 37.157.33.178:
COLUMN NAME | SAMPLE DATA | PURPOSE |
---|---|---|
geo_country | GB | Country of IP origin |
geo_region | ENG | Region code of IP origin |
geo_city | London | City of IP origin |
geo_zipcode | EC2A | Zip (postal) code of IP origin |
geo_latitude | 51.5237 | Approximate latitude of IP origin |
geo_longitude | -0.089 | Approximate longitude of IP origin |
geo_region_name | England | Region name of IP origin |
geo_timezone | Europe/London | Timezone of IP origin |
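The Python sketch below flattens a GeoIP2/GeoLite2 City record into the columns above, to show where these values come from. It assumes the record has the nested shape that MaxMind's maxminddb reader returns for City databases (for example from maxminddb.open_database(path).get(ip)). It also assumes that the first subdivision is the region and that English names are wanted. It is illustrative only; Snowplow's own mapping may differ in detail.

```python
def geo_columns(record: dict) -> dict:
    """Map a City database record onto the geo_* columns (missing values become None)."""
    region = (record.get("subdivisions") or [{}])[0]  # assume the first subdivision is the region
    location = record.get("location", {})
    return {
        "geo_country": record.get("country", {}).get("iso_code"),
        "geo_region": region.get("iso_code"),
        "geo_city": record.get("city", {}).get("names", {}).get("en"),
        "geo_zipcode": record.get("postal", {}).get("code"),
        "geo_latitude": location.get("latitude"),
        "geo_longitude": location.get("longitude"),
        "geo_region_name": region.get("names", {}).get("en"),
        "geo_timezone": location.get("time_zone"),
    }
```

Given a hypothetical record carrying the sample values from the table, this returns the same column values shown there.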
Example full configuration
To use the additional databases offered by MaxMind, add an entry for each one to the “parameters” object:
On AWS
{
"schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
"data": {
"name": "ip_lookups",
"vendor": "com.snowplowanalytics.snowplow",
"enabled": true,
"parameters": {
"geo": {
"database": "GeoIP2-City.mmdb",
"uri": "s3://my-private-bucket/third-party/maxmind"
},
"isp": {
"database": "GeoIP2-ISP.mmdb",
"uri": "s3://my-private-bucket/third-party/maxmind"
},
"domain": {
"database": "GeoIP2-Domain.mmdb",
"uri": "s3://my-private-bucket/third-party/maxmind"
},
"connectionType": {
"database": "GeoIP2-Connection-Type.mmdb",
"uri": "s3://my-private-bucket/third-party/maxmind"
}
}
}
}
On GCS
{
"schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
"data": {
"name": "ip_lookups",
"vendor": "com.snowplowanalytics.snowplow",
"enabled": true,
"parameters": {
"geo": {
"database": "GeoIP2-City.mmdb",
"uri": "gs://my-private-bucket/third-party/maxmind"
},
"isp": {
"database": "GeoIP2-ISP.mmdb",
"uri": "gs://my-private-bucket/third-party/maxmind"
},
"domain": {
"database": "GeoIP2-Domain.mmdb",
"uri": "gs://my-private-bucket/third-party/maxmind"
},
"connectionType": {
"database": "GeoIP2-Connection-Type.mmdb",
"uri": "gs://my-private-bucket/third-party/maxmind"
}
}
}
}
In addition to the geo_ columns above, data from the ISP, Domain and Connection Type databases would be loaded into the following columns:
COLUMN NAME | PURPOSE |
---|---|
ip_isp | ISP name |
ip_organization | Organization name for larger networks |
ip_domain | Second level domain name |
ip_netspeed | Indication of connection type (dial-up, cellular, cable/DSL) |
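Similarly, the Python sketch below shows how records from the ISP, Domain and Connection Type databases could map onto these columns. It assumes the field names used in MaxMind's GeoIP2 ISP ("isp", "organization"), Domain ("domain") and Connection Type ("connection_type") databases. It treats a missing record, for example from a database you have not configured, as empty. The function ip_columns is hypothetical.

```python
from __future__ import annotations


def ip_columns(isp: dict | None, domain: dict | None, conn: dict | None) -> dict:
    """Map ISP, Domain and Connection Type records onto the ip_* columns."""
    isp, domain, conn = isp or {}, domain or {}, conn or {}
    return {
        "ip_isp": isp.get("isp"),                    # from GeoIP2-ISP.mmdb
        "ip_organization": isp.get("organization"),  # also from GeoIP2-ISP.mmdb
        "ip_domain": domain.get("domain"),           # from GeoIP2-Domain.mmdb
        "ip_netspeed": conn.get("connection_type"),  # from GeoIP2-Connection-Type.mmdb
    }
```

Both ip_isp and ip_organization come from the same ISP record, so configuring only “isp” is enough to populate both columns.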
Output
This enrichment populates the atomic table fields prefixed with "geo_" and "ip_".