PII pseudonymization enrichment

The PII (personally identifiable information) pseudonymization enrichment runs after all other enrichments and pseudonymizes the fields that are configured as PII.

It helps Snowplow users protect the privacy rights of data subjects, which in turn supports compliance with data protection regulations.

In Europe, the obligations for handling personal data are outlined on the GDPR EU website.

Configuration

Testing with Micro

Unsure if your enrichment configuration is correct or works as expected? You can easily test it using Snowplow Micro on your machine. Follow the Micro usage guide to set up Micro and configure it to use your enrichment.

Two types of fields can be configured to be hashed:

  • pojo: field that is effectively a scalar field in the enriched event (full list of fields that can be pseudonymized here)
  • json: field contained inside a self-describing JSON (e.g. in unstruct_event)

In the example configuration, the user_id and user_ipaddress fields of the enriched event are hashed, as are the email and ip_opt fields of the unstructured event if its schema matches iglu:com.mailchimp/subscribe/jsonschema/1-*-*.
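As a rough sketch, a configuration matching this description could be assembled as shown below. The field names user_id, user_ipaddress, email and ip_opt, the schema criterion, and the pojo, json, pseudonymize, salt and emitEvent keys come from this page. The key names schemaCriterion, jsonPath and hashFunction, the schema URI, and the salt value are assumptions made for illustration and should be checked against the published configuration schema.

```python
import json

# Illustrative PII enrichment configuration built as a Python dict.
# Assumption: "schemaCriterion", "jsonPath", "hashFunction", the schema URI
# and the salt value are not taken from this page; verify them before use.
pii_config = {
    "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/2-0-0",
    "data": {
        "vendor": "com.snowplowanalytics.snowplow.enrichments",
        "name": "pii_enrichment_config",
        "enabled": True,
        "emitEvent": True,
        "parameters": {
            "pii": [
                # pojo: scalar fields of the enriched event
                {"pojo": {"field": "user_id"}},
                {"pojo": {"field": "user_ipaddress"}},
                # json: fields inside a self-describing JSON
                {
                    "json": {
                        "field": "unstruct_event",
                        "schemaCriterion": "iglu:com.mailchimp/subscribe/jsonschema/1-*-*",
                        "jsonPath": "$.data.['email', 'ip_opt']",
                    }
                },
            ],
            "strategy": {
                "pseudonymize": {"hashFunction": "SHA-256", "salt": "example-salt"}
            },
        },
    },
}

# Serialize to JSON, the format in which enrichment configurations are stored.
print(json.dumps(pii_config, indent=2))
```

Building the configuration as a dict and serializing it makes it easy to generate one entry per PII field from a list, rather than editing the JSON by hand.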

At the moment, only the "pseudonymize" strategy is available. It supports the following hashing algorithms:

  • MD2: the 128-bit algorithm MD2 (not recommended for performance reasons, see RFC 6149)
  • MD5: the 128-bit algorithm MD5
  • SHA-1: the 160-bit algorithm SHA-1
  • SHA-256: the 256-bit variant of the SHA-2 algorithm
  • SHA-384: the 384-bit variant of the SHA-2 algorithm
  • SHA-512: the 512-bit variant of the SHA-2 algorithm
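To make the digest sizes concrete, here is a minimal sketch of salted hashing with Python's standard hashlib module. It is not the enrichment's own code: the pseudonymize helper is hypothetical, and appending the salt to the value before hashing is an assumption about how the two are combined. MD2 is omitted because hashlib does not guarantee that it is available.

```python
import hashlib

# Map the algorithm names listed above to hashlib algorithm names.
HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def pseudonymize(value: str, salt: str, algorithm: str = "SHA-256") -> str:
    """Return the hex digest of value + salt (assumed combination order)."""
    digest = hashlib.new(HASHLIB_NAMES[algorithm])
    digest.update((value + salt).encode("utf-8"))
    return digest.hexdigest()


for name in HASHLIB_NAMES:
    hashed = pseudonymize("jane@example.com", "example-salt", name)
    # Each hex character encodes 4 bits of the digest.
    print(f"{name}: {len(hashed) * 4} bits, {len(hashed)} hex characters")
```

The output length depends only on the algorithm, not on the input, which matters when the destination column or schema constrains the field length.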

It's important to keep these things in mind when using this enrichment:

  • Hashing a field can change its format (e.g. a hashed email address no longer looks like an email address) and its length, which can make an otherwise valid event invalid if its schema does not accept the hashed value.
  • If you change the salt after it has already been used, the same original value produces different hashes under the previous and the new salt, which makes joins across the two impossible and/or creates duplicate values.
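Both caveats can be demonstrated with a short sketch. The pseudonymize helper, the email pattern and the salt values are hypothetical, SHA-256 is chosen arbitrarily, and appending the salt to the value is an assumption about how the two are combined.

```python
import hashlib
import re


def pseudonymize(value: str, salt: str) -> str:
    """Return the SHA-256 hex digest of value + salt (assumed order)."""
    return hashlib.sha256((value + salt).encode("utf-8")).hexdigest()


# A deliberately simple email pattern, standing in for a schema constraint.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

email = "jane@example.com"
hashed_old = pseudonymize(email, "old-salt")
hashed_new = pseudonymize(email, "new-salt")

# Caveat 1: the hashed value no longer satisfies an email format constraint,
# and its length differs from the original.
format_changed = (
    EMAIL_PATTERN.match(email) is not None
    and EMAIL_PATTERN.match(hashed_old) is None
)
length_changed = len(hashed_old) != len(email)

# Caveat 2: the same value hashed with two salts cannot be joined.
joinable = hashed_old == hashed_new

print(f"format changed: {format_changed}")
print(f"length changed: {length_changed} ({len(email)} -> {len(hashed_old)})")
print(f"joinable across salts: {joinable}")
```

In practice this means that schemas for pseudonymized fields should accept a plain string of the digest's length, and that a salt change is best treated as a breaking change for any downstream model that joins on the hashed value.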

Input

These fields of the enriched event and any field of an unstructured event or context can be hashed.

Output

The fields are updated in-place in the enriched event.
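The in-place update can be pictured with the following sketch, which is a simplification and not the enrichment's implementation. The event is modeled as a plain Python dict with a flattened unstruct_event. The helpers pseudonymize, matches_criterion and apply_pii are hypothetical, as are the sample values and the salt, and appending the salt to the value is an assumption.

```python
import hashlib


def pseudonymize(value: str, salt: str) -> str:
    """Return the SHA-256 hex digest of value + salt (assumed order)."""
    return hashlib.sha256((value + salt).encode("utf-8")).hexdigest()


def matches_criterion(schema_uri: str, criterion: str) -> bool:
    """Compare an Iglu schema URI with a criterion where '*' is a wildcard."""
    uri_prefix, uri_version = schema_uri.rsplit("/", 1)
    crit_prefix, crit_version = criterion.rsplit("/", 1)
    if uri_prefix != crit_prefix:
        return False
    uri_parts = uri_version.split("-")
    crit_parts = crit_version.split("-")
    return len(uri_parts) == len(crit_parts) and all(
        c == "*" or c == u for u, c in zip(uri_parts, crit_parts)
    )


def apply_pii(event: dict, salt: str) -> None:
    """Hash the configured fields of the event, mutating it in place."""
    # pojo fields: scalar fields of the enriched event
    for field in ("user_id", "user_ipaddress"):
        if event.get(field) is not None:
            event[field] = pseudonymize(event[field], salt)
    # json fields: only touched when the schema matches the criterion
    unstruct = event.get("unstruct_event")
    criterion = "iglu:com.mailchimp/subscribe/jsonschema/1-*-*"
    if unstruct and matches_criterion(unstruct["schema"], criterion):
        for field in ("email", "ip_opt"):
            if field in unstruct["data"]:
                unstruct["data"][field] = pseudonymize(unstruct["data"][field], salt)


event = {
    "user_id": "jane",
    "user_ipaddress": "203.0.113.7",
    "unstruct_event": {
        "schema": "iglu:com.mailchimp/subscribe/jsonschema/1-0-0",
        "data": {"email": "jane@example.com", "ip_opt": "203.0.113.7", "type": "subscribe"},
    },
}

apply_pii(event, "example-salt")
print(event)
```

Fields that are not configured, such as type in this sketch, are left untouched, and an unstructured event whose schema does not match the criterion is not modified at all.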

If emitEvent is set to true in the configuration, then for each enriched event an unstructured event listing the modifications made to its fields is also emitted to the configured PII stream. Its schema can be found here.
