The PII (personally identifiable information) pseudonymization enrichment runs after all other enrichments and pseudonymizes the fields that are configured as PII.
It helps Snowplow users better protect the privacy rights of data subjects, which in turn aids compliance with regulatory measures.
In Europe, the obligations regarding personal data handling are outlined on the GDPR EU website.
Two types of fields can be configured to be hashed:
- pojo: a field that is effectively a scalar field in the enriched event (full list of fields that can be pseudonymized here)
- json: a field contained inside a self-describing JSON (e.g. in
With the configuration example, the fields user_ipaddress of the enriched event would be hashed, as well as the fields ip_opt of the unstructured event if its schema matches iglu:com.mailchimp/subscribe/jsonschema/1-*-*.
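The schema condition in this example can be read as a version wildcard: any 1-x-y version of the Mailchimp subscribe schema qualifies, while a 2-0-0 version does not. The following Python sketch illustrates that reading. The function name `matches_criterion` is hypothetical and the parsing is a simplification for illustration, not the enrichment's actual implementation.

```python
def matches_criterion(schema_uri: str, criterion: str) -> bool:
    """Return True if an Iglu schema URI satisfies a criterion with '*' wildcards.

    Hypothetical helper, for illustration only.
    """
    # Both strings look like iglu:<vendor>/<name>/<format>/<model>-<revision>-<addition>
    uri_parts = schema_uri.split(":", 1)[-1].split("/")
    crit_parts = criterion.split(":", 1)[-1].split("/")
    if len(uri_parts) != 4 or len(crit_parts) != 4:
        return False
    # Vendor, name and format must be identical.
    if uri_parts[:3] != crit_parts[:3]:
        return False
    uri_version = uri_parts[3].split("-")
    crit_version = crit_parts[3].split("-")
    if len(uri_version) != 3 or len(crit_version) != 3:
        return False
    # Each version component must either be a wildcard or match exactly.
    return all(c == "*" or c == u for u, c in zip(uri_version, crit_version))


CRITERION = "iglu:com.mailchimp/subscribe/jsonschema/1-*-*"

print(matches_criterion("iglu:com.mailchimp/subscribe/jsonschema/1-0-0", CRITERION))
print(matches_criterion("iglu:com.mailchimp/subscribe/jsonschema/2-0-0", CRITERION))
```

With this reading, only events whose schema falls under model version 1 would have their configured JSON fields hashed; events with other schemas are left untouched.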
At the moment, only the "pseudonymize" strategy is available. The available hashing algorithms are listed below:
_MD2_: the 128-bit algorithm MD2 (not recommended for performance reasons; see RFC 6149)
_MD5_: the 128-bit algorithm MD5
_SHA-1_: the 160-bit algorithm SHA-1
_SHA-256_: 256-bit variant of the SHA-2 algorithm
_SHA-384_: 384-bit variant of the SHA-2 algorithm
_SHA-512_: 512-bit variant of the SHA-2 algorithm
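To give a feel for what these algorithms produce, here is a minimal Python sketch that hashes one value with each of them and reports the length of the hex-encoded result. The helper `pseudonymize`, the sample IP address, and the salt are hypothetical, and appending the salt to the value before hashing is an assumption made for illustration. MD2 is left out because Python's `hashlib` does not guarantee that it is available.

```python
import hashlib

# Map the algorithm names used in this document to hashlib names.
ALGORITHMS = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def pseudonymize(value: str, salt: str, algorithm: str) -> str:
    """Hash a value together with a salt and return the hex digest.

    Assumption: the salt is appended to the value before hashing.
    """
    digest = hashlib.new(ALGORITHMS[algorithm])
    digest.update((value + salt).encode("utf-8"))
    return digest.hexdigest()


for name in ALGORITHMS:
    hashed = pseudonymize("203.0.113.7", "example-salt", name)
    print(f"{name}: {len(hashed)} hex characters")
```

Each hex character encodes 4 bits, so a 128-bit MD5 digest is 32 characters long and a 512-bit SHA-512 digest is 128 characters long. This matters when the target column or schema restricts the field length.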
It's important to keep these things in mind when using this enrichment:
- Hashing a field can change its format (e.g. email) and its length, which can make a previously valid event invalid if its schema is not compatible with the hashed value.
- When the salt is updated after it has already been used, the same original values hashed with the previous and the new salt will have different hashes, thus making a join impossible and/or creating duplicate values.
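Both caveats can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the helper `pseudonymize`, the sample email, the salts, and the deliberately simplistic email pattern are all hypothetical, and the salt is assumed to be appended to the value before hashing.

```python
import hashlib
import re


def pseudonymize(value: str, salt: str) -> str:
    """Hash a value with SHA-256; assumes the salt is appended to the value."""
    return hashlib.sha256((value + salt).encode("utf-8")).hexdigest()


# Deliberately simplistic email check, standing in for a schema's format constraint.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

email = "jane@example.com"
hashed_old = pseudonymize(email, "salt-v1")
hashed_new = pseudonymize(email, "salt-v2")

# Caveat 1: the hashed value no longer looks like an email and has a different length.
print(bool(EMAIL_PATTERN.match(email)), len(email))
print(bool(EMAIL_PATTERN.match(hashed_old)), len(hashed_old))

# Caveat 2: the same original value hashed with two salts cannot be joined.
print(hashed_old == hashed_new)
```

The same value always produces the same hash as long as the salt is unchanged, which is what keeps joins on pseudonymized fields possible. Changing the salt breaks that property for all data hashed before the change.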
These fields of the enriched event and any field of an unstructured event or context can be hashed.
The fields are updated in-place in the enriched event.
If emitEvent is set to true in the configuration, then for each enriched event, an unstructured event wrapping the list of updates made to the fields is also emitted to the configured PII stream. Its schema can be found here.
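The behavior described above, an in-place update plus an optional record of what changed, can be sketched in Python as follows. Everything here is hypothetical and for illustration only: the function `apply_pii_enrichment`, the dictionary representation of the event, and the keys of the modification record are assumptions and do not reproduce the actual schema of the emitted event.

```python
import hashlib
from typing import Optional


def pseudonymize(value: str, salt: str) -> str:
    """Hash a value with SHA-256; assumes the salt is appended to the value."""
    return hashlib.sha256((value + salt).encode("utf-8")).hexdigest()


def apply_pii_enrichment(
    event: dict, pii_fields: list, salt: str, emit_event: bool
) -> Optional[list]:
    """Hash the configured scalar fields in place.

    Returns a list describing each update when emit_event is True,
    otherwise None. The record keys are hypothetical.
    """
    modifications = []
    for field in pii_fields:
        original = event.get(field)
        if original is None:
            continue  # nothing to pseudonymize for this field
        event[field] = pseudonymize(original, salt)
        modifications.append(
            {"field": field, "original": original, "modified": event[field]}
        )
    return modifications if emit_event else None


event = {"user_ipaddress": "203.0.113.7", "page_url": "https://example.com"}
updates = apply_pii_enrichment(event, ["user_ipaddress"], "example-salt", True)

print(event["user_ipaddress"])  # now a hex digest, not the original IP
print(len(updates))
```

Note that a record like this contains the original values, so a stream carrying it would need to be access-controlled at least as strictly as the raw data.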