Writing your enrichment
Your JavaScript enrichment code should contain a function called process
:
- This function will receive each event as its only argument
- It can optionally return an array of entities to be added to the event (under
derived_contexts
) - Any uncaught exceptions will result in failed events
function process(event) {
// do something with the event...
...
// add entities to the event
return [ ... ];
}
You can only have one JavaScript enrichment, and hence a single process
function for your pipeline. However, you can split more complex logic into multiple helper functions and variables as you see fit, as long as you comply with the above interface.
JavaScript enrichment uses the Nashorn Engine and since version 3.0.0 of Enrich, many features of ECMAScript 6 are supported. For a list of those features, please refer to this OpenJDK proposal. Regarding the features the proposal says “might be feasible” in the future, as of 2023 our testing shows that classes and generators don't work, but tail calls do.
Best practices​
Before we dive into it, here are a few general tips:
- Make sure your code works for all of your events, not just the particular types of events you are interested in. Remember, unhandled exceptions will result in failed events.
- Don’t try to share state across multiple enriched events. Enrichments are run inside a highly parallel application with multiple independent instances, so this will not work.
- In your enrichment code, avoid CPU-intensive tasks (e.g. encryption) and IO-intensive tasks (e.g. requests to an external service) without thoroughly benchmarking the impact they might have on your event processing time.
- The enrichment code has access to the Java standard library and therefore to the filesystem of the machine it’s running on. Proceed with caution when copying code from untrusted sources.
Inspecting the event fields​
Regardless of what you want to do with the event, you will likely want to inspect some of the data within. For instance, to get the app_id
field:
function process(event) {
const appId = event.getApp_id();
...
There are getter methods available for each of the standard event fields — just capitalize the first letter of the field and prepend it with get
, for example event.getUser_ipaddress()
or event.getGeo_country()
.
One exception is refr_device_tstamp
, where the getter method is called getRefr_dvce_tstamp
and not getRefr_device_tstamp
.
Inspecting self-describing events and entities​
If your event is a self-describing event, you might want to access its fields. Here’s how to do it:
function process(event) {
...
const myEvent = JSON.parse(event.getUnstruct_event());
if (myEvent) {
// the schema of your self-describing event
const mySchema = myEvent.data.schema;
// the custom data in your self-describing event
const myData = myEvent.data.data;
...
For events other than self-describing events, getUnstruct_event()
will return null
. The pattern above works, as null
is a valid input for JSON.parse
.
You can access any entities in the event in a similar fashion:
function process(event) {
...
const entities = JSON.parse(event.getContexts());
if (entities) {
// loop through the entities
for (const entity of entities.data) {
if (entity.schema.startsWith('iglu:org.my-company/my-schema/jsonschema/1')) {
// work with the entity
const myField = entity.data.myField;
...
}
}
}
...
For events with no entities attached, getContexts()
will return null
. The pattern above works, as null
is a valid input for JSON.parse
.
For derived entities (added by other enrichments), you can use event.getDerived_contexts()
in the same way as above. Note that this is only supported since Enrich 3.8.0 (and Snowplow Micro 1.7.1). In prior versions, this function always returns null
.
Adding extra entities to the event​
Adding entities is the preferred way of augmenting your events with extra information, because it preserves the original event fields intact.
In some cases, you might choose to update existing fields instead of adding entities. However, keep in mind that if you overwrite a field, you won’t have access to its original value in your data warehouse or lake.
The (optional) return value of the process
function is an array of extra entities to add to the event. So adding entities is as simple as returning them!
function process(event) {
...
return [
{
schema: 'iglu:com.my-company/traffic-source/jsonschema/1-0-0',
data: {
traffic_source: 'internal'
}
}
];
}
The entities you add with this method will be derived entities, similar to what other enrichments add. You will find them in the derived_contexts
field of the event.
Behavior for special values (e.g. NaN
)
Your array of entities will be passed to JSON.stringify()
before being attached to the event. This is irrelevant for you, unless your entities have NaN
values (they will become null
), or undefined
values (they will be dropped), or circular references (an exception will be thrown).
If you are still iterating on the schema while writing the JavaScript code, you might find the setup described in the testing guide very useful.
Make sure that the schemas of your entities are defined and accessible to your pipeline.
Modifying event fields directly​
Sometimes you will want to modify the original event fields directly.
Keep in mind that the old value of a modified field will not be available in your data warehouse or lake. However, that might be your goal.
Just like with getters, there are setter methods available for each of the standard event fields:
function process(event) {
event.setMkt_source('Facegoog');
event.setGeo_latitude(null);
event.setGeo_longitude(null);
...
One exception is refr_device_tstamp
, where the setter method is called setRefr_dvce_tstamp
and not setRefr_device_tstamp
.
Modifying self-describing events and entities​
If you want to modify the self-describing event fields or the entities attached to the event, you will need to reverse the steps you took to fetch them.
For self-describing events:
function process(event) {
...
// unpack the self-describing event
const myEvent = JSON.parse(event.getUnstruct_event());
if (myEvent && myEvent.data.schema === ...) {
// update a field inside
myEvent.data.data.myField = 'new value';
// pack the self-describing event back
event.setUnstruct_event(JSON.stringify(myEvent));
}
...
For entities:
function process(event) {
...
// unpack the entities
const entities = JSON.parse(event.getContexts());
if (entities) {
// loop through the entities
for (const entity of entities.data) {
if (entity.schema === ...) {
// update a field inside
entity.data.myField = entity.data.myField + 1;
}
}
// pack the entities back
event.setContexts(JSON.stringify(entities));
}
...
You might be tempted to update derived entities in a similar way by using event.setDerived_contexts()
. However, this is not supported (the function exists, but has no effect). Instead, refer to the Adding extra entities section.
Discarding the event​
Sometimes you don’t want the event to appear in your data warehouse or lake, e.g. because you suspect it comes from a bot and not a real user. In this case, you can throw
an exception in your JavaScript code, which will send the event to failed events:
const botPattern = /.*Googlebot.*/;
function process(event) {
const useragent = event.getUseragent();
if (useragent !== null && botPattern.test(useragent)) {
throw "Filtered event produced by Googlebot";
}
}
This will create an “enrichment failure” failed event, which may be tricky to distinguish from genuine failures in your enrichment code, e.g. due to a mistake. In the future, we might provide a better mechanism for discarding events.
Accessing Java methods​
Because the JavaScript enrichment runs inside the Enrich application, it has access to the Java standard library, as well as some Java libraries (the ones used by Enrich). You can call Java methods via their fully qualified path, for example:
function process(event) {
...
const salt = 'pepper';
const hashedIp = org.apache.commons.codec.digest.DigestUtils.sha256Hex(event.getUser_ipaddress() + salt);
...
}
Passing an object of parameters​
Starting with Enrich 4.1.0, it is possible to pass an object of parameters to the JS enrichment.
You can pass these parameters in the enrichment configuration, for example:
{
"schema": "iglu:com.snowplowanalytics.snowplow/javascript_script_config/jsonschema/1-0-1",
"data": {
"vendor": "com.snowplowanalytics.snowplow",
"name": "javascript_script_config",
"enabled": true,
"parameters": {
"script": "script",
"config": {
"foo": 3,
"nested": {
"bar": "test"
}
}
}
}
}
The parameter object can be accessed in JavaScript enrichment code via the second parameter of the process
function, for example:
function process(event, params) {
event.setApp_id(params.nested.bar);
return [];
}
This is useful when you want to quickly reconfigure the enrichment without updating the JavaScript code.
Accessing the HTTP request headers​
Starting with Enrich 5.1.1, it is possible to access the HTTP headers that came with the original request to the Collector. The array of headers is passed in the third argument to the process
function. Each header is a string in the format HEADER:value
.
Per the HTTP specification, headers are not case sensitive, so we recommend using a case insensitive match, like what is shown below.
It is also a good practice to trim whitespace from the value, as there might or might not be some space around the :
character.
For example, if you want to pass some token via a custom X-JWT
header and then use it in the enrichment:
function process(event, params, headers) {
for (header of headers) {
const token = header.match(/X-JWT:(.+)/i)
if (token) {
const value = token[1].trim()
// do something with the value
}
}
}