Configuration
Event recovery is configured with reusable components that map onto specific failure types. The following components are available:
Steps
Steps are individual modifications applied to specific failed event payloads. Each step operates on a specific field value and can replace it, remove (nullify) it, or cast it to a different JSON type.
Notice
Recovery operates on an individual event's payload field. For example, to reach the vendor field located within the payload, the path would be $.vendor.
Steps are constructed as JSON objects made up of the following parts:
op – transformation operation to perform: Replace, Remove, Cast
path – JSON Path to the object where the operation is meant to happen. The path can contain specific field names (e.g. vendor), array ids (e.g. [1]) or filters (by regex: $.raw.parameters.cx.data.[?(@.data.navigationStart=~([0-9]+))].data.domComplete).
match – an expression applied when replacing a field's value with a new value
value – a new value to be set
from/to – used for casting a value from one type to another
// Replace
{
"op": "Replace",
"path": "$.raw.parameters.aid",
"match": "(?U)^.*$",
"value": "https://console.snplow.com/"
}
// Remove
{
"op": "Remove",
"path": "$.raw.parameters.aid",
"match": "(?U)^.*$"
}
// Cast
{
"op": "Cast",
"path": "$.raw.parameters.cx.data.[?(@.data.navigationStart=~([0-9]+))].data.domComplete",
"from": "Numeric",
"to": "Boolean"
}
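To make the three operations more concrete, here is a minimal Python sketch of how a step could act on a payload. It is an illustration under stated assumptions, not the recovery job's implementation: the helpers `resolve_parent` and `apply_step` are hypothetical, only plain `$.a.b.c` paths are handled (no array ids or filters), Remove is assumed to nullify the value, and only a Numeric to Boolean cast is covered. The `flag` field is made up for the example, and the `(?U)` prefix used above is left out because Python's `re` module does not recognise that inline flag.

```python
import re


def resolve_parent(payload: dict, path: str) -> tuple[dict, str]:
    """Walk a plain '$.a.b.c' path and return (parent object, last key)."""
    keys = path.removeprefix("$.").split(".")
    parent = payload
    for key in keys[:-1]:
        parent = parent[key]
    return parent, keys[-1]


def apply_step(payload: dict, step: dict) -> dict:
    """Apply a single step to the payload in place and return the payload."""
    parent, key = resolve_parent(payload, step["path"])
    current = parent[key]
    if step["op"] == "Replace":
        # Assumption: every match of the expression is swapped for the new value.
        parent[key] = re.sub(step["match"], lambda _m: step["value"], str(current))
    elif step["op"] == "Remove":
        # Assumption: "remove/nullify" means the value becomes null when it matches.
        if re.search(step["match"], str(current)):
            parent[key] = None
    elif step["op"] == "Cast":
        # Assumption: only Numeric -> Boolean is sketched; non-zero becomes True.
        if (step["from"], step["to"]) != ("Numeric", "Boolean"):
            raise ValueError("cast not covered by this sketch")
        parent[key] = bool(current)
    else:
        raise ValueError(f"unknown op: {step['op']}")
    return payload


# Hypothetical payload; paths are relative to the payload, as in the notice above.
payload = {"raw": {"parameters": {"aid": "old-app", "flag": 1}}}
apply_step(payload, {"op": "Replace", "path": "$.raw.parameters.aid",
                     "match": "^.*$", "value": "https://console.snplow.com/"})
apply_step(payload, {"op": "Cast", "path": "$.raw.parameters.flag",
                     "from": "Numeric", "to": "Boolean"})
print(payload)
```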
Conditions
Conditions are boolean expressions that operate on failed event fields in order to match on specific structure or values.
Note that conditions can be applied to arbitrary fields, so a condition's path starts at the root of the failed event rather than at its payload (for example, $.payload.raw.vendor).
Much like steps, conditions are constructed as JSON objects made up of the following parts:
op – operation to perform: Replace, Remove, Cast and Test (Test checks that the specified value is set in the document. If the test fails, the patch as a whole should not apply.)
path – JSON Path to the object. The path can contain specific field names (e.g. raw), array ids (e.g. [1]) or filters (by regex: [?(@.schema=~iglu://\\.*)]).
value – a value matcher to match against: match a regular expression, compare directly (object equality), or check size for equality, less than or greater than
// Compare values
{
"op": "Test",
"path": "$.processor.artifact",
"value": {
"value": "beam-enrich"
}
}
// Match against regex
{
"op": "Test",
"path": "$.payload.raw.vendor",
"value": {
"regex": "com.snowplow\\.*"
}
}
// Compare sizes
{
"op": "Test",
"path": "$.processor.artifact",
"value": {
"size": {
"eq": 11
}
}
}
{
"op": "Test",
"path": "$.processor.artifact",
"value": {
"size": {
"gt": 3
}
}
}
{
"op": "Test",
"path": "$.processor.artifact",
"value": {
"size": {
"lt": 30
}
}
}
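The value matchers above can be read as small predicates. The following Python sketch shows one possible interpretation, assuming plain `$.a.b.c` paths, partial (search) regex matching, and `size` meaning the length of the value. The helpers `lookup`, `matches` and `check_condition` and the sample event are hypothetical and are not taken from the recovery job itself.

```python
import re
from typing import Any


def lookup(event: dict, path: str) -> Any:
    """Follow a plain '$.a.b.c' path from the root of the failed event."""
    node: Any = event
    for key in path.removeprefix("$.").split("."):
        node = node[key]
    return node


def matches(actual: Any, matcher: dict) -> bool:
    """Evaluate one value matcher: direct comparison, regex, or size check."""
    if "value" in matcher:
        return actual == matcher["value"]
    if "regex" in matcher:
        # Assumption: a partial match anywhere in the value is enough.
        return re.search(matcher["regex"], str(actual)) is not None
    if "size" in matcher:
        # Assumption: size is the length of the string (or collection).
        size, bounds = len(actual), matcher["size"]
        return all([
            "eq" not in bounds or size == bounds["eq"],
            "gt" not in bounds or size > bounds["gt"],
            "lt" not in bounds or size < bounds["lt"],
        ])
    raise ValueError("matcher not covered by this sketch")


def check_condition(event: dict, condition: dict) -> bool:
    """Evaluate a Test condition against a failed event."""
    return matches(lookup(event, condition["path"]), condition["value"])


# Hypothetical failed event, shaped only to fit the paths used above.
event = {
    "processor": {"artifact": "beam-enrich"},
    "payload": {"raw": {"vendor": "com.snowplowanalytics.snowplow"}},
}
by_value = {"op": "Test", "path": "$.processor.artifact", "value": {"value": "beam-enrich"}}
by_regex = {"op": "Test", "path": "$.payload.raw.vendor", "value": {"regex": "com.snowplow\\.*"}}
by_size = {"op": "Test", "path": "$.processor.artifact", "value": {"size": {"eq": 11}}}
print(check_condition(event, by_value), check_condition(event, by_regex), check_condition(event, by_size))
```

Under these assumptions all three conditions hold for the sample event, because "beam-enrich" is exactly 11 characters long.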
Flows
Flows are sequences of steps applied one by one. Each flow groups a list of conditions with the list of steps to apply.
{
"name": "main flow",
"conditions": [],
"steps": []
}
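As an illustration of a filled-in flow, the sketch below assembles one from the condition and step examples shown earlier and prints it as JSON. The `validate_flow` helper is hypothetical: it only checks the keys described in this section and makes no claim about how the recovery job itself validates or evaluates a flow.

```python
import json


def validate_flow(flow: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none found."""
    problems = []
    if not isinstance(flow.get("name"), str):
        problems.append("name must be a string")
    # Keys each entry is expected to carry, per the descriptions in this section.
    required_keys = (("conditions", {"op", "path", "value"}), ("steps", {"op", "path"}))
    for section, required in required_keys:
        entries = flow.get(section)
        if not isinstance(entries, list):
            problems.append(f"{section} must be a list")
            continue
        for i, entry in enumerate(entries):
            missing = required - entry.keys()
            if missing:
                problems.append(f"{section}[{i}] is missing {sorted(missing)}")
    return problems


flow = {
    "name": "main flow",
    "conditions": [
        {"op": "Test", "path": "$.processor.artifact", "value": {"value": "beam-enrich"}},
    ],
    "steps": [
        {"op": "Replace", "path": "$.raw.parameters.aid",
         "match": "(?U)^.*$", "value": "https://console.snplow.com/"},
    ],
}

print(validate_flow(flow))
print(json.dumps(flow, indent=2))
```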
I/O
In principle, data is sourced from bucket storage and delivered back into a collector. Recoverable and unrecoverable failed events are stored in bucket storage. See the tables below for cloud-specific locations.
AWS
| input | output | failed output | unrecoverable output |
| S3 (--input) | Kinesis (--output) | S3 (--failedOutput) | S3 (--unrecoverableOutput) |
GCP
| input | output | failed output | unrecoverable output |
| GCS (--input) | PubSub (--output) | GCS (--failedOutput) | GCS (--unrecoverableOutput) |