
Getting Started

This documentation only applies to Snowplow Community Edition. See the feature comparison page for more information about the different Snowplow offerings.

At its core, event recovery is the ability to fix events that have failed and replay them through your pipeline.

After inspecting failed events, either in the Snowplow BDP Console or in the partitioned failure buckets, you can determine which events can be recovered based on what the fix entails.

With recovery it is possible to:

  • replace values - e.g. correct a typo in a schema name for validation
  • remove values - e.g. remove improperly encoded values from a URL string
  • cast JSON types - e.g. change a property's type from string to integer
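To illustrate what each of these operations does to a failed event, the sketch below applies a replace, a remove and a cast to a payload represented as a plain Python dict. The function names (`replace_value`, `remove_value`, `cast_value`), the path-as-list-of-keys convention and the example event are all hypothetical. This is not Snowplow's recovery configuration format or implementation, only a model of the three kinds of fix.

```python
from __future__ import annotations

import re
from copy import deepcopy


def _parent(event: dict, path: list[str]) -> dict:
    """Walk down to the dict that holds the last key in `path`."""
    node = event
    for key in path[:-1]:
        node = node[key]
    return node


def replace_value(event: dict, path: list[str], pattern: str, replacement: str) -> dict:
    """Hypothetical 'replace': substitute regex matches in the string at `path`."""
    fixed = deepcopy(event)  # never mutate the original failed event
    parent = _parent(fixed, path)
    parent[path[-1]] = re.sub(pattern, replacement, parent[path[-1]])
    return fixed


def remove_value(event: dict, path: list[str], pattern: str) -> dict:
    """Hypothetical 'remove': delete regex matches from the string at `path`."""
    return replace_value(event, path, pattern, "")


def cast_value(event: dict, path: list[str], target: type) -> dict:
    """Hypothetical 'cast': convert the value at `path`, e.g. str -> int."""
    fixed = deepcopy(event)
    parent = _parent(fixed, path)
    parent[path[-1]] = target(parent[path[-1]])
    return fixed


# Made-up failed event: a typo in the schema name, a stray unencoded '%'
# in the URL, and a quantity sent as a string instead of an integer.
failed = {
    "schema": "iglu:com.acme/chekout/jsonschema/1-0-0",
    "page_url": "https://example.com/?discount=50%",
    "data": {"quantity": "3"},
}

fixed = replace_value(failed, ["schema"], "chekout", "checkout")
# A '%' not followed by two hex digits is not a valid percent-encoding.
fixed = remove_value(fixed, ["page_url"], r"%(?![0-9A-Fa-f]{2})")
fixed = cast_value(fixed, ["data", "quantity"], int)
print(fixed)
```

Each function returns a new event rather than editing the input, so the original failed payload stays available if a fix turns out to be wrong.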

If applying the operations above would not fix your failed events, they are currently considered unrecoverable. Because your storage may contain a mix of recoverable and unrecoverable data, event recovery uses configuration to process only a subset of the failed events.
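The idea of using configuration to select a subset can be sketched as a set of filters that decide which failed events a recovery touches; everything else is left alone. The `RecoveryFilter` class, its fields and the failure-type strings below are illustrative assumptions, not the actual Snowplow configuration schema.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class RecoveryFilter:
    """Hypothetical filter describing which failed events a recovery should process."""
    failure_type: str  # illustrative failure category, e.g. "schema_violations"
    field: str         # top-level field to inspect
    pattern: str       # regex the field must match for the event to be selected

    def matches(self, event: dict) -> bool:
        if event.get("failure_type") != self.failure_type:
            return False
        value = event.get(self.field)
        return isinstance(value, str) and re.search(self.pattern, value) is not None


def select_recoverable(events: list[dict], filters: list[RecoveryFilter]) -> tuple[list[dict], list[dict]]:
    """Split events into (selected for recovery, left untouched)."""
    selected, skipped = [], []
    for event in events:
        target = selected if any(f.matches(event) for f in filters) else skipped
        target.append(event)
    return selected, skipped


# Made-up mix of failures: only the first matches the configured fix.
events = [
    {"failure_type": "schema_violations", "schema": "iglu:com.acme/chekout/jsonschema/1-0-0"},
    {"failure_type": "schema_violations", "schema": "iglu:com.acme/login/jsonschema/1-0-0"},
    {"failure_type": "enrichment_failures", "schema": "iglu:com.acme/chekout/jsonschema/1-0-0"},
]
config = [RecoveryFilter("schema_violations", "schema", r"/chekout/")]
selected, skipped = select_recoverable(events, config)
print(len(selected), len(skipped))  # 1 2
```

Keeping the selection narrow means a fix written for one kind of failure is never applied to unrelated, possibly unrecoverable, events.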

What you'll need to get started

The typical recovery flow, and the prerequisites for each step, are:

  • Understanding the failure issue
  • Configuring a recovery
  • Testing the configuration: requires the ability to edit and run a Scala script locally
  • Running the recovery: requires admin access to an AWS sub-account or GCP project, in order to create a recovery user
  • Monitoring the recovery: requires access to the Dataflow UI (GCP) or EMR reporting (AWS)
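The purpose of testing the configuration is to check, on a small local sample and before a full run, that your steps actually turn failed events into valid ones. The real test uses a Scala script; the Python sketch below only models the idea as a dry run. The step `cast_quantity_to_int`, the `is_valid` stand-in for schema validation and the `dry_run` helper are hypothetical.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Callable

Step = Callable[[dict], dict]


def cast_quantity_to_int(event: dict) -> dict:
    """Hypothetical recovery step: cast data.quantity from string to integer."""
    fixed = deepcopy(event)
    fixed["data"]["quantity"] = int(fixed["data"]["quantity"])
    return fixed


def is_valid(event: dict) -> bool:
    """Stand-in for schema validation: quantity must be a non-negative int."""
    quantity = event["data"].get("quantity")
    return isinstance(quantity, int) and quantity >= 0


def dry_run(sample: list[dict], steps: list[Step]) -> dict[str, int]:
    """Apply steps to a local sample and count outcomes; nothing is replayed."""
    report = {"recovered": 0, "still_failing": 0}
    for event in sample:
        try:
            for step in steps:
                event = step(event)
        except (KeyError, ValueError, TypeError):
            # The step itself could not be applied to this event.
            report["still_failing"] += 1
            continue
        report["recovered" if is_valid(event) else "still_failing"] += 1
    return report


sample = [
    {"data": {"quantity": "3"}},      # casts and validates
    {"data": {"quantity": "three"}},  # int("three") raises ValueError
    {"data": {"quantity": "-1"}},     # casts, but fails validation
]
report = dry_run(sample, [cast_quantity_to_int])
print(report)  # {'recovered': 1, 'still_failing': 2}
```

A non-zero `still_failing` count in a sketch like this is the signal to refine the configuration before running the recovery against the full set of failed events.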