2.0.0 upgrade guide
Caution
If you are upgrading from a Snowplow release earlier than R119 and an S3 Loader version earlier than 0.7.0, you must first upgrade to version 0.7.0 or 1.0.0 in order to split the bad data produced during the transition period.
In Snowplow R119 we introduced a new self-describing bad rows format. S3 Loader 0.7.0 was the first version capable of partitioning self-describing data based on its schema. Versions 0.7.0 and 1.0.0 can recognize at runtime whether the old or the new format is being consumed and use the partitionedBucket output path only when necessary, so both formats can be consumed.
S3 Loader 2.0.0 supports only the new self-describing format and raises exceptions if legacy bad data is pushed.
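To illustrate what distinguishing the two formats can look like, here is a minimal Python sketch. It is not the Loader's actual implementation. It assumes only that a self-describing JSON carries a `schema` field holding an `iglu:` URI alongside a `data` field. The function name `is_self_describing`, the shape of the legacy row, and the `com.acme` schema URI are hypothetical and used for illustration only.

```python
import json


def is_self_describing(line: str) -> bool:
    """Return True if `line` looks like a self-describing JSON payload.

    Assumption: a self-describing JSON is an object with a `schema`
    string starting with "iglu:" and a `data` field. Anything else
    (legacy bad rows, TSV lines, malformed JSON) is treated as not
    self-describing.
    """
    try:
        payload = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    schema = payload.get("schema")
    return isinstance(schema, str) and schema.startswith("iglu:") and "data" in payload


# Hypothetical legacy-style row: no `schema`/`data` envelope.
legacy_row = json.dumps({"line": "raw payload", "errors": ["..."]})

# Hypothetical self-describing row; the schema URI is made up.
new_row = json.dumps({
    "schema": "iglu:com.acme/example_failure/jsonschema/1-0-0",
    "data": {"failure": {}},
})

print(is_self_describing(legacy_row))
print(is_self_describing(new_row))
```

A check along these lines can be run over a sample of the bad stream before upgrading, to see whether any legacy rows are still being produced.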
Config file
In 2.0.0 the S3 Loader went through a major configuration refactoring. A sample config is available in the GitHub repository.
- The aws property, which allowed credentials to be hardcoded, has been removed; the default credentials chain is used instead
- NSQ support has been dropped
- Instead of kinesis and s3, the topology is now represented as input (Kinesis stream) and output (S3 bucket and a Kinesis stream for bad data)
- The partitionedBucket property has been removed (see Caution above)
- A new purpose property lets the Loader recognize the data it works with: ENRICHED for enriched TSVs (enables latency monitoring), SELF_DESCRIBING for any self-describing JSON (usually bad rows), and RAW
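The changes listed above can be turned into a rough pre-upgrade check. The Python sketch below assumes the config file has already been parsed into a plain dictionary. It uses only the property names mentioned in this guide; where each property sits in the old file, all nested keys in the sample dictionaries, and the helper names `contains_key` and `check_config` are assumptions made for illustration. The sample config in the GitHub repository remains the reference for the real layout.

```python
# Property names taken from this guide; their placement is an assumption.
REMOVED_TOP_LEVEL = ("aws", "kinesis", "s3")
REQUIRED_TOP_LEVEL = ("input", "output", "purpose")
PURPOSES = ("ENRICHED", "SELF_DESCRIBING", "RAW")


def contains_key(node, key):
    """Return True if `key` appears at any depth of a nested dict/list."""
    if isinstance(node, dict):
        return key in node or any(contains_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(contains_key(v, key) for v in node)
    return False


def check_config(config: dict) -> list:
    """List the changes a parsed config still needs before moving to 2.0.0."""
    problems = []
    for key in REMOVED_TOP_LEVEL:
        if key in config:
            problems.append(f"remove '{key}'")
    # partitionedBucket is searched at any depth, since its location is assumed.
    if contains_key(config, "partitionedBucket"):
        problems.append("remove 'partitionedBucket'")
    for key in REQUIRED_TOP_LEVEL:
        if key not in config:
            problems.append(f"add '{key}'")
    purpose = config.get("purpose")
    if purpose is not None and purpose not in PURPOSES:
        problems.append(f"unknown purpose '{purpose}'")
    return problems


# Hypothetical old-style and new-style configs; nested keys are made up.
old_config = {
    "aws": {"accessKey": "...", "secretKey": "..."},
    "kinesis": {"streamName": "bad-stream"},
    "s3": {"bucket": "acme-bad", "partitionedBucket": "acme-bad/partitioned"},
}
new_config = {
    "input": {"streamName": "bad-stream"},
    "output": {"s3": {"path": "s3://acme-bad/"}, "bad": {"streamName": "failed"}},
    "purpose": "SELF_DESCRIBING",
}

for problem in check_config(old_config):
    print(problem)
print(check_config(new_config))
```

The check only inspects key names, so it cannot confirm that the values themselves are valid.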
New features
- metrics.sentry.dsn can be used to track exceptions, including internal KCL exceptions
- metricsd.statsd can be used to send observability data to a StatsD-compatible server