
Snowflake Streaming Loader

Overview

The Snowflake Streaming Loader is an application that loads Snowplow events to Snowflake using the Snowpipe Streaming API.

Streaming Loader or RDB Loader?

Both Snowflake Streaming Loader and RDB Loader can load data into Snowflake.

The Snowflake Streaming Loader is newer and has two advantages:

  • Much lower latency: you can get data into Snowflake within seconds, as opposed to minutes with RDB Loader
  • Much lower cost: unlike RDB Loader, it needs neither EMR nor extensive Snowflake compute to load batch files

We recommend the Streaming Loader over the RDB Loader. If you already use RDB Loader, see the migration guide for more information.

The Snowflake Streaming Loader on AWS is a fully streaming application that continually pulls events from Kinesis and writes to Snowflake using the Snowpipe Streaming API.

The Snowflake Streaming Loader is published as a Docker image, which you can run on any AWS VM. You do not need a Spark cluster to run this loader.

docker pull snowplow/snowflake-loader-kinesis:0.3.0

To run the loader, mount the directory containing your config file into the Docker container, and then provide the file path on the command line. We recommend setting the SNOWFLAKE_PRIVATE_KEY environment variable so that you can refer to it in the config file.

docker run \
--mount=type=bind,source=/path/to/myconfig,destination=/myconfig \
--env SNOWFLAKE_PRIVATE_KEY="${SNOWFLAKE_PRIVATE_KEY}" \
snowplow/snowflake-loader-kinesis:0.3.0 \
--config=/myconfig/loader.hocon
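The docker run command above forwards SNOWFLAKE_PRIVATE_KEY from your shell, so the variable has to be set before you run it. The following is a minimal sketch that assumes Snowflake key-pair authentication with an unencrypted PKCS#8 key. The file name rsa_key.p8 and the KEY_FILE variable are hypothetical and are not required by the loader. The sketch only generates a key pair if the file does not already exist.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical location of the private key used for Snowflake key-pair
# authentication; override KEY_FILE to point at your own key.
KEY_FILE="${KEY_FILE:-rsa_key.p8}"

if [ ! -f "$KEY_FILE" ]; then
  # Generate an unencrypted 2048-bit RSA key in PKCS#8 PEM format.
  openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out "$KEY_FILE" -nocrypt
  # Derive the matching public key. It must be registered on the Snowflake
  # user the loader connects as (Snowflake's ALTER USER ... SET RSA_PUBLIC_KEY).
  openssl rsa -in "$KEY_FILE" -pubout -out "${KEY_FILE%.p8}.pub"
fi

# Load the PEM text into the variable that docker run forwards with --env.
# Assigning before exporting keeps a failing cat from being masked by export.
SNOWFLAKE_PRIVATE_KEY="$(cat "$KEY_FILE")"
export SNOWFLAKE_PRIVATE_KEY
```

Reading the key from a file in the same shell session keeps the key text out of your shell history. The docker run command then passes it through unchanged.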

Configuring the loader

The loader config file is in HOCON format and lets you configure many aspects of how the loader runs.

The simplest possible config file just needs a description of your pipeline inputs and outputs:

config/config.kinesis.minimal.hocon
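For orientation, here is a sketch that writes a minimal loader.hocon into the directory mounted by the docker run command. It is not a copy of config/config.kinesis.minimal.hocon. The key names (input.streamName, output.good.url, output.good.privateKey, output.bad.streamName and so on) reflect our understanding of the loader's configuration and should be checked against the configuration reference. Every value is a placeholder, and CONFIG_DIR is a hypothetical variable. The privateKey line uses HOCON's ${...} substitution, which falls back to environment variables, to pick up SNOWFLAKE_PRIVATE_KEY.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical host directory; point the source= path of --mount at it.
CONFIG_DIR="${CONFIG_DIR:-./myconfig}"
mkdir -p "$CONFIG_DIR"

# The quoted 'EOF' delimiter stops the shell from expanding ${SNOWFLAKE_PRIVATE_KEY},
# so the substitution is left for the loader's HOCON parser to resolve.
# Key names and values are assumptions; verify them against the configuration reference.
cat > "$CONFIG_DIR/loader.hocon" <<'EOF'
{
  "input": {
    # Kinesis stream carrying enriched Snowplow events (placeholder name)
    "streamName": "snowplow-enriched-events"
  }

  "output": {
    "good": {
      "url": "https://orgname.accountname.snowflakecomputing.com"
      "user": "snowplow"
      "privateKey": ${SNOWFLAKE_PRIVATE_KEY}
      "database": "snowplow"
      "schema": "atomic"
    }

    # Kinesis stream for events that fail to load (placeholder name)
    "bad": {
      "streamName": "bad"
    }
  }
}
EOF
```

The file is named loader.hocon inside the mounted directory, so it matches the --config=/myconfig/loader.hocon argument in the docker run command.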

See the configuration reference for all possible configuration parameters.