Skippr Docker
What is the Skippr Docker image?
With the Skippr Docker image, you can connect any data source to any destination by configuring an input plugin and an output plugin. Schema discovery and validation are automatic, which lets Skippr convert your source data into a format that suits your destination.
Examples:
- JSON files on your local disk to Parquet on S3
- JSON files on S3 to Avro messages in Kafka
How to use this image
1. Configure
Configure your `docker-compose.yml` file, in particular the input and output plugin connection details, which are set via environment variables.
Example:
In the `docker-compose.yml` below, we use the File Input and File Output plugins to configure Skippr to ingest JSON files from a local directory and write Parquet files to another local directory.
```yaml
---
version: "3.7"
services:
  skipprd:
    image: skippr/skipprd:latest
    volumes:
      - ~/demo:/data
    environment:
      DATA_SOURCE_PLUGIN_NAME: "file"
      DATA_SOURCE_PATH: /data/input-dir
      DATA_SOURCE_FORMAT: json
      DATA_OUTPUT_PLUGIN_NAME: "file"
      DATA_OUTPUT_PATH: /data/output-dir
      DATA_OUTPUT_FORMAT: parquet
```
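If you prefer plain `docker run` over Compose, the same settings can likely be passed as `-v` and `-e` flags. The sketch below is our own translation of the `docker-compose.yml` above, not a command taken from the Skippr documentation; the `run_skipprd` function and its `DRY_RUN` switch are hypothetical conveniences.

```shell
# Hypothetical helper: the docker-compose.yml above expressed as `docker run`.
# With DRY_RUN set to a non-empty value, the command is printed, not executed.
run_skipprd() {
  ${DRY_RUN:+echo} docker run -d \
    -v "$HOME/demo:/data" \
    -e DATA_SOURCE_PLUGIN_NAME=file \
    -e DATA_SOURCE_PATH=/data/input-dir \
    -e DATA_SOURCE_FORMAT=json \
    -e DATA_OUTPUT_PLUGIN_NAME=file \
    -e DATA_OUTPUT_PATH=/data/output-dir \
    -e DATA_OUTPUT_FORMAT=parquet \
    skippr/skipprd:latest
}
```

For example, `DRY_RUN=1 run_skipprd` prints the resulting command so you can check it before running it for real.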
Note: Volume Mounts
`DATA_SOURCE_PATH` and `DATA_OUTPUT_PATH` should be prefixed with the Docker volume directory `/data`. You can configure any sub-paths beneath it (`input-dir` and `output-dir` in this example). The input directory must exist in your host's mounted volume (`~/demo` in this example) and contain your source data files.
For example:
```
$ ls -1 ~/demo/input-dir
example-data-1.json
example-data-2.json.gz
example-data-3.json.gz
```
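To set up the host side of the `~/demo:/data` mount, a minimal POSIX sh sketch is shown here. It creates the sub-directories and shows how a container path such as `/data/input-dir` maps back to a host path. The helper names `prepare_demo_dirs` and `to_host_path` are hypothetical, and pre-creating `output-dir` is an assumption (Skippr may create it itself).

```shell
# Hypothetical helper: create the host-side sub-directories of the mount.
# Pre-creating output-dir is an assumption; only input-dir is required above.
prepare_demo_dirs() {
  demo_dir=$1
  mkdir -p "$demo_dir/input-dir" "$demo_dir/output-dir"
}

# Hypothetical helper: map a container path (e.g. /data/input-dir) to the
# host path behind the volume. Paths outside /data are rejected, since only
# /data is backed by the host directory.
to_host_path() {
  demo_dir=$1
  container_path=$2
  case "$container_path" in
    /data|/data/*) printf '%s\n' "$demo_dir${container_path#/data}" ;;
    *) echo "not under /data: $container_path" >&2; return 1 ;;
  esac
}

# Example usage:
# prepare_demo_dirs "$HOME/demo"
# cp /path/to/*.json "$HOME/demo/input-dir/"
# to_host_path "$HOME/demo" /data/input-dir   # prints the host path of input-dir
```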
2. Init Skipprd
```shell
docker-compose up -d
```
On the first run, Skipprd auto-discovers the input data schema and initialises Skippr's internal state in `skippr-state.json`. No data is ingested on this first run.
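Because the first run only initialises state, it can help to wait for `skippr-state.json` to appear before moving on to the Run Skippr step. The sketch below polls for the file with a timeout. It assumes the state file is written at the top of the mounted volume (i.e. `~/demo/skippr-state.json` on the host), which this page suggests but does not state explicitly; `wait_for_state` is a hypothetical helper.

```shell
# Hypothetical helper: poll once per second for the state file written by the
# init run. Returns 0 when the file exists, 1 after timeout_s seconds.
wait_for_state() {
  state_file=$1
  timeout_s=${2:-60}
  waited=0
  while [ ! -f "$state_file" ]; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "timed out waiting for $state_file" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "found $state_file"
}

# Example usage (the state file location is an assumption):
# wait_for_state "$HOME/demo/skippr-state.json" 120
```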
3. Run Skippr
```shell
docker-compose up -d
```
Subsequent runs ingest data from your source into your destination, converting it to the configured output format.
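After a subsequent run, a quick sanity check is to count the files that have appeared in the output directory. The sketch assumes Skippr writes its output files somewhere under the host path of `DATA_OUTPUT_PATH` (`~/demo/output-dir` in this example) and deliberately avoids assuming any file naming; `count_output_files` is a hypothetical helper.

```shell
# Hypothetical helper: count regular files under a directory (recursively).
# tr strips the padding some wc implementations add around the number.
count_output_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Example usage with the docker-compose.yml above:
# count_output_files "$HOME/demo/output-dir"
```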
General Usage Concepts
Skippr State Storage
Skippr creates and maintains the state file `skippr-state.json` in your state backend (e.g. the local Docker `/data` volume). This file holds Skippr's internal metadata, such as the data source schema and checkpoints that track ingestion progress (similar to Kafka consumer offsets or Logstash's sincedb).
NOTE: No source data or secrets are stored in the state file.
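To make the idea of ingestion checkpoints concrete, here is a toy analogue in the spirit of Logstash's sincedb: a plain-text state file lists the input files already processed, so a rerun skips them. This only illustrates the concept; it is not Skippr's `skippr-state.json` format or logic, and `process_new_files` is hypothetical.

```shell
# Toy sincedb-style checkpointing, NOT Skippr's actual state format.
# Each processed file path is appended to the checkpoint file; files already
# listed there are skipped on later runs.
process_new_files() {
  input_dir=$1
  checkpoint=$2
  touch "$checkpoint"
  for f in "$input_dir"/*; do
    [ -f "$f" ] || continue            # skip non-files and an unmatched glob
    if grep -Fxq -- "$f" "$checkpoint"; then
      continue                         # already ingested on a previous run
    fi
    echo "ingesting $f"                # stand-in for real ingestion work
    printf '%s\n' "$f" >> "$checkpoint"
  done
}
```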
Skippr File Buffer
Skippr manages temporary file buffers while ingesting data. The default buffer backend is `file`, located in your local volume mount (e.g. the `~/demo/buffer` directory).
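Since buffers are written to disk under your volume mount, you may want to keep an eye on how much space they use. A minimal sketch using `du`, assuming the buffer directory is `~/demo/buffer` as in the example above; `buffer_usage_kb` is a hypothetical helper.

```shell
# Hypothetical helper: print the disk usage of a directory in KiB.
# `du -sk` prints "<size>\t<path>"; cut keeps only the size field.
buffer_usage_kb() {
  du -sk "$1" | cut -f1
}

# Example usage:
# buffer_usage_kb "$HOME/demo/buffer"
```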
Help
View the docs
Join us on Slack.