Skippr Docker

What is the Skippr Docker image?

With the Skippr Docker image, you can connect any data source to any destination by configuring an input plugin and an output plugin. Schema discovery and validation are automatic, which lets Skippr convert your source data to a format that's optimal for your destination.

Examples:

  • JSON files on your local disk to Parquet on S3
  • JSON files on S3 to Avro messages in Kafka

How to use this image

1. Configure

Configure your docker-compose.yml file, setting the connection details for the input and output plugins via environment variables.

Example:

In the docker-compose.yml below, the File Input and File Output plugins configure Skippr to ingest JSON files from one local directory and write Parquet files to another.

---
version: "3.7"
services:
  skipprd:
    image: skippr/skipprd:latest
    volumes:
      - ~/demo:/data
    environment:
      DATA_SOURCE_PLUGIN_NAME: "file"
      DATA_SOURCE_PATH: /data/input-dir
      DATA_SOURCE_FORMAT: json

      DATA_OUTPUT_PLUGIN_NAME: "file"
      DATA_OUTPUT_PATH: /data/output-dir
      DATA_OUTPUT_FORMAT: parquet


Note: Volume Mounts

DATA_SOURCE_PATH and DATA_OUTPUT_PATH must start with /data, the directory at which the Docker volume is mounted inside the container.

You can configure any sub-paths (input-dir and output-dir in this example). The input directory must exist in your host's mounted volume (~/demo in this example) and contain your source data files.

e.g.

ls ~/demo/input-dir
example-data-1.json
example-data-2.json.gz
example-data-3.json.gz
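
If the host directory does not exist yet, it could be prepared along these lines before starting the container. This is a minimal sketch, not part of Skippr itself: the DEMO_DIR variable, the sample-data.json file name and the sample record are hypothetical, and creating output-dir up front is an assumption (Skippr may create it on its own).

```shell
# Host directory mounted at /data in the container (~/demo in the example above).
# DEMO_DIR is a hypothetical helper variable, not a Skippr setting.
DEMO_DIR="${DEMO_DIR:-$HOME/demo}"

# Create the sub-directories referenced by DATA_SOURCE_PATH and DATA_OUTPUT_PATH.
# Creating the output directory in advance is an assumption.
mkdir -p "$DEMO_DIR/input-dir" "$DEMO_DIR/output-dir"

# Hypothetical one-record sample file, written only if it does not exist yet.
SAMPLE="$DEMO_DIR/input-dir/sample-data.json"
[ -e "$SAMPLE" ] || printf '{"id": 1, "name": "example"}\n' > "$SAMPLE"

# List what the container will see under /data/input-dir.
ls "$DEMO_DIR/input-dir"
```

In practice you would copy your own source files into input-dir instead of relying on the sample record.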

2. Init Skipprd

docker-compose up -d

On first run, Skipprd will auto-discover the input data schema and initialise Skippr internal state in skippr-state.json. No data will be ingested on this first run.

3. Run Skippr

docker-compose up -d

Subsequent runs will ingest data from your source to your destination, converting to your configured output data format.
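
One rough way to confirm that an ingestion run produced output is to count the files under the host side of DATA_OUTPUT_PATH. The sketch below assumes the example volume layout (~/demo mounted at /data); DEMO_DIR and count_output_files are hypothetical helper names, and how Skippr names or partitions its output files is not assumed.

```shell
# Host directory mounted at /data in the container (hypothetical helper variable).
DEMO_DIR="${DEMO_DIR:-$HOME/demo}"

# Print the number of regular files under a directory (0 if it does not exist).
count_output_files() {
  dir="$1"
  if [ -d "$dir" ]; then
    # tr strips the padding that some wc implementations add.
    find "$dir" -type f | wc -l | tr -d ' '
  else
    echo 0
  fi
}

echo "output files: $(count_output_files "$DEMO_DIR/output-dir")"
```

A count that grows between runs suggests data is reaching the destination. It says nothing about whether every source file has been processed.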

General Usage Concepts

Skippr State Storage

Skippr creates and maintains the state file skippr-state.json in your state backend (e.g. the local Docker /data volume). This file contains Skippr's internal metadata, such as the data source schema and checkpoints for ingestion progress (think Kafka offsets or Logstash sincedb).

NOTE: no source data or secrets are stored in the state.
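
Because the first run only initialises state, checking for the state file is a simple way to tell which kind of run comes next. The sketch below assumes the local volume backend from the example, with skippr-state.json at the root of the mounted directory; that location, DEMO_DIR and state_initialised are assumptions made for illustration.

```shell
# Host directory mounted at /data in the container (hypothetical helper variable).
DEMO_DIR="${DEMO_DIR:-$HOME/demo}"

# Assumed location: the root of the mounted volume.
STATE_FILE="$DEMO_DIR/skippr-state.json"

# Succeeds (exit status 0) only if the given file exists and is non-empty.
state_initialised() {
  [ -s "$1" ]
}

if state_initialised "$STATE_FILE"; then
  echo "state found: the next run should ingest data"
else
  echo "no state yet: the next run should be the schema-discovery run"
fi
```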

Skippr File Buffer

Skippr manages temporary file buffers while ingesting data. The default buffer backend is a file buffer located in your local volume mount (e.g. the ~/demo/buffer directory).

Help

View the docs

Join us on Slack.