
# CDC Configuration

CDC is configured via a top-level `cdc:` block in `skippr.yaml`, alongside `cdc_enabled: true` on the source connector.

## The `cdc:` block

```yaml
cdc:
  business_key_columns:
    - id
```

| Field | Required | Description |
| --- | --- | --- |
| `business_key_columns` | Yes | List of column names that uniquely identify a row. Used as the `ON` clause in `MERGE` operations. |

## Automatic guarantee inference

Skippr automatically determines the strongest CDC semantics your source/sink pair supports. You never need to specify a guarantee level -- the system derives it at startup and enforces it throughout the run.
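The derivation can be pictured as a small decision over source and sink capabilities. The following is a minimal Python sketch; the capability flags and guarantee names are illustrative stand-ins, not Skippr's internal API:

```python
# Illustrative sketch of guarantee inference: pick the strongest CDC
# semantics both ends of the pipeline can support. Flag names and
# guarantee labels are hypothetical, not Skippr internals.

def infer_guarantee(source_supports_cdc: bool,
                    sink_supports_merge: bool,
                    sink_accepts_cdc_payloads: bool) -> str:
    if not source_supports_cdc:
        raise ValueError("source connector does not support CDC")
    if sink_supports_merge:
        # Full reconciliation at the sink: MERGE with order-token guards.
        return "exactly-once-final-state"
    if sink_accepts_cdc_payloads:
        # Sink can land the change log but not reconcile it.
        return "cdc-encoded"
    raise ValueError("sink cannot accept CDC payloads")

# e.g. PostgreSQL -> Snowflake supports full MERGE reconciliation:
print(infer_guarantee(True, True, True))  # exactly-once-final-state
```

The point of the sketch is that the guarantee is a function of the source/sink pair alone, which is why no guarantee setting appears anywhere in `skippr.yaml`.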

### Exactly-once final state

When both the source and sink support full CDC reconciliation (e.g. PostgreSQL to Snowflake), Skippr enforces exactly-once final-state semantics:

- Inserts, updates, and deletes are applied via `MERGE` with order-token guards
- Stale writes are rejected
- Deletes are tracked in tombstone tables to prevent ghost resurrections
- `business_key_columns` is required -- Skippr will error at startup if it is missing
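The interaction of order-token guards and tombstones can be sketched as a tiny in-memory apply loop. The event shape and field names below are assumptions for illustration; in practice this work happens in warehouse-side `MERGE` statements:

```python
# In-memory sketch of exactly-once final-state semantics: upserts guarded
# by an order token, with deletes recorded as tombstones so a stale write
# cannot resurrect a deleted row. Event shape is hypothetical.

def apply_event(state, tombstones, event):
    key = event["key"]          # value of the business_key_columns
    token = event["order_token"]
    # Reject writes at or before the last delete seen for this key.
    if token <= tombstones.get(key, -1):
        return
    current = state.get(key)
    # Reject stale writes: anything at or below the row's current token.
    if current is not None and token <= current["order_token"]:
        return
    if event["kind"] == "delete":
        state.pop(key, None)
        tombstones[key] = token  # remember the delete
    else:                        # insert or update
        state[key] = {"row": event["row"], "order_token": token}

state, tombstones = {}, {}
events = [
    {"key": 1, "order_token": 1, "kind": "insert", "row": {"id": 1, "v": "a"}},
    {"key": 1, "order_token": 3, "kind": "delete", "row": None},
    {"key": 1, "order_token": 2, "kind": "update", "row": {"id": 1, "v": "b"}},
]
for e in events:
    apply_event(state, tombstones, e)
# The update (token 2) arrives after the delete (token 3) but is rejected
# by the tombstone, so row 1 stays deleted -- no ghost resurrection.
```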

### CDC-encoded

When the sink cannot perform full `MERGE` reconciliation but can faithfully land CDC payloads, Skippr writes events with their mutation metadata (`_skippr_mutation_kind`, `_skippr_order_token`) as an append-only change log. Downstream consumers can process this log independently.
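One way a downstream consumer might fold such a change log into final state is sketched below. Only the `_skippr_mutation_kind` and `_skippr_order_token` column names come from this page; the record layout and mutation-kind values are assumptions:

```python
# Sketch of a downstream consumer reducing a CDC-encoded append-only log
# to final row state. Record layout and mutation-kind values are
# illustrative assumptions.

def fold_changelog(records, key_column):
    state = {}
    # Apply in order-token order so late-arriving rows do not win.
    for rec in sorted(records, key=lambda r: r["_skippr_order_token"]):
        key = rec[key_column]
        if rec["_skippr_mutation_kind"] == "delete":
            state.pop(key, None)
        else:
            state[key] = rec
    return state

log = [
    {"id": 7, "status": "new",     "_skippr_mutation_kind": "insert", "_skippr_order_token": 1},
    {"id": 7, "status": "shipped", "_skippr_mutation_kind": "update", "_skippr_order_token": 2},
]
final = fold_changelog(log, "id")
# final[7]["status"] == "shipped"
```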

## Validation at startup

Skippr performs the following checks before starting a CDC pipeline:

  1. The source connector must support CDC (i.e. its kind accepts `cdc_enabled: true`)
  2. The destination connector must be capable of accepting CDC payloads
  3. If the pair supports exactly-once final state, `business_key_columns` must be non-empty
  4. Every column named in `business_key_columns` must exist in the source schema

If any check fails, the pipeline exits with a descriptive error message before any data is read.
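The four checks above can be sketched as a single validation pass. The connector representation and attribute names here are hypothetical stand-ins, not Skippr's API; only the check logic mirrors the list:

```python
# Sketch of the four startup checks. Connector objects and flag names
# are hypothetical; the pipeline fails before any data is read.

def validate_cdc_config(source, sink, business_key_columns, source_columns):
    errors = []
    if not source.get("supports_cdc"):
        errors.append("source connector does not support CDC")
    if not sink.get("accepts_cdc_payloads"):
        errors.append("destination cannot accept CDC payloads")
    exactly_once = sink.get("supports_merge", False)
    if exactly_once and not business_key_columns:
        errors.append("business_key_columns is required for exactly-once final state")
    missing = [c for c in business_key_columns if c not in source_columns]
    if missing:
        errors.append(f"unknown business_key_columns: {missing}")
    if errors:
        raise ValueError("; ".join(errors))  # abort before reading data

# A valid PostgreSQL -> Snowflake-style pairing passes silently:
validate_cdc_config(
    {"supports_cdc": True},
    {"accepts_cdc_payloads": True, "supports_merge": True},
    ["id"],
    {"id", "status", "updated_at"},
)
```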

## Source/destination compatibility

All CDC-capable sources work with all warehouse destinations:

| Source | Destinations |
| --- | --- |
| PostgreSQL | Snowflake, BigQuery, PostgreSQL, Redshift, ClickHouse, Databricks, Synapse, MotherDuck |
| MySQL | Snowflake, BigQuery, PostgreSQL, Redshift, ClickHouse, Databricks, Synapse, MotherDuck |
| MongoDB | Snowflake, BigQuery, PostgreSQL, Redshift, ClickHouse, Databricks, Synapse, MotherDuck |
| DynamoDB | Snowflake, BigQuery, PostgreSQL, Redshift, ClickHouse, Databricks, Synapse, MotherDuck |
| Kafka (Debezium) | Snowflake, BigQuery, PostgreSQL, Redshift, ClickHouse, Databricks, Synapse, MotherDuck |

Non-CDC sources (e.g. S3, SFTP, HTTP) cannot enable `cdc_enabled: true`. Skippr validates source/destination compatibility at startup and returns a clear error if the combination is unsupported.
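Since every CDC-capable source in the table pairs with every warehouse destination, the compatibility check reduces to two set-membership tests. A minimal sketch, with `kind` strings mirroring the examples on this page:

```python
# Sketch of the startup compatibility check. Per the table above, any
# CDC-capable source pairs with any warehouse destination; non-CDC
# sources may not enable cdc_enabled at all.

CDC_SOURCES = {"postgres", "mysql", "mongodb", "dynamodb", "kafka"}
WAREHOUSES = {"snowflake", "bigquery", "postgres", "redshift",
              "clickhouse", "databricks", "synapse", "motherduck"}

def check_pair(source_kind: str, warehouse_kind: str, cdc_enabled: bool) -> None:
    if cdc_enabled and source_kind not in CDC_SOURCES:
        raise ValueError(f"source '{source_kind}' cannot enable cdc_enabled")
    if warehouse_kind not in WAREHOUSES:
        raise ValueError(f"unsupported destination: '{warehouse_kind}'")

check_pair("postgres", "snowflake", cdc_enabled=True)  # ok: supported pair
```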

## Complete examples

PostgreSQL CDC to Snowflake (exactly-once final state is inferred automatically):

```yaml
project: pg_cdc_to_snowflake

source:
  kind: postgres
  host: db.example.com
  port: 5432
  user: replicator
  password: ${POSTGRES_PASSWORD}
  database: production
  cdc_enabled: true

warehouse:
  kind: snowflake
  database: ANALYTICS
  schema: RAW
  warehouse: COMPUTE_WH
  role: SKIPPR_ROLE

cdc:
  business_key_columns:
    - id
```

MySQL CDC to BigQuery:

```yaml
project: mysql_cdc_to_bq

source:
  kind: mysql
  connection_string: mysql://replicator:${MYSQL_PASSWORD}@host:3306/ecommerce
  cdc_enabled: true

warehouse:
  kind: bigquery
  project: my-gcp-project
  dataset: raw
  location: US

cdc:
  business_key_columns:
    - order_id
```

Kafka Debezium CDC to Redshift:

```yaml
project: kafka_cdc_to_redshift

source:
  kind: kafka
  brokers: "kafka.example.com:9092"
  topic: dbserver1.public.orders
  cdc_enabled: true

warehouse:
  kind: redshift
  cluster_identifier: my-cluster
  database: analytics
  db_user: admin
  region: us-east-1

cdc:
  business_key_columns:
    - id
```