# CDC Configuration

CDC is configured via a top-level `cdc:` block in `skippr.yaml`, alongside `cdc_enabled: true` on the source connector.
## The `cdc:` block

```yaml
cdc:
  business_key_columns:
    - id
```

| Field | Required | Description |
|---|---|---|
| `business_key_columns` | Yes | List of column names that uniquely identify a row. Used as the `ON` clause in `MERGE` operations. |
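A single `id` column is the common case. Because the field takes a list, a composite key can be declared when no single column is unique on its own — a hypothetical sketch (the `tenant_id`/`order_id` column names are illustrative, not from a real schema):

```yaml
# Hypothetical composite business key: neither column is
# unique by itself, but the pair uniquely identifies a row.
cdc:
  business_key_columns:
    - tenant_id
    - order_id
```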
## Automatic guarantee inference
Skippr automatically determines the strongest CDC semantics your source/sink pair supports. You never need to specify a guarantee level -- the system derives it at startup and enforces it throughout the run.
### Exactly-once final state
When both the source and sink support full CDC reconciliation (e.g. PostgreSQL to Snowflake), Skippr enforces exactly-once final-state semantics:
- Inserts, updates, and deletes are applied via `MERGE` with order-token guards
- Stale writes are rejected
- Deletes are tracked in tombstone tables to prevent ghost resurrections

`business_key_columns` is required -- Skippr will error at startup if it is missing.
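The guarantees above can be modeled in miniature. The following is an illustrative sketch of order-token guards and tombstones -- not Skippr's actual implementation; the `apply_event` helper and its signature are hypothetical:

```python
# Sketch of exactly-once final-state semantics: MERGE-style
# upserts guarded by an order token, with tombstones so a
# late-arriving write cannot resurrect a deleted row.

state = {}       # business key -> (order_token, row)
tombstones = {}  # business key -> order token of the delete

def apply_event(key, order_token, kind, row=None):
    """Apply one CDC event; stale and ghost writes are rejected."""
    # Reject anything at or behind a recorded delete (ghost resurrection).
    if key in tombstones and order_token <= tombstones[key]:
        return False
    # Reject writes older than the version we already hold (stale write).
    if key in state and order_token <= state[key][0]:
        return False
    if kind == "delete":
        state.pop(key, None)
        tombstones[key] = order_token
    else:  # insert or update
        state[key] = (order_token, row)
    return True

# Events may arrive out of order; the final state is the same regardless.
apply_event(7, 2, "update", {"id": 7, "status": "shipped"})
apply_event(7, 1, "insert", {"id": 7, "status": "new"})   # stale: rejected
apply_event(9, 3, "delete")
apply_event(9, 2, "insert", {"id": 9})                    # behind tombstone: rejected
```

However the four events are ordered, key `7` ends at token 2 and key `9` stays deleted -- which is what "exactly-once final state" means here.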
### CDC-encoded
When the sink cannot perform full `MERGE` reconciliation but can faithfully land CDC payloads, Skippr writes events with their mutation metadata (`_skippr_mutation_kind`, `_skippr_order_token`) as an append-only change log. Downstream consumers can process this log independently.
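As a hypothetical illustration, two change-log events for the same row might land like this -- only `_skippr_mutation_kind` and `_skippr_order_token` come from this page; the row fields and token values are made up:

```yaml
# Append-only change log: events are landed, never merged.
- _skippr_mutation_kind: update
  _skippr_order_token: 1041
  id: 42
  status: shipped
- _skippr_mutation_kind: delete
  _skippr_order_token: 1042
  id: 42
```

Replaying the log in `_skippr_order_token` order lets a downstream consumer reconstruct the source's final state itself.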
## Validation at startup
Skippr performs the following checks before starting a CDC pipeline:
- The source connector must support CDC (`cdc_enabled: true` is accepted)
- The destination connector must be capable of accepting CDC payloads
- If the pair supports exactly-once final state, `business_key_columns` must be non-empty
- Column names in `business_key_columns` must exist in the source schema
If any check fails, the pipeline exits with a descriptive error message before any data is read.
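The check sequence can be sketched roughly as follows. `SourceSpec`, `SinkSpec`, and their fields are illustrative stand-ins, not Skippr's API:

```python
from dataclasses import dataclass, field

@dataclass
class SourceSpec:
    supports_cdc: bool
    cdc_enabled: bool
    columns: set = field(default_factory=set)

@dataclass
class SinkSpec:
    accepts_cdc: bool
    supports_merge: bool  # full reconciliation => exactly-once final state

def validate_cdc(source, sink, business_key_columns):
    """Fail fast with a descriptive error, before any data is read."""
    if not (source.supports_cdc and source.cdc_enabled):
        raise ValueError("source connector does not support CDC")
    if not sink.accepts_cdc:
        raise ValueError("destination cannot accept CDC payloads")
    if sink.supports_merge and not business_key_columns:
        raise ValueError("business_key_columns must be non-empty "
                         "for exactly-once final state")
    missing = set(business_key_columns) - source.columns
    if missing:
        raise ValueError(f"unknown business key columns: {sorted(missing)}")
```

The point of the ordering is that every failure surfaces before the pipeline touches the source.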
## Source/destination compatibility
All CDC-capable sources work with all warehouse destinations:
| Source | Destinations |
|---|---|
| PostgreSQL | Snowflake, BigQuery, PostgreSQL, Redshift, ClickHouse, Databricks, Synapse, MotherDuck |
| MySQL | Snowflake, BigQuery, PostgreSQL, Redshift, ClickHouse, Databricks, Synapse, MotherDuck |
| MongoDB | Snowflake, BigQuery, PostgreSQL, Redshift, ClickHouse, Databricks, Synapse, MotherDuck |
| DynamoDB | Snowflake, BigQuery, PostgreSQL, Redshift, ClickHouse, Databricks, Synapse, MotherDuck |
| Kafka (Debezium) | Snowflake, BigQuery, PostgreSQL, Redshift, ClickHouse, Databricks, Synapse, MotherDuck |
Non-CDC sources (e.g. S3, SFTP, HTTP) cannot set `cdc_enabled: true`. Skippr validates source/destination compatibility at startup and returns a clear error if the combination is unsupported.
## Complete examples
PostgreSQL CDC to Snowflake (exactly-once final state is inferred automatically):
```yaml
project: pg_cdc_to_snowflake
source:
  kind: postgres
  host: db.example.com
  port: 5432
  user: replicator
  password: ${POSTGRES_PASSWORD}
  database: production
  cdc_enabled: true
warehouse:
  kind: snowflake
  database: ANALYTICS
  schema: RAW
  warehouse: COMPUTE_WH
  role: SKIPPR_ROLE
cdc:
  business_key_columns:
    - id
```

MySQL CDC to BigQuery:
```yaml
project: mysql_cdc_to_bq
source:
  kind: mysql
  connection_string: mysql://replicator:${MYSQL_PASSWORD}@host:3306/ecommerce
  cdc_enabled: true
warehouse:
  kind: bigquery
  project: my-gcp-project
  dataset: raw
  location: US
cdc:
  business_key_columns:
    - order_id
```

Kafka Debezium CDC to Redshift:
```yaml
project: kafka_cdc_to_redshift
source:
  kind: kafka
  brokers: "kafka.example.com:9092"
  topic: dbserver1.public.orders
  cdc_enabled: true
warehouse:
  kind: redshift
  cluster_identifier: my-cluster
  database: analytics
  db_user: admin
  region: us-east-1
cdc:
  business_key_columns:
    - id
```