pipeline-design

Design ETL/ELT pipeline architectures with proper patterns for reliability and scalability.

Stars 0

Forks 0

Install this agent skill to your Project

npx add-skill https://github.com/timequity/vibe-coder/tree/main/skills/data/pipeline-design

SKILL.md

Pipeline Design

ETL vs ELT

Approach	When to Use
ETL	Transform before load, limited warehouse compute
ELT	Modern warehouses (Snowflake, BigQuery, Redshift)

Pipeline Patterns

Batch

Source → Extract → Stage → Transform → Load → Target
         │                    │
         └── Checkpoint ──────┘

Scheduled intervals (hourly, daily)
Full or incremental loads
Idempotent operations

Streaming

Source → Kafka/Kinesis → Process → Sink
              │
              └── State Store

Real-time requirements
Event-driven architecture
Exactly-once semantics

Design Principles

Idempotent - Safe to re-run
Incremental - Process only new/changed data
Observable - Metrics, logs, alerts
Testable - Unit tests for transformations
Recoverable - Checkpoints, retry logic

Staging Pattern

sql

-- 1. Land raw data
COPY INTO raw.source_data FROM @stage;

-- 2. Deduplicate
CREATE TABLE staging.deduped AS
SELECT * FROM raw.source_data
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY _loaded_at DESC) = 1;

-- 3. Transform to target
MERGE INTO target.dim_customer
USING staging.deduped
ON target.id = staging.id
WHEN MATCHED THEN UPDATE ...
WHEN NOT MATCHED THEN INSERT ...;

Error Handling

Dead letter queues for failed records
Retry with exponential backoff
Alert on threshold breaches
Quarantine bad data for review