Agent skill

ELT Modeling

Comprehensive guide to ELT (Extract, Load, Transform) modeling patterns, dimensional modeling, fact and dimension tables, and data warehouse design

View SKILL.md on GitHub Repository

Stars 163

Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/elt-modeling

SKILL.md

ELT Modeling

ELT vs ETL

ETL (Traditional)

Extract → Transform → Load

1. Extract from source
2. Transform in ETL tool (Informatica, Talend)
3. Load to warehouse

Pros: Clean data before loading
Cons: Slow, expensive transformation servers

ELT (Modern)

Extract → Load → Transform

1. Extract from source
2. Load raw data to warehouse
3. Transform in warehouse (SQL, dbt)

Pros: Fast loading, leverage warehouse power
Cons: Raw data in warehouse (need governance)

Why ELT?

Cloud warehouses: Snowflake, BigQuery, Redshift (powerful, scalable)
Separation of concerns: EL (Fivetran, Airbyte) + T (dbt)
Flexibility: Transform multiple ways from same raw data
Speed: Parallel processing in warehouse

Dimensional Modeling

Star Schema

        dim_customers
              |
        dim_products
              |
         fct_orders  ← Central fact table
              |
        dim_dates
              |
        dim_locations

Characteristics:

One fact table (center)
Multiple dimension tables (points)
Denormalized (fast queries)

Snowflake Schema

    dim_customers
          |
    dim_customer_segments
          |
     fct_orders
          |
    dim_products
          |
    dim_product_categories

Characteristics:

Normalized dimensions
Less redundancy
More joins (slower queries)

Recommendation: Use star schema for analytics (faster)

Fact Tables

Definition

Fact table: Stores measurable events (transactions, orders, clicks)

Characteristics

Large: Millions to billions of rows
Numeric measures: Amounts, quantities, counts
Foreign keys: Links to dimensions
Grain: Level of detail (one row per order, per day, etc.)

Types of Facts

Transaction Facts:

sql

-- One row per transaction
fct_orders:
  order_id (PK)
  customer_id (FK)
  product_id (FK)
  order_date_id (FK)
  quantity
  amount

Periodic Snapshot Facts:

sql

-- One row per period (daily, monthly)
fct_inventory_daily:
  date_id (PK)
  product_id (PK)
  warehouse_id (PK)
  quantity_on_hand
  quantity_sold

Accumulating Snapshot Facts:

sql

-- One row per process (order lifecycle)
fct_order_lifecycle:
  order_id (PK)
  order_date
  payment_date
  shipment_date
  delivery_date
  days_to_ship
  days_to_deliver

Fact Table Design

sql

CREATE TABLE fct_orders (
  -- Surrogate key
  order_key BIGINT PRIMARY KEY,
  
  -- Natural key
  order_id VARCHAR(50) NOT NULL,
  
  -- Foreign keys (dimensions)
  customer_key BIGINT NOT NULL,
  product_key BIGINT NOT NULL,
  date_key INT NOT NULL,
  
  -- Degenerate dimensions (no separate dim table)
  order_number VARCHAR(50),
  
  -- Measures
  quantity INT,
  unit_price DECIMAL(10,2),
  total_amount DECIMAL(10,2),
  discount_amount DECIMAL(10,2),
  
  -- Audit columns
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

Dimension Tables

Definition

Dimension table: Stores descriptive attributes (who, what, where, when)

Characteristics

Small: Thousands to millions of rows
Descriptive: Text, categories, hierarchies
Slowly changing: Updates over time (SCD)

Dimension Design

sql

CREATE TABLE dim_customers (
  -- Surrogate key
  customer_key BIGINT PRIMARY KEY,
  
  -- Natural key
  customer_id VARCHAR(50) NOT NULL,
  
  -- Attributes
  customer_name VARCHAR(255),
  email VARCHAR(255),
  phone VARCHAR(50),
  
  -- Hierarchies
  city VARCHAR(100),
  state VARCHAR(100),
  country VARCHAR(100),
  region VARCHAR(100),
  
  -- Segments
  customer_segment VARCHAR(50),
  customer_tier VARCHAR(50),
  
  -- SCD Type 2 columns
  effective_date DATE,
  expiration_date DATE,
  is_current BOOLEAN,
  
  -- Audit columns
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

Slowly Changing Dimensions (SCD)

Type 1: Overwrite

sql

-- Customer moves, update address
UPDATE dim_customers
SET city = 'New York', state = 'NY'
WHERE customer_id = '123';

-- No history preserved

Type 2: Add New Row (Historical)

sql

-- Customer moves, add new row
INSERT INTO dim_customers (
  customer_id, city, state,
  effective_date, is_current
) VALUES (
  '123', 'New York', 'NY',
  '2024-01-15', TRUE
);

-- Mark old row as not current
UPDATE dim_customers
SET is_current = FALSE,
    expiration_date = '2024-01-14'
WHERE customer_id = '123'
  AND is_current = TRUE;

-- History preserved

Type 3: Add New Column

sql

-- Track previous value
ALTER TABLE dim_customers
ADD COLUMN previous_city VARCHAR(100);

UPDATE dim_customers
SET previous_city = city,
    city = 'New York'
WHERE customer_id = '123';

Grain Definition

What is Grain?

Grain: Level of detail in fact table (one row represents what?)

Examples

Order grain:      One row per order
Order line grain: One row per product in order
Daily grain:      One row per day per product
Hourly grain:     One row per hour per product

Choosing Grain

Too fine (order line):
- More rows
- More storage
- More flexibility

Too coarse (monthly):
- Fewer rows
- Less storage
- Less flexibility

Balance: Choose finest grain needed for analysis

Kimball vs Inmon

Kimball (Bottom-Up, Dimensional)

Data Marts (Star Schemas) → Enterprise Data Warehouse

Pros:
- Fast to implement
- Business-friendly (denormalized)
- Query performance

Cons:
- Data redundancy
- Hard to change

Inmon (Top-Down, Normalized)

Enterprise Data Warehouse (3NF) → Data Marts

Pros:
- Single source of truth
- Flexible
- Less redundancy

Cons:
- Slow to implement
- Complex queries (many joins)

Modern Approach: Hybrid (normalized staging, dimensional marts)

Data Vault

Components

Hubs:       Business keys (customers, products)
Links:      Relationships (orders link customers + products)
Satellites: Descriptive attributes (customer details)

Example

sql

-- Hub: Customer
CREATE TABLE hub_customer (
  customer_hub_key BIGINT PRIMARY KEY,
  customer_id VARCHAR(50) UNIQUE,
  load_date TIMESTAMP,
  record_source VARCHAR(50)
);

-- Satellite: Customer Details
CREATE TABLE sat_customer_details (
  customer_hub_key BIGINT,
  load_date TIMESTAMP,
  customer_name VARCHAR(255),
  email VARCHAR(255),
  phone VARCHAR(50),
  PRIMARY KEY (customer_hub_key, load_date)
);

-- Link: Order
CREATE TABLE link_order (
  order_link_key BIGINT PRIMARY KEY,
  customer_hub_key BIGINT,
  product_hub_key BIGINT,
  load_date TIMESTAMP
);

Pros:

Audit trail
Flexible
Handles source changes

Cons:

Complex
Many joins
Steep learning curve

Conformed Dimensions

Definition

Conformed dimension: Shared across multiple fact tables

Example

sql

-- dim_date used by multiple facts
dim_date ← fct_orders
        ← fct_shipments
        ← fct_returns

-- Ensures consistent date attributes

Benefits

Consistency: Same date logic everywhere
Drill-across: Compare metrics across facts
Reusability: Build once, use many times

Bridge Tables

Many-to-Many Relationships

sql

-- Customer can have multiple accounts
-- Account can have multiple customers

CREATE TABLE bridge_customer_account (
  customer_key BIGINT,
  account_key BIGINT,
  allocation_percentage DECIMAL(5,2),
  PRIMARY KEY (customer_key, account_key)
);

Junk Dimensions

Definition

Junk dimension: Collection of low-cardinality flags

Example

sql

-- Instead of many boolean columns in fact
CREATE TABLE dim_order_flags (
  order_flag_key INT PRIMARY KEY,
  is_gift BOOLEAN,
  is_express_shipping BOOLEAN,
  is_first_order BOOLEAN,
  has_discount BOOLEAN
);

-- Fact table references junk dimension
fct_orders:
  order_key
  customer_key
  order_flag_key  ← Reference to junk dimension
  amount

Role-Playing Dimensions

Definition

Role-playing dimension: Same dimension used multiple times with different meanings

Example

sql

-- dim_date used multiple times
fct_orders:
  order_key
  order_date_key      ← dim_date (order date)
  ship_date_key       ← dim_date (ship date)
  delivery_date_key   ← dim_date (delivery date)
  amount

Surrogate Keys

Natural Key vs Surrogate Key

sql

-- Natural key: Business identifier
customer_id = 'CUST-12345'

-- Surrogate key: System-generated
customer_key = 1001 (auto-increment or hash)

Why Surrogate Keys?

Performance: Integer joins faster than string joins
SCD Type 2: Multiple rows for same natural key
Independence: Source system can change natural key
Consistency: Same format across all tables

Generating Surrogate Keys

sql

-- Auto-increment
customer_key BIGINT AUTO_INCREMENT PRIMARY KEY

-- Hash (dbt)
{{ dbt_utils.surrogate_key(['customer_id', 'effective_date']) }}

-- Sequence (PostgreSQL)
customer_key BIGINT DEFAULT nextval('customer_key_seq')

Data Warehouse Layers

Bronze (Raw)

Purpose: Exact copy of source data
Format: As-is from source
Schema: Source schema
Example: raw_salesforce_accounts

Silver (Cleaned)

Purpose: Cleaned, standardized
Format: Consistent types, naming
Schema: Staging schema
Example: stg_salesforce_accounts

Gold (Analytics)

Purpose: Business-ready
Format: Dimensional models
Schema: Analytics schema
Example: dim_customers, fct_orders

Best Practices

1. Define Grain Clearly

✓ "One row per order"
✗ "Order data"

2. Use Surrogate Keys

✓ customer_key BIGINT
✗ customer_id VARCHAR(50)

3. Denormalize Dimensions

✓ dim_customers includes city, state, country
✗ Separate dim_cities, dim_states, dim_countries

4. Keep Facts Narrow

✓ fct_orders: Keys + measures only
✗ fct_orders: Keys + measures + customer name, product name

5. Use Conformed Dimensions

✓ dim_date shared across all facts
✗ Each fact has own date dimension

Common Patterns

Daily Snapshot

sql

-- Capture state once per day
INSERT INTO fct_inventory_daily
SELECT
  CURRENT_DATE as snapshot_date,
  product_id,
  warehouse_id,
  quantity_on_hand,
  quantity_reserved
FROM current_inventory;

Cumulative Metrics

sql

-- Running totals
SELECT
  order_date,
  SUM(amount) OVER (
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) as cumulative_revenue
FROM fct_orders;

Summary

ELT: Extract → Load → Transform (modern approach)

Dimensional Modeling:

Star schema (recommended)
Fact tables (measures)
Dimension tables (attributes)

Fact Types:

Transaction (one per event)
Periodic snapshot (one per period)
Accumulating snapshot (one per process)

SCD Types:

Type 1: Overwrite
Type 2: Historical (recommended)
Type 3: Previous value

Grain: Level of detail (define clearly!)

Surrogate Keys: Use for performance and SCD

Layers:

Bronze: Raw
Silver: Cleaned
Gold: Analytics-ready

Best Practices:

Define grain
Use surrogate keys
Denormalize dimensions
Keep facts narrow
Use conformed dimensions

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/elt-modeling
License: MIT License

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

majiayu000/claude-skill-registry

agent-ops-spec

Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-state

Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-spec

Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-testing

Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-testing

Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-state

Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.

163 31

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

ELT Modeling

ELT vs ETL

ETL (Traditional)

ELT (Modern)

Why ELT?

Dimensional Modeling

Star Schema

Snowflake Schema

Fact Tables

Definition

Characteristics

Types of Facts

Fact Table Design

Dimension Tables

Definition

Characteristics

Dimension Design

Slowly Changing Dimensions (SCD)

Grain Definition

What is Grain?

Examples

Choosing Grain

Kimball vs Inmon

Kimball (Bottom-Up, Dimensional)

Inmon (Top-Down, Normalized)

Data Vault

Components

Example

Conformed Dimensions

Definition

Example

Benefits

Bridge Tables

Many-to-Many Relationships

Junk Dimensions

Definition

Example

Role-Playing Dimensions

Definition

Example

Surrogate Keys

Natural Key vs Surrogate Key

Why Surrogate Keys?

Generating Surrogate Keys

Data Warehouse Layers

Bronze (Raw)

Silver (Cleaned)

Gold (Analytics)

Best Practices

1. Define Grain Clearly

2. Use Surrogate Keys

3. Denormalize Dimensions

4. Keep Facts Narrow

5. Use Conformed Dimensions

Common Patterns

Daily Snapshot

Cumulative Metrics

Summary

Recommended Agent Skills

agent-ops-spec

agent-ops-state

agent-ops-spec

agent-ops-testing

agent-ops-testing

agent-ops-state