Agent skill

setup-timescaledb-hypertables

Use this skill when creating database schemas or tables for Timescale, TimescaleDB, TigerData, or Tiger Cloud, especially for time-series, IoT, metrics, events, or log data. Use this to improve the performance of any insert-heavy table. **Trigger when user asks to:** - Create or design SQL schemas/tables AND Timescale/TimescaleDB/TigerData/Tiger Cloud is available - Set up hypertables, compression, retention policies, or continuous aggregates - Configure partition columns, segment_by, order_by, or chunk intervals - Optimize time-series database performance or storage - Create tables for sensors, metrics, telemetry, events, or transaction logs **Keywords:** CREATE TABLE, hypertable, Timescale, TimescaleDB, time-series, IoT, metrics, sensor data, compression policy, continuous aggregates, columnstore, retention policy, chunk interval, segment_by, order_by Step-by-step instructions for hypertable creation, column selection, compression policies, retention, continuous aggregates, and indexes.

Stars 1,661
Forks 82

Install this agent skill to your Project

npx add-skill https://github.com/timescale/pg-aiguide/tree/main/skills/setup-timescaledb-hypertables

SKILL.md

TimescaleDB Complete Setup

Instructions for insert-heavy data patterns where data is inserted but rarely changed:

  • Time-series data (sensors, metrics, system monitoring)
  • Event logs (user events, audit trails, application logs)
  • Transaction records (orders, payments, financial transactions)
  • Sequential data (records with auto-incrementing IDs and timestamps)
  • Append-only datasets (immutable records, historical data)

Step 1: Create Hypertable

sql
CREATE TABLE your_table_name (
    timestamp TIMESTAMPTZ NOT NULL,
    entity_id TEXT NOT NULL,          -- device_id, user_id, symbol, etc.
    category TEXT,                    -- sensor_type, event_type, asset_class, etc.
    value_1 DOUBLE PRECISION,         -- price, temperature, latency, etc.
    value_2 DOUBLE PRECISION,         -- volume, humidity, throughput, etc.
    value_3 INTEGER,                  -- count, status, level, etc.
    metadata JSONB                    -- flexible additional data
) WITH (
    tsdb.hypertable,
    tsdb.partition_column='timestamp',
    tsdb.enable_columnstore=true,     -- Disable if table has vector columns
    tsdb.segmentby='entity_id',       -- See selection guide below
    tsdb.orderby='timestamp DESC',     -- See selection guide below
    tsdb.sparse_index='minmax(value_1),minmax(value_2),minmax(value_3)' -- see selection guide below
);

Compression Decision

  • Enable by default for insert-heavy patterns
  • Disable if table has vector type columns (pgvector) - indexes on vector columns incompatible with columnstore

Partition Column Selection

Must be time-based (TIMESTAMP/TIMESTAMPTZ/DATE) or integer (INT/BIGINT) with good temporal/sequential distribution.

Common patterns:

  • TIME-SERIES: timestamp, event_time, measured_at
  • EVENT LOGS: event_time, created_at, logged_at
  • TRANSACTIONS: created_at, transaction_time, processed_at
  • SEQUENTIAL: id (auto-increment when no timestamp), sequence_number
  • APPEND-ONLY: created_at, inserted_at, id

Less ideal: ingested_at (when data entered system - use only if it's your primary query dimension) Avoid: updated_at (breaks time ordering unless it's primary query dimension)

Segment_By Column Selection

PREFER SINGLE COLUMN - multi-column rarely optimal. Multi-column can only work for highly correlated columns (e.g., metric_name + metric_type) with sufficient row density.

Requirements:

  • Frequently used in WHERE clauses (most common filter)
  • Good row density (>100 rows per value per chunk)
  • Primary logical partition/grouping

Examples:

  • IoT: device_id
  • Finance: symbol
  • Metrics: service_name, service_name, metric_type (if sufficient row density), metric_name, metric_type (if sufficient row density)
  • Analytics: user_id if sufficient row density, otherwise session_id
  • E-commerce: product_id if sufficient row density, otherwise category_id

Row density guidelines:

  • Target: >100 rows per segment_by value within each chunk.
  • Poor: <10 rows per segment_by value per chunk → choose less granular column
  • What to do with low-density columns: prepend to order_by column list.

Query pattern drives choice:

sql
SELECT * FROM table WHERE entity_id = 'X' AND timestamp > ...
-- ↳ segment_by: entity_id (if >100 rows per chunk)

Avoid: timestamps, unique IDs, low-density columns (<100 rows/value/chunk), columns rarely used in filtering

Order_By Column Selection

Creates natural time-series progression when combined with segment_by for optimal compression.

Most common: timestamp DESC

Examples:

  • IoT/Finance/E-commerce: timestamp DESC
  • Metrics: metric_name, timestamp DESC (if metric_name has too low density for segment_by)
  • Analytics: user_id, timestamp DESC (user_id has too low density for segment_by)

Alternative patterns:

  • sequence_id DESC for event streams with sequence numbers
  • timestamp DESC, event_order DESC for sub-ordering within same timestamp

Low-density column handling: If a column has <100 rows per chunk (too low for segment_by), prepend it to order_by:

  • Example: metric_name has 20 rows/chunk → use segment_by='service_name', order_by='metric_name, timestamp DESC'
  • Groups similar values together (all temperature readings, then pressure readings) for better compression

Good test: ordering created by (segment_by_column, order_by_column) should form a natural time-series progression. Values close to each other in the progression should be similar.

Avoid in order_by: random columns, columns with high variance between adjacent rows, columns unrelated to segment_by

Compression Sparse Index Selection

Sparse indexes enable query filtering on compressed data without decompression. Store metadata per batch (~1000 rows) to eliminate batches that don't match query predicates.

Types:

  • minmax: Min/max values per batch - for range queries (>, <, BETWEEN) on numeric/temporal columns

Use minmax for: price, temperature, measurement, timestamp (range filtering)

Use for:

  • minmax for outlier detection (temperature > 90).
  • minmax for fields that are highly correlated with segmentby and orderby columns (e.g. if orderby includes created_at, minmax on updated_at is useful).

Avoid: rarely filtered columns.

IMPORTANT: NEVER index columns in segmentby or orderby. Orderby columns will always have minmax indexes without any configuration.

Configuration: The format is a comma-separated list of type_of_index(column_name).

sql
ALTER TABLE table_name SET (
    timescaledb.sparse_index = 'minmax(value_1),minmax(value_2)'
);

Explicit configuration available since v2.22.0 (was auto-created since v2.16.0).

Chunk Time Interval (Optional)

Default: 7 days (use if volume unknown, or ask user). Adjust based on volume:

  • High frequency: 1 hour - 1 day
  • Medium: 1 day - 1 week
  • Low: 1 week - 1 month
sql
SELECT set_chunk_time_interval('your_table_name', INTERVAL '1 day');

Good test: recent chunk indexes should fit in less than 25% of RAM.

Indexes & Primary Keys

Common index patterns - composite indexes on an id and timestamp:

sql
CREATE INDEX idx_entity_timestamp ON your_table_name (entity_id, timestamp DESC);

Important: Only create indexes you'll actually use - each has maintenance overhead.

Primary key and unique constraints rules: Must include partition column.

Option 1: Composite PK with partition column

sql
ALTER TABLE your_table_name ADD PRIMARY KEY (entity_id, timestamp);

Option 2: Single-column PK (only if it's the partition column)

sql
CREATE TABLE ... (id BIGINT PRIMARY KEY, ...) WITH (tsdb.partition_column='id');

Option 3: No PK: strict uniqueness is often not required for insert-heavy patterns.

Step 2: Compression Policy (Optional)

IMPORTANT: If you used tsdb.enable_columnstore=true in Step 1, starting with TimescaleDB version 2.23 a columnstore policy is automatically created with after => INTERVAL '7 days'. You only need to call add_columnstore_policy() if you want to customize the after interval to something other than 7 days.

Set after interval for when: data becomes mostly immutable (some updates/backfill OK) AND B-tree indexes aren't needed for queries (less common criterion).

sql
-- In TimescaleDB 2.23 and later only needed if you want to override the default 7-day policy created by tsdb.enable_columnstore=true
-- Remove the existing auto-created policy first:
-- CALL remove_columnstore_policy('your_table_name');
-- Then add custom policy:
-- CALL add_columnstore_policy('your_table_name', after => INTERVAL '1 day');

Step 3: Retention Policy

IMPORTANT: Don't guess - ask user or comment out if unknown.

sql
-- Example - replace with requirements or comment out
SELECT add_retention_policy('your_table_name', INTERVAL '365 days');

Step 4: Create Continuous Aggregates

Use different aggregation intervals for different uses.

Short-term (Minutes/Hours)

For up-to-the-minute dashboards on high-frequency data.

sql
CREATE MATERIALIZED VIEW your_table_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket(INTERVAL '1 hour', timestamp) AS bucket,
    entity_id,
    category,
    COUNT(*) as record_count,
    AVG(value_1) as avg_value_1,
    MIN(value_1) as min_value_1,
    MAX(value_1) as max_value_1,
    SUM(value_2) as sum_value_2
FROM your_table_name
GROUP BY bucket, entity_id, category;

Long-term (Days/Weeks/Months)

For long-term reporting and analytics.

sql
CREATE MATERIALIZED VIEW your_table_daily
WITH (timescaledb.continuous) AS
SELECT
    time_bucket(INTERVAL '1 day', timestamp) AS bucket,
    entity_id,
    category,
    COUNT(*) as record_count,
    AVG(value_1) as avg_value_1,
    MIN(value_1) as min_value_1,
    MAX(value_1) as max_value_1,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value_1) as median_value_1,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY value_1) as p95_value_1,
    SUM(value_2) as sum_value_2
FROM your_table_name
GROUP BY bucket, entity_id, category;

Step 5: Aggregate Refresh Policies

Set up refresh policies based on your data freshness requirements.

start_offset: Usually omit (refreshes all). Exception: If you don't care about refreshing data older than X (see below). With retention policy on raw data: match the retention policy.

end_offset: Set beyond active update window (e.g., 15 min if data usually arrives within 10 min). Data newer than end_offset won't appear in queries without real-time aggregation. If you don't know your update window, use the size of the time_bucket in the query, but not less than 5 minutes.

schedule_interval: Set to the same value as the end_offset but not more than 1 hour.

Hourly - frequent refresh for dashboards:

sql
SELECT add_continuous_aggregate_policy('your_table_hourly',
    start_offset => NULL,
    end_offset => INTERVAL '15 minutes',
    schedule_interval => INTERVAL '15 minutes');

Daily - less frequent for reports:

sql
SELECT add_continuous_aggregate_policy('your_table_daily',
    start_offset => NULL,
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

Use start_offset only if you don't care about refreshing old data Use for high-volume systems where query accuracy on older data doesn't matter:

sql
-- the following aggregate can be stale for data older than 7 days
-- SELECT add_continuous_aggregate_policy('aggregate_for_last_7_days',
--     start_offset => INTERVAL '7 days',    -- only refresh last 7 days (NULL = refresh all)
--     end_offset => INTERVAL '15 minutes',
--     schedule_interval => INTERVAL '15 minutes');

IMPORTANT: you MUST set a start_offset to be less than the retention policy on raw data. By default, set the start_offset equal to the retention policy. If the retention policy is commented out, comment out the start_offset as well. like this:

sql
SELECT add_continuous_aggregate_policy('your_table_daily',
    start_offset => NULL,    -- Use NULL to refresh all data, or set to retention period if enabled on raw data
--  start_offset => INTERVAL '<retention period here>',    -- uncomment if retention policy is enabled on the raw data table
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

Step 6: Real-Time Aggregation (Optional)

Real-time combines materialized + recent raw data at query time. Provides up-to-date results at the cost of higher query latency.

More useful for fine-grained aggregates (e.g., minutely) than coarse ones (e.g., daily/monthly) since large buckets will be mostly incomplete with recent data anyway.

Disabled by default in v2.13+, before that it was enabled by default.

Use when: Need data newer than end_offset, up-to-minute dashboards, can tolerate higher query latency Disable when: Performance critical, refresh policies sufficient, high query volume, missing and stale data for recent data is acceptable

Enable for current results (higher query cost):

sql
ALTER MATERIALIZED VIEW your_table_hourly SET (timescaledb.materialized_only = false);

Disable for performance (but with stale results):

sql
ALTER MATERIALIZED VIEW your_table_hourly SET (timescaledb.materialized_only = true);

Step 7: Compress Aggregates

Rule: segment_by = ALL GROUP BY columns except time_bucket, order_by = time_bucket DESC

sql
-- Hourly
ALTER MATERIALIZED VIEW your_table_hourly SET (
    timescaledb.enable_columnstore,
    timescaledb.segmentby = 'entity_id, category',
    timescaledb.orderby = 'bucket DESC'
);
CALL add_columnstore_policy('your_table_hourly', after => INTERVAL '3 days');

-- Daily
ALTER MATERIALIZED VIEW your_table_daily SET (
    timescaledb.enable_columnstore,
    timescaledb.segmentby = 'entity_id, category',
    timescaledb.orderby = 'bucket DESC'
);
CALL add_columnstore_policy('your_table_daily', after => INTERVAL '7 days');

Step 8: Aggregate Retention

Aggregates are typically kept longer than raw data. IMPORTANT: Don't guess - ask user or you MUST comment out if unknown.

sql
-- Example - replace or comment out
SELECT add_retention_policy('your_table_hourly', INTERVAL '2 years');
SELECT add_retention_policy('your_table_daily', INTERVAL '5 years');

Step 9: Performance Indexes on Continuous Aggregates

Index strategy: Analyze WHERE clauses in common queries → Create indexes matching filter columns + time ordering

Pattern: (filter_column, bucket DESC) supports WHERE filter_column = X AND bucket >= Y ORDER BY bucket DESC

Examples:

sql
CREATE INDEX idx_hourly_entity_bucket ON your_table_hourly (entity_id, bucket DESC);
CREATE INDEX idx_hourly_category_bucket ON your_table_hourly (category, bucket DESC);

Multi-column filters: Create composite indexes for WHERE entity_id = X AND category = Y:

sql
CREATE INDEX idx_hourly_entity_category_bucket ON your_table_hourly (entity_id, category, bucket DESC);

Important: Only create indexes you'll actually use - each has maintenance overhead.

Step 10: Optional Enhancements

Space Partitioning (NOT RECOMMENDED)

Only for query patterns where you ALWAYS filter by the space-partition column with expert knowledge and extensive benchmarking. STRONGLY prefer time-only partitioning.

Step 11: Verify Configuration

sql
-- Check hypertable
SELECT * FROM timescaledb_information.hypertables
WHERE hypertable_name = 'your_table_name';

-- Check compression settings
SELECT * FROM hypertable_compression_stats('your_table_name');

-- Check aggregates
SELECT * FROM timescaledb_information.continuous_aggregates;

-- Check policies
SELECT * FROM timescaledb_information.jobs ORDER BY job_id;

-- Monitor chunk information
SELECT
    chunk_name,
    range_start,
    range_end,
    is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'your_table_name'
ORDER BY range_start DESC;

Performance Guidelines

  • Chunk size: Recent chunk indexes should fit in less than 25% of RAM
  • Compression: Expect 90%+ reduction (10x) with proper columnstore config
  • Query optimization: Use continuous aggregates for historical queries and dashboards
  • Memory: Run timescaledb-tune for self-hosting (auto-configured on cloud)

Schema Best Practices

Do's and Don'ts

  • ✅ Use TIMESTAMPTZ NOT timestamp
  • ✅ Use >= and < NOT BETWEEN for timestamps
  • ✅ Use TEXT with constraints NOT char(n)/varchar(n)
  • ✅ Use snake_case NOT CamelCase
  • ✅ Use BIGINT GENERATED ALWAYS AS IDENTITY NOT SERIAL
  • ✅ Use BIGINT for IDs by default over INTEGER or SMALLINT
  • ✅ Use DOUBLE PRECISION by default over REAL/FLOAT
  • ✅ Use NUMERIC NOT MONEY
  • ✅ Use NOT EXISTS NOT NOT IN
  • ✅ Use time_bucket() or date_trunc() NOT timestamp(0) for truncation

API Reference (Current vs Deprecated)

Deprecated Parameters → New Parameters:

  • timescaledb.compresstimescaledb.enable_columnstore
  • timescaledb.compress_segmentbytimescaledb.segmentby
  • timescaledb.compress_orderbytimescaledb.orderby

Deprecated Functions → New Functions:

  • add_compression_policy()add_columnstore_policy()
  • remove_compression_policy()remove_columnstore_policy()
  • compress_chunk()convert_to_columnstore() (use with CALL, not SELECT)
  • decompress_chunk()convert_to_rowstore() (use with CALL, not SELECT)

Compression Stats (use functions, not views):

  • Use function: hypertable_compression_stats('table_name')
  • Use function: chunk_compression_stats('_timescaledb_internal._hyper_X_Y_chunk')
  • Note: Views like columnstore_settings may not be available in all versions; use functions instead

Manual Compression Example:

sql
-- Compress a specific chunk
CALL convert_to_columnstore('_timescaledb_internal._hyper_7_1_chunk');

-- Check compression statistics
SELECT
    number_compressed_chunks,
    pg_size_pretty(before_compression_total_bytes) as before_compression,
    pg_size_pretty(after_compression_total_bytes) as after_compression,
    ROUND(100.0 * (1 - after_compression_total_bytes::numeric / NULLIF(before_compression_total_bytes, 0)), 1) as compression_pct
FROM hypertable_compression_stats('your_table_name');

Questions to Ask User

  1. What kind of data will you be storing?
  2. How do you expect to use the data?
  3. What queries will you run?
  4. How long to keep the data?
  5. Column types if unclear

Expand your agent's capabilities with these related and highly-rated skills.

timescale/pg-aiguide

postgres-hybrid-text-search

Use this skill to implement hybrid search combining BM25 keyword search with semantic vector search using Reciprocal Rank Fusion (RRF). **Trigger when user asks to:** - Combine keyword and semantic search - Implement hybrid search or multi-modal retrieval - Use BM25/pg_textsearch with pgvector together - Implement RRF (Reciprocal Rank Fusion) for search - Build search that handles both exact terms and meaning **Keywords:** hybrid search, BM25, pg_textsearch, RRF, reciprocal rank fusion, keyword search, full-text search, reranking, cross-encoder Covers: pg_textsearch BM25 index setup, parallel query patterns, client-side RRF fusion (Python/TypeScript), weighting strategies, and optional ML reranking.

1,661 82
Explore
timescale/pg-aiguide

design-postgres-tables

Use this skill for general PostgreSQL table design. **Trigger when user asks to:** - Design PostgreSQL tables, schemas, or data models when creating new tables and when modifying existing ones. - Choose data types, constraints, or indexes for PostgreSQL - Create user tables, order tables, reference tables, or JSONB schemas - Understand PostgreSQL best practices for normalization, constraints, or indexing - Design update-heavy, upsert-heavy, or OLTP-style tables **Keywords:** PostgreSQL schema, table design, data types, PRIMARY KEY, FOREIGN KEY, indexes, B-tree, GIN, JSONB, constraints, normalization, identity columns, partitioning, row-level security Comprehensive reference covering data types, indexing strategies, constraints, JSONB patterns, partitioning, and PostgreSQL-specific best practices.

1,661 82
Explore
timescale/pg-aiguide

find-hypertable-candidates

Use this skill to analyze an existing PostgreSQL database and identify which tables should be converted to Timescale/TimescaleDB hypertables. **Trigger when user asks to:** - Analyze database tables for hypertable conversion potential - Identify time-series or event tables in an existing schema - Evaluate if a table would benefit from Timescale/TimescaleDB - Audit PostgreSQL tables for migration to Timescale/TimescaleDB/TigerData - Score or rank tables for hypertable candidacy **Keywords:** hypertable candidate, table analysis, migration assessment, Timescale, TimescaleDB, time-series detection, insert-heavy tables, event logs, audit tables Provides SQL queries to analyze table statistics, index patterns, and query patterns. Includes scoring criteria (8+ points = good candidate) and pattern recognition for IoT, events, transactions, and sequential data.

1,661 82
Explore
timescale/pg-aiguide

migrate-postgres-tables-to-hypertables

Use this skill to migrate identified PostgreSQL tables to Timescale/TimescaleDB hypertables with optimal configuration and validation. **Trigger when user asks to:** - Migrate or convert PostgreSQL tables to hypertables - Execute hypertable migration with minimal downtime - Plan blue-green migration for large tables - Validate hypertable migration success - Configure compression after migration **Prerequisites:** Tables already identified as candidates (use find-hypertable-candidates first if needed) **Keywords:** migrate to hypertable, convert table, Timescale, TimescaleDB, blue-green migration, in-place conversion, create_hypertable, migration validation, compression setup Step-by-step migration planning including: partition column selection, chunk interval calculation, PK/constraint handling, migration execution (in-place vs blue-green), and performance validation queries.

1,661 82
Explore
timescale/pg-aiguide

design-postgis-tables

Comprehensive PostGIS spatial table design reference covering geometry types, coordinate systems, spatial indexing, and performance patterns for location-based applications

1,661 82
Explore
timescale/pg-aiguide

pgvector-semantic-search

Use this skill for setting up vector similarity search with pgvector for AI/ML embeddings, RAG applications, or semantic search. **Trigger when user asks to:** - Store or search vector embeddings in PostgreSQL - Set up semantic search, similarity search, or nearest neighbor search - Create HNSW or IVFFlat indexes for vectors - Implement RAG (Retrieval Augmented Generation) with PostgreSQL - Optimize pgvector performance, recall, or memory usage - Use binary quantization for large vector datasets **Keywords:** pgvector, embeddings, semantic search, vector similarity, HNSW, IVFFlat, halfvec, cosine distance, nearest neighbor, RAG, LLM, AI search Covers: halfvec storage, HNSW index configuration (m, ef_construction, ef_search), quantization strategies, filtered search, bulk loading, and performance tuning.

1,661 82
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results