Agent skill
clickhouse-best-practices
MUST USE when reviewing ClickHouse schemas, queries, or configurations. Contains 28 rules that MUST be checked before providing recommendations. Always read relevant rule files and cite specific rules in responses.
Install this agent skill to your Project
npx add-skill https://github.com/ClickHouse/agent-skills/tree/main/skills/clickhouse-best-practices
Metadata
Additional technical details for this skill
- author
- ClickHouse Inc
- version
- 0.3.0
SKILL.md
ClickHouse Best Practices
Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact.
Official docs: ClickHouse Best Practices
IMPORTANT: How to Apply This Skill
Before answering ClickHouse questions, follow this priority order:
- Check for applicable rules in the
rules/directory - If rules exist: Apply them and cite them in your response using "Per
rule-name..." - If no rule exists: Use the LLM's ClickHouse knowledge or search documentation
- If uncertain: Use web search for current best practices
- Always cite your source: rule name, "general ClickHouse guidance", or URL
Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.
For Formal Reviews
When performing a formal review of schemas, queries, or data ingestion:
Review Procedures
For Schema Reviews (CREATE TABLE, ALTER TABLE)
Read these rule files in order:
rules/schema-pk-plan-before-creation.md- ORDER BY is immutablerules/schema-pk-cardinality-order.md- Column ordering in keysrules/schema-pk-prioritize-filters.md- Filter column inclusionrules/schema-types-native-types.md- Proper type selectionrules/schema-types-minimize-bitwidth.md- Numeric type sizingrules/schema-types-lowcardinality.md- LowCardinality usagerules/schema-types-avoid-nullable.md- Nullable vs DEFAULTrules/schema-partition-low-cardinality.md- Partition count limitsrules/schema-partition-lifecycle.md- Partitioning purpose
Check for:
- PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
- Data types match actual data ranges
- LowCardinality applied to appropriate string columns
- Partition key cardinality bounded (100-1,000 values)
- ReplacingMergeTree has version column if used
For Query Reviews (SELECT, JOIN, aggregations)
Read these rule files:
rules/query-join-choose-algorithm.md- Algorithm selectionrules/query-join-filter-before.md- Pre-join filteringrules/query-join-use-any.md- ANY vs regular JOINrules/query-index-skipping-indices.md- Secondary index usagerules/schema-pk-filter-on-orderby.md- Filter alignment with ORDER BY
Check for:
- Filters use ORDER BY prefix columns
- JOINs filter tables before joining (not after)
- Correct JOIN algorithm for table sizes
- Skipping indices for non-ORDER BY filter columns
For Insert Strategy Reviews (data ingestion, updates, deletes)
Read these rule files:
rules/insert-batch-size.md- Batch sizing requirementsrules/insert-mutation-avoid-update.md- UPDATE alternativesrules/insert-mutation-avoid-delete.md- DELETE alternativesrules/insert-async-small-batches.md- Async insert usagerules/insert-optimize-avoid-final.md- OPTIMIZE TABLE risks
Check for:
- Batch size 10K-100K rows per INSERT
- No ALTER TABLE UPDATE for frequent changes
- ReplacingMergeTree or CollapsingMergeTree for update patterns
- Async inserts enabled for high-frequency small batches
Output Format
Structure your response as follows:
## Rules Checked
- `rule-name-1` - Compliant / Violation found
- `rule-name-2` - Compliant / Violation found
...
## Findings
### Violations
- **`rule-name`**: Description of the issue
- Current: [what the code does]
- Required: [what it should do]
- Fix: [specific correction]
### Compliant
- `rule-name`: Brief note on why it's correct
## Recommendations
[Prioritized list of changes, citing rules]
Rule Categories by Priority
| Priority | Category | Impact | Prefix | Rule Count |
|---|---|---|---|---|
| 1 | Primary Key Selection | CRITICAL | schema-pk- |
4 |
| 2 | Data Type Selection | CRITICAL | schema-types- |
5 |
| 3 | JOIN Optimization | CRITICAL | query-join- |
5 |
| 4 | Insert Batching | CRITICAL | insert-batch- |
1 |
| 5 | Mutation Avoidance | CRITICAL | insert-mutation- |
2 |
| 6 | Partitioning Strategy | HIGH | schema-partition- |
4 |
| 7 | Skipping Indices | HIGH | query-index- |
1 |
| 8 | Materialized Views | HIGH | query-mv- |
2 |
| 9 | Async Inserts | HIGH | insert-async- |
2 |
| 10 | OPTIMIZE Avoidance | HIGH | insert-optimize- |
1 |
| 11 | JSON Usage | MEDIUM | schema-json- |
1 |
Quick Reference
Schema Design - Primary Key (CRITICAL)
schema-pk-plan-before-creation- Plan ORDER BY before table creation (immutable)schema-pk-cardinality-order- Order columns low-to-high cardinalityschema-pk-prioritize-filters- Include frequently filtered columnsschema-pk-filter-on-orderby- Query filters must use ORDER BY prefix
Schema Design - Data Types (CRITICAL)
schema-types-native-types- Use native types, not String for everythingschema-types-minimize-bitwidth- Use smallest numeric type that fitsschema-types-lowcardinality- LowCardinality for <10K unique stringsschema-types-enum- Enum for finite value sets with validationschema-types-avoid-nullable- Avoid Nullable; use DEFAULT instead
Schema Design - Partitioning (HIGH)
schema-partition-low-cardinality- Keep partition count 100-1,000schema-partition-lifecycle- Use partitioning for data lifecycle, not queriesschema-partition-query-tradeoffs- Understand partition pruning trade-offsschema-partition-start-without- Consider starting without partitioning
Schema Design - JSON (MEDIUM)
schema-json-when-to-use- JSON for dynamic schemas; typed columns for known
Query Optimization - JOINs (CRITICAL)
query-join-choose-algorithm- Select algorithm based on table sizesquery-join-use-any- ANY JOIN when only one match neededquery-join-filter-before- Filter tables before joiningquery-join-consider-alternatives- Dictionaries/denormalization vs JOINquery-join-null-handling- join_use_nulls=0 for default values
Query Optimization - Indices (HIGH)
query-index-skipping-indices- Skipping indices for non-ORDER BY filters
Query Optimization - Materialized Views (HIGH)
query-mv-incremental- Incremental MVs for real-time aggregationsquery-mv-refreshable- Refreshable MVs for complex joins
Insert Strategy - Batching (CRITICAL)
insert-batch-size- Batch 10K-100K rows per INSERT
Insert Strategy - Async (HIGH)
insert-async-small-batches- Async inserts for high-frequency small batchesinsert-format-native- Native format for best performance
Insert Strategy - Mutations (CRITICAL)
insert-mutation-avoid-update- ReplacingMergeTree instead of ALTER UPDATEinsert-mutation-avoid-delete- Lightweight DELETE or DROP PARTITION
Insert Strategy - Optimization (HIGH)
insert-optimize-avoid-final- Let background merges work
When to Apply
This skill activates when you encounter:
CREATE TABLEstatementsALTER TABLEmodificationsORDER BYorPRIMARY KEYdiscussions- Data type selection questions
- Slow query troubleshooting
- JOIN optimization requests
- Data ingestion pipeline design
- Update/delete strategy questions
- ReplacingMergeTree or other specialized engine usage
- Partitioning strategy decisions
Rule File Structure
Each rule file in rules/ contains:
- YAML frontmatter: title, impact level, tags
- Brief explanation: Why this rule matters
- Incorrect example: Anti-pattern with explanation
- Correct example: Best practice with explanation
- Additional context: Trade-offs, when to apply, references
Full Compiled Document
For the complete guide with all rules expanded inline: AGENTS.md
Use AGENTS.md when you need to check multiple rules quickly without reading individual files.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
ubiquitous-language
Extract a DDD-style ubiquitous language glossary from the current conversation, flagging ambiguities and proposing canonical terms. Saves to UBIQUITOUS_LANGUAGE.md. Use when user wants to define domain terms, build a glossary, harden terminology, create a ubiquitous language, or mentions "domain model" or "DDD".
every-style-editor
This skill should be used when reviewing or editing copy to ensure adherence to Every's style guide. It provides a systematic line-by-line review process for grammar, punctuation, mechanics, and style guide compliance.
manage-codex
Autonomous Codex batch orchestrator. Use for "/manage-codex", "manage codex", "use codex", "dispatch to codex", or long-running Codex work.
seo-audit
When the user wants to audit, review, or diagnose SEO issues on their site. Also use when the user mentions "SEO audit," "technical SEO," "why am I not ranking," "SEO issues," "on-page SEO," "meta tags review," "SEO health check," "my traffic dropped," "lost rankings," "not showing up in Google," "site isn't ranking," "Google update hit me," "page speed," "core web vitals," "crawl errors," or "indexing issues." Use this even if the user just says something vague like "my SEO is bad" or "help with SEO" — start with an audit. For building pages at scale to target keywords, see programmatic-seo. For adding structured data, see schema-markup. For AI search optimization, see ai-seo.
capture-learning
Analyze recent conversation context and capture learnings to project knowledge files (for project-specific insights) or skills/commands/subagents (for cross-project patterns). Use when the user asks to "capture this learning", "update the docs with this", "remember this for next time", "document this issue", "add this to CLAUDE.md", "save this knowledge", or "update project knowledge". Also triggers after resolving build/setup issues, discovering non-obvious patterns, or completing debugging sessions with valuable insights.
agent-changelog
Compile an agent-optimized changelog by cross-referencing git history with plans and documentation. Use when asked to "update changelog", "compile history", "document project evolution", or proactively after major milestones, architectural changes, or when stale/deprecated information is detected that could confuse coding agents.
Didn't find tool you were looking for?