Agent skill
sql-query-optimizer
Analyzes and optimizes SQL queries across different data warehouse platforms (Snowflake, BigQuery, Redshift, Databricks) with platform-specific recommendations.
Install this agent skill to your Project
npx add-skill https://github.com/a5c-ai/babysitter/tree/main/library/specializations/data-engineering-analytics/skills/sql-query-optimizer
SKILL.md
SQL Query Optimizer
Analyzes and optimizes SQL queries across different data warehouse platforms with platform-specific recommendations.
Overview
This skill examines SQL queries to identify performance bottlenecks, suggest optimizations, and provide platform-specific recommendations for Snowflake, BigQuery, Redshift, and Databricks. It analyzes query execution plans, recommends indexes/clustering keys, and identifies anti-patterns.
Capabilities
- Query execution plan analysis - Parse and analyze EXPLAIN outputs
- Index recommendations - Suggest clustering keys, sort keys, partition keys
- Join optimization - Identify inefficient join patterns and suggest improvements
- Subquery elimination - Convert correlated subqueries to CTEs or joins
- CTE optimization - Materialize vs reference optimization
- Window function optimization - Frame and partition optimization
- Predicate pushdown validation - Verify filter pushdown effectiveness
- Clustering key recommendations - Platform-specific clustering strategies
- Materialized view suggestions - Identify candidates for materialized views
- Platform-specific optimizations - Snowflake, BigQuery, Redshift, Databricks
Input Schema
{
"query": {
"type": "string",
"description": "The SQL query to analyze",
"required": true
},
"platform": {
"type": "string",
"enum": ["snowflake", "bigquery", "redshift", "databricks", "postgres"],
"required": true,
"description": "Target data warehouse platform"
},
"tableStatistics": {
"type": "object",
"description": "Table statistics including row counts, column cardinality",
"properties": {
"tables": {
"type": "array",
"items": {
"name": "string",
"rowCount": "number",
"sizeGB": "number",
"columns": "array"
}
}
}
},
"executionPlan": {
"type": "object",
"description": "Query execution plan (EXPLAIN output)"
},
"queryHistory": {
"type": "object",
"description": "Historical query performance metrics"
},
"optimizationGoals": {
"type": "array",
"items": {
"type": "string",
"enum": ["latency", "cost", "throughput", "scan_reduction"]
},
"default": ["latency", "cost"]
}
}
Output Schema
{
"optimizedQuery": {
"type": "string",
"description": "The optimized SQL query"
},
"improvements": {
"type": "array",
"items": {
"type": {
"type": "string",
"enum": ["join", "predicate", "aggregation", "cte", "window", "scan", "index"]
},
"description": "string",
"impact": "high|medium|low",
"lineNumber": "number",
"originalCode": "string",
"optimizedCode": "string"
}
},
"indexRecommendations": {
"type": "array",
"items": {
"table": "string",
"type": "clustering|sort|partition|index",
"columns": "array",
"rationale": "string",
"ddl": "string"
}
},
"estimatedImprovement": {
"scanReduction": {
"type": "number",
"description": "Percentage reduction in data scanned"
},
"timeReduction": {
"type": "number",
"description": "Percentage reduction in execution time"
},
"costReduction": {
"type": "number",
"description": "Percentage reduction in query cost"
}
},
"antiPatterns": {
"type": "array",
"items": {
"pattern": "string",
"severity": "high|medium|low",
"location": "string",
"suggestion": "string"
}
},
"platformSpecificNotes": {
"type": "array",
"items": "string"
}
}
Usage Examples
Basic Query Optimization
{
"query": "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.created_at > '2024-01-01'",
"platform": "snowflake"
}
With Execution Plan Analysis
{
"query": "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id",
"platform": "bigquery",
"executionPlan": {
"stages": [...],
"totalBytesProcessed": 1073741824
},
"optimizationGoals": ["cost", "scan_reduction"]
}
With Table Statistics
{
"query": "SELECT ... complex query ...",
"platform": "redshift",
"tableStatistics": {
"tables": [
{
"name": "orders",
"rowCount": 10000000,
"sizeGB": 50,
"columns": [
{"name": "order_id", "cardinality": 10000000},
{"name": "customer_id", "cardinality": 500000}
]
}
]
}
}
Platform-Specific Optimizations
Snowflake
| Optimization | Description |
|---|---|
| Clustering keys | Recommend micro-partition clustering |
| Result cache | Identify queries benefiting from caching |
| Query acceleration | Suggest QUERY_ACCELERATION_MAX_SCALE_FACTOR |
| Warehouse sizing | Right-size warehouse recommendations |
BigQuery
| Optimization | Description |
|---|---|
| Partitioning | DATE/TIMESTAMP partitioning recommendations |
| Clustering | Up to 4 clustering columns |
| BI Engine | Identify BI Engine-eligible queries |
| Slots | Estimate slot usage optimization |
Redshift
| Optimization | Description |
|---|---|
| Sort keys | COMPOUND vs INTERLEAVED recommendations |
| Distribution | KEY, EVEN, ALL distribution strategies |
| Compression | Column encoding recommendations |
| Vacuum | VACUUM and ANALYZE recommendations |
Databricks
| Optimization | Description |
|---|---|
| Z-ordering | Multi-column Z-order recommendations |
| Delta cache | Caching strategy recommendations |
| Photon | Photon-eligible query patterns |
| Adaptive execution | AQE configuration suggestions |
Common Anti-Patterns Detected
Query Structure
| Anti-Pattern | Impact | Fix |
|---|---|---|
| SELECT * | High | Specify columns explicitly |
| Correlated subqueries | High | Convert to JOIN or CTE |
| DISTINCT on large datasets | Medium | Use GROUP BY or window functions |
| Non-SARGable predicates | High | Rewrite for index usage |
Join Issues
| Anti-Pattern | Impact | Fix |
|---|---|---|
| Cartesian products | Critical | Add join conditions |
| Implicit joins | Medium | Use explicit JOIN syntax |
| Wrong join order | High | Reorder by selectivity |
| Missing indexes on join keys | High | Add clustering/sort keys |
Aggregation Issues
| Anti-Pattern | Impact | Fix |
|---|---|---|
| GROUP BY ordinal | Low | Use column names |
| Aggregating before filter | High | Filter first, then aggregate |
| Over-grouping | Medium | Reduce GROUP BY columns |
Integration Points
MCP Server Integration
- Snowflake MCP - Real-time execution plan analysis
- BigQuery MCP - Cost estimation and slot analysis
- Redshift MCP - Query execution and statistics
Related Skills
- Data Quality Profiler (SK-DEA-005) - Table statistics gathering
- dbt Project Analyzer (SK-DEA-003) - Model query optimization
Applicable Processes
- Query Optimization (
query-optimization.js) - Data Warehouse Setup (
data-warehouse-setup.js) - BI Dashboard Development (
bi-dashboard.js) - OBT Creation (
obt-creation.js)
References
- Snowflake Query Performance
- BigQuery Best Practices
- Redshift Performance Tuning
- Databricks SQL Best Practices
Version History
- 1.0.0 - Initial release with multi-platform support
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
gsd-tools
Central utility skill for GSD operations. Provides config parsing, slug generation, timestamps, path operations, and orchestrates calls to other specialized skills. Acts as the unified entry point that the original gsd-tools.cjs provided via its lib/ modules (commands, config, core, init).
model-profile-resolution
Resolve model profile (quality/balanced/budget) at orchestration start and map agents to specific models. Enables cost/quality tradeoffs by selecting appropriate AI models for each agent role.
verification-suite
Plan structure validation, phase completeness checks, reference integrity verification, and artifact existence confirmation. Provides the structured verification layer ensuring GSD artifacts are well-formed and complete.
state-management
STATE.md reading, writing, and field-level updates. Provides cross-session state persistence via .planning/STATE.md with structured fields for current task, completed phases, blockers, decisions, and quick tasks.
git-integration
Git commit patterns, formats, and conventions for GSD methodology. Provides atomic commits per task, structured commit messages, planning file commits, branch management, and milestone tag operations.
frontmatter-parsing
YAML frontmatter parsing and manipulation for .planning/ documents. Provides read, write, update, query, and validation operations on frontmatter blocks in GSD markdown artifacts.
Didn't find tool you were looking for?