Agent skill
sql-splitter
High-performance CLI for working with SQL dump files: split/merge by table, analyze contents, validate integrity, convert between MySQL/PostgreSQL/SQLite/MSSQL, create FK-safe samples, shard multi-tenant dumps, generate ERD diagrams, reorder for safe imports, and run SQL analytics with embedded DuckDB. Use when working with .sql dump files for migrations, dev seeding, CI validation, schema visualization, data extraction, or ad-hoc analytics.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/sql-splitter
SKILL.md
sql-splitter Skill
This skill helps you use sql-splitter to manipulate SQL dump files safely and efficiently.
When to Use This Skill
Use sql-splitter when:
- The user mentions SQL dump files (
.sql,.sql.gz,.sql.bz2,.sql.xz,.sql.zst) - The user wants to migrate, restore, or work with database dump files
- The user needs to validate, analyze, split, merge, convert, sample, shard, or query dumps
- Working with MySQL, PostgreSQL, SQLite, or MSSQL dump formats
- The user wants to run SQL analytics on a dump file without loading it into a database
When NOT to Use This Skill
Do not use sql-splitter when:
- Running complex ad-hoc SQL queries against a live database (use
psql/mysql/sqlcmddirectly) - No dump file exists; only a running database is available
- The user needs interactive data editing rather than dump manipulation
- Working with dialects beyond MySQL/PostgreSQL/SQLite/MSSQL
- Working with MSSQL binary backup files (.bak) or DACPAC/BACPAC formats (only script-based .sql dumps are supported)
Command Reference
split
Split a dump into per-table files.
sql-splitter split dump.sql --output tables/ --progress
sql-splitter split dump.sql --tables users,orders --output tables/
sql-splitter split dump.sql --schema-only --output schema/
sql-splitter split dump.sql --data-only --output data/
merge
Merge per-table files back into a single dump.
sql-splitter merge tables/ --output restored.sql
sql-splitter merge tables/ --output restored.sql --transaction
sql-splitter merge tables/ --exclude logs,cache --output restored.sql
analyze
Get statistics about a dump (read-only).
sql-splitter analyze dump.sql --progress
sql-splitter analyze "dumps/*.sql" --fail-fast
convert
Convert between MySQL, PostgreSQL, SQLite, and MSSQL (12 conversion pairs).
sql-splitter convert mysql.sql --to postgres --output pg.sql
sql-splitter convert pg_dump.sql --to mysql --output mysql.sql
sql-splitter convert dump.sql --from postgres --to sqlite --output sqlite.sql
sql-splitter convert mssql_dump.sql --from mssql --to mysql --output mysql.sql
sql-splitter convert mysql.sql --to mssql --output mssql.sql
sql-splitter convert mysql.sql --to postgres --output - | psql "$PG_CONN"
validate
Check dump integrity (syntax, encoding, PK/FK).
sql-splitter validate dump.sql --strict --progress
sql-splitter validate "dumps/*.sql" --json --fail-fast
sql-splitter validate dump.sql --no-fk-checks --progress
sample
Create reduced datasets with FK preservation.
sql-splitter sample dump.sql --output sampled.sql --percent 10 --preserve-relations
sql-splitter sample dump.sql --output sampled.sql --rows 1000 --preserve-relations
sql-splitter sample dump.sql --output sampled.sql --percent 10 --tables users,orders
sql-splitter sample dump.sql --output sampled.sql --percent 10 --seed 42
shard
Extract tenant-specific data.
sql-splitter shard dump.sql --tenant-value 123 --tenant-column tenant_id --output tenant.sql
sql-splitter shard dump.sql --tenant-values "1,2,3" --tenant-column account_id --output shards/
diff
Compare two SQL dumps for schema and data changes.
sql-splitter diff old.sql new.sql --progress
sql-splitter diff old.sql new.sql --schema-only
sql-splitter diff old.sql new.sql --data-only
sql-splitter diff old.sql new.sql --format json --output diff.json
sql-splitter diff old.sql new.sql --format sql --output migration.sql
sql-splitter diff old.sql new.sql --tables users,orders --progress
sql-splitter diff old.sql new.sql --verbose # Show sample PKs
sql-splitter diff old.sql new.sql --ignore-columns "*.updated_at" # Ignore columns
sql-splitter diff old.sql new.sql --primary-key logs:ts+msg # Override PK
sql-splitter diff old.sql new.sql --allow-no-pk # Tables without PK
redact
Anonymize PII in SQL dumps by replacing sensitive data with fake, hashed, or null values.
# Using YAML config file
sql-splitter redact dump.sql --output safe.sql --config redact.yaml
# Using CLI flags
sql-splitter redact dump.sql --output safe.sql --null "*.ssn" --hash "*.email" --fake "*.name"
# Mask credit cards (keep last 4 digits)
sql-splitter redact dump.sql --output safe.sql --mask "****-****-****-XXXX=*.credit_card"
# Generate config by analyzing input file
sql-splitter redact dump.sql --generate-config --output redact.yaml
# Reproducible with seed
sql-splitter redact dump.sql --output safe.sql --config redact.yaml --seed 42
# Validate config only
sql-splitter redact dump.sql --config redact.yaml --validate
# With specific locale for fake data
sql-splitter redact dump.sql --output safe.sql --fake "*.name" --locale de_de
Strategies:
--null "pattern": Replace with NULL--hash "pattern": SHA256 hash (deterministic, preserves FK integrity)--fake "pattern": Generate realistic fake data--mask "pattern=column": Partial masking--constant "column=value": Fixed value replacement
Fake generators: email, name, first_name, last_name, phone, address, city, zip, company, ip, uuid, date, credit_card, ssn, lorem, and more.
graph
Generate Entity-Relationship Diagrams (ERD) from SQL dumps.
# Interactive HTML ERD with dark/light mode and panzoom
sql-splitter graph dump.sql --output schema.html
# Graphviz DOT format with ERD-style tables
sql-splitter graph dump.sql --output schema.dot
# Mermaid erDiagram syntax (paste into GitHub/GitLab)
sql-splitter graph dump.sql --output schema.mmd --format mermaid
# JSON with full schema details
sql-splitter graph dump.sql --json
# Filter tables
sql-splitter graph dump.sql --tables "user*,order*" --exclude "log*"
# Show only circular dependencies
sql-splitter graph dump.sql --cycles-only
# Focus on specific table and its dependencies
sql-splitter graph dump.sql --table orders --transitive
# Show tables that depend on users
sql-splitter graph dump.sql --table users --reverse
order
Reorder SQL dump in topological FK order for safe imports.
# Rewrite in safe import order
sql-splitter order dump.sql --output ordered.sql
# Check for cycles without rewriting
sql-splitter order dump.sql --check
# Reverse order (for DROP operations)
sql-splitter order dump.sql --reverse --output drop_order.sql
query
Run SQL analytics on dump files using embedded DuckDB (no database required).
# Single query
sql-splitter query dump.sql "SELECT COUNT(*) FROM users"
# Interactive REPL
sql-splitter query dump.sql --interactive
# Export to JSON/CSV
sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 100" -f json -o results.json
sql-splitter query dump.sql "SELECT * FROM users LIMIT 100" -f csv -o users.csv
# With caching (400x faster on repeated queries)
sql-splitter query dump.sql "SELECT ..." --cache
# Disk mode for large dumps (>2GB auto-enabled)
sql-splitter query huge.sql "SELECT ..." --disk
# Filter tables to import (faster startup)
sql-splitter query dump.sql "SELECT * FROM orders" --tables orders,users
# Memory limit
sql-splitter query dump.sql "SELECT ..." --memory-limit 4GB
# Cache management
sql-splitter query --list-cache
sql-splitter query --clear-cache
REPL commands:
.tables- List all tables.schema [table]- Show schema.describe <table>- Describe table.count <table>- Count rows.sample <table> [n]- Sample rows.format <fmt>- Set output format (table, json, csv, tsv).export <file> <query>- Export query results.exit- Exit REPL
Step-by-Step Patterns
Pattern 1: Validate Before Use
Before using any dump from an external source:
-
Validate integrity
bashsql-splitter validate path/to/dump.sql.gz --strict --progress -
If validation fails, check:
- Incorrect dialect? Try
--dialect=postgres,--dialect=mysql, or--dialect=mssql - Encoding issues? Report specific errors to user
- Truncated file? Check file size and completeness
- For MSSQL: Ensure GO batch separators are on their own lines
- Incorrect dialect? Try
-
Analyze structure
bashsql-splitter analyze path/to/dump.sql.gz --progress
Pattern 2: Database Migration
For migrating between MySQL, PostgreSQL, SQLite, and MSSQL (12 conversion pairs):
-
Validate source
bashsql-splitter validate source.sql.gz --strict --progress -
Convert dialect
bashsql-splitter convert source.sql.gz --to postgres --output target.sql --strict # or for MSSQL sql-splitter convert mssql_dump.sql --from mssql --to mysql --output mysql.sql -
Validate converted output
bashsql-splitter validate target.sql --dialect=postgres --strict -
Or stream directly
bashsql-splitter convert source.sql.gz --to postgres --output - | psql "$PG_CONN"
Pattern 3: Create Dev Dataset
For creating smaller realistic data for development:
-
Analyze to understand sizes
bashsql-splitter analyze prod.sql.zst --progress -
Sample with FK preservation
bashsql-splitter sample prod.sql.zst \ --output dev_seed.sql \ --percent 10 \ --preserve-relations \ --progress -
Restore to dev database
bashpsql "$DEV_DB" < dev_seed.sql
Pattern 4: CI Validation Gate
For validating dumps in CI pipelines:
sql-splitter validate "dumps/*.sql.gz" --json --fail-fast --strict
Parse with jq:
sql-splitter validate "dumps/*.sql.gz" --json --fail-fast \
| jq '.results[] | select(.passed == false)'
Pattern 5: Per-Table Editing
When the user needs to edit specific tables:
-
Split
bashsql-splitter split dump.sql --output tables/ --progress -
Edit the per-table files (
tables/users.sql, etc.) -
Merge back
bashsql-splitter merge tables/ --output updated.sql --transaction
Pattern 6: Tenant Extraction
For multi-tenant databases:
-
Identify tenant column (often
tenant_id,account_id,company_id) -
Extract tenant data
bashsql-splitter shard dump.sql \ --tenant-value 12345 \ --tenant-column tenant_id \ --output tenant_12345.sql \ --progress
Pattern 7: Comparing Dumps for Changes
For detecting schema or data changes between two versions:
-
Full comparison (schema + data)
bashsql-splitter diff old_dump.sql new_dump.sql --progress -
Schema-only comparison (fast, no data parsing)
bashsql-splitter diff old.sql new.sql --schema-only -
Generate migration script
bashsql-splitter diff old.sql new.sql --format sql --output migration.sql -
JSON output for automation
bashsql-splitter diff old.sql new.sql --format json | jq '.summary'
Pattern 8: Data Anonymization
For creating safe development/testing datasets:
-
Generate redaction config by analyzing dump
bashsql-splitter redact dump.sql --generate-config --output redact.yaml -
Review and customize the generated config
-
Apply redaction
bashsql-splitter redact dump.sql --output safe.sql --config redact.yaml --progress -
Or use inline patterns for quick redaction
bashsql-splitter redact dump.sql --output safe.sql \ --null "*.ssn,*.tax_id" \ --hash "*.email" \ --fake "*.name,*.phone" -
Validate the redacted output
bashsql-splitter validate safe.sql --strict
Pattern 9: Schema Visualization
For understanding complex database schemas:
-
Generate interactive ERD
bashsql-splitter graph dump.sql --output schema.html # Opens in browser with dark/light mode, zoom/pan -
For documentation (Mermaid)
bashsql-splitter graph dump.sql --output docs/schema.mmd --format mermaid # Paste into GitHub/GitLab/Notion -
Focus on specific area
bash# What does orders depend on? sql-splitter graph dump.sql --table orders --transitive --output orders.html # What depends on users? sql-splitter graph dump.sql --table users --reverse --output users_deps.html -
Find circular dependencies
bashsql-splitter graph dump.sql --cycles-only
Pattern 10: Safe Import Order
For ensuring FK constraints don't fail during restore:
-
Check for cycles
bashsql-splitter order dump.sql --check -
Reorder if needed
bashsql-splitter order dump.sql --output ordered.sql -
For DROP operations (reverse order)
bashsql-splitter order dump.sql --reverse --output drop_order.sql
Pattern 11: Ad-hoc SQL Analytics
For running SQL queries on dump files without loading into a database:
-
Quick exploratory query
bashsql-splitter query dump.sql "SELECT COUNT(*) FROM users" -
Interactive exploration (REPL)
bashsql-splitter query dump.sql --interactive # sql> .tables # sql> SELECT * FROM orders LIMIT 10 # sql> .count users -
Export analysis results
bashsql-splitter query dump.sql "SELECT * FROM orders WHERE total > 1000" -f csv -o big_orders.csv -
Repeated queries with caching (400x speedup)
bash# First run imports and caches sql-splitter query dump.sql "SELECT COUNT(*) FROM orders" --cache # Subsequent runs use cache sql-splitter query dump.sql "SELECT SUM(total) FROM orders" --cache -
Complex analytics
bashsql-splitter query dump.sql " SELECT u.name, COUNT(o.id) as order_count, SUM(o.total) as total_spent FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name ORDER BY total_spent DESC LIMIT 10 " -f json
Common Flag Combinations
| Goal | Flags |
|---|---|
| CI validation | --strict --fail-fast --json |
| Safe exploration | --dry-run --progress |
| Reproducible sampling | --seed 42 --preserve-relations |
| Fast progress feedback | --progress |
| Compressed output | Pipe to gzip -c or zstd -c |
Error Handling
Dialect Detection Issues
If auto-detection fails, specify explicitly:
sql-splitter validate dump.sql --dialect=postgres
sql-splitter validate mssql_dump.sql --dialect=mssql
sql-splitter convert dump.sql --from mysql --to postgres --output out.sql
sql-splitter convert dump.sql --from mssql --to mysql --output out.sql
Validation Failures
- Parse
--jsonoutput for specific errors - Check for encoding issues, missing FKs, duplicate PKs
- Use
--no-fk-checksto skip expensive integrity checks
Large Files
- sql-splitter uses constant ~50MB memory
- Downstream tools may be bottlenecks
- Test with
samplebefore full operations
Implementation Checklist
When using this skill:
- Detect applicability: Check for
.sqlfiles or dump-related tasks - Clarify intent: Validation? Conversion? Sampling? Splitting?
- Choose pattern: Map goal to one of the patterns above
- Propose plan: Explain steps before executing
- Use safe flags:
--dry-runfirst, then--progressfor feedback - Summarize results: Report success/failure with key stats
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
Didn't find tool you were looking for?