Agent skill

sql-splitter

High-performance CLI for working with SQL dump files: split/merge by table, analyze contents, validate integrity, convert between MySQL/PostgreSQL/SQLite/MSSQL, create FK-safe samples, shard multi-tenant dumps, generate ERD diagrams, reorder for safe imports, and run SQL analytics with embedded DuckDB. Use when working with .sql dump files for migrations, dev seeding, CI validation, schema visualization, data extraction, or ad-hoc analytics.

View SKILL.md on GitHub Repository

Stars 163

Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/sql-splitter

SKILL.md

sql-splitter Skill

This skill helps you use sql-splitter to manipulate SQL dump files safely and efficiently.

When to Use This Skill

Use sql-splitter when:

The user mentions SQL dump files (.sql, .sql.gz, .sql.bz2, .sql.xz, .sql.zst)
The user wants to migrate, restore, or work with database dump files
The user needs to validate, analyze, split, merge, convert, sample, shard, or query dumps
Working with MySQL, PostgreSQL, SQLite, or MSSQL dump formats
The user wants to run SQL analytics on a dump file without loading it into a database

When NOT to Use This Skill

Do not use sql-splitter when:

Running complex ad-hoc SQL queries against a live database (use psql/mysql/sqlcmd directly)
No dump file exists; only a running database is available
The user needs interactive data editing rather than dump manipulation
Working with dialects beyond MySQL/PostgreSQL/SQLite/MSSQL
Working with MSSQL binary backup files (.bak) or DACPAC/BACPAC formats (only script-based .sql dumps are supported)

Command Reference

split

Split a dump into per-table files.

bash

sql-splitter split dump.sql --output tables/ --progress
sql-splitter split dump.sql --tables users,orders --output tables/
sql-splitter split dump.sql --schema-only --output schema/
sql-splitter split dump.sql --data-only --output data/

merge

Merge per-table files back into a single dump.

bash

sql-splitter merge tables/ --output restored.sql
sql-splitter merge tables/ --output restored.sql --transaction
sql-splitter merge tables/ --exclude logs,cache --output restored.sql

analyze

Get statistics about a dump (read-only).

bash

sql-splitter analyze dump.sql --progress
sql-splitter analyze "dumps/*.sql" --fail-fast

convert

Convert between MySQL, PostgreSQL, SQLite, and MSSQL (12 conversion pairs).

bash

sql-splitter convert mysql.sql --to postgres --output pg.sql
sql-splitter convert pg_dump.sql --to mysql --output mysql.sql
sql-splitter convert dump.sql --from postgres --to sqlite --output sqlite.sql
sql-splitter convert mssql_dump.sql --from mssql --to mysql --output mysql.sql
sql-splitter convert mysql.sql --to mssql --output mssql.sql
sql-splitter convert mysql.sql --to postgres --output - | psql "$PG_CONN"

validate

Check dump integrity (syntax, encoding, PK/FK).

bash

sql-splitter validate dump.sql --strict --progress
sql-splitter validate "dumps/*.sql" --json --fail-fast
sql-splitter validate dump.sql --no-fk-checks --progress

sample

Create reduced datasets with FK preservation.

bash

sql-splitter sample dump.sql --output sampled.sql --percent 10 --preserve-relations
sql-splitter sample dump.sql --output sampled.sql --rows 1000 --preserve-relations
sql-splitter sample dump.sql --output sampled.sql --percent 10 --tables users,orders
sql-splitter sample dump.sql --output sampled.sql --percent 10 --seed 42

shard

Extract tenant-specific data.

bash

sql-splitter shard dump.sql --tenant-value 123 --tenant-column tenant_id --output tenant.sql
sql-splitter shard dump.sql --tenant-values "1,2,3" --tenant-column account_id --output shards/

diff

Compare two SQL dumps for schema and data changes.

bash

sql-splitter diff old.sql new.sql --progress
sql-splitter diff old.sql new.sql --schema-only
sql-splitter diff old.sql new.sql --data-only
sql-splitter diff old.sql new.sql --format json --output diff.json
sql-splitter diff old.sql new.sql --format sql --output migration.sql
sql-splitter diff old.sql new.sql --tables users,orders --progress
sql-splitter diff old.sql new.sql --verbose                           # Show sample PKs
sql-splitter diff old.sql new.sql --ignore-columns "*.updated_at"     # Ignore columns
sql-splitter diff old.sql new.sql --primary-key logs:ts+msg           # Override PK
sql-splitter diff old.sql new.sql --allow-no-pk                       # Tables without PK

redact

Anonymize PII in SQL dumps by replacing sensitive data with fake, hashed, or null values.

bash

# Using YAML config file
sql-splitter redact dump.sql --output safe.sql --config redact.yaml

# Using CLI flags
sql-splitter redact dump.sql --output safe.sql --null "*.ssn" --hash "*.email" --fake "*.name"

# Mask credit cards (keep last 4 digits)
sql-splitter redact dump.sql --output safe.sql --mask "****-****-****-XXXX=*.credit_card"

# Generate config by analyzing input file
sql-splitter redact dump.sql --generate-config --output redact.yaml

# Reproducible with seed
sql-splitter redact dump.sql --output safe.sql --config redact.yaml --seed 42

# Validate config only
sql-splitter redact dump.sql --config redact.yaml --validate

# With specific locale for fake data
sql-splitter redact dump.sql --output safe.sql --fake "*.name" --locale de_de

Strategies:

--null "pattern": Replace with NULL
--hash "pattern": SHA256 hash (deterministic, preserves FK integrity)
--fake "pattern": Generate realistic fake data
--mask "pattern=column": Partial masking
--constant "column=value": Fixed value replacement

Fake generators: email, name, first_name, last_name, phone, address, city, zip, company, ip, uuid, date, credit_card, ssn, lorem, and more.

graph

Generate Entity-Relationship Diagrams (ERD) from SQL dumps.

bash

# Interactive HTML ERD with dark/light mode and panzoom
sql-splitter graph dump.sql --output schema.html

# Graphviz DOT format with ERD-style tables
sql-splitter graph dump.sql --output schema.dot

# Mermaid erDiagram syntax (paste into GitHub/GitLab)
sql-splitter graph dump.sql --output schema.mmd --format mermaid

# JSON with full schema details
sql-splitter graph dump.sql --json

# Filter tables
sql-splitter graph dump.sql --tables "user*,order*" --exclude "log*"

# Show only circular dependencies
sql-splitter graph dump.sql --cycles-only

# Focus on specific table and its dependencies
sql-splitter graph dump.sql --table orders --transitive

# Show tables that depend on users
sql-splitter graph dump.sql --table users --reverse

order

Reorder SQL dump in topological FK order for safe imports.

bash

# Rewrite in safe import order
sql-splitter order dump.sql --output ordered.sql

# Check for cycles without rewriting
sql-splitter order dump.sql --check

# Reverse order (for DROP operations)
sql-splitter order dump.sql --reverse --output drop_order.sql

query

Run SQL analytics on dump files using embedded DuckDB (no database required).

bash

# Single query
sql-splitter query dump.sql "SELECT COUNT(*) FROM users"

# Interactive REPL
sql-splitter query dump.sql --interactive

# Export to JSON/CSV
sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 100" -f json -o results.json
sql-splitter query dump.sql "SELECT * FROM users LIMIT 100" -f csv -o users.csv

# With caching (400x faster on repeated queries)
sql-splitter query dump.sql "SELECT ..." --cache

# Disk mode for large dumps (>2GB auto-enabled)
sql-splitter query huge.sql "SELECT ..." --disk

# Filter tables to import (faster startup)
sql-splitter query dump.sql "SELECT * FROM orders" --tables orders,users

# Memory limit
sql-splitter query dump.sql "SELECT ..." --memory-limit 4GB

# Cache management
sql-splitter query --list-cache
sql-splitter query --clear-cache

REPL commands:

.tables - List all tables
.schema [table] - Show schema
.describe <table> - Describe table
.count <table> - Count rows
.sample <table> [n] - Sample rows
.format <fmt> - Set output format (table, json, csv, tsv)
.export <file> <query> - Export query results
.exit - Exit REPL

Step-by-Step Patterns

Pattern 1: Validate Before Use

Before using any dump from an external source:

Validate integrity

bash

sql-splitter validate path/to/dump.sql.gz --strict --progress

If validation fails, check:
- Incorrect dialect? Try --dialect=postgres, --dialect=mysql, or --dialect=mssql
- Encoding issues? Report specific errors to user
- Truncated file? Check file size and completeness
- For MSSQL: Ensure GO batch separators are on their own lines

Analyze structure

bash

sql-splitter analyze path/to/dump.sql.gz --progress

Pattern 2: Database Migration

For migrating between MySQL, PostgreSQL, SQLite, and MSSQL (12 conversion pairs):

Validate source

bash

sql-splitter validate source.sql.gz --strict --progress

Convert dialect

bash

sql-splitter convert source.sql.gz --to postgres --output target.sql --strict
# or for MSSQL
sql-splitter convert mssql_dump.sql --from mssql --to mysql --output mysql.sql

Validate converted output

bash

sql-splitter validate target.sql --dialect=postgres --strict

Or stream directly

bash

sql-splitter convert source.sql.gz --to postgres --output - | psql "$PG_CONN"

Pattern 3: Create Dev Dataset

For creating smaller realistic data for development:

Analyze to understand sizes

bash

sql-splitter analyze prod.sql.zst --progress

Sample with FK preservation

bash

sql-splitter sample prod.sql.zst \
  --output dev_seed.sql \
  --percent 10 \
  --preserve-relations \
  --progress

Restore to dev database
bash
```
psql "$DEV_DB" < dev_seed.sql
```

Pattern 4: CI Validation Gate

For validating dumps in CI pipelines:

bash

sql-splitter validate "dumps/*.sql.gz" --json --fail-fast --strict

Parse with jq:

bash

sql-splitter validate "dumps/*.sql.gz" --json --fail-fast \
  | jq '.results[] | select(.passed == false)'

Pattern 5: Per-Table Editing

When the user needs to edit specific tables:

Split

bash

sql-splitter split dump.sql --output tables/ --progress

Edit the per-table files (tables/users.sql, etc.)

Merge back

bash

sql-splitter merge tables/ --output updated.sql --transaction

Pattern 6: Tenant Extraction

For multi-tenant databases:

Identify tenant column (often tenant_id, account_id, company_id)

Extract tenant data

bash

sql-splitter shard dump.sql \
  --tenant-value 12345 \
  --tenant-column tenant_id \
  --output tenant_12345.sql \
  --progress

Pattern 7: Comparing Dumps for Changes

For detecting schema or data changes between two versions:

Full comparison (schema + data)

bash

sql-splitter diff old_dump.sql new_dump.sql --progress

Schema-only comparison (fast, no data parsing)
bash
```
sql-splitter diff old.sql new.sql --schema-only
```

Generate migration script

bash

sql-splitter diff old.sql new.sql --format sql --output migration.sql

JSON output for automation

bash

sql-splitter diff old.sql new.sql --format json | jq '.summary'

Pattern 8: Data Anonymization

For creating safe development/testing datasets:

Generate redaction config by analyzing dump

bash

sql-splitter redact dump.sql --generate-config --output redact.yaml

Review and customize the generated config

Apply redaction

bash

sql-splitter redact dump.sql --output safe.sql --config redact.yaml --progress

Or use inline patterns for quick redaction

bash

sql-splitter redact dump.sql --output safe.sql \
  --null "*.ssn,*.tax_id" \
  --hash "*.email" \
  --fake "*.name,*.phone"

Validate the redacted output

bash

sql-splitter validate safe.sql --strict

Pattern 9: Schema Visualization

For understanding complex database schemas:

Generate interactive ERD

bash

sql-splitter graph dump.sql --output schema.html
# Opens in browser with dark/light mode, zoom/pan

For documentation (Mermaid)

bash

sql-splitter graph dump.sql --output docs/schema.mmd --format mermaid
# Paste into GitHub/GitLab/Notion

Focus on specific area

bash

# What does orders depend on?
sql-splitter graph dump.sql --table orders --transitive --output orders.html

# What depends on users?
sql-splitter graph dump.sql --table users --reverse --output users_deps.html

Find circular dependencies

bash

sql-splitter graph dump.sql --cycles-only

Pattern 10: Safe Import Order

For ensuring FK constraints don't fail during restore:

Check for cycles
bash
```
sql-splitter order dump.sql --check
```

Reorder if needed

bash

sql-splitter order dump.sql --output ordered.sql

For DROP operations (reverse order)

bash

sql-splitter order dump.sql --reverse --output drop_order.sql

Pattern 11: Ad-hoc SQL Analytics

For running SQL queries on dump files without loading into a database:

Quick exploratory query

bash

sql-splitter query dump.sql "SELECT COUNT(*) FROM users"

Interactive exploration (REPL)

bash

sql-splitter query dump.sql --interactive
# sql> .tables
# sql> SELECT * FROM orders LIMIT 10
# sql> .count users

Export analysis results

bash

sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 1000" -f csv -o big_orders.csv

Repeated queries with caching (400x speedup)

bash

# First run imports and caches
sql-splitter query dump.sql "SELECT COUNT(*) FROM orders" --cache

# Subsequent runs use cache
sql-splitter query dump.sql "SELECT SUM(total) FROM orders" --cache

Complex analytics

bash

sql-splitter query dump.sql "
  SELECT u.name, COUNT(o.id) as order_count, SUM(o.total) as total_spent
  FROM users u
  JOIN orders o ON u.id = o.user_id
  GROUP BY u.name
  ORDER BY total_spent DESC
  LIMIT 10
" -f json

Common Flag Combinations

Goal	Flags
CI validation	`--strict --fail-fast --json`
Safe exploration	`--dry-run --progress`
Reproducible sampling	`--seed 42 --preserve-relations`
Fast progress feedback	`--progress`
Compressed output	Pipe to `gzip -c` or `zstd -c`

Error Handling

Dialect Detection Issues

If auto-detection fails, specify explicitly:

bash

sql-splitter validate dump.sql --dialect=postgres
sql-splitter validate mssql_dump.sql --dialect=mssql
sql-splitter convert dump.sql --from mysql --to postgres --output out.sql
sql-splitter convert dump.sql --from mssql --to mysql --output out.sql

Validation Failures

Parse --json output for specific errors
Check for encoding issues, missing FKs, duplicate PKs
Use --no-fk-checks to skip expensive integrity checks

Large Files

sql-splitter uses constant ~50MB memory
Downstream tools may be bottlenecks
Test with sample before full operations

Implementation Checklist

When using this skill:

Detect applicability: Check for .sql files or dump-related tasks
Clarify intent: Validation? Conversion? Sampling? Splitting?
Choose pattern: Map goal to one of the patterns above
Propose plan: Explain steps before executing
Use safe flags: --dry-run first, then --progress for feedback
Summarize results: Report success/failure with key stats

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/sql-splitter
License: MIT License

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

majiayu000/claude-skill-registry

agent-ops-spec

Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-state

Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-spec

Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-testing

Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-testing

Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-state

Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.

163 31

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

sql-splitter Skill

When to Use This Skill

When NOT to Use This Skill

Command Reference

split

merge

analyze

convert

validate

sample

shard

diff

redact

graph

order

query

Step-by-Step Patterns

Pattern 1: Validate Before Use

Pattern 2: Database Migration

Pattern 3: Create Dev Dataset

Pattern 4: CI Validation Gate

Pattern 5: Per-Table Editing

Pattern 6: Tenant Extraction

Pattern 7: Comparing Dumps for Changes

Pattern 8: Data Anonymization

Pattern 9: Schema Visualization

Pattern 10: Safe Import Order

Pattern 11: Ad-hoc SQL Analytics

Common Flag Combinations

Error Handling

Dialect Detection Issues

Validation Failures

Large Files

Implementation Checklist

Recommended Agent Skills

agent-ops-spec

agent-ops-state

agent-ops-spec

agent-ops-testing

agent-ops-testing

agent-ops-state