Agent skill
DuckDB Backup
Comprehensive DuckDB database backup toolkit supporting both file-based (cp) and native (ATTACH+COPY) backup approaches. Use when you need to backup DuckDB databases locally or to remote storage, create daily scheduled backups, verify backup integrity, or manage backup retention policies.
Install this agent skill to your Project
npx add-skill https://github.com/goodluckz/claude-code-config/tree/main/plugins/duckdb-backup
SKILL.md
DuckDB Backup Skill
Quick Start
Two backup methods, choose based on your needs:
File-Based Backup (cp method) - Fast for local backups
python3 scripts/backup_duckdb.py \
--db data/awsntp.duckdb \
--backup data/backup-20251223.duckdb \
--method cp
Native DuckDB Backup (attach method) - For remote/cloud backups
python3 scripts/backup_duckdb.py \
--db data/awsntp.duckdb \
--backup data/backup-20251223.duckdb \
--method attach
With Automatic Timestamp Naming
python3 scripts/backup_duckdb.py \
--db data/awsntp.duckdb \
--backup data/backups/ \
--method cp \
--timestamp
# Creates: data/backups/backup-20251223-153045.duckdb
Features
- Two backup methods: cp (fast, local) and attach (flexible, cloud-ready)
- Timestamped backups: Automatic unique naming with YYYYMMDD-HHMMSS format
- Error handling: Comprehensive logging and error reporting
- Scheduled backups: cron/Task Scheduler integration for daily backups
- Verification: Tools to verify backup integrity and table counts
- Retention policies: Scripts to manage backup history and cleanup
When to Use
File-Based (cp method):
- Daily local backups (fastest approach)
- Pre-materialization safety snapshots
- Development/testing environments
- When speed is critical
- Local machine backups
Native DuckDB (attach method):
- Cloud/remote storage backups
- Cross-environment transfers
- Database in active use scenarios
- When you need portable backups
- Complex backup scenarios
Command Reference
python3 scripts/backup_duckdb.py --db <source> --backup <target> [options]
Arguments:
--db(required): Path to source DuckDB database--backup(required): Backup target (file path for cp, directory for attach with --timestamp)--method(optional): Backup method -cp(default) orattach--timestamp(optional): Add YYYYMMDD-HHMMSS timestamp to filename
Exit codes:
0: Backup successful1: Backup failed (check logs)
Examples
Example 1: One-time Local Backup
python3 scripts/backup_duckdb.py \
--db ~/LocalRepos/awsntpdagster/data/awsntp.duckdb \
--backup ~/LocalRepos/awsntpdagster/data/backup-20251223.duckdb \
--method cp
Example 2: Daily Timestamped Backups to Folder
python3 scripts/backup_duckdb.py \
--db ~/LocalRepos/awsntpdagster/data/awsntp.duckdb \
--backup ~/LocalRepos/awsntpdagster/data/backups \
--method cp \
--timestamp
Example 3: Scheduled Daily Backup (cron)
Add to crontab:
0 2 * * * python3 /path/to/backup_duckdb.py --db /path/to/awsntp.duckdb --backup /path/to/backups/ --method cp --timestamp >> /var/log/duckdb_backup.log 2>&1
Example 4: Backup with Retention Cleanup
#!/bin/bash
# Backup and keep only 7 most recent
BACKUP_DIR="/path/to/backups"
python3 scripts/backup_duckdb.py \
--db data/awsntp.duckdb \
--backup "$BACKUP_DIR" \
--method cp \
--timestamp
# Keep only 7 most recent backups
ls -t "$BACKUP_DIR"/backup-*.duckdb | tail -n +8 | xargs rm -f
Backup Strategy Selection
For 975 MB DuckDB database (awsntp.duckdb):
| Scenario | Method | Speed | Best For |
|---|---|---|---|
| Daily backup before asset materialization | cp | ~1-2s | Production safety |
| Weekly archive to external drive | cp | ~1-2s | Local storage |
| Cloud backup to S3 | attach | ~30-60s | Remote storage |
| Backup during active queries | attach | N/A | Concurrent access |
Advanced Usage
Verify Backup Integrity
# Check backup exists and is readable
duckdb data/backup-20251223-153045.duckdb \
"SELECT COUNT(*) as table_count FROM information_schema.tables;"
Compare Original and Backup Sizes
ls -lh data/awsntp.duckdb data/backup-*.duckdb | awk '{print $5, $9}'
List All Backups Chronologically
ls -lt data/backups/backup-*.duckdb | head -10
Troubleshooting
Issue: "duckdb: command not found"
- Install DuckDB CLI or update PATH
- Use full path to duckdb binary if attach method needed
Issue: "database is locked"
- Ensure no active connections to source database
- attach method can work around this (doesn't lock database)
Issue: "out of disk space"
- Check available disk space:
df -h - Use attach method for cloud storage
- Implement retention cleanup scripts
Issue: Slow backup
- For large databases, cp method is fastest
- Schedule backups during off-peak hours
- Consider incremental backup strategies
Integration with Your Pipeline
Pre-materialization Backup
# Backup before running expensive assets
python3 scripts/backup_duckdb.py \
--db data/awsntp.duckdb \
--backup data/pre-materialization-backup.duckdb \
--method cp
# Then run your asset materialization
dagster asset materialize -m awsntpdagster.definitions --select awsntp_features_merged_v6
Scheduled Daily Backup
For your awsntpdagster project:
macOS/Linux crontab:
0 2 * * * cd /Users/zhaoliang/LocalRepos/awsntpdagster && python3 src/awsntpdagster/scripts/backup_duckdb.py --db data/awsntp.duckdb --backup data/backups/ --method cp --timestamp
Reference
For detailed backup strategy guide, scheduling patterns, and retention policies, see:
- backup_strategies.md - When to use each method, cron setup, verification, troubleshooting
Performance Notes
- cp method: ~1-2 seconds for 975 MB database (your awsntp.duckdb)
- attach method: ~30-60 seconds for 975 MB database
- Both methods: Minimal CPU usage
- Disk space required: 1x original database size
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
amend-staged
Amend the last commit with staged changes only
test00-check
Check test00 honeynet production server status - pcap files, tcpdump processes, containers, and Google Drive sync. Use when user wants to check test00 data or server health.
commit-staged
Git commit commands for staged-only commits
git-submodule
Add a directory as a git submodule with its own remote repository. Use when user wants to track a directory with separate git history.
dagster-remote
Manage remote Dagster server on dns-analy4 - start, stop, restart, check status, sync code, and view logs. Use when user wants to manage the remote Dagster deployment.
arkime-tag-manager
Manage Arkime session tags (add/remove) using API
Didn't find tool you were looking for?