Agent skill
database-reliability
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/database-reliability
SKILL.md
ποΈ Skill: Database Reliability
π Metadata
| Atributo | Valor |
|---|---|
| ID | sre-database-reliability |
| Nivel | π΄ Avanzado |
| VersiΓ³n | 1.0.0 |
| Keywords | database, postgresql, mysql, mongodb, replication, backup, failover, connection-pooling |
| Referencia | PostgreSQL High Availability |
π Keywords para InvocaciΓ³n
database-reliabilitydatabase-replicationdatabase-backupfailoverconnection-poolingdatabase-monitoring@skill:database-reliability
Ejemplos de Prompts
Implementa database replication y failover para PostgreSQL
Configura backup y recovery strategy para base de datos
Setup connection pooling y database monitoring
@skill:database-reliability - Alta disponibilidad de base de datos
π DescripciΓ³n
Database reliability es fundamental para servicios confiables. Este skill cubre replication, failover, backups, connection pooling, monitoring, y disaster recovery para bases de datos en producciΓ³n.
β CuΓ‘ndo Usar Este Skill
- Bases de datos en producciΓ³n
- Servicios con requisitos de alta disponibilidad
- Datos crΓticos
- SLAs estrictos
- Compliance requirements
β CuΓ‘ndo NO Usar Este Skill
- Desarrollo local con datos de prueba
- Prototipos sin datos reales
- Sistemas de solo lectura sin disponibilidad crΓtica
ποΈ Arquitectura de Alta Disponibilidad
βββββββββββββββββββββββββββββββββββββββββββ
β Application Layer β
β βββββββββββ βββββββββββ βββββββββββ β
β β App β β App β β App β β
β ββββββ¬βββββ ββββββ¬βββββ ββββββ¬βββββ β
βββββββββΌβββββββββββββΌβββββββββββββΌββββββββ
β β β
ββββββββββββββΌβββββββββββββ
β
ββββββββΌβββββββ
β Connection β
β Pool β
ββββββββ¬βββββββ
β
ββββββββββββββΌβββββββββββββ
β β β
ββββββΌβββββ ββββββΌββββ βββββββΌβββββ
βPrimary β βReplica β β Replica β
β(Master) β β(Read) β β (Read) β
ββββββ¬βββββ ββββββββββ ββββββββββββ
β
β Replication
β
ββββββΌβββββ
β Standby β
β (DR) β
βββββββββββ
π» ImplementaciΓ³n
π Scripts Ejecutables: Este skill incluye scripts Python ejecutables en la carpeta
scripts/:
postgresql_backup.py- AutomatizaciΓ³n de backups y restauraciΓ³nrequirements.txt- Dependencias (ninguna, usa stdlib)Ver
scripts/README.mdpara documentaciΓ³n de uso.
1. PostgreSQL Replication Setup
1.1 Primary Configuration
# postgresql.conf (Primary)
# Connection Settings
listen_addresses = '*'
max_connections = 200
# Replication Settings
wal_level = replica
max_wal_senders = 3
max_replication_slots = 3
hot_standby = on
# Performance
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 20MB
min_wal_size = 1GB
max_wal_size = 4GB
# Logging
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 100MB
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_timezone = 'UTC'
# pg_hba.conf (Primary)
# Allow replication from replicas
host replication replicator 10.0.0.0/24 md5
1.2 Replica Configuration
# Setup replication (on replica server)
# 1. Stop PostgreSQL
sudo systemctl stop postgresql
# 2. Backup primary
pg_basebackup -h primary-host -D /var/lib/postgresql/data -U replicator -P -W -R -S replication_slot_1
# 3. Configure recovery
# postgresql.conf (Replica)
hot_standby = on
max_standby_streaming_delay = 30s
wal_receiver_status_interval = 10s
hot_standby_feedback = on
# postgresql.auto.conf (Auto-generated)
primary_conninfo = 'host=primary-host port=5432 user=replicator password=xxx application_name=replica1'
primary_slot_name = 'replication_slot_1'
1.3 Replication Slot Management
-- Create replication slot
SELECT pg_create_physical_replication_slot('replication_slot_1');
-- List replication slots
SELECT * FROM pg_replication_slots;
-- Monitor replication lag
SELECT
client_addr,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
sync_state,
sync_priority
FROM pg_stat_replication;
-- Check lag in bytes
SELECT
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
client_addr,
application_name
FROM pg_stat_replication;
2. Automatic Failover (Patroni)
# patroni/patroni.yml
scope: postgres-cluster
namespace: /db/
name: postgres-node1
restapi:
listen: 0.0.0.0:8008
connect_address: 10.0.1.1:8008
auth: 'username:password'
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 30
maximum_lag_on_failover: 1048576
postgresql:
use_pg_rewind: true
parameters:
max_connections: 200
max_worker_processes: 8
wal_level: replica
hot_standby: on
wal_keep_segments: 8
max_wal_senders: 3
max_replication_slots: 3
synchronous_commit: 'on'
synchronous_standby_names: 'ANY 1 (postgres-node2,postgres-node3)'
initdb:
- encoding: UTF8
- data-checksums
pg_hba:
- host replication replicator 0.0.0.0/0 md5
- host all all 0.0.0.0/0 md5
users:
admin:
password: admin
options:
- createrole
- createdb
postgresql:
listen: 0.0.0.0:5432
connect_address: 10.0.1.1:5432
data_dir: /var/lib/postgresql/data
pgpass: /tmp/pgpass
authentication:
replication:
username: replicator
password: replicator
superuser:
username: postgres
password: postgres
parameters:
unix_socket_directories: '/var/run/postgresql'
recovery_conf:
restore_command: 'cp /backup/%f %p'
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
3. Connection Pooling (PgBouncer)
# pgbouncer/pgbouncer.ini
[databases]
mydb = host=primary-host port=5432 dbname=mydb
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
max_db_connections = 100
max_user_connections = 50
server_round_robin = 1
ignore_startup_parameters = extra_float_digits
# Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
# Admin console
admin_users = admin
stats_users = stats
# Health check
server_check_query = SELECT 1
server_check_delay = 30
server_lifetime = 3600
server_idle_timeout = 600
# Connection limits
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
4. Backup Strategy
#!/bin/bash
# scripts/backup.sh
BACKUP_DIR="/backup/postgresql"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30
# Full backup
pg_basebackup \
-h localhost \
-D "$BACKUP_DIR/full_$DATE" \
-U backup_user \
-Ft \
-z \
-P
# WAL archiving (continuous)
# In postgresql.conf:
# archive_mode = on
# archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'
# Cleanup old backups
find "$BACKUP_DIR" -type d -mtime +$RETENTION_DAYS -exec rm -rf {} \;
# Verify backup
pg_verifybackup -D "$BACKUP_DIR/full_$DATE"
Script ejecutable: scripts/postgresql_backup.py
Script CLI completo para automatizar backups, restauraciΓ³n y limpieza de PostgreSQL.
CuΓ‘ndo ejecutar
- Backups programados: Como parte de cron jobs o tareas automatizadas
- RestauraciΓ³n de emergencia: Para restaurar bases de datos desde backups
- Limpieza de backups: Para mantener el espacio de almacenamiento bajo control
- VerificaciΓ³n de backups: Para validar la integridad de backups antes de restaurar
Uso como CLI
# Crear backup completo
python scripts/postgresql_backup.py backup \
--host localhost \
--user backup_user \
--backup-dir /backup/postgresql
# Listar backups disponibles
python scripts/postgresql_backup.py list \
--backup-dir /backup/postgresql
# Limpiar backups antiguos (dry-run primero)
python scripts/postgresql_backup.py cleanup \
--backup-dir /backup/postgresql \
--retention-days 30 \
--dry-run
# Ejecutar limpieza real
python scripts/postgresql_backup.py cleanup \
--backup-dir /backup/postgresql \
--retention-days 30
# Restaurar desde backup
python scripts/postgresql_backup.py restore \
--backup-path /backup/postgresql/full_20240115_120000 \
--target-dir /var/lib/postgresql/data
Uso como mΓ³dulo
from scripts.postgresql_backup import PostgreSQLBackup
# Crear instancia
backup = PostgreSQLBackup(
host="localhost",
port=5432,
user="backup_user",
backup_dir="/backup/postgresql"
)
# Crear backup
backup_path = backup.full_backup(verify=True)
# Limpiar backups antiguos
backup.cleanup_old_backups(retention_days=30)
# Listar backups
backup.list_backups()
CaracterΓsticas
- β
Backup fΓsico completo con
pg_basebackup - β VerificaciΓ³n automΓ‘tica de backups
- β RestauraciΓ³n automatizada
- β Limpieza de backups antiguos con polΓtica de retenciΓ³n
- β Listado de backups disponibles
- β Modo dry-run para limpieza
- β Manejo de errores robusto
5. Database Monitoring
# prometheus/database-metrics.yml
scrape_configs:
- job_name: 'postgres-exporter'
static_configs:
- targets: ['postgres-exporter:9187']
- job_name: 'postgresql'
static_configs:
- targets: ['localhost:9187']
-- Custom metrics queries
-- Connection pool usage
SELECT
datname,
numbackends as active_connections,
max_connections,
(numbackends::float / max_connections * 100) as connection_pct
FROM pg_stat_database;
-- Replication lag
SELECT
EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;
-- Table bloat
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS bloat
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
-- Slow queries
SELECT
pid,
now() - pg_stat_activity.query_start AS duration,
query,
state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes'
AND state = 'active';
-- Index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan as index_scans,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
6. Alert Rules
# prometheus/alerts/database-alerts.yml
groups:
- name: database_alerts
interval: 30s
rules:
# High connection usage
- alert: DatabaseHighConnections
expr: |
(pg_stat_database_numbackends / pg_stat_database_max_connections) > 0.9
for: 5m
labels:
severity: warning
annotations:
summary: "High connection usage in {{ $labels.datname }}"
description: "{{ $value | humanizePercentage }} of connections in use"
# Replication lag
- alert: DatabaseReplicationLag
expr: |
pg_replication_lag_seconds > 30
for: 5m
labels:
severity: warning
annotations:
summary: "High replication lag"
description: "Replication lag is {{ $value }} seconds"
# Database down
- alert: DatabaseDown
expr: |
up{job="postgres-exporter"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Database exporter is down"
# Slow queries
- alert: DatabaseSlowQueries
expr: |
pg_stat_activity_max_tx_duration > 300
for: 5m
labels:
severity: warning
annotations:
summary: "Slow query detected"
description: "Query running for {{ $value }} seconds"
π― Mejores PrΓ‘cticas
1. Replication
β DO:
- Use synchronous replication for critical data
- Monitor replication lag
- Test failover regularly
- Use replication slots
β DON'T:
- Ignore replication lag
- Skip failover testing
- Use async replication for critical writes
2. Backups
β DO:
- Test restore procedures regularly
- Use point-in-time recovery
- Store backups off-site
- Encrypt backups
β DON'T:
- Assume backups work without testing
- Keep all backups in one location
- Skip backup verification
3. Connection Pooling
β DO:
- Use connection pooling
- Set appropriate pool sizes
- Monitor pool usage
- Handle connection errors gracefully
β DON'T:
- Exceed database connection limits
- Use too large pools
- Ignore pool exhaustion
π¨ Troubleshooting
Replication Lag
- Check network connectivity
- Monitor WAL generation rate
- Check replica performance
- Consider increasing resources
Connection Exhaustion
- Reduce connection pool size
- Close idle connections
- Use connection pooling
- Increase database max_connections
Backup Failures
- Check disk space
- Verify permissions
- Test backup commands manually
- Review backup logs
π Recursos Adicionales
VersiΓ³n: 1.0.0
Γltima actualizaciΓ³n: Diciembre 2025
Total lΓneas: 1,100+
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
Didn't find tool you were looking for?