Agent skill

disaster-recovery-testing

Execute comprehensive disaster recovery tests, validate recovery procedures, and document lessons learned from DR exercises.

Install this agent skill in your project:

npx add-skill https://github.com/aj-geddes/useful-ai-prompts/tree/main/skills/disaster-recovery-testing

SKILL.md

Disaster Recovery Testing

Overview

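Implement systematic disaster recovery testing to validate recovery procedures, measure achieved recovery time and data loss against RTO (recovery time objective) and RPO (recovery point objective) targets, identify gaps, and ensure team readiness for actual incidents.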
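
As a rough illustration of the measurement step, the sketch below derives the achieved recovery time and recovery point from three timestamps recorded during an exercise and compares them with targets. The function names, timestamps, and target values are illustrative assumptions, not part of the skill's reference scripts.

```python
from datetime import datetime, timedelta


def measure_recovery(last_backup_at, outage_started_at, service_restored_at):
    """Return (recovery_time, recovery_point) as timedeltas for one exercise."""
    if not (last_backup_at <= outage_started_at <= service_restored_at):
        raise ValueError("expected last backup <= outage start <= service restored")
    # Achieved recovery time: how long the service was unavailable.
    recovery_time = service_restored_at - outage_started_at
    # Achieved recovery point: how much recent data the last backup did not cover.
    recovery_point = outage_started_at - last_backup_at
    return recovery_time, recovery_point


def meets_targets(recovery_time, recovery_point, rto_target, rpo_target):
    """True when both measured values are within their objectives."""
    return recovery_time <= rto_target and recovery_point <= rpo_target


if __name__ == "__main__":
    # Hypothetical timestamps from a simulated outage.
    measured = measure_recovery(
        last_backup_at=datetime(2024, 1, 1, 2, 0),
        outage_started_at=datetime(2024, 1, 1, 2, 45),
        service_restored_at=datetime(2024, 1, 1, 4, 15),
    )
    print(measured, meets_targets(*measured, timedelta(hours=4), timedelta(hours=1)))
```

Recording the three timestamps during the exercise, rather than reconstructing them afterwards, keeps the comparison against targets honest.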

When to Use

  • Annual DR exercises
  • Infrastructure changes
  • New service deployments
  • Compliance requirements
  • Team training
  • Recovery procedure validation
  • Cross-region failover testing

Quick Start

Minimal working example:

```yaml
# dr-test-plan.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dr-test-procedures
  namespace: operations
data:
  dr-test-plan.md: |
    # Disaster Recovery Test Plan

    ## Test Objectives
    - Validate backup restoration procedures
    - Verify failover mechanisms
    - Test DNS failover
    - Validate data integrity post-recovery
    - Measure RTO and RPO
    - Train incident response team

    ## Pre-Test Checklist
    - [ ] Notify stakeholders
    - [ ] Schedule a 4-6 hour window
    - [ ] Disable alerting to prevent noise
    - [ ] Back up production data
    - [ ] Ensure DR environment is isolated
    - [ ] Have rollback plan ready
# ... (see reference guides for full implementation)
```
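
The Pre-Test Checklist in the plan above lends itself to a simple gate before a test starts. The following is a minimal sketch, assuming the plan text uses Markdown task-list syntax (`- [ ]` / `- [x]`) as shown; the function name `unchecked_items` is hypothetical.

```python
import re

# Matches Markdown task-list lines such as "- [ ] Notify stakeholders".
CHECKBOX = re.compile(r"^\s*- \[( |x|X)\] (.+)$")


def unchecked_items(plan_markdown):
    """Return the text of every checklist item that is still unticked."""
    pending = []
    for line in plan_markdown.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            pending.append(match.group(2).strip())
    return pending


if __name__ == "__main__":
    plan = """\
    ## Pre-Test Checklist
    - [x] Notify stakeholders
    - [ ] Have rollback plan ready
    """
    remaining = unchecked_items(plan)
    if remaining:
        print("Not ready to start; pending items:", remaining)
```

A gate like this could run at the top of a test script so the exercise does not begin while items remain open.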

Reference Guides

Detailed implementations are in the references/ directory:

Guide | Contents
DR Test Plan and Execution | DR Test Plan and Execution
DR Test Script | DR Test Script
DR Test Automation | DR Test Automation

Best Practices

✅ DO

  • Schedule regular DR tests
  • Document procedures in advance
  • Test in isolated environments
  • Measure actual RTO/RPO
  • Involve all teams
  • Automate validation
  • Record findings
  • Update procedures based on results
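
As one possible shape for automated validation, the sketch below compares SHA-256 checksums of files in a source directory against a restored copy and reports what is missing, unexpected, or different. It assumes file-level backups reachable as local directories; `compare_trees` and the report keys are illustrative names, and database-level integrity checks would need a different approach.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(source_dir, restored_dir):
    """Report files that are missing, unexpected, or changed after a restore."""
    src = tree_digests(source_dir)
    dst = tree_digests(restored_dir)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "unexpected": sorted(dst.keys() - src.keys()),
        "corrupted": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

An empty report (all three lists empty) can serve as a pass criterion that is recorded alongside the other findings.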

❌ DON'T

  • Skip DR testing
  • Test during business hours
  • Test against production
  • Ignore test failures
  • Neglect post-test analysis
  • Forget to re-enable monitoring
  • Use stale backup processes
  • Test only once a year
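
One way to guard against forgetting to re-enable monitoring is to wrap the test window in a context manager that restores alerting even when a step fails. This is a minimal sketch: `silence` and `unsilence` are hypothetical callables standing in for whatever your monitoring tool provides, and no real monitoring API is assumed.

```python
from contextlib import contextmanager


@contextmanager
def alerts_silenced(silence, unsilence):
    """Silence alerting for the duration of the block, then always restore it.

    `silence` and `unsilence` are caller-supplied functions (assumptions here)
    that wrap the actual monitoring system.
    """
    silence()
    try:
        yield
    finally:
        # Runs on success and on failure, so alerting is never left disabled.
        unsilence()


if __name__ == "__main__":
    log = []
    try:
        with alerts_silenced(lambda: log.append("silenced"),
                             lambda: log.append("restored")):
            raise RuntimeError("simulated failover step failure")
    except RuntimeError:
        pass
    print(log)
```

Putting the restore step in `finally` means an aborted exercise cannot skip it.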
