Agent skill

chaos-engineering

Test system resilience through controlled failures. Use when validating fault tolerance, disaster recovery, or system reliability. Covers experiment design, network and resource fault injection, service failures, tooling, and safety rules.

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/chaos-engineering

SKILL.md

Chaos Engineering

Principles

  1. Build a Hypothesis: Define expected behavior
  2. Minimize Blast Radius: Start small
  3. Run in Production: Real conditions matter, so graduate experiments to production once they have passed in non-production
  4. Automate: Make experiments repeatable
  5. Minimize Impact: Have abort conditions

Experiment Process

  1. Steady State: Define normal metrics
  2. Hypothesis: "System will maintain X under condition Y"
  3. Introduce Variables: Inject failure
  4. Observe: Compare to steady state
  5. Analyze: Confirm or disprove hypothesis
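Steps 4 and 5 come down to comparing a metric observed during the experiment with its steady-state baseline. The following is a minimal sketch of that comparison, assuming a single numeric metric and a percentage tolerance. The function names `within_tolerance` and `evaluate_hypothesis` and the metric name in the usage line are hypothetical, not part of any chaos tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when `observed` is within
# `tolerance_pct` percent of `baseline`, fails (exit 1) otherwise.
within_tolerance() {
  local baseline=$1 observed=$2 tolerance_pct=$3
  awk -v b="$baseline" -v o="$observed" -v t="$tolerance_pct" 'BEGIN {
    diff = o - b
    if (diff < 0) diff = -diff
    limit = b * t / 100
    if (limit < 0) limit = -limit
    exit (diff <= limit) ? 0 : 1
  }'
}

# Compare one metric against its baseline and report whether the
# hypothesis "metric stays within tolerance" was confirmed or disproved.
evaluate_hypothesis() {
  local metric=$1 baseline=$2 observed=$3 tolerance_pct=$4
  if within_tolerance "$baseline" "$observed" "$tolerance_pct"; then
    echo "CONFIRMED: $metric stayed within ${tolerance_pct}% of baseline"
  else
    echo "DISPROVED: $metric deviated more than ${tolerance_pct}% from baseline"
  fi
}
```

For example, `evaluate_hypothesis p99_latency_ms 200 210 10` reports the hypothesis as confirmed, while an observed value of 250 reports it as disproved. How the baseline and observed values are collected is left to whatever monitoring is already in place.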

Common Experiments

Network Failures

bash
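# Requires root. A device holds only one root qdisc at a time, so run the
# "Remove" command before adding a different rule.

# Add latency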
tc qdisc add dev eth0 root netem delay 100ms

# Packet loss
tc qdisc add dev eth0 root netem loss 10%

# Remove
tc qdisc del dev eth0 root
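A fault that is injected by hand is easy to forget to remove. The sketch below wraps the commands above so the rule is applied for a bounded time and then deleted, and it passes delay and loss in a single `netem` rule. The function name `inject_netem` and the `RUN` variable are assumptions made for illustration: setting `RUN=echo` prints the commands instead of running them, which is useful for a dry run without root.

```bash
#!/usr/bin/env bash
# RUN is a hypothetical dry-run hook: RUN=echo prints each command instead
# of executing it. Real execution needs root and the iproute2 `tc` tool.
RUN=${RUN:-}

# Usage: inject_netem <device> <duration_seconds> <netem options...>
inject_netem() {
  local dev=$1 duration_s=$2
  shift 2
  # If the shell exits before the normal cleanup below, still remove the
  # rule. Note: this replaces any EXIT trap the calling script had set.
  trap "$RUN tc qdisc del dev $dev root" EXIT
  $RUN tc qdisc add dev "$dev" root netem "$@"
  $RUN sleep "$duration_s"
  $RUN tc qdisc del dev "$dev" root
  trap - EXIT
}
```

A dry run such as `RUN=echo inject_netem eth0 60 delay 100ms loss 10%` prints the three commands that would run; dropping `RUN=echo` applies 100 ms of latency and 10% packet loss on `eth0` for 60 seconds.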

Resource Exhaustion

bash
# CPU stress
stress --cpu 4 --timeout 60s

# Memory stress
stress --vm 2 --vm-bytes 1G --timeout 60s

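# Disk fill (writes 1 GiB; confirm the target filesystem has room first)
dd if=/dev/zero of=/tmp/fill bs=1M count=1024

# Remove the fill file when the experiment ends
rm -f /tmp/fill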
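The disk fill is the experiment here most likely to cause damage beyond its intended scope, so a guard is worth sketching. The example below refuses to write unless a reserve of free space would remain, and deletes the file after a hold period. The function names `avail_mib`, `fill_allowed`, and `disk_fill` are hypothetical, and `bs=1M` assumes GNU `dd` as in the command above.

```bash
#!/usr/bin/env bash
# Free space in MiB on the filesystem that holds path $1 (POSIX `df -Pk`).
avail_mib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeeds only if writing fill_mib would still leave reserve_mib free.
fill_allowed() {
  local avail=$1 fill_mib=$2 reserve_mib=$3
  [ $(( avail - fill_mib )) -ge "$reserve_mib" ]
}

# Usage: disk_fill <file> <fill_mib> <reserve_mib> <hold_seconds>
# Writes the file, holds it for hold_seconds, then deletes it.
disk_fill() {
  local file=$1 fill_mib=$2 reserve_mib=$3 hold_s=$4
  local avail
  avail=$(avail_mib "$(dirname "$file")")
  if ! fill_allowed "$avail" "$fill_mib" "$reserve_mib"; then
    echo "refusing: only ${avail} MiB free" >&2
    return 1
  fi
  dd if=/dev/zero of="$file" bs=1M count="$fill_mib" 2>/dev/null
  sleep "$hold_s"
  rm -f "$file"
}
```

For instance, `disk_fill /tmp/fill 1024 512 60` would write 1 GiB only if at least 512 MiB stays free afterwards, hold it for a minute, and then clean up.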

Service Failures

  • Kill processes
  • Restart containers
  • Terminate instances
  • Block dependencies
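For the "kill processes" case, the interesting measurement is how long the service takes to become healthy again. The following is a minimal sketch, assuming the service is restarted by some supervisor and exposes a health check that can be run as a command. The function names `wait_for_recovery` and `kill_and_watch`, and the service name and URL in the comment, are hypothetical.

```bash
#!/usr/bin/env bash
# Usage: wait_for_recovery <timeout_seconds> <health check command...>
# Polls the health check once per second until it passes or the timeout
# is reached. Returns 1 on timeout.
wait_for_recovery() {
  local timeout_s=$1
  shift
  local elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$timeout_s" ] && return 1
    sleep 1
    elapsed=$(( elapsed + 1 ))
  done
  echo "recovered after ~${elapsed}s"
}

# Usage: kill_and_watch <pid> <timeout_seconds> <health check command...>
# Sends SIGTERM to the process, then waits for the health check to pass.
kill_and_watch() {
  local pid=$1 timeout_s=$2
  shift 2
  kill -TERM "$pid" || return 1
  wait_for_recovery "$timeout_s" "$@"
}

# Hypothetical usage (service name and URL are placeholders):
#   kill_and_watch "$(pgrep -o myservice)" 30 \
#     curl -fsS http://localhost:8080/health
```

A non-zero return from `kill_and_watch` means the service did not recover within the timeout, which is a natural abort condition for the experiment.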

Chaos Tools

  • Chaos Monkey: Random instance termination
  • Gremlin: Comprehensive chaos platform
  • Litmus: Kubernetes chaos engineering
  • Chaos Mesh: Cloud-native chaos

Experiment Template

markdown
## Experiment: [Name]

### Hypothesis
If [condition], then [expected behavior].

### Steady State
- Metric A: [baseline value]
- Metric B: [baseline value]

### Method
1. [Step 1]
2. [Step 2]
3. [Step 3]

### Abort Conditions
- If [condition], stop immediately

### Results
[What happened]

### Findings
[What we learned]

Safety Rules

  1. Start in non-production
  2. Have rollback ready
  3. Monitor continuously
  4. Communicate with team
  5. Document everything
