Agent skill
anth-incident-runbook
Execute incident response procedures for Claude API outages and degradation. Use when Claude API is returning errors, experiencing high latency, or showing degraded performance in production. Trigger with phrases like "anthropic incident", "claude api down", "anthropic outage", "claude degraded", "anthropic runbook".
Install this agent skill to your Project
npx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/anthropic-pack/skills/anth-incident-runbook
SKILL.md
Anthropic Incident Runbook
Severity Classification
| Severity | Condition | Response Time |
|---|---|---|
| P1 | API returning 500/529 for all requests | Immediate |
| P2 | Rate limiting (429) or high latency (>10s p99) | 15 minutes |
| P3 | Intermittent errors (<5% error rate) | 1 hour |
| P4 | Degraded quality (not errors) | Next business day |
Immediate Triage (First 5 Minutes)
# 1. Check Anthropic status page
curl -s https://status.anthropic.com/api/v2/status.json | python3 -c \
"import sys,json; d=json.load(sys.stdin); print(d['status']['indicator'], '-', d['status']['description'])"
# 2. Test API connectivity
curl -s -w "\nHTTP %{http_code} | Time: %{time_total}s\n" \
https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"claude-haiku-4-20250514","max_tokens":8,"messages":[{"role":"user","content":"1"}]}'
# 3. Check rate limit headers
curl -s -D - https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"claude-haiku-4-20250514","max_tokens":8,"messages":[{"role":"user","content":"1"}]}' \
2>/dev/null | grep -i "ratelimit\|retry-after\|request-id"
Decision Tree
API returning errors?
├── 401/403 → Key issue → Check ANTHROPIC_API_KEY is set and valid
├── 429 → Rate limited → Check headers, reduce traffic, wait for retry-after
├── 500 → Server error → Check status.anthropic.com, retry with backoff
├── 529 → Overloaded → Temporary, retry after 30-60s
└── Timeouts → Network or long generation → Increase timeout, check max_tokens
Mitigation Actions
Rate Limiting (429)
# Immediate: reduce traffic
# 1. Enable circuit breaker
# 2. Queue non-critical requests
# 3. Switch to Message Batches for bulk work
# 4. Reduce max_tokens to shorten generation time
API Outage (500/529)
# Graceful degradation
def get_response_with_fallback(prompt: str) -> str:
try:
msg = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": prompt}]
)
return msg.content[0].text
except (anthropic.InternalServerError, anthropic.APIStatusError):
return "Our AI assistant is temporarily unavailable. Please try again shortly."
Key Compromise
# 1. Immediately revoke key at console.anthropic.com
# 2. Generate new key
# 3. Deploy new key to all environments
# 4. Audit recent usage for unauthorized calls
# 5. File incident report
Postmortem Template
## Incident: [Title]
- **Duration:** [start] to [end]
- **Severity:** P[1-4]
- **Impact:** [what users experienced]
- **Root Cause:** [what went wrong]
- **Detection:** [how we found out]
- **Mitigation:** [what we did to fix it]
- **Request IDs:** [from debug logs]
- **Action Items:**
- [ ] [preventive measure 1]
- [ ] [preventive measure 2]
Error Handling
| Symptom | Likely Cause | Quick Fix |
|---|---|---|
| All requests fail 401 | Key rotated/expired | Check Console for active keys |
| Sudden 429 spike | Traffic burst or tier change | Check rate limit headers |
| Slow responses (>10s) | Large max_tokens or complex prompt | Reduce max_tokens, use Haiku |
| Intermittent 500s | Upstream API issue | Check status.anthropic.com |
Resources
Next Steps
For data compliance, see anth-data-handling.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
dockerfile-generator
Dockerfile Generator - Auto-activating skill for DevOps Basics. Triggers on: dockerfile generator, dockerfile generator Part of the DevOps Basics skill category.
branch-naming-helper
Branch Naming Helper - Auto-activating skill for DevOps Basics. Triggers on: branch naming helper, branch naming helper Part of the DevOps Basics skill category.
readme-generator
Readme Generator - Auto-activating skill for DevOps Basics. Triggers on: readme generator, readme generator Part of the DevOps Basics skill category.
makefile-generator
Makefile Generator - Auto-activating skill for DevOps Basics. Triggers on: makefile generator, makefile generator Part of the DevOps Basics skill category.
gitignore-generator
Gitignore Generator - Auto-activating skill for DevOps Basics. Triggers on: gitignore generator, gitignore generator Part of the DevOps Basics skill category.
pre-commit-hook-setup
Pre Commit Hook Setup - Auto-activating skill for DevOps Basics. Triggers on: pre commit hook setup, pre commit hook setup Part of the DevOps Basics skill category.
Didn't find tool you were looking for?