Agent skill
clade-incident-runbook
Respond to Anthropic API incidents in production: outages, degraded performance, error spikes, and rate limit issues. Use when working with incident-runbook patterns. Trigger with "anthropic down", "claude outage", "anthropic incident", "claude not responding", "anthropic 529".
Install this agent skill to your Project
npx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/claude-pack/skills/clade-incident-runbook
SKILL.md
Anthropic Incident Runbook
Overview
Respond to Anthropic API incidents in production — outages, sustained 529 errors, authentication failures, and timeouts. Covers status page checking, severity classification, model fallback activation, communication, and post-incident review.
Step 1: Confirm the Issue
# Check Anthropic status
curl -s https://status.anthropic.com/api/v2/status.json | python3 -c "
import json, sys
d = json.load(sys.stdin)
print(f\"Status: {d['status']['description']} ({d['status']['indicator']})\")"
# Test API directly
curl -s -w "\nHTTP %{http_code} in %{time_total}s\n" \
https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"claude-haiku-4-5-20251001","max_tokens":5,"messages":[{"role":"user","content":"ping"}]}'
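The status check above can also be done from application code. The following is a minimal sketch, assuming the payload has the `status.indicator` and `status.description` fields that the Python one-liner reads, and assuming an indicator of `none` means no open incident. `summarizeStatus` and `StatusPayload` are hypothetical names, and the sample payload is illustrative rather than a captured response.

```typescript
// The part of the status.json payload that the runbook's one-liner reads.
interface StatusPayload {
  status: { indicator: string; description: string };
}

// Hypothetical helper: build the same summary line as the curl/python check
// and flag whether the status page reports an incident.
// Assumption: an indicator of "none" means no open incident.
function summarizeStatus(payload: StatusPayload): { line: string; incident: boolean } {
  const { indicator, description } = payload.status;
  return {
    line: `Status: ${description} (${indicator})`,
    incident: indicator !== "none",
  };
}

// Illustrative payload, not a real response.
const sample: StatusPayload = {
  status: { indicator: "minor", description: "Partially Degraded Service" },
};
console.log(summarizeStatus(sample).line); // prints "Status: Partially Degraded Service (minor)"
```

In a real service the payload would come from fetching the status URL used in the curl command and parsing the JSON body.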
Step 2: Classify Severity
| Symptom | Severity | Action |
|---|---|---|
| 529 overloaded (intermittent) | Low | SDK auto-retries handle this |
| 529 overloaded (sustained 5+ min) | Medium | Switch to fallback model |
| 401/403 on all requests | High | API key issue — check console |
| All requests timing out | High | Check status page, activate fallback |
| Status page shows incident | Varies | Follow status page updates |
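If alerts need to be triaged automatically, the table above could be encoded roughly as follows. This is a sketch under stated assumptions: the `Observation` fields and the `classifySeverity` name are hypothetical, and returning `null` for symptoms the table does not cover is a choice made here, not part of the runbook.

```typescript
type Severity = "Low" | "Medium" | "High" | "Varies";

// Hypothetical shape for what monitoring has observed.
interface Observation {
  statusCode?: number;          // HTTP status on failing requests, if any
  sustainedMinutes?: number;    // how long the symptom has persisted
  allRequestsAffected?: boolean;
  timedOut?: boolean;
  statusPageIncident?: boolean;
}

interface Classification {
  severity: Severity;
  action: string;
}

// Maps an observation to a row of the severity table.
// Returns null when no row matches (an assumption of this sketch).
function classifySeverity(o: Observation): Classification | null {
  if ((o.statusCode === 401 || o.statusCode === 403) && o.allRequestsAffected) {
    return { severity: "High", action: "API key issue: check console" };
  }
  if (o.timedOut && o.allRequestsAffected) {
    return { severity: "High", action: "Check status page, activate fallback" };
  }
  if (o.statusCode === 529) {
    // The table draws the line at 5+ minutes of sustained 529s.
    return (o.sustainedMinutes ?? 0) >= 5
      ? { severity: "Medium", action: "Switch to fallback model" }
      : { severity: "Low", action: "SDK auto-retries handle this" };
  }
  if (o.statusPageIncident) {
    return { severity: "Varies", action: "Follow status page updates" };
  }
  return null;
}

console.log(classifySeverity({ statusCode: 529, sustainedMinutes: 7 }));
```

The High rows are checked first so that a total outage is never downgraded by a coincidental 529.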
Step 3: Activate Fallback
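import Anthropic from '@anthropic-ai/sdk';

// Reads ANTHROPIC_API_KEY from the environment by default
const client = new Anthropic();

// On an overloaded (529) or server error (500) response, retry once on the next model tier down
async function callWithFallback(params: Anthropic.MessageCreateParams) {
  try {
    return await client.messages.create(params);
  } catch (err) {
    if (err instanceof Anthropic.APIError && (err.status === 529 || err.status === 500)) {
      // Opus falls back to Sonnet
      if (params.model.includes('opus')) {
        return await client.messages.create({ ...params, model: 'claude-sonnet-4-20250514' });
      }
      // Sonnet falls back to Haiku
      if (params.model.includes('sonnet')) {
        return await client.messages.create({ ...params, model: 'claude-haiku-4-5-20251001' });
      }
    }
    // No fallback available (for example, already on Haiku) or a different error: surface it
    throw err;
  }
}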
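The function above tries a single fallback. One possible variation walks the whole chain (Opus, then Sonnet, then Haiku) until a call succeeds. The sketch below assumes the request is passed in as a `send` callback so it can be exercised without the SDK. `callWithFallbackChain`, `nextModel`, and `isFallbackStatus` are hypothetical names. The model IDs are the ones used in the runbook.

```typescript
// Caller-supplied function that performs the request against a given model.
type Send<T> = (model: string) => Promise<T>;

// Fallback targets, using the model IDs from the runbook's code.
const FALLBACKS: Array<{ match: string; next: string }> = [
  { match: "opus", next: "claude-sonnet-4-20250514" },
  { match: "sonnet", next: "claude-haiku-4-5-20251001" },
];

// Returns the next model to try, or undefined when the chain is exhausted.
function nextModel(model: string): string | undefined {
  return FALLBACKS.find((f) => model.includes(f.match))?.next;
}

// Same trigger as the runbook: HTTP 529 (overloaded) or 500.
function isFallbackStatus(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return status === 529 || status === 500;
}

async function callWithFallbackChain<T>(
  model: string,
  send: Send<T>,
): Promise<{ model: string; result: T }> {
  let current: string | undefined = model;
  let lastErr: unknown;
  while (current !== undefined) {
    try {
      return { model: current, result: await send(current) };
    } catch (err) {
      if (!isFallbackStatus(err)) throw err; // e.g. 401: a fallback model will not help
      lastErr = err;
      current = nextModel(current);
    }
  }
  throw lastErr; // every model in the chain failed
}
```

With the SDK, `send` could be something like `(model) => client.messages.create({ ...params, model })`. Returning the model that finally answered makes it easier to log how often the fallback is in use.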
Step 4: Communicate
- Update your status page if user-facing
- Note: Anthropic incidents typically resolve in 15-60 minutes
Step 5: Post-Incident
- Check your error logs for the incident window
- Calculate impact (failed requests, user impact)
- Verify all systems recovered
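The impact calculation in the checklist above can be sketched as a small function over request logs. The `RequestLog` shape and the `calculateImpact` name are hypothetical, and counting any status of 400 or higher as a failure is an assumption. Adjust both to match your own logging.

```typescript
// Hypothetical log record; adapt to your logging schema.
interface RequestLog {
  timestamp: string; // ISO 8601
  status: number;    // HTTP status returned for the request
  userId: string;
}

interface Impact {
  total: number;         // requests inside the incident window
  failed: number;        // requests counted as failures
  errorRate: number;     // failed / total, or 0 when the window is empty
  affectedUsers: number; // distinct users with at least one failure
}

// Summarizes failures between start and end (both inclusive).
// Assumption: any status >= 400 counts as a failed request.
function calculateImpact(logs: RequestLog[], start: string, end: string): Impact {
  const from = Date.parse(start);
  const to = Date.parse(end);
  const inWindow = logs.filter((l) => {
    const t = Date.parse(l.timestamp);
    return t >= from && t <= to;
  });
  const failed = inWindow.filter((l) => l.status >= 400);
  return {
    total: inWindow.length,
    failed: failed.length,
    errorRate: inWindow.length === 0 ? 0 : failed.length / inWindow.length,
    affectedUsers: new Set(failed.map((l) => l.userId)).size,
  };
}
```

Running the same function over a window after the incident gives a quick check for the last item: the error rate should be back at its normal baseline.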
Output
- Incident confirmed via status page and direct API test
- Severity classified (Low/Medium/High) based on symptoms
- Fallback activated if needed (downgrade model or queue requests)
- Impact assessed and documented post-incident
Error Handling
| Error | Cause | Solution |
|---|---|---|
| API Error | Check error type and status code | See clade-common-errors |
Examples
See Step 1 (curl status check and API test), Step 2 (severity classification table), Step 3 (fallback code with model downgrade), and Step 5 (post-incident checklist) above.
Next Steps
See clade-reliability-patterns for building resilient integrations.
Prerequisites
- Production Claude integration deployed
- Fallback model configuration in place (see clade-reliability-patterns)
- Monitoring/alerting configured (see clade-observability)
Instructions
Step 1: Review the runbook steps
Each step above contains code examples. Copy and adapt them to your use case.
Step 2: Apply to your codebase
Integrate the patterns that match your requirements. Test each change individually.
Step 3: Verify
Run your test suite to confirm the integration works correctly.