Agent skills
clade-incident-runbook

Agent skill

clade-incident-runbook

Respond to Anthropic API incidents — outages, degraded performance, Use when working with incident-runbook patterns. error spikes, and rate limit issues in production. Trigger with "anthropic down", "claude outage", "anthropic incident", "claude not responding", "anthropic 529".

View SKILL.md on GitHub Repository

Stars 1,803

Forks 241

Install this agent skill to your Project

npx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/claude-pack/skills/clade-incident-runbook

SKILL.md

Anthropic Incident Runbook

Overview

Respond to Anthropic API incidents in production — outages, sustained 529 errors, authentication failures, and timeouts. Covers status page checking, severity classification, model fallback activation, communication, and post-incident review.

Step 1: Confirm the Issue

bash

# Check Anthropic status
curl -s https://status.anthropic.com/api/v2/status.json | python3 -c "
import json, sys
d = json.load(sys.stdin)
print(f\"Status: {d['status']['description']} ({d['status']['indicator']})\")"

# Test API directly
curl -s -w "\nHTTP %{http_code} in %{time_total}s\n" \
  https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "claude-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-haiku-4-5-20251001","max_tokens":5,"messages":[{"role":"user","content":"ping"}]}'

Step 2: Classify Severity

Symptom	Severity	Action
529 overloaded (intermittent)	Low	SDK auto-retries handle this
529 overloaded (sustained 5+ min)	Medium	Switch to fallback model
401/403 on all requests	High	API key issue — check console
All requests timing out	High	Check status page, activate fallback
Status page shows incident	Varies	Follow status page updates

Step 3: Activate Fallback

typescript

async function callWithFallback(params: Anthropic.MessageCreateParams) {
  try {
    return await client.messages.create(params);
  } catch (err) {
    if (err instanceof Anthropic.APIError && (err.status === 529 || err.status === 500)) {
      // Try a different model
      if (params.model.includes('opus')) {
        return await client.messages.create({ ...params, model: 'claude-sonnet-4-20250514' });
      }
      if (params.model.includes('sonnet')) {
        return await client.messages.create({ ...params, model: 'claude-haiku-4-5-20251001' });
      }
    }
    throw err;
  }
}

Step 4: Communicate

Update your status page if user-facing
Note: Anthropic incidents typically resolve in 15-60 minutes

Step 5: Post-Incident

Check your error logs for the incident window
Calculate impact (failed requests, user impact)
Verify all systems recovered

Output

Incident confirmed via status page and direct API test
Severity classified (Low/Medium/High) based on symptoms
Fallback activated if needed (downgrade model or queue requests)
Impact assessed and documented post-incident

Error Handling

Error	Cause	Solution
API Error	Check error type and status code	See `clade-common-errors`

Examples

See Step 1 (curl status check and API test), Step 2 (severity classification table), Step 3 (fallback code with model downgrade), and Step 5 (post-incident checklist) above.

Resources

Next Steps

See clade-reliability-patterns for building resilient integrations.

Prerequisites

Production Claude integration deployed
Fallback model configuration in place (see clade-reliability-patterns)
Monitoring/alerting configured (see clade-observability)

Instructions

Step 1: Review the patterns below

Each section contains production-ready code examples. Copy and adapt them to your use case.

Step 2: Apply to your codebase

Integrate the patterns that match your requirements. Test each change individually.

Step 3: Verify

Run your test suite to confirm the integration works correctly.

Maintainer

jeremylongshore Core maintainer

Source details

Full Name: jeremylongshore/claude-code-plugins-plus-skills
Branch: main
Path in repo: plugins/saas-packs/claude-pack/skills/clade-incident-runbook
License: Other
Topics: ai claude-code anthropic agent-skills automation mcp ai-agents developer-tools skills llm marketplace saas claude-code-plugins devops plugin-marketplace plugin-system

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

jeremylongshore/claude-code-plugins-plus-skills

dockerfile-generator

Dockerfile Generator - Auto-activating skill for DevOps Basics. Triggers on: dockerfile generator, dockerfile generator Part of the DevOps Basics skill category.

1,803 241

Explore

jeremylongshore/claude-code-plugins-plus-skills

branch-naming-helper

Branch Naming Helper - Auto-activating skill for DevOps Basics. Triggers on: branch naming helper, branch naming helper Part of the DevOps Basics skill category.

1,803 241

Explore

jeremylongshore/claude-code-plugins-plus-skills

readme-generator

Readme Generator - Auto-activating skill for DevOps Basics. Triggers on: readme generator, readme generator Part of the DevOps Basics skill category.

1,803 241

Explore

jeremylongshore/claude-code-plugins-plus-skills

makefile-generator

Makefile Generator - Auto-activating skill for DevOps Basics. Triggers on: makefile generator, makefile generator Part of the DevOps Basics skill category.

1,803 241

Explore

jeremylongshore/claude-code-plugins-plus-skills

gitignore-generator

Gitignore Generator - Auto-activating skill for DevOps Basics. Triggers on: gitignore generator, gitignore generator Part of the DevOps Basics skill category.

1,803 241

Explore

jeremylongshore/claude-code-plugins-plus-skills

pre-commit-hook-setup

Pre Commit Hook Setup - Auto-activating skill for DevOps Basics. Triggers on: pre commit hook setup, pre commit hook setup Part of the DevOps Basics skill category.

1,803 241

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Anthropic Incident Runbook

Overview

Step 1: Confirm the Issue

Step 2: Classify Severity

Step 3: Activate Fallback

Step 4: Communicate

Step 5: Post-Incident

Output

Error Handling

Examples

Resources

Next Steps

Prerequisites

Instructions

Step 1: Review the patterns below

Step 2: Apply to your codebase

Step 3: Verify

Recommended Agent Skills

dockerfile-generator

branch-naming-helper

readme-generator

makefile-generator

gitignore-generator

pre-commit-hook-setup