Agent skill

clade-rate-limits

Handle Anthropic rate limits — understand tiers, implement backoff, Use when working with rate-limits patterns. optimize throughput, and monitor usage. Trigger with "anthropic rate limit", "claude 429", "anthropic throttling", "anthropic usage limits", "claude tokens per minute".

Stars 1,803
Forks 241

Install this agent skill to your Project

npx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/claude-pack/skills/clade-rate-limits

SKILL.md

Anthropic Rate Limits

Overview

Anthropic enforces three types of limits: requests per minute (RPM), input tokens per minute (TPM), and output tokens per minute. Limits depend on your spend tier.

Rate Limit Tiers

Tier Qualification RPM Input TPM Output TPM
Tier 1 Free 50 40,000 8,000
Tier 2 $40+ spend 1,000 80,000 16,000
Tier 3 $200+ spend 2,000 160,000 32,000
Tier 4 $400+ spend 4,000 400,000 80,000
Scale Custom Custom Custom Custom

Check your tier: console.anthropic.com → Settings → Limits

Response Headers

Every API response includes rate limit headers:

claude-ratelimit-requests-limit: 1000
claude-ratelimit-requests-remaining: 998
claude-ratelimit-requests-reset: 2025-01-01T00:01:00Z
claude-ratelimit-tokens-limit: 80000
claude-ratelimit-tokens-remaining: 79500
claude-ratelimit-tokens-reset: 2025-01-01T00:01:00Z
retry-after: 5

Built-In SDK Retries

The SDK automatically retries 429 and 529 errors with exponential backoff:

typescript
import Anthropic from '@claude-ai/sdk';

const client = new Anthropic({
  maxRetries: 3, // default: 2. Set to 0 to disable.
});

Custom Backoff

typescript
async function callWithBackoff(params: Anthropic.MessageCreateParams, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.messages.create(params);
    } catch (err) {
      if (err instanceof Anthropic.RateLimitError) {
        const retryAfter = Number(err.headers?.['retry-after'] || 2 ** attempt);
        const jitter = Math.random() * 1000;
        console.log(`Rate limited. Retry in ${retryAfter}s (attempt ${attempt + 1})`);
        await new Promise(r => setTimeout(r, retryAfter * 1000 + jitter));
      } else {
        throw err;
      }
    }
  }
  throw new Error('Exceeded max retries');
}

Throughput Optimization

Strategy Impact
Use Message Batches API Bypasses rate limits entirely (async, 24h SLA)
Use prompt caching Cached tokens don't count toward input TPM
Use smaller models for simple tasks Lower token counts = more requests per minute
Pre-count tokens with countTokens Avoid wasted requests that will fail
Queue and batch requests Smooth out bursts

Token Counting

typescript
// Count before sending — avoid burning RPM on requests that'll fail
const count = await client.messages.countTokens({
  model: 'claude-sonnet-4-20250514',
  messages,
  system: systemPrompt,
});
console.log(`This request will use ${count.input_tokens} input tokens`);

Python

python
import anthropic
import time

client = anthropic.Anthropic(max_retries=5)

# Or manual handling:
try:
    message = client.messages.create(...)
except anthropic.RateLimitError as e:
    retry_after = float(e.response.headers.get("retry-after", 5))
    time.sleep(retry_after)

Output

  • Rate limit tier identified from response headers
  • SDK configured with appropriate maxRetries setting
  • Custom backoff implemented with jitter for high-throughput use cases
  • Throughput optimized using batches, caching, or model selection

Error Handling

Error Cause Solution
API Error Check error type and status code See clade-common-errors

Examples

See Rate Limit Tiers table, Response Headers section, Built-In SDK Retries, Custom Backoff implementation, and Throughput Optimization strategies above.

Resources

Next Steps

See clade-cost-tuning for cost optimization strategies.

Prerequisites

  • Completed clade-install-auth
  • Understanding of HTTP response headers
  • Familiarity with exponential backoff patterns

Instructions

Step 1: Review the patterns below

Each section contains production-ready code examples. Copy and adapt them to your use case.

Step 2: Apply to your codebase

Integrate the patterns that match your requirements. Test each change individually.

Step 3: Verify

Run your test suite to confirm the integration works correctly.

Expand your agent's capabilities with these related and highly-rated skills.

Didn't find tool you were looking for?

Be as detailed as possible for better results