Agent skills
clade-performance-tuning

Agent skill

clade-performance-tuning

Optimize Anthropic API latency — streaming, prompt caching, model selection, Use when working with performance-tuning patterns. connection reuse, and parallel requests. Trigger with "anthropic slow", "claude latency", "speed up anthropic", "anthropic performance", "claude response time".

View SKILL.md on GitHub Repository

Stars 1,803

Forks 241

Install this agent skill to your Project

npx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/claude-pack/skills/clade-performance-tuning

SKILL.md

Anthropic Performance Tuning

Overview

Claude latency has two components: time to first token (TTFT) and tokens per second (TPS). Different strategies target each.

Latency Benchmarks (approximate)

Model	TTFT (p50)	TTFT (p95)	Output TPS
Claude Haiku 4.5	200ms	600ms	~150
Claude Sonnet 4	400ms	1.2s	~90
Claude Opus 4	800ms	2.5s	~40

Optimization Strategies

Instructions

Step 1: Always Stream

typescript

// Streaming delivers the first token ASAP — user sees response instantly
// instead of waiting for the full response to generate

const stream = client.messages.stream({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages,
});

// First token arrives in ~400ms (Sonnet)
// Full response may take 5-10s, but user sees progress immediately
for await (const event of stream) {
  if (event.type === 'content_block_delta') {
    yield event.delta.text;
  }
}

Step 2: Prompt Caching — Faster TTFT

typescript

// Cached prompts skip re-processing — dramatically lower TTFT for large system prompts
const message = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  system: [{
    type: 'text',
    text: largeSystemPrompt, // 10K+ tokens
    cache_control: { type: 'ephemeral' },
  }],
  messages,
}, {
  headers: { 'claude-beta': 'prompt-caching-2024-07-31' },
});
// TTFT drops from ~2s to ~500ms on cache hit with large prompts

Step 3: Use Haiku for Speed-Critical Paths

typescript

// Haiku is 2-4x faster than Sonnet with 80% quality for many tasks
// Use for: classification, extraction, simple Q&A, routing decisions

const route = await client.messages.create({
  model: 'claude-haiku-4-5-20251001', // 200ms TTFT
  max_tokens: 10,
  system: 'Classify the intent. Reply with exactly one word: search, create, update, delete.',
  messages: [{ role: 'user', content: userInput }],
});

// Then use Sonnet/Opus for the actual task

Step 4: Reuse Client Instance

typescript

// BAD — creates new connection pool per request
app.get('/api/chat', async (req, res) => {
  const client = new Anthropic(); // DON'T
  // ...
});

// GOOD — single client shared across requests
const client = new Anthropic(); // Module-level singleton

app.get('/api/chat', async (req, res) => {
  const message = await client.messages.create({ ... });
  // ...
});

Step 5: Parallel Requests

typescript

// When you need multiple independent Claude calls, fire them in parallel
const [summary, sentiment, entities] = await Promise.all([
  client.messages.create({ model: 'claude-haiku-4-5-20251001', max_tokens: 200,
    messages: [{ role: 'user', content: `Summarize: ${text}` }] }),
  client.messages.create({ model: 'claude-haiku-4-5-20251001', max_tokens: 20,
    messages: [{ role: 'user', content: `Sentiment (positive/negative/neutral): ${text}` }] }),
  client.messages.create({ model: 'claude-haiku-4-5-20251001', max_tokens: 200,
    messages: [{ role: 'user', content: `Extract named entities from: ${text}` }] }),
]);

Step 6: Minimize Output Tokens

typescript

// Fewer output tokens = faster response
system: 'Be extremely concise. Use bullet points, not paragraphs.',

// Set tight max_tokens
max_tokens: 256, // Don't use 4096 for short answers

Output

Streaming enabled for all user-facing responses (first token in ~400ms with Sonnet)
Prompt caching reducing TTFT for large system prompts
Model routing to Haiku for speed-critical classification/routing tasks
Client instance reused across requests (no per-request connection overhead)
Parallel requests firing independent Claude calls concurrently

Error Handling

Issue	Cause	Fix
TTFT > 3s	Large uncached prompt	Enable prompt caching
Slow output	Using Opus for simple tasks	Downgrade to Haiku/Sonnet
Timeouts	Long generation + default timeout	`new Anthropic({ timeout: 120_000 })`
529 overloaded	API capacity	SDK auto-retries; add fallback model

Examples

See Latency Benchmarks table and six numbered strategy sections above, each with complete TypeScript code examples.

Resources

Next Steps

See clade-deploy-integration for production deployment patterns.

Prerequisites

Completed clade-install-auth
User-facing application where latency matters
Understanding of streaming and async patterns

Maintainer

jeremylongshore Core maintainer

Source details

Full Name: jeremylongshore/claude-code-plugins-plus-skills
Branch: main
Path in repo: plugins/saas-packs/claude-pack/skills/clade-performance-tuning
License: Other
Topics: ai claude-code anthropic agent-skills automation mcp ai-agents developer-tools skills llm marketplace saas claude-code-plugins devops plugin-marketplace plugin-system

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

jeremylongshore/claude-code-plugins-plus-skills

dockerfile-generator

Dockerfile Generator - Auto-activating skill for DevOps Basics. Triggers on: dockerfile generator, dockerfile generator Part of the DevOps Basics skill category.

1,803 241

Explore

jeremylongshore/claude-code-plugins-plus-skills

branch-naming-helper

Branch Naming Helper - Auto-activating skill for DevOps Basics. Triggers on: branch naming helper, branch naming helper Part of the DevOps Basics skill category.

1,803 241

Explore

jeremylongshore/claude-code-plugins-plus-skills

readme-generator

Readme Generator - Auto-activating skill for DevOps Basics. Triggers on: readme generator, readme generator Part of the DevOps Basics skill category.

1,803 241

Explore

jeremylongshore/claude-code-plugins-plus-skills

makefile-generator

Makefile Generator - Auto-activating skill for DevOps Basics. Triggers on: makefile generator, makefile generator Part of the DevOps Basics skill category.

1,803 241

Explore

jeremylongshore/claude-code-plugins-plus-skills

gitignore-generator

Gitignore Generator - Auto-activating skill for DevOps Basics. Triggers on: gitignore generator, gitignore generator Part of the DevOps Basics skill category.

1,803 241

Explore

jeremylongshore/claude-code-plugins-plus-skills

pre-commit-hook-setup

Pre Commit Hook Setup - Auto-activating skill for DevOps Basics. Triggers on: pre commit hook setup, pre commit hook setup Part of the DevOps Basics skill category.

1,803 241

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Anthropic Performance Tuning

Overview

Latency Benchmarks (approximate)

Optimization Strategies

Instructions

Step 1: Always Stream

Step 2: Prompt Caching — Faster TTFT

Step 3: Use Haiku for Speed-Critical Paths

Step 4: Reuse Client Instance

Step 5: Parallel Requests

Step 6: Minimize Output Tokens

Output

Error Handling

Examples

Resources

Next Steps

Prerequisites

Recommended Agent Skills

dockerfile-generator

branch-naming-helper

readme-generator

makefile-generator

gitignore-generator

pre-commit-hook-setup