Agent skill
benchmark
Use this skill to measure performance baselines, detect regressions before/after PRs, and compare stack alternatives.
Install this agent skill to your Project
npx add-skill https://github.com/affaan-m/everything-claude-code/tree/main/skills/benchmark
SKILL.md
Benchmark — Performance Baseline & Regression Detection
When to Use
- Before and after a PR to measure performance impact
- Setting up performance baselines for a project
- When users report "it feels slow"
- Before a launch — ensure you meet performance targets
- Comparing your stack against alternatives
How It Works
Mode 1: Page Performance
Measures real browser metrics via browser MCP:
1. Navigate to each target URL
2. Measure Core Web Vitals:
- LCP (Largest Contentful Paint) — target < 2.5s
- CLS (Cumulative Layout Shift) — target < 0.1
- INP (Interaction to Next Paint) — target < 200ms
- FCP (First Contentful Paint) — target < 1.8s
- TTFB (Time to First Byte) — target < 800ms
3. Measure resource sizes:
- Total page weight (target < 1MB)
- JS bundle size (target < 200KB gzipped)
- CSS size
- Image weight
- Third-party script weight
4. Count network requests
5. Check for render-blocking resources
Mode 2: API Performance
Benchmarks API endpoints:
1. Hit each endpoint 100 times
2. Measure: p50, p95, p99 latency
3. Track: response size, status codes
4. Test under load: 10 concurrent requests
5. Compare against SLA targets
Mode 3: Build Performance
Measures development feedback loop:
1. Cold build time
2. Hot reload time (HMR)
3. Test suite duration
4. TypeScript check time
5. Lint time
6. Docker build time
Mode 4: Before/After Comparison
Run before and after a change to measure impact:
/benchmark baseline # saves current metrics
# ... make changes ...
/benchmark compare # compares against baseline
Output:
| Metric | Before | After | Delta | Verdict |
|--------|--------|-------|-------|---------|
| LCP | 1.2s | 1.4s | +200ms | WARNING: WARN |
| Bundle | 180KB | 175KB | -5KB | ✓ BETTER |
| Build | 12s | 14s | +2s | WARNING: WARN |
Output
Stores baselines in .ecc/benchmarks/ as JSON. Git-tracked so the team shares baselines.
Integration
- CI: run
/benchmark compareon every PR - Pair with
/canary-watchfor post-deploy monitoring - Pair with
/browser-qafor full pre-ship checklist
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
python-testing
Python testing best practices using pytest including fixtures, parametrization, mocking, coverage analysis, async testing, and test organization. Use when writing or improving Python tests.
golang-patterns
Go-specific design patterns and best practices including functional options, small interfaces, dependency injection, concurrency patterns, error handling, and package organization. Use when working with Go code to apply idiomatic Go patterns.
e2e-testing
Playwright E2E testing patterns, Page Object Model, configuration, CI/CD integration, artifact management, and flaky test strategies.
agentic-engineering
Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing. Use when AI agents perform most implementation work and humans enforce quality and risk controls.
api-design
REST API design patterns including resource naming, status codes, pagination, filtering, error responses, versioning, and rate limiting for production APIs.
python-patterns
Python-specific design patterns and best practices including protocols, dataclasses, context managers, decorators, async/await, type hints, and package organization. Use when working with Python code to apply Pythonic patterns.
Didn't find tool you were looking for?