
when-profiling-performance-use-performance-profiler

Comprehensive performance profiling, bottleneck detection, and optimization system


Install this agent skill in your project:

```bash
npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/dnyoussef/when-profiling-performance-use-performance-profiler
```


Performance Profiler Skill

Overview

When profiling performance, use performance-profiler to measure, analyze, and optimize application performance across CPU, memory, I/O, and network dimensions.

MECE Breakdown

Mutually Exclusive Components:

Baseline Phase: Establish current performance metrics
Detection Phase: Identify bottlenecks and hot paths
Analysis Phase: Root cause analysis and impact assessment
Optimization Phase: Generate and prioritize recommendations
Implementation Phase: Apply optimizations with agent assistance
Validation Phase: Benchmark improvements and verify gains

Collectively Exhaustive Coverage:

CPU Profiling: Function execution time, hot paths, call graphs
Memory Profiling: Heap usage, allocations, leaks, garbage collection
I/O Profiling: File system, database, network latency
Network Profiling: Request timing, bandwidth, connection pooling
Concurrency: Thread utilization, lock contention, async operations
Algorithm Analysis: Time complexity, space complexity
Cache Analysis: Hit rates, cache misses, invalidation patterns
Database: Query performance, N+1 problems, index usage

Features

Core Capabilities:

Multi-dimensional performance profiling (CPU, memory, I/O, network)
Automated bottleneck detection with prioritization
Real-time profiling and historical analysis
Flame graph generation for visual analysis
Memory leak detection and heap snapshots
Database query optimization
Algorithmic complexity analysis
A/B comparison of before/after optimizations
Production-safe profiling with minimal overhead
Integration with APM tools (New Relic, Datadog, etc.)

Profiling Modes:

Quick Scan: 30-second lightweight profiling
Standard: 5-minute comprehensive analysis
Deep: 30-minute detailed investigation
Continuous: Long-running production monitoring
Stress Test: Load-based profiling under high traffic

Usage

Slash Command:

```bash
/profile [path] [--mode quick|standard|deep] [--target cpu|memory|io|network|all]
```

Subagent Invocation:

```javascript
Task("Performance Profiler", "Profile ./app with deep CPU and memory analysis", "performance-analyzer")
```

MCP Tool:

```javascript
mcp__performance-profiler__analyze({
  project_path: "./app",
  profiling_mode: "standard",
  targets: ["cpu", "memory", "io"],
  generate_optimizations: true
})
```

Architecture

Phase 1: Baseline Measurement

Establish current performance metrics
Define performance budgets
Set up monitoring infrastructure
Capture baseline snapshots
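A baseline snapshot can be as simple as reducing raw per-request latencies to the summary fields the performance report uses. The sketch below assumes you already have latencies in milliseconds; `summarize_latencies` is a hypothetical helper, not part of the skill, and how the skill itself computes these fields is not documented.

```python
import statistics


def summarize_latencies(latencies_ms: list[float]) -> dict[str, float]:
    """Summarize per-request latencies into baseline-style fields (hypothetical helper)."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples to estimate percentiles")
    # quantiles(n=100) returns the 99 cut points p1..p99; "inclusive" interpolates
    # within the observed range instead of extrapolating beyond it.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "avg_response_time_ms": round(statistics.fmean(latencies_ms), 1),
        "p95_response_time_ms": round(cuts[94], 1),
        "p99_response_time_ms": round(cuts[98], 1),
    }


if __name__ == "__main__":
    # Synthetic samples: mostly fast requests with a slow tail.
    samples = [20.0] * 90 + [200.0] * 9 + [1000.0]
    print(summarize_latencies(samples))
    # {'avg_response_time_ms': 46.0, 'p95_response_time_ms': 200.0, 'p99_response_time_ms': 208.0}
```

The 46 ms average looks healthy while p95 and p99 expose the slow tail, which is why the report records all three.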

Phase 2: Bottleneck Detection

CPU profiling (sampling or instrumentation)
Memory profiling (heap analysis)
I/O profiling (syscall tracing)
Network profiling (packet analysis)
Database profiling (query logs)

Phase 3: Root Cause Analysis

Correlate metrics across dimensions
Identify causal relationships
Calculate performance impact
Prioritize issues by severity
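One standard way to calculate the performance impact of fixing a single bottleneck is Amdahl's law: if a component accounts for fraction p of total runtime and becomes s times faster, the overall speedup is 1 / ((1 - p) + p / s). The sketch applies it to the sample report's `processData` entry, assuming its `time_percent` of 34.5 is a share of total runtime; the function name is illustrative.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of total runtime becomes `factor` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


if __name__ == "__main__":
    share = 0.345  # processData's time_percent in the sample report, as a fraction
    for factor in (2, 10, 100):
        print(f"processData {factor}x faster -> {amdahl_speedup(share, factor):.2f}x overall")
    # processData 2x faster -> 1.21x overall
    # processData 10x faster -> 1.45x overall
    # processData 100x faster -> 1.52x overall
```

The untouched 65.5% of runtime caps the gain from this one function at about 1 / 0.655 ≈ 1.53x, so ranking issues by their share of total time, not by how slow each call looks, gives the better priority order.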

Phase 4: Optimization Generation

Algorithmic improvements
Caching strategies
Parallelization opportunities
Database query optimization
Memory optimization
Network optimization

Phase 5: Implementation

Generate optimized code with coder agent
Apply database optimizations
Configure caching layers
Implement parallelization

Phase 6: Validation

Run benchmark suite
Compare before/after metrics
Verify no regressions
Generate performance report
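The validation phase boils down to two checks: the optimized code must return the same result, and it must be measurably faster across repeated runs. A minimal sketch using the standard library's `timeit`; `compare` and the list-versus-set workload are illustrative assumptions, not the skill's benchmark suite.

```python
import statistics
import timeit
from typing import Callable


def compare(before: Callable[[], object], after: Callable[[], object],
            repeat: int = 5, number: int = 20) -> dict[str, float]:
    """Time two zero-argument callables and report the median speedup.

    The results must match: a changed result is a regression, not a win.
    """
    if before() != after():
        raise ValueError("optimized version changed the result")
    # timeit.repeat returns one total per run. The timeit docs suggest min() as a
    # lower bound; the median is used here to reflect a typical run.
    t_before = statistics.median(timeit.repeat(before, repeat=repeat, number=number))
    t_after = statistics.median(timeit.repeat(after, repeat=repeat, number=number))
    return {"before_s": t_before, "after_s": t_after, "speedup": t_before / t_after}


if __name__ == "__main__":
    data = list(range(1_000))
    targets = list(range(0, 2_000, 7))
    lookup = set(data)

    def slow() -> int:  # list membership: O(len(data)) per lookup
        return sum(1 for t in targets if t in data)

    def fast() -> int:  # set membership: O(1) on average
        return sum(1 for t in targets if t in lookup)

    print(compare(slow, fast))
```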

Output Formats

Performance Report:

```json
{
  "project": "my-app",
  "profiling_mode": "standard",
  "duration_seconds": 300,
  "baseline": {
    "requests_per_second": 1247,
    "avg_response_time_ms": 123,
    "p95_response_time_ms": 456,
    "p99_response_time_ms": 789,
    "cpu_usage_percent": 67,
    "memory_usage_mb": 512,
    "error_rate_percent": 0.1
  },
  "bottlenecks": [
    {
      "type": "cpu",
      "severity": "high",
      "function": "processData",
      "time_percent": 34.5,
      "calls": 123456,
      "avg_time_ms": 2.3,
      "recommendation": "Optimize algorithm complexity from O(n²) to O(n log n)"
    }
  ],
  "optimizations": [...],
  "estimated_improvement": {
    "throughput_increase": "3.2x",
    "latency_reduction": "68%",
    "memory_reduction": "45%"
  }
}
```

Flame Graph:

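Interactive SVG flame graph of sampled call stacks, where each frame's width is proportional to the share of samples in which it (or a function it called) was on the stack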
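Flame graph renderers such as Brendan Gregg's flamegraph.pl consume "folded" stacks: one line per unique call stack, frames joined root-first with semicolons, then a space and the sample count. The sketch below folds a list of sampled stacks into that format; the sample data is made up, and the skill's own SVG pipeline is not documented.

```python
from collections import Counter
from typing import Iterable, Sequence


def fold_stacks(samples: Iterable[Sequence[str]]) -> list[str]:
    """Collapse sampled stacks (root frame first) into folded-stack lines."""
    counts = Counter(";".join(stack) for stack in samples)
    # Sorting keeps the output deterministic and groups stacks with shared prefixes.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


if __name__ == "__main__":
    samples = [
        ("main", "handle_request", "processData", "sort"),
        ("main", "handle_request", "processData", "sort"),
        ("main", "handle_request", "render"),
        ("main", "handle_request", "processData", "sort"),
    ]
    print("\n".join(fold_stacks(samples)))
    # main;handle_request;processData;sort 3
    # main;handle_request;render 1
```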

Heap Snapshot:

Memory allocation breakdown with retention paths

Optimization Report:

Prioritized list of actionable improvements with code examples

Examples

Example 1: Quick CPU Profiling

```bash
/profile ./my-app --mode quick --target cpu
```

Example 2: Deep Memory Analysis

```bash
/profile ./my-app --mode deep --target memory --detect-leaks
```

Example 3: Full Stack Optimization

```bash
/profile ./my-app --mode standard --target all --optimize --benchmark
```

Example 4: Database Query Optimization

```bash
/profile ./my-app --mode standard --target io --database --explain-queries
```

Integration with Claude-Flow

Coordination Pattern:

```javascript
// Step 1: Initialize the profiling swarm
mcp__claude-flow__swarm_init({ topology: "star", maxAgents: 5 })

// Step 2: Spawn the specialized agents in parallel (issue these calls together)
Task("CPU Profiler", "Profile CPU usage and identify hot paths in ./app", "performance-analyzer")
Task("Memory Profiler", "Analyze heap usage and detect memory leaks", "performance-analyzer")
Task("I/O Profiler", "Profile file system and database operations", "performance-analyzer")
Task("Network Profiler", "Analyze network requests and identify slow endpoints", "performance-analyzer")
Task("Optimizer", "Generate optimization recommendations based on profiling data", "optimizer")

// Step 3: The implementation agent applies the optimizations, then the benchmarker
// validates them (run sequentially, one after the other)
Task("Coder", "Implement recommended optimizations from profiling analysis", "coder")
Task("Benchmarker", "Run benchmark suite and validate improvements", "performance-benchmarker")
```

Configuration

Default Settings:

```json
{
  "profiling": {
    "sampling_rate_hz": 99,
    "stack_depth": 128,
    "include_native_code": false,
    "track_allocations": true
  },
  "thresholds": {
    "cpu_hot_path_percent": 10,
    "memory_leak_growth_mb": 10,
    "slow_query_ms": 100,
    "slow_request_ms": 1000
  },
  "optimization": {
    "auto_apply": false,
    "require_approval": true,
    "run_tests_before": true,
    "run_benchmarks_after": true
  },
  "output": {
    "flame_graph": true,
    "heap_snapshot": true,
    "call_tree": true,
    "recommendations": true
  }
}
```

Profiling Techniques

CPU Profiling:

Sampling: Periodic stack sampling (low overhead)
Instrumentation: Function entry/exit hooks (accurate but higher overhead)
Tracing: Record timestamped events (function calls, I/O, GC pauses) for timeline analysis
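In Python, the standard library's `cProfile` is an instrumentation (deterministic) profiler: it hooks every function call and return, which is accurate but adds overhead. The sketch profiles a toy workload and flags functions whose self time (time excluding callees) is at least 10% of the total, borrowing the `cpu_hot_path_percent` default from the settings above as an assumed cutoff. Using self time rather than cumulative time is a choice made here, and `hot_functions` and the workload are illustrative.

```python
import cProfile
import pstats


def slow_square_sum(n: int) -> int:
    # Deliberately wasteful: builds a full list only to sum it.
    return sum([i * i for i in range(n)])


def cheap(n: int) -> int:
    return n + 1


def workload() -> None:
    for _ in range(50):
        slow_square_sum(20_000)
        cheap(1)


def hot_functions(profiler: cProfile.Profile,
                  threshold_percent: float = 10.0) -> list[tuple[str, float]]:
    """Return (function name, self-time %) pairs at or above the threshold, hottest first."""
    # Stats.stats maps (file, line, name) -> (prim_calls, calls, self_time, cum_time, callers).
    stats = pstats.Stats(profiler).stats
    total = sum(entry[2] for entry in stats.values()) or 1e-12
    hot = []
    for (_file, _line, name), entry in stats.items():
        share = 100.0 * entry[2] / total
        if share >= threshold_percent:
            hot.append((name, share))
    return sorted(hot, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()
    for name, pct in hot_functions(profiler):
        print(f"{name}: {pct:.1f}% self time")
```

Which names appear can vary by Python version (older versions report list comprehensions as a separate `<listcomp>` frame). For production use, a sampling profiler avoids most of this overhead.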

Memory Profiling:

Heap Snapshots: Point-in-time memory state
Allocation Tracking: Record all allocations
Leak Detection: Compare snapshots over time
GC Analysis: Garbage collection patterns
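Snapshot comparison for leak detection can be tried with Python's `tracemalloc`: take one snapshot, run the suspect workload, take another, and rank source lines by how much memory they added. The never-cleared `_cache` list below is a deliberately planted leak, and `handle_request` and `top_growth` are hypothetical names.

```python
import tracemalloc

_cache: list[bytes] = []  # module-level list that is never cleared: the planted leak


def handle_request() -> None:
    # Each call retains about 10 KB forever, simulating an unbounded cache.
    _cache.append(b"x" * 10_000)


def top_growth(iterations: int = 200, limit: int = 3) -> list[tracemalloc.StatisticDiff]:
    """Compare heap snapshots taken before and after a workload."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        for _ in range(iterations):
            handle_request()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Group allocations by source line; results are sorted by size growth, largest first.
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    for diff in top_growth():
        print(diff)
```

In a real service the same comparison is more convincing across several intervals: a line whose retained size keeps growing between snapshots is the leak candidate.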

I/O Profiling:

Syscall Tracing: Track system calls (strace, dtrace)
File System: Monitor read/write operations
Database: Query logging and EXPLAIN ANALYZE
Network: Packet capture and request timing
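EXPLAIN ANALYZE is PostgreSQL syntax; it executes the query and reports actual timings. To keep the sketch self-contained it uses SQLite's EXPLAIN QUERY PLAN instead, which shows the chosen plan without running the query, and demonstrates the typical finding: a full table scan that becomes an index lookup once a suitable index exists. Table, index and helper names are illustrative, and SQLite's exact plan wording differs between versions (for example "SCAN orders" versus "SCAN TABLE orders").

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the plan detail text


def demo() -> tuple[list[str], list[str]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 500, i * 1.5) for i in range(10_000)],
    )
    sql = "SELECT id, total FROM orders WHERE customer_id = ?"
    before = query_plan(conn, sql, (42,))  # expected: full table scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql, (42,))   # expected: search using the new index
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```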

Concurrency Profiling:

Thread Analysis: CPU utilization per thread
Lock Contention: Identify blocking operations
Async Operations: Promise/callback timing
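For async code, a useful timing signal is event-loop lag: how late a periodic timer fires. A synchronous call inside a coroutine stalls every other task, and the lag exposes it even when no single await looks slow. The sketch below uses Python's `asyncio` (Node.js offers a comparable built-in measurement in `perf_hooks.monitorEventLoopDelay()`); all function names are hypothetical.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def monitor_loop_lag(interval: float, samples: list[float], stop: asyncio.Event) -> None:
    """Record how late each periodic wake-up is; large values mean the loop was blocked."""
    loop = asyncio.get_running_loop()
    while not stop.is_set():
        expected = loop.time() + interval
        await asyncio.sleep(interval)
        samples.append(max(0.0, loop.time() - expected))


async def blocking_handler() -> None:
    time.sleep(0.2)  # synchronous call: no other coroutine can run for 200 ms


async def non_blocking_handler() -> None:
    await asyncio.sleep(0.2)  # yields to the event loop while waiting


async def max_lag(handler: Callable[[], Awaitable[None]]) -> float:
    samples: list[float] = []
    stop = asyncio.Event()
    monitor = asyncio.create_task(monitor_loop_lag(0.01, samples, stop))
    await asyncio.sleep(0.05)  # let the monitor record a few baseline readings
    await handler()
    stop.set()
    await monitor
    return max(samples)


if __name__ == "__main__":
    print(f"blocking:     {asyncio.run(max_lag(blocking_handler)) * 1000:.0f} ms max lag")
    print(f"non-blocking: {asyncio.run(max_lag(non_blocking_handler)) * 1000:.0f} ms max lag")
```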

Performance Optimization Strategies

Algorithmic:

Reduce time complexity (O(n²) → O(n log n))
Use appropriate data structures
Eliminate unnecessary work
Memoization and dynamic programming
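Two of these strategies in miniature, using illustrative functions: duplicate detection drops from O(n²) pairwise comparison to O(n log n) by sorting first so that equal values become neighbors, and `functools.lru_cache` memoizes a recursion that would otherwise recompute the same subproblems exponentially many times.

```python
from functools import lru_cache


def has_duplicates_quadratic(items: list[int]) -> bool:
    # Compares every pair: O(n²) comparisons.
    n = len(items)
    return any(items[i] == items[j] for i in range(n) for j in range(i + 1, n))


def has_duplicates_sorted(items: list[int]) -> bool:
    # Sorting costs O(n log n); afterwards any duplicates sit next to each other.
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


@lru_cache(maxsize=None)
def ways_to_climb(steps: int) -> int:
    """Ways to climb `steps` stairs taking 1 or 2 at a time.

    Without the cache this recursion is exponential; with it, each value is computed once.
    """
    if steps <= 1:
        return 1
    return ways_to_climb(steps - 1) + ways_to_climb(steps - 2)


if __name__ == "__main__":
    print(has_duplicates_sorted([5, 3, 9, 1, 3]))  # True
    print(ways_to_climb(50))  # 20365011074
```

When the items are hashable, comparing `len(set(items))` with `len(items)` is O(n) on average and usually simpler still; choosing that data structure is the second bullet above.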

Caching:

In-memory caching (Redis, Memcached)
CDN for static assets
HTTP caching headers
Query result caching
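Query result caching reduces to "return a stored answer until it goes stale", and its hit rate is the number Cache Analysis tracks. The sketch below is a minimal in-process cache with a time-to-live; `TTLCache` is a hypothetical class, and shared caches such as Redis or Memcached add cross-process sharing and eviction policies it lacks. The clock is injectable so expiry can be tested without sleeping.

```python
import time
from typing import Callable, Hashable, TypeVar

V = TypeVar("V")


class TTLCache:
    """Minimal result cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._entries: dict[Hashable, tuple[float, object]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable, compute: Callable[[], V]) -> V:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:  # present and not yet expired
            self.hits += 1
            return entry[1]  # type: ignore[return-value]
        self.misses += 1
        value = compute()
        self._entries[key] = (now + self.ttl, value)
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    fake_now = [0.0]
    cache = TTLCache(ttl=30.0, clock=lambda: fake_now[0])

    def expensive_query() -> str:  # stands in for a slow database query
        return "rows for customer 42"

    cache.get_or_compute(("customer", 42), expensive_query)  # miss
    cache.get_or_compute(("customer", 42), expensive_query)  # hit
    fake_now[0] = 31.0
    cache.get_or_compute(("customer", 42), expensive_query)  # expired, so a miss
    print(f"hit rate: {cache.hit_rate():.2f}")  # hit rate: 0.33
```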

Parallelization:

Multi-threading
Worker pools
Async I/O
Batching operations
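A worker pool pays off most for I/O-bound work, where tasks spend their time waiting. The sketch simulates a 50 ms remote call with `time.sleep` (a hypothetical stand-in for a network request) and compares a sequential loop with `concurrent.futures.ThreadPoolExecutor`. In standard CPython builds, threads do not speed up pure-Python CPU-bound code because of the GIL; `ProcessPoolExecutor` is the usual choice there.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item_id: int) -> int:
    """Hypothetical I/O-bound call: sleeps 50 ms to stand in for a network round trip."""
    time.sleep(0.05)
    return item_id * 2


def fetch_all_sequential(ids: list[int]) -> list[int]:
    return [fetch(i) for i in ids]


def fetch_all_pooled(ids: list[int], workers: int = 8) -> list[int]:
    # Threads overlap the waiting; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, ids))


if __name__ == "__main__":
    ids = list(range(16))
    for fn in (fetch_all_sequential, fetch_all_pooled):
        start = time.perf_counter()
        fn(ids)
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```

With 16 calls and 8 workers the pooled version needs roughly two 50 ms rounds instead of sixteen.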

Database:

Add missing indexes
Optimize queries
Reduce N+1 queries
Connection pooling
Read replicas
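The N+1 pattern is easiest to spot by counting queries: one query fetches N parent rows, then the code issues one more query per row. The sketch counts statements against an in-memory SQLite database and replaces the loop with a single JOIN plus GROUP BY. With a networked database each avoided query also saves a round trip. Table names and the `CountingConnection` wrapper are illustrative.

```python
import sqlite3


class CountingConnection:
    """Wraps a sqlite3 connection and counts executed statements."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0

    def execute(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        self.queries += 1
        return self.conn.execute(sql, params)


def setup() -> CountingConnection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        CREATE INDEX idx_books_author ON books (author_id);
        """
    )
    conn.executemany("INSERT INTO authors VALUES (?, ?)", [(i, f"author{i}") for i in range(1, 51)])
    conn.executemany(
        "INSERT INTO books (author_id, title) VALUES (?, ?)",
        [(a, f"book{a}-{n}") for a in range(1, 51) for n in range(3)],
    )
    return CountingConnection(conn)


def book_counts_n_plus_one(db: CountingConnection) -> dict[str, int]:
    # 1 query for the authors + 1 query per author = N + 1 queries.
    result = {}
    for author_id, name in db.execute("SELECT id, name FROM authors").fetchall():
        rows = db.execute("SELECT title FROM books WHERE author_id = ?", (author_id,)).fetchall()
        result[name] = len(rows)
    return result


def book_counts_joined(db: CountingConnection) -> dict[str, int]:
    # One JOIN + GROUP BY replaces all of the per-author queries.
    rows = db.execute(
        "SELECT a.name, COUNT(b.id) FROM authors a "
        "LEFT JOIN books b ON b.author_id = a.id GROUP BY a.id"
    ).fetchall()
    return {name: count for name, count in rows}


if __name__ == "__main__":
    db = setup()
    book_counts_n_plus_one(db)
    print("N+1 queries:", db.queries)  # 51
    db.queries = 0
    book_counts_joined(db)
    print("joined queries:", db.queries)  # 1
```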

Memory:

Object pooling
Reduce allocations
Stream processing
Compression
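Stream processing trades a materialized collection for a generator, so peak memory stays roughly flat instead of growing with input size. The sketch measures the peak with `tracemalloc` for both versions of the same computation; the record format and function names are made up for illustration.

```python
import tracemalloc
from typing import Callable


def total_length_eager(n: int) -> int:
    # Materializes every record before summing: memory grows with n.
    lines = [f"record-{i:08d}\n" for i in range(n)]
    return sum(len(line) for line in lines)


def total_length_streaming(n: int) -> int:
    # The generator produces one record at a time: memory stays roughly constant.
    lines = (f"record-{i:08d}\n" for i in range(n))
    return sum(len(line) for line in lines)


def peak_bytes(fn: Callable[[int], int], n: int) -> tuple[int, int]:
    """Run fn(n) under tracemalloc and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = fn(n)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    for fn in (total_length_eager, total_length_streaming):
        result, peak = peak_bytes(fn, 100_000)
        print(f"{fn.__name__}: result={result}, peak={peak / 1024:.0f} KiB")
```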

Network:

Connection keep-alive
HTTP/2 or HTTP/3
Compression
Request batching
Rate limiting
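Connection keep-alive can be observed locally. The sketch starts a throwaway HTTP/1.1 server from Python's standard library on a free localhost port and records the client source port of every request. Reusing one `http.client.HTTPConnection` keeps all requests on one TCP connection, while opening a new connection per request pays a fresh TCP handshake each time (plus a TLS handshake over HTTPS). The handler and helper names are illustrative.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_ports: list[int] = []  # client source port of each request, recorded server-side


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables persistent connections

    def do_GET(self) -> None:
        seen_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # lets the client reuse the socket
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:  # silence per-request logging
        pass


def count_connections(requests: int, reuse: bool) -> int:
    """Send GET requests and return how many distinct TCP connections served them."""
    seen_ports.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    try:
        conn = http.client.HTTPConnection(host, port)
        for _ in range(requests):
            if not reuse:
                conn.close()
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the connection can be reused
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(set(seen_ports))


if __name__ == "__main__":
    print("reused connection:", count_connections(10, reuse=True))
    print("new connection per request:", count_connections(10, reuse=False))
```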

Performance Budgets

Frontend:

Time to First Byte (TTFB): < 200ms
First Contentful Paint (FCP): < 1.8s
Largest Contentful Paint (LCP): < 2.5s
Time to Interactive (TTI): < 3.8s
Total Blocking Time (TBT): < 200ms
Cumulative Layout Shift (CLS): < 0.1

Backend:

API Response Time (p50): < 100ms
API Response Time (p95): < 500ms
API Response Time (p99): < 1000ms
Throughput: > 1000 req/s
Error Rate: < 0.1%
CPU Usage: < 70%
Memory Usage: < 80%

Database:

Query Time (p50): < 10ms
Query Time (p95): < 50ms
Query Time (p99): < 100ms
Connection Pool Utilization: < 80%
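Budgets are most useful when they are checked mechanically, for example as a CI gate after each benchmark run. The sketch encodes the backend budgets listed above and reports violations; the metric key names and `budget_violations` are assumptions for illustration, and the sample input reuses the figures from the report's "baseline" block.

```python
BACKEND_BUDGETS = {
    # metric: (comparison, limit), copied from the backend budgets
    "p50_ms": ("<", 100),
    "p95_ms": ("<", 500),
    "p99_ms": ("<", 1000),
    "throughput_rps": (">", 1000),
    "error_rate_percent": ("<", 0.1),
    "cpu_percent": ("<", 70),
    "memory_percent": ("<", 80),
}


def budget_violations(measured: dict[str, float],
                      budgets: dict[str, tuple[str, float]] = BACKEND_BUDGETS) -> list[str]:
    """Return readable violations; metrics missing from `measured` are skipped."""
    problems = []
    for metric, (op, limit) in budgets.items():
        if metric not in measured:
            continue
        value = measured[metric]
        ok = value < limit if op == "<" else value > limit
        if not ok:
            problems.append(f"{metric}={value} (budget {op} {limit})")
    return problems


if __name__ == "__main__":
    baseline = {"p95_ms": 456, "p99_ms": 789, "throughput_rps": 1247,
                "error_rate_percent": 0.1, "cpu_percent": 67}
    print(budget_violations(baseline))  # ['error_rate_percent=0.1 (budget < 0.1)']
```

Because the budgets are strict inequalities, a sample error rate of exactly 0.1% already fails its budget.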

Best Practices

Profile production workloads when possible
Use production-like data volumes
Profile under realistic load
Measure multiple times for consistency
Focus on p95/p99, not just averages
Optimize bottlenecks in order of impact
Always benchmark before and after
Monitor for regressions in CI/CD
Set up continuous profiling
Track performance over time

Troubleshooting

Issue: High CPU usage but no obvious hot path

Solution: Check for excessive small function calls, increase sampling rate, or use instrumentation

Issue: Memory grows continuously

Solution: Run heap snapshot comparison to identify leak sources

Issue: Slow database queries

Solution: Use EXPLAIN ANALYZE, check for missing indexes, analyze query plans

Issue: High latency but low CPU

Solution: Profile I/O operations, check for blocking synchronous calls
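A quick first check for this symptom is to compare wall-clock time with CPU time for the slow operation. High wall time with low CPU time means the thread is mostly waiting (on I/O, locks, or sleeps), so a CPU profiler will show little. The helper and the two workloads below are hypothetical stand-ins.

```python
import time
from typing import Callable


def cpu_vs_wall(fn: Callable[[], object]) -> tuple[float, float]:
    """Return (wall seconds, process CPU seconds) for one call of fn."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def io_bound() -> None:
    time.sleep(0.3)  # stands in for a blocking network or disk call


def cpu_bound() -> None:
    sum(i * i for i in range(2_000_000))


if __name__ == "__main__":
    for fn in (io_bound, cpu_bound):
        wall, cpu = cpu_vs_wall(fn)
        # A low cpu/wall ratio means the time went to waiting, not computing.
        print(f"{fn.__name__}: wall={wall:.2f}s cpu={cpu:.2f}s ratio={cpu / wall:.2f}")
```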
