Agent skill

agent-performance

Track and report agent invocation metrics including usage counts, success/failure rates, and completion times. Use for understanding which agents are utilized, identifying underused agents, and optimizing agent delegation patterns.

Stars 163
Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/agent-performance

SKILL.md

Agent Performance Dashboard

Purpose

Provides visibility into agent usage patterns to optimize delegation and identify improvement opportunities.

When I Activate

I automatically load when you mention:

  • "agent performance" or "agent metrics"
  • "agent dashboard" or "agent usage"
  • "which agents are used" or "underutilized agents"
  • "agent success rate" or "agent statistics"

What I Do

  1. Track Invocations: Record agent usage via workflow tracker
  2. Measure Success: Track completion rates per agent
  3. Analyze Patterns: Identify usage trends and gaps
  4. Generate Reports: Create actionable dashboards

Quick Start

User: "Show me agent performance metrics"
Skill: *activates automatically*
       "Generating agent performance report..."

Core Capabilities

1. Report Generation

Generate a performance report by reading workflow logs and aggregating agent metrics:

User: "Generate agent performance report"

Report includes:

  • Invocation counts per agent
  • Success/failure rates
  • Average completion times (when tracked)
  • Underutilized agents list
  • Recommendations for optimization

2. Live Tracking

Track agent invocations during workflow execution using the existing workflow_tracker:

python
# Already available in .claude/tools/amplihack/hooks/workflow_tracker.py
from workflow_tracker import log_agent_invocation

log_agent_invocation(
    agent_name="architect",
    purpose="Design authentication module",
    step_number=2
)

3. Metrics Storage

Metrics are stored in:

  • Raw logs: .claude/runtime/logs/workflow_adherence/workflow_execution.jsonl
  • Aggregated: .claude/runtime/metrics/agent_performance.yaml

Report Format

Summary Dashboard

yaml
# Agent Performance Summary
# Generated: 2025-11-25

total_invocations: 142

agents:
  architect:
    invocations: 45
    success_rate: 95.6%
    avg_duration_ms: 2340
    trend: increasing

  builder:
    invocations: 38
    success_rate: 89.5%
    avg_duration_ms: 4520
    trend: stable

  reviewer:
    invocations: 25
    success_rate: 100%
    avg_duration_ms: 1890
    trend: increasing

underutilized:
  - database (0 invocations in last 30 days)
  - integration (2 invocations in last 30 days)
  - patterns (3 invocations in last 30 days)

recommendations:
  - Consider using database agent for schema work
  - Integration agent available for external service connections
  - Patterns agent can identify reusable solutions

Implementation Guide

To Generate a Report

  1. Read workflow execution logs:

    Read: .claude/runtime/logs/workflow_adherence/workflow_execution.jsonl
    
  2. Filter for agent_invoked events:

    json
    { "event": "agent_invoked", "agent": "architect", "purpose": "...", "step": 2 }
    
  3. Aggregate by agent name:

    • Count invocations
    • Calculate success rates from workflow_end events
    • Compute average durations
  4. Identify underutilized agents:

    • List all available agents from .claude/agents/amplihack/
    • Compare against invocation counts
    • Flag agents with <5 invocations in analysis period
  5. Write report to:

    .claude/runtime/metrics/agent_performance.yaml
    

Available Agents Inventory

Core Agents (6):

  • architect, builder, reviewer, tester, optimizer, api-designer

Specialized Agents (25):

  • ambiguity, amplifier-cli-architect, analyzer, azure-kubernetes-expert
  • ci-diagnostic-workflow, cleanup, database, documentation-writer
  • fallback-cascade, fix-agent, integration, knowledge-archaeologist
  • memory-manager, multi-agent-debate, n-version-validator, patterns
  • philosophy-guardian, pre-commit-diagnostic, preference-reviewer
  • prompt-writer, rust-programming-expert, security, visualization-architect
  • worktree-manager, xpia-defense

Note: Agent count may change as specialized agents are added/removed. Use ls .claude/agents/amplihack/specialized/ for current count.

Tracking Best Practices

When Invoking Agents

Always log invocations for accurate tracking:

python
# Before invoking an agent via Task tool
log_agent_invocation(
    agent_name="security",
    purpose="Audit authentication implementation",
    step_number=7  # Optional: link to workflow step
)

# Then invoke the agent
Task(subagent_type="security", prompt="...")

Workflow Integration

The DEFAULT_WORKFLOW.md specifies agent delegation at each step. This skill helps verify adherence:

  • Step 1: prompt-writer
  • Step 2: architect
  • Step 3: builder
  • Step 4: tester
  • Step 5: reviewer
  • etc.

Configuration

Setting Default Description
ANALYSIS_DAYS 30 Days of history to analyze
UNDERUTILIZED_THRESHOLD 5 Invocations below this = underutilized
METRICS_FILE agent_performance.yaml Output file name

Philosophy Alignment

This skill follows:

  • Ruthless Simplicity: Uses existing infrastructure (workflow_tracker)
  • Zero-BS: No placeholders, working aggregation logic
  • Modular Design: Self-contained skill, clear boundaries
  • Emergence: Insights emerge from simple tracking patterns

Interpreting Metrics

Success Rate Guidelines

Rate Assessment Action
95-100% Excellent Maintain current patterns
85-94% Good Review occasional failures for patterns
70-84% Needs Attention Investigate failure causes, adjust prompts
Below 70% Critical Agent may need redesign or prompt overhaul

Invocation Volume Interpretation

  • High volume (30+ in 30 days): Core workflow agent, ensure reliability
  • Medium volume (10-29): Regular use, monitor for optimization opportunities
  • Low volume (5-9): Specialized use case, verify still needed
  • Very low (<5): Consider if agent is discoverable or relevant

Duration Benchmarks

  • < 2 seconds: Fast execution, typical for simple analysis
  • 2-10 seconds: Normal for moderate complexity
  • 10-60 seconds: Expected for deep analysis or multi-step tasks
  • > 60 seconds: May indicate inefficiency, consider optimization

Empty State Handling

When no log data exists (new project or logs cleared):

yaml
# Agent Performance Report
# Period: Last 30 days
# Status: No data available

summary:
  total_invocations: 0
  message: "No agent invocations logged yet"

getting_started:
  - "Agent tracking begins when workflow_tracker logs invocations"
  - "Ensure agents are invoked via Task tool with proper logging"
  - "First report available after initial workflow execution"

next_steps:
  - "Run a workflow task to generate initial data"
  - "Verify workflow_tracker is properly configured"
  - "Check .claude/runtime/logs/ directory exists"

Limitations

This skill has the following constraints:

  1. Depends on workflow_tracker: Only tracks agents invoked through the logging system
  2. No real-time metrics: Reports are generated on-demand, not streamed
  3. Historical data only: Cannot predict future usage patterns
  4. Manual log analysis: Does not auto-detect anomalies or alert on issues
  5. Single-project scope: Metrics are per-project, no cross-project aggregation
  6. Time-based only: No correlation with code quality or PR outcomes

Didn't find tool you were looking for?

Be as detailed as possible for better results