Agent skill
metrics-report
Scan an entire codebase, discover and run all test types, compute hybrid coverage, evaluate quality, and generate a full metrics report website with trends and charts.
Install this agent skill to your Project
npx add-skill https://github.com/diegopacheco/ai-playground/tree/main/pocs/metrics-report-skill
SKILL.md
Metrics Report Skill
You are a metrics analysis agent. When invoked, scan the codebase, discover tests, run them, compute coverage, evaluate quality, and produce a JSON report for the metrics React application.
IMPORTANT: Optimize for speed. Use Agent tool to parallelize independent work. Use Grep/Glob instead of reading files when possible. Batch shell commands.
Progress Reporting (MANDATORY)
You MUST print progress to the user at every phase transition and every test type execution. Use this exact format:
- Phase transitions:
[Phase X/7] <phase name>... - Agent 3 test steps:
[Test X/Y] Running <type> tests (<backend|frontend>)...then[Test X/Y] <type> tests: <N> passed, <M> failed - Phase completion:
[Phase X/7] Done. <one-line summary of result>
This is NON-NEGOTIABLE. The user must always know what is happening and where we are in the process. Never go silent for more than one tool call without printing status.
Phase 0: Install Metrics Report Site
Print: [Phase 0/7] Installing metrics report site...
Check if metrics-report/metrics-application/ exists. If NOT, copy it from the skill installation directory and run npm install:
SKILL_DIR="$HOME/.claude/skills/metrics-report"
if [ ! -d "metrics-report/metrics-application" ]; then
mkdir -p metrics-report
cp -r "$SKILL_DIR/metrics-report/"* metrics-report/
cd metrics-report/metrics-application && npm install && cd ../..
fi
If metrics-report/metrics-application/ already exists but node_modules is missing, just run npm install:
if [ -d "metrics-report/metrics-application" ] && [ ! -d "metrics-report/metrics-application/node_modules" ]; then
cd metrics-report/metrics-application && npm install && cd ../..
fi
Print: [Phase 0/7] Done. Metrics report site ready.
Phase 1: Setup and Detection (do all at once)
Print: [Phase 1/7] Setup and detection...
Run these in a SINGLE Bash call chained with &&:
git remote get-url origin 2>/dev/null || echo ""; git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "main"; git rev-parse --short HEAD 2>/dev/null || echo "unknown"
Check if metrics-report/metrics-config.json exists. If not, create it:
{
"port": 3737,
"testTypes": {
"unit": true,
"integration": true,
"contract": true,
"e2e": true,
"css": false,
"stress": false,
"chaos": false,
"mutation": false,
"observability": true,
"fuzzy": false,
"propertybased": false
},
"githubRemote": "origin"
}
Detect stacks using Glob (one call, check results):
pom.xmlorbuild.gradleorbuild.gradle.kts= Java/Spring Bootpackage.jsonwith react = React/Nodemanage.pyor django inpyproject.toml= Django/PythonCargo.tomlwith actix/axum/tokio = Rust
Print: [Phase 1/7] Done. Detected stacks: <list stacks found>
Phase 2: Scan and Run Tests
Print: [Phase 2/7] Scanning codebase and running tests...
Launch 2 background agents for read-only scanning, then run tests DIRECTLY with Bash (no agent).
Background Agent 1: File Count and Test Discovery
Use Glob to count all source files (skip node_modules, target, dist, build, .git, pycache, venv, .venv, metrics-report). Split by frontend/backend using file extension and path.
Then use Grep to find all test files in ONE pass per pattern:
Grep pattern="@Test|@ParameterizedTest" for Java
Grep pattern="def test_|class Test" for Python
Grep pattern="#\[test\]|#\[cfg\(test\)\]" for Rust
Grep pattern="describe\(|it\(|test\(" for JS/TS
Classify each test file by type using these Grep checks (batch them):
- Integration: Grep for
@SpringBootTest|@Testcontainers|@DataJpaTest|IntegrationTest|supertestacross test files - E2E: Grep for
@playwright/test|playwrightacross test files - Contract: Grep for
Pact|Contract|pactacross test files - Stress: Grep for
k6/http|k6/wsor files matching*.k6.js - Chaos: Grep for
chaos-monkey|litmus|toxiproxyacross test files - Observability: Grep for
MeterRegistry|prometheus|opentelemetry|actuator/health|tracingin test files - CSS: Grep for
jest-image-snapshot|percy|chromatic|toMatchImageSnapshotin test files - Fuzzy: Grep for
fuzz|Fuzz|FuzzTest|@FuzzTest|cargo-fuzz|go-fuzz|jazzer|atherisin test files - Property-Based: Grep for
@Property|jqwik|QuickCheck|hypothesis|proptest|fast-check|fc\.propertyin test files - Everything else in test dirs = unit
For each test file, extract test method names and line numbers using Grep (not Read):
- Java:
Grep pattern="@Test" -A 1to get method names - Python:
Grep pattern="def test_" - Rust:
Grep pattern="fn test_|#\[test\]" -A 1 - JS/TS:
Grep pattern="(it|test)\('"
Return: file counts, test file list with types and test names.
Background Agent 2: Git Attribution (batch)
Run a SINGLE bash command to get all test file authors at once:
for f in $(find . -path '*/test*' -name '*.java' -o -name '*.py' -o -name '*.rs' -o -name '*.ts' -o -name '*.tsx' -o -name '*.js' | grep -v node_modules | grep -v target | grep -v dist); do echo "$f:$(git log --format='%an' --diff-filter=A -- "$f" 2>/dev/null | head -1)"; done
Return: map of file path to author.
Run Tests Directly (NO agent — use Bash directly)
CRITICAL: Do NOT delegate test execution to an agent. Run tests DIRECTLY using Bash tool calls from the main conversation. Agent overhead adds massive latency for fast-running tests.
First use Grep to quickly check which test types actually exist:
- Grep for
@SpringBootTest|@Testcontainersin backend test files for integration - Grep for
Pact|Contractfor contract tests - Grep for
MeterRegistry|prometheus|actuatorfor observability tests - Grep for
@playwright/testfor e2e tests
Only run test types that have actual test files. Skip types with zero tests immediately.
Run each test type sequentially with Bash. Print progress before and after each:
[Test X/Y] Running <type> tests (<backend|frontend>)...
[Test X/Y] <type> tests done: <N> passed, <M> failed (<duration>s)
Commands per stack:
Java (Maven): cd backend && ./mvnw test jacoco:report 2>&1 | tail -20
Java (Gradle): cd backend && ./gradlew test jacocoTestReport -q 2>&1 | tail -20
Scala (Maven): cd backend && ./mvnw test jacoco:report 2>&1 | tail -20
Scala (sbt): cd backend && sbt jacoco 2>&1 | tail -20
Python: cd backend && python -m pytest --tb=short -v --cov=. --cov-report=json 2>&1 | tail -30
Rust: cd backend && cargo test 2>&1 | tail -20 then cargo tarpaulin --out json 2>&1 if available
Node: cd frontend && npx react-scripts test --watchAll=false --coverage 2>&1 | tail -40
E2E: cd frontend && npx playwright test 2>&1 | tail -20
Parse test results from output after each run. Parse coverage from report files:
- Java/Scala (Maven):
backend/target/site/jacoco/jacoco.csv - Scala (sbt):
backend/target/scala-*/jacoco/report/jacoco.csv - Python:
backend/coverage.json - Rust:
backend/tarpaulin-report.json - Node:
frontend/coverage/coverage-summary.json
When done, print: [Phase 2/7] Done. Found <N> test files, <M> source files, ran <K> test types.
Phase 3: LLM Coverage Mapping and Tool Coverage Parsing
Print: [Phase 3/7] Mapping test coverage to source files...
Step A: Parse tool coverage reports for per-file percentages
For backend files, parse the coverage report generated in Phase 2:
Java/Scala (JaCoCo CSV): Read backend/target/site/jacoco/jacoco.csv with Bash:
cat backend/target/site/jacoco/jacoco.csv 2>/dev/null | tail -n +2 | awk -F',' '{missed=$8; covered=$9; total=missed+covered; pct=(total>0)? int(covered*100/total) : 0; print $3","pct}'
This gives CLASS_NAME,COVERAGE_PCT. Map each class name to its source file path.
Scala (sbt JaCoCo): Same CSV format at backend/target/scala-*/jacoco/report/jacoco.csv.
Python (coverage.json): Read with Bash:
python3 -c "import json,sys; d=json.load(open('backend/coverage.json')); [print(f'{k},{int(v[\"summary\"][\"percent_covered\"])}') for k,v in d.get('files',{}).items()]" 2>/dev/null
Rust (tarpaulin): Read with Bash:
cat backend/tarpaulin-report.json 2>/dev/null | python3 -c "import json,sys; d=json.load(sys.stdin); [print(f'{f[\"path\"]},{int(f[\"coverable\"] and f[\"covered\"]*100/f[\"coverable\"] or 0)}') for f in d.get('files',[])]"
Node/React (coverage-summary.json): Read with Bash:
cat frontend/coverage/coverage-summary.json 2>/dev/null | python3 -c "import json,sys; d=json.load(sys.stdin); [print(f'{k},{int(v[\"lines\"][\"pct\"])}') for k,v in d.items() if k != 'total']"
If the coverage report file does not exist (tool not available), set tool: null and proceed to Step B for LLM mapping.
If the coverage report IS available, populate tool with the parsed percentage for each source file. Files not in the report get tool: 0.
Step B: LLM import mapping (for files missing tool coverage)
For each test file, use Grep to extract imports:
Grep pattern="^import " path="test-file.java"
Map imports to source files. For E2E tests, Grep for route URLs (page.goto, http.get) and map to controllers. Keep it to direct imports only — no full call chains.
Set llm: true on every source file that is directly imported by at least one test file.
Building the coverage array
For EVERY backend source file and EVERY frontend source file, create a coverage entry. Each entry must include ALL test types that have tests (not just unit). For test types with no tests, omit them from the coverage object.
If a file has tool coverage percentage: {"tool": <pct>, "llm": <true if also import-mapped>}
If a file has only LLM mapping: {"tool": null, "llm": true}
If a file has neither: {"tool": 0, "llm": false} when tool reports exist, or {"tool": null, "llm": false} when no tool report exists.
Print: [Phase 3/7] Done. Mapped <N> test files to <M> source files.
Phase 4: Quality Evaluation (fast)
Print: [Phase 4/7] Evaluating test quality...
For each test type that has tests, read ONLY 3 representative test files (not 10). Evaluate quickly:
- Assertion quality (meaningful vs trivial)
- Edge case coverage
- Test naming quality
Rate: poor/fair/good/excellent with one sentence justification.
Print: [Phase 4/7] Done. Quality ratings: <list each type and its rating>
Phase 5: Compute Score and Generate JSON
Print: [Phase 5/7] Computing score and generating report JSON...
Score (0-10):
| Criteria | Max | Calculation |
|---|---|---|
| Coverage breadth | 3 | (covered files / total source files) * 3 |
| Type diversity | 2 | 11 types=2.0, 9-10=1.7, 7-8=1.5, 5-6=1.2, 3-4=1.0, 1-2=0.5 |
| Pass rate | 2 | (passing / total) * 2 |
| Test quality | 2 | Average ratings: poor=0, fair=0.7, good=1.5, excellent=2.0 |
| Code-to-test ratio | 1 | test LOC / source LOC: >0.8=1.0, >0.5=0.7, >0.3=0.4, <0.3=0.1 |
Generate metrics-report/data/metrics-latest.json with the full schema:
{
"timestamp": "ISO-8601",
"repository": { "name": "", "url": "", "branch": "", "commit": "" },
"stacks": [],
"files": { "total": 0, "backend": 0, "frontend": 0, "testFiles": 0, "byExtension": {} },
"tests": {
"total": 0, "passing": 0, "failing": 0,
"byType": {
"unit": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
"integration": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
"contract": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
"e2e": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
"css": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
"stress": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
"chaos": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
"mutation": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
"observability": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
"fuzzy": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
"propertybased": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] }
}
},
"failures": { "byType": {}, "details": [] },
"coverage": { "backend": [], "frontend": [] },
"authors": {},
"quality": { "byType": {} },
"score": { "total": 0, "breakdown": { "coverageBreadth": 0, "typeDiversity": 0, "passRate": 0, "testQuality": 0, "codeToTestRatio": 0 } }
}
Test file shape: { "path": "", "type": "", "testCount": 0, "passing": 0, "failing": 0, "author": "", "githubUrl": "", "tests": [{ "name": "", "status": "pass|fail", "duration": 0, "error": "", "line": 0, "githubUrl": "" }] }
Coverage file shape: { "file": "", "layer": "backend|frontend", "githubUrl": "", "coverage": { "unit": { "tool": 85, "llm": true }, "integration": { "tool": null, "llm": false } } } — include an entry for every test type that has tests. tool is a 0-100 integer from the coverage report (or null if no tool report), llm is true if the file is import-mapped by a test.
GitHub URL pattern: https://github.com/{owner}/{repo}/blob/{branch}/{filepath}#L{line}
Print: [Phase 5/7] Done. Score: <total>/10 (coverage: <X>, diversity: <X>, pass rate: <X>, quality: <X>, ratio: <X>)
Phase 6: History and Finalize
Print: [Phase 6/7] Saving history and finalizing...
Run in ONE Bash call:
mkdir -p metrics-report/data/history
cp metrics-report/data/metrics-latest.json "metrics-report/data/history/metrics-$(date -u +%Y-%m-%dT%H-%M-%S).json"
ls metrics-report/data/history/*.json | xargs -I{} basename {} | sort -r | head -50
Write the history-index.json with the file list from above.
If metrics-report/metrics-application/ exists, copy data:
mkdir -p metrics-report/metrics-application/public/data/history
cp metrics-report/data/metrics-latest.json metrics-report/metrics-application/public/data/
cp metrics-report/data/history-index.json metrics-report/metrics-application/public/data/
cp metrics-report/data/history/*.json metrics-report/metrics-application/public/data/history/
Print: [Phase 6/7] Done. History saved.
Print final summary:
=== METRICS REPORT COMPLETE ===
Score: <total>/10
Tests: <passing> passed / <failing> failed / <total> total
Coverage: <backend>% backend, <frontend>% frontend
Stacks: <list>
Then run the website:
cd metrics-report && bash run.sh
Print: Metrics report running at http://localhost:3737
Rules
- Do NOT fabricate results. Use real test output and real file analysis.
- Do NOT invent files that do not exist.
- If a test type has zero tests, report zeros.
- Preserve previous history snapshots.
- JSON must be valid.
- ALWAYS print progress. Never go more than 1 tool call without telling the user what is happening. Use [Phase X/6] and [Test X/Y] format.
- MAXIMIZE parallelism for codebase scanning (Agents 1, 2). Use Agent tool ONLY for read-only scanning work.
- NEVER delegate test execution to agents. Run tests DIRECTLY with Bash tool calls. Agent overhead adds massive latency for fast tests.
- NEVER parallelize test execution. Run test types sequentially: backend first, then frontend.
- MINIMIZE file reads. Use Grep/Glob over Read when possible.
- BATCH shell commands. One Bash call with chained commands over many calls.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
json-formatter
Validate, format, and minify JSON files when users request JSON validation, formatting, or ask to validate their JSONs
bruno-generator
Scans the entire codebase, detects all HTTP/API endpoints across Java/Spring Boot, Node/Express, Go/Gin, Rust/Actix+Axum, Python/Django, and generates a complete Bruno API client project with .bru files, sample requests, and environments.
infra-automation-generator
leak-detect
Scan code for leaked PII, secrets/credentials, and security vulnerabilities that would get you hacked in production.
skill-evaluator
This skill should be used when the user asks to "evaluate a skill", "review skill quality", "score my skill", "check skill best practices", "rate my skills", "evaluate all skills", "compare skills", or wants to assess skill quality across criteria like clarity, token efficiency, anti-cheating, quality gates, determinism, scope discipline, error recovery, observability, and idempotency.
threat-analyst
Maps all external-facing endpoints, inputs, and auth boundaries. Scans the whole codebase and produces threat-analysis.md with attack vectors, protection scores, diagrams, and remediation actions.
Didn't find tool you were looking for?