Agent skill

eval-skills

Audit all skills in the current project for frontmatter completeness, effort level appropriateness, allowed-tools scoping, and content quality. Produces a scored report with effort-level recommendations for each skill. Use when onboarding to a new project, reviewing skill quality before shipping, or adding effort fields to an existing skill library.

Install this agent skill in your project

npx add-skill https://github.com/FlorianBruniaux/claude-code-ultimate-guide/tree/main/examples/skills/eval-skills

SKILL.md

Skill Evaluator

Discover all skills in the project, score them across 6 criteria, and infer the appropriate effort level based on content analysis.

When to Use

New project: run once to establish baseline quality
Before committing a skill to a team repo
After bulk-importing skills from another project
When adding effort fields for the first time (v2.1.80+)

What Gets Audited

All SKILL.md files and flat .md files found in:

.claude/skills/**
~/.claude/skills/** (if requested)
Any path passed as argument: /eval-skills ./my-skills-dir

Scoring Criteria (14 pts per skill)

| # | Criterion | Max | What is checked |
|---|-----------|-----|-----------------|
| 1 | name | 1 | Present, lowercase, hyphens only, matches directory name |
| 2 | description | 2 | Present + has "Use when" / "when to" / trigger phrasing |
| 3 | allowed-tools | 2 | Present + not overly broad (Bash without scoping when read-only) |
| 4 | effort | 3 | Present (1pt) + appropriate for content (2pt based on inference) |
| 5 | content structure | 4 | Has Purpose/When section (1), has examples/usage (1), has clear workflow (1), no placeholder text (1) |
| 6 | bonus | +2 | argument-hint present (1), version/author metadata (1) |

Note: tags is NOT an officially supported frontmatter field in Claude Code. It is ignored by the runtime. Do not include it or score it as a quality criterion.
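
As an illustration of how one row of the table could be checked mechanically, here is a minimal sketch of the `name` criterion. The function name `score_name` is hypothetical, and the sketch assumes that digits are acceptable alongside lowercase letters and hyphens, which the criteria do not state. It only covers the directory layout (`<skill>/SKILL.md`); flat `.md` files would need a different comparison.

```bash
# Hypothetical helper: score the `name` criterion (prints 1 or 0).
# Assumption: digits are allowed alongside lowercase letters and hyphens.
score_name() {
  local name="$1" skill_path="$2"
  local dir_name
  local re='^[[:lower:][:digit:]]+(-[[:lower:][:digit:]]+)*$'
  # For .../my-skill/SKILL.md the expected name is the parent directory.
  dir_name="$(basename "$(dirname "$skill_path")")"
  if [[ "$name" =~ $re && "$name" == "$dir_name" ]]; then
    echo 1
  else
    echo 0
  fi
}
```

For example, `score_name eval-skills .claude/skills/eval-skills/SKILL.md` awards the point, while a name containing uppercase letters or underscores does not.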

Thresholds:

✅ Good: ≥11/14 (≥80%)
⚠️ Needs work: 8–10/14 (60–79%)
❌ Fix: <8/14 (<60%)
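
The thresholds above can be expressed as a small lookup on the point total. This is only a sketch: the function name `status_for_score` is hypothetical, and it keys on the point boundaries (11 and 8) rather than on the percentages.

```bash
# Hypothetical helper: map a 0-14 point total to the report status label.
status_for_score() {
  local score="$1"
  if (( score >= 11 )); then
    echo "Good"
  elif (( score >= 8 )); then
    echo "Needs work"
  else
    echo "Fix"
  fi
}
```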

Effort Level Inference Engine

For each skill, analyze description + content and classify using these signals:

`low` — Mechanical execution, no design decisions

Signals:

Verbs: commit, push, sync, scaffold, generate (template-based), format, rename, bump, wrap, convert
No reasoning required: sequential steps, template instantiation, data fetching
allowed-tools: Bash only, or Read-only
No sub-agents spawned
Short workflow (<5 steps)

Examples: /commit, /release-notes, /scaffold, /sync, /format

`medium` — Analysis with bounded scope, categorization

Signals:

Verbs: review, triage, analyze, categorize, suggest, evaluate (single file or bounded scope)
Requires pattern recognition but not architectural reasoning
allowed-tools: Read + Grep + Bash combination
May spawn 1-2 sub-agents but with predefined scope
Produces structured output (tables, categorized lists)

Examples: /code-review (single PR), /issue-triage, /dependency-audit, /test-coverage

`high` — Design decisions, adversarial reasoning, cross-system analysis

Signals:

Verbs: architect, redesign, threat-model, audit (security), orchestrate (multi-agent), score, assess trade-offs
Requires reasoning about edge cases, attack vectors, or system-wide implications
allowed-tools: broad access (Read + Write + Bash + external tools)
Spawns multiple sub-agents or uses parallel execution
Produces analysis with explicit uncertainty or trade-off sections
Keywords in content: "security", "architecture", "adversarial", "pipeline", "threat", "design decision"

Examples: /security-audit, /architecture-review, /cyber-defense, /eval-agents
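
The three signal lists above could be approximated with a crude keyword pass. The sketch below is a deliberate simplification and not the skill's actual inference: it looks only at keywords and verbs, ignoring allowed-tools, sub-agent counts, and workflow length. The function name `infer_effort` and the precedence rule (any `high` keyword wins, then any `medium` verb, otherwise `low`) are assumptions.

```bash
# Hypothetical heuristic: read skill text on stdin, print low, medium, or high.
# Assumption: any "high" keyword wins, then any "medium" verb, else low.
infer_effort() {
  local text
  text="$(cat)"
  local high='security|architecture|adversarial|pipeline|threat|design decision'
  local medium='review|triage|analyze|categorize|suggest|evaluate'
  if grep -qiE "$high" <<<"$text"; then
    echo high
  elif grep -qiE "$medium" <<<"$text"; then
    echo medium
  else
    echo low
  fi
}
```

A real evaluation would weigh these matches against the other signals, since a single keyword such as "pipeline" can appear in an otherwise mechanical skill.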

Mismatch flag

If a skill already has `effort:` set but the inferred level differs, flag it:

⚠️ Effort mismatch: declared low, inferred high — skill spawns 4 sub-agents and performs security analysis
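
The comparison behind this flag can be sketched as follows. The function name `effort_mismatch` is hypothetical, and the sketch assumes that a missing `effort` field (empty declared value) is reported elsewhere as a missing field rather than as a mismatch. The explanatory reason that follows the flag in the example above is left to the evaluator.

```bash
# Hypothetical helper: print the flag text when declared and inferred effort differ.
# Assumption: an empty declared value means the field is missing, not mismatched.
effort_mismatch() {
  local declared="$1" inferred="$2"
  if [[ -n "$declared" && "$declared" != "$inferred" ]]; then
    echo "Effort mismatch: declared $declared, inferred $inferred"
  fi
}
```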

Execution Instructions

Step 1 — Discovery

```bash
# Find all SKILL.md files
find .claude/skills -name "SKILL.md" 2>/dev/null

# Find flat skill files
find .claude/skills -maxdepth 1 -name "*.md" ! -name "README*" 2>/dev/null

# If argument provided, use that path instead
```

Step 2 — Parse each skill

For each skill file found:

Read the full file
Extract YAML frontmatter (between first --- and second ---)
Parse: name, description, allowed-tools, effort, argument-hint, version
Note presence/absence of each field
Read the body content for structure analysis
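
For reference, the frontmatter extraction described in this step could look roughly like the sketch below. The function names `extract_frontmatter` and `frontmatter_field` are hypothetical, and the sketch assumes simple top-level `key: value` lines; nested or multi-line YAML values are not handled.

```bash
# Print the YAML frontmatter: the lines between the first and second `---`.
# Prints nothing if the file does not start with `---`.
extract_frontmatter() {
  awk '
    NR == 1 && !/^---[ \t]*$/ { exit }   # no frontmatter at top of file
    /^---[ \t]*$/ { n++; next }          # count delimiter lines
    n == 1 { print }                     # inside the frontmatter
    n >= 2 { exit }                      # past the closing delimiter
  ' "$1"
}

# Print the value of a top-level scalar field, or nothing if it is absent.
# Assumption: simple `key: value` lines only.
frontmatter_field() {
  local file="$1" key="$2"
  extract_frontmatter "$file" | sed -n "s/^${key}:[[:space:]]*//p" | head -n 1
}
```

An empty result from `frontmatter_field` marks the field as absent, which is what the presence checks in the scoring step need.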

Step 3 — Score and infer

Apply the scoring criteria above to each skill:

Check frontmatter fields
Evaluate description quality (does it answer "when to use"? is it under 1024 chars?)
Evaluate allowed-tools scope (is Bash used when only Read would suffice? are tools scoped with wildcards when possible?)
Infer effort level from content analysis
Compare inferred vs declared effort (if set)
Evaluate content structure (scan for "When to Use", "Purpose", "Example", "Workflow" sections)
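
As one example of the checks in this step, the description criterion (2 points) might be scored like this. The function name `score_description` is hypothetical. The sketch treats the length check as a warning only, since the scoring table assigns no points to it.

```bash
# Hypothetical helper: score the `description` criterion (prints 0, 1, or 2).
score_description() {
  local desc="$1" score=0
  # 1 point for a non-empty description.
  if [[ -n "$desc" ]]; then
    score=$((score + 1))
  fi
  # 1 point for trigger phrasing such as "Use when" or "when to".
  if grep -qiE 'use when|when to' <<<"$desc"; then
    score=$((score + 1))
  fi
  # Length is reported on stderr but not scored (an assumption of this sketch).
  if (( ${#desc} >= 1024 )); then
    echo "warning: description is ${#desc} chars, expected under 1024" >&2
  fi
  echo "$score"
}
```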

Step 4 — Output

Produce a structured report:

# Skills Audit — [project name or path]
Date: [today] | Scanned: N skills

## Summary
| Status | Count |
|--------|-------|
| ✅ Good (≥80%) | N |
| ⚠️ Needs work (60–79%) | N |
| ❌ Fix (<60%) | N |

**Effort coverage**: N/N skills have effort field set

---

## Per-Skill Results

### [skill-name] — [score]/14 [✅/⚠️/❌]

| Criterion | Score | Notes |
|-----------|-------|-------|
| name | ✅ 1/1 | — |
| description | ⚠️ 1/2 | Missing "Use when" phrasing |
| allowed-tools | ✅ 2/2 | Well-scoped |
| effort | ❌ 0/3 | Missing — Recommended: high |
| content structure | ⚠️ 2/4 | No examples section |

**Effort inference**: `high` — skill performs security analysis with adversarial reasoning
  Signals: "threat", "attack surface", "vulnerability scoring" in content; spawns 4 agents

**Priority fixes** (ordered by impact):
1. Add `effort: high` to frontmatter
2. Add "Use when" to description
3. Add a concrete usage example section

---

After all skills: print a Fix Summary — all missing effort fields with recommended values, ready to copy-paste.

Fix Summary Format

At the end, print a ready-to-use patch block for all missing/mismatched effort fields:

## Recommended effort fields (copy-paste ready)

skill-name-1: effort: low     # mechanical scaffold
skill-name-2: effort: high    # security analysis, spawns agents
skill-name-3: effort: medium  # code review, bounded scope

And a 1-line count: N skills need effort field · N mismatches · N missing allowed-tools

Maintainer

FlorianBruniaux Core maintainer

Source details

Full Name: FlorianBruniaux/claude-code-ultimate-guide
Branch: main
Path in repo: examples/skills/eval-skills
License: Creative Commons Attribution Share Alike 4.0 International
Topics: claude-code anthropic claude ai-coding developer-tools agentic-coding prompt-engineering llm vibe-coding claude-code-guide best-practices ai-assistant cli-tool mcp-servers tutorial ai-security ai-pair-programming claude-code-tutorial coding-assistant cursor-alternative
