Agent skill

plugin-dev-workflow

Guide plugin development workflow — editing skills, agents, hooks, or eval framework in this repo. Use when modifying files in plugins/elixir-phoenix/, lab/eval/, or lab/autoresearch/. Ensures changes pass eval, lint, and tests before committing.

Stars 252
Forks 17

Install this agent skill to your Project

npx add-skill https://github.com/oliver-kriska/claude-elixir-phoenix/tree/main/.claude/skills/plugin-dev-workflow

SKILL.md

Plugin Development Workflow

This repo is the Elixir/Phoenix Claude Code plugin. When editing plugin files, follow this workflow to ensure quality.

Before You Start

Run make help to see all available commands:

bash
make eval          # Quick: lint + score changed skills/agents
make eval-all      # Full: all 40 skills + 20 agents
make eval-fix      # Auto-fix + show failures
make test          # 52 pytest tests for eval framework
make ci            # Full CI pipeline

Scoring Individual Files (CLI)

IMPORTANT: Always use -m module syntax, never run scorer.py directly.

bash
# Score ONE skill (use -m, NOT direct file path)
python3 -m lab.eval.scorer plugins/elixir-phoenix/skills/verify/SKILL.md

# Score ONE skill with pretty output
python3 -m lab.eval.scorer plugins/elixir-phoenix/skills/verify/SKILL.md --pretty

# Score all skills
python3 -m lab.eval.scorer --all

# Score ONE agent
python3 -m lab.eval.agent_scorer plugins/elixir-phoenix/agents/verification-runner.md

# Score all agents
python3 -m lab.eval.agent_scorer --all
make ci            # Full CI pipeline

When Editing Skills (plugins/elixir-phoenix/skills/*/SKILL.md)

  1. Read CLAUDE.md conventions (size limits, frontmatter requirements)
  2. Make your changes
  3. Run make eval — it auto-detects changed skills and scores them
  4. If FAIL: check the dimension that failed, fix it
  5. Run make lint to verify markdown formatting
  6. Commit

Skill requirements (eval checks all of these):

  • Frontmatter: name, description, effort. Description must start with action verb + include "Use when..."
  • Iron Laws section with 1+ numbered items
  • Under 185 lines (command skills) or 150 lines (reference skills)
  • No section exceeds 45 lines
  • All /phx: references point to existing skills
  • All references/*.md paths exist
  • No dangerous code patterns outside Iron Laws sections
  • Code examples present (1+ fenced code blocks)
  • "Use when..." in description (for trigger accuracy)

When Editing Agents (plugins/elixir-phoenix/agents/*.md)

  1. Make your changes
  2. Run make eval-agents to score all agents
  3. Agent requirements:
    • permissionMode: bypassPermissions (always — background agents need it)
    • disallowedTools: Write, Edit, NotebookEdit for review/analysis agents
    • model matches effort: haiku=low, sonnet=medium, opus=high
    • Under 300 lines (specialist) or 535 lines (orchestrator)

When Editing Eval Framework (lab/eval/*.py)

  1. Make your changes
  2. Run make test — 52 pytest tests must pass
  3. Run make eval-all — verify no skills/agents regressed
  4. If adding new matchers: add tests in lab/eval/tests/test_matchers.py

When Editing Hooks (plugins/elixir-phoenix/hooks/scripts/*.sh)

  1. Make your changes
  2. Run make lint (markdown in hook comments)
  3. Test the hook manually (hooks run on Edit/Write/Bash events)
  4. Check CLAUDE.md hook documentation is still accurate

Autoresearch (Self-Improvement Loop)

If make eval-fix shows failures, it suggests an autoresearch command:

bash
# Copy-paste the suggested command from eval-fix output
claude -p 'Run autoresearch. Score all skills...' --allowedTools 'Edit,Read,Write,Bash,Glob,Grep'

This runs the autoresearch loop: find weakest skill → fix ONE issue → re-score → keep/revert.

Pre-Commit Checklist

Before committing any plugin changes:

  • make lint passes
  • make eval passes (changed files)
  • make test passes (if eval framework changed)
  • CHANGELOG.md updated (if user-visible change)
  • Version bumped in plugin.json (if releasing)

References

  • CLAUDE.md — full conventions, size limits, checklist
  • lab/eval/ — scoring framework (24 matchers, 8 dimensions)
  • lab/autoresearch/ — self-improvement loop
  • lab/findings/interesting.jsonl — log interesting discoveries here

Expand your agent's capabilities with these related and highly-rated skills.

oliver-kriska/claude-elixir-phoenix

lab:autoresearch

Self-improving loop for plugin skills. Reads program.md, proposes one mutation per iteration, evaluates against deterministic scorer, keeps improvements via git, reverts failures. Targets weakest skill+dimension. Use with /loop for overnight runs.

252 17
Explore
oliver-kriska/claude-elixir-phoenix

promote

Generate X/Twitter release promotion posts with ASCII tables and CodeSnap rendering. Use when writing release posts, promotion tweets, plugin announcements, or preparing social media content for new versions.

252 17
Explore
oliver-kriska/claude-elixir-phoenix

skill-monitor

Analyze skill effectiveness across sessions. Computes per-skill metrics (action rate, friction, outcomes), identifies degrading skills, and generates improvement recommendations. Requires session-scan data in metrics.jsonl.

252 17
Explore
oliver-kriska/claude-elixir-phoenix

session-trends

Analyze trends across session metrics. Computes windowed aggregates, deltas, and compares against MEMORY.md findings. Use periodically for progress tracking.

252 17
Explore
oliver-kriska/claude-elixir-phoenix

cc-changelog

CONTRIBUTOR TOOL - Track CC changelog, extract new versions since last check, analyze impact on plugin (breaking changes, opportunities, deprecations). Run periodically or before releases. NOT part of the distributed plugin.

252 17
Explore
oliver-kriska/claude-elixir-phoenix

session-scan

Compute metrics for Claude Code sessions. Discovers via ccrider, filters trivial, computes friction/opportunity/fingerprint scores. Use for broad session triage.

252 17
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results