Agent skill

lab:autoresearch

Self-improving loop for plugin skills. Reads program.md, proposes one mutation per iteration, evaluates it against a deterministic scorer, keeps improvements via git, and reverts failures. Targets the weakest skill+dimension. Use with /loop for overnight runs.



Install this agent skill in your project:

npx add-skill https://github.com/oliver-kriska/claude-elixir-phoenix/tree/main/lab/autoresearch

SKILL.md

Autoresearch — Plugin Skill Self-Improvement

Iteratively improve plugin skills via the autoresearch pattern: propose one mutation -> eval -> keep/revert -> repeat.

Usage

/lab:autoresearch                           # Targeted: attack weakest skill+dimension
/lab:autoresearch --skill review            # Focus on one skill
/lab:autoresearch --strategy sweep          # Process all skills alphabetically
/lab:autoresearch --dry-run                 # Show what would change, don't commit

For overnight runs:

/loop 5m /lab:autoresearch --strategy sweep --max-iterations 200

Iron Laws

ONE mutation per iteration: if the description needs "and", split it into two mutations
NEVER mutate read-only files: check program.md before every write
EVAL is deterministic: always use the wrapper script, never an LLM judge
REVERT on regression OR check failure: no exceptions
LOG every iteration: always run the keep or revert command (never skip)
CHECK ideas.md before proposing: don't rediscover known optimizations
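
The revert law above can be read as a small decision rule. The sketch below is only an illustration of that rule, not the wrapper script's actual logic; the `EvalResult` container and its field names are hypothetical, and treating an unchanged score as a revert is an assumption based on the stated goal of keeping improvements only. In a real run the verdict always comes from the wrapper's eval command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical container; the real wrapper's output format may differ."""
    old_score: float
    new_score: float
    checks_passed: bool


def decide(result: EvalResult) -> str:
    """Return "KEEP" or "REVERT" for a single mutation."""
    if not result.checks_passed:
        return "REVERT"  # a check failure always reverts
    if result.new_score < result.old_score:
        return "REVERT"  # a regression always reverts
    if result.new_score == result.old_score:
        return "REVERT"  # assumption: a tie is not an improvement
    return "KEEP"
```

Checks are tested before scores so that a higher score can never rescue a mutation that breaks a check.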

Wrapper Script Commands

All eval, git, and journal operations go through ONE script. Do NOT run eval, git, or journal commands by hand.

bash

# Find the weakest skill+dimension
python3 lab/autoresearch/scripts/run-iteration.py target --strategy targeted

# Score a skill (before mutation, to get baseline)
python3 lab/autoresearch/scripts/run-iteration.py score <skill-name>

# After mutation: score + checks + compare → verdict (KEEP or REVERT)
python3 lab/autoresearch/scripts/run-iteration.py eval <skill-name>

# Act on verdict:
python3 lab/autoresearch/scripts/run-iteration.py keep <skill> <dim> <old> <new> \
  --desc "what changed" --asi '{"hypothesis": "why", "mechanism": "how"}'

python3 lab/autoresearch/scripts/run-iteration.py revert <skill> <dim> <old> <new> \
  --desc "what was attempted" --asi '{"hypothesis": "why", "regression": "what broke", "avoid": "do not retry this"}'

# Check overall progress
python3 lab/autoresearch/scripts/run-iteration.py status

Core Loop (ONE iteration)

Step 1: Read State

Read lab/autoresearch/program.md (goals, mutable surface, rules)
Read lab/autoresearch/ideas.md if it exists (deferred optimizations)
Run: python3 lab/autoresearch/scripts/run-iteration.py status

Step 2: Select Target

Run: python3 lab/autoresearch/scripts/run-iteration.py target --strategy targeted

Parse the JSON output for the skill, dimension, and failing_checks fields. If all_perfect is set → STOP.
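
A minimal sketch of this parsing step, assuming the target command prints a single flat JSON object. The field names come from the step above; the exact shape of the output and the `parse_target` helper are assumptions for illustration.

```python
import json
from typing import Optional


def parse_target(raw: str) -> Optional[dict]:
    """Extract the next target, or return None when every skill is perfect.

    Assumes `raw` is the JSON text printed by the `target` command.
    """
    payload = json.loads(raw)
    if payload.get("all_perfect"):
        return None  # nothing left to improve: stop the loop
    return {
        "skill": payload["skill"],
        "dimension": payload["dimension"],
        # Assumption: a missing list means no specific checks were reported.
        "failing_checks": payload.get("failing_checks", []),
    }
```

Returning None for the all_perfect case keeps the stop condition explicit for the caller instead of hiding it in an exception.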

Step 3: Read + Propose

Read target SKILL.md and its references/ listing
Read eval definition from lab/eval/evals/{skill}.json
Check ideas.md for deferred ideas about this skill
Check recent journal entries for prior failures on this skill (avoid repeats)
Consult ${CLAUDE_SKILL_DIR}/references/mutation-strategies.md
Propose exactly ONE change targeting the failing checks

Step 4: Apply + Evaluate

Apply the mutation via Edit tool
Run: python3 lab/autoresearch/scripts/run-iteration.py eval <skill-name>
Parse JSON → check verdict field

Step 5: Keep or Revert

If verdict is KEEP:

bash

python3 lab/autoresearch/scripts/run-iteration.py keep <skill> <dim> <old> <new> \
  --desc "..." --asi '{"hypothesis": "...", "mechanism": "..."}'

If verdict is REVERT:

bash

python3 lab/autoresearch/scripts/run-iteration.py revert <skill> <dim> <old> <new> \
  --desc "..." --asi '{"hypothesis": "...", "regression": "...", "avoid": "..."}'
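
The two commands differ only in the subcommand and the keys of the --asi payload, so the argument list can be assembled from the verdict. This is a sketch under assumptions: `build_command` is a hypothetical helper, and `<old>` and `<new>` are taken to be the scores before and after the mutation.

```python
import json

SCRIPT = "lab/autoresearch/scripts/run-iteration.py"


def build_command(verdict, skill, dim, old, new, desc, asi):
    """Build the argv list for the keep or revert command.

    Raises KeyError for any verdict other than "KEEP" or "REVERT".
    """
    action = {"KEEP": "keep", "REVERT": "revert"}[verdict]
    return [
        "python3", SCRIPT, action, skill, dim, str(old), str(new),
        "--desc", desc,
        # json.dumps produces valid JSON regardless of quotes in the values.
        "--asi", json.dumps(asi),
    ]
```

Passing the list directly to `subprocess.run(cmd, check=True)` avoids shell quoting problems in the JSON payload; `shlex.join(cmd)` gives a printable form for logs.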

Step 6: Ideas Backlog

If during analysis you discovered a promising optimization you can't act on now:

Append it to lab/autoresearch/ideas.md as a bullet
On next resume: prune stale/tried ideas, experiment with the rest
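
Appending to the backlog could look like the sketch below. The "- " bullet prefix and the duplicate check are assumptions, as is the `append_idea` helper itself; the source only says that ideas are appended as bullets.

```python
from pathlib import Path


def append_idea(ideas_path: Path, idea: str) -> bool:
    """Append `idea` as a bullet; return False if it is already listed."""
    bullet = f"- {idea.strip()}"
    existing = ideas_path.read_text(encoding="utf-8") if ideas_path.exists() else ""
    if bullet in existing.splitlines():
        return False  # already recorded, avoid duplicates
    # Make sure the new bullet starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with ideas_path.open("a", encoding="utf-8") as fh:
        fh.write(f"{separator}{bullet}\n")
    return True
```

For example, `append_idea(Path("lab/autoresearch/ideas.md"), "try shorter trigger phrases")` adds one line and leaves existing entries untouched.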

Step 7: Continue or Stop

All targets >= 0.95? Print "AUTORESEARCH_COMPLETE"
Max iterations reached? Print "AUTORESEARCH_COMPLETE"
50 consecutive discards? Print "AUTORESEARCH_STUCK"
Otherwise: immediately start Step 1 again
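
The stop conditions above can be sketched as a single function. The thresholds and the printed markers come from the list; the `stop_signal` helper, its parameters, and the order in which the conditions are checked are assumptions.

```python
from typing import Optional

TARGET_SCORE = 0.95
MAX_CONSECUTIVE_DISCARDS = 50


def stop_signal(scores, iteration, max_iterations, consecutive_discards) -> Optional[str]:
    """Return the marker to print, or None to start the next iteration.

    `scores` maps each target (hypothetically, a skill+dimension key)
    to its current score.
    """
    if scores and all(s >= TARGET_SCORE for s in scores.values()):
        return "AUTORESEARCH_COMPLETE"
    if iteration >= max_iterations:
        return "AUTORESEARCH_COMPLETE"
    if consecutive_discards >= MAX_CONSECUTIVE_DISCARDS:
        return "AUTORESEARCH_STUCK"
    return None
```

An empty `scores` mapping is deliberately not treated as complete, so a failed status read cannot end the run early.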

References

${CLAUDE_SKILL_DIR}/references/mutation-strategies.md — mutation type catalog
${CLAUDE_SKILL_DIR}/references/state-management.md — git protocol, journaling
lab/autoresearch/program.md — research agenda (read every iteration)

Maintainer

oliver-kriska Core maintainer

Source details

Full Name: oliver-kriska/claude-elixir-phoenix
Branch: main
Path in repo: lab/autoresearch
License: MIT License
Topics: claude-code claude claude-code-skills automation claude-skills vibe-coding claude-code-plugin elixir elixir-phoenix phoenix elixir-lang phoenix-framework


