Agent skill
hyperagent
Run a self-referential self-improving agent loop where a meta-agent iteratively modifies a task-agent's code to optimize for any measurable target. Based on Facebook Research's Hyperagents paper (arXiv:2603.19461). Use when asked to "run hyperagent", "self-improve this", "optimize with self-modification", or "evolve this agent/script".
Install this agent skill to your Project
npx add-skill https://github.com/ckorhonen/claude-skills/tree/main/skills/hyperagent
Hyperagent
Quick Start — Simple Examples
New to Hyperagent? Try these beginner-friendly tasks before the full setup.
1. Optimize a simple Python script to run faster
Say: "Use hyperagent to optimize this script for speed" and paste something like:
# slow_sort.py
def sort_numbers(nums):
    result = []
    while nums:
        smallest = min(nums)
        result.append(smallest)
        nums.remove(smallest)
    return result
Hyperagent will benchmark it, propose a faster implementation, and validate the improvement.
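As a rough illustration of what one generation of this could look like, the sketch below times the original sort against a candidate that delegates to Python's built-in sorted, checks that both agree, and prints a METRIC line in the format task.sh is expected to emit. The names fast_sort and time_call and the metric name speedup are illustrative assumptions, not part of the skill's scripts.

```python
import random
import time


def sort_numbers(nums):
    """Original O(n^2) approach: repeatedly extract the minimum."""
    nums = list(nums)  # work on a copy so the caller's list is not emptied
    result = []
    while nums:
        smallest = min(nums)
        result.append(smallest)
        nums.remove(smallest)
    return result


def fast_sort(nums):
    """Hypothetical candidate variant: delegate to the built-in sort."""
    return sorted(nums)


def time_call(fn, data, repeats=3):
    """Return the best wall-clock time in seconds over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    random.seed(0)
    data = [random.randint(0, 10_000) for _ in range(2_000)]
    # Correctness gate: the candidate must agree with the original.
    assert fast_sort(data) == sort_numbers(data)
    baseline = time_call(sort_numbers, data)
    candidate = time_call(fast_sort, data)
    print(f"METRIC speedup={baseline / candidate:.2f}")
```

Taking the best of several repeats reduces timing noise; a real session would rely on the repeated trials recorded by scripts/run_task.py instead.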
2. Improve a prompt to get better answers
Say: "Run hyperagent on this prompt and improve accuracy" with a prompt like:
Summarize this article in one sentence.
The meta-agent iterates on the prompt, measures quality, and keeps improvements that score higher.
3. Make a sorting function more efficient
Say: "Evolve this function with hyperagent" and paste any function. Hyperagent creates a benchmark, runs generations of improvements, and shows you the performance gain per generation.
4. Self-improve any script
Say: "Self-improve this agent/script" and point to any Python file. Hyperagent wraps it in an evaluation loop, proposes modifications, and tracks what works.
The simplest possible setup: create task.sh that prints METRIC score=0.5, then run python3 scripts/init_session.py. From there the loop is fully automated.
Self-referential self-improvement: a meta-agent that modifies a task-agent (and itself) to optimize any measurable objective.
Inspired by Facebook Research's Hyperagents paper (arXiv:2603.19461), which demonstrated that agents combining a task-solver and a self-modifying meta-level into a single editable program can achieve open-ended, compounding improvements that transfer across domains.
How It Works
A hyperagent is a system with two components in a single editable codebase:
- Task Agent — solves the target task (benchmark, code generation, data processing, etc.)
- Meta Agent — analyzes task performance history and proposes modifications to the task agent's code (and optionally its own code)
The key insight from the paper: when the meta-level modification procedure is itself editable, the system can improve not just task performance but also the mechanism that generates future improvements — enabling compounding, transferable gains.
Core Principles
- Self-referential modification: The meta-agent can modify the task-agent's code AND its own strategy. Both live in the same editable workspace. This enables metacognitive self-improvement: improving how you improve.
- Population-based exploration (archive): Don't just keep the best variant; maintain an archive of all successful variants as stepping stones. Parent selection favors high performers with unexplored potential.
- Empirical evaluation gates everything: No change is accepted without measurement. Every candidate is evaluated against the task benchmark with repeated trials.
- Persistent memory and performance tracking: The system maintains a structured history of all experiments, hypotheses, and outcomes. Later generations build on earlier insights, so dead ends are not rediscovered.
- Transfer across domains: Meta-level improvements (performance tracking, evaluation strategies, hypothesis generation patterns) are domain-agnostic and can be transferred to new tasks.
Available Scripts
- scripts/common.py: shared utilities (archive management, metrics, reporting)
- scripts/init_session.py: initialize a hyperagent session, scaffold the workspace
- scripts/run_task.py: evaluate a task-agent variant and record metrics
- scripts/log_variant.py: log the evaluated record, decide disposition, update archive and reports
- scripts/render_report.py: generate an HTML report of the full evolutionary history
- scripts/select_parent.py: select a parent from the archive for the next generation
Note: There is no generate_variant.py script. The meta-agent role (hypothesis generation and code modification) is performed by the LLM agent itself, not by a script.
All scripts are non-interactive, expose --help, emit structured JSON on stdout, and keep diagnostics on stderr.
Default Workflow
- Initialize the session after defining the optimization target:
    python3 scripts/init_session.py \
      --goal "Improve prompt accuracy on math benchmark" \
      --metric-name accuracy \
      --unit pct \
      --direction higher \
      --task-command ./task.sh \
      --checks-command ./checks.sh \
      --scope src/agent.py \
      --max-generations 50
- Evaluate the baseline (generation 0):
    python3 scripts/run_task.py \
      --id gen-000 \
      --hypothesis "Control: unmodified task agent" \
      --change-summary "No modifications" \
      --baseline \
      --output .hyperagent/gen-000.json
    python3 scripts/log_variant.py --input .hyperagent/gen-000.json
- Selection → Modification → Evaluation loop:
    # Select a parent from the archive
    python3 scripts/select_parent.py --output .hyperagent/parent.json

    # Generate a variant (meta-agent proposes modifications)
    # This is where YOU (the LLM agent) act as the meta-agent:
    # - Read the parent's code and performance history
    # - Hypothesize an improvement
    # - Apply code modifications
    # - Record what you changed and why

    # Evaluate the variant
    python3 scripts/run_task.py \
      --id gen-001 \
      --hypothesis "Add chain-of-thought prompting to improve reasoning" \
      --change-summary "Wrap task prompt in step-by-step reasoning template" \
      --parent gen-000 \
      --output .hyperagent/gen-001.json
    python3 scripts/log_variant.py --input .hyperagent/gen-001.json
- Render reports at any time:
    python3 scripts/render_report.py
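If the evaluation step is driven from Python rather than typed by hand, the run_task.py invocation above could be assembled along these lines. This is only a sketch: the helper names are hypothetical, and treating --baseline and --parent as mutually exclusive is an assumption drawn from the two examples, not from the script's documentation.

```python
import shlex


def variant_id(generation: int) -> str:
    """Format a generation number the way the examples do, e.g. 1 -> 'gen-001'."""
    return f"gen-{generation:03d}"


def run_task_command(generation, hypothesis, change_summary, parent=None):
    """Build the argv list for scripts/run_task.py from the flags shown in the workflow."""
    vid = variant_id(generation)
    cmd = [
        "python3", "scripts/run_task.py",
        "--id", vid,
        "--hypothesis", hypothesis,
        "--change-summary", change_summary,
    ]
    # Assumption: the baseline run carries --baseline, later runs name a parent.
    cmd += ["--baseline"] if parent is None else ["--parent", parent]
    cmd += ["--output", f".hyperagent/{vid}.json"]
    return cmd


cmd = run_task_command(
    1,
    "Add chain-of-thought prompting to improve reasoning",
    "Wrap task prompt in step-by-step reasoning template",
    parent="gen-000",
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting problems in the hypothesis text.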
Up-Front Q&A
Before starting, gather or confirm:
- Objective: what are we optimizing?
- Primary metric: exact name, unit, direction (lower/higher)
- Task command: the script that runs the task agent and emits METRIC name=value lines
- Correctness gates: tests or checks that must pass for a variant to be kept
- Scope: which files can the meta-agent modify?
- Meta-scope: can the meta-agent modify its own strategy? (default: yes)
- Generation budget: max generations before stopping
- Minimum improvement threshold: default 1%
Workspace Setup
- Prefer a dedicated worktree on a fresh branch:
    git worktree add ../hyperagent-<goal>-<date> -b hyperagent/<goal>-<date>
- Create:
    - hyperagent.md: checked in, durable session brief with full evolutionary history
    - task.sh: checked in, benchmark runner (emits METRIC name=value)
    - checks.sh: checked in, correctness gates
    - .hyperagent/: local artifact directory, NOT checked in
- Ensure artifacts stay untracked. In a linked worktree .git is a file rather than a directory, so resolve the exclude file through git instead of hard-coding .git/info/exclude:
    exclude_file="$(git rev-parse --git-path info/exclude)"
    rg -qxF '.hyperagent/' "$exclude_file" || printf '\n.hyperagent/\n' >> "$exclude_file"
The Meta-Agent Role
You (the LLM) are the meta-agent. Your job each generation is:
- Select parent: use scripts/select_parent.py or choose based on the archive
- Analyze: read the parent's code, performance history, and past experiment outcomes
- Hypothesize: propose a specific, testable modification with a causal theory for why it should help
- Modify: apply code changes to the task agent (and optionally to your own strategy notes in hyperagent.md)
- Evaluate: run scripts/run_task.py to measure the variant
- Log: use scripts/log_variant.py to record the result and update the archive
- Reflect: update hyperagent.md with what you learned
Meta-Level Self-Modification
The meta-agent can improve its own process by updating:
- Strategy notes in hyperagent.md (hypothesis generation patterns, evaluation heuristics)
- Memory entries in .hyperagent/memory.jsonl (qualitative insights, correction plans)
- The task evaluation protocol (adding secondary metrics, changing trial counts)
These meta-improvements compound across generations and transfer to new tasks.
Required Files
hyperagent.md
The durable contract and evolutionary history. A fresh agent can resume from this.
# Hyperagent: <goal>
## Objective
<What is being optimized and why.>
## Configuration
- Primary metric:
- Unit:
- Direction:
- Minimum improvement: X%
- Task command:
- Correctness gates:
- Generation budget:
## Scope
- Task agent files:
- Meta-agent can self-modify: yes/no
## Archive
`.hyperagent/archive.jsonl`
## Lineage
<Tree showing parent→child relationships and which variants were kept>
## Meta-Strategy
<Current approach to hypothesis generation — updated as the meta-agent learns>
## What We've Learned
<Key wins, dead ends, transferable insights>
## Performance Tracking
<Best variant, improvement trajectory, current plateau status>
task.sh
Bash script that runs the task agent and emits METRIC name=value lines:
#!/bin/bash
set -euo pipefail
# Run the task agent
python3 src/agent.py --input data/test.json 2>/dev/null
# The agent script should emit: METRIC accuracy=0.85
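How the bundled scripts read this output is not shown here. As a hedged sketch, lines of the form METRIC name=value could be picked out of the task output with a regular expression like the one below; the function name parse_metrics and the choice to silently ignore non-matching lines are assumptions.

```python
import re

# Matches lines such as "METRIC accuracy=0.85"; any other output is ignored.
METRIC_LINE = re.compile(
    r"^METRIC\s+([A-Za-z_][\w.-]*)=(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_metrics(output: str) -> dict[str, float]:
    """Collect every METRIC line into a name -> value mapping (last one wins)."""
    metrics = {}
    for line in output.splitlines():
        match = METRIC_LINE.match(line.strip())
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


sample = "loading data...\nMETRIC accuracy=0.85\nMETRIC latency_ms=12\n"
print(parse_metrics(sample))
```

Keeping the metric lines on stdout and diagnostics on stderr, as task.sh does with 2>/dev/null, means a parser like this never has to cope with interleaved log noise.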
Archive Structure
The archive (.hyperagent/archive.jsonl) stores every variant ever evaluated:
{
"id": "gen-007",
"generation": 7,
"parent_id": "gen-003",
"timestamp": "2026-03-27T20:00:00Z",
"hypothesis": "Add few-shot examples to improve pattern recognition",
"change_summary": "Inserted 3 domain-specific examples into the task prompt",
"files_touched": ["src/agent.py"],
"metric_name": "accuracy",
"direction": "higher",
"warmup_trials": [0.82, 0.83],
"measured_trials": [0.85, 0.86, 0.84, 0.85, 0.87],
"summary": {"median": 0.85, "mean": 0.854, "min": 0.84, "max": 0.87},
"checks": "passed",
"disposition": "keep",
"children_count": 0,
"meta_modifications": ["Updated strategy notes with few-shot pattern"],
"reason": "Improved by 3.2% over parent gen-003 (0.824). Checks passed."
}
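The summary block and the percentage quoted in reason can be reproduced from the trial values in this record. The sketch below assumes, without confirmation from the scripts, that warmup trials are excluded and that the median is the figure compared against the parent; the function names are illustrative.

```python
import statistics


def summarize(measured_trials):
    """Summary statistics over measured trials only (warmups excluded)."""
    return {
        "median": statistics.median(measured_trials),
        "mean": round(statistics.mean(measured_trials), 3),
        "min": min(measured_trials),
        "max": max(measured_trials),
    }


def improvement_pct(candidate, parent, direction="higher"):
    """Relative improvement over the parent, in percent; positive means better."""
    delta = candidate - parent if direction == "higher" else parent - candidate
    return delta / abs(parent) * 100


summary = summarize([0.85, 0.86, 0.84, 0.85, 0.87])
print(summary)  # {'median': 0.85, 'mean': 0.854, 'min': 0.84, 'max': 0.87}
print(round(improvement_pct(summary["median"], 0.824), 1))  # 3.2
```

Comparing medians rather than means makes the decision less sensitive to a single outlier trial.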
Parent Selection
Selection probability for a parent is proportional to:
- Performance score (better-performing variants are favored)
- Inverse of children count (favor unexplored high-performers)
This balances exploitation (good variants) with exploration (understudied variants).
python3 scripts/select_parent.py
# Output: {"selected_parent": "gen-003", "score": 0.824, "children": 1, "reason": "High performer with few children"}
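The exact weighting used by select_parent.py is not given here. One way to realize "proportional to score and to the inverse of children count" is sketched below, assuming a positive score where higher is better and using 1 / (1 + children_count) so that childless variants do not cause a division by zero. The record fields score and children_count follow the example output but remain assumptions about the archive layout.

```python
import random


def selection_weights(archive):
    """Weight each candidate parent by score / (1 + children_count)."""
    return [rec["score"] / (1 + rec["children_count"]) for rec in archive]


def select_parent(archive, rng=random):
    """Sample one parent with probability proportional to its weight."""
    weights = selection_weights(archive)
    return rng.choices(archive, weights=weights, k=1)[0]


archive = [
    {"id": "gen-003", "score": 0.824, "children_count": 1},
    {"id": "gen-007", "score": 0.85, "children_count": 0},
]
parent = select_parent(archive, rng=random.Random(0))  # seeded for repeatability
print(parent["id"])
```

With these numbers gen-007 gets roughly twice the weight of gen-003 (0.85 versus 0.412), so a slightly better variant with no children is strongly preferred without ever excluding the older one.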
Decision Rules
- keep: variant beats current best by ≥ threshold, checks pass
- discard: variant is worse, equal, or improvement below threshold
- checks_failed: metric improved but correctness gates failed
- crash: variant could not be evaluated
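These rules translate into a small decision function. The version below is a sketch under stated assumptions: the threshold is a relative percentage (default 1%), the current best is nonzero, and a variant whose gates fail without clearing the threshold is treated as a plain discard. The real log_variant.py may order these checks differently.

```python
def decide_disposition(candidate, best, *, direction="higher",
                       threshold_pct=1.0, checks_passed=True):
    """Map a variant's score to keep / discard / checks_failed / crash."""
    if candidate is None:  # evaluation produced no metric
        return "crash"
    if direction == "higher":
        gain_pct = (candidate - best) / abs(best) * 100
    else:
        gain_pct = (best - candidate) / abs(best) * 100
    if gain_pct < threshold_pct:
        return "discard"
    return "keep" if checks_passed else "checks_failed"


print(decide_disposition(0.85, 0.824))                       # keep
print(decide_disposition(0.825, 0.824))                      # discard
print(decide_disposition(0.85, 0.824, checks_passed=False))  # checks_failed
print(decide_disposition(None, 0.824))                       # crash
```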
Plateau Detection
Track improvement velocity. Stop or pivot when:
- 3+ consecutive generations with no improvement
- Hypothesis diversity drops (recycling ideas)
- Improvement velocity < 0.1% per generation over last 5
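The two numeric criteria can be checked mechanically from the best-so-far series; hypothesis diversity is a qualitative judgment and is left out. The sketch assumes a higher-is-better metric, a nonzero starting value, and one entry per generation. The function name and parameter names are illustrative.

```python
def is_plateau(best_so_far, *, patience=3, window=5, min_velocity_pct=0.1):
    """best_so_far[i] is the best score seen up to and including generation i."""
    # Rule 1: no improvement across the last `patience` generations.
    if len(best_so_far) > patience:
        recent = best_so_far[-(patience + 1):]
        if recent[-1] <= recent[0]:
            return True
    # Rule 2: average relative gain per generation over the last `window`.
    if len(best_so_far) > window:
        start, end = best_so_far[-(window + 1)], best_so_far[-1]
        velocity_pct = (end - start) / abs(start) * 100 / window
        if velocity_pct < min_velocity_pct:
            return True
    return False


print(is_plateau([0.80, 0.82, 0.82, 0.82, 0.82]))  # True: three flat generations
print(is_plateau([0.80, 0.82, 0.84, 0.86]))        # False: still improving
```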
Loop Behavior
Run autonomously until:
- Generation budget exhausted
- Plateau detected (3 consecutive non-improvements)
- All promising hypotheses explored
- User interrupts
During the loop:
- One hypothesis per generation
- Record dead ends explicitly
- Keep the worktree clean between variants (revert discarded changes)
- Update hyperagent.md after every generation
Common Pitfalls
1. Meta-Agent Overfitting Its Own Strategy
Symptom: Meta-strategy becomes over-specialized to early successes
Fix: Periodically review and broaden the strategy; try categorically different approaches
2. Archive Bloat
Symptom: Archive grows large, selection becomes slow
Fix: Archive old generations after 50 variants; maintain a compact summary
3. Self-Modification Destabilizing the Loop
Symptom: Meta-agent modifies evaluation or logging in ways that break the loop
Fix: Keep outer-loop scripts (init, run, log, select) immutable. Only modify task code and strategy notes.
4. Hypothesis Recycling
Symptom: Later generations retry earlier failed ideas
Fix: Always read .hyperagent/memory.jsonl before proposing. Explicitly check against dead ends.
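A mechanical pre-check can catch near-duplicates before a generation is spent on them. The schema of memory.jsonl is not specified in this document, so the sketch assumes one JSON object per line with hypothetical kind and text fields, and uses a simple string-similarity ratio from the standard library as a stand-in for a real semantic comparison.

```python
import json
from difflib import SequenceMatcher
from pathlib import Path


def load_dead_ends(path=".hyperagent/memory.jsonl"):
    """Return the text of entries marked as dead ends (assumed schema)."""
    file = Path(path)
    if not file.exists():
        return []
    dead_ends = []
    for line in file.read_text(encoding="utf-8").splitlines():
        if line.strip():
            entry = json.loads(line)
            if entry.get("kind") == "dead_end":  # assumed field and value
                dead_ends.append(entry.get("text", ""))
    return dead_ends


def similar_dead_ends(hypothesis, dead_ends, threshold=0.8):
    """List past dead ends whose wording closely matches the new hypothesis."""
    hits = []
    for past in dead_ends:
        ratio = SequenceMatcher(None, hypothesis.lower(), past.lower()).ratio()
        if ratio >= threshold:
            hits.append(past)
    return hits
```

A non-empty result from similar_dead_ends is a prompt to reread the earlier outcome, not an automatic veto: a previously failed idea can be worth retrying if the parent variant has changed substantially.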
Transfer Protocol
To transfer meta-improvements to a new domain:
- Extract meta-strategy from the hyperagent.md "What We've Learned" section
- Copy .hyperagent/memory.jsonl as starting knowledge
- Initialize new session with transferred strategy as initial context
- The meta-agent starts with accumulated wisdom instead of from scratch
Report Generation
python3 scripts/render_report.py
Generates .hyperagent/report.html with:
- Lineage tree visualization
- Performance over generations
- Best-so-far trend
- Disposition breakdown
- Per-variant trial distributions
- Meta-strategy evolution timeline
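Two of these views, the best-so-far trend and the disposition breakdown, can be derived directly from archive records. The sketch below is not the render_report.py implementation; it assumes records shaped like the archive example, with generation, disposition, and summary.median fields, and that only kept variants move the best-so-far line.

```python
from collections import Counter


def best_so_far_trend(records, direction="higher"):
    """Return (generation, best score so far) pairs in generation order."""
    better = max if direction == "higher" else min
    best, trend = None, []
    for rec in sorted(records, key=lambda r: r["generation"]):
        if rec["disposition"] == "keep":
            score = rec["summary"]["median"]
            best = score if best is None else better(best, score)
        trend.append((rec["generation"], best))
    return trend


def disposition_breakdown(records):
    """Count how many variants ended in each disposition."""
    return Counter(rec["disposition"] for rec in records)


records = [
    {"generation": 0, "disposition": "keep", "summary": {"median": 0.80}},
    {"generation": 1, "disposition": "discard", "summary": {"median": 0.79}},
    {"generation": 2, "disposition": "keep", "summary": {"median": 0.85}},
]
print(best_so_far_trend(records))  # [(0, 0.8), (1, 0.8), (2, 0.85)]
print(dict(disposition_breakdown(records)))  # {'keep': 2, 'discard': 1}
```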