autocontext
Iterative strategy generation and evaluation system. Use when the user wants to evaluate agent output quality, run improvement loops, queue tasks for background evaluation, check run status, or discover available scenarios. Provides LLM-based judging with rubric-driven scoring.
Install this agent skill into your project:
npx add-skill https://github.com/greyhaven-ai/autocontext/tree/main/pi/skills/autocontext
SKILL.md
autocontext
autocontext is an iterative strategy generation and evaluation system that uses LLM-based judging to score and improve agent outputs.
Available Tools
- autocontext_judge — Evaluate agent output against a rubric. Returns a 0–1 score with reasoning and per-dimension breakdowns.
- autocontext_improve — Run a multi-round improvement loop. The agent output is judged, revised based on feedback, and re-evaluated until the quality threshold is met or max rounds are exhausted.
- autocontext_queue — Enqueue a task for background evaluation by the task runner daemon.
- autocontext_status — Check the status of runs and queued tasks.
- autocontext_scenarios — List available evaluation scenarios and their families.
Quick Start
1. Evaluate output quality
Use autocontext_judge with a task prompt, the agent's output, and a rubric:
autocontext_judge(
    task_prompt="Write a Python function to parse CSV files",
    agent_output="def parse_csv(path): ...",
    rubric="Correctness, error handling, edge cases, documentation"
)
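The judge returns an overall 0–1 score, its reasoning, and a per-dimension breakdown. The sketch below models that result in Python to make the pieces concrete. The JudgeResult class, its field names, the helper functions, and the assumption that a rubric is a comma-separated list of dimensions are all hypothetical; they are not autocontext's actual types.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class JudgeResult:
    """Hypothetical model of an autocontext_judge result, not its real type."""

    score: float  # overall score on the documented 0-1 scale
    reasoning: str  # the judge's explanation for the score
    dimension_scores: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject values outside the 0-1 range.
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be in [0, 1], got {self.score}")
        for name, value in self.dimension_scores.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"dimension {name!r} out of range: {value}")


def rubric_dimensions(rubric: str) -> list[str]:
    """Split a comma-separated rubric into dimension names (assumed format)."""
    return [part.strip() for part in rubric.split(",") if part.strip()]


def weakest_dimension(result: JudgeResult) -> str | None:
    """Return the lowest-scoring dimension, or None if there is no breakdown."""
    if not result.dimension_scores:
        return None
    return min(result.dimension_scores, key=result.dimension_scores.__getitem__)
```

For example, a result with dimension_scores={"Correctness": 0.9, "Error handling": 0.5} has "Error handling" as its weakest dimension, a natural target for the next revision.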
2. Improve output iteratively
Use autocontext_improve to revise output automatically through judge-guided feedback loops:
autocontext_improve(
    task_prompt="Write a Python function to parse CSV files",
    initial_output="def parse_csv(path): ...",
    rubric="Correctness, error handling, edge cases, documentation",
    max_rounds=5,
    quality_threshold=0.85
)
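The loop behind autocontext_improve can be pictured as follows. This is a minimal sketch, assuming judge and revise are plain callables standing in for the LLM calls (both hypothetical), that each round is one revision followed by a re-evaluation, and that a score equal to quality_threshold counts as meeting it. None of these details is confirmed by the skill itself.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the LLM calls autocontext makes internally:
#   judge(task_prompt, output, rubric) -> (score, feedback)
#   revise(task_prompt, output, feedback) -> revised output
Judge = Callable[[str, str, str], Tuple[float, str]]
Revise = Callable[[str, str, str], str]


def improve(
    task_prompt: str,
    initial_output: str,
    rubric: str,
    judge: Judge,
    revise: Revise,
    max_rounds: int = 5,
    quality_threshold: float = 0.85,
) -> Tuple[str, float, List[float]]:
    """Judge, revise, and re-judge until the threshold is met or rounds run out."""
    output = initial_output
    score, feedback = judge(task_prompt, output, rubric)
    history = [score]
    rounds = 0
    # Assumption: reaching the threshold exactly counts as meeting it.
    while score < quality_threshold and rounds < max_rounds:
        output = revise(task_prompt, output, feedback)
        score, feedback = judge(task_prompt, output, rubric)
        history.append(score)
        rounds += 1
    return output, score, history


# Toy usage: the "judge" rewards each "+" and the "reviser" appends one.
def toy_judge(task_prompt: str, output: str, rubric: str) -> Tuple[float, str]:
    return min(1.0, output.count("+") / 4), "add another +"


def toy_revise(task_prompt: str, output: str, feedback: str) -> str:
    return output + "+"


final, final_score, history = improve("demo", "v", "more plus signs", toy_judge, toy_revise)
# final == "v++++", history == [0.0, 0.25, 0.5, 0.75, 1.0]
```

Keeping the score history makes it easy to see whether revisions are actually converging or stalling before max_rounds is reached.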
3. Queue background tasks
Use autocontext_queue with a scenario name to enqueue evaluation tasks for asynchronous processing:
autocontext_queue(spec_name="my_scenario")
Check results later with autocontext_status.
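Queuing decouples submitting work from collecting results. The sketch below models that pattern with Python's standard queue and threading modules: enqueue records a task as queued and returns an ID, a worker thread plays the role of the task runner daemon, and check_status reads the recorded outcome. Every name here (enqueue, worker, check_status, the status strings) is hypothetical; the real daemon and its storage are not modeled.

```python
import itertools
import queue
import threading
from typing import Callable, Dict, Optional, Tuple

# Hypothetical in-memory model of "enqueue now, check status later".
# The real autocontext task runner daemon and its storage are not modeled.
tasks: "queue.Queue[Optional[Tuple[int, str]]]" = queue.Queue()
status: Dict[int, str] = {}
status_lock = threading.Lock()
_task_ids = itertools.count(1)


def enqueue(spec_name: str) -> int:
    """Record a task as queued, hand it to the worker, and return its ID."""
    with status_lock:
        task_id = next(_task_ids)
        status[task_id] = "queued"
    tasks.put((task_id, spec_name))
    return task_id


def worker(evaluate: Callable[[str], None]) -> None:
    """Process tasks until a None sentinel arrives, recording each outcome."""
    while True:
        item = tasks.get()
        if item is None:
            break
        task_id, spec_name = item
        with status_lock:
            status[task_id] = "running"
        try:
            evaluate(spec_name)
            outcome = "completed"
        except Exception:
            outcome = "failed"
        with status_lock:
            status[task_id] = outcome


def check_status(task_id: int) -> str:
    """Return the last recorded status, or "unknown" for an unseen ID."""
    with status_lock:
        return status.get(task_id, "unknown")
```

To try it, start worker in a threading.Thread, call enqueue("my_scenario"), and poll check_status with the returned ID; putting None on the queue stops the worker.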
4. Discover scenarios
Use autocontext_scenarios to see what evaluation scenarios are available:
autocontext_scenarios()
autocontext_scenarios(family="agent_task")
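The optional family argument narrows the listing. A minimal sketch of that filtering, assuming each scenario is a simple (name, family) record; the Scenario class, the helper names, and the "demo_scenario" and "other_family" values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Scenario:
    """Hypothetical (name, family) record for one evaluation scenario."""

    name: str
    family: str


def list_scenarios(
    catalog: Iterable[Scenario], family: Optional[str] = None
) -> List[Scenario]:
    """Return every scenario, or only those in `family` when one is given."""
    return [s for s in catalog if family is None or s.family == family]


def group_by_family(catalog: Iterable[Scenario]) -> Dict[str, List[str]]:
    """Map each family to the names of its scenarios, preserving order."""
    groups: Dict[str, List[str]] = {}
    for s in catalog:
        groups.setdefault(s.family, []).append(s.name)
    return groups


catalog = [Scenario("my_scenario", "agent_task"), Scenario("demo_scenario", "other_family")]
print(list_scenarios(catalog, family="agent_task"))
# [Scenario(name='my_scenario', family='agent_task')]
```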
Configuration
The extension auto-detects configuration from these sources:
- Project config: .autoctx.json in the working directory (created via autoctx init)
- Environment variables:
  - AUTOCONTEXT_AGENT_PROVIDER or AUTOCONTEXT_PROVIDER: provider type
  - AUTOCONTEXT_AGENT_API_KEY or AUTOCONTEXT_API_KEY: provider API key
  - AUTOCONTEXT_AGENT_DEFAULT_MODEL or AUTOCONTEXT_MODEL: model override
  - AUTOCONTEXT_DB_PATH: SQLite database path override
- Pi provider: falls back to Pi's configured LLM provider
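One way to picture how these sources combine is the resolver below. It is a sketch under stated assumptions: it checks the project config, then the environment, then the Pi provider defaults, in the order the sources are listed above; it prefers the AUTOCONTEXT_AGENT_* name when both aliases are set; and the setting keys (provider, api_key, model, db_path) are hypothetical labels, not the documented schema of .autoctx.json.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# Setting key -> environment variable names documented by the skill, checked
# in this order. The keys themselves are hypothetical labels, not the real
# .autoctx.json schema.
ENV_ALIASES: Dict[str, Tuple[str, ...]] = {
    "provider": ("AUTOCONTEXT_AGENT_PROVIDER", "AUTOCONTEXT_PROVIDER"),
    "api_key": ("AUTOCONTEXT_AGENT_API_KEY", "AUTOCONTEXT_API_KEY"),
    "model": ("AUTOCONTEXT_AGENT_DEFAULT_MODEL", "AUTOCONTEXT_MODEL"),
    "db_path": ("AUTOCONTEXT_DB_PATH",),
}


def resolve_settings(
    project_config: Mapping[str, str],
    pi_defaults: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Resolve each setting from project config, then env vars, then Pi.

    This precedence order is an assumption for illustration.
    """
    env = os.environ if env is None else env
    resolved: Dict[str, Optional[str]] = {}
    for key, names in ENV_ALIASES.items():
        value = project_config.get(key)
        if value is None:
            # Take the first alias that is set to a non-empty value.
            value = next((env[name] for name in names if env.get(name)), None)
        if value is None:
            value = pi_defaults.get(key)
        resolved[key] = value
    return resolved
```

Passing env explicitly, as a plain dict, keeps the resolver testable without touching the real process environment.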
CLI Companion
For standalone usage outside Pi, install the autoctx CLI:
npm install -g autoctx
autoctx init
autoctx solve --description "your problem" --gens 5
autoctx simulate --description "your simulation" --runs 3
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
grid-ctf-ops
Operational knowledge for the grid_ctf scenario including strategy playbook, lessons learned, and resource references. Use when generating, evaluating, coaching, or debugging grid_ctf strategies.
grey-haven-prompt-engineering
Master 26 documented prompt engineering principles for crafting effective LLM prompts with 400%+ quality improvement. Includes templates, anti-patterns, and quality checklists for technical, learning, creative, and research tasks. Use when writing prompts for LLMs, improving AI response quality, training on prompting, designing agent instructions, or when user mentions 'prompt engineering', 'better prompts', 'LLM quality', 'prompt templates', 'AI prompts', 'prompt principles', or 'prompt optimization'.
grey-haven-tool-design
Design effective MCP tools and Claude Code integrations using the consolidation principle. Fewer, better-designed tools dramatically improve agent success rates. Use when creating MCP servers, designing tool interfaces, optimizing tool sets, or when user mentions 'tool design', 'MCP', 'fewer tools', 'tool consolidation', 'tool architecture', or 'tool optimization'.
grey-haven-documentation-alignment
6-phase verification system ensuring code matches documentation with automated alignment scoring (signature, type, behavior, error, example checks). Reduces onboarding friction 40%. Use when verifying code-docs alignment, onboarding developers, after code changes, pre-release documentation checks, or when user mentions 'docs out of sync', 'documentation verification', 'code-docs alignment', 'docs accuracy', 'documentation drift', or 'verify documentation'.
grey-haven-tdd-orchestration
Master TDD orchestration with multi-agent coordination, strict red-green-refactor enforcement, automated test generation, coverage tracking, and >90% coverage quality gates. Supports Claude Teams for parallel TDD workflows with plan approval gates, or falls back to sequential subagent coordination. Coordinates tdd-python, tdd-typescript, and test-generator agents. Use when implementing features with TDD workflow, coordinating multiple TDD agents, enforcing test-first development, orchestrating TDD teams, or when user mentions 'TDD workflow', 'test-first', 'TDD orchestration', 'multi-agent TDD', 'test coverage', or 'red-green-refactor'.
grey-haven-performance-optimization
Comprehensive performance analysis and optimization for algorithms (O(n²)→O(n)), databases (N+1 queries, indexes), React (memoization, virtual lists), bundles (code splitting), API caching, and memory leaks. 85%+ improvement rate. Use when application is slow, response times exceed SLA, high CPU/memory usage, performance budgets needed, or when user mentions 'performance', 'slow', 'optimization', 'bottleneck', 'speed up', 'latency', 'memory leak', or 'performance tuning'.