autocontext

Iterative strategy generation and evaluation system. Use when the user wants to evaluate agent output quality, run improvement loops, queue tasks for background evaluation, check run status, or discover available scenarios. Provides LLM-based judging with rubric-driven scoring.

Install this agent skill in your project:

```bash
npx add-skill https://github.com/greyhaven-ai/autocontext/tree/main/pi/skills/autocontext
```

SKILL.md

autocontext

autocontext is an iterative strategy generation and evaluation system that uses LLM-based judging to score and improve agent outputs.

Available Tools

  • autocontext_judge — Evaluate agent output against a rubric. Returns a 0–1 score with reasoning and per-dimension breakdowns.
  • autocontext_improve — Run a multi-round improvement loop. The agent output is judged, revised based on feedback, and re-evaluated until it meets the quality threshold or the maximum number of rounds is reached.
  • autocontext_queue — Enqueue a task for background evaluation by the task runner daemon.
  • autocontext_status — Check the status of runs and queued tasks.
  • autocontext_scenarios — List available evaluation scenarios and their families.

Quick Start

1. Evaluate output quality

Use autocontext_judge with a task prompt, the agent's output, and a rubric:

```
autocontext_judge(
  task_prompt="Write a Python function to parse CSV files",
  agent_output="def parse_csv(path): ...",
  rubric="Correctness, error handling, edge cases, documentation"
)
```
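The judge's result is described as a 0–1 score with reasoning and per-dimension breakdowns. The sketch below shows one hypothetical way a caller might hold such a result. `JudgeResult`, its fields, `rubric_dimensions`, and the unweighted mean used for the overall score are assumptions for illustration, not autocontext's schema or scoring rule. Splitting the rubric on commas simply mirrors how the example rubric lists its dimensions.

```python
from dataclasses import dataclass, field


@dataclass
class JudgeResult:
    """Hypothetical container for a rubric-driven judgment (not autocontext's schema)."""

    reasoning: str
    dimension_scores: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Each per-dimension score must lie in the 0-1 range the judge reports.
        for name, value in self.dimension_scores.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"score for {name!r} is outside 0-1: {value}")

    @property
    def score(self) -> float:
        # Assumption: overall score = unweighted mean of the dimension scores.
        if not self.dimension_scores:
            return 0.0
        return sum(self.dimension_scores.values()) / len(self.dimension_scores)


def rubric_dimensions(rubric: str) -> list[str]:
    """Split a comma-separated rubric into trimmed, non-empty dimension names."""
    return [part.strip() for part in rubric.split(",") if part.strip()]


# Toy scores for illustration only.
dims = rubric_dimensions("Correctness, error handling, edge cases, documentation")
result = JudgeResult(
    reasoning="Handles simple rows but ignores quoting and has no docstring.",
    dimension_scores=dict(zip(dims, [0.8, 0.4, 0.3, 0.1])),
)
print(dims)  # ['Correctness', 'error handling', 'edge cases', 'documentation']
print(round(result.score, 2))  # 0.4
```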

2. Improve output iteratively

Use autocontext_improve to automatically revise output through judge-guided feedback loops:

```
autocontext_improve(
  task_prompt="Write a Python function to parse CSV files",
  initial_output="def parse_csv(path): ...",
  rubric="Correctness, error handling, edge cases, documentation",
  max_rounds=5,
  quality_threshold=0.85
)
```
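The judge, revise, re-judge cycle can be sketched as a plain loop that stops once a judgment reaches quality_threshold or max_rounds judgments have been made. `improve`, `toy_judge`, and `toy_revise` are hypothetical stand-ins for the LLM judge and reviser, not the tool's implementation. Two choices here are assumptions the tool description does not settle: each judgment counts as one round, and the best-scoring output seen so far is returned rather than the last one.

```python
from typing import Callable

# Hypothetical signatures: a judge maps an output to (score in 0-1, feedback);
# a reviser maps (output, feedback) to a revised output.
Judge = Callable[[str], tuple[float, str]]
Reviser = Callable[[str, str], str]


def improve(
    initial_output: str,
    judge: Judge,
    revise: Reviser,
    max_rounds: int = 5,
    quality_threshold: float = 0.85,
) -> tuple[str, float, int]:
    """Judge-guided revision loop returning (best output, best score, rounds used)."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    output = initial_output
    best_output, best_score = output, float("-inf")
    for round_no in range(1, max_rounds + 1):
        score, feedback = judge(output)
        if score > best_score:
            best_output, best_score = output, score
        if score >= quality_threshold:
            return best_output, best_score, round_no  # threshold met: stop early
        if round_no < max_rounds:
            output = revise(output, feedback)  # revise only if another round remains
    return best_output, best_score, max_rounds


# Toy judge and reviser: each "# handled" marker raises the score by 0.2.
def toy_judge(text: str) -> tuple[float, str]:
    return min(1.0, 0.5 + 0.2 * text.count("# handled")), "add error handling"


def toy_revise(text: str, feedback: str) -> str:
    return text + "\n# handled"


out, score, rounds = improve("def parse_csv(path): ...", toy_judge, toy_revise)
print(rounds, score)  # 3 0.9
```

Keeping the best output, rather than the latest, guards against a revision that scores lower than its predecessor.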

3. Queue background tasks

Use autocontext_queue with a scenario name to enqueue evaluation tasks for asynchronous processing:

```
autocontext_queue(spec_name="my_scenario")
```

Check results later with autocontext_status.
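Queuing decouples submitting a task from evaluating it. In the sketch below, enqueue returns a task identifier immediately and status is polled later, with one worker thread from Python's standard threading module standing in for the task runner daemon. `TaskRunner`, its status fields, and the state names (queued, running, completed, failed) are hypothetical; the real daemon's storage and status values are not described in this document.

```python
import queue
import threading
import uuid
from typing import Callable


class TaskRunner:
    """Hypothetical in-memory stand-in for a background evaluation daemon."""

    def __init__(self, evaluate: Callable[[str], float]) -> None:
        self._evaluate = evaluate
        self._pending: "queue.Queue[tuple[str, str]]" = queue.Queue()
        self._lock = threading.Lock()
        self._status: dict[str, dict] = {}
        # A single daemon worker drains the queue in the background.
        threading.Thread(target=self._work, daemon=True).start()

    def enqueue(self, spec_name: str) -> str:
        task_id = uuid.uuid4().hex
        with self._lock:
            self._status[task_id] = {"spec_name": spec_name, "state": "queued"}
        self._pending.put((task_id, spec_name))
        return task_id

    def status(self, task_id: str) -> dict:
        with self._lock:
            return dict(self._status[task_id])  # copy so callers can't mutate state

    def join(self) -> None:
        """Block until every enqueued task has been processed."""
        self._pending.join()

    def _work(self) -> None:
        while True:
            task_id, spec_name = self._pending.get()
            with self._lock:
                self._status[task_id]["state"] = "running"
            try:
                update = {"state": "completed", "score": self._evaluate(spec_name)}
            except Exception as exc:  # record failures instead of killing the worker
                update = {"state": "failed", "error": str(exc)}
            with self._lock:
                self._status[task_id].update(update)
            self._pending.task_done()


runner = TaskRunner(evaluate=lambda spec: 0.75)  # toy evaluator
task_id = runner.enqueue("my_scenario")
runner.join()
print(runner.status(task_id))  # {'spec_name': 'my_scenario', 'state': 'completed', 'score': 0.75}
```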

4. Discover scenarios

Use autocontext_scenarios to see what evaluation scenarios are available:

```
autocontext_scenarios()
autocontext_scenarios(family="agent_task")
```
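Passing family narrows the listing to one scenario family. Here is a minimal sketch of that filter, assuming each scenario is a record with a name and a family. The `Scenario` type, `list_scenarios`, the scenario names, and the game family are hypothetical; only agent_task comes from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    name: str
    family: str


def list_scenarios(scenarios: list[Scenario], family: Optional[str] = None) -> list[Scenario]:
    """Return all scenarios, or only those in `family` when it is given."""
    if family is None:
        return list(scenarios)
    return [s for s in scenarios if s.family == family]


# Hypothetical catalog for illustration.
CATALOG = [
    Scenario("csv_parser_review", "agent_task"),
    Scenario("summary_quality", "agent_task"),
    Scenario("board_game_strategy", "game"),
]

print([s.name for s in list_scenarios(CATALOG, family="agent_task")])
# ['csv_parser_review', 'summary_quality']
```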

Configuration

The extension auto-detects configuration from these sources:

  1. Project config.autoctx.json in the working directory (created via autoctx init)
  2. Environment variables:
    • AUTOCONTEXT_AGENT_PROVIDER or AUTOCONTEXT_PROVIDER — Provider type
    • AUTOCONTEXT_AGENT_API_KEY or AUTOCONTEXT_API_KEY — Provider API key
    • AUTOCONTEXT_AGENT_DEFAULT_MODEL or AUTOCONTEXT_MODEL — Model override
    • AUTOCONTEXT_DB_PATH — SQLite database path override
  3. Pi provider — Falls back to Pi's configured LLM provider
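The three sources read as a fallback chain. The sketch below resolves provider, API key, and model under assumptions that this list does not settle:

- Sources are consulted in the listed order.
- Each AUTOCONTEXT_AGENT_* variable is checked before its shorter alias.
- The project config has already been parsed into a dict with the hypothetical keys provider, api_key, and model.

`resolve_config`, `resolve_setting`, and the Pi defaults mapping are illustrative, not autocontext functions.

```python
import os
from typing import Mapping, Optional

# Documented environment variable pairs; checking the AGENT_ form first is an assumption.
ENV_ALIASES = {
    "provider": ("AUTOCONTEXT_AGENT_PROVIDER", "AUTOCONTEXT_PROVIDER"),
    "api_key": ("AUTOCONTEXT_AGENT_API_KEY", "AUTOCONTEXT_API_KEY"),
    "model": ("AUTOCONTEXT_AGENT_DEFAULT_MODEL", "AUTOCONTEXT_MODEL"),
}


def resolve_setting(
    key: str,
    project_config: Mapping[str, str],
    env: Mapping[str, str],
    pi_defaults: Mapping[str, str],
) -> Optional[str]:
    """Return the first value found: project config, then env vars, then Pi defaults."""
    if project_config.get(key):
        return project_config[key]
    for var in ENV_ALIASES[key]:
        if env.get(var):
            return env[var]
    return pi_defaults.get(key)


def resolve_config(
    project_config: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
    pi_defaults: Optional[Mapping[str, str]] = None,
) -> dict[str, Optional[str]]:
    env = os.environ if env is None else env
    pi_defaults = pi_defaults or {}
    return {key: resolve_setting(key, project_config, env, pi_defaults) for key in ENV_ALIASES}


config = resolve_config(
    project_config={},  # toy run: no project config values
    env={"AUTOCONTEXT_PROVIDER": "example-provider", "AUTOCONTEXT_API_KEY": "test-key"},
    pi_defaults={"provider": "pi-provider", "model": "pi-model"},
)
print(config)  # {'provider': 'example-provider', 'api_key': 'test-key', 'model': 'pi-model'}
```

Passing env and pi_defaults explicitly keeps the resolution testable without touching the real process environment.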

CLI Companion

For standalone usage outside Pi, install the autoctx CLI:

```bash
npm install -g autoctx
autoctx init
autoctx solve --description "your problem" --gens 5
autoctx simulate --description "your simulation" --runs 3
```
