Agent skill

autonomous-agent-readiness

Assess a codebase's readiness for autonomous agent development and provide tailored recommendations. Use when asked to evaluate how well a project supports unattended agent execution, assess development practices for agent autonomy, audit infrastructure for agent reliability, or improve a codebase for autonomous agent workflows. Triggers on requests like "assess this project for agent readiness", "how autonomous-ready is this codebase", "evaluate agent infrastructure", or "improve development practices for agents".

Install this agent skill to your Project

npx add-skill https://github.com/petekp/agent-skills/tree/main/skills/autonomous-agent-readiness

SKILL.md

Autonomous Agent Readiness Assessment

Evaluate a codebase against proven patterns for autonomous agent execution and provide tailored recommendations.

Core Philosophy

Most agent failures are system design failures, not model failures. An agent that requires human approval at every step or depends on a developer's laptop being open is not autonomous. Autonomy is an infrastructure decision.

Assessment Workflow

Phase 1: Discovery

Gather information about the project's current state:

  1. Examine project structure

    • Look for CI/CD configuration (.github/workflows/, Jenkinsfile, .gitlab-ci.yml)
    • Check for containerization (Dockerfile, docker-compose.yml, devcontainer.json)
    • Identify test infrastructure (tests/, __tests__/, test config files)
    • Find environment management (.env.example, requirements.txt, package.json)
  2. Review development workflow

    • Read contributing guidelines, README, or developer docs
    • Check for sandbox/isolation patterns
    • Look for database setup scripts or fixtures
    • Identify how dependencies are managed
  3. Assess current automation

    • Review existing CI/CD pipelines
    • Check for automated testing patterns
    • Look for environment provisioning scripts
    • Identify cleanup/teardown procedures
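The file checks in step 1 can be partly automated. The following is a minimal sketch, assuming the marker paths named in the checklist are a sufficient first pass; the `MARKERS` table and the `discover` function are illustrative names, not part of the skill itself.

```python
from pathlib import Path

# Marker paths taken from the Phase 1 checklist. The nested devcontainer
# location is an assumption added here because that file often lives in
# a .devcontainer/ directory.
MARKERS = {
    "ci_cd": [".github/workflows", "Jenkinsfile", ".gitlab-ci.yml"],
    "containerization": [
        "Dockerfile",
        "docker-compose.yml",
        "devcontainer.json",
        ".devcontainer/devcontainer.json",
    ],
    "tests": ["tests", "__tests__"],
    "environment": [".env.example", "requirements.txt", "package.json"],
}


def discover(root: str) -> dict[str, list[str]]:
    """Return, per category, the marker paths that exist under root."""
    base = Path(root)
    return {
        category: [marker for marker in paths if (base / marker).exists()]
        for category, paths in MARKERS.items()
    }
```

Calling `discover(".")` from a repository root gives a quick inventory. An empty list for a category is a prompt to look closer, not proof that the capability is missing.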

Phase 2: Evaluate Against Principles

Score the project (0-3) on each dimension. See references/assessment-criteria.md for detailed rubrics.

| Dimension | What to Look For |
|-----------|------------------|
| Sandbox Isolation | Ephemeral environments, container support, clean state per run |
| Database Independence | Local DB setup, migrations in code, no external DB dependencies |
| Environment Reproducibility | Explicit dependencies, no hidden state, deterministic setup |
| Session Independence | Remote execution capability, no user session dependencies |
| Outcome-Oriented Design | Clear acceptance criteria, minimal procedural coupling |
| Direct Interfaces | CLI-first tools, OS primitives, minimal abstraction layers |
| Minimal Framework Overhead | Simple interfaces, no heavy orchestration, composable CLI tools |
| Explicit State | Workspace directories, file-based artifacts, inspectable logs |
| Benchmarking | Measurable quality criteria, automated verification |
| Cost Awareness | Resource limits, usage tracking, explicit provisioning |
| Verifiable Output | Automated validation, deterministic results, clear exit codes |
| Infrastructure-Bounded Permissions | System-enforced constraints, least-privilege, no runtime prompts |
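The scoring arithmetic is simple enough to check mechanically. Here is a small sketch, assuming scores are collected in a plain dictionary keyed by dimension name; the `status` and `summarize` helpers are hypothetical names, and the status labels follow the legend used in the output template (0-1 = needs work, 2 = adequate, 3 = strong).

```python
DIMENSIONS = [
    "Sandbox Isolation",
    "Database Independence",
    "Environment Reproducibility",
    "Session Independence",
    "Outcome-Oriented Design",
    "Direct Interfaces",
    "Minimal Framework Overhead",
    "Explicit State",
    "Benchmarking",
    "Cost Awareness",
    "Verifiable Output",
    "Infrastructure-Bounded Permissions",
]
MAX_PER_DIMENSION = 3


def status(score: int) -> str:
    """Map a 0-3 score to the status label used in the report."""
    if score <= 1:
        return "needs work"
    if score == 2:
        return "adequate"
    return "strong"


def summarize(scores: dict[str, int]) -> tuple[int, int]:
    """Return (total, maximum) after checking every dimension is scored 0-3."""
    for name in DIMENSIONS:
        if name not in scores:
            raise ValueError(f"missing score for {name}")
        if not 0 <= scores[name] <= MAX_PER_DIMENSION:
            raise ValueError(f"score for {name} must be between 0 and 3")
    total = sum(scores[name] for name in DIMENSIONS)
    return total, MAX_PER_DIMENSION * len(DIMENSIONS)
```

With twelve dimensions scored 0-3, the maximum is 36, which matches the overall score in the report template.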

Phase 3: Generate Recommendations

For each dimension scoring below 2, provide:

  1. Current state: What exists today
  2. Gap: What's missing for autonomous execution
  3. Recommendation: Specific, actionable improvement
  4. Priority: High/Medium/Low based on impact and effort

Tailor recommendations to the project's:

  • Technology stack
  • Team size and workflow
  • Existing infrastructure
  • Deployment targets
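The selection rule above (only dimensions scoring below 2 get a recommendation) could be represented like this. It is a sketch under assumptions: the `Recommendation` record and the High-before-Medium-before-Low ordering are illustrative choices, not something the skill prescribes.

```python
from dataclasses import dataclass

# Assumed ordering for presentation: High first, then Medium, then Low.
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class Recommendation:
    dimension: str
    score: int            # 0-3 score from Phase 2
    current_state: str    # what exists today
    gap: str              # what is missing for autonomous execution
    recommendation: str   # specific, actionable improvement
    priority: str         # "High", "Medium", or "Low"


def select_recommendations(items: list[Recommendation]) -> list[Recommendation]:
    """Keep dimensions scoring below 2, highest priority and lowest score first."""
    flagged = [item for item in items if item.score < 2]
    return sorted(flagged, key=lambda r: (PRIORITY_ORDER[r.priority], r.score))
```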

Output Format

Produce the report in markdown using the following template:
# Autonomous Agent Readiness Assessment

## Project: [name]
## Assessment Date: [date]

## Executive Summary

[1-2 paragraphs summarizing overall readiness and top priorities]

**Overall Readiness Score: X/36** (sum of dimension scores)

## Dimension Scores

| Dimension | Score | Status |
|-----------|-------|--------|
| Sandbox Isolation | X/3 | [emoji] |
| Database Independence | X/3 | [emoji] |
| ... | ... | ... |

Status: 0-1 = needs work, 2 = adequate, 3 = strong

## Detailed Findings

### [Dimension Name] (X/3)

**Current State:**
[What exists]

**Gap:**
[What's missing]

**Recommendation:**
[Specific action]

**Priority:** [High/Medium/Low]

[Repeat for each dimension]

## Prioritized Action Plan

### Immediate (This Week)
1. [Highest impact, lowest effort items]

### Short-term (This Month)
1. [Important foundational changes]

### Medium-term (This Quarter)
1. [Larger infrastructure investments]

## Quick Wins

[2-3 changes that can be made today with minimal effort]

Key Principles Reference

Sandbox Everything

Every agent run executes in its own ephemeral, isolated, disposable environment that provides a clean state, a writable filesystem, command execution, and scoped network access. The environment is destroyed once the output has been verified.
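As a rough illustration of the disposable-workspace part of this principle, the sketch below runs a command in a temporary directory that is deleted afterwards. It assumes filesystem-level disposal only: it does not isolate processes or scope network access, which in practice would come from containers or VMs. The function name `run_in_ephemeral_workspace` is hypothetical.

```python
import subprocess
import tempfile


def run_in_ephemeral_workspace(
    command: list[str], timeout_s: float = 60.0
) -> tuple[int, str, str]:
    """Run command in a fresh temporary directory and delete it afterwards.

    Returns (exit code, captured stdout, path of the now-deleted workspace).
    """
    with tempfile.TemporaryDirectory(prefix="agent-run-") as workdir:
        result = subprocess.run(
            command,
            cwd=workdir,          # the command sees only a clean directory
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return result.returncode, result.stdout, workdir
```

Anything worth keeping has to be copied out (or verified) before the `with` block exits, which mirrors the "destroy after verified output" rule.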

No External Databases

Agents create their own databases inside the sandbox: install packages on demand, spin up databases locally, run migrations, seed data explicitly, and tear everything down at the end. This gives reproducible runs with no shared state.
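The create, migrate, seed, and tear-down cycle can be sketched with SQLite from the Python standard library. This is an assumption for illustration: a real project would use its own database engine and migration tool, and the schema and seed rows below are made up.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical migrations and seed data, kept in code so every run is identical.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "CREATE INDEX idx_users_name ON users (name)",
]
SEED_USERS = [("ada",), ("grace",)]


def count_users_in_fresh_db() -> int:
    """Create a database in a throwaway directory, migrate, seed, query, tear down."""
    with tempfile.TemporaryDirectory(prefix="agent-db-") as workdir:
        conn = sqlite3.connect(Path(workdir) / "app.db")
        try:
            for statement in MIGRATIONS:
                conn.execute(statement)
            conn.executemany("INSERT INTO users (name) VALUES (?)", SEED_USERS)
            conn.commit()
            (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
            return count
        finally:
            # Close before the directory is removed so cleanup succeeds everywhere.
            conn.close()
```

Because nothing survives the call, running it twice gives the same answer both times.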

Environment Garbage Is Real

Long-lived environments accumulate stray files, half-installed packages, cached state, and orphaned processes. Fresh environments surface correctness problems; persistent environments obscure them.

Run Independently of User Sessions

The agent loop is decoupled from browser tabs, terminal sessions, and developer machines. Start a task, close the laptop, and return to completed artifacts. Control runs via wall-clock limits, resource limits, and automatic cleanup.

Define Outcomes, Not Procedures

Avoid step-by-step plans and tool-level micromanagement. Define the desired outcome, acceptance criteria, and constraints. Planning and execution belong to the agent.

Direct, Low-Level Interfaces

Give agents direct access to command execution, persistent files, and network requests. Prefer OS primitives over abstraction layers. CLI-first systems are easier to debug and more capable than they look.

Persist State Explicitly

Provide a writable workspace directory for intermediate results, logs, partial outputs, and planning artifacts. Files are inspectable, deterministic, and enable post-run analysis.
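One way to make state explicit is a small workspace helper that appends structured log lines and writes artifacts as plain files. This is a sketch only: the `Workspace` class, the `run.log.jsonl` file name, and the record fields are assumptions chosen for illustration.

```python
import json
import time
from pathlib import Path


class Workspace:
    """A directory holding logs and artifacts for one agent run."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.log_path = self.root / "run.log.jsonl"

    def log(self, event: str, **fields: object) -> None:
        """Append one JSON record per line so the log stays greppable."""
        record = {"ts": time.time(), "event": event, **fields}
        with self.log_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")

    def save_artifact(self, name: str, content: str) -> Path:
        """Write an intermediate or final output under the workspace root."""
        path = self.root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        return path

    def read_log(self) -> list[dict]:
        """Load all log records for post-run analysis."""
        if not self.log_path.exists():
            return []
        lines = self.log_path.read_text(encoding="utf-8").splitlines()
        return [json.loads(line) for line in lines]
```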

Benchmarks Early

Introduce benchmarks as early as possible. Use representative, repeatable metrics for quality. Even crude benchmarks beat none.
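A crude benchmark can be as small as a pass rate over a fixed set of cases. The sketch below assumes the thing under test is a plain Python callable and that exact equality is an acceptable check; `pass_rate` and the case format are hypothetical.

```python
from typing import Any, Callable, Sequence


def pass_rate(
    candidate: Callable[..., Any],
    cases: Sequence[tuple[tuple, Any]],
) -> float:
    """Return the fraction of (args, expected) cases the candidate gets right.

    A case counts as failed if the candidate raises an exception.
    """
    if not cases:
        raise ValueError("at least one benchmark case is required")
    passed = 0
    for args, expected in cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # an exception is simply a failed case
    return passed / len(cases)
```

Tracking this single number across runs is already enough to notice regressions.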

Minimal Framework Overhead

Most real-world agent workflows reduce to running commands, reading and writing files, and making network calls. CLI-first systems are easier to reason about and debug, and they are more capable than they look. When an abstraction layer is more complex than the task, it becomes the bottleneck.

Plan for Cost

Budget token usage, allocate compute explicitly, and enforce limits at the system level. Autonomy shifts where costs appear; it doesn't remove them.
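System-enforced limits mean the run stops when the budget is spent, regardless of what the agent intends. Here is a minimal sketch of that idea for token usage; the `TokenBudget` class and `BudgetExceeded` exception are hypothetical, and how token counts are obtained is left to whatever model API is in use.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push usage past the configured limit."""


class TokenBudget:
    """Track token usage against a hard limit set before the run starts."""

    def __init__(self, limit: int) -> None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def charge(self, tokens: int) -> None:
        """Record usage, refusing any charge that would exceed the limit."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            raise BudgetExceeded(
                f"charge of {tokens} exceeds remaining budget of {self.remaining}"
            )
        self.used += tokens
```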

Verifiable Output

Output must be verifiable without human review. Automated validation, deterministic results, clear success/failure exit codes. If quality cannot be measured, it cannot be trusted in autonomous operation.
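A validator with clear exit codes might look like the sketch below. It assumes the agent writes a JSON result file; the required keys `status` and `artifacts` and the exit-code convention (0 valid, 1 invalid, 2 usage error) are illustrative assumptions, not something the skill defines.

```python
import json
import sys
from pathlib import Path

# Hypothetical contract for the agent's result file.
REQUIRED_KEYS = ("status", "artifacts")


def validate_result(path: str) -> list[str]:
    """Return a list of problems with the result file; empty means valid."""
    file = Path(path)
    if not file.is_file():
        return [f"missing result file: {path}"]
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    return [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]


def main(argv: list[str]) -> int:
    """Exit code 0 = valid, 1 = validation failed, 2 = usage error."""
    if len(argv) != 2:
        print("usage: validate.py RESULT_JSON", file=sys.stderr)
        return 2
    errors = validate_result(argv[1])
    for error in errors:
        print(error, file=sys.stderr)
    return 1 if errors else 0
```

To use it as a script, call `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard so CI or an orchestrator can branch on the exit code alone.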

Infrastructure-Bounded Permissions

Permissions are constrained by the environment, not by prompts or runtime decisions. Explicit capability grants, sandbox restrictions on dangerous operations, least-privilege by default. No runtime permission prompts required.
