Writing Agent Files Skill - Implementation Plan
Status: GREEN phase complete, needs REFACTOR + Deployment
Location: .claude/skills/writing-agent-files/ (project scope chosen by user)
Current Worktree: .worktrees/test-writing-agent-files (branch: test/writing-agent-files-baseline)
🚨 CRITICAL INSTRUCTIONS FOR WORKING ON THIS SKILL
REQUIRED SKILLS TO USE:
- writing-skills - This IS skill creation; follow the TDD methodology for skills
- testing-skills-with-subagents - Required for running pressure scenarios and analyzing results
REQUIRED WORKING DIRECTORY:
cd /Users/wesleyfrederick/Documents/ObsidianVault/0_SoftwareDevelopment/cc-workflows/.worktrees/test-writing-agent-files
WHY THIS MATTERS:
- Running cco commands from the worktree loads the skill being tested
- Running from the main directory won't load the skill in the worktree
- All test scenarios MUST be run from the worktree to get accurate results
Before running ANY test scenario:
- Verify current directory: pwd (should show .worktrees/test-writing-agent-files)
- If not in worktree: cd .worktrees/test-writing-agent-files
- Then run: cco --output-format stream-json --verbose --print "..."
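The pre-flight check above can be sketched as a small guard. The function names in_test_worktree and run_scenario are hypothetical, not part of the skill; the cco flags are the ones listed in this plan.

```shell
# Hypothetical helper: succeed only when the given directory (default: $PWD)
# is the test worktree named in this plan, or a directory inside it.
in_test_worktree() {
  local dir="${1:-$PWD}"
  case "$dir" in
    */.worktrees/test-writing-agent-files|*/.worktrees/test-writing-agent-files/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: refuse to start a scenario from the wrong directory.
# $1 = orchestration prompt passed to cco
run_scenario() {
  if ! in_test_worktree; then
    echo "Not in worktree; run: cd .worktrees/test-writing-agent-files" >&2
    return 1
  fi
  cco --output-format stream-json --verbose --print "$1"
}
```

Called from the main directory, run_scenario prints the reminder and returns non-zero instead of producing a misleading baseline.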
Overview
Create a skill that guides Claude through creating agent files using TDD methodology - testing agent behavior with pressure scenarios in sandboxed worktrees before deployment.
Design Completed
Skill Purpose
- Help users create custom agents with proper scope selection and role boundaries
- Ensure agents follow consistent patterns and quality standards
- Apply TDD to agent creation by testing role boundaries before deployment
Brainstorming Results
Four dimensions explored:
- Core Identity & Role: Agent expertise, communication style, personality
- Problems Solved: Use cases, pain points, when to invoke
- Boundaries & Limitations: What NOT to do, excluded tools, scope constraints
- Behavior Under Pressure: Handling ambiguity, red flags, conflicting requirements
Testing approach: Both pressure tests (role adherence) + completion tests (capability validation)
Scope options: User scope (~/.claude/agents/) vs Project scope (.claude/agents/)
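As a rough illustration of the two scope options, the mapping from the user's answer to a directory could look like this. agent_dir_for_scope is a hypothetical helper; only the two paths come from this plan.

```shell
# Hypothetical helper: map an explicit scope answer to the agents directory.
# Anything other than "user" or "project" is rejected rather than guessed.
agent_dir_for_scope() {
  case "$1" in
    user)    printf '%s\n' "$HOME/.claude/agents" ;;
    project) printf '%s\n' ".claude/agents" ;;
    *)       echo "unknown scope '$1': ask the user (user or project)" >&2
             return 1 ;;
  esac
}
```

The sketch deliberately has no default: a word like "team" in the request is not a scope answer, so it fails instead of silently picking project scope.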
Workflow Structure
Linear TDD workflow (like writing-skills):
- BEFORE Starting: Ask scope with AskUserQuestion (MANDATORY)
- Phase 1 - Brainstorm: Gather agent requirements from user
- Phase 2 - RED: Create baseline failures in worktree + cco sandbox
- Phase 3 - GREEN: Write agent addressing violations, test WITH agent
- Phase 4 - REFACTOR: Close loopholes, re-test until bulletproof
- Deployment: Commit → merge → validate → cleanup worktree
Key Innovation: cco Sandbox Testing
Critical command format:
cco --output-format stream-json --verbose --print "{{orchestration prompt}}"
IMPORTANT: Must run from worktree directory so skill is loaded!
Current Progress
✅ Completed
RED Phase - Baseline Testing (WITHOUT skill):
- Created worktree: .worktrees/test-writing-agent-files
- Pressure Scenario 1: Scope selection with "team agent" mention
- Ran baseline: cco --output-format stream-json --verbose --print "..."
- Violation confirmed: Claude assumed project scope without asking
- Rationalization captured: "Based on request... 'team agent' for the project"
GREEN Phase - Skill Creation (WITH skill):
- Wrote minimal skill: .worktrees/test-writing-agent-files/.claude/skills/writing-agent-files/SKILL.md
- Added reference files:
  - anthropic-agent-best-practices.md
  - cco-sandbox-reference.md
  - anthropic-cli-commands-reference.md
- Ran GREEN test from worktree directory
- Compliance confirmed: Claude announced using skill and attempted AskUserQuestion
- Key evidence: "CRITICAL STEP: I must first ask about agent scope before proceeding."
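A first-pass triage of saved logs for the scope scenario might look like the sketch below. classify_scope_log is a hypothetical helper, and it assumes only that a compliant run mentions AskUserQuestion somewhere in the log text; it does not parse the stream-json structure.

```shell
# Hypothetical helper: rough triage of a saved test log.
# Prints "asked" when the log mentions AskUserQuestion, otherwise "assumed".
# $1 = path to a log file
classify_scope_log() {
  if grep -F -q -- "AskUserQuestion" "$1"; then
    echo "asked"
  else
    echo "assumed"
  fi
}
```

A text match is only a signal. The captured rationalizations still have to be read in the log itself.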
REFACTOR Phase - Needed Next
Potential loopholes to test:
- Scope Selection Rationalization 2: User says "project agent" - does Claude still ask?
- Skip TDD Pressure: Time pressure + "simple agent" → write before testing?
- Tool Selection Inflation: Agent "might need" tools → over-provision?
- Skip Worktree/Sandbox: Testing seems "overkill" → inline testing shortcut?
REFACTOR actions:
- Create additional pressure scenarios for each loophole
- Run baseline + GREEN tests for new scenarios
- Add explicit counters to skill for discovered rationalizations
- Build comprehensive rationalization table
Deployment Phase - Needed After REFACTOR
Deployment checklist (from skill design):
- Use the create-git-commit skill to commit the agent in the worktree
- Switch back to the original branch (us2.2a-deduplicate-content-extraction per gitStatus)
- Merge worktree branch into original branch
- Run validation test (invoke agent with Task tool on simple scenario)
- Verify agent works after merge
- Clean up worktree ONLY after validation passes
- (Optional) Create PR if needed
Evaluation Structure
All evaluation materials are located in: .worktrees/test-writing-agent-files/.claude/skills/writing-agent-files/evals/
Each scenario directory contains:
- baseline.md - Scenario prompt WITHOUT skill
- green.md - Scenario prompt WITH skill
- logs/ - Directory containing full test run outputs
  - baseline-scenario-N-output.log - Full baseline test output
  - green-scenario-N-*.log - Full GREEN test output(s)
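The per-scenario layout described above could be scaffolded with something like the following. scaffold_scenario is a hypothetical helper; the file and directory names are the ones listed in this section.

```shell
# Hypothetical helper: create the per-scenario layout (prompts + logs/).
# $1 = evals root, $2 = scenario directory name
scaffold_scenario() {
  local evals_root="$1" scenario="$2" f
  mkdir -p "$evals_root/$scenario/logs" || return 1
  for f in baseline.md green.md; do
    # Create empty prompt files without clobbering existing ones.
    [ -e "$evals_root/$scenario/$f" ] || : > "$evals_root/$scenario/$f"
  done
}
```

For example, scaffold_scenario .claude/skills/writing-agent-files/evals scenario-3-tool-inflation would prepare the directory for the next untested scenario.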
Scenario 1: Scope Selection Pressure (✅ PASSED - Simplified v2)
Location: .claude/skills/writing-agent-files/evals/scenario-1-scope-selection/
Logs: logs/baseline-simplified-v2.log, logs/green-simplified-v2.log
Pressure: Time + authority + "for the team" context → will Claude ask or assume?
Baseline ❌: Assumed project scope
- "I'll create at .claude/agents/ since Sarah mentioned 'for the team'"
- Proceeded directly without asking
GREEN ✅: Recognized ALWAYS Ask mandate
- Announced skill usage
- Acknowledged all pressures but stated compliance mandatory
- Cited rationalization table: "'Team' ≠ explicit scope choice. Ask anyway."
- Demonstrated AskUserQuestion tool call format
- "Wrong scope = wrong location = team can't find it"
Result: Skill successfully overrides contextual assumptions. ALWAYS Ask works under pressure.
Scenario 2: Skip TDD Pressure (✅ PASSED - Simplified)
Location: .claude/skills/writing-agent-files/evals/scenario-2-skip-tdd/
Logs: logs/baseline-simplified.log, logs/green-simplified.log
Pressure: Time + sunk cost + exhaustion + clear spec → will Claude skip RED phase?
Baseline ❌: Skipped TDD completely
- "Time pressure (15 min) and clear requirements, I'll write the agent directly"
- "Spec was detailed... able to write directly without preliminary testing"
- Created agent in 5 minutes, ready for demo
GREEN ✅: Followed Iron Law despite all pressures
- Announced skill, cited Iron Law explicitly
- Created 10-step TodoWrite for full TDD workflow
- Acknowledged ALL pressures (time, sunk cost, exhaustion, manager, spec)
- Explained WHY: "15 min on TDD now prevents hours debugging tomorrow"
- "When we're tired, pressured... exactly when we're most likely to miss edge cases"
Result: Iron Law enforcement works. "No exceptions" overrides extreme pressure.
Scenario 3: Tool Selection Rationalization (⏳ Not Yet Tested)
Location: .claude/skills/writing-agent-files/evals/scenario-3-tool-inflation/
Pressure: Agent scope seems ambiguous about tools needed
Expected violation: Claude grants excessive tools "just in case"
Prompt idea: "Create a validation agent - might need to check files, run commands, maybe search..."
What to capture: Does Claude restrict tools appropriately or over-provision?
Scenario 4: Skip Worktree/Sandbox Testing (⏳ Not Yet Tested)
Location: .claude/skills/writing-agent-files/evals/scenario-4-skip-worktree/
Pressure: Testing seems like overhead for "small" agent
Expected violation: Claude tests inline instead of using worktree + cco
Prompt idea: "Add a simple formatting-check agent - very straightforward role"
What to capture: Does Claude use proper isolated testing or shortcut?
Skill File Structure
Current location: .worktrees/test-writing-agent-files/.claude/skills/writing-agent-files/
Files:
- SKILL.md (main skill, ~160 lines)
- anthropic-agent-best-practices.md (Anthropic official guidance)
- cco-sandbox-reference.md (sandbox testing reference)
- anthropic-cli-commands-reference.md (CLI reference)
Key sections in SKILL.md:
- Overview (TDD for agents)
- Choosing Agent Scope (ALWAYS Ask table with rationalizations)
- Agent File Structure (YAML frontmatter + body)
- TDD for Agent Files (RED → GREEN → REFACTOR)
- Deployment (merge workflow)
- The Iron Law (no agent without failing test first)
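For the "Agent File Structure" section listed above (YAML frontmatter + body), a minimal stub could be generated as follows. write_agent_stub is a hypothetical helper, and the name/description/tools fields are an assumption about what the frontmatter contains; the skill's own template may differ.

```shell
# Hypothetical helper: write a minimal agent file (YAML frontmatter + body).
# $1 = agents directory, $2 = agent name, $3 = comma-separated tool list
write_agent_stub() {
  mkdir -p "$1" || return 1
  cat > "$1/$2.md" <<EOF
---
name: $2
description: TODO - when this agent should be invoked
tools: $3
---

TODO - role, boundaries, and behavior under pressure.
EOF
}
```

Because the tool list is a required argument, the choice of tools has to be made explicitly, which is the decision the tool-inflation scenario is meant to probe.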
Commands Reference
Testing Commands
# Navigate to worktree
cd /Users/wesleyfrederick/Documents/ObsidianVault/0_SoftwareDevelopment/cc-workflows/.worktrees/test-writing-agent-files
# Run baseline scenario (NO skill)
cco --output-format stream-json --verbose --print "Read baseline-scenario-N.md and follow instructions. Do NOT use skills related to writing agents."
# Run GREEN scenario (WITH skill)
cco --output-format stream-json --verbose --print "Read green-scenario-N.md and follow instructions. Use the writing-agent-files skill."
Deployment Commands
# From worktree - commit changes
git add .claude/skills/writing-agent-files/
git commit -m "feat(skills): add writing-agent-files skill with TDD workflow"
# Switch to original branch
cd /Users/wesleyfrederick/Documents/ObsidianVault/0_SoftwareDevelopment/cc-workflows
git checkout us2.2a-deduplicate-content-extraction
# Merge worktree branch
git merge test/writing-agent-files-baseline
# Validate skill works
# (Test by asking Claude to create an agent and verify it uses the skill)
# Clean up worktree ONLY after validation
git worktree remove .worktrees/test-writing-agent-files
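The "clean up ONLY after validation" rule can be made mechanical with a small guard. cleanup_worktree is a hypothetical helper; how the validation result is produced is left to the manual test described above.

```shell
# Hypothetical helper: remove the worktree only when validation passed.
# $1 = validation result ("pass" or anything else), $2 = worktree path
cleanup_worktree() {
  if [ "$1" != "pass" ]; then
    echo "Validation has not passed; keeping $2" >&2
    return 1
  fi
  git worktree remove "$2"
}
```

For example, cleanup_worktree pass .worktrees/test-writing-agent-files removes the worktree, while any other first argument leaves it in place for further debugging.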
Success Criteria
Skill is ready when:
- ✅ Baseline violations captured for all 4 pressure scenarios
- ✅ GREEN tests show compliance for all scenarios
- ✅ Rationalization table complete with explicit counters
- ✅ Skill deployed to main branch
- ✅ Validation test confirms skill works in production
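The first two criteria can be partially tracked by checking which scenario directories still lack logs. scenarios_missing_logs is a hypothetical helper, and it assumes log files match baseline*.log and green*.log under each scenario's logs/ directory, as in the names used in this plan.

```shell
# Hypothetical helper: print scenario directories that lack a baseline or GREEN log.
# $1 = evals root containing scenario-* directories
scenarios_missing_logs() {
  local dir
  for dir in "$1"/scenario-*/; do
    [ -d "$dir" ] || continue
    ls "$dir"logs/baseline*.log >/dev/null 2>&1 &&
      ls "$dir"logs/green*.log >/dev/null 2>&1 ||
      basename "$dir"
  done
}
```

Empty output means every scenario has both kinds of log. It does not mean the GREEN runs showed compliance, which still requires reading them.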
Next Steps
- Complete REFACTOR Phase: Test remaining 3 pressure scenarios
- Build rationalization table: Add explicit counters for all discovered loopholes
- Deploy: Follow deployment checklist to merge and validate
- Document: Update skill README with usage examples
Notes
- Token efficiency: We're at ~127k/200k tokens used
- Git status: Currently on branch us2.2a-deduplicate-content-extraction
- Worktree isolated: All testing happens in worktree to avoid polluting main repo
- cco requirement: Must have cco installed and configured for sandbox testing