Agent skill
workflow-improver
Evaluate session using OpenAI eval-skills framework (Outcome/Process/Style/Efficiency). Analyzes session transcript vs Claude Code config to score performance and generate improvement recommendations. Creates GitHub issue with rubric scores and actionable plan.
Install this agent skill to your Project
npx add-skill https://github.com/settlemint-archive/agent-marketplace/tree/main/.agents/skills-local/workflow-improver
SKILL.md
Workflow Improver
Analyze the current session transcript alongside Claude Code configuration to compare what was intended versus what actually happened. Generate improvement recommendations and create a GitHub issue.
Session Discovery
Claude Code stores session transcripts at:
~/.claude/projects/{project-path-encoded}/{session-id}.jsonl
Find the current session:
# Project path encoded (replace / with -)
!`pwd | tr '/' '-' | sed 's/^-//'`
# Find session directory
!`ls -t ~/.claude/projects/ | head -20`
# Current session ID
${CLAUDE_SESSION_ID}
Read the session transcript:
Use the Read tool to read: ~/.claude/projects/!pwd | tr '/' '-' | sed 's/^-//'/${CLAUDE_SESSION_ID}.jsonl
Analysis Framework (OpenAI Eval-Skills)
1. Outcome Goals (Task Completion)
Analyze the transcript for:
- User requests: What did the user ask for?
- Final artifacts: What was produced?
- Completion status: Was the task fully completed?
Checklist:
- User's primary request fulfilled
- All acceptance criteria met
- No partial implementations left
2. Process Goals (Correct Steps)
Analyze the transcript for:
- Skill loading: Were skills loaded? At the right time?
- Gate compliance: Were all gates passed? Any skipped?
- Tool selection: Were the right tools used?
Checklist:
- Required skills loaded (TDD, verification, etc.)
- All gates output with proofs
- Tools used appropriately (no grep when Grep tool available, etc.)
- Task management used for multi-step work
3. Style Goals (Convention Conformance)
Analyze the transcript for:
- Iteration quality: Were iterations meaningful or boilerplate?
- Output format: Did responses follow required formats?
- Documentation: Were changes properly documented?
Checklist:
- Gate outputs follow required format
- Iteration counts tracked
- Evidence provided for each gate checkbox
4. Efficiency Goals (Resource Optimization)
Analyze the transcript for:
- Context management: Was context used efficiently?
- Tool calls: Were there unnecessary/redundant calls?
- Parallelization: Were parallel opportunities exploited?
Checklist:
- No redundant file reads
- Parallel tasks dispatched in parallel
- Explore agent used for broad searches
Configuration Files to Review
Read these files to understand the intended workflow:
- CLAUDE.md:
./CLAUDE.md - Crew skill:
.agents/skills-local/crew-claude/SKILL.md(contains workflows, hard requirements, anti-patterns, and skill routing)
Scoring Rubric
For each category, assign a score (0-100) and pass/fail:
| Category | Score Range | Pass Threshold |
|---|---|---|
| Outcome | 0-100 | ≥70 |
| Process | 0-100 | ≥60 |
| Style | 0-100 | ≥50 |
| Efficiency | 0-100 | ≥50 |
Scoring Guidelines:
Outcome:
- 90-100: All requests fulfilled, high quality output
- 70-89: Primary request fulfilled, minor gaps
- 50-69: Partial completion, significant gaps
- 0-49: Failed to complete primary request
Process:
- 90-100: All skills loaded, all gates passed with proofs
- 70-89: Most skills loaded, gates mostly complete
- 50-69: Some skills missing, gates incomplete
- 0-49: Major process violations
Style:
- 90-100: Perfect format compliance, meaningful iterations
- 70-89: Good format, some boilerplate iterations
- 50-69: Format issues, iterations lack depth
- 0-49: Format violations, no meaningful iteration
Efficiency:
- 90-100: Optimal tool usage, perfect parallelization
- 70-89: Good efficiency, minor redundancy
- 50-69: Some wasted operations
- 0-49: Significant inefficiency
Output Format
Generate a structured analysis:
{
"session_id": "${CLAUDE_SESSION_ID}",
"outcome": { "score": <N>, "pass": <bool>, "notes": "<summary>" },
"process": { "score": <N>, "pass": <bool>, "notes": "<summary>" },
"style": { "score": <N>, "pass": <bool>, "notes": "<summary>" },
"efficiency": { "score": <N>, "pass": <bool>, "notes": "<summary>" },
"overall": <average>,
"improvements": [
{ "priority": 1, "category": "<O/P/S/E>", "file": "<path>", "suggestion": "<change>" }
]
}
GitHub Issue Creation
After analysis, create a GitHub issue:
gh issue create --repo settlemint/agent-marketplace \
--title "Workflow Improvement: Session ${CLAUDE_SESSION_ID}" \
--label "workflow-improvement,session-memory" \
--body "$(cat <<'EOF'
## Workflow Improvement Recommendations
**Session analyzed:** ${CLAUDE_SESSION_ID}
**Project:** !`basename $(pwd)`
**Date:** !`date +%Y-%m-%d`
### Session Evaluation Scores
| Category | Score | Pass | Notes |
|----------|-------|------|-------|
| Outcome | <score>/100 | <emoji> | <notes> |
| Process | <score>/100 | <emoji> | <notes> |
| Style | <score>/100 | <emoji> | <notes> |
| Efficiency | <score>/100 | <emoji> | <notes> |
| **Overall** | **<avg>** | | |
### Session Summary
<Brief summary of what the user wanted to accomplish>
### Intent vs Execution Analysis
| User Intent | What Happened | Category | Gap |
|-------------|---------------|----------|-----|
| <request 1> | <execution> | <O/P/S/E> | <gap> |
### Improvements by Priority
#### P1 (Critical)
- [ ] [File: path] [Change: description]
#### P2 (Important)
- [ ] [File: path] [Change: description]
#### P3 (Nice-to-have)
- [ ] [File: path] [Change: description]
### Suggested New Skills/Routing
- [ ] Add skill: [name] - addresses [category] gap
- [ ] Add routing: [trigger] -> [skill]
### Workflow Adjustments
- [ ] [Phase/gate modification]
---
**Files analyzed:**
- Session: ~/.claude/projects/[path]/[session].jsonl
- Config: CLAUDE.md, .claude/skills/*, .agents/skills-local/crew-claude/*
**Framework:** OpenAI eval-skills (Outcome/Process/Style/Efficiency)
EOF
)"
Execution Steps
- Discover session: Find and read the current session transcript
- Analyze transcript: Parse user messages, tool calls, and outputs
- Score each category: Apply rubric to generate scores
- Read config files: Understand intended workflow
- Identify gaps: Compare intent vs execution
- Generate improvements: Prioritized, actionable suggestions
- Create issue: GitHub issue with full analysis
Begin by discovering the session transcript.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
crew-claude
Complete development workflow for Claude Code
crew-codex
Complete development workflow for Codex
git-workflow
Git workflow commands: commit, branch, pr, push, sync, review, fixup, update-pr, commit-push, clean-gone, status, merge. Use with command argument.
reviewers
4400+ curated code review prompts from leading open-source repositories
vercel-composition-patterns
React composition patterns that scale. Use when refactoring components with boolean prop proliferation, building flexible component libraries, or designing reusable APIs. Triggers on tasks involving compound components, render props, context providers, or component architecture.
sharp-edges
Identifies error-prone APIs, dangerous configurations, and footgun designs that enable security mistakes. Use when reviewing API designs, configuration schemas, cryptographic library ergonomics, or evaluating whether code follows 'secure by default' and 'pit of success' principles. Triggers: footgun, misuse-resistant, secure defaults, API usability, dangerous configuration.
Didn't find tool you were looking for?