Install this agent skill to your Project
npx add-skill https://github.com/leegonzales/AISkills/tree/main/CodexPeerReview/codex-peer-review
Codex Peer Review Skill
🖥️ Claude Code Only - Requires terminal access to execute Codex CLI commands.
Enable Claude Code to leverage OpenAI's Codex CLI for collaborative AI reasoning, peer review, and multi-perspective analysis of code architecture, design decisions, and implementations.
Core Philosophy
Two AI perspectives are better than one for high-stakes decisions.
This skill enables strategic collaboration between Claude Code (Anthropic) and Codex CLI (OpenAI) for:
- Architecture validation and critique
- Design decision cross-validation
- Alternative approach generation
- Security, performance, and testing analysis
- Learning from different AI reasoning patterns
Not a replacement—a second opinion.
When to Use Codex Peer Review
High-Value Scenarios
DO use when:
- Making high-stakes architecture decisions
- Choosing between significant design alternatives
- Reviewing security-critical code
- Validating complex refactoring plans
- Exploring unfamiliar domains or patterns
- User explicitly requests second opinion
- Significant disagreement about approach
- Performance-critical optimization decisions
- Testing strategy validation
DON'T use when:
- Simple, straightforward implementations
- Already confident in a single approach
- Time-sensitive quick fixes
- No significant trade-offs exist
- Low-impact tactical changes
- Codex CLI is not available/installed
How to Invoke This Skill
Important: This skill requires explicit invocation. It is not automatically triggered by natural language.
To use this skill, Claude must explicitly invoke it using:
skill: "codex-peer-review"
User phrases that indicate this skill would be valuable:
- "Get a second opinion on..."
- "What would Codex think about..."
- "Review this architecture with Codex"
- "Use Codex to validate this approach"
- "Are there better alternatives to..."
- "Get Codex peer review for this"
- "Security review with Codex needed"
- "Ask Codex about this design"
When these phrases appear, Claude should suggest using this skill and invoke it explicitly if appropriate.
Codex vs Gemini: Which Peer Review Skill?
Both Codex and Gemini peer review skills provide valuable second opinions, but excel in different scenarios.
Use Codex Peer Review when:
- Code size < 500 LOC (focused reviews)
- Need precise, line-level bug detection
- Want fast analysis with concise output
- Reviewing single modules or functions
- Need tactical implementation feedback
- Performance bottleneck identification (specific issues)
- Quick validation of design decisions
Use Gemini Peer Review when:
- Code size > 5k LOC (large codebase analysis)
- Need full codebase context (up to 1M tokens)
- Reviewing architecture across multiple modules
- Analyzing diagrams + code together (multimodal)
- Want research-grounded recommendations (current best practices)
- Cross-module security analysis (attack surface mapping)
- Systemic performance patterns
- Design consistency checking
For mid-range codebases (500-5k LOC):
- Use Codex if: Focused review, single module, speed priority, specific bugs
- Use Gemini if: Cross-module patterns, holistic view, diagram analysis, research grounding
- Consider Both for: Critical decisions requiring maximum confidence
For maximum value on high-stakes decisions: Use both skills sequentially and apply the synthesis framework (see references/synthesis-framework.md).
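The size thresholds above can be sketched as a small shell helper. This is only an illustration of the rule of thumb: the function name `choose_reviewer` is hypothetical, and the mid-range case still needs the judgment described above (focus, speed, cross-module concerns).

```shell
# choose_reviewer LOC -> prints "codex", "gemini", or "either"
# Hypothetical helper; thresholds mirror the guidance in this section.
choose_reviewer() {
  loc=$1
  if [ "$loc" -lt 500 ]; then
    echo "codex"      # focused review, under 500 LOC
  elif [ "$loc" -gt 5000 ]; then
    echo "gemini"     # large codebase, over 5k LOC
  else
    echo "either"     # mid-range: decide by focus, or use both for critical decisions
  fi
}

# Example usage (the path is a placeholder):
# choose_reviewer "$(wc -l < src/module.py)"
```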
Core Workflow
1. Recognize Need for Peer Review
Assess if peer review adds value:
Questions to consider:
- Is this a high-stakes decision with significant impact?
- Are there multiple valid approaches to consider?
- Is the architecture complex or unfamiliar?
- Does this involve security, performance, or scalability concerns?
- Has the user explicitly requested a second opinion?
- Would different AI reasoning perspectives help?
If yes to 2+ questions: Proceed with peer review workflow
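As a rough sketch, the "yes to 2+ questions" rule amounts to a simple tally. The helper below is hypothetical and assumes each question has already been answered with a literal `yes` or `no`.

```shell
# should_peer_review ANSWER... -> exit status 0 when at least two answers are "yes"
# Hypothetical helper; pass one answer per question in the checklist above.
should_peer_review() {
  yes_count=0
  for answer in "$@"; do
    if [ "$answer" = "yes" ]; then
      yes_count=$((yes_count + 1))
    fi
  done
  [ "$yes_count" -ge 2 ]
}

# Example usage:
# should_peer_review yes no yes no no no && echo "Proceed with peer review workflow"
```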
2. Prepare Context for Codex
Extract and structure relevant information:
Load references/context-preparation.md for detailed guidance on:
- What code/files to include
- How to frame questions effectively
- Context boundaries (what to include/exclude)
- Expectation setting for output format
Key preparation steps:
- Identify core question: What specifically do we want Codex to review?
- Extract relevant code: Include necessary files, not entire codebase
- Provide context: Project type, constraints, requirements, concerns
- Frame clearly: Specific questions, not vague requests
- Set expectations: What kind of response we need
Context structure template:
[CONTEXT]
Project: [type, purpose]
Current situation: [what exists]
Constraints: [technical, business, time]
[CODE/ARCHITECTURE]
[relevant code or architecture description]
[QUESTION]
[specific question or review request]
[EXPECTED OUTPUT]
[format: analysis, alternatives, recommendations, etc.]
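One way to fill in the template above is a small function that prints the four sections from its arguments. This is a minimal sketch: `build_review_prompt` and its argument order are assumptions made for illustration, not part of the skill or of Codex CLI.

```shell
# build_review_prompt PROJECT SITUATION CONSTRAINTS CODE_FILE QUESTION EXPECTED
# Hypothetical helper: prints a prompt that follows the context structure template.
build_review_prompt() {
  project=$1; situation=$2; constraints=$3
  code_file=$4; question=$5; expected=$6
  cat <<EOF
[CONTEXT]
Project: $project
Current situation: $situation
Constraints: $constraints

[CODE/ARCHITECTURE]
$(cat "$code_file")

[QUESTION]
$question

[EXPECTED OUTPUT]
$expected
EOF
}
```

The printed prompt can be saved to a file for inspection, or piped to `codex exec` in the same way as the non-interactive pattern this skill recommends.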
3. Invoke Codex CLI
Execute the appropriate Codex command:
Load references/codex-commands.md for complete command reference.
Common patterns:
Non-interactive review (recommended):
cat <<'EOF' | codex exec
[prepared context and question here]
EOF
Simple one-line review:
codex exec "Review this code for security issues"
Architecture review with diagram:
codex --image architecture-diagram.png "Analyze this architecture"
Key flags:
- exec: Non-interactive execution streaming to stdout
- --image/-i: Attach architecture diagrams or screenshots
- --full-auto: Unattended mode (use with caution)
Error handling:
- If Codex CLI not installed, inform user and provide installation instructions
- If API limits reached, note limitation and proceed with Claude-only analysis
- If Codex returns unclear response, reformulate question and retry once
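The error-handling rules above could be wrapped roughly as follows. Several things here are assumptions: the function name `review_with_retry` is hypothetical, an "unclear response" is approximated as empty output (judging clarity really needs a reader), and the reviewer command is passed as a parameter so the wrapper can be exercised with a stub. In real use the first argument would be `codex`.

```shell
# review_with_retry REVIEWER PROMPT REFORMULATED_PROMPT
# Hypothetical wrapper. Returns 127 if the reviewer command is missing,
# 1 if both attempts produce no output, 0 otherwise.
review_with_retry() {
  reviewer=$1; prompt=$2; reformulated=$3

  # Not installed: tell the caller to fall back to Claude-only analysis.
  if ! command -v "$reviewer" >/dev/null 2>&1; then
    echo "$reviewer not found: provide installation instructions and continue with Claude-only analysis" >&2
    return 127
  fi

  # First attempt, prompt on stdin as in the non-interactive pattern above.
  out=$(printf '%s\n' "$prompt" | "$reviewer" exec) || out=""

  # Assumption: empty output stands in for "unclear"; retry once, reformulated.
  if [ -z "$out" ]; then
    out=$(printf '%s\n' "$reformulated" | "$reviewer" exec) || out=""
  fi

  [ -n "$out" ] || return 1
  printf '%s\n' "$out"
}

# Example usage:
# review_with_retry codex "$prompt" "$narrower_prompt" || echo "Proceeding with Claude-only analysis"
```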
4. Synthesize Perspectives
Compare and integrate both AI perspectives:
Load references/synthesis-framework.md for detailed synthesis patterns.
Analysis framework:
- Agreement Analysis
  - Where do both perspectives align?
  - What shared concerns exist?
  - What validates confidence in approach?
- Disagreement Analysis
  - Where do perspectives diverge?
  - Why might approaches differ?
  - What assumptions differ?
- Complementary Insights
  - What does Codex see that Claude missed?
  - What does Claude see that Codex missed?
  - How do perspectives complement each other?
- Trade-off Identification
  - What trade-offs does each perspective reveal?
  - Which concerns are prioritized differently?
  - What constraints drive different conclusions?
- Insight Extraction
  - What are the key actionable insights?
  - What alternatives emerge from both perspectives?
  - What risks are highlighted by either perspective?
Synthesis output structure:
## Perspective Comparison
**Claude's Analysis:**
[key points from Claude's initial analysis]
**Codex's Analysis:**
[key points from Codex's review]
**Points of Agreement:**
- [shared insights]
**Points of Divergence:**
- [different perspectives and why]
**Complementary Insights:**
- [unique value from each perspective]
## Synthesis & Recommendations
[integrated analysis incorporating both perspectives]
**Recommended Approach:**
[action plan based on both perspectives]
**Rationale:**
[why this approach balances both perspectives]
**Remaining Considerations:**
[open questions or concerns to address]
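The agreement and divergence parts of the comparison can be bootstrapped mechanically before the real synthesis work. The sketch below assumes each analysis has been reduced by hand to a file of short, normalized finding labels, one per line (for example `missing index`). The helper name `compare_findings` is hypothetical, and the output only seeds the structure above; explaining why perspectives differ still requires judgment.

```shell
# compare_findings CLAUDE_FILE CODEX_FILE
# Hypothetical helper: each file holds one normalized finding label per line.
compare_findings() {
  claude_sorted=$(mktemp); codex_sorted=$(mktemp)
  # comm needs sorted input; use the same collation for sort and comm.
  LC_ALL=C sort -u "$1" > "$claude_sorted"
  LC_ALL=C sort -u "$2" > "$codex_sorted"

  echo "**Points of Agreement:**"
  LC_ALL=C comm -12 "$claude_sorted" "$codex_sorted" | sed 's/^/- /'

  echo "**Points of Divergence:**"
  LC_ALL=C comm -23 "$claude_sorted" "$codex_sorted" | sed 's/^/- Claude only: /'
  LC_ALL=C comm -13 "$claude_sorted" "$codex_sorted" | sed 's/^/- Codex only: /'

  rm -f "$claude_sorted" "$codex_sorted"
}
```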
5. Present Balanced Analysis
Deliver integrated insights to user:
Presentation principles:
- Be transparent about which AI said what
- Acknowledge disagreements honestly
- Don't force false consensus
- Explain reasoning behind each perspective
- Give user enough context to make informed decision
- Present alternatives clearly
- Indicate confidence levels appropriately
When perspectives align: "Both Claude and Codex agree that [approach] is preferable because [reasons]. This alignment increases confidence in the recommendation."
When perspectives diverge: "Claude favors [approach A] prioritizing [factors], while Codex suggests [approach B] emphasizing [factors]. This divergence reveals an important trade-off: [explanation]. Consider [factors] to decide which approach better fits your context."
When one finds issues the other missed: "Codex identified [concern] that wasn't initially apparent. This adds [insight] to our analysis..."
Use Case Patterns
Load references/use-case-patterns.md for detailed examples of each scenario.
1. Architecture Review
Scenario: Reviewing system design before major implementation
Process:
- Document current architecture or proposed design
- Prepare context: system requirements, constraints, scale expectations
- Ask Codex: "Review this architecture for scalability, maintainability, and potential issues"
- Synthesize: Compare architectural concerns and recommendations
- Present: Integrated architecture assessment with both perspectives
Example question: "Review this microservices architecture. Are there concerns with service boundaries, data consistency, or deployment complexity?"
2. Design Decision Validation
Scenario: Choosing between multiple implementation approaches
Process:
- Document the decision point and alternatives
- Prepare context: requirements, constraints, trade-offs known
- Ask Codex: "Compare approaches A, B, and C for [criteria]"
- Synthesize: Create trade-off matrix from both perspectives
- Present: Clear comparison showing strengths/weaknesses
Example question: "Should we use event sourcing or traditional CRUD for this domain? Consider complexity, auditability, and team expertise."
3. Security Review
Scenario: Validating security-critical code before deployment
Process:
- Extract security-relevant code sections
- Prepare context: threat model, security requirements, compliance needs
- Ask Codex: "Security review: identify vulnerabilities, attack vectors, and hardening opportunities"
- Synthesize: Combine security concerns from both analyses
- Present: Comprehensive security assessment with prioritized issues
Example question: "Review this authentication implementation. Are there vulnerabilities in session management, token handling, or access control?"
4. Performance Analysis
Scenario: Optimizing performance-critical code
Process:
- Extract performance-critical sections
- Prepare context: performance requirements, current bottlenecks, constraints
- Ask Codex: "Analyze for performance bottlenecks and optimization opportunities"
- Synthesize: Combine optimization suggestions from both perspectives
- Present: Prioritized optimization recommendations with trade-offs
Example question: "This query endpoint is slow under load. Identify bottlenecks in the database access pattern, caching strategy, and N+1 issues."
5. Testing Strategy
Scenario: Improving test coverage and quality
Process:
- Document current testing approach and coverage
- Prepare context: critical paths, known gaps, testing constraints
- Ask Codex: "Review testing strategy and suggest improvements"
- Synthesize: Combine testing recommendations from both perspectives
- Present: Comprehensive testing improvement plan
Example question: "Review our testing approach. Are there coverage gaps, missing edge cases, or better testing strategies for this complex state machine?"
6. Code Review & Learning
Scenario: Understanding unfamiliar code or patterns
Process:
- Extract relevant code sections
- Prepare context: what's unclear, specific questions, learning goals
- Ask Codex: "Explain this code: patterns used, design decisions, potential concerns"
- Synthesize: Combine explanations and identify patterns both AIs recognize
- Present: Clear explanation with multiple perspectives on design
Example question: "Explain this recursive backtracking algorithm. What patterns are used, and are there clearer alternatives?"
7. Alternative Approach Generation
Scenario: Stuck on a problem or exploring better approaches
Process:
- Document current approach and why it's unsatisfactory
- Prepare context: problem constraints, what's been tried, goals
- Ask Codex: "Generate alternative approaches to [problem]"
- Synthesize: Combine creative alternatives from both perspectives
- Present: Multiple vetted alternatives with trade-off analysis
Example question: "We're stuck on real-time conflict resolution for collaborative editing. What alternative CRDT or operational transform approaches could work better?"
Command Reference
Load references/codex-commands.md for complete command documentation.
Quick reference:
| Use Case | Command Pattern |
|---|---|
| Simple review | codex exec "Review this code" |
| Multi-line prompt | cat <<'EOF' \| codex exec ... EOF |
| Review with diagram | codex --image diagram.png "Analyze this" |
| Interactive mode | codex "What do you think about..." |
| Resume session | codex resume --last |
Non-interactive review (recommended for automation):
cat <<'EOF' | codex exec
[Your structured prompt here]
EOF
Integration Points
With Other Skills
With concept-forge skill:
- Forge architectural concepts → Validate with Codex peer review
- Use @builder and @strategist archetypes to prepare questions
With prose-polish skill:
- Ensure technical documentation is clear and professional
- Polish architecture decision records (ADRs)
With claimify skill:
- Map architectural arguments and assumptions
- Analyze decision rationale structure
With Claude Code Workflows
Pre-implementation:
- Use peer review before starting major features
- Validate architecture before building
Post-implementation:
- Use peer review to validate completed work
- Cross-check refactoring results
During implementation:
- Use peer review when stuck or uncertain
- Validate critical decisions in real-time
Quality Signals
Peer Review is Valuable When:
- Both perspectives identify same concerns (high confidence)
- Perspectives reveal complementary insights
- Trade-offs become clearer through different lenses
- Alternative approaches emerge that weren't initially visible
- Security or performance concerns are validated independently
- User gains clarity on decision through multi-perspective analysis
Peer Review Needs Refinement When:
- Responses are too vague or generic
- Question wasn't specific enough
- Context was insufficient
- Both perspectives say obvious things
- No new insights emerge
- Codex response misunderstands the question
Action: Reformulate question with better context and specificity
Skip Peer Review When:
- Codex CLI unavailable and blocking progress
- Decision is time-sensitive and low-risk
- Approach is straightforward with no trade-offs
- User doesn't value second opinion for this decision
- Context is too large to prepare efficiently
Best Practices
Effective Peer Review
DO:
- Frame specific, answerable questions
- Provide sufficient context for informed analysis
- Use for high-stakes decisions where second opinion adds value
- Be transparent about which AI provided which insight
- Acknowledge disagreements and explain them
- Synthesize perspectives rather than just concatenating them
- Give user enough context to make informed decision
DON'T:
- Use for every trivial decision
- Ask vague questions without context
- Force false consensus when perspectives diverge
- Hide which AI said what
- Ignore one perspective in favor of the other
- Present peer review as authoritative truth
- Over-rely on peer review for basic decisions
Context Preparation
Effective context:
- Focused on specific decision or area of code
- Includes relevant constraints and requirements
- Provides enough background without overwhelming
- Frames clear questions
- Sets expectations for output
Ineffective context:
- Dumps entire codebase
- No clear question or focus
- Missing critical constraints
- Vague or overly broad
- No guidance on what kind of response is useful
Question Framing
Good questions:
- "Review this microservices architecture. Are service boundaries well-defined? Any concerns with data consistency or deployment complexity?"
- "Compare these three caching strategies for our use case. Consider memory overhead, invalidation complexity, and cold-start performance."
- "Security review this authentication flow. Focus on session management, token expiration, and refresh token handling."
Poor questions:
- "Is this code good?" (too vague)
- "Review everything" (too broad)
- "What do you think?" (no specific focus)
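A crude pre-flight check can catch the shortest of the poor questions before they are sent. This is a heuristic sketch only: the helper name `check_question` and the default threshold of 8 words are assumptions, and word count says nothing about whether a longer question is actually specific.

```shell
# check_question QUESTION [MIN_WORDS]
# Hypothetical heuristic: reject questions shorter than MIN_WORDS (default 8).
check_question() {
  min_words=${2:-8}
  # Arithmetic expansion strips any padding that wc adds to its output.
  words=$(( $(printf '%s\n' "$1" | wc -w) ))
  if [ "$words" -lt "$min_words" ]; then
    echo "too short ($words words): add scope, focus areas, and criteria" >&2
    return 1
  fi
}

# Example usage:
# check_question "Is this code good?" || echo "Reformulate before invoking Codex"
```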
Installation Requirements
Codex CLI must be installed to use this skill.
Installation
# Via npm
npm i -g @openai/codex
# Via Homebrew
brew install openai/codex/codex
Authentication
# Sign in with ChatGPT Plus/Pro/Business/Edu/Enterprise account
codex login
# Or provide API key
codex auth api-key [your-api-key]
Verification
# Verify installation
codex --version
# Check authentication
codex login status
If Codex CLI is not available:
- Inform user that peer review requires Codex CLI
- Provide installation instructions
- Continue with Claude-only analysis if user can't install
- Note that second opinion isn't available
Configuration
Optional configuration in ~/.codex/config.toml:
# Approval mode (suggest|auto|on-failure)
ask_for_approval = "suggest"
# Sandbox mode (read-only|workspace-write|danger-full-access)
sandbox = "read-only"
For peer review, recommended settings:
- sandbox = "read-only" for read-only safety
- ask_for_approval = "suggest" for transparency
Note: Don't hardcode model names in config. Let Codex CLI use its default (latest) model.
Limitations & Considerations
Technical Limitations
- Requires Codex CLI installation and authentication
- Subject to OpenAI API rate limits
- Codex may have a different context window than Claude
- Responses may vary in quality based on prompt
- No real-time communication between AIs (sequential only)
Philosophical Considerations
- Different training data and approaches may lead to different perspectives
- Neither AI is objectively "correct"—both offer perspectives
- User judgment is ultimate arbiter
- Peer review adds time to workflow
- Over-reliance on peer review can slow decision-making
When to Trust Which Perspective
Trust convergence:
- When both AIs agree, confidence increases
Trust divergence:
- Reveals important trade-offs and assumptions
- Neither is necessarily "right"—different priorities
Trust specialized knowledge:
- Codex may have different strengths in certain domains
- Claude may have different strengths in others
- Consider which AI's reasoning aligns better with your context
Example Workflows
Example: Architecture Decision
User: "I'm designing a multi-tenant SaaS architecture. Should I use separate databases per tenant or a shared database with row-level security?"
Claude initial analysis: [Provides analysis of trade-offs]
Invoke peer review:
cat <<'EOF' | codex exec
Review multi-tenant SaaS architecture decision:
CONTEXT:
- B2B SaaS with 100-500 tenants expected
- Varying data volumes per tenant (small to large)
- Strong data isolation requirements
- Team familiar with PostgreSQL
- Cloud deployment (AWS)
OPTIONS:
A) Separate database per tenant
B) Shared database with row-level security (RLS)
QUESTION:
Analyze trade-offs for scalability, operational complexity, data isolation, and cost. Which approach is recommended for this context?
EOF
Synthesis: Compare Claude's and Codex's trade-off analysis, extract key insights, present balanced recommendation.
Anti-Patterns
Don't:
- Use peer review for every trivial decision (wastes time)
- Blindly follow one AI's recommendation over the other
- Ask vague questions without context
- Expect perfect agreement between AIs
- Force implementation when both AIs raise concerns
- Use peer review as decision-avoidance mechanism
- Over-engineer simple problems by seeking too many opinions
Do:
- Use strategically for high-stakes decisions
- Synthesize both perspectives thoughtfully
- Frame clear, specific questions with context
- Embrace disagreement as revealing trade-offs
- Use peer review to inform, not replace, judgment
- Make timely decisions based on integrated analysis
- Balance peer review with velocity
Success Metrics
Peer review succeeds when:
- User gains clarity on decision through multi-perspective analysis
- Important trade-offs are revealed that weren't initially apparent
- Alternative approaches emerge that are genuinely valuable
- Risks are identified by at least one AI perspective
- User makes more informed decision than without peer review
- Confidence increases (when perspectives align)
- Trade-offs become explicit (when perspectives diverge)
Peer review fails when:
- No new insights emerge (obvious analysis)
- Takes too long relative to decision impact
- Perspectives are confusing rather than clarifying
- User is more confused after peer review than before
- Blocks forward progress unnecessarily
- Becomes a crutch for simple decisions
Skill Improvement
This skill improves through:
- Better question framing patterns
- More effective context preparation
- Refined synthesis techniques
- Pattern recognition for when peer review adds value
- Learning which types of questions work best with Codex
- Understanding Codex's strengths and limitations
- Calibrating when peer review is worth the time investment
Feedback loop:
- Track which peer reviews provided valuable insights
- Note which question patterns work well
- Identify scenarios where peer review was or wasn't valuable
- Refine use case patterns based on experience
Related Resources
- Codex CLI Documentation: https://developers.openai.com/codex/cli/
- Architecture Decision Records (ADR) patterns
- Design pattern catalogs
- Security review checklists
- Performance optimization frameworks
- Testing strategy guides