qa_review_triple-model

Launch four independent AI code reviewers (Opus, Gemini, Codex, Kimi K2.5) to QA/QC code or notebooks. Each reviewer writes its findings to a separate markdown file, then the orchestrator synthesizes them. Use for critical code review, bug investigation, or quality-assurance tasks. Triggers: triple review, quad review, four model review, independent code review, QAQC, quality assurance, multi-model analysis, cross-validation, bug investigation, critical review, kimi review, togetherai review

Install this agent skill to your project:

```bash
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/qa-review-triple-model
```

SKILL.md

Multi-Model Code Review (4 Models)

Overview

This skill launches four independent AI subagents (Opus, Gemini, Codex, and Kimi K2.5) to perform parallel code review. Each agent writes findings to markdown files in a workspace directory, then the orchestrator synthesizes a final report with consensus findings.

When to Use

  • Critical bug investigation requiring multiple perspectives
  • QA/QC of important notebooks or modules
  • Validation of complex logic
  • Cross-checking findings before major changes
  • When high confidence in analysis is required
  • Testing scenarios where edge case detection is critical
  • Security reviews requiring thorough analysis

Usage

```
/triple_model_code_review [target] [focus_area]
```

Examples:

  • /triple_model_code_review examples/720_precipitation_methods_comprehensive.ipynb "plotting logic"
  • /triple_model_code_review ras_commander/hdf/HdfResultsPlan.py "return type consistency"
  • /triple_model_code_review ras_commander/precip/ "API contract validation"
  • /triple_model_code_review src/auth/login.py "security vulnerabilities"

Workflow

  1. Create Workspace: workspace/{task}QAQC/ with per-model subfolders {opus,gemini,codex,kimi}-analysis/ and a final-synthesis/ folder (see the sketch after this list)

  2. Launch 4 Parallel Subagents:

    • Opus (general-purpose, model=opus): Deep reasoning, architecture analysis
    • Gemini (code-oracle-gemini): Large context, multi-file pattern analysis
    • Codex (code-oracle-codex): Code archaeology, API contract analysis
    • Kimi K2.5 (code-oracle-kimi): Edge case detection, test generation focus, QA verification
  3. Handle Model Failures (Graceful Degradation):

    • If a model fails or is unavailable, note it and continue
    • Synthesis works with 1-4 successful models
    • Report which models succeeded/failed to user
  4. Each Agent:

    • Reads target files independently
    • Writes qaqc-report.md to their subfolder
    • Returns file path only (no large text in response)
    • If an agent fails, a placeholder report noting the error is created
  5. Orchestrator Synthesizes:

    • Reads all available reports (1-4)
    • Identifies consensus findings from successful models
    • Creates FINAL_QAQC_REPORT.md with agreement matrix
    • Highlights unique insights from each successful model
    • Notes which models were unavailable
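
A minimal sketch of step 1 in bash, using the directory names from the Output Structure section below; the TASK slug is whatever short name you derive from the review target (e.g. "notebook720", as in the first example session):

```bash
# Create one analysis folder per reviewer plus the synthesis folder.
# TASK is a short slug derived from the review target.
TASK="notebook720"
mkdir -p "workspace/${TASK}QAQC"/{opus,gemini,codex,kimi}-analysis \
         "workspace/${TASK}QAQC/final-synthesis"
```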

Subagent Prompts

Opus Subagent

```
You are conducting an independent QA/QC analysis of [TARGET].

## Critical Issue
[DESCRIBE THE PROBLEM]

## Your Task
1. Read and analyze the target files
2. Identify the root cause of the issue
3. Document specific line numbers and code evidence
4. Provide recommended fixes

## Output
Write comprehensive analysis to: workspace/[TASK]QAQC/opus-analysis/qaqc-report.md

Return ONLY the file path when complete.
```

Gemini Subagent

```
You are conducting an independent QA/QC analysis using large context capabilities.

## Critical Issue
[DESCRIBE THE PROBLEM]

## Your Task
1. Read ALL relevant files in the target area
2. Trace data flow from source to symptom
3. Document column/type confusion if applicable
4. Provide method-by-method analysis

## Output
Write analysis to: workspace/[TASK]QAQC/gemini-analysis/qaqc-report.md

Return ONLY the file path when complete.
```

Codex Subagent

```
You are conducting deep code analysis for QA/QC.

## Critical Issue
[DESCRIBE THE PROBLEM]

## Your Task
1. Perform a deep analysis of the target code
2. Code archaeology: determine how the bug was introduced
3. API contract analysis: promises vs. delivery
4. Propose test cases that would catch this bug

## Output
Write analysis to: workspace/[TASK]QAQC/codex-analysis/qaqc-report.md

Return ONLY the file path when complete.
```

Kimi K2.5 Subagent (NEW)

```
You are conducting QA/QC analysis with focus on edge cases, testing gaps, and quality verification.

## Critical Issue
[DESCRIBE THE PROBLEM]

## Your Task
1. Identify edge cases and boundary conditions not handled
2. Find gaps in error handling and validation
3. Analyze test coverage - what tests are missing?
4. Check for race conditions and concurrency issues
5. Verify API contracts and type safety
6. Suggest specific test cases that would catch bugs

## Unique Focus Areas
- Edge case detection (null, undefined, empty inputs, extreme values)
- Test coverage gaps
- Error handling completeness
- Security vulnerabilities
- Performance bottlenecks

## Output
Write analysis to: workspace/[TASK]QAQC/kimi-analysis/qaqc-report.md

Return ONLY the file path when complete.
```

Output Structure

```
workspace/{task}QAQC/
├── opus-analysis/
│   └── qaqc-report.md          # Deep reasoning analysis
├── gemini-analysis/
│   └── qaqc-report.md          # Large context analysis
├── codex-analysis/
│   └── qaqc-report.md          # Code archaeology analysis
├── kimi-analysis/              # NEW
│   └── qaqc-report.md          # Edge case & testing analysis
└── final-synthesis/
    └── FINAL_QAQC_REPORT.md    # Consensus findings
```
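
The orchestrator's read step can stay equally simple: glob whatever reports exist and concatenate them for synthesis, as in the sketch below. Models that failed contribute only their "ANALYSIS FAILED" placeholder, so the loop degrades gracefully.

```bash
# Gather every report that was actually written; a failed model contributes
# only its placeholder, which the synthesis step notes and skips.
for report in "workspace/${TASK}QAQC"/*-analysis/qaqc-report.md; do
  echo "=== ${report} ==="
  cat "${report}"
done
```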

Report Template

Individual Reports

```markdown
# QA/QC Analysis Report: [Target]

**Analyst**: [Model Name]
**Date**: YYYY-MM-DD
**Target**: [file/folder]
**Status**: [CRITICAL/HIGH/MEDIUM/LOW]

## 1. Summary of Findings
## 2. Root Cause Analysis
## 3. Code Evidence (with line numbers)
## 4. Impact Assessment
## 5. Recommended Fixes
## 6. Verification Steps
## 7. Test Cases Needed (Kimi specific)
```

Final Synthesis

```markdown
# Final QA/QC Synthesis Report

## Executive Summary
## Consensus Bug List
## Agreement Matrix (which reviewers found what)
  - Opus: [findings]
  - Gemini: [findings]
  - Codex: [findings]
  - Kimi: [findings]  # Edge cases & testing gaps
## Unique Insights by Model
  - Opus: [architectural issues]
  - Gemini: [multi-file patterns]
  - Codex: [API contract violations]
  - Kimi: [edge cases, missing tests, security gaps]
## Required Fixes (with exact code changes)
## Test Coverage Recommendations
## Verification Criteria
```

Model Strengths Matrix

| Model | Best For | Unique Strengths |
|-------|----------|------------------|
| Opus | Architecture, logic flow | Deep reasoning, system design |
| Gemini | Large codebases | 1M+ token context, multi-file analysis |
| Codex | Implementation details | Code archaeology, API contracts |
| Kimi K2.5 | Edge cases, testing | Boundary conditions, test gaps, security |

Best Practices

  1. Be Specific: Give clear problem description to all four agents
  2. Parallel Launch: Launch all four agents in a single message for speed (see the sketch after this list)
  3. File-Based Communication: Agents write files, return paths only
  4. Consensus Focus: Weight findings by agreement across reviewers
  5. Preserve Evidence: Keep all reports in workspace for audit trail
  6. Consider Kimi's Edge Cases: Kimi often finds issues others miss - don't ignore unique findings
  7. Test Generation: Use Kimi's test recommendations to improve coverage

Graceful Degradation & Error Handling

Not all users have access to all four models. The skill handles failures gracefully:

Common Failure Scenarios

  1. API Key Not Set: Model provider requires authentication
  2. Rate Limit Exceeded: Free tier limits reached
  3. Model Unavailable: Service temporarily down
  4. Subagent Timeout: Analysis took too long
  5. Permission Denied: Insufficient access rights

Failure Response Protocol

When a subagent fails:

1. Log the failure with specific error reason
2. Create placeholder report: workspace/{task}QAQC/{model}-analysis/qaqc-report.md
3. Content: "ANALYSIS FAILED: [specific reason]" (see the sketch below)
4. Continue with remaining successful models
5. Notify user which models succeeded/failed
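
A hedged bash sketch of steps 2-3, the placeholder write; the model name and failure reason are illustrative values:

```bash
# Record a failed reviewer so synthesis can note the gap instead of crashing.
model="kimi"                       # illustrative: the reviewer that failed
reason="TOGETHER_API_KEY missing"  # illustrative: the specific error
report="workspace/${TASK}QAQC/${model}-analysis/qaqc-report.md"
mkdir -p "$(dirname "$report")"
echo "ANALYSIS FAILED: ${reason}" > "$report"
```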

Minimum Viable Review

The skill works with as few as 1 successful model:

  • 1 model: Single perspective (still valuable)
  • 2 models: Cross-validation possible
  • 3 models: Good consensus building
  • 4 models: Optimal coverage

User Notification Template

```
Multi-Model Code Review Complete

✅ Successful Models:
   - Opus: Analysis complete
   - Gemini: Analysis complete
   - Codex: Analysis complete

❌ Failed Models:
   - Kimi K2.5: API key not configured (TOGETHER_API_KEY missing)

Proceeding with synthesis of 3/4 models...
```

Fallback Strategy by Model Count

| Available Models | Strategy | Confidence Level |
|------------------|----------|------------------|
| 4/4 (all) | Full consensus analysis | ⭐⭐⭐⭐⭐ Highest |
| 3/4 | Strong consensus with gap noted | ⭐⭐⭐⭐ High |
| 2/4 | Cross-validation sufficient | ⭐⭐⭐ Good |
| 1/4 | Single expert opinion | ⭐⭐ Moderate |
| 0/4 | Abort - no models available | ❌ Failed |

Handling Partial Results

If Kimi K2.5 fails (edge case expert):

  • Note: "Edge case analysis incomplete - consider manual edge case review"
  • Still proceed with Opus/Gemini/Codex consensus

If Gemini fails (large context expert):

  • Note: "Multi-file pattern analysis may be incomplete"
  • Other models may miss cross-file issues

If Codex fails (implementation expert):

  • Note: "API contract analysis incomplete"
  • Focus on Opus architecture + Gemini patterns

If Opus fails (reasoning expert):

  • Note: "Deep reasoning analysis unavailable"
  • Weight Codex implementation findings higher

Recommended Model Priority

If you can only configure some models, prioritize:

  1. Opus - Best overall reasoning (if available)
  2. Gemini - Best for large/multi-file analysis
  3. Kimi K2.5 - Best for testing/edge cases
  4. Codex - Best for implementation details

Configuration Check

Before running, verify model availability:

```bash
# Check which models are available
opencode models | grep -E "(opus|gemini|codex|kimi)"

# Check API keys
echo $TOGETHER_API_KEY
echo $OPENAI_API_KEY
echo $GEMINI_API_KEY
```
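
Building on those checks, a hedged preflight sketch can count configured providers and abort at zero, matching the 0/4 row of the fallback table above. The assumption that Opus credentials are visible via ANTHROPIC_API_KEY may not hold if you authenticate through `opencode auth` instead:

```bash
# Count providers with credentials configured; abort the review at 0/4.
available=0
[ -n "$ANTHROPIC_API_KEY" ] && available=$((available + 1))  # Opus (assumed env var)
[ -n "$GEMINI_API_KEY" ]    && available=$((available + 1))  # Gemini
[ -n "$OPENAI_API_KEY" ]    && available=$((available + 1))  # Codex
[ -n "$TOGETHER_API_KEY" ]  && available=$((available + 1))  # Kimi K2.5
if [ "$available" -eq 0 ]; then
  echo "No models available - aborting review" >&2
  exit 1
fi
echo "Proceeding with ${available}/4 models"
```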

Example Sessions

Example 1: All Models Available (Optimal)

```
User: /triple_model_code_review examples/720_precipitation_methods_comprehensive.ipynb "incremental vs cumulative confusion"

Claude: Creating workspace and launching 4 independent reviewers...

[Launches Opus, Gemini, Codex, Kimi K2.5 in parallel]

✅ All four reviewers completed successfully

All four reviewers identified the same bugs:
- Line 810: DataFrame passed instead of column (all 4 found)
- Line 1396: DataFrame passed instead of column (all 4 found)
- Lines 1777-1789: np.cumsum on DataFrame (all 4 found)

Kimi K2.5 additionally identified:
- Missing test for empty DataFrame input
- Edge case: single-row DataFrame handling
- No validation for NaN values in precipitation data

Reports saved to workspace/notebook720QAQC/
```

Example 2: Graceful Degradation (1 Model Fails)

```
User: /triple_model_code_review src/auth/login.py "security vulnerabilities"

Claude: Creating workspace and launching 4 independent reviewers...

[Launches Opus, Gemini, Codex, Kimi K2.5 in parallel]

Results:
✅ Opus: Analysis complete
✅ Gemini: Analysis complete
✅ Codex: Analysis complete
❌ Kimi K2.5: Failed - API key not configured

Note: Kimi K2.5 (edge case expert) unavailable. Edge case coverage may be incomplete.

3/4 models found these security issues:
- Line 45: SQL injection vulnerability (all 3 found)
- Line 78: Missing input validation (all 3 found)
- Line 23: Weak password hashing (Opus + Codex found)

Missing: Edge case analysis for malformed inputs (usually Kimi's strength)

Reports saved to workspace/loginQAQC/
```

Example 3: Minimal Viable Review (2 Models)

```
User: /triple_model_code_review ras_commander/core.py "performance bottleneck"

Claude: Creating workspace and launching 4 independent reviewers...

Results:
✅ Opus: Analysis complete
❌ Gemini: Failed - Rate limit exceeded
❌ Codex: Failed - Service unavailable
✅ Kimi K2.5: Analysis complete

Proceeding with 2/4 models (Opus + Kimi)

Consensus findings:
- Line 234: O(n²) loop identified by both
- Line 567: Memory leak in caching (Kimi found, with test case)

Opus unique insight: Architectural recommendation to use generators
Kimi unique insight: Specific benchmark test showing 10x slowdown

Note: A 2-model review is sufficient for this scope. Consider re-running with all models for critical code.

Reports saved to workspace/coreQAQC/
```

When to Weight Kimi's Findings Higher

Kimi K2.5 findings should be given extra weight when:

  • Edge cases are critical (financial calculations, safety systems)
  • Testing is insufficient (new codebase, legacy code)
  • Security matters (user input handling, authentication)
  • Error handling is crucial (production systems, data pipelines)

Integration with Other Skills

Works with:

  • dev_invoke_kimi-cli - Follow up with Kimi-specific testing
  • dev_invoke_gemini-cli - Deep dive on Gemini findings
  • dev_invoke_codex-cli - Implement fixes identified
  • using-git-worktrees - Create isolated workspace for fixes

See Also

  • Kimi CLI Skill: .claude/skills/dev_invoke_kimi-cli/SKILL.md
  • Subagent Output Pattern: .claude/rules/subagent-output-pattern.md
  • Agent Integration Testing: .claude/rules/testing/agent-integration-testing.md
  • Orchestrator Pattern: Root CLAUDE.md - Orchestrator section

Troubleshooting Model Failures

Opus Failures

Symptom: "Model not available" or "Rate limit exceeded"

Solutions:

```bash
# Check Opus availability
opencode models | grep opus

# Alternative: Use Sonnet if Opus unavailable
opencode run -m claude-sonnet-4.5 "..."
```

Gemini Failures

Symptom: "Gemini API error" or "Context length exceeded"

Solutions:

```bash
# Check Gemini API key
export GEMINI_API_KEY=your_key_here

# Try a smaller context-window model
opencode run -m gemini-1.5-flash "..."
```

Codex Failures

Symptom: "Codex service unavailable" or "OpenAI error"

Solutions:

```bash
# Check OpenAI API key
export OPENAI_API_KEY=your_key_here

# Alternative: Use GPT-4o
opencode run -m gpt-4o "..."
```

Kimi K2.5 Failures

Symptom: "Together.ai error" or "API key not found"

Solutions:

```bash
# Set Together.ai API key
export TOGETHER_API_KEY=your_key_here

# Alternative: Use Opencode's free Kimi
opencode run -m opencode/kimi-k2.5-free "..."

# Check Together.ai credits
curl -H "Authorization: Bearer $TOGETHER_API_KEY" \
  https://api.together.xyz/v1/models
```

General Troubleshooting

All models failing?

  1. Check internet connection
  2. Verify opencode CLI: opencode --version
  3. Check auth status: opencode auth status
  4. Review logs: opencode debug

Intermittent failures?

  • Retry with backoff: Wait 30 seconds and re-run (a minimal sketch follows this list)
  • Check rate limits: May need to upgrade tier
  • Use fewer models: Start with 2-3 instead of 4
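
A minimal retry-with-backoff sketch around a single reviewer invocation; $MODEL and $PROMPT are placeholders for illustration:

```bash
# Retry a flaky reviewer up to three times with a growing delay.
for attempt in 1 2 3; do
  opencode run -m "$MODEL" "$PROMPT" && break
  sleep $((30 * attempt))  # 30s, then 60s, then 90s
done
```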

Permission denied?

  • Check workspace directory permissions
  • Ensure write access to workspace/ folder
  • Run from project root with proper access
