Agent skill

idea-discovery

Workflow 1: Full idea discovery pipeline. Orchestrates research-lit → idea-creator → novelty-check → research-review to go from a broad research direction to validated, pilot-tested ideas. Use when the user says "找idea全流程" ("full idea-finding workflow"), "idea discovery pipeline", "从零开始找方向" ("find a direction from scratch"), or wants the complete idea exploration workflow.

Install this agent skill to your Project

npx add-skill https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep/tree/main/skills/skills-codex/idea-discovery

SKILL.md

Workflow 1: Idea Discovery Pipeline

Orchestrate a complete idea discovery workflow for: $ARGUMENTS

Overview

This skill chains sub-skills into a single automated pipeline:

/research-lit → /idea-creator → /novelty-check → /research-review → /research-refine-pipeline
  (survey)      (brainstorm)    (verify novel)    (critical feedback)  (refine method + plan experiments)

Each phase builds on the previous one's output. The final deliverables are a validated IDEA_REPORT.md with ranked ideas, plus a refined proposal (refine-logs/FINAL_PROPOSAL.md) and experiment plan (refine-logs/EXPERIMENT_PLAN.md) for the top idea.

Constants

  • PILOT_MAX_HOURS = 2 — Skip any pilot experiment estimated to take > 2 hours per GPU. Flag as "needs manual pilot" in the report.
  • PILOT_TIMEOUT_HOURS = 3 — Hard timeout: kill any running pilot that exceeds 3 hours. Collect partial results if available.
  • MAX_PILOT_IDEAS = 3 — Run pilots for at most 3 top ideas in parallel. Additional ideas are validated on paper only.
  • MAX_TOTAL_GPU_HOURS = 8 — Total GPU budget across all pilots. If exceeded, skip remaining pilots and note in report.
  • AUTO_PROCEED = true — If user doesn't respond at a checkpoint, automatically proceed with the best option after presenting results. Set to false to always wait for explicit user confirmation.
  • REVIEWER_MODEL = gpt-5.4 — Model used via a secondary Codex agent. Must be an OpenAI model (e.g., gpt-5.4, o3, gpt-4o). Passed to sub-skills.
  • ARXIV_DOWNLOAD = false — When true, /research-lit downloads the top relevant arXiv PDFs during Phase 1. When false (default), only fetches metadata. Passed through to /research-lit.
  • COMPACT = false — When true, generate compact summary files for short-context sessions and downstream skills. Writes IDEA_CANDIDATES.md.
  • REF_PAPER = false — Reference paper to base ideas on. Accepts a local PDF path, arXiv URL, or paper URL. When set, summarize it first and use it as idea-generation context.
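
The pilot constants interact: an idea is piloted only if its estimate fits the per-pilot cap, a pilot slot is free, and the total budget still has room. The following is a minimal Python sketch of that gating, not the skill's actual implementation. The `Idea` fields `est_hours` and `gpus` are hypothetical, as is the order in which the limits are checked; PILOT_TIMEOUT_HOURS applies to running pilots and is not modeled here.

```python
from dataclasses import dataclass

PILOT_MAX_HOURS = 2       # skip pilots estimated above this (hours per GPU)
MAX_PILOT_IDEAS = 3       # at most this many pilots run in parallel
MAX_TOTAL_GPU_HOURS = 8   # total budget across all pilots


@dataclass
class Idea:
    title: str
    est_hours: float  # hypothetical: estimated wall-clock hours per GPU
    gpus: int = 1     # hypothetical: number of GPUs the pilot would use


def plan_pilots(ranked_ideas):
    """Return (to_pilot, skipped); skipped holds (idea, reason) pairs."""
    to_pilot, skipped = [], []
    budget_used, budget_exhausted = 0.0, False
    for idea in ranked_ideas:
        cost = idea.est_hours * idea.gpus  # GPU-hours charged to the budget
        if idea.est_hours > PILOT_MAX_HOURS:
            skipped.append((idea, "needs manual pilot"))
        elif len(to_pilot) >= MAX_PILOT_IDEAS:
            skipped.append((idea, "validated on paper only"))
        elif budget_exhausted or budget_used + cost > MAX_TOTAL_GPU_HOURS:
            budget_exhausted = True  # skip all remaining pilots once exceeded
            skipped.append((idea, "GPU budget exceeded"))
        else:
            to_pilot.append(idea)
            budget_used += cost
    return to_pilot, skipped
```

The skip reasons double as the notes that would go into the report for ideas that were not piloted.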

💡 These are defaults. Override by telling the skill, e.g., /idea-discovery "topic" — ref paper: https://arxiv.org/abs/2406.04329 or /idea-discovery "topic" — compact: true.
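
In practice the agent reads overrides as natural language, so no fixed grammar is defined. Purely as an illustration, the two example invocations above could be parsed as below. The `parse_invocation` helper, the `DEFAULTS` subset, and the rule that "ref paper" maps to `REF_PAPER` are all assumptions made for this sketch.

```python
# Subset of the constants listed above, with their documented defaults.
DEFAULTS = {
    "AUTO_PROCEED": True,
    "ARXIV_DOWNLOAD": False,
    "COMPACT": False,
    "REF_PAPER": False,
}

SEPARATOR = " \u2014 "  # the dash used between the topic and each override


def parse_invocation(args):
    """Split '"topic" <dash> key: value' into (topic, config)."""
    topic_part, *override_parts = args.split(SEPARATOR)
    topic = topic_part.strip().strip('"')
    config = dict(DEFAULTS)
    for part in override_parts:
        key, _, value = part.partition(":")  # split on the first colon only
        name = key.strip().upper().replace(" ", "_")
        value = value.strip()
        if value.lower() in ("true", "false"):
            config[name] = value.lower() == "true"
        else:
            config[name] = value  # e.g. a URL or a local path
    return topic, config
```

Splitting on the first colon only keeps URLs such as `https://...` intact as values.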

Pipeline

Phase 0.5: Reference Paper Summary (when REF_PAPER is set)

Skip entirely if REF_PAPER is false.

Summarize the reference paper before searching the literature:

  1. If arXiv URL — invoke /arxiv "ARXIV_ID" — download to fetch the PDF, then read the first 5 pages.
  2. If local PDF path — read the PDF directly, focusing on the title, abstract, introduction, and method overview.
  3. If other URL — fetch the content and extract the method, results, and limitations.
  4. Generate REF_PAPER_SUMMARY.md with: what the paper did, key results, limitations/open questions, and plausible improvement directions.

Use REF_PAPER_SUMMARY.md as additional context in both Phase 1 and Phase 2.
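
The three-way dispatch in steps 1 to 3 can be sketched as a small classifier. This is an illustrative sketch only: the function name, the returned labels, and the regular expression (which covers new-style arXiv IDs such as 2406.04329 but not old-style ones) are assumptions.

```python
import re
from urllib.parse import urlparse

# Matches new-style IDs in /abs/ or /pdf/ URLs, with an optional version suffix.
ARXIV_ID = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?")


def classify_ref_paper(ref):
    """Return (kind, value) where kind is 'skip', 'arxiv', 'url', or 'local_pdf'."""
    if not ref:  # REF_PAPER = false: skip Phase 0.5 entirely
        return ("skip", None)
    match = ARXIV_ID.search(ref)
    if match:
        return ("arxiv", match.group(1))  # ID to pass to /arxiv
    if urlparse(ref).scheme in ("http", "https"):
        return ("url", ref)  # fetch and extract method, results, limitations
    return ("local_pdf", ref)  # read the PDF directly
```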

Phase 1: Literature Survey

Invoke /research-lit to map the research landscape:

/research-lit "$ARGUMENTS"

What this does:

  • Search arXiv, Google Scholar, Semantic Scholar for recent papers
  • Build a landscape map: sub-directions, approaches, open problems
  • Identify structural gaps and recurring limitations
  • Output a literature summary (saved to working notes)

🚦 Checkpoint: Present the landscape summary to the user. Ask:

📚 Literature survey complete. Here's what I found:
- [key findings, gaps, open problems]

Does this match your understanding? Should I adjust the scope before generating ideas?
(If no response, I'll proceed with the top-ranked direction.)
  • User approves (or no response + AUTO_PROCEED=true) → proceed to Phase 2 with the best direction.
  • User requests changes (e.g., "focus more on X", "ignore Y", "too broad") → refine the search with updated queries, re-run /research-lit with adjusted scope, and present again. Repeat until the user is satisfied.
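
The checkpoint outcome depends on two inputs: whether the user responded, and the AUTO_PROCEED setting. A minimal sketch of that decision follows. The `APPROVALS` phrase set is a hypothetical stand-in, since the agent would normally judge approval from the user's wording rather than from a fixed list.

```python
# Hypothetical approval phrases; a real agent interprets the reply itself.
APPROVALS = {"ok", "yes", "looks good", "proceed", "approve"}


def resolve_checkpoint(response, auto_proceed=True):
    """Return 'proceed', 'revise', or 'wait' for a checkpoint reply."""
    if response is None or not response.strip():
        # No response: continue only when AUTO_PROCEED is enabled.
        return "proceed" if auto_proceed else "wait"
    if response.strip().lower() in APPROVALS:
        return "proceed"
    # Anything else is treated as a change request (e.g. "focus more on X").
    return "revise"
```

A "revise" result corresponds to re-running /research-lit with the adjusted scope and presenting the summary again.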

Phase 2: Idea Generation + Filtering + Pilots

Invoke /idea-creator with the landscape context and REF_PAPER_SUMMARY.md if available:

/idea-creator "$ARGUMENTS"

What this does:

  • If REF_PAPER_SUMMARY.md exists, include it as context so ideas explicitly build on, improve, or extend the reference paper
  • Brainstorm 8-12 concrete ideas via GPT-5.4 xhigh
  • Filter by feasibility, compute cost, quick novelty search
  • Deep validate top ideas (full novelty check + devil's advocate)
  • Run parallel pilot experiments on available GPUs (top 2-3 ideas)
  • Rank by empirical signal
  • Output IDEA_REPORT.md
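
Ranking by empirical signal can be sketched as a three-tier ordering: ideas with a positive pilot first (largest gain first), then ideas without a pilot, with negative pilots eliminated. The `pilot_delta` field and the choice to eliminate a zero gain are assumptions for this sketch; the skill does not state exact thresholds.

```python
def rank_ideas(ideas):
    """Order ideas for the report and separate out eliminated ones.

    Each idea is a dict with a 'title' and an optional 'pilot_delta'
    (hypothetical field: percent change over the baseline, None or
    absent if no pilot ran). Returns (ranked, eliminated).
    """
    positive, unpiloted, eliminated = [], [], []
    for idea in ideas:
        delta = idea.get("pilot_delta")
        if delta is None:
            unpiloted.append(idea)   # validated on paper only
        elif delta > 0:
            positive.append(idea)
        else:
            eliminated.append(idea)  # negative (or zero) pilot signal
    positive.sort(key=lambda i: i["pilot_delta"], reverse=True)
    return positive + unpiloted, eliminated
```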

🚦 Checkpoint: Present IDEA_REPORT.md ranked ideas to the user. Ask:

💡 Generated X ideas, filtered to Y, piloted Z. Top results:

1. [Idea 1] — Pilot: POSITIVE (+X%)
2. [Idea 2] — Pilot: WEAK POSITIVE (+Y%)
3. [Idea 3] — Pilot: NEGATIVE, eliminated

Which ideas should I validate further? Or should I regenerate with different constraints?
(If no response, I'll proceed with the top-ranked ideas.)
  • User picks ideas (or no response + AUTO_PROCEED=true) → proceed to Phase 3 with top-ranked ideas.
  • User unhappy with all ideas → collect feedback ("what's missing?", "what direction do you prefer?"), update the prompt with user's constraints, and re-run Phase 2 (idea generation). Repeat until the user selects at least 1 idea.
  • User wants to adjust scope → go back to Phase 1 with refined direction.

Phase 3: Deep Novelty Verification

For each top idea (positive pilot signal), run a thorough novelty check:

/novelty-check "[top idea 1 description]"
/novelty-check "[top idea 2 description]"

What this does:

  • Multi-source literature search (arXiv, Scholar, Semantic Scholar)
  • Cross-verify with GPT-5.4 xhigh
  • Check for concurrent work (last 3-6 months)
  • Identify closest existing work and differentiation points

Update IDEA_REPORT.md with deep novelty results. Eliminate any idea that turns out to be already published.
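
One mechanical part of this phase is separating concurrent work from older prior work. A rough sketch follows, assuming each related paper is a hypothetical `(title, published)` pair and taking about 183 days (the upper end of the 3-6 month range) as the window:

```python
from datetime import date, timedelta

CONCURRENT_WINDOW_DAYS = 183  # roughly 6 months; an assumed cutoff


def split_by_recency(papers, today):
    """Split (title, published_date) pairs into (concurrent, prior)."""
    cutoff = today - timedelta(days=CONCURRENT_WINDOW_DAYS)
    concurrent = [p for p in papers if p[1] >= cutoff]
    prior = [p for p in papers if p[1] < cutoff]
    return concurrent, prior
```

Papers in the `concurrent` list are the ones worth a closer manual look, since an overlap there is easy to miss in an ordinary literature search.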

Phase 4: External Critical Review

For the surviving top idea(s), get brutal feedback:

/research-review "[top idea with hypothesis + pilot results]"

What this does:

  • GPT-5.4 xhigh acts as a senior reviewer (NeurIPS/ICML level)
  • Scores the idea, identifies weaknesses, suggests minimum viable improvements
  • Provides concrete feedback on experimental design

Update IDEA_REPORT.md with reviewer feedback and revised plan.

Phase 4.5: Method Refinement + Experiment Planning

After review, refine the top idea into a concrete proposal and plan experiments:

/research-refine-pipeline "[top idea description + pilot results + reviewer feedback]"

What this does:

  • Freeze a Problem Anchor to prevent scope drift
  • Iteratively refine the method via GPT-5.4 review (up to 5 rounds, until score ≥ 9)
  • Generate a claim-driven experiment roadmap with ablations, budgets, and run order
  • Output: refine-logs/FINAL_PROPOSAL.md, refine-logs/EXPERIMENT_PLAN.md, refine-logs/EXPERIMENT_TRACKER.md
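
The "up to 5 rounds, until score ≥ 9" rule is a bounded review loop. The sketch below shows the control flow only. The `review` and `revise` callables are hypothetical stand-ins for the reviewer model and the revision step, and the choice not to revise after the final round (so the returned proposal is always one that was reviewed) is an assumption.

```python
def refine_until_accepted(proposal, review, revise, max_rounds=5, target_score=9):
    """Review and revise until the score reaches the target or rounds run out.

    review(proposal) -> (score, feedback); revise(proposal, feedback) -> proposal.
    Returns (final_proposal, history) where history is [(round, score), ...].
    """
    history = []
    for round_no in range(1, max_rounds + 1):
        score, feedback = review(proposal)
        history.append((round_no, score))
        if score >= target_score or round_no == max_rounds:
            break  # accepted, or out of rounds: keep the reviewed version
        proposal = revise(proposal, feedback)
    return proposal, history
```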

🚦 Checkpoint: Present the refined proposal summary:

🔬 Method refined and experiment plan ready:
- Problem anchor: [anchored problem]
- Method thesis: [one sentence]
- Dominant contribution: [what's new]
- Must-run experiments: [N blocks]
- First 3 runs to launch: [list]

Proceed to implementation? Or adjust the proposal?
  • User approves (or no response + AUTO_PROCEED=true) → proceed to Final Report.
  • User requests changes → pass feedback to /research-refine for another round.
  • Lite mode: If reviewer score < 6 or pilot was weak, run /research-refine only (skip /experiment-plan) and note remaining risks in the report.
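
The lite-mode rule is a simple threshold check. In the sketch below, `pilot_weak` is a boolean the caller decides, since this document does not define what counts as a weak pilot. The function name is hypothetical.

```python
LITE_SCORE_THRESHOLD = 6  # reviewer scores below this trigger lite mode


def refinement_skills(reviewer_score, pilot_weak):
    """Return the sub-skills to run in Phase 4.5."""
    if reviewer_score < LITE_SCORE_THRESHOLD or pilot_weak:
        # Lite mode: refine the method only, skip experiment planning.
        return ["/research-refine"]
    return ["/research-refine-pipeline"]
```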

Phase 5: Final Report

Finalize IDEA_REPORT.md with all accumulated information:

# Idea Discovery Report

**Direction**: $ARGUMENTS
**Date**: [today]
**Pipeline**: research-lit → idea-creator → novelty-check → research-review → research-refine-pipeline

## Executive Summary
[2-3 sentences: best idea, key evidence, recommended next step]

## Literature Landscape
[from Phase 1]

## Ranked Ideas
[from Phase 2, updated with Phase 3-4 results]

### 🏆 Idea 1: [title] — RECOMMENDED
- Pilot: POSITIVE (+X%)
- Novelty: CONFIRMED (closest: [paper], differentiation: [what's different])
- Reviewer score: X/10
- Next step: implement full experiment → /auto-review-loop

### Idea 2: [title] — BACKUP
...

## Eliminated Ideas
[ideas killed at each phase, with reasons]

## Refined Proposal
- Proposal: `refine-logs/FINAL_PROPOSAL.md`
- Experiment plan: `refine-logs/EXPERIMENT_PLAN.md`
- Tracker: `refine-logs/EXPERIMENT_TRACKER.md`

## Next Steps
- [ ] /run-experiment to deploy experiments from the plan
- [ ] /auto-review-loop to iterate until submission-ready
- [ ] Or invoke /research-pipeline for the complete end-to-end flow

Phase 5.5: Write Compact Files (when COMPACT = true)

Skip entirely if COMPACT is false.

Write IDEA_CANDIDATES.md — a lean summary of the top 3-5 surviving ideas:

# Idea Candidates

| # | Idea | Pilot Signal | Novelty | Reviewer Score | Status |
|---|------|-------------|---------|---------------|--------|
| 1 | [title] | +X% | Confirmed | X/10 | RECOMMENDED |
| 2 | [title] | +Y% | Confirmed | X/10 | BACKUP |
| 3 | [title] | Negative | — | — | ELIMINATED |

## Active Idea: #1 — [title]
- Hypothesis: [one sentence]
- Key evidence: [pilot result]
- Next step: /experiment-bridge or /research-refine
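
The table portion of IDEA_CANDIDATES.md could be generated from per-idea records, as sketched below. The record keys (`title`, `pilot`, `novelty`, `score`, `status`) are hypothetical, and the "Active Idea" section is left out of this sketch.

```python
HEADER = [
    "# Idea Candidates",
    "",
    "| # | Idea | Pilot Signal | Novelty | Reviewer Score | Status |",
    "|---|------|-------------|---------|---------------|--------|",
]
MISSING = "\u2014"  # placeholder the template uses for empty cells


def render_candidates(ideas):
    """Render the candidates table as Markdown text."""
    rows = []
    for n, idea in enumerate(ideas, start=1):
        score = idea.get("score")
        cells = [
            str(n),
            idea["title"],
            idea.get("pilot", MISSING),
            idea.get("novelty", MISSING),
            f"{score}/10" if score is not None else MISSING,
            idea["status"],
        ]
        rows.append("| " + " | ".join(cells) + " |")
    return "\n".join(HEADER + rows) + "\n"
```

The returned string can then be written to disk, for example with `pathlib.Path("IDEA_CANDIDATES.md").write_text(...)`.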

Key Rules

  • Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission — just do it silently.

  • Don't skip phases. Each phase filters and validates — skipping leads to wasted effort later.

  • Checkpoint between phases. Briefly summarize what was found before moving on.

  • Kill ideas early. It's better to kill 10 bad ideas in Phase 3 than to implement one and fail.

  • Empirical signal > theoretical appeal. An idea with a positive pilot outranks a "sounds great" idea without evidence.

  • Document everything. Dead ends are just as valuable as successes for future reference.

  • Be honest with the reviewer. Include negative results and failed pilots in the review prompt.

  • Feishu notifications are optional. If ~/.codex/feishu.json exists, send a checkpoint notification at each phase transition and a pipeline_done notification with the final report. If the file is absent or notifications are off, skip silently.
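
The notification rule reduces to an existence check followed by a best-effort send. The sketch below assumes a hypothetical `send(config, event)` callable supplied by the caller, because this document does not describe the schema of feishu.json or how messages are delivered. The "off" switch is not modeled for the same reason.

```python
import json
from pathlib import Path

DEFAULT_CONFIG = Path.home() / ".codex" / "feishu.json"


def notify(event, send=None, config_path=DEFAULT_CONFIG):
    """Send `event` if the config file exists; otherwise skip silently.

    Returns True only when `send` was actually called.
    """
    path = Path(config_path)
    if not path.is_file() or send is None:
        return False  # no config or no transport: nothing to do
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False  # unreadable config: treat it like an absent one
    send(config, event)
    return True
```

Typical calls would be `notify("checkpoint", send=...)` at each phase transition and `notify("pipeline_done", send=...)` after the final report.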

Composing with Workflow 2

After this pipeline produces a validated top idea:

/idea-discovery "direction"         ← you are here (Workflow 1, includes method refinement + experiment planning)
/run-experiment                     ← deploy experiments from the plan
/auto-review-loop "top idea"        ← Workflow 2: iterate until submission-ready

Or use /research-pipeline for the full end-to-end flow.
