Agent skill
designing-innovation-experiments
Turn Innovation PRDs into concrete experiment plans with explicit hypotheses, metrics, and evaluation methods for RAG quality, agents, and automation outcomes.
Install this agent skill in your project:
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/product/designing-innovation-experiments-poll-the-people-customgpt-claude-cod
SKILL.md
Designing Innovation Experiments
You transform a high‑level Innovation project into one or more concrete experiments with clear hypotheses, methods, and success criteria.
When to Use
Use this skill when the user:
- Has an Innovation PRD and wants to know “how do we test this?”
- Needs to compare multiple approaches (e.g., query routing vs. baseline, different RAG configs, different agent workflows).
- Is preparing for a review where evidence is required.
Inputs
Expect:
- A PRD or detailed project description.
- Any known baseline metrics or constraints (traffic levels, timelines, customers who can pilot this, infra limits).
- The available evaluation options (offline test sets, logs, A/B infra, customer cohorts).
Experiment Design
For each major hypothesis, design an experiment with:
- Hypothesis – specific and falsifiable.
- Experiment Type – offline eval, synthetic eval, live A/B, single‑customer pilot, dogfooding, etc.
- Design – what will be changed vs. control.
- Metrics – primary success metrics and guardrails (e.g., hallucination rate, latency, cost per query).
- Instrumentation – how data will be logged and analyzed.
- Duration & Sample Size – rough guidance appropriate for Innovation (e.g., “1 week with ~N conversations per segment”).
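The Duration & Sample Size item can be made concrete with a rough per-arm sample size for a binary, per-conversation metric, such as whether a response was flagged as a hallucination. The example below is a minimal sketch. It uses the standard normal-approximation formula for comparing two proportions and assumes a two-sided test at a 5% significance level with 80% power. The function name and its defaults are illustrative and are not part of the skill.

```python
from math import ceil, sqrt
from statistics import NormalDist


def conversations_per_arm(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Rough per-arm sample size for a two-sided test of two proportions.

    Hypothetical helper using the normal approximation; treat the result
    as order-of-magnitude guidance for planning, not a precise requirement.
    """
    if not (0 < p_control < 1 and 0 < p_treatment < 1):
        raise ValueError("rates must be strictly between 0 and 1")
    if p_control == p_treatment:
        raise ValueError("rates must differ to detect an effect")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # standard-normal quantile for the target power
    p_bar = (p_control + p_treatment) / 2  # pooled rate under the null hypothesis
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treatment * (1 - p_treatment))) ** 2
    return ceil(numerator / (p_control - p_treatment) ** 2)
```

For example, `conversations_per_arm(0.10, 0.07)` estimates the sample needed to detect a drop in hallucination rate from 10% to 7%. It returns a little over 1,300 conversations per arm. To get a rough duration, divide that figure by the expected daily traffic per segment. Smaller effects need much larger samples, which is one reason to favor experiments with large expected effects.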
Output Format
Produce a Markdown plan with sections such as:
- Experiment 1: Title
  - Hypothesis
  - Experiment Type
  - Design
  - Metrics
  - Instrumentation
  - Duration & Sample Size
  - Risks / Caveats
Repeat for each experiment. Then add a short Prioritization section that rates each experiment's value and effort as High / Medium / Low.
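The output format can also be generated mechanically once each experiment is captured as a record, which keeps plans consistent across projects. The example below is a minimal sketch under these assumptions. The `Experiment` fields mirror the sections listed above. The value/effort ordering (high value first, then low effort first) is one reasonable convention, not one the skill prescribes. All names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical ordering: High value first, then Low effort first.
VALUE_ORDER = {"High": 0, "Medium": 1, "Low": 2}
EFFORT_ORDER = {"Low": 0, "Medium": 1, "High": 2}


@dataclass
class Experiment:
    """One experiment; fields mirror the plan sections (illustrative)."""
    title: str
    hypothesis: str
    experiment_type: str
    design: str
    metrics: list[str]
    instrumentation: str
    duration_and_sample_size: str
    risks: list[str] = field(default_factory=list)
    value: str = "Medium"   # High / Medium / Low
    effort: str = "Medium"  # High / Medium / Low


def render_plan(experiments: list[Experiment]) -> str:
    """Render experiments as a Markdown plan ending in a Prioritization section."""
    out: list[str] = []
    for i, exp in enumerate(experiments, start=1):
        out.append(f"## Experiment {i}: {exp.title}")
        out += ["### Hypothesis", exp.hypothesis]
        out += ["### Experiment Type", exp.experiment_type]
        out += ["### Design", exp.design]
        out += ["### Metrics", *(f"- {m}" for m in exp.metrics)]
        out += ["### Instrumentation", exp.instrumentation]
        out += ["### Duration & Sample Size", exp.duration_and_sample_size]
        risks = exp.risks or ["None noted"]  # make an empty section explicit
        out += ["### Risks / Caveats", *(f"- {r}" for r in risks)]
        out.append("")  # blank line between experiments
    out.append("## Prioritization")
    ranked = sorted(experiments,
                    key=lambda e: (VALUE_ORDER[e.value], EFFORT_ORDER[e.effort]))
    for exp in ranked:
        out.append(f"- {exp.title}: value {exp.value}, effort {exp.effort}")
    return "\n".join(out)
```

Experiments keep their original numbering in the body. Only the Prioritization list is re-sorted, so reviewers can see the recommended order without renumbering the plan.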
Guidelines
- Prioritize fast and informative experiments over perfect statistical rigor, while calling out limitations.
- Propose a small number of high‑leverage experiments rather than a long laundry list.
- State explicit go / no‑go thresholds where appropriate.
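Go / no-go thresholds are easiest to apply consistently when they are written down as explicit rules on the primary metric and on each guardrail. The example below is a minimal sketch. It assumes experiment results arrive as a flat dictionary of aggregated metrics. The `Threshold` type and `go_no_go` function are hypothetical, as are the metric names and limits in the usage note; they are not recommended values.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """A hypothetical go/no-go rule on a single aggregated metric."""
    metric: str
    limit: float
    higher_is_better: bool

    def passes(self, value: float) -> bool:
        # Higher-is-better metrics must reach the limit; others must stay under it.
        return value >= self.limit if self.higher_is_better else value <= self.limit


def go_no_go(results: dict[str, float], primary: Threshold,
             guardrails: list[Threshold]) -> tuple[str, list[str]]:
    """Return ("go" or "no-go", reasons). Unmeasured metrics count as failures."""
    reasons: list[str] = []
    for rule in [primary, *guardrails]:
        value = results.get(rule.metric)
        if value is None:
            reasons.append(f"{rule.metric}: not measured")
        elif not rule.passes(value):
            op = ">=" if rule.higher_is_better else "<="
            reasons.append(f"{rule.metric}: {value} fails {op} {rule.limit}")
    return ("go" if not reasons else "no-go"), reasons
```

Consider a primary rule `answer_accuracy >= 0.80` and two guardrails, `hallucination_rate <= 0.05` and `p95_latency_ms <= 2000`. Results of 0.83, 0.04, and 2400 would return `"no-go"` because the latency guardrail fails. Treating an unmeasured metric as a failure is a deliberate choice: a missing log cannot silently pass an experiment.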