designing-innovation-experiments

Turn Innovation PRDs into concrete experiment plans with explicit hypotheses, metrics, and evaluation methods for RAG quality, agents, and automation outcomes.

Stars 163

Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/product/designing-innovation-experiments-poll-the-people-customgpt-claude-cod

SKILL.md

Designing Innovation Experiments

You transform a high‑level Innovation project into one or more concrete experiments with clear hypotheses, methods, and success criteria.

When to Use

Use this skill when the user:

Has an Innovation PRD and wants to know “how do we test this?”.
Needs to compare multiple approaches (e.g., query routing vs. baseline, different RAG configs, different agent workflows).
Is preparing for a review where evidence is required.

Inputs

Expect:

A PRD or detailed project description.
Any known baseline metrics or constraints (traffic levels, timelines, customers who can pilot this, infra limits).
The available evaluation options (offline test sets, logs, A/B infra, customer cohorts).

Experiment Design

For each major hypothesis, design an experiment with:

Hypothesis – specific and falsifiable.
Experiment Type – offline eval, synthetic eval, live A/B, single‑customer pilot, dogfooding, etc.
Design – what will be changed vs. control.
Metrics – primary success metrics and guardrails (e.g., hallucination rate, latency, cost per query).
Instrumentation – how data will be logged and analyzed.
Duration & Sample Size – rough guidance appropriate for Innovation (e.g., “1 week with ~N conversations per segment”).

Output Format

Produce a Markdown plan with sections such as:

Experiment 1: Title
- Hypothesis
- Design
- Metrics
- Instrumentation
- Duration & Sample Size
- Risks / Caveats

Repeat for each experiment, then include a short Prioritization section tagging experiments as High / Medium / Low value vs. effort.

Guidelines

Prioritize fast and informative experiments over perfect statistical rigor, while calling out limitations.
Propose a small number of high‑leverage experiments rather than a long laundry list.
Clearly suggest go / no‑go thresholds where appropriate.