Perform Sweep

End-to-end workflow for running ablation experiments on the Diplomacy GRPO training pipeline.

Quick Reference

| Phase | Action | Command |
|---|---|---|
| Configure | Create sweep.yaml | See YAML Reference |
| Validate | Dry run | `python scripts/launch_sweep.py <path> --dry-run` |
| Info | Show config | `python scripts/launch_sweep.py <path> --info` |
| Launch | Start sweep | `python scripts/launch_sweep.py <path>` |
| Status | Check progress | `python scripts/launch_sweep.py <path> --status` |
| List | List all sweeps | `python scripts/launch_sweep.py --list` |
| Analyze | Compare results | Use the `experiment-analysis` skill |

Workflow

1. Hypothesis Design

Review recent experiments in experiments/experiment-tracker.md
Identify one variable to test (e.g., horizon length, scoring function)
Predict the expected outcome
Document the reasoning in the `hypothesis` field of sweep.yaml
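Once the runs are written in sweep.yaml, a quick way to confirm the "one variable" rule is to diff each run's overrides against the control. The helper below is a hypothetical sketch, not part of scripts/launch_sweep.py. It compares only the per-run `config` blocks (not values inherited from `defaults`) and ignores `experiment_tag`, which differs between runs by design.

```python
from typing import Any

# Keys expected to differ between runs by construction, so they are not
# counted as experimental variables (assumption: only the tag qualifies).
IGNORED_KEYS = {"experiment_tag"}


def diff_from_control(
    runs: dict[str, dict[str, Any]], control: str = "A"
) -> dict[str, set[str]]:
    """For each non-control run, return the config keys whose values differ
    from the control run. A key missing on one side counts as differing."""
    base = runs[control]["config"]
    changed: dict[str, set[str]] = {}
    for run_id, run in runs.items():
        if run_id == control:
            continue
        cfg = run["config"]
        keys = (set(base) | set(cfg)) - IGNORED_KEYS
        changed[run_id] = {k for k in keys if base.get(k) != cfg.get(k)}
    return changed


# The runs from the example sweep.yaml, as a YAML parser would return them.
runs = {
    "A": {"name": "control", "config": {"experiment_tag": "my-ablation-A"}},
    "B": {
        "name": "treatment",
        "config": {"rollout_horizon_years": 8, "experiment_tag": "my-ablation-B"},
    },
}
print(diff_from_control(runs))  # {'B': {'rollout_horizon_years'}}
```

A treatment run reporting more than one changed key signals that the comparison with the control is confounded, unless the sweep is deliberately factorial.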

2. YAML Configuration

Create experiments/sweeps/<name>/sweep.yaml:

```yaml
metadata:
  name: "my-ablation"
  description: "Testing hypothesis X"
  hypothesis: "Longer horizons should improve strategic play"
  experiment_tag_prefix: "my-ablation"

# Settings shared by every run
defaults:
  total_steps: 100

# One entry per run; each config holds that run's overrides
runs:
  A:
    name: "control"
    description: "Baseline configuration"
    config:
      experiment_tag: "${metadata.experiment_tag_prefix}-A"
  B:
    name: "treatment"
    description: "With longer horizon"
    config:
      rollout_horizon_years: 8
      experiment_tag: "${metadata.experiment_tag_prefix}-B"
```

See the YAML Reference for the full schema.
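How the launcher combines `defaults` with each run's `config` and resolves `${metadata.experiment_tag_prefix}` is not spelled out here. The sketch below assumes a shallow merge in which per-run values override defaults, and OmegaConf-style `${dotted.path}` references resolved against the whole sweep file. The function names are hypothetical.

```python
import re
from typing import Any

# Matches ${dotted.path} references such as ${metadata.experiment_tag_prefix}.
_REF = re.compile(r"\$\{([^}]+)\}")


def _lookup(root: dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path like 'metadata.experiment_tag_prefix' through nested dicts."""
    node: Any = root
    for part in dotted.split("."):
        node = node[part]
    return node


def _interpolate(value: Any, root: dict[str, Any]) -> Any:
    """Replace every ${...} reference inside a string; leave other values as-is."""
    if not isinstance(value, str):
        return value
    return _REF.sub(lambda m: str(_lookup(root, m.group(1))), value)


def resolve_runs(sweep: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return one merged, interpolated config per run ID.

    Assumed semantics: shallow merge where run values override defaults.
    """
    defaults = sweep.get("defaults", {})
    resolved: dict[str, dict[str, Any]] = {}
    for run_id, run in sweep["runs"].items():
        merged = {**defaults, **run.get("config", {})}
        resolved[run_id] = {k: _interpolate(v, sweep) for k, v in merged.items()}
    return resolved


# The example sweep.yaml, trimmed to the fields used here.
sweep = {
    "metadata": {"experiment_tag_prefix": "my-ablation"},
    "defaults": {"total_steps": 100},
    "runs": {
        "A": {"name": "control",
              "config": {"experiment_tag": "${metadata.experiment_tag_prefix}-A"}},
        "B": {"name": "treatment",
              "config": {"rollout_horizon_years": 8,
                         "experiment_tag": "${metadata.experiment_tag_prefix}-B"}},
    },
}
print(resolve_runs(sweep)["B"])
# {'total_steps': 100, 'rollout_horizon_years': 8, 'experiment_tag': 'my-ablation-B'}
```

If the real launcher deep-merges nested mappings instead, only the `merged = ...` line would change.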

3. Validate Configuration

```bash
# Show sweep info
python scripts/launch_sweep.py experiments/sweeps/<name>/ --info

# Dry run (validates config, shows what would run)
python scripts/launch_sweep.py experiments/sweeps/<name>/ --dry-run
```
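The exact checks `--dry-run` performs are not listed here. As an illustration of the structural problems worth catching before launch, the hypothetical function below inspects a sweep already parsed into a dict and returns a list of problems: missing metadata fields, an empty `runs` section, runs without a `name` or `config`, and duplicated `experiment_tag` values. Which metadata fields count as required is an assumption.

```python
from typing import Any

# Assumed required fields; the real launcher's schema may differ.
REQUIRED_METADATA = ("name", "hypothesis")


def validate_sweep(sweep: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed sweep; empty means OK."""
    errors: list[str] = []
    metadata = sweep.get("metadata") or {}
    for field in REQUIRED_METADATA:
        if not metadata.get(field):
            errors.append(f"metadata.{field} is missing or empty")
    runs = sweep.get("runs") or {}
    if not runs:
        errors.append("runs section is missing or empty")
    seen_tags: dict[str, str] = {}  # experiment_tag -> first run ID using it
    for run_id, run in runs.items():
        if not isinstance(run, dict) or "name" not in run:
            errors.append(f"run {run_id} has no name")
            continue
        config = run.get("config")
        if not isinstance(config, dict):
            errors.append(f"run {run_id} has no config mapping")
            continue
        tag = config.get("experiment_tag")
        if tag is None:
            continue
        if tag in seen_tags:
            errors.append(f"runs {seen_tags[tag]} and {run_id} share experiment_tag {tag!r}")
        else:
            seen_tags[tag] = run_id
    return errors


bad_sweep = {
    "metadata": {"name": "my-ablation"},
    "runs": {
        "A": {"name": "control", "config": {"experiment_tag": "t-A"}},
        "B": {"name": "treatment", "config": {"experiment_tag": "t-A"}},
    },
}
for problem in validate_sweep(bad_sweep):
    print(problem)
# metadata.hypothesis is missing or empty
# runs A and B share experiment_tag 't-A'
```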

4. Launch and Monitor

```bash
# Launch (fire-and-forget: the sweep runs in the Modal cloud)
python scripts/launch_sweep.py experiments/sweeps/<name>/

# Check status anytime
python scripts/launch_sweep.py experiments/sweeps/<name>/ --status

# List all sweeps
python scripts/launch_sweep.py --list
```

5. Analysis

After the sweep completes, use the experiment-analysis skill:

```bash
# Full analysis for each run
uv run python .claude/skills/experiment-analysis/analyze_elo.py <run-name>

# Compare in WandB:
# filter by experiment_tag_prefix (e.g., "my-ablation")
```
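If run results have been exported into plain records (for example via the WandB public API, not shown here), grouping them by the sweep's tag prefix takes a few lines. The record shape (`experiment_tag`, `metrics`), the metric name `elo`, and the numbers below are hypothetical placeholders, not results from any real sweep.

```python
from typing import Any


def compare_by_prefix(
    records: list[dict[str, Any]], prefix: str, metric: str
) -> dict[str, float]:
    """Map each run suffix (e.g. 'A', 'B') to its metric value for runs whose
    experiment_tag starts with '<prefix>-'. Runs lacking the metric are skipped."""
    out: dict[str, float] = {}
    for rec in records:
        tag = rec.get("experiment_tag", "")
        # Require the trailing '-' so 'my-ablation' does not match 'my-ablation2-A'.
        if not tag.startswith(prefix + "-"):
            continue
        value = rec.get("metrics", {}).get(metric)
        if value is not None:
            out[tag[len(prefix) + 1:]] = value
    return out


records = [  # placeholder values for illustration only
    {"experiment_tag": "my-ablation-A", "metrics": {"elo": 1.0}},
    {"experiment_tag": "my-ablation-B", "metrics": {"elo": 2.0}},
    {"experiment_tag": "other-sweep-A", "metrics": {"elo": 3.0}},
]
print(compare_by_prefix(records, "my-ablation", "elo"))  # {'A': 1.0, 'B': 2.0}
```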

Key Features

Fire-and-forget: launch the sweep and close your laptop; it runs in the Modal cloud
Auto-resume: if the Modal job hits its 24-hour maximum runtime, the sweep automatically respawns and continues
Sequential execution: runs execute one training at a time (an infrastructure constraint)
Progress tracking: state is saved after each run so an interrupted sweep can recover
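The launcher's actual state format and Modal respawn mechanism are not described here. The sketch below only illustrates the resume pattern the last three features imply: run one training at a time, persist the list of completed run IDs to a JSON file after each run, and on restart skip anything already recorded. `train` and the state-file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_sweep(
    run_ids: list[str], train: Callable[[str], None], state_path: Path
) -> list[str]:
    """Run trainings sequentially, checkpointing progress after each one.

    Returns the run IDs executed in this invocation; IDs already listed in
    the state file (from an earlier, interrupted invocation) are skipped.
    """
    completed: list[str] = []
    if state_path.exists():
        completed = json.loads(state_path.read_text())["completed"]
    executed: list[str] = []
    for run_id in run_ids:
        if run_id in completed:
            continue  # finished before the interruption
        train(run_id)  # one training at a time
        completed.append(run_id)
        # Persist after every run so a respawned process can resume here.
        state_path.write_text(json.dumps({"completed": completed}))
        executed.append(run_id)
    return executed


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "sweep_state.json"

    def flaky_train(run_id: str) -> None:
        if run_id == "C":
            raise TimeoutError("simulated timeout")

    try:
        run_sweep(["A", "B", "C", "D"], flaky_train, state)
    except TimeoutError:
        pass
    # A respawned process picks up where the first one stopped.
    print(run_sweep(["A", "B", "C", "D"], lambda run_id: None, state))  # ['C', 'D']
```

Because a run is marked complete only after `train` returns, a run interrupted mid-training is retried from scratch on resume.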

Example Sweeps

See existing sweeps in experiments/sweeps/:

longer-horizon-inverted-weight-ablation/: a 2×2 ablation over horizon length and scoring function
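A 2×2 ablation crosses two factors at two levels each, giving four runs. The hypothetical helper below generates a `runs:` section in the shape used by the example sweep.yaml. Only `rollout_horizon_years` appears in this document; the `scoring_fn` key and all the levels in the usage line are illustrative placeholders, not the settings of the actual sweep.

```python
from itertools import product
from string import ascii_uppercase
from typing import Any


def factorial_runs(
    factors: dict[str, list[Any]],
    tag_prefix_ref: str = "${metadata.experiment_tag_prefix}",
) -> dict[str, dict[str, Any]]:
    """Build one run per combination of factor levels, labelled A, B, C, ...

    Listing the control level first for every factor makes run A the
    all-control run. Tags keep the ${...} reference for the launcher to resolve.
    """
    names = list(factors)
    combos = list(product(*factors.values()))
    if len(combos) > len(ascii_uppercase):
        raise ValueError("more than 26 runs; extend the labelling scheme")
    runs: dict[str, dict[str, Any]] = {}
    for label, levels in zip(ascii_uppercase, combos):
        config = dict(zip(names, levels))
        config["experiment_tag"] = f"{tag_prefix_ref}-{label}"
        name = ", ".join(f"{k}={v}" for k, v in zip(names, levels))
        runs[label] = {"name": name, "config": config}
    return runs


# Placeholder factor levels; runs A..D cover all four combinations.
runs = factorial_runs(
    {"rollout_horizon_years": [4, 8], "scoring_fn": ["baseline", "inverted"]}
)
for label, run in runs.items():
    print(label, run["config"])
```

The output can be dumped with any YAML writer into the `runs:` block; keeping the tag as an unresolved reference leaves `experiment_tag_prefix` as the single place to rename the sweep.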

Integration

Use the experiment-analysis skill for post-sweep metric analysis
Results are logged to WandB, tagged with experiment_tag for filtering
Document findings in the sweep directory's results.md
