
setup

Set up a new autoresearch experiment interactively. Collects the domain, experiment name, target file, eval command, metric, direction, optional evaluator, and storage scope.


Install this agent skill into your project:

npx add-skill https://github.com/alirezarezvani/claude-skills/tree/main/engineering/autoresearch-agent/skills/setup


/ar:setup — Create New Experiment

Set up a new autoresearch experiment with all required configuration.

Usage

```
/ar:setup                                    # Interactive mode
# Non-interactive: domain, name, target, eval command, metric, direction
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list                             # Show existing experiments
/ar:setup --list-evaluators                  # Show available evaluators
```

What It Does

If arguments are provided

Pass them directly to the setup script:

```bash
python {skill_path}/scripts/setup_experiment.py \
  --domain {domain} --name {name} \
  --target {target} --eval "{eval_cmd}" \
  --metric {metric} --direction {direction} \
  [--evaluator {evaluator}] [--scope {scope}]
```
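As a minimal sketch, the positional form of /ar:setup can be mapped onto the script's flags in the order used in the usage example (domain, name, target, eval command, metric, direction). The helper `build_setup_command` is hypothetical and not part of the skill. It builds an argument list rather than one string, so an eval command containing spaces, such as `pytest bench.py`, stays a single argument.

```python
import shlex


def build_setup_command(skill_path, domain, name, target, eval_cmd,
                        metric, direction, evaluator=None, scope=None):
    """Hypothetical helper: map positional /ar:setup args to script flags."""
    if direction not in ("lower", "higher"):
        raise ValueError(f"direction must be 'lower' or 'higher', got {direction!r}")
    argv = [
        "python", f"{skill_path}/scripts/setup_experiment.py",
        "--domain", domain, "--name", name,
        "--target", target, "--eval", eval_cmd,
        "--metric", metric, "--direction", direction,
    ]
    # The bracketed flags are optional: append them only when a value was given.
    if evaluator:
        argv += ["--evaluator", evaluator]
    if scope:
        argv += ["--scope", scope]
    return argv


if __name__ == "__main__":
    argv = build_setup_command(
        "/path/to/skill", "engineering", "api-speed", "src/api.py",
        "pytest bench.py", "p50_ms", "lower",
    )
    # shlex.join only renders a copy-pasteable shell line for display.
    print(shlex.join(argv))
```

Passing the list straight to `subprocess.run(argv)` avoids shell quoting altogether; `shlex.join` is used here only to show what the equivalent shell command would look like.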

If no arguments are provided (interactive mode)

Collect each parameter one at a time:

  1. Domain — Ask: "What domain? (engineering, marketing, content, prompts, custom)"
  2. Name — Ask: "Experiment name? (e.g., api-speed, blog-titles)"
  3. Target file — Ask: "Which file to optimize?" Verify it exists.
  4. Eval command — Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)"
  5. Metric — Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)"
  6. Direction — Ask: "Is lower or higher better?"
  7. Evaluator (optional) — Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?"
  8. Scope — Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?"

Then run setup_experiment.py with the collected parameters.
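The interactive steps above can be sketched as a series of questions, each re-asked until the answer is valid. This is an illustrative sketch, not the skill's implementation: `collect_parameters`, `ask_until`, and the accepted scope answers (`project` / `user`) are assumptions. The `ask` parameter defaults to `input` but can be any function that takes a prompt and returns a string, which keeps the validation testable without a terminal.

```python
from pathlib import Path

DOMAINS = ("engineering", "marketing", "content", "prompts", "custom")
EVALUATORS = ("benchmark_speed", "benchmark_size", "test_pass_rate", "build_speed",
              "memory_usage", "llm_judge_content", "llm_judge_prompt", "llm_judge_copy")


def ask_until(ask, prompt, is_valid, error):
    """Repeat one question until the stripped answer passes validation."""
    while True:
        answer = ask(prompt).strip()
        if is_valid(answer):
            return answer
        print(error)


def collect_parameters(ask=input):
    """Hypothetical sketch: collect the setup parameters one at a time."""
    non_empty = bool  # an empty string is the only invalid free-text answer
    p = {}
    p["domain"] = ask_until(ask, "What domain? (engineering, marketing, content, prompts, custom) ",
                            lambda a: a in DOMAINS, "Please pick one of the listed domains.")
    p["name"] = ask_until(ask, "Experiment name? (e.g., api-speed, blog-titles) ",
                          non_empty, "Name cannot be empty.")
    # Step 3 says to verify the target file exists.
    p["target"] = ask_until(ask, "Which file to optimize? ",
                            lambda a: Path(a).is_file(), "That file does not exist.")
    p["eval_cmd"] = ask_until(ask, "How to measure it? (e.g., pytest bench.py) ",
                              non_empty, "Eval command cannot be empty.")
    p["metric"] = ask_until(ask, "What metric does the eval output? (e.g., p50_ms) ",
                            non_empty, "Metric cannot be empty.")
    p["direction"] = ask_until(ask, "Is lower or higher better? ",
                               lambda a: a in ("lower", "higher"), "Answer 'lower' or 'higher'.")
    # Evaluator is optional: a blank answer means "use my own eval command".
    p["evaluator"] = ask_until(ask, "Built-in evaluator name, or Enter to use your own? ",
                               lambda a: a == "" or a in EVALUATORS,
                               "Unknown evaluator; leave blank to use your own.") or None
    p["scope"] = ask_until(ask, "Store in project (.autoresearch/) or user (~/.autoresearch/)? ",
                           lambda a: a in ("project", "user"), "Answer 'project' or 'user'.")
    return p
```

In a terminal, `collect_parameters()` prompts on stdin; the returned values correspond one-to-one with the setup_experiment.py flags listed earlier.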

Listing

```bash
# Show existing experiments
python {skill_path}/scripts/setup_experiment.py --list

# Show available evaluators
python {skill_path}/scripts/setup_experiment.py --list-evaluators
```
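`--list` is handled by the script itself. To illustrate roughly what listing involves, the sketch below walks both storage roots. It assumes, hypothetically, that each experiment is a directory `<root>/<domain>/<name>/` under `.autoresearch/` or `~/.autoresearch/`. That matches the `{domain}/{name}` identifiers used by /ar:run, but this page does not confirm the on-disk layout.

```python
from pathlib import Path


def list_experiments(roots=None):
    """Return 'domain/name' for every experiment directory found.

    Hypothetical layout: <root>/<domain>/<name>/, with roots defaulting to the
    project store (.autoresearch/) and the user store (~/.autoresearch/).
    """
    if roots is None:
        roots = (Path(".autoresearch"), Path.home() / ".autoresearch")
    found = []
    for root in roots:
        if not root.is_dir():
            continue  # a scope that has never been used has no directory yet
        for domain_dir in sorted(d for d in root.iterdir() if d.is_dir()):
            for exp_dir in sorted(e for e in domain_dir.iterdir() if e.is_dir()):
                found.append(f"{domain_dir.name}/{exp_dir.name}")
    return found


if __name__ == "__main__":
    for experiment in list_experiments():
        print(experiment)
```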

Built-in Evaluators

| Name | Metric | Direction | Use case |
|---|---|---|---|
| benchmark_speed | p50_ms | lower | Function/API execution time |
| benchmark_size | size_bytes | lower | File, bundle, Docker image size |
| test_pass_rate | pass_rate | higher | Test suite pass percentage |
| build_speed | build_seconds | lower | Build/compile/Docker build time |
| memory_usage | peak_mb | lower | Peak memory during execution |
| llm_judge_content | ctr_score | higher | Headlines, titles, descriptions |
| llm_judge_prompt | quality_score | higher | System prompts, agent instructions |
| llm_judge_copy | engagement_score | higher | Social posts, ad copy, emails |
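Each evaluator pairs a metric with an optimization direction, and the direction is what decides whether a new measurement counts as an improvement. A minimal sketch, with the table transcribed into a dictionary. `BUILTIN_EVALUATORS` and `is_improvement` are illustrative names, and the strict comparison (a tie is not an improvement) is an assumption, since this page does not say how ties or measurement noise are handled.

```python
# Built-in evaluators transcribed from the table: name -> (metric, direction).
BUILTIN_EVALUATORS = {
    "benchmark_speed": ("p50_ms", "lower"),
    "benchmark_size": ("size_bytes", "lower"),
    "test_pass_rate": ("pass_rate", "higher"),
    "build_speed": ("build_seconds", "lower"),
    "memory_usage": ("peak_mb", "lower"),
    "llm_judge_content": ("ctr_score", "higher"),
    "llm_judge_prompt": ("quality_score", "higher"),
    "llm_judge_copy": ("engagement_score", "higher"),
}


def is_improvement(baseline: float, candidate: float, direction: str) -> bool:
    """Return True if `candidate` strictly beats `baseline` (hypothetical helper)."""
    if direction == "lower":
        return candidate < baseline
    if direction == "higher":
        return candidate > baseline
    raise ValueError(f"direction must be 'lower' or 'higher', got {direction!r}")


if __name__ == "__main__":
    metric, direction = BUILTIN_EVALUATORS["benchmark_speed"]
    # A 95 ms p50 beats a 120 ms baseline because lower is better for p50_ms.
    print(metric, is_improvement(120.0, 95.0, direction))
```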

After Setup

Report to the user:

  • Experiment path and branch name
  • Whether the eval command ran successfully, and the baseline metric it reported
  • Suggest: "Run /ar:run {domain}/{name} to start iterating, or /ar:loop {domain}/{name} for autonomous mode."
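The report has to say whether the eval command worked and what baseline it produced. One way to get both is to run the command once and scan its output for the metric. This sketch assumes, hypothetically, that the eval prints a line such as `p50_ms: 42.1` or `p50_ms=42.1`; the actual output contract between the eval command and the skill is not documented on this page. `shell=True` is used because eval commands are given as shell strings like `pytest bench.py`, so only run commands you trust.

```python
import re
import subprocess
import sys


def measure_baseline(eval_cmd: str, metric: str, timeout: float = 600):
    """Run the eval command once and return (status, value).

    status is "ok", "failed", "timeout", or "no_metric". Assumes (hypothetically)
    the eval prints a line like "p50_ms: 42.1" or "p50_ms=42.1".
    """
    try:
        result = subprocess.run(eval_cmd, shell=True, capture_output=True,
                                text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout", None
    if result.returncode != 0:
        return "failed", None
    number = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
    pattern = re.compile(rf"^\s*{re.escape(metric)}\s*[:=]\s*({number})\s*$", re.MULTILINE)
    matches = pattern.findall(result.stdout)
    if not matches:
        return "no_metric", None
    # If the metric is printed more than once, use the last value.
    return "ok", float(matches[-1])


if __name__ == "__main__":
    # Stand-in eval command that only prints a metric line.
    demo = f'"{sys.executable}" -c "print(\'p50_ms: 12.5\')"'
    print(measure_baseline(demo, "p50_ms"))
```

Distinguishing "failed" from "no_metric" lets the report tell the user whether the command itself broke or simply did not print the metric they named.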
