Instructions

Primary Use Case

Use this skill when a user asks to extract specific training metrics from experiment logs at regular step intervals (e.g., "every 100 steps") and save them to a CSV file.

Core Workflow

1. Identify Target Experiment

Input: User provides a target project/experiment (often via URL like wandb.ai/entity/project).
Action: Query the project to list available runs/experiments.
Decision Point: Determine which experiment to analyze based on user criteria (e.g., "shortest answers" → lowest response_length/mean).
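
The decision point above can be sketched as a small selection helper. This is a minimal sketch, not the skill's actual implementation: the run shape (a dict with "name" and "summary" keys) is hypothetical, and comparing runs on their summary value of the metric is an assumption about what "shortest answers" should mean.

```python
from typing import Any, Mapping, Sequence


def pick_run(
    runs: Sequence[Mapping[str, Any]],
    metric: str,
    lowest: bool = True,
) -> Mapping[str, Any]:
    """Return the run whose summary value for `metric` is lowest (or highest).

    Each run is assumed to look like {"name": ..., "summary": {...}};
    that shape is illustrative, not something the W&B tool guarantees.
    """
    # Ignore runs that never logged the metric or logged a non-numeric value.
    candidates = [
        run
        for run in runs
        if isinstance(run.get("summary", {}).get(metric), (int, float))
    ]
    if not candidates:
        raise ValueError(f"no run reports a numeric value for {metric!r}")
    chooser = min if lowest else max
    return chooser(candidates, key=lambda run: run["summary"][metric])
```

For "shortest answers", call pick_run(runs, "response_length/mean"); pass lowest=False for "longest".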

2. Extract Metrics History

Input: Selected experiment run identifier.
Action: Query the run's history for the specific metrics requested by the user (e.g., actor/entropy_loss, response_length/clip_ratio, response_length/mean).
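Note: Use sampled-history queries to retrieve the data efficiently, and request enough samples that every step needed for the interval sampling in the next stage is present in the result.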
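
As an illustration of the query input, the helper below builds a sampled-history spec as a JSON string. The spec shape (a "keys" list that includes the _step axis, plus a "samples" count) is assumed to match what the W&B Python client sends for sampled history; how the spec is passed to RunHistorySampled in wandb-query_wandb_tool is not specified here, so treat the function name and its parameters as hypothetical.

```python
import json
from typing import Sequence


def build_sampled_history_spec(
    metrics: Sequence[str],
    samples: int,
    x_axis: str = "_step",
) -> str:
    """Build one sampled-history spec as a JSON string.

    Assumption: the spec is {"keys": [...], "samples": N} with the step
    axis listed first, so each returned row carries its step number.
    """
    if samples < 1:
        raise ValueError("samples must be at least 1")
    # Put the x-axis first and avoid listing it twice.
    keys = [x_axis] + [m for m in metrics if m != x_axis]
    return json.dumps({"keys": keys, "samples": samples})
```

Choosing samples at or above the run's number of logged steps is what keeps the later interval filter from missing target steps.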

3. Sample at Specified Intervals

Input: Full history data and user-specified interval (e.g., "from step 0, at intervals of every 100 steps").
Action: Filter the history to extract data points only at the requested steps (e.g., 0, 100, 200, ... up to the final step).
Output: A structured list/dictionary of values for each target step.
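
The filtering step can be sketched as follows. This is one possible approach, assuming each history row is a dict that carries its step under a _step key (the key name is an assumption) and that only exact step matches should be kept; target steps absent from the history are skipped rather than interpolated.

```python
from typing import Any, Dict, Iterable, List, Mapping


def sample_at_interval(
    rows: Iterable[Mapping[str, Any]],
    interval: int,
    start: int = 0,
    step_key: str = "_step",
) -> List[Mapping[str, Any]]:
    """Keep only rows whose step is start, start + interval, ... up to the last step."""
    if interval <= 0:
        raise ValueError("interval must be a positive integer")
    # Index rows by step; if a step appears twice, the later row wins.
    by_step: Dict[int, Mapping[str, Any]] = {}
    for row in rows:
        step = row.get(step_key)
        if step is not None:
            by_step[int(step)] = row
    if not by_step:
        return []
    last_step = max(by_step)
    targets = range(start, last_step + 1, interval)
    # Target steps missing from the history are skipped, not interpolated.
    return [by_step[step] for step in targets if step in by_step]
```

If skipped steps matter to the user, compare the returned steps against the target range and report the gaps.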

4. Create CSV File

Input: Sampled data with columns: step, metric1, metric2, ...
Action: Format data as CSV with appropriate headers.
Output: Write CSV file to the workspace with a descriptive filename (e.g., shortest_length_experiment.csv).
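
A minimal sketch of the formatting step, using Python's standard csv module. It returns the CSV as text, which could then be handed to filesystem-write_file; the _step key name and the choice to leave missing values as empty cells are assumptions.

```python
import csv
import io
from typing import Any, Iterable, Mapping, Sequence


def rows_to_csv(
    rows: Iterable[Mapping[str, Any]],
    metrics: Sequence[str],
    step_key: str = "_step",
) -> str:
    """Render sampled rows as CSV text with the header: step, metric1, metric2, ..."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["step", *metrics])
    for row in rows:
        # A metric missing from a row becomes an empty cell.
        writer.writerow([row[step_key], *(row.get(m, "") for m in metrics)])
    return buffer.getvalue()
```

Using csv.writer rather than manual string joins keeps quoting correct if a metric name or value ever contains a comma.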

5. Provide Summary

Action: Present a concise analysis summary to the user, highlighting:
- Which experiment was selected and why.
- Key observations from the extracted data (trends, min/max values).
- Location and contents of the generated CSV file.
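
The key observations for the summary can be computed from the sampled rows. The helper below is a sketch under the same assumptions as before (rows are dicts keyed by metric name, with the step under _step); the function name and the returned field names are illustrative.

```python
from typing import Any, Dict, Iterable, Mapping, Optional


def summarize_metric(
    rows: Iterable[Mapping[str, Any]],
    metric: str,
    step_key: str = "_step",
) -> Optional[Dict[str, Any]]:
    """Return first/last values and the min/max with their steps, or None if no data."""
    # Keep (step, value) pairs in row order, skipping rows without the metric.
    points = [
        (row[step_key], row[metric])
        for row in rows
        if row.get(metric) is not None
    ]
    if not points:
        return None
    min_step, min_value = min(points, key=lambda point: point[1])
    max_step, max_value = max(points, key=lambda point: point[1])
    return {
        "first": points[0][1],
        "last": points[-1][1],
        "min": min_value,
        "min_step": min_step,
        "max": max_value,
        "max_step": max_step,
    }
```

Comparing "first" with "last" gives a rough trend direction; the min and max steps show where the extremes occurred.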

Key Tools & Patterns

W&B Queries: Use wandb-query_wandb_tool for:
- Project/run listing (ProjectInfo, GetRuns).
- History keys inspection (RunHistoryKeys).
- Sampled history data (RunHistorySampled with appropriate specs).
File Operations: Use filesystem-write_file to create the CSV, and optionally filesystem-read_file to verify.
Large Output Handling: When tool outputs are truncated, use local-view_overlong_tooloutput and related navigation/search tools to extract needed data.

Common User Phrases That Trigger This Skill

"Record [metrics] into [filename].csv"
"Save experiment data to a CSV file"
"Extract data at intervals of every X steps"
"Get the metrics from step 0, every 100 steps"
"Analyze which experiment has the shortest/longest [metric]"

Error Handling & Edge Cases

Missing Metrics: If a requested metric key doesn't exist in the run's history, inform the user and adjust the CSV columns accordingly.
Insufficient Steps: If the run has fewer steps than the requested interval, sample all available steps and note the limitation.
Large Datasets: For runs with many steps, use sampled history queries with appropriate max_items to avoid timeouts.
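
The missing-metrics case can be handled before any history query by comparing the requested keys with the keys the run actually logged (for example, the result of a RunHistoryKeys lookup). A minimal sketch, with a hypothetical helper name:

```python
from typing import Iterable, List, Sequence, Tuple


def split_requested_metrics(
    requested: Sequence[str],
    available: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Split requested metric keys into (present, missing), preserving request order."""
    available_set = set(available)
    present = [m for m in requested if m in available_set]
    missing = [m for m in requested if m not in available_set]
    return present, missing
```

Use the present list as the CSV columns and name the missing keys in the summary so the user knows why a column is absent.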
