Agent skill
evaluate-model
Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.
Stars: 163
Forks: 31
Install this agent skill in your project:
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/testing/evaluate-model-mvillmow-projectodyssey
SKILL.md
Evaluate Model
Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).
When to Use
- Comparing different model architectures
- Assessing performance on test/validation datasets
- Detecting overfitting or underfitting
- Reporting model accuracy for papers and documentation
Quick Reference
```mojo
# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...
```
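A minimal sketch of what the classification path computes, assuming plain `List[Int]` labels instead of `ExTensor` and a binary task with 1 as the positive class; an `ExTensor` version would follow the same counting logic:

```mojo
# Standalone sketch: binary classification metrics over plain lists.
# Assumes labels are 0/1 with 1 as the positive class.
from collections import List


fn classification_metrics(
    predictions: List[Int],
    ground_truth: List[Int]
) -> Tuple[Float32, Float32, Float32]:
    var tp: Int = 0   # predicted 1, truth 1
    var fp: Int = 0   # predicted 1, truth 0
    var fn_: Int = 0  # predicted 0, truth 1
    var correct: Int = 0
    for i in range(len(predictions)):
        if predictions[i] == ground_truth[i]:
            correct += 1
        if predictions[i] == 1 and ground_truth[i] == 1:
            tp += 1
        elif predictions[i] == 1 and ground_truth[i] == 0:
            fp += 1
        elif predictions[i] == 0 and ground_truth[i] == 1:
            fn_ += 1

    var accuracy = Float32(correct) / Float32(len(predictions))
    # Guard against zero denominators when a class never appears.
    var precision: Float32 = 0.0
    if tp + fp > 0:
        precision = Float32(tp) / Float32(tp + fp)
    var recall: Float32 = 0.0
    if tp + fn_ > 0:
        recall = Float32(tp) / Float32(tp + fn_)
    return (accuracy, precision, recall)
```

Precision answers "of the positives the model predicted, how many were right?"; recall answers "of the true positives, how many did the model find?".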
Workflow
- Load test data: Prepare test/validation dataset
- Generate predictions: Run model inference on test set
- Select metrics: Choose metrics appropriate to the task (accuracy, precision, recall, F1, AUC for classification; MSE, MAE for regression)
- Calculate metrics: Compute the chosen metrics over the predictions (see the sketch after this list)
- Analyze results: Compare to baseline and identify strengths/weaknesses
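A minimal end-to-end sketch of the prediction, calculation, and analysis steps, under stated assumptions: hard-coded values stand in for real model inference, plain `List[Float32]` stands in for `ExTensor`, and the task is regression, so the metrics are MSE and MAE:

```mojo
# Hypothetical end-to-end sketch: compute MSE and MAE for a regression model.
from collections import List


fn mse_mae(
    predictions: List[Float32],
    ground_truth: List[Float32]
) -> Tuple[Float32, Float32]:
    var sq_sum: Float32 = 0.0
    var abs_sum: Float32 = 0.0
    for i in range(len(predictions)):
        var err = predictions[i] - ground_truth[i]
        sq_sum += err * err
        abs_sum += abs(err)
    var n = Float32(len(predictions))
    return (sq_sum / n, abs_sum / n)


fn main():
    # Stand-in for inference: pretend these came from the model on the test set.
    var preds = List[Float32](2.5, 0.0, 2.1)
    var truth = List[Float32](3.0, -0.5, 2.0)
    # Regression task, so compute MSE and MAE.
    var metrics = mse_mae(preds, truth)
    # Analysis step: compare these numbers against a baseline model's report.
    print("MSE:", metrics[0], "MAE:", metrics[1])
```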
Output Format
Evaluation report:
- Task type (classification, regression, etc.)
- Metrics (accuracy, precision, recall, F1, AUC, etc.)
- Per-class breakdown (if applicable)
- Comparison to baseline model
- Confusion matrix (classification; see the sketch after this list)
- Error analysis
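For the confusion matrix and per-class breakdown, one possible sketch: accumulate counts indexed by ground-truth row and predicted column. The flat row-major `List` layout and the `num_classes` parameter are illustrative choices, not the project's actual API:

```mojo
# Illustrative sketch: confusion matrix as a flat row-major list, where
# matrix[t * num_classes + p] counts samples with truth t predicted as p.
from collections import List


fn confusion_matrix(
    predictions: List[Int],
    ground_truth: List[Int],
    num_classes: Int
) -> List[Int]:
    var matrix = List[Int]()
    for _ in range(num_classes * num_classes):
        matrix.append(0)
    for i in range(len(predictions)):
        matrix[ground_truth[i] * num_classes + predictions[i]] += 1
    return matrix
```

Diagonal entries are correct predictions; dividing a diagonal entry by its column sum gives that class's precision, and by its row sum gives its recall.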
References
- See CLAUDE.md > Language Preference (Mojo for ML models)
- See the train-model skill for model training
- See /notes/review/mojo-ml-patterns.md for Mojo tensor operations