Agent skill
evaluate-model
Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.
Stars: 163
Forks: 31
Install this agent skill in your project:
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/testing/evaluate-model-mvillmow-projectodyssey
SKILL.md
Evaluate Model
Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).
When to Use
- Comparing different model architectures
- Assessing performance on test/validation datasets
- Detecting overfitting or underfitting
- Reporting model accuracy for papers and documentation
Quick Reference
```mojo
# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...
```
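A minimal sketch of what the classification path computes, assuming plain `List[Int]` labels instead of `ExTensor` and a binary task with 1 as the positive class; an `ExTensor` version would follow the same counting logic:

```mojo
# Standalone sketch: binary classification metrics over plain lists.
# Assumes labels are 0/1 with 1 as the positive class.
from collections import List


fn classification_metrics(
    predictions: List[Int],
    ground_truth: List[Int]
) -> Tuple[Float32, Float32, Float32]:
    var tp: Int = 0   # predicted 1, truth 1
    var fp: Int = 0   # predicted 1, truth 0
    var fn_: Int = 0  # predicted 0, truth 1
    var correct: Int = 0
    for i in range(len(predictions)):
        if predictions[i] == ground_truth[i]:
            correct += 1
        if predictions[i] == 1 and ground_truth[i] == 1:
            tp += 1
        elif predictions[i] == 1 and ground_truth[i] == 0:
            fp += 1
        elif predictions[i] == 0 and ground_truth[i] == 1:
            fn_ += 1

    var accuracy = Float32(correct) / Float32(len(predictions))
    # Guard against zero denominators when a class never appears.
    var precision: Float32 = 0.0
    if tp + fp > 0:
        precision = Float32(tp) / Float32(tp + fp)
    var recall: Float32 = 0.0
    if tp + fn_ > 0:
        recall = Float32(tp) / Float32(tp + fn_)
    return (accuracy, precision, recall)
```

Precision answers "of the positives the model predicted, how many were right?"; recall answers "of the true positives, how many did the model find?".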
Workflow
- Load test data: Prepare test/validation dataset
- Generate predictions: Run model inference on test set
- Select metrics: Choose metrics appropriate to the task (accuracy, precision, recall, F1, AUC for classification; MSE, MAE for regression)
- Calculate metrics: Compute the chosen metrics over the predictions (see the sketch after this list)
- Analyze results: Compare to baseline and identify strengths/weaknesses
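A minimal end-to-end sketch of the prediction, calculation, and analysis steps, under stated assumptions: hard-coded values stand in for real model inference, plain `List[Float32]` stands in for `ExTensor`, and the task is regression, so the metrics are MSE and MAE:

```mojo
# Hypothetical end-to-end sketch: compute MSE and MAE for a regression model.
from collections import List


fn mse_mae(
    predictions: List[Float32],
    ground_truth: List[Float32]
) -> Tuple[Float32, Float32]:
    var sq_sum: Float32 = 0.0
    var abs_sum: Float32 = 0.0
    for i in range(len(predictions)):
        var err = predictions[i] - ground_truth[i]
        sq_sum += err * err
        abs_sum += abs(err)
    var n = Float32(len(predictions))
    return (sq_sum / n, abs_sum / n)


fn main():
    # Stand-in for inference: pretend these came from the model on the test set.
    var preds = List[Float32](2.5, 0.0, 2.1)
    var truth = List[Float32](3.0, -0.5, 2.0)
    # Regression task, so compute MSE and MAE.
    var metrics = mse_mae(preds, truth)
    # Analysis step: compare these numbers against a baseline model's report.
    print("MSE:", metrics[0], "MAE:", metrics[1])
```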
Output Format
Evaluation report:
- Task type (classification, regression, etc.)
- Metrics (accuracy, precision, recall, F1, AUC, etc.)
- Per-class breakdown (if applicable)
- Comparison to baseline model
- Confusion matrix (classification; see the sketch after this list)
- Error analysis
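For the confusion matrix and per-class breakdown, one possible sketch: accumulate counts indexed by ground-truth row and predicted column. The flat row-major `List` layout and the `num_classes` parameter are illustrative choices, not the project's actual API:

```mojo
# Illustrative sketch: confusion matrix as a flat row-major list, where
# matrix[t * num_classes + p] counts samples with truth t predicted as p.
from collections import List


fn confusion_matrix(
    predictions: List[Int],
    ground_truth: List[Int],
    num_classes: Int
) -> List[Int]:
    var matrix = List[Int]()
    for _ in range(num_classes * num_classes):
        matrix.append(0)
    for i in range(len(predictions)):
        matrix[ground_truth[i] * num_classes + predictions[i]] += 1
    return matrix
```

Diagonal entries are correct predictions; dividing a diagonal entry by its column sum gives that class's precision, and by its row sum gives its recall.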
References
- See CLAUDE.md > Language Preference (Mojo for ML models)
- See the train-model skill for model training
- See /notes/review/mojo-ml-patterns.md for Mojo tensor operations