Agent skill
evaluate-model
Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.
Evaluate Model
Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).
When to Use
- Comparing different model architectures
- Assessing performance on test/validation datasets
- Detecting overfitting or underfitting
- Reporting model accuracy for papers and documentation
Quick Reference
# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...
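The pattern above leaves both method bodies elided. As a rough illustration of the arithmetic they are described as returning, here is a minimal sketch in Python rather than Mojo, because the `ExTensor` API is not shown here. It assumes plain lists in place of tensors, a binary classification task, and a hypothetical `positive` parameter naming the positive label; none of these come from the skill itself.

```python
def evaluate_classification(predictions, ground_truth, positive=1):
    """Return (accuracy, precision, recall) for a binary task.

    `positive` is a hypothetical parameter: the label treated as the
    positive class when counting true/false positives.
    """
    if len(predictions) != len(ground_truth) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = fp = fn = correct = 0
    for pred, truth in zip(predictions, ground_truth):
        if pred == truth:
            correct += 1
        if pred == positive and truth == positive:
            tp += 1  # true positive
        elif pred == positive:
            fp += 1  # false positive
        elif truth == positive:
            fn += 1  # false negative
    accuracy = correct / len(predictions)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def evaluate_regression(predictions, ground_truth):
    """Return (MSE, MAE) over paired predictions and targets."""
    if len(predictions) != len(ground_truth) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for p, t in zip(predictions, ground_truth)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae
```

For example, `evaluate_classification([1, 0, 1, 1], [1, 0, 0, 1])` counts two true positives, one false positive, and no false negatives. Returning 0.0 for an undefined precision or recall is one common convention; a real evaluator might prefer to raise or report the value as missing.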
Workflow
- Load test data: Prepare test/validation dataset
- Generate predictions: Run model inference on test set
- Select metrics: Choose appropriate metrics (accuracy, precision, recall, F1, AUC, MSE, etc.)
- Calculate metrics: Compute performance metrics
- Analyze results: Compare to baseline and identify strengths/weaknesses
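The last workflow step, comparing to a baseline, could look roughly like the sketch below. It is written in Python for illustration and makes several assumptions not stated in the skill: the model is any callable that maps one input to one label, the metric is accuracy, and the baseline is a majority-class predictor built from the training labels. The names `run_evaluation` and `majority_baseline` are hypothetical.

```python
from collections import Counter


def accuracy(predictions, ground_truth):
    """Fraction of predictions that match the ground truth."""
    matches = sum(p == t for p, t in zip(predictions, ground_truth))
    return matches / len(ground_truth)


def majority_baseline(train_labels, n):
    """Predict the most frequent training label for all n test items."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n


def run_evaluation(model, test_inputs, test_labels, train_labels):
    # Generate predictions: run model inference on the test set.
    predictions = [model(x) for x in test_inputs]
    # Calculate metrics for the model and for the baseline.
    model_acc = accuracy(predictions, test_labels)
    baseline_preds = majority_baseline(train_labels, len(test_labels))
    baseline_acc = accuracy(baseline_preds, test_labels)
    # Analyze results: report the gap over the baseline.
    return {
        "model_accuracy": model_acc,
        "baseline_accuracy": baseline_acc,
        "improvement": model_acc - baseline_acc,
    }


if __name__ == "__main__":
    # Toy stand-in for a trained model: thresholds a single score.
    toy_model = lambda score: int(score > 0.5)
    print(run_evaluation(toy_model, [0.9, 0.2, 0.7, 0.1], [1, 0, 1, 1], [0, 0, 1]))
```

A majority-class baseline is only one option; a previously released model or a simple heuristic would slot into the same comparison.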
Output Format
Evaluation report:
- Task type (classification, regression, etc.)
- Metrics (accuracy, precision, recall, F1, AUC, etc.)
- Per-class breakdown (if applicable)
- Comparison to baseline model
- Confusion matrix (classification)
- Error analysis
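The confusion matrix and per-class breakdown listed above can be derived from the same pair of label sequences. The following is a minimal Python sketch under stated assumptions: rows index the true class and columns the predicted class (conventions differ between tools), labels are hashable values, and the function names and report keys are hypothetical rather than part of the skill.

```python
def confusion_matrix(predictions, ground_truth, classes):
    """Build a matrix where matrix[i][j] counts items of true class
    classes[i] that were predicted as classes[j]."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for pred, truth in zip(predictions, ground_truth):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_breakdown(matrix, classes):
    """Derive precision, recall, and F1 for each class from the matrix."""
    report = {}
    for i, label in enumerate(classes):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum: predicted as label
        actual = sum(matrix[i])                    # row sum: truly label
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


if __name__ == "__main__":
    classes = ["cat", "dog"]
    truth = ["cat", "cat", "dog", "dog"]
    preds = ["cat", "dog", "dog", "dog"]
    cm = confusion_matrix(preds, truth, classes)
    print(cm)
    print(per_class_breakdown(cm, classes))
```

Off-diagonal cells are a natural starting point for the error analysis item: each one names a specific pair of classes the model confuses.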
References
- See CLAUDE.md > Language Preference (Mojo for ML models)
- See train-model skill for model training
- See /notes/review/mojo-ml-patterns.md for Mojo tensor operations