Agent skill

evaluate-model

Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.

Install this agent skill in your project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/evaluate-model

SKILL.md

Evaluate Model

Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).

When to Use

Comparing different model architectures
Assessing performance on test/validation datasets
Detecting overfitting or underfitting
Reporting model accuracy for papers and documentation

Quick Reference

```mojo
# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...
```
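The Mojo stubs above leave the metric computations elided. As a reference for what those return values usually mean, here is a minimal sketch in plain Python rather than Mojo. It assumes binary integer labels held in ordinary sequences instead of `ExTensor`, and the `positive` parameter and the zero-denominator convention are illustrative choices, not part of the skill.

```python
from typing import Sequence, Tuple


def evaluate_classification(
    predictions: Sequence[int],
    ground_truth: Sequence[int],
    positive: int = 1,  # assumption: which label counts as the positive class
) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) for binary labels."""
    if len(predictions) != len(ground_truth) or len(predictions) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(predictions, ground_truth))
    tp = sum(1 for p, t in pairs if p == positive and t == positive)
    fp = sum(1 for p, t in pairs if p == positive and t != positive)
    fn = sum(1 for p, t in pairs if p != positive and t == positive)
    correct = sum(1 for p, t in pairs if p == t)
    accuracy = correct / len(pairs)
    # Convention chosen for this sketch: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def evaluate_regression(
    predictions: Sequence[float],
    ground_truth: Sequence[float],
) -> Tuple[float, float]:
    """Return (MSE, MAE): mean squared error and mean absolute error."""
    if len(predictions) != len(ground_truth) or len(predictions) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for p, t in zip(predictions, ground_truth)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae
```

The return order mirrors the comments in the stubs (accuracy, precision, recall and MSE, MAE), so a Mojo implementation could be checked against these values on a small fixture.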

Workflow

Load test data: Prepare test/validation dataset
Generate predictions: Run model inference on test set
Select metrics: Choose appropriate metrics (accuracy, precision, recall, F1, AUC, MSE, etc.)
Calculate metrics: Compute the selected metrics from the predictions and the ground-truth labels
Analyze results: Compare to baseline and identify strengths/weaknesses
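The five steps above can be sketched end to end in a few lines. This is a plain Python illustration, not the skill's Mojo implementation: `run_evaluation`, the threshold "model", and the majority-class baseline are all hypothetical stand-ins, and F1 is used only as one example of a selected metric.

```python
from collections import Counter
from typing import Callable, Dict, Sequence


def f1_score(predictions: Sequence[int], ground_truth: Sequence[int], positive: int = 1) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if the denominator is zero."""
    pairs = list(zip(predictions, ground_truth))
    tp = sum(1 for p, t in pairs if p == positive and t == positive)
    fp = sum(1 for p, t in pairs if p == positive and t != positive)
    fn = sum(1 for p, t in pairs if p != positive and t == positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def run_evaluation(
    model: Callable[[float], int],  # hypothetical: maps one feature to a label
    features: Sequence[float],
    labels: Sequence[int],
) -> Dict[str, object]:
    # Step 1 (load test data) is assumed done: features and labels are passed in.
    # Step 2: generate predictions on the test set.
    predictions = [model(x) for x in features]
    # Baseline for step 5: always predict the most frequent label.
    # For brevity it is taken from the test labels; normally it would come
    # from the training set.
    majority = Counter(labels).most_common(1)[0][0]
    baseline_predictions = [majority] * len(labels)
    # Steps 3 and 4: select a metric (F1 here) and compute it.
    model_f1 = f1_score(predictions, labels)
    baseline_f1 = f1_score(baseline_predictions, labels)
    # Step 5: compare against the baseline.
    return {
        "model_f1": model_f1,
        "baseline_f1": baseline_f1,
        "beats_baseline": model_f1 > baseline_f1,
    }


if __name__ == "__main__":
    toy_features = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]
    toy_labels = [0, 0, 1, 1, 1, 1]
    print(run_evaluation(lambda x: 1 if x >= 0.5 else 0, toy_features, toy_labels))
```

Comparing against a trivial baseline is what makes the final number interpretable: on imbalanced data a model can score well on a metric without outperforming a constant prediction.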

Output Format

Evaluation report:

Task type (classification, regression, etc.)
Metrics (accuracy, precision, recall, F1, AUC, etc.)
Per-class breakdown (if applicable)
Comparison to baseline model
Confusion matrix (classification)
Error analysis
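Two of the report items, the confusion matrix and the per-class breakdown, follow directly from each other. The sketch below, in plain Python rather than Mojo, assumes integer class labels in the range `0..num_classes-1` and a rows-are-true-class layout; the function names and the report keys are illustrative only.

```python
from typing import Dict, List, Sequence


def confusion_matrix(
    predictions: Sequence[int],
    ground_truth: Sequence[int],
    num_classes: int,
) -> List[List[int]]:
    """matrix[t][p] counts samples whose true class is t and predicted class is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(predictions, ground_truth):
        matrix[t][p] += 1
    return matrix


def per_class_report(matrix: List[List[int]]) -> Dict[int, Dict[str, float]]:
    """Derive precision, recall, and support for each class from the matrix."""
    report: Dict[int, Dict[str, float]] = {}
    for c in range(len(matrix)):
        tp = matrix[c][c]
        predicted_as_c = sum(row[c] for row in matrix)  # column sum
        actually_c = sum(matrix[c])                     # row sum (support)
        report[c] = {
            # Convention for this sketch: 0.0 when a class is never predicted or never occurs.
            "precision": tp / predicted_as_c if predicted_as_c else 0.0,
            "recall": tp / actually_c if actually_c else 0.0,
            "support": float(actually_c),
        }
    return report


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    preds = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(preds, truth, num_classes=3)
    for row in cm:
        print(row)
    print(per_class_report(cm))
```

Off-diagonal cells of the matrix are a natural starting point for the error analysis item, since each one names a specific pair of classes the model confuses.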

References

See CLAUDE.md > Language Preference (Mojo for ML models)
See train-model skill for model training
See /notes/review/mojo-ml-patterns.md for Mojo tensor operations

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/evaluate-model
License: MIT License
