# generalization-evaluator

Cross-domain evaluation to estimate generality and detect blind spots. Use when asked to assess broad capability, compare models across domains, or identify missing skills.


Install this skill into your project:

    npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/generalization-evaluator


## Generalization Evaluator

Use this skill to measure a model's generality across task domains and to identify domains with weak coverage.

### Workflow

1. Load a task set (use `references/task_set.example.json` as a template).
2. Run every task with a consistent runner so results are comparable across domains.
3. Score each task pass/fail and summarize pass rates by domain.
4. Rank the resulting gaps by impact.
### Scripts

Run the evaluation with:

    python scripts/run_eval.py --tasks references/task_set.example.json --runner ollama --model qwen3:latest
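The actual contents of `references/task_set.example.json` are not reproduced here; a plausible shape, assuming tasks carry an `id`, a `domain` label for grouping, and a pass/fail check, might look like:

```json
{
  "tasks": [
    {"id": "math-001", "domain": "math",
     "prompt": "What is 17 * 24?", "expected": "408"},
    {"id": "code-001", "domain": "code",
     "prompt": "Write a Python one-liner that reverses a string.",
     "expected_contains": "[::-1]"}
  ]
}
```

Field names here (`expected`, `expected_contains`) are hypothetical; check the example file for the schema the runner actually expects.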

### Output Expectations

- Provide a per-domain score table and a short summary of weaknesses.
- List the top 3 skill gaps, each with a suggested skill action.
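A minimal sketch of the expected report, assuming per-domain pass rates are already computed; the markdown table layout and the example scores are illustrative, not the script's actual output:

```python
def render_report(domain_scores, top_n=3):
    """Render a markdown domain-score table plus the top-N gaps
    (lowest pass rates first)."""
    lines = ["| Domain | Pass rate |", "|--------|-----------|"]
    for d, s in sorted(domain_scores.items()):
        lines.append(f"| {d} | {s:.0%} |")
    gaps = sorted(domain_scores.items(), key=lambda kv: kv[1])[:top_n]
    lines.append("")
    lines.append("Top gaps: " + ", ".join(d for d, _ in gaps))
    return "\n".join(lines)

report = render_report({"math": 0.5, "code": 1.0, "writing": 0.8})
print(report)
```

Printing the report for the example scores yields a three-row table followed by `Top gaps: math, writing, code`.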
