Agent skill
generalization-evaluator
Cross-domain evaluation to estimate generality and detect blind spots. Use when asked to assess broad capability, compare models across domains, or identify missing skills.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/generalization-evaluator
SKILL.md
Generalization Evaluator
Use this skill to measure generality across domains and identify weak coverage.
Workflow
- Load a task set (use references/task_set.example.json).
- Run the task set with a consistent runner.
- Score pass/fail per task and summarize by domain.
- Rank gaps by impact.
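The task-set file referenced above might look like the following. This schema is an assumption for illustration; check references/task_set.example.json for the actual fields. The ids, domains, and prompts here are hypothetical.

```json
{
  "tasks": [
    {"id": "math-001", "domain": "math", "prompt": "Compute 17 * 23.", "expected": "391"},
    {"id": "code-001", "domain": "coding", "prompt": "Write a function that reverses a string.", "expected_contains": "[::-1]"}
  ]
}
```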
Scripts
- Run: python scripts/run_eval.py --tasks references/task_set.example.json --runner ollama --model qwen3:latest
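The "summarize by domain" step can be sketched as a small aggregation over per-task results. This is a minimal sketch, assuming the runner emits one record per task with `domain` and `passed` fields; the actual output format of scripts/run_eval.py may differ.

```python
from collections import defaultdict

def summarize_by_domain(results):
    """Aggregate per-task pass/fail records into per-domain pass rates.

    `results` is a list of dicts with `domain` and `passed` keys,
    mirroring what a runner like scripts/run_eval.py might emit
    (an assumed shape, not the confirmed one).
    """
    totals = defaultdict(lambda: {"passed": 0, "total": 0})
    for r in results:
        bucket = totals[r["domain"]]
        bucket["total"] += 1
        bucket["passed"] += int(r["passed"])
    # Pass rate per domain, rounded for a readable score table.
    return {
        domain: round(c["passed"] / c["total"], 2)
        for domain, c in totals.items()
    }

results = [
    {"domain": "math", "passed": True},
    {"domain": "math", "passed": False},
    {"domain": "coding", "passed": True},
]
print(summarize_by_domain(results))  # {'math': 0.5, 'coding': 1.0}
```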
Output Expectations
- Provide a domain score table and a short summary of weaknesses.
- List the top 3 skill gaps, each with a suggested skill-building action to close it.
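One way to rank gaps by impact, as the workflow requires, is to weight each domain's failure rate by how much that domain matters. The weighting scheme below is an assumption (impact = failure rate × domain weight); the skill does not prescribe a specific formula.

```python
def rank_gaps(domain_scores, domain_weights, top_n=3):
    """Rank domains by impact: failure rate weighted by importance.

    `domain_weights` is hypothetical: higher means the domain matters
    more in practice. Domains absent from the weight map default to 1.0.
    """
    impacts = {
        d: round((1 - s) * domain_weights.get(d, 1.0), 3)
        for d, s in domain_scores.items()
    }
    # Largest impact first; truncate to the top_n gaps to report.
    return sorted(impacts.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

scores = {"math": 0.5, "coding": 0.9, "writing": 0.8, "vision": 0.4}
weights = {"math": 1.0, "coding": 2.0, "writing": 1.0, "vision": 0.5}
print(rank_gaps(scores, weights))
```

With these illustrative numbers, math leads the ranking (high failure rate, full weight) even though vision has the lowest raw score, which is the point of impact-weighting over raw scores.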