# generalization-evaluator

Cross-domain evaluation to estimate generality and detect blind spots. Use when asked to assess broad capability, compare models across domains, or identify missing skills.


Install this skill into your project:

    npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/generalization-evaluator


## Generalization Evaluator

Use this skill to measure a model's generality across task domains and to identify domains with weak coverage.

### Workflow

1. Load a task set (use `references/task_set.example.json` as a template).
2. Run every task with a consistent runner so results are comparable across domains.
3. Score each task pass/fail and summarize pass rates by domain.
4. Rank the resulting gaps by impact.
### Scripts

Run the evaluation with:

    python scripts/run_eval.py --tasks references/task_set.example.json --runner ollama --model qwen3:latest
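The actual contents of `references/task_set.example.json` are not reproduced here; a plausible shape, assuming tasks carry an `id`, a `domain` label for grouping, and a pass/fail check, might look like:

```json
{
  "tasks": [
    {"id": "math-001", "domain": "math",
     "prompt": "What is 17 * 24?", "expected": "408"},
    {"id": "code-001", "domain": "code",
     "prompt": "Write a Python one-liner that reverses a string.",
     "expected_contains": "[::-1]"}
  ]
}
```

Field names here (`expected`, `expected_contains`) are hypothetical; check the example file for the schema the runner actually expects.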

### Output Expectations

- Provide a per-domain score table and a short summary of weaknesses.
- List the top 3 skill gaps, each with a suggested skill action.
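A minimal sketch of the expected report, assuming per-domain pass rates are already computed; the markdown table layout and the example scores are illustrative, not the script's actual output:

```python
def render_report(domain_scores, top_n=3):
    """Render a markdown domain-score table plus the top-N gaps
    (lowest pass rates first)."""
    lines = ["| Domain | Pass rate |", "|--------|-----------|"]
    for d, s in sorted(domain_scores.items()):
        lines.append(f"| {d} | {s:.0%} |")
    gaps = sorted(domain_scores.items(), key=lambda kv: kv[1])[:top_n]
    lines.append("")
    lines.append("Top gaps: " + ", ".join(d for d, _ in gaps))
    return "\n".join(lines)

report = render_report({"math": 0.5, "code": 1.0, "writing": 0.8})
print(report)
```

Printing the report for the example scores yields a three-row table followed by `Top gaps: math, writing, code`.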
