Agent skill
libeval
libeval is a RAG evaluation system. Evaluator orchestrates quality assessment using LLM-as-judge patterns. CriteriaEvaluator scores responses against rubrics. RecallEvaluator measures retrieval performance. TraceEvaluator analyzes execution traces. EvalStore persists results. Use for automated quality testing, RAG pipeline evaluation, and agent performance testing.
Install this agent skill in your project:
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/libeval
SKILL.md
libeval Skill
When to Use
- Evaluating RAG agent response quality
- Measuring retrieval recall and precision
- Running automated quality assessments
- Benchmarking agent performance over time
Key Concepts
Evaluator: Main orchestrator that runs test cases through the agent and collects metrics.
CriteriaEvaluator: Uses LLM-as-judge to score responses against defined criteria and rubrics.
RecallEvaluator: Measures how well the retrieval system returns relevant documents.
TraceEvaluator: Analyzes execution traces for performance and correctness.
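To make the recall and precision measurements concrete, here is a minimal sketch of how the two metrics could be computed for a single query. It is not libeval's RecallEvaluator: the function name `retrievalMetrics` and the document IDs are hypothetical, and libeval's own implementation may differ.

```javascript
// Illustrative only: this helper is NOT part of libeval's API.
// Recall    = relevant documents retrieved / all relevant documents.
// Precision = relevant documents retrieved / all documents retrieved.
function retrievalMetrics(retrievedIds, relevantIds) {
  const relevant = new Set(relevantIds);
  const retrieved = new Set(retrievedIds); // de-duplicate repeated hits

  let hits = 0;
  for (const id of retrieved) {
    if (relevant.has(id)) hits += 1;
  }

  return {
    // An empty denominator is reported as 0 to avoid NaN.
    recall: relevant.size === 0 ? 0 : hits / relevant.size,
    precision: retrieved.size === 0 ? 0 : hits / retrieved.size,
  };
}

// Hypothetical query: four documents retrieved, three known to be relevant.
const metrics = retrievalMetrics(
  ["doc1", "doc4", "doc7", "doc9"],
  ["doc1", "doc2", "doc4"],
);
console.log(metrics);
```

For this sample query, two of the three relevant documents appear among the four retrieved, so recall is 2/3 and precision is 0.5.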
Usage Patterns
Pattern 1: Run evaluation suite
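import { Evaluator } from "@copilot-ld/libeval";
// `config` and `testCases` must be defined by the caller; their shapes are not shown here.
const evaluator = new Evaluator(config);
// Run every test case through the agent and collect metrics.
const results = await evaluator.run(testCases);
console.log(results.summary);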
Pattern 2: Criteria-based evaluation
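import { CriteriaEvaluator } from "@copilot-ld/libeval";
// `llmClient`, `response`, and `rubric` must be supplied by the caller.
const criteria = new CriteriaEvaluator(llmClient);
// Score the response against the rubric using LLM-as-judge.
const score = await criteria.evaluate(response, rubric);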
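The LLM-as-judge pattern behind criteria-based evaluation can be sketched roughly as follows. This is a hypothetical illustration, not libeval's implementation: the rubric shape (`name`, `description`, `weight`), the 1 to 5 scale, the `scoreAgainstRubric` function, and the `stubJudge` stand-in for a real LLM client are all assumptions made for the example.

```javascript
// Hypothetical sketch of LLM-as-judge scoring; NOT libeval's API.
// `judge` is any async function that takes a prompt string and returns text.
async function scoreAgainstRubric(response, rubric, judge) {
  const results = [];
  for (const criterion of rubric) {
    const prompt = [
      "You are grading a response against one criterion.",
      `Criterion: ${criterion.description}`,
      `Response: ${response}`,
      "Reply with a single integer from 1 to 5.",
    ].join("\n");

    const reply = await judge(prompt);
    const match = String(reply).match(/\b[1-5]\b/);
    // An unparsable reply gets the lowest score instead of failing the run.
    const score = match ? Number(match[0]) : 1;
    results.push({ name: criterion.name, score, weight: criterion.weight });
  }

  // Overall score is the weight-averaged criterion score.
  const totalWeight = results.reduce((sum, r) => sum + r.weight, 0);
  const weighted = results.reduce((sum, r) => sum + r.score * r.weight, 0);
  return { overall: totalWeight === 0 ? 0 : weighted / totalWeight, results };
}

// Stub judge standing in for a real LLM client so the sketch runs offline.
const stubJudge = async (prompt) => (prompt.includes("accuracy") ? "5" : "3");

const exampleRubric = [
  { name: "accuracy", description: "Checks factual accuracy", weight: 2 },
  { name: "tone", description: "Checks professional tone", weight: 1 },
];

scoreAgainstRubric("Paris is the capital of France.", exampleRubric, stubJudge)
  .then((result) => console.log(result.overall.toFixed(2))); // prints 4.33
```

Scoring each criterion in its own judge call keeps the prompts short and makes per-criterion scores available alongside the weighted overall score.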
Integration
libeval is configured via config/eval.yml and run with make eval. It uses libllm for LLM-as-judge calls.