
libeval

libeval is a RAG evaluation system. Evaluator orchestrates quality assessment using LLM-as-judge patterns. CriteriaEvaluator scores responses against rubrics. RecallEvaluator measures retrieval performance. TraceEvaluator analyzes execution traces. EvalStore persists results. Use it for automated quality testing, RAG pipeline evaluation, and agent performance testing.


Install this agent skill in your project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/libeval

SKILL.md

libeval Skill

When to Use

  • Evaluating RAG agent response quality
  • Measuring retrieval recall and precision
  • Running automated quality assessments
  • Benchmarking agent performance over time

Key Concepts

Evaluator: Main orchestrator that runs test cases through the agent and collects metrics.
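
The orchestration idea can be illustrated with a minimal sketch. This is not the libeval implementation: runEvaluation, the agent function, the scorer function, and the test case fields (id, input, expected) are all hypothetical names chosen for the example.

```javascript
// Hypothetical sketch of an evaluation loop. None of these names come from
// libeval; they only illustrate "run test cases through the agent and
// collect metrics".
async function runEvaluation(agent, testCases, scorer) {
  const results = [];
  for (const testCase of testCases) {
    const started = Date.now();
    // The agent is any async function that maps an input to a response.
    const response = await agent(testCase.input);
    results.push({
      id: testCase.id,
      score: scorer(response, testCase.expected),
      durationMs: Date.now() - started,
    });
  }
  // Aggregate per-case scores into a simple summary.
  const total = results.reduce((sum, r) => sum + r.score, 0);
  const meanScore = results.length > 0 ? total / results.length : 0;
  return { results, summary: { count: results.length, meanScore } };
}
```

Passing the agent and scorer in as functions keeps the loop testable with stubs before a real model is wired in.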

CriteriaEvaluator: Uses LLM-as-judge to score responses against defined criteria and rubrics.
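
One plausible way to turn per-criterion judge scores into a single number is a weighted, normalised average. The sketch below assumes a hypothetical rubric shape (name, weight, maxScore) and assumes the judge model has already returned a numeric score per criterion; neither is taken from the libeval schema.

```javascript
// Hypothetical rubric aggregation. The rubric shape and function name are
// assumptions for illustration, not the libeval API.
function aggregateRubricScore(rubric, judgments) {
  let weighted = 0;
  let totalWeight = 0;
  for (const criterion of rubric) {
    const raw = judgments[criterion.name];
    if (typeof raw !== "number") {
      throw new Error(`Missing judgment for criterion: ${criterion.name}`);
    }
    // Clamp to the allowed range, then normalise to 0..1.
    const clamped = Math.min(Math.max(raw, 0), criterion.maxScore);
    weighted += (clamped / criterion.maxScore) * criterion.weight;
    totalWeight += criterion.weight;
  }
  return totalWeight > 0 ? weighted / totalWeight : 0;
}

const exampleRubric = [
  { name: "accuracy", weight: 2, maxScore: 5 },
  { name: "clarity", weight: 1, maxScore: 5 },
];
// Accuracy is full marks, clarity is half marks: (1 * 2 + 0.5 * 1) / 3.
const exampleScore = aggregateRubricScore(exampleRubric, {
  accuracy: 5,
  clarity: 2.5,
});
```

Clamping guards against a judge model returning a value outside the rubric's range.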

RecallEvaluator: Measures how well the retrieval system returns relevant documents.
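
For reference, recall and precision over document IDs can be computed as below. This is a generic sketch of the standard definitions, not the RecallEvaluator code; the function name and the ID-array inputs are assumptions.

```javascript
// Standard set-based recall and precision over document IDs.
// recallAndPrecision is a hypothetical helper, not part of libeval.
function recallAndPrecision(retrievedIds, relevantIds) {
  const relevant = new Set(relevantIds);
  const retrieved = new Set(retrievedIds); // de-duplicate retrieved IDs
  let hits = 0;
  for (const id of retrieved) {
    if (relevant.has(id)) hits += 1;
  }
  return {
    // Fraction of relevant documents that were retrieved.
    recall: relevant.size > 0 ? hits / relevant.size : 0,
    // Fraction of retrieved documents that are relevant.
    precision: retrieved.size > 0 ? hits / retrieved.size : 0,
  };
}

// Two of the three relevant documents appear among four retrieved ones.
const exampleMetrics = recallAndPrecision(
  ["d1", "d2", "d3", "d4"],
  ["d1", "d3", "d9"]
);
// exampleMetrics.recall is 2/3 and exampleMetrics.precision is 0.5
```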

TraceEvaluator: Analyzes execution traces for performance and correctness.
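
As a rough illustration of trace analysis, the sketch below summarises a list of steps by duration and failure. The trace shape (name, startMs, endMs, optional error) and the summarizeTrace helper are hypothetical; the actual trace format used by TraceEvaluator is not described here.

```javascript
// Hypothetical trace summary. Each step is assumed to look like
// { name, startMs, endMs, error? }; this is not the libeval trace format.
function summarizeTrace(steps) {
  let totalMs = 0;
  let slowest = null;
  const failed = [];
  for (const step of steps) {
    const durationMs = step.endMs - step.startMs;
    // Sum of step durations; equals wall-clock time only if steps do not overlap.
    totalMs += durationMs;
    if (slowest === null || durationMs > slowest.durationMs) {
      slowest = { name: step.name, durationMs };
    }
    if (step.error) failed.push(step.name);
  }
  return { totalMs, slowest, failed, ok: failed.length === 0 };
}

const exampleTrace = summarizeTrace([
  { name: "retrieve", startMs: 0, endMs: 120 },
  { name: "generate", startMs: 120, endMs: 900 },
  { name: "tool", startMs: 900, endMs: 950, error: "timeout" },
]);
```

Here the summary covers performance (total and slowest step) and correctness (which steps failed), the two aspects named above.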

Usage Patterns

Pattern 1: Run evaluation suite

javascript
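import { Evaluator } from "@copilot-ld/libeval";

// `config` and `testCases` are supplied by the caller; their shape is not shown here.
const evaluator = new Evaluator(config);
// Run every test case through the agent and collect metrics.
const results = await evaluator.run(testCases);
console.log(results.summary);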

Pattern 2: Criteria-based evaluation

javascript
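import { CriteriaEvaluator } from "@copilot-ld/libeval";

// `llmClient`, `response`, and `rubric` are supplied by the caller.
const criteria = new CriteriaEvaluator(llmClient);
// Ask the judge model to score the response against the rubric.
const score = await criteria.evaluate(response, rubric);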

Integration

libeval is configured via config/eval.yml and run via make eval. It uses libllm for the LLM-as-judge calls.
