Agent skill
retrieval-judge
Evaluate and filter recall_search results for relevance
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/retrieval-judge
SKILL.md
Retrieval Judge Skill
When using recall_search, you MUST apply critical judgment to results rather than treating them as authoritative. This is a mandatory step after every recall_search call.
Query Construction
Build specific queries that include conversation context:
- Good: "payment retry logic after provider timeout", "Go error wrapping with sentinel errors"
- Bad: "retry", "error handling"
Include the problem domain, technology, and specific concern in your query.
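As a minimal sketch of the rule above, a query can be assembled from its three parts. The helper name build_query and its parameters are hypothetical and not part of recall_search; the point is only that a specific query carries more than a bare keyword.

```python
def build_query(concern: str, technology: str = "", domain: str = "") -> str:
    """Join the non-empty parts (domain, technology, concern) into one query string."""
    parts = [domain.strip(), technology.strip(), concern.strip()]
    return " ".join(p for p in parts if p)


# Hypothetical usage: these reproduce the "Good" examples listed earlier.
payment_query = build_query("retry logic after provider timeout", domain="payment")
go_query = build_query("error wrapping with sentinel errors", technology="Go")
print(payment_query)
print(go_query)
```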
Result Evaluation
After each recall_search call, evaluate every result before using it:
- Title/type match — Does the entry's title and type (pattern, decision, failure, etc.) relate to what you're actually looking for?
- Content relevance — Does the snippet address your specific question, or is it tangentially related?
- Applicability — Does the result apply to the current technology, codebase area, or problem domain?
Only reference results that directly address the query. Discard results that are merely keyword-adjacent.
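One way to make the three checks explicit is to record them per result and keep only the results that pass all of them. This is a sketch under assumptions: the Judgment class and partition function are hypothetical names, and the three booleans are filled in by your own judgment, not computed automatically.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Judgment:
    """One evaluated result. The booleans record the three checks listed above."""
    id: str
    title: str
    title_type_match: bool   # title/type relates to what you are looking for
    content_relevant: bool   # snippet addresses the specific question
    applicable: bool         # fits the current technology or problem domain
    reason: str

    @property
    def keep(self) -> bool:
        # A result is kept only if it passes all three checks.
        return self.title_type_match and self.content_relevant and self.applicable


def partition(judgments: List[Judgment]) -> Tuple[List[Judgment], List[Judgment]]:
    """Split judged results into (kept, dropped), preserving their order."""
    kept = [j for j in judgments if j.keep]
    dropped = [j for j in judgments if not j.keep]
    return kept, dropped
```

A result that matches on keywords but fails the content check ends up in the dropped list, together with the reason it was discarded.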
Anti-Patterns
- Don't trust rank order blindly. RRF scoring produces narrow distributions — result #1 may not be meaningfully better than result #5.
- Don't use results just because they appeared. An empty answer is better than citing irrelevant knowledge.
- Don't ignore low-ranked results. A result at position 8 may be more relevant than position 2 if it matches the actual intent.
- Don't skip evaluation. Every recall_search call should be followed by a mental relevance check before incorporating results into your response.
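To see why rank order carries little signal, consider the usual Reciprocal Rank Fusion formula, where each ranker contributes 1 / (k + rank). The sketch below assumes the common constant k = 60 and a single ranker; the actual constant and number of rankers behind recall_search are not stated here, so treat the numbers as illustrative only.

```python
def rrf_score(ranks, k=60):
    """Standard RRF: sum of 1 / (k + rank) over the rankers that returned the item.

    k = 60 is an assumed, commonly used constant, not a documented recall_search setting.
    """
    return sum(1.0 / (k + rank) for rank in ranks)


# Scores for an item ranked 1st, 5th and 8th by a single ranker.
first = rrf_score([1])
fifth = rrf_score([5])
eighth = rrf_score([8])

# The ratio between positions shows how narrow the distribution is.
print(f"rank 1 vs rank 5: {first / fifth:.3f}x")
print(f"rank 1 vs rank 8: {first / eighth:.3f}x")
```

Under these assumptions the top result scores less than 7% higher than the fifth, which is why relevance has to be judged from the content and not from position.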
Audit Trail
After evaluating recall_search results, always:
- Log your judgment: call flight_recorder_log with:
  - type: retrieval_judgment
  - content: One-line summary, e.g. "3/7 results relevant for 'payment retry timeout'"
  - metadata:
      {
        "query": "<the search query>",
        "kept": [{"id": "<id>", "title": "<title>", "reason": "directly addresses retry logic"}],
        "dropped": [{"id": "<id>", "title": "<title>", "reason": "about billing, not payments"}]
      }
- Show a summary to the user:
  RECALL: 3/7 results kept for "payment retry timeout"
- On request: if the user asks for details on the judgment, output the full kept/dropped list with per-result reasoning.
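The log entry and the user-facing summary can be derived from the same kept/dropped lists. The sketch below only builds the payload; how flight_recorder_log is actually invoked depends on your tool environment, and the function names build_judgment_log and user_summary are hypothetical.

```python
import json


def build_judgment_log(query, kept, dropped):
    """Build the arguments for a flight_recorder_log call (payload only, no call made)."""
    total = len(kept) + len(dropped)
    return {
        "type": "retrieval_judgment",
        "content": f"{len(kept)}/{total} results relevant for '{query}'",
        "metadata": {"query": query, "kept": kept, "dropped": dropped},
    }


def user_summary(query, kept, dropped):
    """One-line summary shown to the user."""
    total = len(kept) + len(dropped)
    return f'RECALL: {len(kept)}/{total} results kept for "{query}"'


# Hypothetical example data: ids and titles are placeholders.
kept = [{"id": "k1", "title": "Retry pattern", "reason": "directly addresses retry logic"}]
dropped = [{"id": "d1", "title": "Billing cycle", "reason": "about billing, not payments"}]
entry = build_judgment_log("payment retry timeout", kept, dropped)
print(json.dumps(entry, indent=2))
print(user_summary("payment retry timeout", kept, dropped))
```

Keeping the per-result reasons in the metadata means the full kept/dropped list is available if the user asks for details later.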
When Results Are Poor
If no results are clearly relevant:
- Say so — don't force-fit irrelevant knowledge
- Try a rephrased query with different terms
- Proceed without RECALL context rather than using noise
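The fallback steps above can be sketched as a loop over rephrased queries that stops at the first one yielding relevant results. Both callables are assumptions: search stands in for recall_search and is_relevant stands in for your own relevance judgment; neither is a real API.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def search_with_fallback(
    queries: Iterable[str],
    search: Callable[[str], List[Any]],
    is_relevant: Callable[[str, Any], bool],
) -> Tuple[Optional[str], List[Any]]:
    """Try each phrasing in order and return (query, kept) for the first useful one.

    Returns (None, []) when no phrasing produces a relevant result, which signals
    that the caller should proceed without RECALL context.
    """
    for query in queries:
        kept = [result for result in search(query) if is_relevant(query, result)]
        if kept:
            return query, kept
    return None, []
```

An empty return value is a valid outcome here: it tells the caller to say that nothing relevant was found instead of citing noise.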