Agent skill
ranking
Ranks and scores retrieved documents based on similarity metrics from vector search. Use when sorting documents by relevance, prioritizing results, or when the user mentions ranking, scoring, or ordering documents.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/ranking
SKILL.md
Document Ranking
Instructions
Rank and score documents based on similarity metrics already computed by the Retrieval Agent. This skill operates on retrieved documents with distance/similarity information - it does NOT query ChromaDB again.
Default workflow:
- Receive documents from Retrieval Agent (includes distance and similarity_score)
- Call ranking functions to sort documents by relevance
- Optionally filter by similarity threshold
- Return ranked list for downstream processing (grading or generation)
Key functions:
# Rank documents by similarity score (descending)
ranked_docs = rank_documents_by_similarity(documents)
# Rank documents by distance (ascending - lower is better)
ranked_docs = rank_documents_by_distance(documents)
# Filter documents by similarity threshold
filtered_docs = filter_by_similarity_threshold(documents, threshold=0.7)
# Get top-k ranked documents
top_docs = get_top_k_documents(ranked_docs, k=10)
Similarity Metrics:
Each document from Retrieval Agent contains:
- distance: Cosine distance from ChromaDB (lower = more similar)
  - Range: [0, 2] for cosine distance
  - 0 = identical, 2 = opposite
- similarity_score: Computed as 1 - distance (higher = more similar)
  - Range: [-1, 1] for cosine
  - 1 = identical, -1 = opposite
  - Typical useful range: [0.5, 1.0]
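The relationship between the two fields can be checked directly. This is a minimal sketch, assuming each document is a plain dict with a `distance` key; `add_similarity_score` is a hypothetical helper for illustration, not a function the skill defines (the Retrieval Agent already supplies both fields).

```python
def add_similarity_score(doc: dict) -> dict:
    """Return a copy of doc with similarity_score = 1 - distance.

    Hypothetical helper: assumes `distance` is a cosine distance in [0, 2].
    """
    distance = doc["distance"]
    if not 0.0 <= distance <= 2.0:
        raise ValueError(f"cosine distance out of range [0, 2]: {distance}")
    return {**doc, "similarity_score": 1.0 - distance}


identical = add_similarity_score({"document": "a", "distance": 0.0})
opposite = add_similarity_score({"document": "b", "distance": 2.0})
print(identical["similarity_score"], opposite["similarity_score"])  # 1.0 -1.0
```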
Ranking Strategies:
- By Similarity Score (Recommended): Sort descending by similarity_score
  - Higher scores = more relevant
  - Intuitive: 0.9 > 0.7 > 0.5
- By Distance: Sort ascending by distance
  - Lower distances = more relevant
  - Direct from vector search
- Hybrid with Collection Priority: Rank by score within each collection, then merge
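The hybrid strategy leaves the merge step open. One possible reading, sketched below, concatenates the per-collection rankings in a caller-supplied priority order. The function name `rank_with_collection_priority` and the `priority` parameter are assumptions made for this example, and it assumes each document dict carries `collection` and `similarity_score` keys.

```python
from collections import defaultdict


def rank_with_collection_priority(documents, priority):
    """Sort by similarity_score within each collection, then concatenate
    the collections in the order given by `priority` (hypothetical helper)."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc["collection"]].append(doc)

    # Collections not named in `priority` follow afterwards, alphabetically.
    ordered = list(dict.fromkeys(priority))
    ordered += sorted(name for name in grouped if name not in ordered)

    merged = []
    for name in ordered:
        merged.extend(
            sorted(grouped.get(name, []),
                   key=lambda d: d["similarity_score"], reverse=True)
        )
    return merged


docs = [
    {"document": "d1", "collection": "catalog", "similarity_score": 0.62},
    {"document": "d2", "collection": "faq", "similarity_score": 0.95},
    {"document": "d3", "collection": "catalog", "similarity_score": 0.88},
    {"document": "d4", "collection": "faq", "similarity_score": 0.84},
]
ranked = rank_with_collection_priority(docs, priority=["faq", "catalog"])
print([d["document"] for d in ranked])  # ['d2', 'd4', 'd3', 'd1']
```

With this merge, a lower-scoring document from a high-priority collection can outrank a higher-scoring one from a low-priority collection, which is the point of giving collections a priority.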
Critical: NEVER query ChromaDB
This skill operates on already-retrieved documents. The Retrieval Agent has already computed distances and similarity scores. Ranking simply sorts and filters based on these existing metrics.
Implementation: Functions should be in components/ranker.py, similar to components/grader.py.
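What components/ranker.py might contain is sketched below. Only the function names and the call shapes come from this skill; the bodies are an assumption. In particular, the threshold is treated as exclusive to match the "similarity > 0.7" wording used in the examples, and each function returns a new list rather than sorting in place.

```python
from typing import Any

Document = dict[str, Any]


def rank_documents_by_similarity(documents: list[Document]) -> list[Document]:
    """Return a new list sorted by similarity_score, highest first."""
    return sorted(documents, key=lambda d: d["similarity_score"], reverse=True)


def rank_documents_by_distance(documents: list[Document]) -> list[Document]:
    """Return a new list sorted by distance, lowest first."""
    return sorted(documents, key=lambda d: d["distance"])


def filter_by_similarity_threshold(
    documents: list[Document], threshold: float
) -> list[Document]:
    """Keep documents whose similarity_score is strictly above threshold."""
    return [d for d in documents if d["similarity_score"] > threshold]


def get_top_k_documents(ranked_documents: list[Document], k: int) -> list[Document]:
    """Return the first k documents of an already ranked list."""
    return ranked_documents[: max(k, 0)]


docs = [
    {"document": "A", "similarity_score": 0.85, "distance": 0.15},
    {"document": "B", "similarity_score": 0.92, "distance": 0.08},
    {"document": "C", "similarity_score": 0.73, "distance": 0.27},
]
by_score = rank_documents_by_similarity(docs)
print([d["document"] for d in by_score])  # ['B', 'A', 'C']
print([d["document"] for d in get_top_k_documents(by_score, k=2)])  # ['B', 'A']
```

None of these functions touches ChromaDB; they only read fields already present on each document.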
Examples
Example 1: Basic ranking by similarity
from components.ranker import rank_documents_by_similarity
# Input: Documents from Retrieval Agent
# Each has: document, metadata, distance, collection, similarity_score
# (metadata and collection are omitted here for brevity)
documents = [
    {'document': 'Laptop A...', 'similarity_score': 0.85, 'distance': 0.15},
    {'document': 'Laptop B...', 'similarity_score': 0.92, 'distance': 0.08},
    {'document': 'Laptop C...', 'similarity_score': 0.73, 'distance': 0.27},
]
# Rank by similarity (descending)
ranked = rank_documents_by_similarity(documents)
# Output: [Laptop B (0.92), Laptop A (0.85), Laptop C (0.73)]
Example 2: Filter by threshold then rank
from components.ranker import filter_by_similarity_threshold, rank_documents_by_similarity
# Input: 15 documents from 3 collections (5 each), fetched upstream by the Retrieval Agent
documents = retrieve_from_chromadb("gaming laptop", collections=["catalog", "faq", "troubleshooting"])
# Filter to only high-quality matches (similarity > 0.7)
high_quality = filter_by_similarity_threshold(documents, threshold=0.7)
# Reduced from 15 to 8 documents
# Rank the high-quality matches
ranked = rank_documents_by_similarity(high_quality)
# Output: 8 documents sorted by similarity, all > 0.7
Example 3: Combined ranking and grading workflow
from components.ranker import rank_documents_by_similarity, get_top_k_documents
from components.grader import grade_documents, filter_relevant_documents
# Step 1: Retrieve documents (done by Retrieval Agent)
retrieved_docs = await retrieval_agent.retrieve_documents("best laptop for video editing", top_k=5)
# Retrieved 15 documents (5 per collection)
# Step 2: Rank by similarity score
ranked_docs = rank_documents_by_similarity(retrieved_docs)
# Step 3: Take top 10 for grading (reduce cost)
top_docs = get_top_k_documents(ranked_docs, k=10)
# Step 4: Grade for binary relevance
graded_docs = grade_documents("best laptop for video editing", top_docs)
relevant_docs = filter_relevant_documents(graded_docs)
# Output: Only the most relevant documents (high similarity + graded as relevant)
Example 4: Ranking within collections
from components.ranker import rank_by_collection
# Input: Mixed documents from multiple collections, fetched upstream by the Retrieval Agent
documents = retrieve_from_chromadb("laptop warranty", collections=["catalog", "faq"])
# Rank within each collection, then combine
ranked = rank_by_collection(documents)
# Output: {
# 'catalog': [doc1 (0.88), doc2 (0.75), doc3 (0.62)],
# 'faq': [doc4 (0.95), doc5 (0.91), doc6 (0.84)]
# }
# Use this to prioritize certain collections or balance results
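A possible shape for `rank_by_collection`, plus one way to "balance results" from its output, is sketched below. The grouping body is an assumption consistent with the output shown in Example 4. `interleave_collections` is a hypothetical helper that is not part of the skill; it takes the best document from each collection in turn.

```python
from collections import defaultdict
from itertools import zip_longest


def rank_by_collection(documents):
    """Group documents by collection and sort each group by similarity_score."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc["collection"]].append(doc)
    return {
        name: sorted(docs, key=lambda d: d["similarity_score"], reverse=True)
        for name, docs in grouped.items()
    }


def interleave_collections(ranked_by_collection):
    """Round-robin merge (hypothetical): best of each collection, then second best, and so on."""
    merged = []
    for tier in zip_longest(*ranked_by_collection.values()):
        merged.extend(doc for doc in tier if doc is not None)
    return merged


docs = [
    {"document": "c1", "collection": "catalog", "similarity_score": 0.62},
    {"document": "f1", "collection": "faq", "similarity_score": 0.95},
    {"document": "c2", "collection": "catalog", "similarity_score": 0.88},
]
per_collection = rank_by_collection(docs)
balanced = interleave_collections(per_collection)
print([d["document"] for d in balanced])  # ['c2', 'f1', 'c1']
```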
Distance vs Similarity Score
When to use each:
- Similarity Score: Easier to understand, use for thresholds and display
  - "Keep documents with similarity > 0.7"
  - "Top document has 92% similarity"
- Distance: Direct from vector search, use for debugging
  - "ChromaDB returned distance of 0.08"
  - "Check if distance < 0.3 for high confidence"
Conversion:
similarity_score = 1 - distance
distance = 1 - similarity_score
Typical thresholds:
- similarity_score > 0.8: Very relevant
- similarity_score > 0.7: Relevant
- similarity_score > 0.5: Possibly relevant
- similarity_score < 0.5: Likely not relevant
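These bands can be turned into a small labelling function, for example for logging or display. This is a sketch only: `relevance_label` is a hypothetical name, and assigning a score of exactly 0.5 to the lowest band is an assumption, since the list above does not say which side the boundary falls on.

```python
def relevance_label(similarity_score: float) -> str:
    """Map a similarity score to the bands listed above (hypothetical helper)."""
    if similarity_score > 0.8:
        return "very relevant"
    if similarity_score > 0.7:
        return "relevant"
    if similarity_score > 0.5:
        return "possibly relevant"
    # Assumption: exactly 0.5 is treated the same as scores below 0.5.
    return "likely not relevant"


for score in (0.92, 0.73, 0.62, 0.40):
    print(score, relevance_label(score))
```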
Integration with Grading
Ranking and grading serve different purposes:
- Ranking: Sorts documents by similarity score (continuous; typically between 0 and 1, although cosine similarity can be as low as -1)
  - Fast, cheap (no API calls)
  - Based on vector similarity alone
  - Use for initial filtering and prioritization
- Grading: Binary relevance with reasoning (yes/no)
  - Slower, costs tokens (Claude API)
  - Semantic understanding of relevance
  - Use for final filtering before generation
Recommended workflow:
- Retrieve documents (Retrieval Agent)
- Rank by similarity (Ranking skill)
- Take top-k to reduce grading cost
- Grade for binary relevance (Grading skill)
- Generate answer from relevant docs (Generator Agent)
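The cost argument behind steps 2 to 4 can be made concrete with a stand-in grader. In this sketch, `select_for_generation` and the `grade_fn` callback are hypothetical names introduced for illustration; the real grader lives in components/grader.py and calls the Claude API, which the stub below deliberately does not do.

```python
from typing import Callable


def select_for_generation(
    documents: list[dict],
    query: str,
    grade_fn: Callable[[str, dict], bool],
    k: int,
) -> list[dict]:
    """Rank by similarity, keep the top k, then keep only documents the grader accepts."""
    ranked = sorted(documents, key=lambda d: d["similarity_score"], reverse=True)
    top = ranked[:k]
    return [doc for doc in top if grade_fn(query, doc)]


calls = []


def stub_grader(query: str, doc: dict) -> bool:
    """Stand-in for an LLM grader: records each call, accepts documents mentioning 'laptop'."""
    calls.append(doc["document"])
    return "laptop" in doc["document"].lower()


docs = [
    {"document": "Laptop B", "similarity_score": 0.92},
    {"document": "Mouse pad", "similarity_score": 0.85},
    {"document": "Laptop C", "similarity_score": 0.73},
    {"document": "Laptop D", "similarity_score": 0.40},
]
relevant = select_for_generation(docs, "best laptop", stub_grader, k=3)
print([d["document"] for d in relevant])  # ['Laptop B', 'Laptop C']
print(len(calls))  # 3
```

The grader runs three times rather than four: ranking and truncation happen first, so the expensive step only sees the top-k candidates.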