rag
Implements document chunking, embedding generation, vector storage, and retrieval pipelines for Retrieval-Augmented Generation systems. Use when building RAG applications, creating document Q&A systems, or integrating AI with knowledge bases.
Install this agent skill in your project:
npx add-skill https://github.com/giuseppe-trisciuoglio/developer-kit/tree/main/plugins/developer-kit-ai/skills/rag
RAG Implementation
Build Retrieval-Augmented Generation systems that extend AI capabilities with external knowledge sources.
Overview
This skill covers: document processing, embedding generation, vector storage, retrieval configuration, and RAG pipeline implementation.
When to Use
- Building Q&A systems over proprietary documents
- Creating chatbots with factual information from knowledge bases
- Implementing semantic search with natural language queries
- Reducing hallucinations with grounded, sourced responses
- Building documentation assistants and research tools
- Enabling AI systems to access domain-specific knowledge
Instructions
Step 1: Choose Vector Database
Select based on your requirements:
| Requirement | Recommended |
|---|---|
| Production scalability | Pinecone, Milvus |
| Open-source | Weaviate, Qdrant |
| Local development | Chroma, FAISS |
| Hybrid search | Weaviate with BM25 |
Step 2: Select Embedding Model
| Use Case | Model |
|---|---|
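| General purpose (hosted API) | text-embedding-3-small (newer successor to text-embedding-ada-002) |
| Fast and lightweight | all-MiniLM-L6-v2 |
| Multilingual | multilingual-e5-large (e5-large-v2 is English-only) |
| High accuracy (English) | bge-large-en-v1.5 |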
Step 3: Implement Document Processing Pipeline
- Load documents from source (file system, database, API)
- Clean and preprocess (remove formatting, normalize text)
- Split documents into chunks with appropriate strategy
- Generate embeddings for each chunk
- Store embeddings in vector database with metadata
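The pipeline steps above can be sketched end to end in plain Java. This is a minimal, self-contained illustration, not LangChain4j code: the in-memory list stands in for a vector database, `embedder` is a hypothetical hook where your embedding model would go, and cleaning is reduced to stripping HTML-like tags and collapsing whitespace. Chunking uses fixed word windows without overlap for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** One stored chunk: its text, its vector, and metadata for later filtering. */
record StoredChunk(String text, float[] vector, Map<String, String> metadata) {}

/** Illustrative pipeline: clean -> chunk -> embed -> store (an in-memory list stands in for a vector DB). */
final class IngestionPipeline {
    private final Function<String, float[]> embedder; // hypothetical hook for your embedding model
    private final int chunkWords;
    private final List<StoredChunk> store = new ArrayList<>();

    IngestionPipeline(Function<String, float[]> embedder, int chunkWords) {
        if (chunkWords <= 0) throw new IllegalArgumentException("chunkWords must be positive");
        this.embedder = embedder;
        this.chunkWords = chunkWords;
    }

    /** Cleaning step: replace HTML-like tags with spaces and collapse whitespace. */
    static String clean(String raw) {
        return raw.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    }

    /** Ingests one document and returns the chunks it added, each tagged with its source and index. */
    List<StoredChunk> ingest(String source, String rawText) {
        String text = clean(rawText);
        if (text.isEmpty()) return List.of();
        String[] words = text.split(" ");
        List<StoredChunk> added = new ArrayList<>();
        for (int start = 0; start < words.length; start += chunkWords) {
            int end = Math.min(start + chunkWords, words.length);
            String chunk = String.join(" ", Arrays.copyOfRange(words, start, end));
            Map<String, String> metadata = Map.of("source", source, "chunkIndex", String.valueOf(added.size()));
            added.add(new StoredChunk(chunk, embedder.apply(chunk), metadata));
        }
        store.addAll(added);
        return added;
    }

    int size() {
        return store.size();
    }
}
```

In a real system the list would be your vector store, and the chunker would respect token limits and overlap.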
Validation: Verify embeddings were generated successfully:
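```java
// embedAll returns a Response wrapper; content() unwraps the list of embeddings.
List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
boolean complete = embeddings.size() == segments.size();
boolean dimsOk = embeddings.stream().allMatch(e -> e.dimension() == expectedDim);
if (!complete || !dimsOk) {
    throw new IllegalStateException("Embedding generation failed: expected "
            + segments.size() + " vectors of dimension " + expectedDim);
}
```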
Step 4: Configure Retrieval Strategy
Choose the appropriate strategy:
- Dense Retrieval: Semantic similarity via embeddings (default for most cases)
- Hybrid Search: Dense + sparse retrieval for better coverage
- Metadata Filtering: Filter by document attributes
- Reranking: Cross-encoder reranking for high-precision requirements
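Dense retrieval, metadata filtering, and hybrid fusion can be illustrated with a small stdlib-only Java sketch. It assumes vectors are already computed, ranks by cosine similarity after applying a metadata predicate, and merges a dense and a sparse (e.g. BM25) ranking with Reciprocal Rank Fusion (RRF). RRF is one common fusion technique, chosen here as an example; the `Doc` record and `Retrieval` class are hypothetical, and real vector stores do this with indexes rather than a linear scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** A candidate chunk with an id, a vector, and filterable metadata. */
record Doc(String id, double[] vector, Map<String, String> metadata) {}

final class Retrieval {
    /** Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Dense retrieval: keep docs whose metadata passes the filter, return ids of the top-k by cosine. */
    static List<String> denseTopK(double[] query, List<Doc> docs,
                                  Predicate<Map<String, String>> filter, int k) {
        return docs.stream()
                .filter(d -> filter.test(d.metadata()))
                .sorted(Comparator.comparingDouble((Doc d) -> cosine(query, d.vector())).reversed())
                .limit(k)
                .map(Doc::id)
                .toList();
    }

    /** RRF: score(id) = sum over rankings of 1 / (c + rank), with rank starting at 1. Ties are unordered. */
    static List<String> rrf(List<List<String>> rankings, int c) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (c + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return fused;
    }
}
```

For hybrid search, pass the dense ranking and the keyword ranking to `rrf`; `c = 60` is a commonly used value that damps the influence of any single top rank.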
Step 5: Build RAG Pipeline
- Create content retriever with your embedding store
- Configure AI service with retriever and chat memory
- Implement prompt template with context injection
- Add response validation and grounding checks
Validation: Test with known queries to verify context injection works correctly.
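Context injection and a basic grounding check can be sketched as follows. This is an illustrative assumption, not a LangChain4j prompt template: it numbers each retrieved passage, asks the model to cite passages as `[n]`, and then checks that an answer cites at least one passage and only passages that exist. Such a check catches fabricated citations but does not prove the answer is faithful to the cited text.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RagPrompt {
    private static final Pattern CITATION = Pattern.compile("\\[(\\d{1,6})]");

    /** Injects numbered context passages into a prompt that asks for grounded, cited answers. */
    static String build(String question, List<String> passages) {
        StringBuilder sb = new StringBuilder()
                .append("Answer the question using only the context below. ")
                .append("Cite sources as [n]. If the context is insufficient, say you don't know.\n\n")
                .append("Context:\n");
        for (int i = 0; i < passages.size(); i++) {
            sb.append('[').append(i + 1).append("] ").append(passages.get(i)).append('\n');
        }
        return sb.append("\nQuestion: ").append(question).append("\nAnswer:").toString();
    }

    /** Grounding check: at least one citation, and every [n] points at a real passage (1..passageCount). */
    static boolean citationsValid(String answer, int passageCount) {
        Matcher m = CITATION.matcher(answer);
        boolean found = false;
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            if (n < 1 || n > passageCount) return false;
            found = true;
        }
        return found;
    }
}
```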
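Error Handling: For batch ingestion, retry each embedding call a bounded number of times with backoff:
```java
// Retries each embedding call up to 3 attempts with linear backoff.
// Whole documents are embedded here for brevity; in practice split them into segments first.
// The enclosing method must declare `throws InterruptedException` (for Thread.sleep).
final int maxAttempts = 3;
for (Document doc : documents) {
    TextSegment segment = doc.toTextSegment();
    for (int attempt = 1; ; attempt++) {
        try {
            Embedding embedding = embeddingModel.embed(segment).content();
            store.add(embedding, segment);
            break;
        } catch (RuntimeException e) { // narrow this to your provider's transient error types
            if (attempt == maxAttempts) {
                throw new IllegalStateException("Embedding failed after " + maxAttempts + " attempts", e);
            }
            Thread.sleep(1000L * attempt);
        }
    }
}
```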
Step 6: Evaluate and Optimize
- Measure retrieval metrics: precision@k, recall@k, MRR
- Evaluate answer quality: faithfulness, relevance
- Monitor performance and user feedback
- Iterate on chunking, retrieval, and prompt parameters
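The retrieval metrics listed above have simple definitions; the sketch below computes them over ranked lists of document ids. It assumes ids within each ranked list are distinct and that relevance judgments come from a labeled evaluation set you maintain. Answer-quality metrics such as faithfulness usually need an LLM or human judge and are not shown.

```java
import java.util.List;
import java.util.Set;

final class RetrievalMetrics {
    /** precision@k: fraction of the top-k retrieved ids that are relevant (divides by k). */
    static double precisionAtK(List<String> retrieved, Set<String> relevant, int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        long hits = retrieved.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / k;
    }

    /** recall@k: fraction of all relevant ids that appear in the top-k. */
    static double recallAtK(List<String> retrieved, Set<String> relevant, int k) {
        if (relevant.isEmpty()) return 0.0;
        long hits = retrieved.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    /** MRR: mean over queries of 1 / rank of the first relevant result (0 when none is found). */
    static double meanReciprocalRank(List<List<String>> retrievedPerQuery, List<Set<String>> relevantPerQuery) {
        if (retrievedPerQuery.size() != relevantPerQuery.size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        if (retrievedPerQuery.isEmpty()) return 0.0;
        double sum = 0;
        for (int q = 0; q < retrievedPerQuery.size(); q++) {
            List<String> ranked = retrievedPerQuery.get(q);
            for (int i = 0; i < ranked.size(); i++) {
                if (relevantPerQuery.get(q).contains(ranked.get(i))) {
                    sum += 1.0 / (i + 1);
                    break;
                }
            }
        }
        return sum / retrievedPerQuery.size();
    }
}
```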
Examples
Example 1: Basic Document Q&A
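```java
// Easy RAG style: ingest(documents, store) and from(store) use a default embedding model,
// which requires the langchain4j-easy-rag module on the classpath.
interface DocumentAssistant { String answer(String question); }

List<Document> documents = FileSystemDocumentLoader.loadDocuments("/docs");
InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(documents, store);

DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
        .chatModel(chatModel)
        .contentRetriever(EmbeddingStoreContentRetriever.from(store))
        .build();
String answer = assistant.answer("What is the company policy on remote work?");
```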
Example 2: Metadata-Filtered Retrieval
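```java
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

// Metadata filters only work with embedding stores that support filtering.
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(store)
        .embeddingModel(embeddingModel)
        .maxResults(5)
        .minScore(0.7)
        .filter(metadataKey("category").isEqualTo("technical"))
        .build();
```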
Example 3: Multi-Source RAG Pipeline
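```java
// `userQuestion` and `reranker` are placeholders: `reranker` stands for your own component,
// e.g. one backed by a cross-encoder ScoringModel; reorder(...) is not a LangChain4j method.
Query query = Query.from(userQuestion);
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever docRetriever = EmbeddingStoreContentRetriever.from(docStore);
List<Content> results = new ArrayList<>();
results.addAll(webRetriever.retrieve(query));
results.addAll(docRetriever.retrieve(query));
List<Content> reranked = reranker.reorder(query, results);
// Guard against fewer than 5 results; subList(0, 5) would throw.
List<Content> topResults = reranked.subList(0, Math.min(5, reranked.size()));
```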
Example 4: RAG with Chat Memory
```java
interface Assistant { String chat(String userMessage); }

Assistant assistant = AiServices.builder(Assistant.class)
        .chatModel(chatModel)
        .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
        .contentRetriever(retriever)
        .build();
assistant.chat("Tell me about the product features");
assistant.chat("What about pricing for those features?"); // earlier turn is in memory, so "those features" resolves
```
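To see why the follow-up question works, here is a simplified sliding-window memory in plain Java. It only illustrates the eviction idea behind a message window (the newest `maxMessages` entries are kept, the oldest are dropped first); it is a hypothetical class, not LangChain4j's `MessageWindowChatMemory`, whose implementation differs in details. With a window of 10, the earlier turn about product features is still present when "those features" is asked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified sliding-window chat memory: keeps only the most recent maxMessages entries. */
final class WindowMemory {
    record Message(String role, String text) {}

    private final int maxMessages;
    private final Deque<Message> window = new ArrayDeque<>();

    WindowMemory(int maxMessages) {
        if (maxMessages <= 0) throw new IllegalArgumentException("maxMessages must be positive");
        this.maxMessages = maxMessages;
    }

    void add(String role, String text) {
        window.addLast(new Message(role, text));
        while (window.size() > maxMessages) {
            window.removeFirst(); // evict the oldest message first
        }
    }

    /** Snapshot of the current window, oldest first; this is what would be sent with the next request. */
    List<Message> messages() {
        return new ArrayList<>(window);
    }
}
```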
Best Practices
Document Preparation
- Clean documents before ingestion; remove irrelevant content and formatting
- Add relevant metadata for filtering and context
Chunking Strategy
- Start with 500-1000 tokens per chunk: smaller chunks retrieve more precisely, larger chunks keep more surrounding context
- Include 10-20% overlap to preserve context at boundaries
- Test different sizes for your specific use case
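The size and overlap guidance above translates into a sliding window whose step is `chunkSize - overlap`. This sketch approximates tokens with whitespace-separated words, which is an assumption: real token counts depend on the embedding model's tokenizer, so production code should count tokens with that tokenizer or use a library splitter such as LangChain4j's `DocumentSplitters.recursive`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OverlapChunker {
    /**
     * Splits text into windows of chunkSize "tokens" where consecutive windows share `overlap` tokens.
     * Tokens are approximated by whitespace-separated words.
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return List.of();
        String[] tokens = trimmed.split("\\s+");
        int step = chunkSize - overlap;
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < tokens.length; start += step) {
            int end = Math.min(start + chunkSize, tokens.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
            if (end == tokens.length) break; // stop so no trailing chunk consists only of overlap
        }
        return chunks;
    }
}
```

For example, `chunk(text, 800, 120)` gives 800-word windows with 15% overlap.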
Retrieval Optimization
- Retrieve a wide candidate set first (k = 10-20), then filter or rerank it down to the few passages sent to the LLM
- Use metadata filtering to improve relevance
- Monitor retrieval quality and iterate based on user feedback
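The "retrieve wide, then narrow" advice is a two-stage pipeline: a cheap first-stage score picks a candidate set, and a more expensive reranker orders only those candidates. In this generic sketch both scorers are hypothetical function parameters; in practice the first stage would be vector similarity and the second a cross-encoder. Each candidate is reranked exactly once, since reranker calls usually dominate latency.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;
import java.util.function.ToDoubleFunction;

final class TwoStageRetriever {
    private record Scored<U>(U item, double score) {}

    /**
     * Stage 1: keep the `candidates` items with the highest cheap score.
     * Stage 2: rescore only those with the reranker and return the best `finalK`.
     */
    static <T> List<T> retrieve(List<T> corpus, String query,
                                ToDoubleFunction<T> firstStage,
                                ToDoubleBiFunction<String, T> reranker,
                                int candidates, int finalK) {
        List<T> shortlist = corpus.stream()
                .sorted(Comparator.comparingDouble(firstStage).reversed()) // cheap score, assumed fast
                .limit(candidates)
                .toList();
        return shortlist.stream()
                .map(t -> new Scored<T>(t, reranker.applyAsDouble(query, t))) // one reranker call per item
                .sorted(Comparator.comparingDouble((Scored<T> s) -> s.score()).reversed())
                .limit(finalK)
                .map(Scored::item)
                .toList();
    }
}
```

Items outside the shortlist are never reranked, which is why the first-stage k should be generous.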
Performance
- Cache embeddings for frequently accessed content
- Use batch processing for document ingestion
- Optimize vector store indexing for your scale
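Caching and batching can be combined: hash each text, look it up, and send only the cache misses to the model in a single batch call. The sketch assumes a hypothetical `batchEmbedder` function that returns one vector per input, in order; the in-memory `HashMap` stands in for a persistent cache and is not thread-safe.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Content-hash embedding cache that batches all cache misses into one model call (not thread-safe). */
final class CachingEmbedder {
    private final Function<List<String>, List<float[]>> batchEmbedder; // hypothetical batch API
    private final Map<String, float[]> cache = new HashMap<>();
    private int modelCalls = 0;

    CachingEmbedder(Function<List<String>, List<float[]>> batchEmbedder) {
        this.batchEmbedder = batchEmbedder;
    }

    /** SHA-256 of the text, hex-encoded, used as the cache key. */
    static String key(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(text.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    List<float[]> embedAll(List<String> texts) {
        // Collect distinct texts that are not cached yet, preserving their order.
        Set<String> misses = new LinkedHashSet<>();
        for (String t : texts) {
            if (!cache.containsKey(key(t))) misses.add(t);
        }
        if (!misses.isEmpty()) {
            List<String> batch = new ArrayList<>(misses);
            List<float[]> vectors = batchEmbedder.apply(batch); // one call for the whole batch
            modelCalls++;
            if (vectors.size() != batch.size()) throw new IllegalStateException("batch size mismatch");
            for (int i = 0; i < batch.size(); i++) cache.put(key(batch.get(i)), vectors.get(i));
        }
        List<float[]> result = new ArrayList<>(texts.size());
        for (String t : texts) result.add(cache.get(key(t)));
        return result;
    }

    int modelCalls() {
        return modelCalls;
    }
}
```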
Constraints and Warnings
System Constraints
- Embedding models have a maximum input length in tokens; longer text must be chunked, or it may be truncated or rejected depending on the model
- Vector databases require proper indexing for performance
- Chunk boundaries may lose context for complex documents
- Hybrid search requires additional infrastructure
Quality Warnings
- Retrieval quality depends heavily on chunking strategy
- Embedding models may not capture domain-specific semantics
- Metadata filtering requires proper document annotation
- Reranking adds latency to query responses
Security Warnings
- Never hardcode credentials: Use environment variables for API keys and passwords
- Validate external content: Documents from file systems, APIs, or web sources may contain malicious content (prompt injection)
- Apply content filtering on retrieved documents before passing to LLM
- Restrict allowed data source URLs and file paths using allowlists
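Two of the warnings above, reading credentials from the environment and allowlisting sources, are straightforward to enforce in code. This stdlib sketch is illustrative: variable names, roots, and hosts are placeholders. Path checks compare normalized paths component by component (so `/srv/docs-evil` does not match `/srv/docs`) but do not resolve symlinks; URL checks require HTTPS and an exact host match. Filtering retrieved content for prompt injection needs separate measures and is not shown.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class SourceGuard {
    /** Reads a secret from the environment instead of hardcoding it; fails fast if it is missing. */
    static String requireEnv(String name) {
        String value = System.getenv(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }

    /** True only if the normalized path stays inside one of the allowed roots (symlinks not resolved). */
    static boolean isAllowedPath(Path candidate, List<Path> allowedRoots) {
        Path normalized = candidate.toAbsolutePath().normalize();
        return allowedRoots.stream()
                .map(root -> root.toAbsolutePath().normalize())
                .anyMatch(normalized::startsWith); // Path.startsWith compares whole name elements
    }

    /** True only for https URLs whose host is exactly on the allowlist. */
    static boolean isAllowedUrl(String url, Set<String> allowedHosts) {
        try {
            URI uri = new URI(url);
            return "https".equalsIgnoreCase(uri.getScheme())
                    && uri.getHost() != null
                    && allowedHosts.contains(uri.getHost().toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

For existing files, checking `toRealPath()` instead of `normalize()` also closes the symlink gap.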
Resources
Reference Documentation
- Vector Database Comparison
- Embedding Models Guide
- Retrieval Strategies
- Document Chunking
- LangChain4j RAG Guide