Agent skill

document-rag-pipeline-architecture

Sub-skill of document-rag-pipeline: Architecture.

Install this agent skill in your project:

npx add-skill https://github.com/vamseeachanta/workspace-hub/tree/main/.claude/skills/_archive/data/documents/document-rag-pipeline/architecture

SKILL.md

Architecture

Document Folder
      │
      ▼
┌─────────────────────┐
│ 1. Build Inventory  │  SQLite catalog of all files
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│ 2. Extract Text     │  PyMuPDF for regular PDFs
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│ 3. OCR Scanned PDFs │  Tesseract + pytesseract
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│ 4. Chunk Text       │  1000 chars, 200 overlap
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│ 5. Generate Embeds  │  sentence-transformers
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│ 6. Semantic Search  │  Cosine similarity
└─────────────────────┘
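
Steps 4 and 6 of the diagram can be sketched with the standard library alone. This is a minimal illustration, not the skill's actual implementation: the function names `chunk_text`, `cosine_similarity`, and `rank_chunks` are hypothetical, the diagram's "1000 chars, 200 overlap" is read here as a sliding character window, and the vectors passed to the search step are assumed to come from an embedding model such as sentence-transformers (step 5).

```python
import math


def chunk_text(text, size=1000, overlap=200):
    """Split text into windows of `size` characters.

    Consecutive windows share `overlap` characters, so a sentence cut at
    one boundary still appears whole in the neighbouring chunk.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs, top_k=3):
    """Return (chunk_index, score) pairs for the top_k most similar chunks."""
    scored = [(cosine_similarity(query_vec, vec), index)
              for index, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(index, score) for score, index in scored[:top_k]]
```

With these defaults each new chunk starts 800 characters after the previous one, so a 2500-character document yields three chunks. In a real pipeline a vector index or a batched matrix product would likely replace the pure-Python loop in `rank_chunks`; the loop is kept here only to make the ranking logic explicit.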
