Agent skill

document-rag-pipeline-build-knowledge-base

Sub-skill of document-rag-pipeline: Build Knowledge Base (+2).

Install this agent skill to your Project

npx add-skill https://github.com/vamseeachanta/workspace-hub/tree/main/.claude/skills/_archive/data/documents/document-rag-pipeline/build-knowledge-base

SKILL.md

Build Knowledge Base (+2)

Build Knowledge Base

bash

# Full pipeline with OCR and embeddings
python build_knowledge_base.py /path/to/documents --embed

# Skip OCR (faster, text PDFs only)
python build_knowledge_base.py /path/to/documents --no-ocr --embed

# Just build inventory (no extraction)
python build_knowledge_base.py /path/to/documents

Search Documents

bash

# Semantic search
python build_knowledge_base.py /path/to/documents --search "subsea wellhead design"

# More results
python build_knowledge_base.py /path/to/documents --search "fatigue analysis" --top-k 20
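The build and search invocations above can also be assembled programmatically, for example when another tool drives the pipeline. This is a minimal sketch that assumes only the flags shown in this document (`--embed`, `--no-ocr`, `--search`, `--top-k`); the helper name `build_kb_command` is hypothetical and not part of the skill.

```python
import shlex
from typing import List, Optional


def build_kb_command(
    docs_dir: str,
    embed: bool = False,
    ocr: bool = True,
    search: Optional[str] = None,
    top_k: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for build_knowledge_base.py (hypothetical helper)."""
    cmd = ["python", "build_knowledge_base.py", docs_dir]
    if not ocr:
        cmd.append("--no-ocr")  # text PDFs only, as described above
    if embed:
        cmd.append("--embed")
    if search is not None:
        cmd += ["--search", search]
        if top_k is not None:
            cmd += ["--top-k", str(top_k)]
    return cmd


if __name__ == "__main__":
    cmd = build_kb_command("/path/to/documents", search="fatigue analysis", top_k=20)
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the query contains spaces or quotes.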

Quick Search Script

bash

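#!/bin/bash
# search_docs.sh - Quick semantic search
# Usage: ./search_docs.sh /path/to/_inventory.db "query text"

DB_PATH="${1:-/path/to/_inventory.db}"
QUERY="$2"

# The path and query are passed as arguments rather than interpolated into
# the Python source, so quotes in the query cannot break the script.
# CUDA_VISIBLE_DEVICES="" hides any GPU so the model runs on CPU.
CUDA_VISIBLE_DEVICES="" python3 -c "
import sqlite3, pickle, sys
import numpy as np
from sentence_transformers import SentenceTransformer

db_path, query = sys.argv[1], sys.argv[2]

model = SentenceTransformer('all-MiniLM-L6-v2')
query_emb = model.encode(query, normalize_embeddings=True)

conn = sqlite3.connect(db_path)
cursor = conn.cursor()
# Score a random sample of at most 50000 chunks to bound run time;
# on larger databases the results can differ between runs.
cursor.execute('''
    SELECT tc.chunk_text, tc.embedding, d.filename
    FROM text_chunks tc
    JOIN documents d ON tc.document_id = d.id
    WHERE tc.embedding IS NOT NULL
    ORDER BY RANDOM() LIMIT 50000
''')

results = []
for text, emb_blob, filename in cursor.fetchall():
    # Embeddings are stored pickled; only open databases you trust.
    emb = pickle.loads(emb_blob)
    # Dot product is the cosine similarity if stored embeddings are normalized too.
    sim = float(np.dot(query_emb, emb))
    results.append((sim, filename, text[:200]))
conn.close()

# Print the ten highest-scoring chunks.
for score, fname, text in sorted(results, reverse=True)[:10]:
    print(f'[{score:.3f}] {fname}')
    print(f'  {text}...\n')
" "$DB_PATH" "$QUERY"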

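To make the storage layout behind the script above easier to see, here is a small self-contained sketch that builds an in-memory SQLite database with the two tables and columns the query refers to, then runs the same ranking. Several things are assumptions: the schema is inferred from the query alone (the real tables probably have more columns), the file names and 3-dimensional embeddings are made-up toy data, and plain Python lists stand in for the model's NumPy vectors so the example needs only the standard library.

```python
import heapq
import pickle
import sqlite3
from typing import List, Sequence, Tuple


def make_toy_db() -> sqlite3.Connection:
    """Create an in-memory database shaped like the one the script queries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents (id INTEGER PRIMARY KEY, filename TEXT);
        CREATE TABLE text_chunks (
            id INTEGER PRIMARY KEY,
            document_id INTEGER REFERENCES documents(id),
            chunk_text TEXT,
            embedding BLOB
        );
        """
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [(1, "wellhead.pdf"), (2, "fatigue.pdf")],  # hypothetical file names
    )
    chunks = [
        (1, "Subsea wellhead design basis", [1.0, 0.0, 0.0]),
        (2, "Fatigue analysis of risers", [0.0, 1.0, 0.0]),
        (2, "Chunk that was never embedded", None),
    ]
    for doc_id, text, emb in chunks:
        blob = pickle.dumps(emb) if emb is not None else None
        conn.execute(
            "INSERT INTO text_chunks (document_id, chunk_text, embedding) "
            "VALUES (?, ?, ?)",
            (doc_id, text, blob),
        )
    return conn


def search(
    conn: sqlite3.Connection, query_emb: Sequence[float], top_k: int = 10
) -> List[Tuple[float, str, str]]:
    """Return the top_k (score, filename, text) tuples by dot product."""
    rows = conn.execute(
        """
        SELECT tc.chunk_text, tc.embedding, d.filename
        FROM text_chunks tc
        JOIN documents d ON tc.document_id = d.id
        WHERE tc.embedding IS NOT NULL
        """
    )
    scored = []
    for text, blob, filename in rows:
        emb = pickle.loads(blob)
        # Dot product; equals cosine similarity when both vectors are unit length.
        sim = sum(q * e for q, e in zip(query_emb, emb))
        scored.append((sim, filename, text))
    # nlargest avoids sorting the full list when only a few hits are needed.
    return heapq.nlargest(top_k, scored)


if __name__ == "__main__":
    for score, fname, text in search(make_toy_db(), [0.0, 1.0, 0.0], top_k=2):
        print(f"[{score:.3f}] {fname}: {text}")
```

Unlike the shell script, this sketch scores every embedded chunk instead of a random sample, which is only practical for small databases.
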
Maintainer

vamseeachanta (Core maintainer)

Source details

Full Name: vamseeachanta/workspace-hub
Branch: main
Path in repo: .claude/skills/_archive/data/documents/document-rag-pipeline/build-knowledge-base
