Agent skill

gpu-document-processing

Use when processing large PDFs, document collections, or bulk text extraction tasks that benefit from GPU-accelerated processing. Triggers when the user provides large documents or needs bulk document analysis.

Install this agent skill to your Project

npx add-skill https://github.com/langchain-ai/deepagents/tree/main/examples/nvidia_deep_agent/skills/gpu-document-processing

SKILL.md

GPU Document Processing Skill

Process large documents and document collections using GPU-accelerated tools. This skill uses the sandbox-as-tool pattern: the agent runs on CPU for reasoning, and sends document processing work to a GPU-equipped environment.

When to Use This Skill

Use this skill when:

  • Processing large PDF files (50+ pages)
  • Analyzing collections of documents (10+ files)
  • Extracting structured data from unstructured documents
  • Performing bulk text extraction and chunking
  • Generating embeddings for large document sets
  • The user uploads or references large documents for analysis

Architecture: Sandbox as Tool

This skill follows the sandbox-as-tool pattern for GPU execution:

  1. Agent reasons on CPU - planning, synthesis, report writing
  2. Processing sent to GPU sandbox - document parsing, embedding, extraction
  3. Results returned to agent - structured output for further analysis

This separation ensures:

  • API keys stay outside the sandbox (security)
  • Agent state persists independently of processing jobs
  • Processing can be parallelized across documents
  • Cost-efficient: GPU used only during processing, not during reasoning
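The separation above can be sketched as a small wrapper that exposes the sandbox to the agent as an ordinary tool. This is a minimal illustration, not the skill's actual implementation: `SandboxJob`, `make_processing_tool`, and `fake_runner` are hypothetical names, and a real runner would submit the job to a GPU-equipped environment instead of returning a canned result.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SandboxJob:
    """A unit of work handed to the GPU sandbox (hypothetical shape)."""
    task: str            # e.g. "extract_text" or "embed"
    document_path: str


def make_processing_tool(
    run_in_sandbox: Callable[[SandboxJob], dict],
) -> Callable[[str, str], dict]:
    """Wrap a sandbox runner as a tool the CPU-side agent can call."""

    def process_document(task: str, document_path: str) -> dict:
        job = SandboxJob(task=task, document_path=document_path)
        # Only the job description crosses the boundary; no API keys are passed in.
        result = run_in_sandbox(job)
        return {"task": task, "document": document_path, "result": result}

    return process_document


def fake_runner(job: SandboxJob) -> dict:
    """Stand-in runner so the sketch runs without a GPU (assumption)."""
    return {"status": "ok", "task_seen": job.task}


tool = make_processing_tool(fake_runner)
print(tool("extract_text", "report.pdf"))
```

Because the agent only sees `process_document`, the sandbox can be swapped or scaled without touching the reasoning loop.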

Capabilities

PDF Text Extraction

Extract text content from PDF documents with layout preservation:

  • Headers, paragraphs, lists, and tables detected separately
  • Page numbers and section boundaries preserved
  • Multi-column layout handling

Tabular Data Extraction

Extract tables from documents into structured formats:

  • PDF tables to CSV/DataFrames using GPU-accelerated parsing
  • Automatic column type detection
  • Handles merged cells and multi-row headers

Document Chunking

Split large documents into meaningful chunks for analysis:

  • Semantic chunking (by topic/section boundaries)
  • Fixed-size chunking with overlap for embedding
  • Configurable chunk sizes (default: 512 tokens)
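Fixed-size chunking with overlap can be illustrated with a short sketch. The 512-token default comes from the list above; the 64-token overlap and the function name `chunk_tokens` are assumptions made for this example, and whitespace splitting stands in for whatever tokenizer the embedding model actually uses.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk has reached the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace tokens are only a stand-in for a real tokenizer.
words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk begins with the last `overlap` tokens of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.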

Embedding Generation

Generate vector embeddings for document chunks:

  • Uses NVIDIA NeMo Retriever NIM for GPU-accelerated embedding
  • Supports batch processing for large document sets
  • Compatible with standard vector stores (Milvus, ChromaDB)
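Batch processing for embeddings can be sketched as below. The embedding call is deliberately abstracted: `embed_batch` is a hypothetical callable standing in for a request to an embedding service, and `fake_embed` returns placeholder vectors so the example runs offline. The batch size of 32 is an assumption, not a documented default.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def embed_chunks(
    chunks: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Embed chunks batch by batch, preserving the input order."""
    vectors: List[List[float]] = []
    for batch in batched(chunks, batch_size):
        vectors.extend(embed_batch(list(batch)))
    return vectors


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Placeholder embedder: one-dimensional vector holding the text length."""
    return [[float(len(text))] for text in batch]


print(embed_chunks(["a", "bb", "ccc"], fake_embed, batch_size=2))
```

Keeping the vectors in input order means they can be zipped back onto their chunks before insertion into a vector store.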

Workflow

  1. Receive document reference from the orchestrator
  2. Determine processing type (extraction, analysis, embedding)
  3. Send to GPU sandbox for processing
  4. Collect structured results (text, tables, embeddings)
  5. Write findings to /shared/ for the orchestrator to synthesize
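The final step, writing findings for the orchestrator, might look like the sketch below. The JSON layout and the `<stem>.findings.json` naming scheme are assumptions for illustration; the skill only specifies that findings go under /shared/.

```python
import json
from pathlib import Path


def write_findings(shared_dir, document_name, findings):
    """Write one document's findings as JSON under the shared directory."""
    out_dir = Path(shared_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # File naming is a hypothetical convention, not defined by the skill.
    out_path = out_dir / f"{Path(document_name).stem}.findings.json"
    out_path.write_text(json.dumps(findings, indent=2), encoding="utf-8")
    return out_path
```

A call such as `write_findings("/shared", "report.pdf", {"pages": 120, "tables": 4})` would produce `/shared/report.findings.json`, which the orchestrator can read back with `json.loads`.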

Processing Large Document Collections

For multiple documents:

  1. Process documents in parallel batches (3-5 concurrent)
  2. Extract key metadata first (title, date, author, page count)
  3. Generate per-document summaries
  4. Cross-reference findings across documents
  5. Write consolidated findings with per-document citations
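Bounded parallelism over a collection can be sketched with the standard library's thread pool. The limit of 4 sits inside the 3-5 range given above; `process_collection` and `fake_process` are hypothetical names, and the per-document function is a stub in place of a real sandbox call.

```python
from concurrent.futures import ThreadPoolExecutor


def process_collection(paths, process_one, max_concurrent=4):
    """Run `process_one` over all paths with at most `max_concurrent` in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # Executor.map returns results in input order, whatever the finish order.
        results = list(pool.map(process_one, paths))
    return dict(zip(paths, results))


def fake_process(path):
    """Stub for per-document metadata extraction (assumption)."""
    return {"title": path.rsplit(".", 1)[0], "page_count": None}


summary = process_collection(["a.pdf", "b.pdf", "c.pdf"], fake_process)
for path, meta in summary.items():
    print(path, meta)
```

Keying results by path keeps per-document citations straightforward when the findings are consolidated.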

Output Format

When reporting document processing results:

  • Include document metadata (filename, pages, size)
  • Structure extracted content by section/chapter
  • Format tables as markdown tables
  • Include page references for all extracted content
  • Note any extraction quality issues (scanned images, corrupted pages)
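Formatting an extracted table as a markdown table can be done with a small helper. This is a sketch under the assumption that a table arrives as a header list plus a list of rows; `to_markdown_table` is a hypothetical name.

```python
def to_markdown_table(headers, rows):
    """Render headers and rows as a pipe-delimited markdown table."""

    def fmt(cells):
        # Escape literal pipes so cell text cannot break the column layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(headers), "| " + " | ".join("---" for _ in headers) + " |"]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


print(to_markdown_table(["Page", "Item", "Total"], [[3, "Widgets", 10], [7, "Gears", 4]]))
```

Including a page column, as in the usage line, is one way to carry the page references the output format asks for.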

Integration with NVIDIA NIM

For production deployments, GPU document processing can leverage:

  • NVIDIA NeMo Retriever: GPU-accelerated embedding and retrieval
  • NVIDIA RAPIDS cuDF: Tabular data processing from extracted tables
  • NVIDIA Triton: Scalable inference for document classification models

See NVIDIA's NIM documentation for self-hosted deployment options.
