Sourcerer MCP

Semantic code search & navigation MCP server for efficient AI agent context retrieval.

95 Stars · 11 Forks · 95 Watchers · 0 Issues
Sourcerer MCP provides a Model Context Protocol (MCP) server that enables AI agents to perform semantic code search and navigation. By indexing codebases at the function, class, and chunk level, it allows agents to retrieve only the necessary code snippets, greatly reducing token consumption. The tool integrates with Tree-sitter for language parsing and OpenAI for generating code embeddings, supporting advanced contextual code understanding without full file ingestion.

Key Features

Semantic code search using vector embeddings
Granular code chunking via AST parsing
Direct retrieval of functions, classes, and code segments by ID
Integration with Tree-sitter for language support
Persistent vector storage with chromem-go
Automatic re-indexing on file change
Supports multiple programming languages
Respects .gitignore configuration
Environment variable and config-based setup
MCP tools for semantic search and code chunk retrieval

Use Cases

Assisting AI agents in retrieving only relevant code contexts
Reducing token wastage by avoiding full file reads
Powering AI-driven code navigation and understanding tools
Automated semantic search within large codebases
Supporting conversational coding assistants
Facilitating code similarity and clone detection
Streamlining onboarding by exposing conceptual code views
Providing language-agnostic code search for development environments
Efficient workspace indexing for continuous integration scenarios
Automating identification of code changes and relevant impacts

README

Sourcerer MCP 🧙

An MCP server for semantic code search & navigation that helps AI agents work efficiently without burning through costly tokens. Instead of reading entire files, agents can search conceptually and jump directly to the specific functions, classes, and code chunks they need.

Demo

[asciicast demo recording]

Requirements

  • OpenAI API Key: Required for generating embeddings (local embedding support planned)
  • Git: Must be a git repository (respects .gitignore files)
  • Add .sourcerer/ to .gitignore: This directory stores the embedded vector database
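
For example, the entry can be appended from the shell before the first indexing run:

shell
# Keep the local vector database out of version control
echo ".sourcerer/" >> .gitignore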

Installation

Go

shell
go install github.com/st3v3nmw/sourcerer-mcp/cmd/sourcerer@latest

Homebrew

shell
brew tap st3v3nmw/tap
brew install st3v3nmw/tap/sourcerer

Configuration

Claude Code

shell
claude mcp add sourcerer -e OPENAI_API_KEY=your-openai-api-key -e SOURCERER_WORKSPACE_ROOT=$(pwd) -- sourcerer

mcp.json

json
{
  "mcpServers": {
    "sourcerer": {
      "command": "sourcerer",
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key",
        "SOURCERER_WORKSPACE_ROOT": "/path/to/your/project"
      }
    }
  }
}

How it Works

Sourcerer 🧙 builds a semantic search index of your codebase:

1. Code Parsing & Chunking

  • Uses Tree-sitter to parse source files into ASTs
  • Extracts meaningful chunks (functions, classes, methods, types) with stable IDs
  • Each chunk includes source code, location info, and contextual summaries
  • Chunk IDs follow the format: file.ext::Type::method
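
For instance, a method and a top-level function in a hypothetical internal/index/parser.go would be addressed roughly as follows (the exact IDs depend on how the file is parsed):

text
internal/index/parser.go::Parser::ParseFile
internal/index/parser.go::NewParser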

2. File System Integration

  • Watches for file changes using fsnotify
  • Respects .gitignore files via git check-ignore
  • Automatically re-indexes changed files
  • Stores metadata to track modification times
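
A minimal Go sketch of this watch-and-re-index loop is shown below. It assumes the fsnotify and git check-ignore behaviour described above; the helper names (isIgnored, reindex) are illustrative, not Sourcerer's actual internals.

go
// Illustrative sketch only: watch a directory and re-index changed,
// non-ignored files. Not Sourcerer's actual implementation.
package index

import (
	"log"
	"os/exec"

	"github.com/fsnotify/fsnotify"
)

// isIgnored shells out to `git check-ignore`; an exit status of 0
// means the path is covered by .gitignore.
func isIgnored(path string) bool {
	return exec.Command("git", "check-ignore", "-q", path).Run() == nil
}

// watch re-indexes files on write/create events until the watcher fails.
func watch(dir string, reindex func(path string)) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer watcher.Close()

	if err := watcher.Add(dir); err != nil {
		return err
	}

	for {
		select {
		case event := <-watcher.Events:
			if event.Op&(fsnotify.Write|fsnotify.Create) != 0 && !isIgnored(event.Name) {
				reindex(event.Name)
			}
		case err := <-watcher.Errors:
			log.Println("watch error:", err)
		}
	}
}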

3. Vector Database

  • Uses chromem-go for persistent vector storage in .sourcerer/db/
  • Generates embeddings via OpenAI's API for semantic similarity
  • Enables conceptual search rather than just text matching
  • Maintains chunks, their embeddings, and metadata
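
As a rough sketch of this storage layer, the snippet below uses chromem-go's persistent store with an OpenAI embedding function; the collection name, chunk ID, and error handling are illustrative assumptions rather than Sourcerer's actual schema.

go
// Illustrative sketch of indexing a chunk and querying it semantically
// with chromem-go. Not Sourcerer's actual code.
package index

import (
	"context"
	"os"

	"github.com/philippgille/chromem-go"
)

func searchChunks(ctx context.Context, query string) ([]chromem.Result, error) {
	// Persistent vector store under .sourcerer/db/ (no compression).
	db, err := chromem.NewPersistentDB(".sourcerer/db", false)
	if err != nil {
		return nil, err
	}

	// Embeddings are generated via OpenAI's API.
	embed := chromem.NewEmbeddingFuncOpenAI(os.Getenv("OPENAI_API_KEY"), chromem.EmbeddingModelOpenAI3Small)

	chunks, err := db.GetOrCreateCollection("chunks", nil, embed)
	if err != nil {
		return nil, err
	}

	// Index one chunk; in practice this is done for every extracted chunk.
	err = chunks.AddDocument(ctx, chromem.Document{
		ID:      "internal/index/parser.go::Parser::ParseFile", // stable chunk ID
		Content: "func (p *Parser) ParseFile(path string) (*Chunk, error) { ... }",
	})
	if err != nil {
		return nil, err
	}

	// Conceptual search: return the five most similar chunks.
	return chunks.Query(ctx, query, 5, nil, nil)
}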

4. MCP Tools

  • semantic_search: Find relevant code using semantic search
  • get_chunk_code: Retrieve specific chunks by ID
  • find_similar_chunks: Find similar chunks
  • index_workspace: Manually trigger re-indexing
  • get_index_status: Check indexing progress
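
Over MCP, an agent invokes these as standard tools/call requests. The example below assumes a query argument for semantic_search; the server's actual parameter names may differ.

json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "semantic_search",
    "arguments": {
      "query": "where are file changes detected and re-indexed?"
    }
  }
}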

This approach allows AI agents to find relevant code without reading entire files, dramatically reducing token usage and cognitive load.

Supported Languages

Language support requires writing Tree-sitter queries to identify functions, classes, interfaces, and other code structures for each language.

Supported: Go, JavaScript, Markdown, Python, TypeScript

Planned: C, C++, Java, Ruby, Rust, and others
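
As a hypothetical illustration of such a query, the pattern below matches Go function and method declarations; Sourcerer's actual queries and capture names may differ.

scheme
; Capture top-level functions and methods in Go source files
(function_declaration
  name: (identifier) @function.name) @function

(method_declaration
  name: (field_identifier) @method.name) @method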

Contributing

All contributions welcome! See CONTRIBUTING.md.

$ ls @stephenmwangi.com
- gh:st3v3nmw/obsidian-spaced-repetition
- gh:st3v3nmw/lsfr


Repository Owner

st3v3nmw (User)

Repository Details

Language Go
Default Branch main
Size 137 KB
Contributors 1
License MIT License
MCP Verified Nov 12, 2025

Programming Languages

Go (100%)

Topics

claude-code code-analysis code-navigation code-search mcp mcp-server model-context-protocol semantic-search


Related MCPs

Discover similar Model Context Protocol servers

  • MCP Local RAG

    Privacy-first local semantic document search server for MCP clients.

    MCP Local RAG is a privacy-preserving, local document search server designed for use with Model Context Protocol (MCP) clients such as Cursor, Codex, and Claude Code. It enables users to ingest and semantically search local documents without using external APIs or cloud services. All processing, including embedding generation and vector storage, is performed on the user's machine. The tool supports document ingestion, semantic search, file management, file deletion, and system status reporting through MCP.

    10 · MCP · shinpr/mcp-local-rag

  • Code Declaration Lookup MCP Server

    Fast, language-agnostic code declaration search and lookup server via MCP.

    Provides a Model Context Protocol (MCP) server that indexes code declarations using universal ctags and SQLite with FTS5 full-text search. Offers search and listing functionality for functions, classes, structures, enums, and other code elements across any language supported by ctags. Enables seamless integration with coding agents for dynamic indexing, respects .gitignore, and supports ctags file ingestion and management.

    2 · MCP · osinmv/function-lookup-mcp

  • Exa MCP Server

    Fast, efficient web and code context for AI coding assistants.

    Exa MCP Server provides a Model Context Protocol (MCP) server interface that connects AI assistants to Exa AI’s powerful search capabilities, including code, documentation, and web search. It enables coding agents to retrieve precise, token-efficient context from billions of sources such as GitHub, StackOverflow, and documentation sites, reducing hallucinations in coding agents. The platform supports integration with popular tools like Cursor, Claude, and VS Code through standardized MCP configuration, offering configurable access to various research and code-related tools via HTTP.

    3,224 · MCP · exa-labs/exa-mcp-server

  • In Memoria

    Persistent memory and instant context for AI coding assistants, integrated via MCP.

    In Memoria is an MCP server that enables AI coding assistants such as Claude or Copilot to retain, recall, and provide context about codebases across sessions. It learns patterns, architecture, and conventions from user code, offering persistent intelligence that eliminates repetitive explanations and generic suggestions. Through the Model Context Protocol, it allows AI tools to perform semantic search, smart file routing, and track project-specific decisions efficiently.

    94 · MCP · pi22by7/In-Memoria

  • Mem0 MCP Server

    Structured management of coding preferences using Mem0 and Model Context Protocol.

    Mem0 MCP Server implements a Model Context Protocol-compliant server for storing, retrieving, and searching coding preferences. It integrates with Mem0 and offers tools for persistent management of code snippets, best practices, and technical documentation. The server exposes an SSE endpoint for clients like Cursor, enabling seamless access and interaction with coding context data.

    506 · MCP · mem0ai/mem0-mcp

  • Driflyte MCP Server

    Bridging AI assistants with deep, topic-aware knowledge from web and code sources.

    Driflyte MCP Server acts as a bridge between AI-powered assistants and diverse, topic-aware content sources by exposing a Model Context Protocol (MCP) server. It enables retrieval-augmented generation workflows by crawling, indexing, and serving topic-specific documents from web pages and GitHub repositories. The system is extensible, with planned support for additional knowledge sources, and is designed for easy integration with popular AI tools such as ChatGPT, Claude, and VS Code.

    9 · MCP · serkan-ozal/driflyte-mcp-server