Agent skill
searching-codebases
Find code by regex pattern or natural language concept in any codebase. Auto-routes between n-gram indexed regex search (2-20x faster than ripgrep) and TF-IDF semantic search. Expands results to full functions via AST maps. Accepts GitHub URLs, local directories, uploaded files/archives, or project knowledge. Use when asked to find implementations, search for patterns, explore unfamiliar repos, or answer "where is X" / "how does Y work" about code. Triggers on "search this repo", "find where X is", "grep for", "what handles Y", regex patterns, or natural-language questions about code.
Install this agent skill to your Project
npx add-skill https://github.com/oaustegard/claude-skills/tree/main/searching-codebases
Metadata
- version: 1.0.0
SKILL.md
Searching Codebases
Find code in any codebase by pattern or concept. One entry point, two search strategies, automatic routing.
Prerequisites
uv tool install ripgrep
Tree-sitter (for structural maps) installs automatically when needed.
Primary Command
SKILL_DIR=/mnt/skills/user/searching-codebases
python3 $SKILL_DIR/scripts/search.py SOURCE "query1" ["query2" ...] [OPTIONS]
SOURCE is any of:
- Local directory path
- GitHub URL (downloads tarball automatically)
- uploads (uses /mnt/user-data/uploads/)
- project (uses /mnt/project/)
- Path to a .zip or .tar.gz archive
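To make the accepted SOURCE forms concrete, here is a minimal Python sketch of how an argument might be classified before any download or extraction happens. The function name `classify_source`, the exact keyword matching, and the URL prefixes are assumptions for illustration. The skill's real logic lives in `scripts/resolve.py` and may differ.

```python
from typing import Tuple

# Hypothetical classifier for the SOURCE forms listed above; not resolve.py's code.
UPLOADS_DIR = "/mnt/user-data/uploads/"
PROJECT_DIR = "/mnt/project/"
ARCHIVE_SUFFIXES = (".zip", ".tar.gz")


def classify_source(source: str) -> Tuple[str, str]:
    """Return (kind, location) for a SOURCE argument."""
    if source.startswith(("https://github.com/", "http://github.com/", "github.com/")):
        return "github", source            # would be fetched as a tarball
    if source == "uploads":
        return "uploads", UPLOADS_DIR      # keyword for uploaded files
    if source == "project":
        return "project", PROJECT_DIR      # keyword for project knowledge
    if source.lower().endswith(ARCHIVE_SUFFIXES):
        return "archive", source           # would be extracted before searching
    return "directory", source             # anything else: treat as a local path
```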
Search Modes
Regex mode (patterns, identifiers, literal text):
python3 $SKILL_DIR/scripts/search.py ./repo "def handle_error"
python3 $SKILL_DIR/scripts/search.py ./repo "class.*Exception" --regex
python3 $SKILL_DIR/scripts/search.py ./repo "TODO|FIXME|HACK"
Semantic mode (concepts, natural language):
python3 $SKILL_DIR/scripts/search.py ./repo "retry logic with backoff" --semantic
python3 $SKILL_DIR/scripts/search.py ./repo "authentication flow"
python3 $SKILL_DIR/scripts/search.py ./repo "error handling strategy"
Auto-detection: short queries and code-like tokens → regex. Multi-word
natural language → semantic. Override with --regex or --semantic.
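The routing rule above can be approximated with a few checks. This is a hypothetical sketch: `route`, `REGEX_META`, `CODE_TOKEN`, and the keyword list are invented here, not taken from `search.py`. It does reproduce the routing of every example query in this section, and `force` stands in for `--regex`/`--semantic`.

```python
import re
from typing import Optional

# Hypothetical heuristic approximating the documented behavior.
REGEX_META = re.compile(r"[.*+?|()\[\]{}^$\\]")   # regex syntax characters
CODE_TOKEN = re.compile(r"_|[a-z][A-Z]|::|->")     # snake_case, camelCase, paths, arrows
CODE_KEYWORDS = {"def", "class", "function", "fn", "func", "import", "return"}


def route(query: str, force: Optional[str] = None) -> str:
    """Return 'regex' or 'semantic' for a query."""
    if force in ("regex", "semantic"):
        return force                       # explicit override wins
    words = query.split()
    if len(words) <= 1:
        return "regex"                     # short query: one identifier or pattern
    if REGEX_META.search(query) or CODE_TOKEN.search(query):
        return "regex"                     # pattern syntax or identifier-like token
    if words[0] in CODE_KEYWORDS:
        return "regex"                     # e.g. "def retry"
    return "semantic"                      # multi-word natural language
```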
Options
- --regex / --semantic: Force search mode
- --expand: Return full function bodies instead of matching lines
- --map-only: Generate structural maps only (delegates to mapping-codebases)
- --benchmark: Compare indexed regex vs brute-force ripgrep
- --branch NAME: Git branch for GitHub URLs (default: main)
- --skip DIRS: Comma-separated directories to skip
- --json: Machine-readable output
- -v: Show index stats and query routing decisions
How It Works
Regex search builds a sparse n-gram inverted index over all files. Each query is decomposed into literal fragments, which are looked up in the index to identify candidate files (typically eliminating 90-99% of files); only those candidates are then verified with ripgrep. Frequency weighting favors rare character sequences, which appear in fewer files and therefore narrow the candidate set more.
Semantic search builds a TF-IDF index over code chunks (functions, classes, map entries). Chunks are ranked by cosine similarity to the query. Structural maps from mapping-codebases enrich the index when available.
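A dependency-free sketch of TF-IDF ranking over named code chunks follows. The skill itself uses scikit-learn for this step; the tokenizer, the `rank` function, and the chunk names are assumptions. The IDF term uses the smoothed formula that scikit-learn's `TfidfVectorizer` applies by default, ln((1 + n) / (1 + df)) + 1.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase words, splitting snake_case and camelCase identifiers."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(chunks: Dict[str, str], query: str, top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank named code chunks by TF-IDF cosine similarity to a query."""
    names = list(chunks)
    counts = [Counter(tokenize(chunks[name])) for name in names]
    n = len(names)
    df = Counter(term for c in counts for term in c)  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    # Query terms never seen in the corpus carry no signal and are dropped.
    query_counts = Counter(t for t in tokenize(query) if t in idf)
    query_vec = {t: tf * idf[t] for t, tf in query_counts.items()}
    scored = sorted(((cosine(query_vec, v), name) for name, v in zip(names, vectors)), reverse=True)
    return [(name, score) for score, name in scored[:top_k] if score > 0]
```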
Context expansion (--expand) uses _MAP.md files to identify function
boundaries, returning complete structural units rather than line fragments.
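The `_MAP.md` format produced by mapping-codebases is not described here, so this sketch substitutes Python's built-in `ast` module to find function and class boundaries in Python source. It illustrates the expansion step only: given a matching line number, return the innermost enclosing definition, or just the line itself when it sits at module level. `function_spans` and `expand` are hypothetical names.

```python
import ast
from typing import List, Tuple


def function_spans(source: str) -> List[Tuple[int, int, str]]:
    """(start, end, name) for every function and class; 1-based, inclusive."""
    spans = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            spans.append((node.lineno, node.end_lineno, node.name))
    return spans


def expand(source: str, match_line: int) -> str:
    """Return the innermost definition enclosing match_line, else the line alone."""
    lines = source.splitlines()
    enclosing = [s for s in function_spans(source) if s[0] <= match_line <= s[1]]
    if not enclosing:
        return lines[match_line - 1]
    # The innermost enclosing unit is the one with the shortest span.
    start, end, _ = min(enclosing, key=lambda s: s[1] - s[0])
    return "\n".join(lines[start - 1:end])
```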
Small codebases (fewer than 20 files) skip indexing entirely: direct ripgrep is faster when there is nothing to narrow.
Mixed Queries
Multiple queries can use different modes in a single invocation. Each query is auto-routed independently, and indexes are built once per mode:
python3 $SKILL_DIR/scripts/search.py ./repo \
"class.*Error" \
"error recovery strategy" \
"def retry"
Dependencies
- mapping-codebases: Generates _MAP.md files for context expansion and TF-IDF enrichment. Not required; search works without maps, just with less context in results.
- ripgrep: Required for regex verification. Install via uv tool install ripgrep.
- scikit-learn: Required for semantic mode. Installs automatically.
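Since maps are optional while ripgrep and scikit-learn gate the two search modes, a preflight check is straightforward. This sketch uses only standard-library calls (`shutil.which`, `importlib.util.find_spec`, `Path.rglob`); the function names and the idea of reporting currently usable modes are assumptions, and nothing is installed.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Dict, List


def check_dependencies() -> Dict[str, bool]:
    """Report which dependencies are present right now."""
    return {
        "ripgrep": shutil.which("rg") is not None,                        # regex verification
        "scikit-learn": importlib.util.find_spec("sklearn") is not None,  # semantic mode
    }


def has_maps(repo: str) -> bool:
    """True if any _MAP.md (from mapping-codebases) exists under repo."""
    return any(Path(repo).rglob("_MAP.md"))


def usable_modes(deps: Dict[str, bool]) -> List[str]:
    """Modes usable without installing anything further."""
    modes = []
    if deps.get("ripgrep"):
        modes.append("regex")
    if deps.get("scikit-learn"):
        modes.append("semantic")
    return modes
```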
When NOT to Use
- Repos under ~10 files: just read them directly
- Exact identifier known: rg "identifier" /path is simpler
- Need AST-precise extraction (complete function bodies via tree-sitter): use exploring-codebases with --expand-full instead
Files
- scripts/search.py: Entry point, query routing, output formatting
- scripts/resolve.py: Input source resolution (GitHub, uploads, archives)
- scripts/context.py: AST/map-based context expansion
- scripts/ngram_index.py: Sparse n-gram inverted index, regex decomposition
- scripts/sparse_ngrams.py: Core n-gram algorithms, frequency weights
- scripts/code_rag.py: TF-IDF semantic search over code chunks