Agent skill
analyzing-codebases
Generates LLM-optimized code context with function call graphs, side effect detection, and incremental updates. Processes JavaScript/TypeScript codebases to create compact semantic representations including multi-level summaries, entry point identification, and hash-based change tracking. Provides 74-97% token reduction compared to reading raw source files. Useful for understanding code architecture, debugging complex systems, reviewing pull requests, and onboarding to unfamiliar projects.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/analyzing-codebases
SKILL.md
LLM Context Tools
Generate compact, semantically-rich code context for LLM consumption with 99%+ faster incremental updates.
⚠️ CRITICAL BEHAVIOR CHANGE REQUIRED
BEFORE using Grep, Bash, or Read for code exploration:
- ALWAYS check if
.llm-context/directory exists - IF IT EXISTS, use queries FIRST:
bash
llm-context query find-function <name> llm-context query calls-to <name> llm-context side-effects - ONLY use grep/read if queries don't provide needed info
This is NOT optional - queries provide richer context (call graphs, side effects, patterns) that grep cannot match.
What This Skill Does
Transforms raw source code into LLM-optimized context:
- Function call graphs with side effect detection
- Multi-level summaries (System → Domain → Module)
- Incremental updates (only re-analyze changed files)
- Hash-based change tracking
- Query interface for instant lookups
Token Efficiency: 74-97% reduction vs reading raw files
When To Use
✅ User asks to "analyze this codebase" ✅ User wants to understand code architecture ✅ User needs help debugging or refactoring ✅ You need efficient context about a large codebase ✅ User wants LLM-friendly code documentation
Quick Start
# Check if tool is available
llm-context version
# If not available, user needs to install
# See setup.md for installation
# Run analysis
llm-context analyze
# Query results
llm-context stats
How It Works
Progressive Disclosure Strategy
Read in this order for maximum token efficiency:
-
L0 (200 tokens) →
.llm-context/summaries/L0-system.md- Architecture overview, entry points, statistics
-
L1 (50-100 tokens/domain) →
.llm-context/summaries/L1-domains.json- Domain boundaries, module lists
-
L2 (20-50 tokens/module) →
.llm-context/summaries/L2-modules.json- File-level exports, entry points
-
Graph (variable) →
.llm-context/graph.jsonl- Function details, call relationships, side effects
-
Source (as needed) → Read targeted files only
Never read raw source files first! Use summaries and graph for context.
Common Commands
# Analysis
llm-context analyze # Auto-detect full/incremental
llm-context check-changes # Preview what changed
# Queries
llm-context stats # Show statistics
llm-context entry-points # Find entry points
llm-context side-effects # Functions with side effects
llm-context query calls-to func # Who calls this?
llm-context query trace func # Call tree
Usage Patterns
Pattern 1: First-Time Codebase Understanding
# 1. Run analysis
llm-context analyze
# 2. Read L0 for overview
cat .llm-context/summaries/L0-system.md
# 3. Get statistics
llm-context stats
# 4. Find entry points
llm-context entry-points
Response template:
I've analyzed the codebase:
[L0 content - architecture, components, entry points]
Statistics: X functions, Y files, Z calls
Would you like me to:
1. Explain a specific domain?
2. Trace a function's call path?
3. Review the architecture?
Pattern 2: After Code Changes
# Incremental analysis (only changed files)
llm-context analyze
# See what changed
cat .llm-context/manifest.json
Response: Highlight new/modified functions and their impact
Pattern 3: Debugging
# Find function
llm-context query find-function buggyFunc
# Trace calls
llm-context query trace buggyFunc
# Check side effects
llm-context side-effects | grep buggy
Response: Explain call path and identify potential issues based on side effects
Detailed Guides
Setup & Installation: See setup.md Usage Examples: See examples.md Command Reference: See reference.md Common Workflows: See workflows.md
Side Effect Types
When analyzing functions, these effects are detected:
file_io- Reads/writes filesnetwork- HTTP, fetch, API callsdatabase- DB queries, ORMlogging- Console, loggerdom- Browser DOM manipulation
Graph Format
Each function in graph.jsonl:
{
"id": "functionName",
"file": "path/file.js",
"line": 42,
"calls": ["foo", "bar"],
"effects": ["database", "network"]
}
Best Practices
✅ REQUIRED WORKFLOW
When .llm-context/ exists, you MUST:
-
Check for analysis data FIRST:
bashls .llm-context/ # Verify data exists -
Read progressive disclosure hierarchy:
- L0 summary:
.llm-context/summaries/L0-system.md - L1 domains:
.llm-context/summaries/L1-domains.json - L2 modules:
.llm-context/summaries/L2-modules.json
- L0 summary:
-
Use queries for exploration (NOT grep/bash):
bash# Finding functions llm-context query find-function <name> # Understanding dependencies llm-context query calls-to <name> llm-context query trace <name> # Side effect analysis llm-context side-effects | grep <keyword> -
Only read source after queries don't provide needed info
-
Run incremental analysis after code changes:
bashllm-context analyze # 99% faster than full re-analysis
❌ ANTI-PATTERNS (DO NOT DO THESE)
These waste tokens and miss critical context:
- ❌ Using
grep -r "pattern"when.llm-context/exists → Use queries instead - ❌ Using
Bashto explore code → Use queries instead - ❌ Reading raw source files first → Read summaries first
- ❌ Re-reading entire codebase on changes → Use incremental analysis
- ❌ Skipping L0/L1/L2 summaries → Miss architectural overview
Remember: Grep shows text matches. Queries show semantic relationships, call graphs, and side effects.
Token Efficiency
Traditional approach:
- Read 10 files = 10,000 tokens
- Missing: call graphs, side effects
LLM-context approach:
- L0 + L1 + Graph = 500-2,000 tokens
- Includes: complete context + relationships
Savings: 80-95%
Performance
Initial Analysis
- 100 files: 2-5s
- 1,000 files: 30s-2min
- 10,000 files: 5-15min
Incremental Updates
- 1 file: 30-50ms
- 10 files: 200-500ms
- 50 files: 1-2s
Key: Incremental is 99%+ faster at scale
Troubleshooting
"No manifest found"
→ Run llm-context analyze first
"Cannot find module" → User needs to install: See setup.md
"Graph is empty" → No JavaScript files found. Check directory.
Success Criteria
This skill is working when:
- ✅ Analysis completes without errors
- ✅
.llm-context/exists with all files - ✅
llm-context statsshows functions - ✅ You use summaries before reading source
- ✅ Token usage is 50-95% less than raw reading
Summary
Transform from:
- ❌ Reading thousands of lines token-by-token
- ❌ Missing global context
- ❌ Slow re-analysis
To:
- ✅ Compact semantic representations
- ✅ Call graphs + side effects
- ✅ 99%+ faster incremental updates
- ✅ 50-95% token savings
Didn't find tool you were looking for?