Agent skill
mechinterp-crossmodel-matcher
Match SAE features between Ultra (24K) and Full (2K) models based on activation patterns and token overlap
Install this agent skill into your project:
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/product/mechinterp-crossmodel-matcher
MechInterp Cross-Model Matcher
Match features between the Ultra (24K features) and Full (2K features) SAE models to understand feature correspondence and discover monosemantic representations.
Purpose
The cross-model matcher skill:
- Finds corresponding features across models
- Computes similarity based on top token overlap
- Identifies features unique to each model
- Helps validate interpretations across model scales
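The core of this kind of matching can be sketched with plain Python sets. The example below is a minimal sketch, not `FeatureMatcher`'s actual implementation. It assumes each feature is summarized by the set of its top tokens, ranks target features by Jaccard overlap, and flags source features whose best overlap falls below a threshold as having no counterpart. The helper names (`jaccard`, `rank_matches`, `unmatched_features`), the 0.1 cutoff, and the toy token sets are all hypothetical.

```python
# Hypothetical sketch: match features across models by top-token overlap.
# Feature IDs and token sets are toy illustrations, not real SAE data.


def jaccard(a: set[str], b: set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_matches(source_tokens: set[str],
                 target_features: dict[int, set[str]],
                 n_top_matches: int = 10) -> list[tuple[int, float]]:
    """Return (target_id, overlap) pairs sorted by descending overlap."""
    scored = [(fid, jaccard(source_tokens, toks))
              for fid, toks in target_features.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_top_matches]


def unmatched_features(source_features: dict[int, set[str]],
                       target_features: dict[int, set[str]],
                       threshold: float = 0.1) -> list[int]:
    """Source features whose best overlap with every target is below threshold."""
    return [fid for fid, toks in source_features.items()
            if max((jaccard(toks, t) for t in target_features.values()),
                   default=0.0) < threshold]


# Toy "Full" and "Ultra" feature summaries (hypothetical).
full = {7: {"a", "b", "c"}, 8: {"x", "y"}}
ultra = {100: {"a", "b"}, 101: {"q", "r"}}

print([(fid, round(s, 3)) for fid, s in rank_matches(ultra[100], full)])
# [(7, 0.667), (8, 0.0)]
print(unmatched_features(ultra, full))  # [101]
```

Ultra feature 101 shares no top tokens with any Full feature, so under this sketch it would be reported as unique to the Ultra model.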
When to Use
Use this skill when you:
- Have interpreted a feature in one model and want to find its counterpart
- Want to validate that a pattern exists across model scales
- Need to understand which Full features the Ultra model splits into finer-grained features
Usage
Programmatic
```python
from splatnlp.mechinterp.analysis import FeatureMatcher
from splatnlp.mechinterp.skill_helpers import load_context

# Load source context (the model with your known feature)
source_ctx = load_context("ultra")

# Initialize matcher (automatically loads the target model)
matcher = FeatureMatcher(source_ctx)

# Find matches for an Ultra feature in the Full model
report = matcher.find_matches(
    source_feature=18712,
    n_candidates=500,  # How many Full features to check
    n_top_matches=10,  # How many matches to return
)

# View results
print(f"Searched {report.n_candidates_tested} candidates")
print(f"Best correlation: {report.best_correlation:.3f}")
for match in report.matches:
    print(f"\nFull feature {match.target_feature}:")
    print(f"  Token overlap: {match.top_token_overlap:.3f}")
    print(f"  Shared tokens: {match.shared_top_tokens[:5]}")
    print(f"  Notes: {match.notes}")
```
Detailed Comparison
```python
# Compare two specific features in detail
comparison = matcher.compare_features(
    source_fid=18712,  # Ultra feature
    target_fid=1024,   # Full feature
)

print(f"Jaccard similarity: {comparison['jaccard_similarity']:.3f}")
print(f"Shared tokens: {comparison['shared_tokens'][:10]}")
print(f"Ultra-only tokens: {comparison['source_only_tokens'][:10]}")
print(f"Full-only tokens: {comparison['target_only_tokens'][:10]}")
```
Matching Metrics
Token Overlap (Jaccard Similarity)
Compares top tokens between features:
overlap = |source_top ∩ target_top| / |source_top ∪ target_top|
- > 0.3: Strong match - likely same underlying concept
- 0.1 - 0.3: Moderate match - related but not identical
- < 0.1: Weak match - probably different concepts
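To make the formula and the bands concrete, here is a sketch that computes the overlap for two top-token sets and assigns a band from the thresholds above. It is hypothetical: the dictionary keys echo the fields printed from `compare_features`, but this is not the library's code, and treating exactly 0.1 and 0.3 as "moderate" is an assumption about how the boundaries are meant.

```python
# Hypothetical sketch of a top-token comparison; not splatnlp's implementation.


def compare_token_sets(source_top: set[str], target_top: set[str]) -> dict:
    """Jaccard overlap plus shared / source-only / target-only token lists."""
    shared = source_top & target_top
    union = source_top | target_top
    j = len(shared) / len(union) if union else 0.0

    # Bands follow the thresholds listed above; boundary handling is assumed.
    if j > 0.3:
        band = "strong"
    elif j >= 0.1:
        band = "moderate"
    else:
        band = "weak"

    return {
        "jaccard_similarity": j,
        "band": band,
        "shared_tokens": sorted(shared),
        "source_only_tokens": sorted(source_top - target_top),
        "target_only_tokens": sorted(target_top - source_top),
    }


# Toy token sets (hypothetical): 2 shared tokens out of 5 distinct tokens.
result = compare_token_sets({"a", "b", "c", "d"}, {"c", "d", "e"})
print(result["band"], round(result["jaccard_similarity"], 3))  # strong 0.4
```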
Interpretation
High overlap suggests:
- Features detect the same pattern
- The Ultra feature may be a "refinement" of the Full feature
- Good candidate for cross-model validation
Low overlap with similar activation patterns suggests:
- The Ultra model has decomposed the Full feature
- Multiple Ultra features may combine to match one Full feature
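One way to check the decomposition reading is to ask how much of the Full feature's top-token set is covered by the union of several Ultra features' sets, even when each Ultra feature overlaps only partially on its own. The `decomposition_coverage` helper and the token sets below are hypothetical illustrations, not part of splatnlp.

```python
# Hypothetical sketch: does a group of Ultra features jointly cover a Full feature?


def decomposition_coverage(full_top: set[str], ultra_tops: list[set[str]]) -> float:
    """Fraction of the Full feature's top tokens found in the union of Ultra sets."""
    if not full_top:
        return 0.0
    covered = set().union(*ultra_tops) & full_top
    return len(covered) / len(full_top)


# Toy data: each Ultra feature captures a different slice of the Full feature.
full_top = {"a", "b", "c", "d", "e", "f"}
ultra_parts = [{"a", "b", "x"}, {"c", "d", "y"}, {"e", "f", "z"}]

print(round(decomposition_coverage(full_top, ultra_parts[:1]), 3))  # 0.333
print(decomposition_coverage(full_top, ultra_parts))                # 1.0
```

Each part alone covers only a third of the Full feature's tokens, but together they cover all of them, which is the pattern the decomposition reading predicts.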
Example: Finding Ultra Decomposition
```python
from splatnlp.mechinterp.analysis import FeatureMatcher
from splatnlp.mechinterp.skill_helpers import load_context

# Example: a Full model feature that might be polysemantic
full_ctx = load_context("full")
matcher = FeatureMatcher(full_ctx)  # Source = Full, target = Ultra

# Find which Ultra features correspond to Full feature 512
report = matcher.find_matches(source_feature=512)

# If several Ultra features match, the Full feature may be polysemantic
candidate_matches = [m for m in report.matches if m.combined_score > 0.1]
if len(candidate_matches) > 3:
    print("Full feature 512 appears to be polysemantic")
    print("Ultra decomposition:")
    for m in candidate_matches[:5]:
        print(f"  Ultra {m.target_feature}: {m.shared_top_tokens[:3]}")
```
Workflow Integration
- Start with interpreted feature: Begin with a feature you understand
- Find matches: Use this skill to find counterparts
- Validate interpretation: Check if matches have similar behavior
- Document correspondence: Update research state with cross-model links
- Investigate decomposition: If Ultra splits a Full feature, analyze each part
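For the documentation step, a correspondence can be stored as a small structured record. The `CrossModelLink` dataclass below is a hypothetical schema, not the format used by mechinterp-state; the feature IDs, score, and relation label are placeholders rather than real results.

```python
# Hypothetical record for a cross-model correspondence; not mechinterp-state's format.
import json
from dataclasses import asdict, dataclass


@dataclass
class CrossModelLink:
    source_model: str        # e.g. "ultra"
    source_feature: int
    target_model: str        # e.g. "full"
    target_feature: int
    jaccard_similarity: float
    relation: str            # hypothetical labels: "same", "refinement", "decomposition"
    notes: str = ""


# Placeholder values for illustration only.
link = CrossModelLink("ultra", 100, "full", 7, 0.35, "refinement", "placeholder")
print(json.dumps(asdict(link)))
```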
Limitations
- Token overlap is a proxy; true matching would require shared activation data
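- The models have different expansion factors (24K vs 2K features), so their features differ in granularity; one Full feature may correspond to several Ultra features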
- Some features may not have clear counterparts
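Where activations from both models are available on the same inputs, a more direct comparison than token overlap is the correlation between the two features' activation vectors. The sketch below computes a Pearson correlation in pure Python. It assumes both lists hold activations for the same examples in the same order; the example values are made up, and this is not necessarily how `report.best_correlation` is computed.

```python
# Hypothetical sketch: Pearson correlation of two features' activations
# over the SAME inputs, in the same order.
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either feature never varies."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Made-up activations for one Ultra and one Full feature on five shared inputs.
ultra_acts = [0.0, 1.2, 0.0, 3.1, 0.4]
full_acts = [0.1, 1.0, 0.0, 2.8, 0.5]
print(f"{pearson(ultra_acts, full_acts):.3f}")
```

Returning 0.0 for a constant (for example, never-firing) feature is a design choice in this sketch; the correlation is mathematically undefined in that case.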
See Also
- mechinterp-cluster-mapper: Analyze groups of related features
- mechinterp-state: Track cross-model research
- mechinterp-runner: Validate matches with experiments