Agent skill

openplanter

Investigate datasets and resolve entities using OpenPlanter methodology — cross-reference heterogeneous sources, build Admiralty/ACH confidence-tiered evidence chains, and apply OSINT tradecraft. Triggers on entity resolution, cross-reference datasets, evidence chain, OSINT investigation, structured investigation.

Stars 163
Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/other/other/openplanter

SKILL.md

OpenPlanter — Investigation Methodology Skill

Epistemic framework for cross-dataset investigation, entity resolution, and evidence-backed analysis. Extracted from the OpenPlanter recursive investigation agent and enriched with professional OSINT tradecraft (Admiralty System, ACH, FollowTheMoney schema, intelligence cycle methodology).

Claude Code already has the tools. This skill provides the methodology.

When to Use

  • Cross-referencing heterogeneous datasets (corporate registries, campaign finance, lobbying, property records, contracts)
  • Entity resolution across datasets with inconsistent naming
  • Building evidence chains with provenance and confidence tiers
  • Structured OSINT investigations requiring epistemic discipline
  • Any analysis where claims need to trace to cited source records

Quick Start

bash
# 1. Initialize workspace
python3 ~/.claude/skills/openplanter/scripts/init_workspace.py /path/to/investigation

# 2. Drop datasets into datasets/
cp campaign_finance.csv lobbying.json corporate_registry.csv /path/to/investigation/datasets/

# 3. Write an investigation plan
# → plans/plan.md (see references/output-templates.md for plan template)

# 4. Resolve entities across datasets
python3 ~/.claude/skills/openplanter/scripts/entity_resolver.py /path/to/investigation

# 5. Cross-reference linked entities
python3 ~/.claude/skills/openplanter/scripts/cross_reference.py /path/to/investigation

# 6. Validate evidence chains
python3 ~/.claude/skills/openplanter/scripts/evidence_chain.py /path/to/investigation

# 7. Score confidence
python3 ~/.claude/skills/openplanter/scripts/confidence_scorer.py /path/to/investigation

Investigation Methodology

Epistemic Discipline

Assume nothing about the environment until confirmed firsthand. These principles prevent the most common investigation failures:

  1. Ground truth comes from files, not memory. Read actual data before modifying it, and read actual error messages before diagnosing. Model memory of data structure is unreliable—reading the file takes seconds, recovering from a wrong assumption takes minutes.
  2. Empty output is ambiguous. If a command returns empty, cross-check with ls -la and wc -c before concluding a file is actually empty, because output capture mechanisms can silently lose data.
  3. Success does not mean correctness. A command that "succeeds" may have done nothing. Check actual outcomes, not just exit codes. After downloading, verify with ls and wc -c. After extraction, verify expected files exist.
  4. Verify round-trip correctness. After any data transformation (parsing, linking, aggregation), check the result from the consumer's perspective—load the output, spot-check records, verify row counts. Transformations that silently drop records are the most common source of wrong conclusions.
  5. Three failures = wrong approach. If a command fails 3 times, change strategy entirely. Repeating an identical command expecting different results wastes context window.
  6. Produce artifacts early. Write a working first draft of findings as soon as the requirements are clear, then iterate. An imperfect deliverable beats a perfect analysis with no output. If 3+ steps have passed without writing any output, stop and write—even if incomplete.

For the full epistemic framework (including data ingestion rules and hard rules from OpenPlanter's prompts.py), see references/investigation-methodology.md.

Entity Resolution Protocol

Pipeline: Normalize → Block → Compare → Score → Cluster → Review

Adapted from OpenPlanter's prompts.py, enriched with Middesk, OpenSanctions, and ICIJ patterns.

  1. Normalize — Apply canonical key transformation: Unicode NFKD + diacritic stripping, case folding, legal suffix canonicalization (LLC/Inc/Corp/Ltd → canonical forms), punctuation removal, ampersand normalization (&and). See references/entity-resolution-patterns.md for complete normalization tables and suffix maps.

  2. Block — Reduce O(N^2) comparisons using blocking keys: first 3 characters of normalized name + state/jurisdiction, phonetic key (Soundex/Double Metaphone), token overlap via inverted index.

  3. Compare — Pairwise similarity with cascading checks: real_quick_ratio()quick_ratio()ratio(). Use autojunk=False for entity names because strings under 200 characters produce false negatives with junk heuristics. Include token set comparison for word-order variants ("Apple Inc" vs "Inc Apple").

  4. Score — Multi-signal weighted model:

    • Hard signals: TIN/EIN exact match (1.0), identical phone E.164 (0.8), identical email (0.8)
    • Soft signals: name similarity (0.5), address fuzzy (0.2), state match (0.1)
    • Hard disqualifiers: TIN mismatch when both present (-0.5), country mismatch (-0.5)
  5. Cluster — Group via transitive closure using Union-Find. Gate closure so all pairwise scores exceed threshold—this prevents chain errors where A≈B and B≈C but A≉C. Exclude registered agent addresses from triggering transitive closure alone, because thousands of entities share the same registered agent.

  6. Review — Flag by confidence band:

    • Score >= 0.85: auto-match (confirmed)
    • Score 0.70-0.84: queue for review (probable)
    • Score 0.55-0.69: include in wide net (possible)
    • Score < 0.55: discard

Evidence Chain Construction

Every claim traces to a specific record in a specific dataset. This is what separates investigation from speculation.

Claim
  └── Evidence Item
        ├── type: document | record | image | testimony
        ├── source_ref → Source
        │     ├── url / file path
        │     ├── collection_timestamp
        │     ├── source_type: primary | secondary | tertiary | official | unofficial
        │     └── reliability_grade: A–F (Admiralty)
        ├── credibility_grade: 1–6 (Admiralty)
        ├── corroboration_status: single | corroborated | contradicted | unresolvable
        └── match_details (if cross-reference)
              ├── fields_matched: [name, address, ein]
              ├── match_type: exact | fuzzy | address-based
              └── link_strength: weakest criterion in the chain

Key principles:

  • Distinguish direct evidence (A appears in record X), circumstantial evidence (A's address matches B's address), and absence of evidence (no disclosure found)
  • Document every hop in a multi-step chain with source record, linking field, and match quality
  • Link strength = weakest criterion in the chain (a chain is only as strong as its weakest link)
  • Track source lineage to detect circular reporting—Source B citing Source A is not independent corroboration

Confidence Tiers

Based on the Admiralty System (NATO AJP-2.1), adapted for dataset investigation:

Tier Criteria Required Evidence Admiralty Equivalent
Confirmed 2+ independent sources with different collection paths; hard signal match (EIN, phone); or official record with verifiable provenance Independent corroboration required A1–B1
Probable Strong single source (official record); high fuzzy match (>0.85) on name + address + state; consistent with known patterns Single strong source acceptable B2–C2
Possible Circumstantial evidence only; moderate fuzzy match (0.55-0.84); consistent but not yet corroborated; requires additional investigation Hypothesis supported but not confirmed C3–D3
Unresolved Contradictory evidence; insufficient data; single weak source; or unable to verify Cannot determine with available evidence D4–F6

Sherman Kent probability mapping:

  • Confirmed ≈ "almost certain" (93-99%)
  • Probable ≈ "likely" (75-85%)
  • Possible ≈ "chances about even" (45-55%)
  • Unresolved ≈ insufficient basis for estimate

Verification Principle

Implementation and verification must be uncorrelated. An agent that performs an analysis introduces systematic bias when self-verifying—it "knows" what the answer should be and unconsciously confirms it. Use the implement-then-verify pattern:

Step 1: Perform entity resolution and cross-referencing (analysis agent)
Step 2: Read the result files
Step 3: Independent verification (separate agent or separate pass with no shared context from step 1):
        - Load output files fresh
        - Spot-check N random records against raw source data
        - Verify row counts match expectations
        - Run validation script
        - Report raw output only

The verification executor has no context from the analysis executor. It runs commands and reports output, making its evidence independent.

Anti-bias checks:

  • Confirmation bias: Score hypotheses by inconsistency count (ACH), not confirmation count, because disconfirming evidence is more diagnostic
  • Anchoring: Do not rank hypotheses until evidence collection is complete
  • Circular reporting: Track source lineage; verify independence of collection paths before counting corroborations
  • Satisficing: Require minimum 3 competing hypotheses before scoring—this prevents premature commitment to the first plausible explanation

Analysis Output Standards

Include in all investigation deliverables:

  1. Methodology section: Sources used, entity resolution approach, linking logic, known limitations
  2. Confidence breakdown: Count of findings per tier (confirmed/probable/possible/unresolved)
  3. Evidence appendix: Every hop, every source record cited, every match score
  4. Structured output: JSON for machine-readable (findings/), Markdown for human-readable (findings/)
  5. Provenance: For each dataset—source URL/path, access timestamp, transformations applied

See references/output-templates.md for ready-to-use templates (investigation plans, summaries, evidence chains).

Integration Modes

The skill operates in three modes based on investigation complexity:

Mode When Scripts
Methodology Only Simple tasks, 1-2 datasets, local analysis entity_resolver.py, cross_reference.py, evidence_chain.py, confidence_scorer.py
Web-Enriched Need external data, public records, entity enrichment Above + dataset_fetcher.py, web_enrich.py, scrape_records.py, + 6 specialized fetchers
Full RLM Delegation Complex multi-step investigations, 3+ datasets, 20+ reasoning steps delegate_to_rlm.py → full OpenPlanter agent (provider-agnostic, session-resumable)

One-command pipeline for any mode: investigate.py /path/to/workspace --phases all

RLM Delegation — Provider-Agnostic

The RLM agent auto-detects the LLM provider from the model name. Works with any provider the agent supports:

bash
# Anthropic (default)
python3 scripts/delegate_to_rlm.py --objective "..." --workspace DIR --model claude-sonnet-4-5-20250929

# OpenAI
python3 scripts/delegate_to_rlm.py --objective "..." --workspace DIR --model gpt-4o

# OpenRouter (any model via slash routing)
python3 scripts/delegate_to_rlm.py --objective "..." --workspace DIR --model anthropic/claude-sonnet-4-5

# Ollama (local inference, air-gapped)
python3 scripts/delegate_to_rlm.py --objective "..." --workspace DIR --provider ollama --model llama3

# Cerebras (model name doesn't contain "cerebras", so specify --provider)
python3 scripts/delegate_to_rlm.py --objective "..." --workspace DIR --model qwen-3-235b-a22b-instruct-2507 --provider cerebras

# Resume a previous investigation session
python3 scripts/delegate_to_rlm.py --resume abc123 --workspace DIR

# List saved sessions
python3 scripts/delegate_to_rlm.py --list-sessions --workspace DIR

# Control reasoning depth
python3 scripts/delegate_to_rlm.py --objective "..." --workspace DIR --reasoning-effort high

# List available models for a provider
python3 scripts/delegate_to_rlm.py --list-models --provider ollama

Provider auto-detection: claude-* → anthropic, gpt-*/o1-*/o3-* → openai, org/model → openrouter, *cerebras* → cerebras, llama*/qwen*/mistral*/gemma* → ollama. For models without a recognizable prefix, pass --provider explicitly.

API keys pass through environment variables: ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENROUTER_API_KEY, CEREBRAS_API_KEY (or OPENPLANTER_-prefixed variants). Ollama requires no API key. Set OPENPLANTER_REPO to override local clone discovery.

Scripts Reference

All scripts use Python stdlib only. Zero external dependencies. External tools (Exa, Firecrawl, OpenPlanter agent) are invoked as subprocesses.

Core Analysis

Script Purpose Example
init_workspace.py Create investigation workspace structure python3 scripts/init_workspace.py /tmp/investigation
entity_resolver.py Fuzzy entity matching + canonical map python3 scripts/entity_resolver.py /tmp/investigation --threshold 0.85
cross_reference.py Cross-dataset record linking python3 scripts/cross_reference.py /tmp/investigation
evidence_chain.py Validate evidence chain structure python3 scripts/evidence_chain.py /tmp/investigation
confidence_scorer.py Score findings by confidence tier python3 scripts/confidence_scorer.py /tmp/investigation

Data Collection & Enrichment

Script Purpose Example
dataset_fetcher.py Download bulk public datasets (SEC, FEC, OFAC, LDA, OpenSanctions) python3 scripts/dataset_fetcher.py /tmp/investigation --sources sec,fec
web_enrich.py Enrich entities via Exa neural search python3 scripts/web_enrich.py /tmp/investigation --categories company,news
scrape_records.py Fetch entity records from government APIs python3 scripts/scrape_records.py /tmp/investigation --entities "Acme Corp" --sources sec,fec

Specialized Data Fetchers

Individual scripts for targeted government and public data sources. All use Python stdlib only, produce JSON + provenance sidecar, and support --dry-run and --list.

Script Data Source Auth Key Linking Fields
fetch_census.py US Census Bureau ACS 5-Year Optional CENSUS_API_KEY Geography (state, county, ZIP)
fetch_epa.py EPA ECHO Facility Search None registry_id, lat/lon, SIC/NAICS
fetch_icij.py ICIJ Offshore Leaks Database None icij_id, entity name, jurisdiction
fetch_osha.py OSHA DOL Enforcement None activity_nr, estab_name, SIC
fetch_propublica990.py ProPublica Nonprofit Explorer (IRS 990) None ein, org name, NTEE code
fetch_sam.py SAM.gov Entity Registration SAM_GOV_API_KEY UEI, CAGE code, NAICS

Usage pattern (all fetchers follow the same interface):

bash
python3 scripts/fetch_sam.py /tmp/investigation --query "Raytheon" --state CT
python3 scripts/fetch_epa.py /tmp/investigation --state TX --query "Refinery"
python3 scripts/fetch_icij.py /tmp/investigation --entity "Mossack" --type intermediary
python3 scripts/fetch_propublica990.py /tmp/investigation --ein 237327340
python3 scripts/fetch_census.py /tmp/investigation --state 36 --county "*"
python3 scripts/fetch_osha.py /tmp/investigation --sic 2911 --state TX

Orchestration & Delegation

Script Purpose Example
investigate.py Run full pipeline end-to-end python3 scripts/investigate.py /tmp/investigation --phases all
delegate_to_rlm.py Spawn full OpenPlanter agent (session-resumable, provider-agnostic) python3 scripts/delegate_to_rlm.py --objective "..." --workspace DIR

Knowledge Graph

Script Purpose Example
wiki_graph_query.py Query OpenPlanter wiki knowledge graph (read-only) python3 scripts/wiki_graph_query.py /tmp/investigation --entity "Raytheon" --neighbors

Supports entity lookup, neighbor traversal, BFS path finding, full-text search, and graph statistics. Reads NetworkX node-link JSON graphs produced by OpenPlanter's wiki_graph.py during delegated investigations.

Skill Integration

OpenPlanter methodology composes with existing Claude Code skills:

Investigation Task Skill Integration Pattern
Web research for entity enrichment exa-search web_enrich.py calls exa_search.py as subprocess
Scrape JS-heavy public records portals Firecrawl firecrawl scrape URL --only-main-content
Structured government APIs Built-in scrape_records.py queries SEC, FEC, LDA, USAspending via urllib
Bulk dataset downloads Built-in dataset_fetcher.py fetches SEC, FEC, OFAC, OpenSanctions, LDA
Defense contractor lookup Built-in fetch_sam.py queries SAM.gov by name/UEI/CAGE/NAICS
Environmental compliance Built-in fetch_epa.py queries EPA ECHO for facilities + violations
Nonprofit/dark money flows Built-in fetch_propublica990.py queries IRS 990 data via ProPublica
Offshore entity chains Built-in fetch_icij.py queries Panama/Paradise/Pandora Papers
Workplace safety records Built-in fetch_osha.py queries DOL enforcement data
Demographics/economic context Built-in fetch_census.py queries Census ACS 5-Year estimates
Knowledge graph query Built-in wiki_graph_query.py reads OpenPlanter wiki graphs
Local RAG over large document corpora rlama Create collection from datasets/, query semantically
Parallel investigation threads minoan-swarm Elat Research Swarm with domain-split investigators
Academic/legal research academic-research Case law, regulatory filings, citations
Twitter/social media OSINT twitter x-search for entity mentions, bird for profile data
Daimonic timeline curation worldwarwatcher-update Mazkir ha-Milḥamat entity resolution for non-military domains

US Public Records Datasets

Key datasets and their linking keys for cross-reference investigations:

Dataset Access Linking Key Script Format
FEC Campaign Finance api.open.fec.gov + bulk CSV committee_id, contributor name dataset_fetcher.py CSV, JSON API
Senate LDA Lobbying lda.senate.gov/api Registrant name, Client name dataset_fetcher.py JSON API, XML
SEC EDGAR data.sec.gov + EFTS search CIK (Central Index Key) dataset_fetcher.py JSON, XBRL
SAM.gov Entity Registration api.sam.gov UEI, CAGE code, NAICS fetch_sam.py JSON API
EPA ECHO Facilities echodata.epa.gov FRS Registry ID, lat/lon fetch_epa.py JSON API
ProPublica 990 (IRS) projects.propublica.org EIN, org name fetch_propublica990.py JSON API
ICIJ Offshore Leaks offshoreleaks.icij.org Node ID, entity name, jurisdiction fetch_icij.py JSON API
OSHA Inspections enforcedata.dol.gov Activity number, SIC code fetch_osha.py JSON API
US Census ACS api.census.gov State/county/ZIP FIPS fetch_census.py JSON API
OFAC Sanctions treasury.gov/ofac (or OpenSanctions) Name + aliases, identifiers dataset_fetcher.py CSV, XML
State Corporate Registries Per-state (or OpenCorporates API) State registration number Varies
Property Records County-level (or ATTOM/CoreLogic) Parcel ID (APN), owner name CSV, shapefile

Cross-dataset linking challenge: No universal corporate ID exists in US public records. The standard approach: normalize names, fuzzy match, filter by jurisdiction/address, then anchor on known IDs (CIK, committee_id, EIN, UEI, CAGE, FRS ID) when available.

Multi-Agent Investigation

For complex investigations requiring parallel workstreams, use minoan-swarm. Separate the verifier agent from analysis agents to maintain uncorrelated verification—the verifier receives only output files and verification criteria, with no shared context from the analysis phase.

See references/investigation-methodology.md for the full swarm role template (keret, kothar, resheph, anat, shapash).

Structured Analytical Techniques

For complex scenarios with multiple possible explanations, apply Analysis of Competing Hypotheses (ACH) and Key Assumptions Check. These techniques are detailed in references/investigation-methodology.md.

ACH summary: Build a matrix of hypotheses vs. evidence. Score by inconsistency count (fewest I markers wins), not confirmation count. This counteracts confirmation bias because disconfirming evidence is more diagnostic than supporting evidence. Identify linchpin evidence whose reclassification would change the conclusion.

Deep References

  • references/investigation-methodology.md — Full epistemic framework, ACH procedure, Key Assumptions Check, multi-agent swarm template
  • references/entity-resolution-patterns.md — Complete normalization tables, suffix maps, address canonicalization
  • references/output-templates.md — JSON/Markdown templates for investigation plans, summaries, and evidence chains
  • references/public-records-apis.md — API endpoints, auth, rate limits, linking keys for SEC, FEC, LDA, OFAC, USAspending, Census, EPA, ICIJ, OSHA, ProPublica 990, SAM.gov

OpenPlanter Tool to Claude Code Mapping

OpenPlanter Tool Claude Code Equivalent
list_files Glob, Bash(ls)
read_file Read
write_file Write
search_files Grep
edit_file Edit
apply_patch Edit
run_shell Bash
web_search exa-search skill
fetch_url Firecrawl skill / WebFetch
subtask Task tool (minoan-swarm)
execute Task tool (haiku model)
think Native reasoning
wiki_graph wiki_graph_query.py (read-only)
fetch_sam fetch_sam.py
fetch_epa fetch_epa.py
fetch_icij fetch_icij.py
fetch_osha fetch_osha.py
fetch_990 fetch_propublica990.py
fetch_census fetch_census.py
resume_session delegate_to_rlm.py --resume

No capability gap. The methodology is what matters, not the tooling.

Didn't find tool you were looking for?

Be as detailed as possible for better results