Agent skill

named-entity-extractor

Extract named entities (people, organizations, locations, dates) from text using NLP. Use for document analysis, information extraction, or data enrichment.

Install this agent skill in your project:

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/named-entity-extractor

SKILL.md

Named Entity Extractor

Extract named entities from text including people, organizations, locations, dates, and more.

Features

  • Entity Types: People, organizations, locations, dates, money, percentages
  • Multiple Modes: spaCy for accuracy, regex for speed
  • Batch Processing: Process every document in a folder
  • Entity Linking: Group repeated mentions of the same entity across the text
  • Export: JSON, CSV output formats
  • Visualization: Entity highlighting

Quick Start

python
from entity_extractor import EntityExtractor

extractor = EntityExtractor()

text = "Apple Inc. was founded by Steve Jobs in Cupertino, California in 1976."

entities = extractor.extract(text)
for entity in entities:
    print(f"{entity['text']}: {entity['type']}")

# Output:
# Apple Inc.: ORG
# Steve Jobs: PERSON
# Cupertino: GPE
# California: GPE
# 1976: DATE
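
For orientation, the dictionaries printed above resemble what you get by flattening the entity spans of a spaCy Doc. The sketch below is an assumption about how such a mapping could look, not the skill's actual implementation: ents_to_dicts and load_pipeline are hypothetical names, while text, label_, start_char, and end_char are standard attributes of spaCy spans. A stand-in object replaces the real pipeline so the snippet runs without spaCy installed.

```python
from types import SimpleNamespace


def ents_to_dicts(doc) -> list:
    """Flatten the entity spans of a spaCy-style Doc into plain dicts."""
    return [
        {
            "text": ent.text,
            "type": ent.label_,
            "start": ent.start_char,  # character offset where the span begins
            "end": ent.end_char,      # character offset just past the span
        }
        for ent in doc.ents
    ]


def load_pipeline(model: str = "en_core_web_sm"):
    """Load a real spaCy pipeline (needs spaCy and the model installed)."""
    import spacy  # imported lazily so ents_to_dicts works without spaCy

    return spacy.load(model)


# Stand-in for nlp(text): any object exposing .ents the way a spaCy Doc does.
fake_doc = SimpleNamespace(
    ents=[
        SimpleNamespace(text="Steve Jobs", label_="PERSON", start_char=26, end_char=36),
    ]
)
demo = ents_to_dicts(fake_doc)
print(demo)
```

With spaCy installed, ents_to_dicts(load_pipeline()(text)) would run the same mapping over real model output.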

CLI Usage

bash
# Extract from text
python entity_extractor.py --text "Steve Jobs founded Apple in California."

# Extract from file
python entity_extractor.py --input document.txt

# Batch process folder
python entity_extractor.py --input ./documents/ --output entities.csv

# Filter by entity type
python entity_extractor.py --input document.txt --types PERSON,ORG

# Use regex mode (faster, less accurate)
python entity_extractor.py --input document.txt --mode regex

# JSON output
python entity_extractor.py --input document.txt --json

API Reference

EntityExtractor Class

python
class EntityExtractor:
    def __init__(self, mode: str = "spacy", model: str = "en_core_web_sm") -> None: ...

    # Extraction
    def extract(self, text: str) -> list: ...
    def extract_file(self, filepath: str) -> list: ...
    def extract_batch(self, folder: str) -> dict: ...

    # Filtering
    def filter_entities(self, entities: list, types: list) -> list: ...
    def get_unique_entities(self, entities: list) -> list: ...
    def group_by_type(self, entities: list) -> dict: ...

    # Analysis
    def entity_frequency(self, text: str) -> dict: ...
    def find_relationships(self, text: str) -> list: ...

    # Export
    def to_csv(self, entities: list, output: str) -> str: ...
    def to_json(self, entities: list, output: str) -> str: ...
    def highlight_text(self, text: str) -> str: ...

Entity Types

Standard Entity Types (spaCy)

| Type | Description | Example |
|------|-------------|---------|
| PERSON | People, including fictional | "Steve Jobs" |
| ORG | Companies, agencies, institutions | "Apple Inc." |
| GPE | Countries, cities, states | "California" |
| LOC | Non-GPE locations, mountain ranges, bodies of water | "Pacific Ocean" |
| DATE | Dates, periods | "January 2024" |
| TIME | Times | "3:30 PM" |
| MONEY | Monetary values | "$1.5 million" |
| PERCENT | Percentages | "20%" |
| PRODUCT | Products | "iPhone" |
| EVENT | Events | "World Cup" |
| WORK_OF_ART | Books, songs, etc. | "The Great Gatsby" |
| LAW | Laws, regulations | "GDPR" |
| LANGUAGE | Languages | "English" |
| NORP | Nationalities, religious or political groups | "American" |

Regex Mode Entities

Faster extraction with regex patterns:

| Type | Description |
|------|-------------|
| EMAIL | Email addresses |
| PHONE | Phone numbers |
| URL | Web URLs |
| DATE | Common date formats |
| MONEY | Currency amounts |
| PERCENTAGE | Percentages |
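
To make the idea of regex mode concrete, here is a minimal sketch covering four of the types listed above. The patterns are illustrative assumptions, not the ones the skill ships, and they are deliberately simple: real email, URL, and phone formats vary far more. PATTERNS and extract_with_regex are hypothetical names.

```python
import re

# Illustrative patterns only; assumed, not taken from the skill's source.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://[^\s<>\"']+"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b"),
    "PERCENTAGE": re.compile(r"\b\d+(?:\.\d+)?%"),
}


def extract_with_regex(text: str) -> list:
    """Run every pattern over the text and return matches in reading order."""
    entities = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append({
                "text": match.group(),
                "type": entity_type,
                "start": match.start(),
                "end": match.end(),
            })
    return sorted(entities, key=lambda entity: entity["start"])


sample = "Contact John Smith at john.smith@example.com or call (555) 123-4567."
found = extract_with_regex(sample)
print([(entity["type"], entity["text"]) for entity in found])
```

Note that "John Smith" is not found: a pattern-based approach has no notion of person names, which is the accuracy trade-off of this mode.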

Output Format

Entity Result

python
{
    "text": "Steve Jobs",
    "type": "PERSON",
    "start": 10,
    "end": 20,
    "confidence": 0.95
}
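
In the example above, end minus start equals the length of "Steve Jobs", which suggests that the offsets are character positions with an exclusive end, as in Python slicing. Assuming that holds, a small sanity check like the following can catch offset drift after text preprocessing; span_matches is a hypothetical helper.

```python
def span_matches(text: str, entity: dict) -> bool:
    """Return True when the offsets slice out exactly the entity text."""
    return text[entity["start"]:entity["end"]] == entity["text"]


source = "Founded by Steve Jobs in 1976."
start = source.index("Steve Jobs")
entity = {
    "text": "Steve Jobs",
    "type": "PERSON",
    "start": start,
    "end": start + len("Steve Jobs"),
}
# Same entity with offsets shifted by one character, as a negative case.
shifted = {**entity, "start": entity["start"] + 1, "end": entity["end"] + 1}

print(span_matches(source, entity))   # True
print(span_matches(source, shifted))  # False
```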

Full Extraction Result

python
{
    "text": "Original text...",
    "entities": [
        {"text": "Steve Jobs", "type": "PERSON", "start": 10, "end": 20},
        {"text": "Apple Inc.", "type": "ORG", "start": 30, "end": 40}
    ],
    "summary": {
        "total_entities": 5,
        "unique_entities": 4,
        "by_type": {
            "PERSON": 2,
            "ORG": 1,
            "GPE": 2
        }
    }
}
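
The summary block can be derived from the entity list alone. The sketch below shows one way to do it, assuming that two entities count as the same when both text and type match; the skill may define uniqueness differently. summarize is a hypothetical helper.

```python
from collections import Counter


def summarize(entities: list) -> dict:
    """Build a summary dict shaped like the one in the extraction result."""
    unique = {(entity["text"], entity["type"]) for entity in entities}
    return {
        "total_entities": len(entities),
        "unique_entities": len(unique),
        "by_type": dict(Counter(entity["type"] for entity in entities)),
    }


entities = [
    {"text": "Steve Jobs", "type": "PERSON"},
    {"text": "Tim Cook", "type": "PERSON"},
    {"text": "Apple Inc.", "type": "ORG"},
    {"text": "Cupertino", "type": "GPE"},
    {"text": "Cupertino", "type": "GPE"},  # repeated mention
]
summary = summarize(entities)
print(summary)
```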

Filtering and Grouping

Filter by Type

python
entities = extractor.extract(text)

# Get only people and organizations
filtered = extractor.filter_entities(entities, ["PERSON", "ORG"])
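
Conceptually, type filtering is a membership test on each entity's type field. A minimal stand-alone equivalent, offered as an assumption about the behavior rather than the skill's code (filter_by_type is a hypothetical name):

```python
def filter_by_type(entities: list, types: list) -> list:
    """Keep only entities whose type is in the requested list."""
    wanted = set(types)  # a set keeps each membership check constant-time
    return [entity for entity in entities if entity["type"] in wanted]


entities = [
    {"text": "Steve Jobs", "type": "PERSON"},
    {"text": "Apple Inc.", "type": "ORG"},
    {"text": "Cupertino", "type": "GPE"},
]
filtered = filter_by_type(entities, ["PERSON", "ORG"])
print([entity["text"] for entity in filtered])
```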

Get Unique Entities

python
# Remove duplicates, keep first occurrence
unique = extractor.get_unique_entities(entities)
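
"Keep first occurrence" can be sketched as an order-preserving deduplication. The key used here, the pair of text and type, is an assumption; the skill might also normalize case or whitespace. dedupe_entities is a hypothetical name.

```python
def dedupe_entities(entities: list) -> list:
    """Drop repeated (text, type) pairs, keeping the first occurrence."""
    seen = set()
    unique = []
    for entity in entities:
        key = (entity["text"], entity["type"])
        if key not in seen:
            seen.add(key)
            unique.append(entity)
    return unique


entities = [
    {"text": "Apple", "type": "ORG", "start": 0, "end": 5},
    {"text": "Steve Jobs", "type": "PERSON", "start": 20, "end": 30},
    {"text": "Apple", "type": "ORG", "start": 45, "end": 50},
]
unique = dedupe_entities(entities)
print([(entity["text"], entity["start"]) for entity in unique])
```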

Group by Type

python
grouped = extractor.group_by_type(entities)

# Returns:
{
    "PERSON": ["Steve Jobs", "Tim Cook"],
    "ORG": ["Apple Inc."],
    "GPE": ["California", "Cupertino"]
}
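
A grouping like the one above can be built in a single pass. This sketch assumes that each text appears once per type and that order of first appearance is kept; the ordering the skill actually returns is not documented here. group_texts_by_type is a hypothetical name.

```python
from collections import defaultdict


def group_texts_by_type(entities: list) -> dict:
    """Map each entity type to the distinct texts seen for it."""
    grouped = defaultdict(list)
    for entity in entities:
        texts = grouped[entity["type"]]
        if entity["text"] not in texts:  # skip repeated mentions
            texts.append(entity["text"])
    return dict(grouped)


entities = [
    {"text": "Steve Jobs", "type": "PERSON"},
    {"text": "Apple Inc.", "type": "ORG"},
    {"text": "Tim Cook", "type": "PERSON"},
    {"text": "Steve Jobs", "type": "PERSON"},
]
grouped = group_texts_by_type(entities)
print(grouped)
```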

Entity Frequency

python
frequency = extractor.entity_frequency(text)

# Returns:
{
    "Steve Jobs": {"count": 5, "type": "PERSON"},
    "Apple": {"count": 8, "type": "ORG"},
    "California": {"count": 2, "type": "GPE"}
}
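
If you already hold an entity list, the same frequency shape can be computed without re-running extraction. The helpers below are hypothetical and work on entities rather than raw text; note the stated limitation that a text tagged with two different types keeps only one entry.

```python
from collections import Counter


def frequency_from_entities(entities: list) -> dict:
    """Count mentions per entity text, recording the type alongside."""
    counts = Counter((entity["text"], entity["type"]) for entity in entities)
    # Limitation: if one text carries two types, the later pair overwrites.
    return {
        text: {"count": count, "type": entity_type}
        for (text, entity_type), count in counts.items()
    }


def top_entities(frequency: dict, n: int = 10) -> list:
    """Return the n most frequent entities as (text, info) pairs."""
    ranked = sorted(frequency.items(), key=lambda item: item[1]["count"], reverse=True)
    return ranked[:n]


entities = (
    [{"text": "Apple", "type": "ORG"}] * 3
    + [{"text": "Steve Jobs", "type": "PERSON"}] * 2
    + [{"text": "California", "type": "GPE"}]
)
frequency = frequency_from_entities(entities)
print(top_entities(frequency, n=2))
```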

Batch Processing

Process Folder

python
results = extractor.extract_batch("./documents/")

# Returns:
{
    "doc1.txt": {
        "entities": [...],
        "summary": {...}
    },
    "doc2.txt": {
        "entities": [...],
        "summary": {...}
    }
}
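
The folder-level result is keyed by file name. A rough stand-alone equivalent is sketched below, assuming only .txt files are read and that they are UTF-8 encoded; neither detail is confirmed by the skill. extract_folder and the toy find_years extractor are hypothetical.

```python
import tempfile
from pathlib import Path


def extract_folder(folder, extract, pattern: str = "*.txt") -> dict:
    """Apply an extract(text) callable to each matching file in a folder."""
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        entities = extract(path.read_text(encoding="utf-8"))
        results[path.name] = {
            "entities": entities,
            "summary": {"total_entities": len(entities)},
        }
    return results


def find_years(text: str) -> list:
    """Toy extractor for the demo: tags 4-digit tokens as DATE."""
    return [
        {"text": token, "type": "DATE"}
        for token in text.split()
        if token.isdigit() and len(token) == 4
    ]


with tempfile.TemporaryDirectory() as folder:
    Path(folder, "doc1.txt").write_text("Founded in 1976", encoding="utf-8")
    Path(folder, "doc2.txt").write_text("No dates here", encoding="utf-8")
    Path(folder, "notes.md").write_text("Ignored 1999", encoding="utf-8")
    results = extract_folder(folder, find_years)

print(results)
```

Passing the extractor in as a callable keeps the folder walk independent of whether spaCy or regex mode does the work.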

Export to CSV

python
extractor.to_csv(results, "entities.csv")

# Creates CSV with columns:
# filename, entity_text, entity_type, start, end
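
The column layout above maps naturally onto csv.DictWriter. The following sketch flattens a batch result into rows; it assumes the result shape shown under Process Folder and writes to any text stream, whereas the skill's to_csv takes an output path. write_entities_csv is a hypothetical name.

```python
import csv
import io

COLUMNS = ["filename", "entity_text", "entity_type", "start", "end"]


def write_entities_csv(results: dict, stream) -> int:
    """Write one CSV row per entity and return the number of rows written."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    rows = 0
    for filename, result in results.items():
        for entity in result["entities"]:
            writer.writerow({
                "filename": filename,
                "entity_text": entity["text"],
                "entity_type": entity["type"],
                "start": entity["start"],
                "end": entity["end"],
            })
            rows += 1
    return rows


results = {
    "doc1.txt": {"entities": [{"text": "Steve Jobs", "type": "PERSON", "start": 10, "end": 20}]},
    "doc2.txt": {"entities": []},
}
buffer = io.StringIO()
row_count = write_entities_csv(results, buffer)
lines = buffer.getvalue().splitlines()
print(lines)
```

When writing to a real file, open it with newline="" as the csv module documentation recommends.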

Text Highlighting

Generate HTML with highlighted entities:

python
html = extractor.highlight_text(text)

# Returns HTML with colored spans for each entity type
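
One way to produce such markup is to walk the entities in offset order and escape everything in between. The sketch below is an assumption about the approach, not the skill's output format: the colors, the class name, and the highlight function (which takes the entity list explicitly) are all hypothetical.

```python
import html

# Hypothetical palette; unknown types fall back to light gray.
COLORS = {"PERSON": "#ffd6d6", "ORG": "#d6e4ff", "GPE": "#d6ffd9"}


def highlight(text: str, entities: list) -> str:
    """Wrap each entity span in a colored <span>, escaping all other text."""
    parts = []
    cursor = 0
    for entity in sorted(entities, key=lambda item: item["start"]):
        if entity["start"] < cursor:
            continue  # skip spans that overlap one already emitted
        parts.append(html.escape(text[cursor:entity["start"]]))
        color = COLORS.get(entity["type"], "#eeeeee")
        label = html.escape(entity["type"])
        inner = html.escape(text[entity["start"]:entity["end"]])
        parts.append(
            f'<span class="entity" style="background:{color}" title="{label}">{inner}</span>'
        )
        cursor = entity["end"]
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


text = "Jobs & Wozniak founded Apple."
start = text.index("Apple")
markup = highlight(text, [{"text": "Apple", "type": "ORG", "start": start, "end": start + 5}])
print(markup)
```

Escaping matters here because source text may contain characters such as & or < that would otherwise break the HTML.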

Example Workflows

Document Analysis

python
extractor = EntityExtractor()

# Analyze a document (the context manager closes the file after reading)
with open("article.txt", encoding="utf-8") as f:
    text = f.read()
result = extractor.extract(text)

# Get key people mentioned
people = extractor.filter_entities(result, ["PERSON"])
print(f"People mentioned: {len(people)}")

# Get frequency
freq = extractor.entity_frequency(text)
top_entities = sorted(freq.items(), key=lambda x: x[1]["count"], reverse=True)[:10]

Contact Information Extraction

python
extractor = EntityExtractor(mode="regex")

text = """
Contact John Smith at john.smith@example.com
or call (555) 123-4567.
"""

entities = extractor.extract(text)
# Finds: EMAIL, PHONE entities

Content Tagging

python
extractor = EntityExtractor()

articles = ["article1.txt", "article2.txt", "article3.txt"]
tags = {}

for article in articles:
    entities = extractor.extract_file(article)
    tags[article] = extractor.get_unique_entities(entities)
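
For tagging use cases it is often handy to invert the mapping, so that each entity points to the articles that mention it. A minimal sketch, assuming tags maps each article to a list of unique entity dicts as in the loop above; build_tag_index is a hypothetical helper.

```python
from collections import defaultdict


def build_tag_index(tags: dict) -> dict:
    """Invert {article: [entity, ...]} into {entity text: [article, ...]}."""
    index = defaultdict(list)
    for article, entities in tags.items():
        for entity in entities:
            index[entity["text"]].append(article)
    return dict(index)


tags = {
    "article1.txt": [{"text": "Apple", "type": "ORG"}, {"text": "Tim Cook", "type": "PERSON"}],
    "article2.txt": [{"text": "Apple", "type": "ORG"}],
}
tag_index = build_tag_index(tags)
print(tag_index)
```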

Dependencies

  • spacy>=3.7.0
  • pandas>=2.0.0
  • en_core_web_sm (spaCy model)

Note: Run python -m spacy download en_core_web_sm to install the model.
