ioc-extraction
Extract, classify, deduplicate, and enrich IOCs from investigation artifacts; map to STIX 2.1 observables
Install this agent skill into your project:
npx add-skill https://github.com/jmagly/aiwg/tree/main/agentic/code/frameworks/forensics-complete/skills/ioc-extraction
Scans investigation artifacts — log files, memory analysis output, findings documents, and raw captures — to extract indicators of compromise. Classifies each indicator by type, deduplicates, and produces a STIX 2.1 observable bundle alongside a flat IOC list for import into SIEMs and threat intelligence platforms.
Triggers
Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):
- "IOCs" / "indicators" → Indicator of Compromise extraction
- "STIX" / "STIX 2.1" → structured threat intelligence output
- "pull indicators" → IOC extraction shorthand
Purpose
IOCs extracted during investigation have value beyond the current case: they feed detection rules, threat intelligence platforms, and network blocklists. Raw extraction without classification and deduplication produces noise. This skill applies consistent extraction patterns and maps output to STIX 2.1 so findings integrate with standard threat intelligence tooling.
Behavior
When triggered, this skill:
- Identify input sources:
  - Accept a directory path, file path, or glob pattern
  - Default to scanning all files under `.aiwg/forensics/` if no path is specified
  - Supported source types: plain text, Markdown, JSON, JSONL, CSV, raw log files
- Extract IP addresses:
  - IPv4: match `\b(?:\d{1,3}\.){3}\d{1,3}\b`, then validate that each octet is in 0-255
  - IPv6: match both full and compressed forms
  - Exclude RFC 1918 private ranges, loopback (127.0.0.0/8), link-local (169.254.0.0/16), and multicast (224.0.0.0/4) by default (configurable)
  - Exclude IP addresses that appear only in a trusted-infrastructure context (DNS and NTP servers listed in the baseline profile)
- Extract domain names and hostnames:
  - Match FQDNs: `\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}\b`
  - Exclude known-good domains listed in an allowlist (configurable)
  - Flag domains with high-entropy names (a possible DGA indicator) by calculating Shannon entropy per label
  - Flag domains under recently introduced TLDs or uncommon ccTLDs
- Extract file hashes:
  - MD5: 32 hex characters
  - SHA-1: 40 hex characters
  - SHA-256: 64 hex characters
  - Tag each hash with its type; flag any MD5 or SHA-1 hashes as weak-algorithm IOCs
- Extract URLs:
  - Match full URLs, including scheme, host, path, and query string
  - Defang for safe storage: in output, replace `http` with `hxxp` and `.` with `[.]`
  - Classify by scheme: http, https, ftp, smb, ldap
- Extract email addresses:
  - Match using a standard RFC 5321 address pattern
  - Flag addresses in suspicious domains or with high-entropy local parts
- Extract file paths and registry keys:
  - Unix absolute paths: `/[a-zA-Z0-9._/-]+`
  - Windows paths: `[A-Za-z]:\\[^\s"]+`
  - Windows registry keys: `HK(LM|CU|CR|U|CC)\\[^\s"]+`
- Classify and deduplicate:
  - Assign a STIX 2.1 observable type to each indicator:
    - IP: `ipv4-addr` or `ipv6-addr`
    - Domain: `domain-name`
    - URL: `url`
    - Hash: `file` with a `hashes` property
    - Email: `email-addr`
    - File path: `file`
    - Registry key: `windows-registry-key`
  - Deduplicate by value within each type
  - Record the source file and line number for each unique indicator
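- Produce STIX 2.1 bundle:
  - Emit each indicator as a top-level STIX Cyber-observable Object (SCO) in the bundle, per the STIX 2.1 specification
  - Assign deterministic identifiers: version 5 (SHA-1-based) UUIDs derived from each object's ID-contributing properties (for example `value`) in the STIX 2.1 SCO namespace, prefixed with the object type
  - Include `created` and `modified` timestamps on the `report` object (STIX 2.1 SCOs do not carry these properties)
  - Link the observables to a STIX `report` object, through its `object_refs`, that references the investigation ID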
- Write outputs:
  - Flat IOC list: `.aiwg/forensics/iocs/<investigation>-iocs.txt` (one indicator per line, with a type prefix)
  - STIX bundle: `.aiwg/forensics/iocs/<investigation>-stix.json`
  - Summary report: `.aiwg/forensics/iocs/<investigation>-ioc-summary.md`
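The following Python sketch illustrates the extraction, filtering and deduplication steps above. It is a minimal example under stated assumptions, not the skill's own implementation. The IPv4, FQDN and hash patterns and the default IP exclusions come from the steps above. The following are all hypothetical:

- the URL pattern
- the `Indicator` record
- the function names `extract_iocs`, `defang` and `flat_line`
- the `type:value` line format used for the flat list

Email, IPv6, path and registry extraction are left out for brevity.

```python
from __future__ import annotations

import ipaddress
import re
from dataclasses import dataclass

# IPv4, FQDN, and hash patterns follow the Behavior steps; the URL pattern is an assumption.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
FQDN_RE = re.compile(r"\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}\b")
URL_RE = re.compile(r"\b(?:https?|ftp|smb|ldap)://[^\s\"'<>]+")
# Word boundaries stop a 64-char SHA-256 from also matching as an MD5 or SHA-1.
HASH_RE = re.compile(r"\b(?:[0-9a-fA-F]{64}|[0-9a-fA-F]{40}|[0-9a-fA-F]{32})\b")
HASH_ALGOS = {32: "md5", 40: "sha1", 64: "sha256"}
WEAK_HASH_ALGOS = {"md5", "sha1"}

# Default exclusions: RFC 1918, loopback, link-local, multicast.
EXCLUDED_V4 = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "127.0.0.0/8", "169.254.0.0/16", "224.0.0.0/4",
)]


@dataclass(frozen=True)
class Indicator:
    stix_type: str                  # e.g. "ipv4-addr", "domain-name", "url", "file"
    value: str
    source: str                     # file where the indicator was first seen
    line: int                       # 1-based line of the first occurrence
    hash_algo: str | None = None    # set only for file hashes

    @property
    def weak_hash(self) -> bool:
        return self.hash_algo in WEAK_HASH_ALGOS


def defang(value: str) -> str:
    # http -> hxxp (so https -> hxxps), then . -> [.]
    return value.replace("http", "hxxp").replace(".", "[.]")


def extract_iocs(text: str, source: str,
                 domain_allowlist: frozenset[str] = frozenset()) -> list[Indicator]:
    found: dict[tuple[str, str], Indicator] = {}

    def add(stix_type: str, value: str, line: int, hash_algo: str | None = None) -> None:
        # Deduplicate by value within each type, keeping the first location seen.
        found.setdefault((stix_type, value), Indicator(stix_type, value, source, line, hash_algo))

    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in IPV4_RE.finditer(line):
            try:
                ip = ipaddress.IPv4Address(m.group())
            except ValueError:  # not a valid dotted quad, e.g. an octet above 255
                continue
            if not any(ip in net for net in EXCLUDED_V4):
                add("ipv4-addr", str(ip), lineno)
        for m in URL_RE.finditer(line):
            add("url", m.group().rstrip(".,;)"), lineno)  # drop trailing punctuation
        for m in FQDN_RE.finditer(line):
            domain = m.group().lower()
            if domain not in domain_allowlist:
                add("domain-name", domain, lineno)
        for m in HASH_RE.finditer(line):
            digest = m.group().lower()
            add("file", digest, lineno, HASH_ALGOS[len(digest)])
    return list(found.values())


def flat_line(ind: Indicator) -> str:
    # Hypothetical "type:value" line for the flat IOC list; only URLs are defanged.
    prefix = ind.hash_algo or ind.stix_type
    value = defang(ind.value) if ind.stix_type == "url" else ind.value
    return f"{prefix}:{value}"
```

The FQDN pattern, as written, also matches file names such as `payload.exe` or `webserver-01-linux.md`. Use the domain allowlist, or a check against known TLDs, to keep that noise out of the output.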
Usage Examples
Example 1 — Scan all forensics artifacts
extract iocs
Example 2 — Scan specific file
extract indicators from .aiwg/forensics/findings/webserver-01-linux.md
Example 3 — With custom allowlist
ioc analysis --allowlist /etc/forensics/trusted-domains.txt
Output Locations
- Flat IOC list: `.aiwg/forensics/iocs/<investigation>-iocs.txt`
- STIX 2.1 bundle: `.aiwg/forensics/iocs/<investigation>-stix.json`
- Summary: `.aiwg/forensics/iocs/<investigation>-ioc-summary.md`
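The following standard-library Python sketch illustrates the kind of content the STIX bundle file could hold. It assumes the deterministic-ID scheme from the STIX 2.1 specification. Under that scheme, an SCO identifier is a UUIDv5 computed in the SCO namespace `00abedb4-aa42-466c-9c01-fed23315a9b7` over the canonical JSON of the object's ID-contributing properties. Here `json.dumps(..., sort_keys=True)` stands in for full RFC 8785 canonicalization. The two are equivalent only for simple ASCII string properties like these.

In STIX 2.1, cyber-observables carry no `created` or `modified` properties, so the timestamps sit on the `report` object. The function names and the report `name` format are hypothetical. Only value-based observables and file hashes are handled.

```python
from __future__ import annotations

import json
import uuid
from datetime import datetime, timezone

# STIX 2.1 namespace for deterministic SCO identifiers.
SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")
# STIX hash-algorithm vocabulary names for the hash types this skill extracts.
STIX_HASH_NAMES = {"md5": "MD5", "sha1": "SHA-1", "sha256": "SHA-256"}


def stix_timestamp() -> str:
    # UTC timestamp with millisecond precision, e.g. 2024-01-01T00:00:00.000Z
    now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"


def sco_id(stix_type: str, id_props: dict) -> str:
    # UUIDv5 over the ID-contributing properties; sorted keys and compact
    # separators approximate RFC 8785 canonical JSON for simple string values.
    canonical = json.dumps(id_props, sort_keys=True, separators=(",", ":"))
    return f"{stix_type}--{uuid.uuid5(SCO_NAMESPACE, canonical)}"


def make_sco(stix_type: str, value: str, hash_algo: str | None = None) -> dict:
    if stix_type == "file" and hash_algo:
        props = {"hashes": {STIX_HASH_NAMES[hash_algo]: value}}
    else:
        props = {"value": value}  # ipv4-addr, ipv6-addr, domain-name, url, email-addr
    return {"type": stix_type, "spec_version": "2.1", "id": sco_id(stix_type, props), **props}


def build_bundle(investigation_id: str,
                 indicators: list[tuple[str, str, str | None]]) -> dict:
    scos: dict[str, dict] = {}
    for stix_type, value, hash_algo in indicators:
        sco = make_sco(stix_type, value, hash_algo)
        scos[sco["id"]] = sco  # same type and value give the same ID, so duplicates collapse
    ts = stix_timestamp()
    report = {
        "type": "report",
        "spec_version": "2.1",
        "id": f"report--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "published": ts,
        "name": f"IOC extraction for investigation {investigation_id}",  # hypothetical naming
        "object_refs": list(scos),
    }
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": [*scos.values(), report]}
```

Serializing the result with `json.dumps(bundle, indent=2)` gives a file suitable for the `-stix.json` output. Because the IDs are deterministic, the same indicator extracted in two investigations maps to the same SCO ID. That lets downstream platforms merge the two sightings.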
Configuration
ioc_extraction:
  exclude_private_ips: true
  exclude_loopback: true
  exclude_multicast: true
  dga_entropy_threshold: 3.5
  weak_hash_algorithms:
    - md5
    - sha1
  defang_urls: true
  stix_version: "2.1"
  domain_allowlist: []
  ip_allowlist: []
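The DGA check controlled by `dga_entropy_threshold` computes the Shannon entropy of each label in bits per character: H = Σ p·log2(1/p), taken over the label's character frequencies. The following is a minimal Python sketch of that check. Two choices in it are assumptions, not part of the skill's specification: it skips the TLD, and it flags a domain when any remaining label exceeds the threshold.

```python
import math
from collections import Counter

# Mirrors dga_entropy_threshold in the configuration above.
DGA_ENTROPY_THRESHOLD = 3.5


def shannon_entropy(label: str) -> float:
    """Shannon entropy of one DNS label, in bits per character."""
    n = len(label)
    if n == 0:
        return 0.0
    counts = Counter(label.lower())
    # H = sum(p * log2(1/p)) with p = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def dga_suspect(domain: str, threshold: float = DGA_ENTROPY_THRESHOLD) -> bool:
    # Assumption: ignore the TLD and flag the domain if any other label exceeds the threshold.
    labels = domain.lower().rstrip(".").split(".")[:-1]
    return any(shannon_entropy(label) > threshold for label in labels)
```

A label with k distinct characters has at most log2(k) bits of entropy. To exceed 3.5 it therefore needs at least 12 distinct characters (log2 11 ≈ 3.46, log2 12 ≈ 3.58). At the default threshold, short random-looking labels are not flagged.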
References
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Scan investigation artifacts completely before extracting; check baseline and allowlists before flagging
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/human-authorization.md — Produce IOC lists for analyst review; do not autonomously push indicators to blocking systems
- @$AIWG_ROOT/agentic/code/frameworks/forensics-complete/rules/evidence-integrity.md — IOC extraction must not modify source artifacts; read-only access to evidence
- @$AIWG_ROOT/agentic/code/frameworks/forensics-complete/skills/evidence-preservation/SKILL.md — Evidence must be preserved and hashed before IOC extraction begins
- @$AIWG_ROOT/agentic/code/frameworks/forensics-complete/skills/sigma-hunting/SKILL.md — Sigma hunting cross-references extracted IOCs against log sources for confirmation