data-exploration-phase-1-structural-understanding
Sub-skill of data-exploration covering Phase 1: Structural Understanding, plus Phase 2 (Column-Level Profiling) and Phase 3 (Relationship Discovery).
Install this agent skill to your Project
npx add-skill https://github.com/vamseeachanta/workspace-hub/tree/main/.claude/skills/_archive/data/analytics/data-exploration/phase-1-structural-understanding
Phase 1: Structural Understanding (+2)
Phase 1: Structural Understanding
Before analyzing any data, understand its structure:
Table-level questions:
- How many rows and columns?
- What is the grain (one row per what)?
- What is the primary key? Is it unique?
- When was the data last updated?
- How far back does the data go?
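A minimal sketch of answering these questions in plain Python. It assumes the table is already loaded as a list of row dicts. The sample rows, column names and the `table_overview` helper are hypothetical. Without load metadata, it treats the latest value in a date column as a proxy for "last updated".

```python
from datetime import date

# Hypothetical sample table: a list of row dicts standing in for a real table.
rows = [
    {"order_id": 1, "customer_id": "c1", "order_date": date(2024, 1, 5), "amount": 20.0},
    {"order_id": 2, "customer_id": "c2", "order_date": date(2024, 1, 7), "amount": 35.5},
    {"order_id": 3, "customer_id": "c1", "order_date": date(2024, 2, 1), "amount": 12.0},
]


def table_overview(rows, key_columns, date_column):
    """Answer the table-level questions for rows held in memory as dicts."""
    columns = sorted({col for row in rows for col in row})
    # The grain is "one row per key tuple" only if no tuple repeats or contains None.
    keys = [tuple(row.get(col) for col in key_columns) for row in rows]
    key_is_unique = len(set(keys)) == len(keys) and all(None not in k for k in keys)
    dates = [row[date_column] for row in rows if row.get(date_column) is not None]
    return {
        "row_count": len(rows),
        "column_count": len(columns),
        "key_is_unique": key_is_unique,
        "earliest": min(dates) if dates else None,  # how far back the data goes
        "latest": max(dates) if dates else None,  # proxy for the last update
    }


print(table_overview(rows, ["order_id"], "order_date"))
```

To test a hypothesis about the grain, pass a different key list, such as `["customer_id"]`. If `key_is_unique` comes back `False`, that column does not identify a row.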
Column classification: Categorize each column as one of:
- Identifier: Unique keys, foreign keys, entity IDs
- Dimension: Categorical attributes for grouping/filtering (status, type, region, category)
- Metric: Quantitative values for measurement (revenue, count, duration, score)
- Temporal: Dates and timestamps (created_at, updated_at, event_date)
- Text: Free-form text fields (description, notes, name)
- Boolean: True/false flags
- Structural: JSON, arrays, nested structures
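A first-pass classification can be automated with simple heuristics and then reviewed by hand. This is a hypothetical sketch, not a fixed rule set:

- Names ending in `_id` are treated as identifiers.
- The 0.5 distinct-ratio cutoff between Dimension and Text is an arbitrary assumption.
- JSON stored as strings would not be detected as Structural.

```python
from datetime import date


def classify_column(name, values):
    """Guess a column's role from its name and non-null values (rough heuristic)."""
    present = [v for v in values if v is not None]
    lowered = name.lower()
    if lowered == "id" or lowered.endswith("_id"):
        return "Identifier"
    if not present:
        return "Unknown"
    # bool is a subclass of int, so check it before the numeric case.
    if all(isinstance(v, bool) for v in present):
        return "Boolean"
    # datetime is a subclass of date, so this covers timestamps too.
    if all(isinstance(v, date) for v in present):
        return "Temporal"
    if all(isinstance(v, (dict, list)) for v in present):
        return "Structural"
    if all(isinstance(v, (int, float)) for v in present):
        return "Metric"
    if all(isinstance(v, str) for v in present):
        # Few distinct values relative to rows looks like a dimension; else free text.
        distinct_ratio = len(set(present)) / len(present)
        return "Dimension" if distinct_ratio <= 0.5 else "Text"
    return "Unknown"


print(classify_column("status", ["open", "open", "closed", "open"]))
```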
Phase 2: Column-Level Profiling
For each column, compute:
All columns:
- Null count and null rate
- Distinct count and cardinality ratio (distinct / total)
- Most common values (top 5-10 with frequencies)
- Least common values (bottom 5 to spot anomalies)
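The checks that apply to every column can be sketched with `collections.Counter`. In this sketch, `profile_column` is a hypothetical name. Nulls are excluded from the distinct count, and the cardinality ratio divides by all rows, nulls included. Both are assumptions about how "distinct" and "total" are meant.

```python
from collections import Counter


def profile_column(values, top_n=5, bottom_n=5):
    """Null, distinct and frequency profile for one column's values."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    ranked = counts.most_common()  # sorted by descending frequency
    null_count = total - len(non_null)
    return {
        "null_count": null_count,
        "null_rate": null_count / total if total else 0.0,
        "distinct_count": len(counts),
        "cardinality_ratio": len(counts) / total if total else 0.0,
        "most_common": ranked[:top_n],
        "least_common": ranked[::-1][:bottom_n],  # rarest values first
    }


print(profile_column(["a", "b", "a", None, "c", "a", "b", None]))
```

A cardinality ratio near 1.0 suggests an identifier. A very low ratio suggests a dimension.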
Numeric columns (metrics):
- Min, max, mean, median (p50)
- Standard deviation
- Percentiles: p1, p5, p25, p75, p95, p99
- Zero count
- Negative count (if unexpected)
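The numeric checks can be sketched with Python's `statistics` module. `statistics.quantiles(..., n=100)` returns the 99 cut points p1 through p99. The `"inclusive"` method keeps them within the observed min and max. The helper name is hypothetical. The sketch needs at least two non-null values, because both the sample standard deviation and the quantiles require them.

```python
import statistics


def profile_numeric(values):
    """Summary statistics for a numeric column, ignoring None values."""
    nums = [v for v in values if v is not None]
    # 99 cut points: cuts[0] is p1, cuts[98] is p99.
    cuts = statistics.quantiles(nums, n=100, method="inclusive")
    profile = {
        "min": min(nums),
        "max": max(nums),
        "mean": statistics.mean(nums),
        "median": statistics.median(nums),
        "stdev": statistics.stdev(nums),  # sample standard deviation
        "zero_count": sum(1 for v in nums if v == 0),
        "negative_count": sum(1 for v in nums if v < 0),
    }
    for p in (1, 5, 25, 75, 95, 99):
        profile[f"p{p}"] = cuts[p - 1]
    return profile


print(profile_numeric([3, -2, 0, 7, 12, None]))
```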
String columns (dimensions, text):
- Min length, max length, avg length
- Empty string count
- Pattern analysis (do values follow a format?)
- Case consistency (all upper, all lower, mixed?)
- Leading/trailing whitespace count
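One way to sketch pattern analysis is to reduce each value to a coarse "shape": letters become `A` and digits become `9`, then the shapes are counted. That shape convention and the helper names are hypothetical. Case consistency only considers values that contain at least one letter.

```python
import re
from collections import Counter


def value_shape(s):
    """Reduce a string to a coarse pattern: letters -> 'A', digits -> '9'."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", s))


def case_of(s):
    if s.isupper():
        return "upper"
    if s.islower():
        return "lower"
    return "mixed"


def profile_strings(values):
    """Length, emptiness, pattern, case and whitespace checks for a text column."""
    strs = [v for v in values if v is not None]
    lengths = [len(s) for s in strs]
    return {
        "min_length": min(lengths),
        "max_length": max(lengths),
        "avg_length": sum(lengths) / len(lengths),
        "empty_count": sum(1 for s in strs if s == ""),
        "patterns": Counter(value_shape(s) for s in strs).most_common(),
        "case_counts": Counter(case_of(s) for s in strs if any(c.isalpha() for c in s)),
        "whitespace_count": sum(1 for s in strs if s != s.strip()),
    }


print(profile_strings(["AB-12", "CD-34", "ef-56", " GH-78", "", None]))
```

A single dominant shape with a few outliers usually points to formatting errors worth inspecting.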
Date/timestamp columns:
- Min date, max date
- Null dates
- Future dates (if unexpected)
- Distribution by month/week
- Gaps in time series
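A sketch of the date checks, assuming plain `datetime.date` values (convert timestamps with `.date()` first) and a daily grain when looking for gaps. The function name is hypothetical. The `today` parameter exists only so the future-date check is reproducible.

```python
from collections import Counter
from datetime import date, timedelta


def profile_dates(values, today=None):
    """Range, nulls, future dates, monthly counts and missing days for a date column."""
    today = today or date.today()
    dates = [v for v in values if v is not None]
    present = set(dates)
    start, end = min(dates), max(dates)
    # Every calendar day in [start, end] that has no rows at all.
    all_days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    missing_days = [d for d in all_days if d not in present]
    return {
        "min_date": start,
        "max_date": end,
        "null_count": len(values) - len(dates),
        "future_count": sum(1 for d in dates if d > today),
        "by_month": sorted(Counter((d.year, d.month) for d in dates).items()),
        "missing_days": missing_days,
    }


print(profile_dates([date(2024, 1, 1), date(2024, 1, 3), None]))
```

For weekly or monthly series, the gap check would step by the series' own period instead of one day.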
Boolean columns:
- True count, false count, null count
- True rate
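A short sketch of the boolean checks. This hypothetical helper computes the true rate over non-null values only. That denominator is an assumption, because dividing by all rows would implicitly count nulls as false. Values that are neither `True`, `False` nor `None` are ignored.

```python
def profile_boolean(values):
    """True/false/null counts and the true rate among non-null values."""
    true_count = sum(1 for v in values if v is True)
    false_count = sum(1 for v in values if v is False)
    null_count = sum(1 for v in values if v is None)
    non_null = true_count + false_count
    return {
        "true_count": true_count,
        "false_count": false_count,
        "null_count": null_count,
        "true_rate": true_count / non_null if non_null else None,
    }


print(profile_boolean([True, False, True, None, True]))
```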
Phase 3: Relationship Discovery
After profiling individual columns:
- Foreign key candidates: ID columns that might link to other tables
- Hierarchies: Columns that form natural drill-down paths (country > state > city)
- Correlations: Numeric columns that move together
- Derived columns: Columns that appear to be computed from others
- Redundant columns: Columns with identical or near-identical information