Agent skill

data-exploration-phase-1-structural-understanding

Sub-skill of data-exploration: Phase 1: Structural Understanding (+2).

Stars 4
Forks 4

Install this agent skill to your Project

npx add-skill https://github.com/vamseeachanta/workspace-hub/tree/main/.claude/skills/_archive/data/analytics/data-exploration/phase-1-structural-understanding

SKILL.md

Phase 1: Structural Understanding (+2)

Phase 1: Structural Understanding

Before analyzing any data, understand its structure:

Table-level questions:

  • How many rows and columns?
  • What is the grain (one row per what)?
  • What is the primary key? Is it unique?
  • When was the data last updated?
  • How far back does the data go?

Column classification: Categorize each column as one of:

  • Identifier: Unique keys, foreign keys, entity IDs
  • Dimension: Categorical attributes for grouping/filtering (status, type, region, category)
  • Metric: Quantitative values for measurement (revenue, count, duration, score)
  • Temporal: Dates and timestamps (created_at, updated_at, event_date)
  • Text: Free-form text fields (description, notes, name)
  • Boolean: True/false flags
  • Structural: JSON, arrays, nested structures

Phase 2: Column-Level Profiling

For each column, compute:

All columns:

  • Null count and null rate
  • Distinct count and cardinality ratio (distinct / total)
  • Most common values (top 5-10 with frequencies)
  • Least common values (bottom 5 to spot anomalies)

Numeric columns (metrics):

min, max, mean, median (p50)
standard deviation
percentiles: p1, p5, p25, p75, p95, p99
zero count
negative count (if unexpected)

String columns (dimensions, text):

min length, max length, avg length
empty string count
pattern analysis (do values follow a format?)
case consistency (all upper, all lower, mixed?)
leading/trailing whitespace count

Date/timestamp columns:

min date, max date
null dates
future dates (if unexpected)
distribution by month/week
gaps in time series

Boolean columns:

true count, false count, null count
true rate

Phase 3: Relationship Discovery

After profiling individual columns:

  • Foreign key candidates: ID columns that might link to other tables
  • Hierarchies: Columns that form natural drill-down paths (country > state > city)
  • Correlations: Numeric columns that move together
  • Derived columns: Columns that appear to be computed from others
  • Redundant columns: Columns with identical or near-identical information

Expand your agent's capabilities with these related and highly-rated skills.

Didn't find tool you were looking for?

Be as detailed as possible for better results