data-exploration-phase-1-structural-understanding

Sub-skill of data-exploration covering Phase 1: Structural Understanding, plus two further phases (Column-Level Profiling and Relationship Discovery).

Install this agent skill to your Project

npx add-skill https://github.com/vamseeachanta/workspace-hub/tree/main/.claude/skills/_archive/data/analytics/data-exploration/phase-1-structural-understanding

SKILL.md

Phase 1: Structural Understanding (+2)

Phase 1: Structural Understanding

Before analyzing any data, understand its structure:

Table-level questions:

How many rows and columns?
What is the grain (one row per what)?
What is the primary key? Is it unique?
When was the data last updated?
How far back does the data go?
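
The table-level questions above can be sketched in plain Python. This is a minimal illustration, assuming the table is already loaded as a list of dicts; the function name `table_summary`, the sample `orders` rows, and the column names `order_id` and `updated_at` are hypothetical and not part of the skill itself.

```python
from datetime import date

def table_summary(rows, key, updated_col):
    """Answer the table-level questions for a list of dict rows."""
    columns = sorted({name for row in rows for name in row})
    keys = [row.get(key) for row in rows]
    stamps = [row[updated_col] for row in rows if row.get(updated_col) is not None]
    return {
        "row_count": len(rows),
        "column_count": len(columns),
        # A usable primary key has no nulls and no repeated values.
        "key_is_unique": None not in keys and len(set(keys)) == len(keys),
        # How far back the data goes, and when it was last updated.
        "earliest": min(stamps) if stamps else None,
        "latest": max(stamps) if stamps else None,
    }

# Hypothetical sample data: order_id 2 appears twice, so it is not a valid key.
orders = [
    {"order_id": 1, "region": "EU", "updated_at": date(2024, 1, 5)},
    {"order_id": 2, "region": "US", "updated_at": date(2024, 3, 9)},
    {"order_id": 2, "region": "US", "updated_at": date(2024, 3, 10)},
]
summary = table_summary(orders, key="order_id", updated_col="updated_at")
print(summary)
```

A failed uniqueness check usually means the assumed grain is wrong (for example, one row per order version rather than one row per order), which is worth settling before any aggregation.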

Column classification: Categorize each column as one of:

Identifier: Unique keys, foreign keys, entity IDs
Dimension: Categorical attributes for grouping/filtering (status, type, region, category)
Metric: Quantitative values for measurement (revenue, count, duration, score)
Temporal: Dates and timestamps (created_at, updated_at, event_date)
Text: Free-form text fields (description, notes, name)
Boolean: True/false flags
Structural: JSON, arrays, nested structures
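
A first-pass classification can be automated with simple heuristics and then reviewed by hand. The sketch below is one possible approach, not the skill's prescribed method: the `classify_column` helper, the `_id` naming convention, and the "at most half the values are distinct" threshold for dimensions are all assumptions chosen for illustration.

```python
from datetime import date, datetime

def classify_column(name, values):
    """Guess a column role from its name and its non-null sample values."""
    sample = [v for v in values if v is not None]
    if not sample:
        return "unknown"
    # bool is checked before numbers because bool is a subclass of int.
    if all(isinstance(v, bool) for v in sample):
        return "boolean"
    if all(isinstance(v, (dict, list)) for v in sample):
        return "structural"
    if all(isinstance(v, (date, datetime)) for v in sample):
        return "temporal"
    # Assumed naming convention: keys are called "id" or end in "_id".
    if name == "id" or name.endswith("_id"):
        return "identifier"
    if all(isinstance(v, (int, float)) for v in sample):
        return "metric"
    # Assumed threshold: few distinct values suggests a grouping attribute.
    if len(set(sample)) <= max(1, len(sample) // 2):
        return "dimension"
    return "text"

print(classify_column("order_id", [1, 2, 3]))
print(classify_column("status", ["open", "closed", "open", "open"]))
print(classify_column("notes", ["late", "call back", "vip", "n/a"]))
```

Heuristics like these misfire on small samples (a four-row text column can look like a dimension), so the output is best treated as a starting point for the manual categorization.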

Phase 2: Column-Level Profiling

For each column, compute:

All columns:

Null count and null rate
Distinct count and cardinality ratio (distinct / total)
Most common values (top 5-10 with frequencies)
Least common values (bottom 5 to spot anomalies)
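
The checks that apply to every column can be computed in one pass with `collections.Counter`. The following is a minimal sketch, assuming a column is available as a Python list with `None` for nulls; `profile_column` and the sample `status` values are hypothetical names used only for illustration.

```python
from collections import Counter

def profile_column(values, top_n=5, bottom_n=5):
    """Null, cardinality, and frequency statistics for any column."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    ranked = counts.most_common()  # sorted from most to least frequent
    null_count = total - len(non_null)
    return {
        "null_count": null_count,
        "null_rate": null_count / total if total else 0.0,
        "distinct_count": len(counts),
        "cardinality_ratio": len(counts) / total if total else 0.0,
        "most_common": ranked[:top_n],
        # Rarest values first; these are where typos and anomalies tend to show up.
        "least_common": ranked[-bottom_n:][::-1] if bottom_n else [],
    }

# Hypothetical sample: "CLOSED" is a case variant hiding among the rare values.
status = ["open", "open", "closed", None, "open", "CLOSED"]
report = profile_column(status, top_n=2, bottom_n=2)
print(report)
```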

Numeric columns (metrics):

min, max, mean, median (p50)
standard deviation
percentiles: p1, p5, p25, p75, p95, p99
zero count
negative count (if unexpected)
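
For metrics, the standard library's `statistics` module covers most of the list above. This sketch assumes at least two non-null values (required by `stdev` and `quantiles`) and uses the "inclusive" quantile method, which is one of several reasonable percentile definitions; `profile_numeric` and the sample `amounts` are hypothetical.

```python
import statistics

def profile_numeric(values):
    """Distribution statistics for a numeric column (needs >= 2 non-null values)."""
    nums = [v for v in values if v is not None]
    # quantiles(n=100) returns 99 cut points; index k - 1 holds percentile k.
    cuts = statistics.quantiles(nums, n=100, method="inclusive")
    return {
        "min": min(nums),
        "max": max(nums),
        "mean": statistics.fmean(nums),
        "median": statistics.median(nums),
        "stdev": statistics.stdev(nums),  # sample standard deviation
        "percentiles": {f"p{k}": cuts[k - 1] for k in (1, 5, 25, 75, 95, 99)},
        "zero_count": sum(1 for v in nums if v == 0),
        "negative_count": sum(1 for v in nums if v < 0),
    }

# Hypothetical sample with one zero, one negative, and one null.
amounts = [0, 5, 10, 15, 20, -5, None]
stats = profile_numeric(amounts)
print(stats)
```

Comparing the mean with the median, and p99 with the max, is a quick way to spot skew and outliers before plotting anything.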

String columns (dimensions, text):

min length, max length, avg length
empty string count
pattern analysis (do values follow a format?)
case consistency (all upper, all lower, mixed?)
leading/trailing whitespace count
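
The string checks can be sketched as follows. The expected format is passed in as a regular expression, since the skill does not name one; `profile_strings`, the sample `codes`, and the `[A-Z]{2}-\d{3}` pattern are illustrative assumptions.

```python
import re

def case_of(text):
    """Label a string as all upper, all lower, or mixed case."""
    if text.isupper():
        return "upper"
    if text.islower():
        return "lower"
    return "mixed"

def profile_strings(values, pattern=None):
    """Length, emptiness, case, whitespace, and format checks for a string column."""
    texts = [v for v in values if v is not None]
    if not texts:
        return {}
    lengths = [len(t) for t in texts]
    return {
        "min_length": min(lengths),
        "max_length": max(lengths),
        "avg_length": sum(lengths) / len(lengths),
        "empty_count": sum(1 for t in texts if t == ""),
        # Only strings containing letters can be judged for case.
        "case_kinds": sorted({case_of(t) for t in texts if any(c.isalpha() for c in t)}),
        "whitespace_count": sum(1 for t in texts if t != t.strip()),
        # Values that do not fully match the expected format, if one was given.
        "pattern_mismatches": [t for t in texts if pattern and not re.fullmatch(pattern, t)],
    }

# Hypothetical sample: one lowercase code, one with a leading space, one empty.
codes = ["AB-123", "cd-456", " EF-789", "", None]
text_report = profile_strings(codes, pattern=r"[A-Z]{2}-\d{3}")
print(text_report)
```

More than one entry in `case_kinds` for a dimension column is a hint that grouping on the raw value would split what should be a single category.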

Date/timestamp columns:

min date, max date
null dates
future dates (if unexpected)
distribution by month/week
gaps in time series
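
A sketch of the date checks, assuming values are `datetime.date` objects and a daily series where a jump of more than one day counts as a gap. The reference date is passed in explicitly so the result is reproducible; `profile_dates`, `max_gap_days`, and the sample `events` are hypothetical.

```python
from collections import Counter
from datetime import date

def profile_dates(values, today, max_gap_days=1):
    """Range, null, future-date, monthly distribution, and gap checks."""
    dates = sorted(v for v in values if v is not None)
    return {
        "min_date": dates[0] if dates else None,
        "max_date": dates[-1] if dates else None,
        "null_count": len(values) - len(dates),
        "future_count": sum(1 for d in dates if d > today),
        "by_month": dict(Counter(d.strftime("%Y-%m") for d in dates)),
        # Consecutive observed dates that are further apart than allowed.
        "gaps": [(a, b) for a, b in zip(dates, dates[1:]) if (b - a).days > max_gap_days],
    }

# Hypothetical sample: one null, one date far in the future, and missing days.
events = [
    date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5),
    date(2024, 2, 1), None, date(2030, 1, 1),
]
date_report = profile_dates(events, today=date(2024, 6, 1))
print(date_report)
```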

Boolean columns:

true count, false count, null count
true rate
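
The boolean counts are simple, but the denominator of the true rate is a choice. The sketch below assumes the rate is taken over non-null values only; dividing by all rows instead is equally defensible, as long as it is stated. `profile_boolean` and the sample `flags` are hypothetical.

```python
def profile_boolean(values):
    """Counts and true rate for a boolean column with None as null."""
    true_count = sum(1 for v in values if v is True)
    false_count = sum(1 for v in values if v is False)
    known = true_count + false_count
    return {
        "true_count": true_count,
        "false_count": false_count,
        "null_count": sum(1 for v in values if v is None),
        # Assumption: nulls are excluded from the denominator.
        "true_rate": true_count / known if known else None,
    }

flags = [True, False, True, None, True]
flag_report = profile_boolean(flags)
print(flag_report)
```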

Phase 3: Relationship Discovery

After profiling individual columns:

Foreign key candidates: ID columns that might link to other tables
Hierarchies: Columns that form natural drill-down paths (country > state > city)
Correlations: Numeric columns that move together
Derived columns: Columns that appear to be computed from others
Redundant columns: Columns with identical or near-identical information
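
Several of these relationships can be tested with small checks. The sketch below is illustrative rather than exhaustive: foreign key candidates are scored by how many child values appear in the parent key, hierarchies by whether each child maps to exactly one parent, correlations by a hand-written Pearson coefficient, and redundancy by row-wise match rate. All function names and sample values are hypothetical.

```python
import math

def fk_coverage(child_values, parent_keys):
    """Share of non-null child values that exist in the parent key set."""
    present = [v for v in child_values if v is not None]
    parents = set(parent_keys)
    return sum(1 for v in present if v in parents) / len(present) if present else 0.0

def is_hierarchy(parents, children):
    """True when every child value maps to exactly one parent value."""
    seen = {}
    for parent, child in zip(parents, children):
        if seen.setdefault(child, parent) != parent:
            return False
    return True

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def match_rate(a, b):
    """Share of rows where two columns hold the same value (1.0 means redundant)."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

print(fk_coverage([1, 2, 2, 9, None], [1, 2, 3]))          # customer 9 has no parent row
print(is_hierarchy(["CA", "CA", "NY"], ["LA", "SF", "NYC"]))
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(match_rate(["US", "EU", "US"], ["US", "EU", "CA"]))
```

A correlation near 1 or a match rate near 1.0 between two columns is also a reasonable prompt to check whether one is derived from the other.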

Maintainer

vamseeachanta Core maintainer

Source details

Full Name: vamseeachanta/workspace-hub
Branch: main
Path in repo: .claude/skills/_archive/data/analytics/data-exploration/phase-1-structural-understanding
