Agent skill

python3-data

Python data, ETL, analysis, and scientific workflows with maintainable module boundaries and explicit validation at ingress points. Activates on pandas, numpy, scipy, jupyter, notebooks, ETL pipelines, tabular data ingestion, or scientific processing code.

Install this agent skill to your Project

npx add-skill https://github.com/Jamie-BitFlight/claude_skills/tree/main/plugins/python-engineering/skills/python3-data

SKILL.md

Python Data

Load python3-core for standing defaults. Load python3-typing for boundary schemas. Load python3-testing for parser and edge-case tests.

Quality Checklist

  • Schema validated at first stable ingress point — not deep in transforms
  • dtype= explicit in pd.read_csv() / pd.read_excel() — never rely on inference
  • No raw pd.DataFrame crossing module boundaries without documented column contract
  • Merge/join results checked for unexpected nulls and row count changes
  • model_config = {"strict": True} on all Pydantic boundary models
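  • No inplace=True: the call returns None, so df = df.drop(..., inplace=True) silently replaces the frame with None; assign the returned frame instead (inplace is discouraged, though not deprecated across all of pandas)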
  • Notebook logic that survived 3+ uses extracted into tested modules
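
The dtype and column-contract items above can be sketched as follows. This is a minimal illustration, not the skill's prescribed implementation: the `ORDERS_DTYPES` contract, the `read_orders` helper, and the sample data are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical column contract for an "orders" feed: column name -> dtype.
ORDERS_DTYPES = {
    "order_id": "string",    # keeps leading zeros that inference would drop
    "customer_id": "Int64",  # nullable integer, so blanks do not force float
    "amount": "float64",
}


def read_orders(source) -> pd.DataFrame:
    """Load the orders feed with explicit dtypes and check the contract at ingress."""
    df = pd.read_csv(source, dtype=ORDERS_DTYPES)
    missing = set(ORDERS_DTYPES) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Return only the contracted columns, in a fixed order.
    return df[list(ORDERS_DTYPES)]


# In-memory stand-in for a real file path.
raw = io.StringIO("order_id,customer_id,amount\n0042,7,19.99\n0043,,5.00\n")
orders = read_orders(raw)
```

Keeping the contract in one dictionary gives callers a single place to read the expected columns, which is one way to document what crosses a module boundary.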

Gotchas

| Trap | What to do instead |
| --- | --- |
| df["a"]["b"] = x (chained indexing) | df.loc["b", "a"] = x. Chained assignment can write to a temporary copy and leave df unchanged |
| .apply(lambda) on large frames | Vectorized ops first; .apply() only when no vectorized path exists |
| pd.merge() without post-check | Assert no unexpected nulls, duplicate keys, or row count changes after the merge |
| df.drop(..., inplace=True) | df = df.drop(...). With inplace=True the call returns None |
| Bare pd.read_csv(path) | Always pass dtype= to prevent silent type inference errors |
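
A post-merge check and a single-step `.loc` assignment might look like the sketch below. The frames and column names are made up for illustration; `validate=` and `indicator=` are standard `DataFrame.merge` parameters.

```python
import pandas as pd

# Hypothetical sample frames; customer 99 has no match on the right side.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 20, 99]})
customers = pd.DataFrame({"customer_id": [10, 20], "region": ["EU", "US"]})

# validate= raises MergeError if the right-hand keys are not unique;
# indicator= adds a "_merge" column recording where each row came from.
merged = orders.merge(
    customers,
    on="customer_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# A left join on a unique right key must not change the row count.
assert len(merged) == len(orders)

# Rows without a match surface here instead of as silent nulls downstream.
unmatched = merged.loc[merged["_merge"] == "left_only", "order_id"].tolist()

# Single .loc call: row selector and column label in one indexing step.
merged.loc[merged["region"].isna(), "region"] = "UNKNOWN"
```

Whether unmatched rows should raise, be logged, or be filled with a sentinel as shown here depends on the pipeline; the point is that the decision is made explicitly right after the merge.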

Decision Table

| Task | Use | Not |
| --- | --- | --- |
| Tabular < 1M rows | pandas | Polars (overhead not justified) |
| Tabular > 1M rows or need speed | Polars | pandas |
| SQL-like analytics on local files | DuckDB | Loading everything into pandas |
| Read-only TOML config | tomllib (stdlib, binary mode "rb") | tomlkit |
| Read/write TOML preserving comments | tomlkit (text mode) | tomllib |

Module Layout

```text
etl/
├── ingest.py      # raw data loading (boundary)
├── validate.py    # schema validation (boundary)
├── transform.py   # business logic (typed core)
├── load.py        # output writing (boundary)
└── types.py       # shared typed models
```
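
As a rough sketch of how `types.py` and `validate.py` could fit together, the example below defines a strict Pydantic v2 boundary model and a validator that collects failures. The `OrderRow` fields and the `validate_rows` helper are hypothetical, and both pieces are shown in one file for brevity. `ConfigDict(strict=True)` is equivalent to the `{"strict": True}` dictionary form used in the checklist.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


# types.py: shared typed model (hypothetical fields).
class OrderRow(BaseModel):
    model_config = ConfigDict(strict=True)

    order_id: str
    quantity: int


# validate.py: convert raw records into typed rows, collecting failures.
def validate_rows(records: list[dict]) -> tuple[list[OrderRow], list[str]]:
    rows, errors = [], []
    for index, record in enumerate(records):
        try:
            rows.append(OrderRow.model_validate(record))
        except ValidationError as exc:
            errors.append(f"row {index}: {exc.error_count()} error(s)")
    return rows, errors


rows, errors = validate_rows(
    [
        {"order_id": "A1", "quantity": 2},
        {"order_id": "A2", "quantity": "2"},  # string rejected in strict mode
    ]
)
```

With strict mode on, the string `"2"` is not coerced to an integer, so malformed input is caught at the boundary before it reaches `transform.py`.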
