Agent skill
python3-data
Python data, ETL, analysis, and scientific workflows with maintainable module boundaries and explicit validation at ingress points. Activates on pandas, numpy, scipy, jupyter, notebooks, ETL pipelines, tabular data ingestion, or scientific processing code.
Install this agent skill to your Project
npx add-skill https://github.com/Jamie-BitFlight/claude_skills/tree/main/plugins/python-engineering/skills/python3-data
SKILL.md
Python Data
Load python3-core for standing defaults. Load python3-typing for boundary schemas. Load python3-testing for parser and edge-case tests.
Quality Checklist
- Schema validated at the first stable ingress point, not deep in transforms
- Explicit `dtype=` in `pd.read_csv()` / `pd.read_excel()`; never rely on inference
- No raw `pd.DataFrame` crossing module boundaries without a documented column contract
- Merge/join results checked for unexpected nulls and row count changes
- `model_config = {"strict": True}` on all Pydantic boundary models
- No `inplace=True`: it is discouraged, returns `None`, and invites silent bugs such as `df = df.drop(..., inplace=True)`
- Notebook logic that survived 3+ uses extracted into tested modules
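As a minimal sketch of the strict-boundary-model item, assuming Pydantic v2: the `OrderRecord` model, its fields, and the `validate_records` helper are hypothetical names chosen for illustration, not part of this skill.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class OrderRecord(BaseModel):
    """Hypothetical boundary model for one ingested row."""

    # Equivalent to model_config = {"strict": True}: no silent type coercion.
    model_config = ConfigDict(strict=True)

    order_id: int
    customer_id: str
    amount: float


def validate_records(records: list[dict]) -> tuple[list[OrderRecord], list[str]]:
    """Validate raw rows at ingress; collect errors instead of failing mid-transform."""
    valid: list[OrderRecord] = []
    errors: list[str] = []
    for index, record in enumerate(records):
        try:
            valid.append(OrderRecord.model_validate(record))
        except ValidationError as exc:
            errors.append(f"row {index}: {exc.error_count()} error(s)")
    return valid, errors


raw_rows = [
    {"order_id": 1, "customer_id": "007", "amount": 10.5},
    # order_id arrives as a string; strict mode rejects it rather than coercing.
    {"order_id": "2", "customer_id": "008", "amount": 3.0},
]
valid_rows, row_errors = validate_records(raw_rows)
```

Without strict mode the second row would be coerced to `order_id=2` and pass, which is the kind of silent repair the checklist is meant to rule out.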
Gotchas
| Trap | What to do instead |
|---|---|
| `df["a"]["b"] = x` (chained indexing) | `df.loc["b", "a"] = x`; chained assignment can write to a temporary copy and silently leave `df` unchanged |
| `.apply(lambda)` on large frames | Vectorized ops first; `.apply()` only when no vectorized path exists |
| `pd.merge()` without post-check | Assert no unexpected nulls or duplicate keys after merge |
| `df.drop(..., inplace=True)` | `df = df.drop(...)`; `inplace=True` is discouraged and returns `None` |
| Bare `pd.read_csv(path)` | Always pass `dtype=` to prevent silent type inference errors |
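Several of these traps can be shown together in one short sketch. The CSV contents and column names below are made up for illustration; `dtype=`, `validate=`, and `indicator=` are standard pandas parameters.

```python
import io

import pandas as pd

# Hypothetical inputs; note the leading zeros in customer_id.
ORDERS_CSV = "order_id,customer_id,amount\n1,007,10.5\n2,008,3.0\n3,009,7.25\n"
CUSTOMERS_CSV = "customer_id,region\n007,EU\n008,US\n"

ORDER_DTYPES = {"order_id": "int64", "customer_id": "string", "amount": "float64"}
CUSTOMER_DTYPES = {"customer_id": "string", "region": "string"}

# Explicit dtypes keep "007" a string; inference would parse it as the integer 7.
orders = pd.read_csv(io.StringIO(ORDERS_CSV), dtype=ORDER_DTYPES)
customers = pd.read_csv(io.StringIO(CUSTOMERS_CSV), dtype=CUSTOMER_DTYPES)

# validate= raises MergeError if customer_id is duplicated on the right side;
# indicator= adds a "_merge" column that marks unmatched rows.
merged = orders.merge(
    customers,
    on="customer_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Post-merge checks: the row count is unchanged and unmatched keys are listed.
assert len(merged) == len(orders)
unmatched_mask = merged["_merge"] == "left_only"
unmatched_ids = merged.loc[unmatched_mask, "customer_id"].tolist()

# One .loc call selects rows and column together, avoiding chained assignment.
merged.loc[unmatched_mask, "region"] = "UNKNOWN"
```

Whether an unmatched key should be filled, dropped, or treated as a hard failure depends on the pipeline; the fill with `"UNKNOWN"` here only demonstrates the `.loc` form.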
Decision Table
| Task | Use | Not |
|---|---|---|
| Tabular < 1M rows | pandas | Polars (overhead not justified) |
| Tabular > 1M rows or need speed | Polars | pandas |
| SQL-like analytics on local files | DuckDB | Loading everything into pandas |
| Read-only TOML config | tomllib (stdlib, binary mode "rb") | tomlkit |
| Read/write TOML preserving comments | tomlkit (text mode) | tomllib |
Module Layout
```text
etl/
├── ingest.py      # raw data loading (boundary)
├── validate.py    # schema validation (boundary)
├── transform.py   # business logic (typed core)
├── load.py        # output writing (boundary)
└── types.py       # shared typed models
```
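The layout above could be sketched as follows, with each module collapsed to one function in a single file so the example runs on its own. All function names, the `Summary` type, and the column contract are hypothetical.

```python
import io
from dataclasses import dataclass
from typing import TextIO

import pandas as pd

# Hypothetical column contract shared by ingest and validate.
COLUMN_DTYPES = {"order_id": "int64", "amount": "float64"}


@dataclass(frozen=True)
class Summary:
    """types.py: typed model handed from transform to load."""

    order_count: int
    total_amount: float


def ingest(source: TextIO) -> pd.DataFrame:
    """ingest.py: raw loading with explicit dtypes (boundary)."""
    return pd.read_csv(source, dtype=COLUMN_DTYPES)


def validate(frame: pd.DataFrame) -> pd.DataFrame:
    """validate.py: enforce the column contract before any business logic (boundary)."""
    missing = set(COLUMN_DTYPES) - set(frame.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if frame["order_id"].duplicated().any():
        raise ValueError("duplicate order_id values")
    if (frame["amount"] < 0).any():
        raise ValueError("negative amount values")
    return frame


def transform(frame: pd.DataFrame) -> Summary:
    """transform.py: business logic over already-validated data (typed core)."""
    return Summary(order_count=len(frame), total_amount=float(frame["amount"].sum()))


def load(summary: Summary, sink: TextIO) -> None:
    """load.py: output writing (boundary)."""
    sink.write(f"{summary.order_count},{summary.total_amount:.2f}\n")


source = io.StringIO("order_id,amount\n1,10.5\n2,4.5\n")
sink = io.StringIO()
summary = transform(validate(ingest(source)))
load(summary, sink)
```

Only `ingest`, `validate`, and `load` touch I/O or untrusted shapes; `transform` receives data that has already passed the contract and returns a typed value rather than a raw `pd.DataFrame`.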