Agent skill
data-management
Comprehensive DataFrame loading, filtering, transformation, and data pipeline management from Excel, CSV, and multiple sources with YAML-driven configuration.
Install this agent skill in your project:
npx add-skill https://github.com/vamseeachanta/workspace-hub/tree/main/.claude/skills/data/analysis/data-management
SKILL.md
Data Management Skill
Overview
This skill provides comprehensive data management capabilities including loading data from Excel/CSV files, filtering DataFrames by column values, applying transformations, managing data arrays, and building data pipelines. All operations are configurable via YAML files for reproducible data workflows.
Key Components
DataManagement Class (data_management.py)
High-level data pipeline management:
- router(cfg): Route data operations based on configuration
- get_df_data(cfg): Load a DataFrame from configuration
- get_df_array_from_cfg(cfg): Load multiple DataFrames as an array
- get_filtered_df(data_set_cfg, df): Apply filters to a DataFrame
- get_transformed_df(data_set_cfg, df): Apply transformations to a DataFrame
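The internals of these methods are not documented here. The sketch below illustrates configuration-driven routing under that caveat: each configured step is dispatched to a handler function through a lookup table. The pipeline and op keys and the route, filter_step, and rename_step functions are hypothetical stand-ins, not the actual DataManagement API.

```python
import pandas as pd

# Hypothetical handlers standing in for get_filtered_df / get_transformed_df;
# the real assetutilities signatures and behaviour may differ.
def filter_step(step_cfg: dict, df: pd.DataFrame) -> pd.DataFrame:
    # Keep only rows whose value in the configured column is in the allowed list.
    return df[df[step_cfg["column"]].isin(step_cfg["values"])]

def rename_step(step_cfg: dict, df: pd.DataFrame) -> pd.DataFrame:
    # Rename columns using an {old: new} mapping.
    return df.rename(columns=step_cfg["mapping"])

# Registry mapping an operation name (hypothetical "op" key) to its handler.
STEPS = {"filter": filter_step, "rename": rename_step}

def route(cfg: dict, df: pd.DataFrame) -> pd.DataFrame:
    """Apply each step in cfg["pipeline"] in order, passing the result along."""
    for step in cfg.get("pipeline", []):
        handler = STEPS[step["op"]]  # KeyError signals an unknown operation
        df = handler(step, df)
    return df

df = pd.DataFrame({"status": ["active", "closed", "pending"], "old_col": [1, 2, 3]})
cfg = {
    "pipeline": [
        {"op": "filter", "column": "status", "values": ["active", "pending"]},
        {"op": "rename", "mapping": {"old_col": "new_col"}},
    ]
}
result = route(cfg, df)
print(result)
```

Keeping the handlers in a dictionary means a new operation needs only a new function and a registry entry. The routing loop itself does not change.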
ReadFromExcel Class (data.py)
Excel file reading with sheet selection:
- from_xlsx(cfg, file_index): Read Excel files with configurable sheet selection
- Supports multiple sheets, header row configuration, and data range selection
ReadFromCSV Class (data.py)
CSV file reading with encoding detection:
- to_df(cfg, file_index): Read a CSV file into a DataFrame
- Automatic encoding detection with chardet
- Configurable delimiter and header options
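The sketch below shows the detect-then-parse approach, assuming the raw file bytes are already in memory. chardet.detect guesses the encoding, and pandas.read_csv then parses the bytes with that encoding and the configured delimiter. The function read_csv_detect_encoding is hypothetical and is not ReadFromCSV.to_df itself. It falls back to UTF-8 when chardet is not installed or cannot decide.

```python
import io

import pandas as pd

try:
    import chardet  # listed as a dependency of this skill
except ImportError:  # fall back to UTF-8 if chardet is not installed
    chardet = None

def read_csv_detect_encoding(raw: bytes, delimiter: str = ",", header="infer") -> pd.DataFrame:
    """Guess the byte encoding of a CSV payload, then parse it with pandas."""
    encoding = "utf-8"
    if chardet is not None:
        guess = chardet.detect(raw)  # dict with "encoding" and "confidence" keys
        encoding = guess["encoding"] or "utf-8"  # "encoding" may be None
    return pd.read_csv(io.BytesIO(raw), sep=delimiter, header=header, encoding=encoding)

# Semicolon-delimited sample with several non-ASCII characters, encoded as UTF-8.
raw = "name;city\nAnna;Zürich\nBo;Malmö\nCé;Göteborg\nDó;Düsseldorf\n".encode("utf-8")
df = read_csv_detect_encoding(raw, delimiter=";")
print(df)
```

chardet's guesses are statistical. A short file with only a few non-ASCII bytes can be misdetected, so detecting on a larger sample of the file gives a more reliable result.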
ReadData Class (data.py)
Advanced data reading operations:
- df_filter_by_column_values(cfg, df, file_index): Filter a DataFrame by column values
- xlsx_to_df_by_keyword_search(cfg): Read Excel data by keyword-based row search
- get_data_from_xlsx_and_csv(cfg): Unified Excel/CSV reading
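The keyword search is described only at a high level. One plausible approach, sketched here under that assumption, works in three steps:

1. Read the sheet without a header, for example with pd.read_excel(path, sheet_name=..., header=None).
2. Find the first row that contains the keyword.
3. Promote that row to the header.

The function df_from_keyword_row is hypothetical. The example builds the raw sheet in memory so it runs without an Excel file.

```python
import pandas as pd

def df_from_keyword_row(raw: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Use the first row containing `keyword` as the header and keep the rows below it.

    `raw` is assumed to be a sheet read without a header row.
    """
    # True for each row in which any cell, compared as text, equals the keyword.
    hits = raw.apply(lambda row: (row.astype(str) == keyword).any(), axis=1)
    if not hits.any():
        raise ValueError(f"keyword {keyword!r} not found")
    header_pos = raw.index.get_loc(hits.idxmax())  # idxmax gives the first True label
    table = raw.iloc[header_pos + 1 :].copy()
    table.columns = raw.iloc[header_pos].tolist()
    return table.reset_index(drop=True)

# In-memory stand-in for a sheet with a title row and a blank row above the table.
raw = pd.DataFrame([
    ["Summary sheet", None, None],
    [None, None, None],
    ["Item", "Qty", "Price"],
    ["bolt", 10, 0.5],
    ["nut", 20, 0.2],
])
table = df_from_keyword_row(raw, "Item")
print(table)
```

Columns sliced this way keep the object dtype of the mixed raw sheet. pd.to_numeric can convert the numeric columns afterwards.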
Usage Patterns
Data Loading Configuration
```yaml
data:
  files:
    - path: "data.xlsx"
      sheet_name: "Sheet1"
      header_row: 0
      columns: ["A", "B", "C"]
```
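The YAML above does not say how each key maps onto pandas. This minimal sketch assumes the keys correspond to pandas.read_excel parameters and translates one files entry into keyword arguments. The config appears as the Python dict that a YAML loader such as PyYAML's yaml.safe_load would produce.

Two parts of the sketch are assumptions:

- The function excel_kwargs is hypothetical.
- columns is treated as Excel column letters. pandas reads a comma-separated string like "A,B,C" as letters, but reads a list of strings as header names.

```python
def excel_kwargs(file_cfg: dict) -> dict:
    """Translate one data.files entry into keyword arguments for pandas.read_excel.

    Assumption: `columns` holds Excel column letters, so they are joined into the
    comma-separated string form that read_excel's `usecols` accepts for letters.
    """
    kwargs = {
        "io": file_cfg["path"],
        "sheet_name": file_cfg.get("sheet_name", 0),  # default: first sheet
        "header": file_cfg.get("header_row", 0),      # 0-based header row index
    }
    if "columns" in file_cfg:
        kwargs["usecols"] = ",".join(file_cfg["columns"])
    return kwargs

# Dict equivalent of the YAML configuration above.
cfg = {
    "data": {
        "files": [
            {"path": "data.xlsx", "sheet_name": "Sheet1", "header_row": 0, "columns": ["A", "B", "C"]}
        ]
    }
}
for file_cfg in cfg["data"]["files"]:
    kwargs = excel_kwargs(file_cfg)
    print(kwargs)
    # df = pandas.read_excel(**kwargs)  # needs openpyxl and an actual data.xlsx
```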
Filtering Configuration
```yaml
data:
  filter:
    column: "status"
    values: ["active", "pending"]
    operator: "in"  # in, equals, gt, lt, contains
```
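A filter block like the one above can be evaluated with pandas boolean masks. This sketch assumes that values is a list for in and contains, and that equals, gt, and lt compare against its first element. The apply_filter function and that convention are assumptions, not documented behaviour of df_filter_by_column_values.

```python
import pandas as pd

def apply_filter(df: pd.DataFrame, flt: dict) -> pd.DataFrame:
    """Filter rows according to a {column, values, operator} mapping."""
    col = df[flt["column"]]
    op = flt.get("operator", "in")
    values = flt["values"]
    if not isinstance(values, list):
        values = [values]  # allow a scalar in the config
    if op == "in":
        mask = col.isin(values)
    elif op == "equals":
        mask = col == values[0]
    elif op == "gt":
        mask = col > values[0]
    elif op == "lt":
        mask = col < values[0]
    elif op == "contains":
        # Literal substring match against any given value; missing values never match.
        mask = pd.Series(False, index=df.index)
        for v in values:
            mask |= col.str.contains(v, regex=False, na=False)
    else:
        raise ValueError(f"unsupported operator: {op!r}")
    return df[mask]

df = pd.DataFrame({"status": ["active", "closed", "pending", "active"], "score": [5, 9, 2, 7]})
active = apply_filter(df, {"column": "status", "values": ["active", "pending"], "operator": "in"})
high = apply_filter(df, {"column": "score", "values": [6], "operator": "gt"})
pend = apply_filter(df, {"column": "status", "values": ["pend"], "operator": "contains"})
print(active, high, pend, sep="\n\n")
```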
Transformation Configuration
```yaml
data:
  transform:
    - type: "rename"
      mapping: {"old_col": "new_col"}
    - type: "add_column"
      name: "calculated"
      expression: "col_a + col_b"
```
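The transform list can be applied in order as sketched below. This assumes rename maps to DataFrame.rename and that add_column evaluates expression with DataFrame.eval. The function apply_transforms is hypothetical. DataFrame.eval accepts only a restricted expression syntax over column names, which makes it a safer choice than passing config strings to Python's built-in eval.

```python
import pandas as pd

def apply_transforms(df: pd.DataFrame, steps: list) -> pd.DataFrame:
    """Apply rename / add_column steps in the order they are listed."""
    for step in steps:
        kind = step["type"]
        if kind == "rename":
            df = df.rename(columns=step["mapping"])
        elif kind == "add_column":
            # DataFrame.eval resolves names in the expression to existing columns.
            df = df.assign(**{step["name"]: df.eval(step["expression"])})
        else:
            raise ValueError(f"unsupported transform type: {kind!r}")
    return df

df = pd.DataFrame({"old_col": ["x", "y"], "col_a": [1, 2], "col_b": [10, 20]})
steps = [
    {"type": "rename", "mapping": {"old_col": "new_col"}},
    {"type": "add_column", "name": "calculated", "expression": "col_a + col_b"},
]
out = apply_transforms(df, steps)
print(out)
```

Because the steps run in sequence, an expression in a later add_column step can refer to a column created or renamed by an earlier step.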
Common Workflows
- Excel Pipeline: Load Excel → Filter rows → Transform columns → Export
- Multi-Source Merge: Load CSV + Excel → Merge on key → Validate → Save
- Data Validation: Load data → Apply filters → Check constraints → Report
- Batch Processing: Config with file list → Process each → Aggregate results
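The Multi-Source Merge workflow can be sketched with pandas.DataFrame.merge. The minimal example below assumes both sources share an id key. An in-memory CSV string and a small DataFrame stand in for the CSV and Excel loads.

Two merge options do the checking:

- validate="one_to_one" raises pandas.errors.MergeError on duplicate keys.
- indicator=True adds a _merge column, which is used to report unmatched rows before saving.

The column names and data are illustrative only.

```python
import io

import pandas as pd

# Stand-ins for the two sources; in practice these would come from
# pd.read_csv(...) and pd.read_excel(...).
csv_df = pd.read_csv(io.StringIO("id,qty\n1,10\n2,20\n3,30\n"))
xlsx_df = pd.DataFrame({"id": [1, 2, 4], "price": [0.5, 0.2, 1.0]})

# Outer join keeps every key from both sides; validate guards against duplicate ids,
# and indicator records whether each row matched ("both") or came from one side only.
merged = csv_df.merge(xlsx_df, on="id", how="outer", validate="one_to_one", indicator=True)

unmatched = merged[merged["_merge"] != "both"]  # rows to report
clean = merged[merged["_merge"] == "both"].drop(columns="_merge")
print(unmatched[["id", "_merge"]])
# clean.to_csv("merged.csv", index=False)  # the "Save" step
```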
Module Location
- Pipeline: src/assetutilities/common/data_management.py
- Readers: src/assetutilities/common/data.py
Dependencies
- pandas (DataFrame operations)
- openpyxl (Excel reading)
- chardet (encoding detection)
- numpy (numerical operations)