Agent skill
prepare-dataset
Process and validate datasets for training. Use when setting up data pipelines.
Stars
163
Forks
31
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/prepare-dataset
SKILL.md
Prepare Dataset
Load, preprocess, and validate datasets for machine learning model training, including normalization and augmentation.
When to Use
- Setting up data pipelines for training
- Normalizing and cleaning raw data
- Splitting into train/validation/test sets
- Applying data augmentation
Quick Reference
```python
# Dataset preparation pipeline (interface skeleton)
from typing import Tuple

from numpy import ndarray


class DatasetLoader:
    def load(self, path: str) -> Tuple[ndarray, ndarray]:
        # Load raw data
        pass

    def normalize(self, data: ndarray) -> ndarray:
        # Normalize to [0, 1] or standardize
        pass

    def split(self, data: ndarray, ratios: Tuple[float, float, float]):
        # Split into train/val/test
        pass

    def augment(self, data: ndarray) -> ndarray:
        # Apply transformations if needed
        pass
```
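As a rough illustration of how the `normalize` and `split` steps might be filled in, the sketch below uses min-max scaling and a shuffled three-way split with NumPy. The function names `min_max_normalize` and `split_data`, the `seed` parameter, the default ratios, and the policy of sending any rounding remainder to the test set are all assumptions made for this example, not part of the skill itself.

```python
import numpy as np


def min_max_normalize(data: np.ndarray) -> np.ndarray:
    """Scale each column to [0, 1]; constant columns map to 0."""
    data = np.asarray(data, dtype=np.float64)
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    # Assumed policy: avoid division by zero for constant columns.
    safe_span = np.where(span == 0, 1.0, span)
    return (data - lo) / safe_span


def split_data(data, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle rows, then split into train/val/test by the given ratios."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    data = np.asarray(data)
    n = len(data)
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(n * ratios[0])  # sizes are floored
    n_val = int(n * ratios[1])
    train = data[order[:n_train]]
    val = data[order[n_train:n_train + n_val]]
    test = data[order[n_train + n_val:]]  # remainder goes to test
    return train, val, test
```

A fixed seed keeps the split reproducible across runs. If standardization (zero mean, unit variance) is preferred over min-max scaling, the statistics would normally be computed on the training split only and reused for validation and test.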
Workflow
- Load raw data: Read dataset from file (CSV, HDF5, NumPy)
- Validate data: Check shape, dtype, missing values
- Preprocess: Normalize, standardize, encode categorical features
- Split sets: Create train/validation/test splits
- Augment data: Apply transformations if needed (rotation, flip, etc.)
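The validation, encoding, and augmentation steps above could look something like the following. This is a minimal sketch under stated assumptions: the helper names `validate`, `one_hot`, and `flip_horizontal` are hypothetical, missing values are assumed to be encoded as NaN, and images are assumed to be batched as `(N, H, W)`.

```python
import numpy as np


def validate(data: np.ndarray, expected_ndim: int = 2) -> dict:
    """Collect basic checks: shape, dtype, and missing (NaN) values."""
    data = np.asarray(data)
    is_float = np.issubdtype(data.dtype, np.floating)
    # NaN only exists for floating dtypes; other dtypes report 0 missing.
    missing = int(np.isnan(data).sum()) if is_float else 0
    return {
        "shape": data.shape,
        "dtype": str(data.dtype),
        "ndim_ok": data.ndim == expected_ndim,
        "missing_values": missing,
    }


def one_hot(labels: np.ndarray) -> np.ndarray:
    """Encode a 1-D array of category labels as one-hot rows."""
    flat = np.asarray(labels).ravel()
    # Columns follow the sorted order of the unique labels.
    classes, inverse = np.unique(flat, return_inverse=True)
    return np.eye(len(classes), dtype=np.float32)[inverse]


def flip_horizontal(images: np.ndarray) -> np.ndarray:
    """Mirror a batch of images shaped (N, H, W) left to right."""
    return images[:, :, ::-1]
```

Returning a dictionary from `validate` rather than raising immediately lets the caller decide which findings are fatal and which only belong in the report.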
Output Format
Dataset preparation report:
- Raw data shape and statistics
- Data validation results (missing values, outliers)
- Preprocessing applied (normalization, encoding)
- Train/val/test split sizes
- Final dataset shape and statistics
- Augmentation transformations applied
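One way such a report might be assembled is sketched below. The function names `summarize` and `build_report`, their parameters, and the exact line wording are hypothetical. Outlier detection is left out, and missing values are assumed to be NaN.

```python
import numpy as np


def summarize(name: str, data) -> str:
    """One report line with shape and basic statistics, ignoring NaNs."""
    data = np.asarray(data, dtype=np.float64)
    return (f"{name}: shape={data.shape}, "
            f"mean={np.nanmean(data):.3f}, std={np.nanstd(data):.3f}, "
            f"min={np.nanmin(data):.3f}, max={np.nanmax(data):.3f}")


def build_report(raw, final, splits, preprocessing, augmentations) -> str:
    """Assemble the report as plain text, one bullet per item."""
    raw = np.asarray(raw, dtype=np.float64)
    train, val, test = splits
    lines = [
        "Dataset preparation report:",
        "- " + summarize("Raw data", raw),
        f"- Missing values: {int(np.isnan(raw).sum())}",
        f"- Preprocessing applied: {', '.join(preprocessing) or 'none'}",
        f"- Split sizes: train={len(train)}, val={len(val)}, test={len(test)}",
        "- " + summarize("Final data", final),
        f"- Augmentations applied: {', '.join(augmentations) or 'none'}",
    ]
    return "\n".join(lines)
```

Here `preprocessing` and `augmentations` are lists of short step names such as `["min-max"]`; an empty list is reported as `none`.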
References
- See `extract-hyperparameters` skill for data preprocessing config
- See `evaluate-model` skill for test set evaluation
- See `/notes/review/mojo-ml-patterns.md` for Mojo data loading