Agent skill

prepare-dataset

Process and validate datasets for training. Use when setting up data pipelines.

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/prepare-dataset

SKILL.md

Prepare Dataset

Load, preprocess, and validate datasets for machine learning model training, including normalization and augmentation.

When to Use

Setting up data pipelines for training
Normalizing and cleaning raw data
Splitting into train/validation/test sets
Applying data augmentation

Quick Reference

python

# Dataset preparation pipeline (interface sketch; method bodies are left to the implementer)
from typing import Tuple

from numpy import ndarray


class DatasetLoader:
    def load(self, path: str) -> Tuple[ndarray, ndarray]:
        # Load raw data from the file at `path`
        pass

    def normalize(self, data: ndarray) -> ndarray:
        # Normalize to [0, 1] or standardize
        pass

    def split(
        self, data: ndarray, ratios: Tuple[float, float, float]
    ) -> Tuple[ndarray, ndarray, ndarray]:
        # Split into train/val/test according to `ratios`
        pass

    def augment(self, data: ndarray) -> ndarray:
        # Apply transformations if needed
        pass
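
The stubs above leave the method bodies open. As a minimal sketch, assuming NumPy arrays with samples along the first axis, the normalize and split steps might look like the following. The `method` and `seed` parameters and the default ratios are illustrative assumptions, not part of the original signatures, and the functions are written as free functions for brevity.

```python
from typing import Tuple

import numpy as np


def normalize(data: np.ndarray, method: str = "minmax") -> np.ndarray:
    """Scale each column to [0, 1] ("minmax") or to zero mean and unit variance ("standard")."""
    data = data.astype(np.float64)
    if method == "minmax":
        low = data.min(axis=0)
        span = data.max(axis=0) - low
        # Constant columns have zero span; divide by 1 so they map to 0 instead of NaN.
        return (data - low) / np.where(span == 0, 1.0, span)
    if method == "standard":
        std = data.std(axis=0)
        return (data - data.mean(axis=0)) / np.where(std == 0, 1.0, std)
    raise ValueError(f"unknown method: {method!r}")


def split(
    data: np.ndarray,
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Shuffle the rows, then cut them into train/val/test parts by the given ratios."""
    if not np.isclose(sum(ratios), 1.0):
        raise ValueError("ratios must sum to 1")
    rng = np.random.default_rng(seed)  # fixed seed keeps the split reproducible
    shuffled = data[rng.permutation(len(data))]
    n_train = round(ratios[0] * len(data))
    n_val = round(ratios[1] * len(data))
    train, val, test = np.split(shuffled, [n_train, n_train + n_val])
    return train, val, test
```

If normalization statistics are meant to come from the training set only, the split would run first and the training set's minimum, maximum, mean, and standard deviation would be reused for the other two sets; the sketch does not do this.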

Workflow

Load raw data: Read dataset from file (CSV, HDF5, NumPy)
Validate data: Check shape, dtype, missing values
Preprocess: Normalize, standardize, encode categorical features
Split sets: Create train/validation/test splits
Augment data: Apply transformations if needed (rotation, flip, etc.)
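
The validation and augmentation steps above can be sketched as follows. This is an illustration under stated assumptions, not the skill's actual implementation: `validate` and `augment_flip` are hypothetical helper names, missing values are assumed to be encoded as NaN, and image data is assumed to have shape (N, H, W).

```python
from typing import List

import numpy as np


def validate(data: np.ndarray, expected_ndim: int = 2) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    issues = []
    if data.ndim != expected_ndim:
        issues.append(f"expected {expected_ndim} dimensions, got {data.ndim}")
    if not np.issubdtype(data.dtype, np.number):
        issues.append(f"non-numeric dtype: {data.dtype}")
    elif np.issubdtype(data.dtype, np.floating) and np.isnan(data).any():
        issues.append(f"{int(np.isnan(data).sum())} missing (NaN) values")
    return issues


def augment_flip(images: np.ndarray, rng: np.random.Generator, p: float = 0.5) -> np.ndarray:
    """Flip each image horizontally with probability p; the input array is not modified."""
    flip = rng.random(len(images)) < p
    out = images.copy()
    out[flip] = out[flip, :, ::-1]  # reverse the width axis of the selected images
    return out
```

Returning a list of issues instead of raising on the first problem lets all validation results be collected for the report in a single pass.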

Output Format

Dataset preparation report:

Raw data shape and statistics
Data validation results (missing values, outliers)
Preprocessing applied (normalization, encoding)
Train/val/test split sizes
Final dataset shape and statistics
Augmentation transformations applied
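
One possible way to assemble such a report is sketched below. The function names, the argument layout, and the z-score rule for counting outliers are assumptions made for illustration; the source does not specify how outliers are detected or how the report is rendered.

```python
from typing import Dict, List

import numpy as np


def describe(data: np.ndarray) -> str:
    """Shape plus basic statistics, ignoring NaN entries."""
    return (
        f"shape={data.shape}, mean={np.nanmean(data):.3f}, std={np.nanstd(data):.3f}, "
        f"min={np.nanmin(data):.3f}, max={np.nanmax(data):.3f}"
    )


def build_report(
    raw: np.ndarray,
    final: np.ndarray,
    splits: Dict[str, np.ndarray],
    preprocessing: List[str],
    augmentations: List[str],
    z_threshold: float = 3.0,
) -> str:
    """Render the report items as plain text, one line per item."""
    missing = int(np.isnan(raw).sum())
    # Assumed outlier rule: per-column z-score above the threshold.
    std = np.nanstd(raw, axis=0)
    z = np.abs(raw - np.nanmean(raw, axis=0)) / np.where(std == 0, 1.0, std)
    outliers = int((z > z_threshold).sum())
    lines = [
        "Dataset preparation report",
        f"Raw data: {describe(raw)}",
        f"Validation: {missing} missing values, {outliers} outliers (|z| > {z_threshold})",
        f"Preprocessing applied: {', '.join(preprocessing) or 'none'}",
        "Split sizes: " + ", ".join(f"{name}={len(part)}" for name, part in splits.items()),
        f"Final data: {describe(final)}",
        f"Augmentations applied: {', '.join(augmentations) or 'none'}",
    ]
    return "\n".join(lines)
```

The function expects floating-point arrays, since NaN is used as the missing-value marker.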

References

See extract-hyperparameters skill for data preprocessing config
See evaluate-model skill for test set evaluation
See /notes/review/mojo-ml-patterns.md for Mojo data loading

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/prepare-dataset
License: MIT License
