Agent skill

drift-detection-implementation

Implement drift detection features for data quality monitoring including baseline storage, history tracking, thresholds, and validation wrappers

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/drift-detection-implementation

Metadata

Additional technical details for this skill

tags: validation, drift, data-quality, testing, baseline, history
version: 1
complexity: medium
created at: 2026-01-29T23:54:28.547Z
updated at: 2026-01-29T23:54:28.547Z
lines estimate: ~350

SKILL.md

Purpose

Implement drift detection with baseline storage, history tracking, configurable thresholds, and validation check wrappers.

When To Use

  • User asks to implement drift detection features
  • Ticket requires KS/PSI tests, baseline storage, history tracking, or validation wrappers

Architecture

Implement in src/vibe_piper/validation/drift_detection.py:

Configuration Types

  • DriftThresholds dataclass: warning, critical, psi_warning, psi_critical, ks_significance
  • BaselineMetadata dataclass: baseline_id, created_at, sample_size, columns, description
  • DriftHistoryEntry dataclass: timestamp, baseline_id, method, drift_score, max_drift_score, drifted_columns, alert_level
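
As a rough illustration of the configuration types, the sketch below declares two of the dataclasses with the field names listed above. The field types, the default threshold values, and the `__post_init__` checks are assumptions for illustration; the skill only names the fields.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DriftThresholds:
    # Default values are illustrative assumptions, not taken from the skill.
    warning: float = 0.1
    critical: float = 0.25
    psi_warning: float = 0.1
    psi_critical: float = 0.2
    ks_significance: float = 0.05

    def __post_init__(self) -> None:
        # Reject configurations where a warning level is not below its critical level.
        if not 0 <= self.warning < self.critical:
            raise ValueError("warning must be >= 0 and below critical")
        if not 0 <= self.psi_warning < self.psi_critical:
            raise ValueError("psi_warning must be >= 0 and below psi_critical")
        if not 0 < self.ks_significance < 1:
            raise ValueError("ks_significance must be strictly between 0 and 1")


@dataclass
class BaselineMetadata:
    # Field types are assumed; the skill lists only the field names.
    baseline_id: str
    created_at: datetime
    sample_size: int
    columns: list[str]
    description: str = ""
```

Validating in `__post_init__` is one way to make the "Test thresholds validation" item in the testing pattern concrete: an inverted pair such as `DriftThresholds(warning=0.5, critical=0.2)` raises `ValueError` at construction time.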

Result Types

  • DriftResult dataclass: method, drift_score, drifted_columns, p_values, statistics, recommendations, timestamp
  • ColumnDriftResult dataclass: column_name, drift_score, p_value, is_significant, baseline_distribution, new_distribution, recommendation

Storage Classes

  • BaselineStore class:
    • __init__(storage_dir)
    • _baseline_path(baseline_id) -> Path
    • add_baseline(baseline_id, data, description) -> BaselineMetadata
    • get_baseline(baseline_id, schema=None) -> Sequence[DataRecord]
    • get_metadata(baseline_id) -> BaselineMetadata
    • list_baselines() -> list[BaselineMetadata]
    • delete_baseline(baseline_id)
    • JSON file storage with metadata and data list
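
The storage layout described above (one JSON file holding metadata plus the data list) could look roughly like this. It is a minimal sketch under several assumptions: records are plain dicts rather than the project's DataRecord type, metadata is returned as a dict instead of a BaselineMetadata instance, the `schema` parameter of get_baseline is omitted, and the `<baseline_id>.json` file naming is hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


class BaselineStore:
    """Sketch: one JSON file per baseline, containing metadata and the data list."""

    def __init__(self, storage_dir: Path) -> None:
        self.storage_dir = Path(storage_dir)
        self.storage_dir.mkdir(parents=True, exist_ok=True)

    def _baseline_path(self, baseline_id: str) -> Path:
        # Assumed naming scheme: <storage_dir>/<baseline_id>.json
        return self.storage_dir / f"{baseline_id}.json"

    def _load(self, baseline_id: str) -> dict[str, Any]:
        path = self._baseline_path(baseline_id)
        if not path.exists():
            raise KeyError(f"unknown baseline: {baseline_id}")
        return json.loads(path.read_text(encoding="utf-8"))

    def add_baseline(self, baseline_id: str, data: list[dict[str, Any]],
                     description: str = "") -> dict[str, Any]:
        metadata = {
            "baseline_id": baseline_id,
            "created_at": datetime.now(timezone.utc).isoformat(),
            "sample_size": len(data),
            "columns": sorted({key for record in data for key in record}),
            "description": description,
        }
        payload = {"metadata": metadata, "data": list(data)}
        self._baseline_path(baseline_id).write_text(
            json.dumps(payload, indent=2), encoding="utf-8"
        )
        return metadata

    def get_baseline(self, baseline_id: str) -> list[dict[str, Any]]:
        return self._load(baseline_id)["data"]

    def get_metadata(self, baseline_id: str) -> dict[str, Any]:
        return self._load(baseline_id)["metadata"]

    def list_baselines(self) -> list[dict[str, Any]]:
        paths = sorted(self.storage_dir.glob("*.json"))
        return [json.loads(p.read_text(encoding="utf-8"))["metadata"] for p in paths]

    def delete_baseline(self, baseline_id: str) -> None:
        # Deleting a baseline that does not exist is treated as a no-op here.
        self._baseline_path(baseline_id).unlink(missing_ok=True)
```

Keeping metadata and data in the same file means list_baselines has to parse every baseline in full; a real implementation might store metadata separately if baselines are large.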

History Tracking

  • DriftHistory class:
    • __init__(storage_dir)
    • _history_path(baseline_id) -> Path
    • add_entry(result, baseline_id, thresholds) -> DriftHistoryEntry
    • get_entries(baseline_id, limit=None) -> list[DriftHistoryEntry]
    • get_trend(baseline_id, window=10) -> dict[str, Any]
    • clear_history(baseline_id)
    • JSONL append-only storage (one line per entry)
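
The append-only JSONL layout can be sketched as follows. This is a simplified illustration, not the skill's implementation: add_entry here takes the drift score and alert level directly instead of a DriftResult and thresholds, entries are plain dicts, and the keys returned by get_trend (`count`, `mean_drift_score`, `direction`) are assumed, since the skill does not specify the shape of the trend dictionary.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional


class DriftHistory:
    """Sketch: one JSONL file per baseline, one JSON object per line."""

    def __init__(self, storage_dir: Path) -> None:
        self.storage_dir = Path(storage_dir)
        self.storage_dir.mkdir(parents=True, exist_ok=True)

    def _history_path(self, baseline_id: str) -> Path:
        return self.storage_dir / f"{baseline_id}.jsonl"

    def add_entry(self, baseline_id: str, drift_score: float,
                  alert_level: str) -> dict[str, Any]:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "baseline_id": baseline_id,
            "drift_score": drift_score,
            "alert_level": alert_level,
        }
        # Mode "a" appends, so existing lines are never rewritten.
        with self._history_path(baseline_id).open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")
        return entry

    def get_entries(self, baseline_id: str,
                    limit: Optional[int] = None) -> list[dict[str, Any]]:
        path = self._history_path(baseline_id)
        if not path.exists():
            return []
        with path.open(encoding="utf-8") as fh:
            entries = [json.loads(line) for line in fh if line.strip()]
        if limit is None:
            return entries
        # Return the most recent `limit` entries, oldest first.
        return entries[-limit:] if limit > 0 else []

    def get_trend(self, baseline_id: str, window: int = 10) -> dict[str, Any]:
        scores = [e["drift_score"] for e in self.get_entries(baseline_id, limit=window)]
        if not scores:
            return {"count": 0, "mean_drift_score": None, "direction": "unknown"}
        delta = scores[-1] - scores[0]
        direction = "increasing" if delta > 0 else "decreasing" if delta < 0 else "stable"
        return {
            "count": len(scores),
            "mean_drift_score": sum(scores) / len(scores),
            "direction": direction,
        }

    def clear_history(self, baseline_id: str) -> None:
        self._history_path(baseline_id).unlink(missing_ok=True)
```

Comparing the first and last score in the window is a deliberately crude trend measure; a slope fit over the window would be a reasonable alternative.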

Alerting

  • check_drift_alert(result, thresholds) -> tuple[bool, str], returned as (should_alert, alert_level)
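
One plausible reading of the alerting function is a two-level comparison of the drift score against the thresholds. In the sketch below, the trimmed DriftThresholds and DriftResult stand-ins, the threshold defaults, and the alert-level strings ("critical", "warning", "none") are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class DriftThresholds:
    # Trimmed stand-in; default values are illustrative assumptions.
    warning: float = 0.1
    critical: float = 0.25


@dataclass
class DriftResult:
    # Trimmed stand-in carrying only the fields this sketch needs.
    method: str
    drift_score: float


def check_drift_alert(result: DriftResult,
                      thresholds: DriftThresholds) -> tuple[bool, str]:
    """Return (should_alert, alert_level) for a drift result."""
    # Check the critical level first so it takes precedence over warning.
    if result.drift_score >= thresholds.critical:
        return True, "critical"
    if result.drift_score >= thresholds.warning:
        return True, "warning"
    return False, "none"
```

A PSI result would presumably be compared against psi_warning and psi_critical instead; that branch is left out here to keep the sketch short.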

Validation Check Wrappers

  • check_drift_ks(column, baseline, thresholds=None) -> Callable[[Sequence[DataRecord]], ValidationResult]
  • check_drift_psi(column, baseline, thresholds=None) -> Callable[[Sequence[DataRecord]], ValidationResult]
  • Both wrappers convert the DriftResult to a ValidationResult based on alert_level:
    • Critical drift maps to errors
    • Recommendations and drifted columns map to warnings

Dependencies

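  • scipy (optional) - import it inside the functions that use it so the module loads without scipy; keep type-only imports behind a TYPE_CHECKING guard
  • datetime.utcnow is deprecated since Python 3.12 - prefer the timezone-aware datetime.now(timezone.utc)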
  • json, pathlib, dataclasses
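
The two dependency notes above can be illustrated together. The helper names `utc_now` and `ks_two_sample` are hypothetical; `scipy.stats.ks_2samp` is the real scipy function for the two-sample Kolmogorov-Smirnov test, imported lazily so that the module still loads when scipy is not installed.

```python
from datetime import datetime, timezone
from typing import Sequence


def utc_now() -> datetime:
    # Timezone-aware replacement for the deprecated datetime.utcnow().
    return datetime.now(timezone.utc)


def ks_two_sample(baseline: Sequence[float], new: Sequence[float]) -> tuple[float, float]:
    """Return (statistic, p_value) from a two-sample KS test."""
    try:
        # Deferred import: scipy is only required when this function is called.
        from scipy.stats import ks_2samp
    except ImportError as exc:
        raise ImportError(
            "KS drift detection requires scipy; install it with `pip install scipy`"
        ) from exc
    result = ks_2samp(baseline, new)
    return float(result.statistic), float(result.pvalue)
```

With this arrangement, code paths that use only PSI or the storage classes never touch scipy, and callers of the KS path get an actionable error message instead of a failure at import time.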

Testing Pattern

  • Create test fixtures with sample_schema
  • Test BaselineStore: add, get, get_metadata, list, delete operations
  • Test DriftHistory: add_entry, get_entries, get_trend, clear_history
  • Test thresholds validation
  • Test validation wrappers with stable/drifted data
  • Test alerting logic
  • Use tempfile for BaselineStore/DriftHistory storage
  • Aim for 85%+ coverage

Exports

Update src/vibe_piper/validation/__init__.py to export the new classes and functions.

Manual notes

This section is preserved when the skill is updated. Put human notes, caveats, and exceptions here.
