Agent skill
ml-antipattern-validator
Prevents 30+ critical AI/ML mistakes including data leakage, evaluation errors, training pitfalls, and deployment issues. Use when working with ML training, testing, model evaluation, or deployment.
Install this agent skill in your project:
npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/doyajin174/ml-antipattern-validator
ML Antipattern Validator
Overview
A skill that detects and prevents 30+ antipatterns in AI/ML development.
Key Principle: Honest evaluation > Impressive metrics.
When to Activate
Automatic Triggers:
- ML training code (train*.py, model training)
- Dataset preparation or splitting
- Model evaluation or testing
- Production deployment planning
Manual Triggers:
- @validate-ml - Full validation
- @check-leakage - Data leakage detection
- @verify-eval - Evaluation methodology
Pre-Implementation Checklist
✅ Requirements:
□ Problem clearly defined with success metrics
□ Train/test split strategy defined
□ Evaluation methodology matches business objective
✅ Data Integrity:
□ No temporal leakage (future → past)
□ No target leakage (answer in features)
□ No preprocessing leakage (fit on all data)
□ No group leakage (related samples split)
✅ Evaluation Setup:
□ Test set completely held out
□ Metrics aligned with business objective
□ Baseline models defined
Critical Antipatterns
Category 1: Data Leakage 🚨
1.1 Target Leakage
❌ WRONG: Using "refund_issued" to predict "purchase_fraud"
✅ CORRECT: Only use features available at purchase time
1.2 Temporal Leakage
❌ WRONG: train = df[df['date'] > '2024-06-01'] # Future data
✅ CORRECT: train = df[df['date'] < '2024-06-01'] # Past for training
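As a minimal sketch of the chronological split above, the following uses plain dictionaries instead of a DataFrame. The function name `chronological_split`, the `date_key` parameter, and the sample rows are illustrative assumptions, not part of the skill.

```python
from datetime import date

def chronological_split(rows, cutoff, date_key="date"):
    """Split rows so that every training row strictly precedes the cutoff."""
    train = [r for r in rows if r[date_key] < cutoff]
    test = [r for r in rows if r[date_key] >= cutoff]
    return train, test

# Hypothetical sample data for illustration only.
rows = [
    {"date": date(2024, 5, 30), "y": 0},
    {"date": date(2024, 6, 1), "y": 1},
    {"date": date(2024, 6, 15), "y": 0},
]
train, test = chronological_split(rows, cutoff=date(2024, 6, 1))
```

Rows dated exactly at the cutoff go to the test set, so the two sets never share a timestamp.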
1.3 Preprocessing Leakage
❌ WRONG: X_scaled = scaler.fit_transform(X); train_test_split(X_scaled)
✅ CORRECT: Split first, then scaler.fit(X_train) and use the fitted scaler to transform both X_train and X_test
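The split-then-fit order can be sketched without any ML library. The helpers `fit_scaler` and `transform` below are hypothetical stand-ins for a real scaler, and the numbers are made up; the point is only that the statistics come from the training values and are reused for the test values.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Learn mean and standard deviation from the training values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against zero variance
    return mu, sigma

def transform(values, mu, sigma):
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]

# Hypothetical data: the split has already happened.
x_train = [1.0, 2.0, 3.0, 4.0]
x_test = [10.0, 12.0]

mu, sigma = fit_scaler(x_train)                 # statistics from train only
x_train_scaled = transform(x_train, mu, sigma)
x_test_scaled = transform(x_test, mu, sigma)    # reuse the train statistics
```

Fitting on all six values instead would pull the mean toward the test points, which is exactly the information the test set is supposed to withhold.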
1.4 Group Leakage
❌ WRONG: train_test_split(df) # Same user in both sets
✅ CORRECT: GroupShuffleSplit(n_splits=1, test_size=0.2).split(X, y, groups=df['user_id'])
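To show what a group-aware split guarantees, here is a simplified stdlib-only sketch. It is not scikit-learn's implementation; `group_split`, its parameters, and the sample rows are assumptions made for illustration.

```python
import random

def group_split(rows, group_key, test_fraction=0.25, seed=0):
    """Assign whole groups to train or test so that no group spans both."""
    groups = sorted({r[group_key] for r in rows})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if r[group_key] not in test_groups]
    test = [r for r in rows if r[group_key] in test_groups]
    return train, test

# Hypothetical data: four users with two rows each.
rows = [{"user_id": u, "x": i} for i, u in enumerate("aabbccdd")]
train, test = group_split(rows, "user_id")
```

The fraction applies to the number of groups, not the number of rows, so set sizes can drift from the target when group sizes are uneven.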
1.5 Data Augmentation Leakage
❌ WRONG: augment(X) → train_test_split()
✅ CORRECT: train_test_split() → augment(X_train)
Category 2: Evaluation Mistakes ⚠️
2.1 Testing on Training Data
❌ WRONG: evaluate(model, training_data)
✅ CORRECT: evaluate(model, unseen_test_data)
2.2 Metric Misalignment
Business Objective → Appropriate Metric:
- Ranking → NDCG, MRR, MAP
- Imbalanced → F1, Precision@K, AUC-PR
- Balanced → Accuracy, AUC-ROC
2.3 Accuracy Paradox
❌ WRONG: Reporting 99% accuracy on 99:1 imbalanced data as success
✅ CORRECT: Check per-class metrics with classification_report()
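The accuracy paradox is easy to reproduce by hand. The sketch below assumes a degenerate model that always predicts the majority class; `per_class_recall` is a hypothetical helper that reports only one of the figures a full per-class report would show.

```python
def per_class_recall(y_true, y_pred):
    """Return recall for each class label that appears in y_true."""
    recall = {}
    for label in set(y_true):
        relevant = [p for t, p in zip(y_true, y_pred) if t == label]
        recall[label] = sum(p == label for p in relevant) / len(relevant)
    return recall

y_true = [0] * 99 + [1]   # 99:1 class imbalance
y_pred = [0] * 100        # model always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
```

Here accuracy is 0.99 while recall on class 1 is 0.0, so the headline metric hides a model that never finds the minority class.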
2.4 Invalid Time Series CV
❌ WRONG: cross_val_score(model, X, y, cv=5) # Folds train on data from after the validation period
✅ CORRECT: cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5))
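The idea behind time-ordered cross-validation can be sketched as an expanding window over row indices. This is a simplified illustration in the spirit of TimeSeriesSplit, not a drop-in replacement; `expanding_window_splits` is a hypothetical name, and rows are assumed to be sorted by time already.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with train always before test."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("Too few samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        # Train on everything before the test window, never after it.
        yield list(range(start)), list(range(start, start + test_size))

splits = list(expanding_window_splits(n_samples=12, n_splits=3))
```

Each fold trains on a longer history than the previous one, and no fold ever sees rows that come after its test window.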
2.5 Hyperparameter Tuning on Test Set
❌ WRONG: grid_search(model, X_test, y_test)
✅ CORRECT: train/validation/test three-way split
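A three-way split can be sketched as follows, assuming independent samples with no temporal or group structure (otherwise the leakage rules from Category 1 apply first). The function name and the 60/20/20 proportions are illustrative assumptions.

```python
import random

def three_way_split(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then carve out disjoint train/validation/test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = three_way_split(range(100))
```

Hyperparameters are then selected on `val`, and `test` is evaluated once, after all tuning decisions are final.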
Category 3: Training Pitfalls 🔧
3.1 Batch Norm Inference Error
❌ WRONG: predictions = model(X_test) # Still in train mode
✅ CORRECT: model.eval(); with torch.no_grad(): predictions = model(X_test)
3.2 Early Stopping Overfitting
❌ WRONG: EarlyStopping(patience=50)
✅ CORRECT: EarlyStopping(patience=5, min_delta=0.001, restore_best_weights=True)
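The interaction of patience, min_delta, and best-weight restoration can be sketched framework-free. `EarlyStopper` below is a hypothetical class, not any library's callback, and the validation losses are made up.

```python
class EarlyStopper:
    """Stop when validation loss has not improved by min_delta for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # the checkpoint to restore afterwards
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=2, min_delta=0.001)
val_losses = [0.50, 0.40, 0.41, 0.405, 0.39]  # hypothetical values
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

With a patience of 2, training stops at epoch 3 and the weights from epoch 1 are the ones to restore. A very large patience would let the model keep fitting long after validation loss has plateaued.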
3.3 Learning Rate Warmup
✅ CORRECT: get_linear_schedule_with_warmup(optimizer, num_warmup_steps=1000, num_training_steps=total_steps)
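The shape of a linear warmup schedule can be sketched as a learning-rate multiplier. This assumes linear decay to zero after warmup and a total of 10,000 training steps, neither of which the skill specifies; `linear_warmup_multiplier` is a hypothetical name.

```python
def linear_warmup_multiplier(step, num_warmup_steps, num_training_steps):
    """Learning-rate multiplier: linear ramp up, then linear decay to zero."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

# Sample the multiplier at a few steps of an assumed 10,000-step run.
steps = (0, 500, 1000, 5500, 10000)
multipliers = [linear_warmup_multiplier(s, 1000, 10000) for s in steps]
```

The multiplier scales the base learning rate, so early updates are small while optimizer statistics are still unreliable.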
3.4 Class Imbalance
❌ WRONG: CrossEntropyLoss() # Biased toward majority
✅ CORRECT: CrossEntropyLoss(weight=class_weights)
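One common way to obtain class_weights is inverse-frequency weighting, sketched below. The skill does not say how the weights should be computed, so this formula, the function name, and the label counts are assumptions.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

labels = [0] * 90 + [1] * 10   # hypothetical 9:1 imbalance
weights = balanced_class_weights(labels)
```

The minority class receives a weight of 5.0 against roughly 0.56 for the majority class. When passed to a weighted loss, the values need to be ordered by class index.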
Detection Patterns
Leakage Detection
# Check feature-target correlation (numeric features only)
correlation = df[features].corrwith(df['target'])
if (correlation.abs() > 0.95).any():
    raise DataLeakageError("Suspiciously high correlation")
# Check temporal ordering: all training data must precede the test period
if train['date'].max() >= test['date'].min():
    raise TemporalLeakageError("Training data overlaps or postdates the test period")
# Check group overlap (train_groups and test_groups are sets)
if train_groups & test_groups:
    raise GroupLeakageError("Overlapping groups")
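The error types used in these checks are not defined in the skill itself. A minimal sketch of how they might be declared and used follows. The class hierarchy and the helper names are assumptions, and the temporal check here is deliberately strict: it rejects any overlap between the training and test periods.

```python
class DataLeakageError(Exception):
    """Base class for the hypothetical leakage errors used in this skill."""

class TemporalLeakageError(DataLeakageError):
    """Training data is not strictly earlier than test data."""

class GroupLeakageError(DataLeakageError):
    """The same group appears in both train and test."""

def check_temporal_order(train_dates, test_dates):
    """Raise unless every training date precedes every test date."""
    if max(train_dates) >= min(test_dates):
        raise TemporalLeakageError("Training data overlaps or postdates the test period")

def check_group_overlap(train_groups, test_groups):
    """Raise if any group identifier appears in both splits."""
    overlap = set(train_groups) & set(test_groups)
    if overlap:
        raise GroupLeakageError(f"Overlapping groups: {sorted(overlap)}")

# ISO-formatted date strings compare correctly as plain strings.
check_temporal_order(["2024-01-01", "2024-05-31"], ["2024-06-01"])  # passes
check_group_overlap(["u1", "u2"], ["u3"])                            # passes
```

Deriving both errors from a common base lets a caller catch every leakage failure with a single `except DataLeakageError` clause.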
Mode Check
if model.training:
    raise InferenceModeError("Model in training mode during evaluation")
Validation Checklist
Before deployment:
- No data leakage detected
- Test set never seen during training
- Metrics aligned with business objective
- model.eval() called for inference
- Class imbalance handled
- Covariate shift monitoring planned
References
See references/REFERENCE.md for detailed examples and scenarios.