Agent skill
experiment-tracking
Master ML experiment tracking - MLflow, W&B, Neptune, versioning, reproducibility
Install this agent skill to your Project
npx add-skill https://github.com/pluginagentmarketplace/custom-plugin-mlops/tree/main/skills/experiment-tracking
Experiment Tracking Skill
Learn: Master ML experiment tracking for reproducibility and collaboration.
Skill Overview
| Attribute | Value |
|---|---|
| Bonded Agent | 02-experiment-tracking |
| Difficulty | Intermediate |
| Duration | 30 hours |
| Prerequisites | mlops-basics |
Learning Objectives
- Set up experiment tracking infrastructure
- Log parameters, metrics, and artifacts systematically
- Compare experiments and identify best models
- Use the model registry for version management
- Collaborate with your team through a shared tracking server
Topics Covered
Module 1: Platform Setup (6 hours)
Platform Comparison:
| Feature | MLflow | W&B | Neptune |
|---|---|---|---|
| Self-hosted | ✅ | ❌ | ❌ |
| Free tier | ✅ | ✅ | ✅ |
| Real-time | ❌ | ✅ | ✅ |
| Git integration | ⚠️ | ✅ | ✅ |
Setup Exercises:
- Install MLflow and start a local tracking server
- Create a W&B account and initialize a project
- Compare the UI/UX of both platforms
Module 2: Experiment Logging (10 hours)
What to Log:
# Complete logging example
import mlflow
import mlflow.pytorch

with mlflow.start_run():
    # 1. Parameters (hyperparameters, configs)
    mlflow.log_params({
        "learning_rate": 0.001,
        "batch_size": 32,
        "model_type": "transformer"
    })
    # 2. Metrics (per-step and final)
    for epoch in range(10):
        # train_loss and val_loss come from your own training/validation step
        mlflow.log_metrics({
            "train_loss": train_loss,
            "val_loss": val_loss
        }, step=epoch)
    # 3. Artifacts (models, plots, configs)
    mlflow.log_artifact("confusion_matrix.png")  # file must already exist on disk
    mlflow.pytorch.log_model(model, "model")  # model: your trained torch.nn.Module
    # 4. Tags (for filtering)
    mlflow.set_tags({
        "experiment_type": "baseline",
        "dataset_version": "v2.1"
    })
Module 3: Model Registry (8 hours)
Registry Workflow:
┌─────────────────────────────────────────────────────────────────┐
│                       MODEL REGISTRY FLOW                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Train → Log Model → Register → Staging → Production → Archive  │
│                          │         │          │                 │
│                          ▼         ▼          ▼                 │
│                      Version 1  Validate  Deploy                │
│                      Version 2  A/B Test  Monitor               │
│                      Version N  Approve   Rollback              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Exercises:
- Register a trained model
- Promote model through stages
- Implement rollback procedure
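The promote and rollback exercises can be sketched as two small helpers. This is a minimal sketch, not a definitive implementation: the function names `promote` and `rollback` are hypothetical, and `client` is assumed to be an `mlflow.tracking.MlflowClient` (or any object exposing the same `search_model_versions` and `transition_model_version_stage` methods). It also assumes the stage-based workflow shown in the diagram; recent MLflow releases mark stages as deprecated in favor of aliases, so check your installed version.

```python
def promote(client, name: str, version: str, stage: str = "Production") -> None:
    """Move one model version to `stage`, archiving whatever held that stage before."""
    client.transition_model_version_stage(
        name=name,
        version=version,
        stage=stage,
        archive_existing_versions=True,
    )


def rollback(client, name: str) -> str:
    """Re-promote the newest archived version that is older than current Production."""
    versions = client.search_model_versions(f"name='{name}'")
    in_prod = [int(v.version) for v in versions if v.current_stage == "Production"]
    if not in_prod:
        raise RuntimeError(f"{name!r} has no Production version to roll back from")
    current = max(in_prod)
    candidates = [
        int(v.version)
        for v in versions
        if v.current_stage == "Archived" and int(v.version) < current
    ]
    if not candidates:
        raise RuntimeError(f"{name!r} has no archived version older than v{current}")
    target = str(max(candidates))
    promote(client, name, target)  # the bad version is archived as a side effect
    return target
```

Registration itself would typically be a call such as `mlflow.register_model("runs:/<run_id>/model", "my-model")`, where the run ID and model name are placeholders. After that, `promote(MlflowClient(), "my-model", "3", stage="Staging")` and `rollback(MlflowClient(), "my-model")` cover the other two exercises.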
Module 4: Best Practices (6 hours)
Naming Conventions:
experiments/
├── {project_name}/
│   ├── {experiment_type}_{date}/
│   │   ├── run_{config_hash}/
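One way to generate names that follow this layout is to derive `{config_hash}` from the run's configuration. The sketch below is illustrative only: `config_hash` and `run_path` are hypothetical helpers, the `YYYYMMDD` date format and 8-character hash length are assumptions, and the config is assumed to contain only JSON-serializable values.

```python
import hashlib
import json
from datetime import date
from typing import Optional


def config_hash(config: dict, length: int = 8) -> str:
    """Short, stable hash of a config dict (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def run_path(project_name: str, experiment_type: str, config: dict,
             run_date: Optional[date] = None) -> str:
    """Build experiments/{project_name}/{experiment_type}_{date}/run_{config_hash}."""
    run_date = run_date or date.today()
    return "/".join([
        "experiments",
        project_name,
        f"{experiment_type}_{run_date:%Y%m%d}",
        f"run_{config_hash(config)}",
    ])
```

Because the hash depends only on the config contents, two runs with identical settings map to the same `run_{config_hash}` name, which makes accidental reruns easy to spot.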
Reproducibility Checklist:
- Log git commit hash
- Capture environment (pip freeze)
- Set and log random seeds
- Log data version/hash
- Save config files as artifacts
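The checklist items can be gathered in one place before a run starts. The following is a minimal sketch using only the standard library; `collect_repro_info` and `file_sha256` are hypothetical names, and it seeds only Python's built-in `random` module (NumPy, PyTorch, or other frameworks would need their own seeding calls).

```python
import hashlib
import random
import subprocess
import sys


def _run(cmd):
    """Return the stripped stdout of cmd, or None if the command fails."""
    try:
        return subprocess.check_output(
            cmd, stderr=subprocess.DEVNULL, text=True
        ).strip()
    except (subprocess.CalledProcessError, OSError):
        return None


def file_sha256(path, chunk_size=1 << 20):
    """Hash a data file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_repro_info(seed, data_path):
    """Set the random seed and return the reproducibility facts as strings."""
    random.seed(seed)
    return {
        "git_commit": _run(["git", "rev-parse", "HEAD"]) or "unknown",
        "python_version": sys.version.split()[0],
        "random_seed": str(seed),
        "data_sha256": file_sha256(data_path),
        "pip_freeze": _run([sys.executable, "-m", "pip", "freeze"]) or "unavailable",
    }
```

Inside an active MLflow run, the short values could be passed to `mlflow.set_tags(...)` and the `pip_freeze` text saved with `mlflow.log_text(info["pip_freeze"], "requirements.txt")`, since the full package list is usually too long for a tag.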
Code Templates
Template: Production Experiment Tracker
# templates/experiment_tracker.py
import subprocess
from datetime import datetime
from typing import Optional

import mlflow
import mlflow.pytorch


class ProductionExperimentTracker:
    """Production-ready experiment tracking wrapper."""

    def __init__(self, experiment_name: str, tracking_uri: str):
        mlflow.set_tracking_uri(tracking_uri)
        mlflow.set_experiment(experiment_name)
        self.run = None

    def start_run(self, run_name: Optional[str] = None):
        """Start a new tracked run."""
        self.run = mlflow.start_run(run_name=run_name)
        # Auto-log environment info
        self._log_environment()
        return self

    def _log_environment(self):
        """Capture reproducibility information."""
        # Git info (skipped when git is missing or this is not a repository)
        try:
            git_hash = subprocess.check_output(
                ["git", "rev-parse", "HEAD"], stderr=subprocess.DEVNULL
            ).decode().strip()
            mlflow.set_tag("git_commit", git_hash)
        except (subprocess.CalledProcessError, OSError):
            pass
        # Timestamp
        mlflow.set_tag("run_timestamp", datetime.now().isoformat())

    def log_config(self, config: dict):
        """Log configuration as parameters."""
        # Flatten nested config into dotted keys, e.g. {"model": {"lr": 1}} -> "model.lr"
        flat_config = self._flatten_dict(config)
        mlflow.log_params(flat_config)

    def log_metrics(self, metrics: dict, step: Optional[int] = None):
        """Log metrics with optional step."""
        mlflow.log_metrics(metrics, step=step)

    def log_model(self, model, artifact_path: str = "model"):
        """Log a PyTorch model as a run artifact."""
        mlflow.pytorch.log_model(model, artifact_path)

    def end_run(self):
        """End the current run."""
        if self.run:
            mlflow.end_run()
            self.run = None

    def _flatten_dict(self, d: dict, parent_key: str = '') -> dict:
        """Flatten nested dictionary."""
        items = []
        for k, v in d.items():
            new_key = f"{parent_key}.{k}" if parent_key else k
            if isinstance(v, dict):
                items.extend(self._flatten_dict(v, new_key).items())
            else:
                items.append((new_key, v))
        return dict(items)
Troubleshooting Guide
| Issue | Cause | Solution |
|---|---|---|
| Runs not syncing | Network issue | Check connectivity, use offline mode |
| Large artifacts fail | Size limit | Use cloud storage for large files |
| Duplicate run names | No uniqueness | Add timestamp or hash to names |
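For the "Runs not syncing" row, one possible approach is to probe the tracking server before a run starts and fall back to local logging when it is unreachable. This is a sketch under stated assumptions: `choose_tracking_uri` is a hypothetical helper, the check is a plain TCP connect (it does not verify that MLflow itself is healthy), and the fallback assumes a local `file:` store is acceptable for your MLflow version.

```python
import socket
from urllib.parse import urlparse


def choose_tracking_uri(remote_uri: str, fallback_uri: str = "file:./mlruns",
                        timeout: float = 2.0) -> str:
    """Return remote_uri if its host accepts a TCP connection, else fallback_uri."""
    parsed = urlparse(remote_uri)
    if not parsed.hostname:
        return fallback_uri
    # Default to the standard port for the scheme when none is given
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return remote_uri
    except OSError:
        return fallback_uri
```

A typical call would be `mlflow.set_tracking_uri(choose_tracking_uri("http://mlflow.example.internal:5000"))`, where the host name is a placeholder. W&B handles the same situation differently: setting the `WANDB_MODE=offline` environment variable logs locally, and `wandb sync` uploads the runs later.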
Resources
- MLflow Documentation
- W&B Documentation
- [See: training-pipelines] - Integrate tracking with pipelines
Version History
| Version | Date | Changes |
|---|---|---|
| 2.0.0 | 2024-12 | Production-grade with templates |
| 1.0.0 | 2024-11 | Initial release |