ai-ml-principal-engineer

Principal/Senior-level AI/ML playbook for production machine learning systems, LLM-enabled backends, model serving, training pipelines, evaluation discipline, reliability, security, and MLOps. Use when: designing ML services, building or reviewing training/inference code, selecting model architectures, fine-tuning transformers, hardening model APIs, debugging performance or correctness issues, or preparing ML systems for production.


Install this agent skill in your project:

npx add-skill https://github.com/mOdrA40/claude-codex-skills-directory/tree/main/backend-skills/ai-ml-mastery-skill

SKILL.md

AI/ML Mastery (Senior → Principal)

Operate

Start by confirming: objective, success metric, data availability, privacy/security constraints, latency and throughput targets, compute budget, deployment target, and the definition of done.
Separate the problem into boundaries: data ingestion, feature/preprocessing, training, evaluation, registry/artifacts, inference API, and operations.
Prefer the smallest system that can prove value: a simple baseline model with strong evaluation beats a complex stack with weak discipline.
Treat ML work as software engineering: reproducibility, observability, rollback, and failure handling are part of the feature.

The goal is not just a high offline metric. The goal is a model-backed backend that is correct, measurable, operable, and safe in production.

Default Standards

Keep notebooks for exploration only; production logic belongs in versioned Python modules and tests.
Validate schema, dtypes, ranges, nullability, and label quality at the data boundary.
Make training and inference preprocessing identical by sharing explicit pipeline code.
Prefer typed config objects and immutable runtime settings.
Use structured logging and explicit error taxonomy for data, model, dependency, and serving failures.
Define latency budgets, timeout behavior, fallback behavior, and model version strategy before exposing public inference endpoints.
Default to simpler baselines before large models; earn complexity with measured gains.
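
A minimal sketch of two of the standards above: immutable typed settings and validation at the data boundary. The names ServingConfig, DataValidationError, and validate_record, along with the record fields text and score, are assumptions made for illustration and do not come from the skill itself.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingConfig:
    """Immutable runtime settings (all field names are illustrative)."""
    model_version: str
    timeout_seconds: float = 2.0
    max_text_chars: int = 4096


class DataValidationError(ValueError):
    """Raised when a record violates the (hypothetical) dataset contract."""


def validate_record(record: dict) -> dict:
    """Check type, nullability, and range for one hypothetical record."""
    text = record.get("text")
    if not isinstance(text, str) or not text.strip():
        raise DataValidationError("text must be a non-empty string")
    score = record.get("score")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise DataValidationError("score must be numeric")
    if math.isnan(score) or not 0.0 <= score <= 1.0:
        raise DataValidationError("score must be within [0, 1]")
    return record
```

Because the dataclass is frozen, assigning to a field after construction raises an error, which keeps runtime settings from drifting inside a long-lived process.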

“Bad vs Good” (common production pitfalls)

python

# ❌ BAD: training and inference use different preprocessing.
train_text = text.lower().strip()
serve_text = text.strip()

# ✅ GOOD: one shared preprocessing pipeline used everywhere.
normalized_text = text_normalizer.normalize(text)
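
The source does not define text_normalizer, so the following is only one way such a shared component might look. TextNormalizer, build_training_example, and build_inference_input are hypothetical names, and the specific normalization steps (NFKC, lowercasing, whitespace collapsing) are assumptions rather than the skill's prescribed pipeline.

```python
import unicodedata


class TextNormalizer:
    """Single source of truth for text preprocessing (hypothetical class)."""

    def normalize(self, text: str) -> str:
        # NFKC folds compatibility characters so visually equal strings match.
        text = unicodedata.normalize("NFKC", text)
        # Lowercase and collapse every whitespace run into a single space.
        return " ".join(text.lower().split())


text_normalizer = TextNormalizer()


def build_training_example(raw_text: str, label: int) -> tuple[str, int]:
    # The training path calls the shared normalizer.
    return text_normalizer.normalize(raw_text), label


def build_inference_input(raw_text: str) -> str:
    # The serving path calls the same normalizer, so skew cannot creep in.
    return text_normalizer.normalize(raw_text)
```

Both code paths import the same object, so a change to normalization lands in training and serving at the same time.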

python

# ❌ BAD: silent fallback hides model loading failures.
try:
    model = load_model(path)
except Exception:
    model = None

# ✅ GOOD: fail explicitly or switch to a known degraded mode.
try:
    model = load_model(path)
except FileNotFoundError as error:
    raise ModelBootstrapError(f"model artifact missing: {path}") from error
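
The "known degraded mode" branch mentioned in the comment above could be sketched as follows. This assumes a loader callable that raises FileNotFoundError for a missing artifact; bootstrap_model and fallback_path are hypothetical names, and ModelBootstrapError is defined here only because the source uses it without showing its definition.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("bootstrap")


class ModelBootstrapError(RuntimeError):
    """Raised when no usable model can be loaded at startup."""


def bootstrap_model(
    load_model: Callable[[str], object],
    path: str,
    fallback_path: Optional[str] = None,
) -> tuple[object, bool]:
    """Return (model, degraded). Fail loudly when nothing can be loaded."""
    try:
        return load_model(path), False
    except FileNotFoundError as error:
        if fallback_path is None:
            raise ModelBootstrapError(f"model artifact missing: {path}") from error
        logger.warning("primary artifact missing, entering degraded mode: %s", path)
    try:
        # Degraded mode: serve a known older or simpler artifact.
        return load_model(fallback_path), True
    except FileNotFoundError as error:
        raise ModelBootstrapError(
            f"fallback artifact missing: {fallback_path}"
        ) from error
```

Returning the degraded flag lets the serving layer expose the state in health checks and logs instead of hiding it.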

python

# ❌ BAD: unbounded inference call with no deadline.
prediction = client.predict(payload)

# ✅ GOOD: explicit deadline and graceful failure mapping.
prediction = client.predict(payload, timeout=2.0)
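
The snippet above shows the deadline but not the failure mapping. A sketch of both, assuming the client offers no native timeout and the call is wrapped in a thread pool; predict_with_deadline and InferenceTimeoutError are hypothetical names. When a client does accept a timeout argument, prefer that and only translate its exception.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable


class InferenceTimeoutError(RuntimeError):
    """Domain error the API layer can map to a 504 or a fallback response."""


_executor = ThreadPoolExecutor(max_workers=4)


def predict_with_deadline(
    predict: Callable[[Any], Any], payload: Any, timeout: float = 2.0
) -> Any:
    future = _executor.submit(predict, payload)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout as error:
        # Best effort only: a call that is already running is not interrupted.
        future.cancel()
        raise InferenceTimeoutError(
            f"inference exceeded {timeout:.1f}s"
        ) from error
```

The caller sees one domain-level exception type regardless of which client library timed out, which keeps the error taxonomy stable at the serving boundary.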

Workflow (Feature / Refactor / Bug)

Define the business outcome, online/offline metrics, and failure tolerance.
Establish a reproducible baseline and dataset contract.
Design boundaries between training code, model packaging, and serving code.
Implement the smallest end-to-end slice with tests and evaluation reports.
Validate reproducibility, security, performance, and rollback readiness.
Ship with monitoring for latency, throughput, drift, quality, and cost.

Validation Commands

Run python -m pytest.
Run python -m ruff check . if Ruff is used.
Run python -m mypy src for typed code paths when the repo uses mypy.
Run python -m pytest -k inference for serving-critical tests.
Run python -m pytest --maxfail=1 --disable-warnings during local debugging.
Run smoke evaluation for the current model artifact before release.
Run container build validation if inference is deployed via Docker.
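
The smoke evaluation step is not specified further in the source. One possible shape, assuming a classifier exposed as a plain callable and a small hand-labeled example set; smoke_evaluate, SmokeGateError, and the 0.9 default threshold are illustrative choices, not values from the skill.

```python
from typing import Callable, Sequence


class SmokeGateError(RuntimeError):
    """Raised when the model artifact fails the smoke evaluation gate."""


def smoke_evaluate(
    predict: Callable[[str], str],
    examples: Sequence[tuple[str, str]],
    min_accuracy: float = 0.9,
) -> float:
    """Return accuracy on a small labeled set; raise if below the gate."""
    if not examples:
        raise ValueError("smoke set must not be empty")
    correct = sum(1 for text, expected in examples if predict(text) == expected)
    accuracy = correct / len(examples)
    if accuracy < min_accuracy:
        raise SmokeGateError(
            f"smoke accuracy {accuracy:.2f} below gate {min_accuracy:.2f}"
        )
    return accuracy
```

A function like this can be called from a pytest test or a release script so that a failing gate blocks the release the same way a failing unit test would.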

Backend-Oriented ML Guardrails

Always version models, prompts, tokenizer assets, and preprocessing artifacts together.
Do not call external model providers from request paths without timeouts, retries, budgets, and fallback behavior.
Separate online inference from heavy offline batch jobs.
Prefer async queue-based processing for expensive enrichment, reranking, or embedding backfills.
Protect inference endpoints with payload size limits, authn/authz, and rate limiting.
Log request IDs, model version, feature version, and decision metadata without leaking raw sensitive payloads.
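
As a sketch of the logging guardrail above, the record below carries identifiers and versions but only a size and digest of the payload. build_log_record and its field names are assumptions. Note that a plain hash is not anonymization for short or low-entropy inputs, so governed data may need a keyed hash or no digest at all.

```python
import hashlib
import json
import logging
import uuid
from typing import Optional

logger = logging.getLogger("inference")


def build_log_record(
    payload: dict,
    model_version: str,
    feature_version: str,
    decision: str,
    request_id: Optional[str] = None,
) -> dict:
    """Build a structured log record that never contains the raw payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return {
        "request_id": request_id or str(uuid.uuid4()),
        "model_version": model_version,
        "feature_version": feature_version,
        "decision": decision,
        "payload_bytes": len(body),
        "payload_sha256": hashlib.sha256(body).hexdigest(),
    }


def log_decision(record: dict) -> None:
    # One JSON object per line keeps the log machine-parseable.
    logger.info(json.dumps(record, sort_keys=True))
```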

Decision Framework: Library Selection

Task	Default Choice	Use Alternative When
Deep learning training	PyTorch	TensorFlow for TPU-heavy production, JAX for research-heavy experimentation
Classical/tabular ML	scikit-learn	XGBoost/LightGBM for stronger tabular baselines, CatBoost for categorical-heavy data
LLM application layer	transformers + sentence-transformers	vLLM for high-throughput serving, llama.cpp for edge or constrained environments
Data processing	pandas	polars for larger columnar workloads, dask/spark for distributed pipelines
Experiment tracking	MLflow	Weights & Biases or Neptune when team workflows require hosted collaboration
Hyperparameter tuning	Optuna	Ray Tune when you need distributed search orchestration

Architecture Selection Heuristics

text

Text classification          -> DistilBERT for speed, RoBERTa for stronger accuracy
Embeddings / retrieval       -> sentence-transformers or hosted embedding APIs with evaluation gates
Vision classification        -> ResNet/EfficientNet as baseline, ViT when data and budget justify it
Object detection             -> YOLO for speed, DETR/RT-DETR when workflow favors transformer-based designs
Tabular prediction           -> Logistic regression / XGBoost baseline first, deep tabular only if proven necessary
Recommendation               -> retrieval + ranking pipelines, not a single monolithic model by default
Time series                  -> statistical baseline first, then TFT/PatchTST when complexity is justified

Recommended Project Structure

text

project/
├── pyproject.toml
├── README.md
├── src/
│   └── app/
│       ├── config/
│       ├── data/
│       ├── features/
│       ├── models/
│       ├── training/
│       ├── evaluation/
│       ├── inference/
│       ├── serving/
│       └── observability/
├── tests/
├── scripts/
├── configs/
├── notebooks/
└── docker/

Reliability, Security, and Operations

Make model bootstrap behavior explicit: fail closed, fail open, or degraded mode.
Bound input sizes, token counts, image dimensions, and recursion depth for untrusted requests.
Prefer queue-based retries over client-side blind retries for expensive inference.
Track feature drift, data freshness, and serving skew between training and production.
Keep PII out of prompts, logs, traces, and experiment artifacts unless explicitly required and governed.
Store secrets and provider credentials in secret managers, never in notebooks or source files.
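
Feature drift tracking can take many forms; the source does not name a method. The sketch below uses the population stability index (PSI) over equal-width bins derived from the training sample, which is one common choice and an assumption here. The function name, bin count, and eps floor are illustrative.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a training sample (expected) and a serving sample (actual)."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    # Fall back to a unit width when the training sample is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            # Clamp out-of-range serving values into the edge bins.
            counts[min(max(index, 0), bins - 1)] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(count / len(values), eps) for count in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A frequently cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but alert thresholds should be calibrated per feature rather than taken as given.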

Training and Evaluation Checklist

Define offline and online success metrics before training
Fix random seeds when reproducibility matters
Check train/validation/test leakage
Validate preprocessing parity between train and serve
Save model artifact, config, tokenizer, and feature metadata together
Record dataset version and experiment version
Benchmark latency, throughput, memory, and cost
Define rollback or model disable strategy before release
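
The leakage check in the list above can be sketched as a set intersection over entity keys, assuming every row carries a stable identifier such as a user or document ID. find_split_leakage and the split names are hypothetical. This catches exact key overlap only; near-duplicate or temporal leakage needs separate checks.

```python
from itertools import combinations
from typing import Hashable, Iterable


def find_split_leakage(
    splits: dict[str, Iterable[Hashable]],
) -> dict[tuple[str, str], set]:
    """Return the keys shared by each pair of splits (empty dict means clean)."""
    keyed = {name: set(keys) for name, keys in splits.items()}
    leaks: dict[tuple[str, str], set] = {}
    # Sorting the names makes the pair ordering deterministic.
    for left, right in combinations(sorted(keyed), 2):
        overlap = keyed[left] & keyed[right]
        if overlap:
            leaks[(left, right)] = overlap
    return leaks
```

Running this as a test against the dataset contract turns an accidental re-split or join bug into a failing build instead of an inflated offline metric.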

References

Deep learning systems: references/deep-learning.md
Transformers and LLMs: references/transformers-llm.md
Computer vision: references/computer-vision.md
Classical machine learning: references/machine-learning.md
NLP systems: references/nlp.md
MLOps and deployment: references/mlops.md
Production model serving: references/production-serving.md
Evaluation and release guardrails: references/evaluation-and-guardrails.md
Retrieval and RAG systems: references/retrieval-and-rag-systems.md
Inference reliability and cost control: references/inference-reliability-and-cost.md

Maintainer

mOdrA40 Core maintainer

Source details

Full Name: mOdrA40/claude-codex-skills-directory
Branch: main
Path in repo: backend-skills/ai-ml-mastery-skill


