Topic: codex
8,457 skills in this topic.
-
modal-serverless-gpu
Serverless GPU cloud platform for running ML workloads. Use when you need on-demand GPU access without infrastructure management, deploying ML models as APIs, or running batch jobs with automatic scaling.
Orchestra-Research/AI-Research-SKILLs 6,644
-
skypilot-multi-cloud-orchestration
Multi-cloud orchestration for ML workloads with automatic cost optimization. Use when you need to run training or batch jobs across multiple clouds, leverage spot instances with auto-recovery, or optimize GPU costs across providers.
Orchestra-Research/AI-Research-SKILLs 6,644
-
awq-quantization
Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.
Orchestra-Research/AI-Research-SKILLs 6,644
-
quantizing-models-bitsandbytes
Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, need to fit larger models, or want faster inference. Supports INT8, NF4, FP4 formats, QLoRA training, and 8-bit optimizers. Works with HuggingFace Transformers.
Orchestra-Research/AI-Research-SKILLs 6,644
-
hqq-quantization
Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when deploying with vLLM or HuggingFace Transformers.
Orchestra-Research/AI-Research-SKILLs 6,644
-
gptq
Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need 4× memory reduction with <2% perplexity degradation, or for faster inference (3-4× speedup) vs FP16. Integrates with transformers and PEFT for QLoRA fine-tuning.
Orchestra-Research/AI-Research-SKILLs 6,644
-
gguf-quantization
GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible quantization from 2-8 bit without GPU requirements.
Orchestra-Research/AI-Research-SKILLs 6,644
-
optimizing-attention-flash
Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training/running transformers with long sequences (>512 tokens), encountering GPU memory issues with attention, or need faster inference. Supports PyTorch native SDPA, flash-attn library, H100 FP8, and sliding window attention.
Orchestra-Research/AI-Research-SKILLs 6,644
-
openrlhf-training
High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.
Orchestra-Research/AI-Research-SKILLs 6,644
-
miles-rl-training
Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.
Orchestra-Research/AI-Research-SKILLs 6,644
-
grpo-rl-training
Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training
Orchestra-Research/AI-Research-SKILLs 6,644
-
transformer-lens-interpretability
Provides guidance for mechanistic interpretability research using TransformerLens to inspect and manipulate transformer internals via HookPoints and activation caching. Use when reverse-engineering model algorithms, studying attention patterns, or performing activation patching experiments.
Orchestra-Research/AI-Research-SKILLs 6,644
-
sparse-autoencoder-training
Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use when discovering interpretable features, analyzing superposition, or studying monosemantic representations in language models.
Orchestra-Research/AI-Research-SKILLs 6,644
-
pyvene-interventions
Provides guidance for performing causal interventions on PyTorch models using pyvene's declarative intervention framework. Use when conducting causal tracing, activation patching, interchange intervention training, or testing causal hypotheses about model behavior.
Orchestra-Research/AI-Research-SKILLs 6,644
-
nnsight-remote-interpretability
Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or when working with any PyTorch architecture.
Orchestra-Research/AI-Research-SKILLs 6,644
-
unsloth
Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization
Orchestra-Research/AI-Research-SKILLs 6,644
-
peft-fine-tuning
Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when you need to train <1% of parameters with minimal accuracy loss, or for multi-adapter serving. HuggingFace's official library integrated with transformers ecosystem.
Orchestra-Research/AI-Research-SKILLs 6,644
-
llama-factory
Expert guidance for fine-tuning LLMs with LLaMA-Factory - WebUI no-code, 100+ models, 2/3/4/5/6/8-bit QLoRA, multimodal support
Orchestra-Research/AI-Research-SKILLs 6,644
-
axolotl
Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support
Orchestra-Research/AI-Research-SKILLs 6,644
-
simpo-training
Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model needed, more efficient than DPO. Use for preference alignment when want simpler, faster training than DPO/PPO.
Orchestra-Research/AI-Research-SKILLs 6,644
-
slime-rl-training
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
Orchestra-Research/AI-Research-SKILLs 6,644
-
torchforge-rl-training
Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.
Orchestra-Research/AI-Research-SKILLs 6,644
-
fine-tuning-with-trl
Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when need RLHF, align model with preferences, or train from human feedback. Works with HuggingFace Transformers.
Orchestra-Research/AI-Research-SKILLs 6,644
-
verl-rl-training
Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.
Orchestra-Research/AI-Research-SKILLs 6,644