Agent skill

coreweave-prod-checklist

Production readiness checklist for CoreWeave GPU workloads. Use when launching inference services, preparing GPU training for production, or validating deployment configurations. Trigger with phrases like "coreweave production", "coreweave go-live", "coreweave checklist", "coreweave launch".

Stars 1,803
Forks 241

Install this agent skill to your Project

npx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/coreweave-pack/skills/coreweave-prod-checklist

SKILL.md

CoreWeave Production Checklist

Inference Services

  • GPU type and count validated for model size
  • Autoscaling configured (KServe or HPA)
  • Health and readiness probes set
  • Resource requests AND limits specified
  • Node affinity targeting correct GPU class
  • minReplicas >= 1 for production (no cold starts)

Storage

  • Model weights in PVC (not downloaded at startup)
  • Checkpoints saved to persistent storage
  • Storage class appropriate (SSD for inference, HDD for archival)

Security

  • Secrets for model tokens and registry access
  • Network policies applied
  • Container images from trusted registries

Monitoring

  • GPU utilization metrics collected
  • Inference latency and throughput tracked
  • Alert on pod restarts and OOM events
  • Log aggregation configured

Rollback

bash
kubectl rollout undo deployment/my-inference
kubectl rollout status deployment/my-inference

Resources

Next Steps

For upgrades, see coreweave-upgrade-migration.

Expand your agent's capabilities with these related and highly-rated skills.

Didn't find tool you were looking for?

Be as detailed as possible for better results