Infrastructure Sizing and Capacity Planning

Methods for determining the optimal resource allocation for compute, database, and network systems to balance cost and performance.

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/infra-sizing

Overview

Infrastructure sizing is the process of determining how much CPU, memory, storage, and network capacity a workload requires. Effective sizing avoids both over-provisioning (wasted money) and under-provisioning (poor performance or outages).

Core Principle: "Sizing is not a one-time event; it is a continuous feedback loop based on real utilization metrics."


1. Right-Sizing Principles

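Traditional sizing used the "Peak + Buffer" model: provision for the highest expected peak plus a safety margin, which leaves most capacity idle outside the peak. Modern sizing uses Demand-Driven Allocation, where capacity follows measured demand.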

| Principle | Description |
|---|---|
| Utilization Thresholds | Target 40-70% CPU utilization. Below 40% is over-provisioned; above 80% is risky; 70-80% leaves little headroom for spikes. |
| Vertical first... | Increase resource limits for single-threaded or monolithic apps. |
| ...Horizontal usually | Spread load across multiple small instances for resilience and elasticity. |
| Metric-Based | Use P95 or P99 metrics for latency, but Average for base capacity sizing. |
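
The utilization thresholds can be turned into a simple check. This is a minimal sketch: the function name and the "watch" label for the 70-80% band are assumptions made for illustration, not part of any tool.

```python
def classify_cpu_utilization(avg_cpu_percent: float) -> str:
    """Map an average CPU utilization figure onto the threshold bands above."""
    if not 0 <= avg_cpu_percent <= 100:
        raise ValueError("utilization must be between 0 and 100")
    if avg_cpu_percent < 40:
        return "over-provisioned"
    if avg_cpu_percent <= 70:
        return "on target"
    if avg_cpu_percent <= 80:
        # Assumption: the band between the target range and the risky
        # threshold is treated as a warning zone.
        return "watch"
    return "risky"
```

For example, a fleet averaging 12% CPU would be reported as over-provisioned.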

2. Compute Sizing (EC2, VMs, GCE)

Step 1: Resource Profiling

Run your app in a staging environment and measure:

  • CPU: Is the app CPU-bound (mathematical calculations, compression)?
  • Memory: Is it memory-bound (caching, large payloads, in-memory DBs)?
  • Thread Usage: How many concurrent requests can one CPU core handle?

Step 2: Instance Family Selection

| Family | Best For | AWS Example | GCP Example |
|---|---|---|---|
| General Purpose | Balanced workloads, small DBs | t3, m6g | n2, e2 |
| Compute Optimized | Batch processing, high-traffic APIs | c6g, c7i | c2, c3 |
| Memory Optimized | Redis, high-RAM DBs, Analytics | r6g, x2 | m1, m2 |

Sizing Formula (Basic)

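Target Instances = (Peak Requests per Second * Avg Service Time per Req in seconds) / (Target Utilization per Core * Cores per Instance)

The numerator is the number of cores kept busy at peak (arrival rate times service time). Round the result up to a whole instance.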
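
A minimal sketch of the basic formula, assuming load is expressed as peak requests per second and service time as CPU seconds per request, so that their product is the number of busy cores. The function and parameter names are illustrative.

```python
import math


def target_instances(peak_rps: float, avg_service_time_s: float,
                     target_utilization: float, cores_per_instance: int) -> int:
    """Estimate the instance count needed to serve peak load.

    peak_rps * avg_service_time_s is the number of cores kept busy at peak;
    each instance contributes target_utilization * cores_per_instance usable cores.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy_cores = peak_rps * avg_service_time_s
    usable_cores_per_instance = target_utilization * cores_per_instance
    return math.ceil(busy_cores / usable_cores_per_instance)


# 1000 req/s at 20 ms of CPU each keeps 20 cores busy; at 60% target
# utilization on 2-core instances that needs 17 instances.
print(target_instances(1000, 0.020, 0.60, 2))
```

The estimate ignores redundancy across availability zones and per-instance overhead, so treat it as a starting point to validate with a load test.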


3. Database Sizing (RDS, Cloud SQL, Azure SQL)

IOPS (Input/Output Operations Per Second)

Disk performance is often the bottleneck, not CPU.

  • GP3 (AWS): Baseline 3,000 IOPS included. Provision more for heavy writes.
  • Provisioned IOPS (io2): For high-performance transactional DBs.

Storage Growth Calculation

Required Storage = ((Initial Data Size) + (Daily Ingest * Retention Period)) * (1 + Overhead Buffer)

  • Buffer: Always keep 20% free to allow for indexing and temp file creation.
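
The storage calculation can be sketched as follows, assuming the overhead buffer applies to the whole data set (initial data plus retained ingest) and that all sizes are in GB. The function name and the sample figures are illustrative.

```python
def required_storage_gb(initial_gb: float, daily_ingest_gb: float,
                        retention_days: int, overhead_buffer: float = 0.20) -> float:
    """Project storage needs for the retention window, plus free-space buffer."""
    raw_gb = initial_gb + daily_ingest_gb * retention_days
    return raw_gb * (1 + overhead_buffer)


# 500 GB today, 2 GB/day kept for 90 days, 20% buffer
print(round(required_storage_gb(500, 2, 90), 1))
```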

Connection Pool Sizing

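Max Connections = (Instance RAM / 10MB) - (System Reserve)

Here 10MB is a rule-of-thumb memory cost per connection, and System Reserve is the number of connections held back for administrative sessions.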

  • Too many connections lead to high "Context Switching" and performance degradation.
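
A minimal sketch of the connection ceiling. It assumes the 10 MB per-connection figure from the formula and treats the system reserve as a count of connections held back for admin use; both the function name and that interpretation are assumptions.

```python
def max_connections(instance_ram_mb: int, per_connection_mb: int = 10,
                    reserved_connections: int = 0) -> int:
    """Rough upper bound on database connections for a given instance size."""
    if per_connection_mb <= 0:
        raise ValueError("per_connection_mb must be positive")
    ceiling = instance_ram_mb // per_connection_mb - reserved_connections
    return max(0, ceiling)


# A 16 GB instance with 10 connections reserved
print(max_connections(16 * 1024, reserved_connections=10))
```

This is an upper bound, not a target: application pools are usually sized well below it to limit context switching.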

4. Cache Sizing (Redis/Memcached)

Caching is a trade-off between Memory Cost and Latency Benefits.

Formula: Working Set Size

Not all data needs to be in cache. Only store the Working Set (frequently accessed data).

  1. Measure Total Data Size.
  2. Analyze Access Distribution (Pareto Principle: 80% access to 20% data).
  3. Cache Size = 20-30% of Total Data Size.
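
When access counts per item are available, the working set can be estimated directly rather than assumed. The sketch below is one possible approach under stated assumptions: the input is a list of hypothetical (size_bytes, access_count) pairs, and the working set is taken as the hottest items that together cover a target share of accesses.

```python
def working_set_bytes(items, target_hit_fraction: float = 0.8) -> int:
    """Sum the sizes of the hottest items until they cover the target share of accesses.

    items: iterable of (size_bytes, access_count) pairs.
    """
    ordered = sorted(items, key=lambda item: item[1], reverse=True)
    total_accesses = sum(count for _, count in ordered)
    if total_accesses == 0:
        return 0
    covered = 0
    size = 0
    for item_size, count in ordered:
        if covered >= target_hit_fraction * total_accesses:
            break
        covered += count
        size += item_size
    return size


# Five equally sized items; one of them receives 85% of all accesses.
sample = [(100, 85), (100, 6), (100, 4), (100, 3), (100, 2)]
print(working_set_bytes(sample))  # 100 bytes, i.e. 20% of the data
```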

Eviction Policy Impact

  • allkeys-lru: Best for general caching.
  • noeviction: Returns errors when full (dangerous).

5. Container Sizing (Kubernetes)

Understanding the difference between Requests and Limits is critical for both stability and cost.

| Metric | Purpose | Cost Impact |
|---|---|---|
| Requests | Kubernetes guarantees this capacity. Used for scheduling. | High: requests determine how many nodes the cluster needs, and some managed offerings bill on requests directly. |
| Limits | The maximum a pod can burst to. | Low: Generally doesn't impact cost unless using serverless K8s. |

The "OOMKill" Trap

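If Memory Requests < Actual Usage, the scheduler may place the pod on a node that later runs out of RAM, leading to an OOMKill (Out Of Memory). A container that exceeds its own memory limit is also OOMKilled, even when the node has free memory.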
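
One way to keep requests at or above real usage is to derive them from observed samples. The sketch below assumes a policy of setting the request at the P95 of usage and the limit at peak usage plus 25% headroom; that policy, the function names, and the sample data are illustrative assumptions, not a Kubernetes default.

```python
import math


def percentile(samples, pct: int):
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_memory_settings(usage_mib, request_pct: int = 95,
                            limit_headroom: float = 1.25) -> dict:
    """Suggest memory request and limit (MiB) from observed usage samples."""
    request = percentile(usage_mib, request_pct)
    limit = math.ceil(max(usage_mib) * limit_headroom)
    return {"request_mib": request, "limit_mib": limit}


usage = [200, 210, 220, 230, 240, 250, 260, 270, 280, 400]
print(suggest_memory_settings(usage))
```

With only ten samples the P95 lands on the largest value, so longer observation windows give a more useful request figure.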


6. Serverless Sizing (Lambda / Cloud Functions)

Serverless "scaling" is handled by the provider, but "sizing" (Memory allocation) is handled by you.

  • Power Tuning: In AWS Lambda, increasing Memory also increases CPU proportionally.
  • Strategy: Use AWS Lambda Power Tuning to find the "Sweet Spot" where performance and cost intersect.

| Memory (MB) | Duration (ms) | Cost ($) | Result |
|---|---|---|---|
| 128 | 1000 | 0.0000021 | Slow |
| 512 | 200 | 0.0000016 | Winner (Faster & Cheaper) |
| 1024 | 150 | 0.0000025 | Diminishing returns |
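
The cost column above follows from memory multiplied by duration. The sketch below reproduces it, assuming a compute price of $0.0000166667 per GB-second and ignoring the per-request fee; check current pricing for your region and architecture before relying on the constant. Function names are illustrative.

```python
# Assumption: on-demand compute price per GB-second; verify against current pricing.
PRICE_PER_GB_SECOND = 0.0000166667


def invocation_cost(memory_mb: int, duration_ms: float,
                    price_per_gb_second: float = PRICE_PER_GB_SECOND) -> float:
    """Compute cost of a single invocation, excluding the per-request fee."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * price_per_gb_second


def cheapest(configs):
    """Return the (memory_mb, duration_ms) pair with the lowest cost per invocation."""
    return min(configs, key=lambda cfg: invocation_cost(*cfg))


measured = [(128, 1000), (512, 200), (1024, 150)]
for memory_mb, duration_ms in measured:
    print(memory_mb, duration_ms, f"{invocation_cost(memory_mb, duration_ms):.7f}")
print("cheapest:", cheapest(measured))
```

The durations have to come from real measurements at each memory setting; the code only does the arithmetic.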

7. Network and CDN Sizing

  • Throughput: Measure P99 payload size * Peak requests per second.
  • CDN Coverage: What % of your traffic can be served from the edge?
    • Goal: > 80% Cache Hit Ratio for static assets.
    • Impact: CDN bandwidth is often 50-70% cheaper than origin egress, depending on provider, region, and volume tier.
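
The throughput and cache-hit figures combine into an estimate of what the origin must serve. This is a rough sketch that assumes payload sizes in decimal kilobytes and treats the cache hit ratio as a share of bytes rather than of requests; the function names and sample numbers are illustrative.

```python
def peak_bandwidth_mbps(p99_payload_kb: float, peak_rps: float) -> float:
    """Peak throughput in megabits per second (1 KB = 1000 bytes)."""
    return p99_payload_kb * 8 * peak_rps / 1000


def origin_bandwidth_mbps(total_mbps: float, cache_hit_ratio: float) -> float:
    """Bandwidth the origin still serves after the CDN absorbs cache hits."""
    if not 0 <= cache_hit_ratio <= 1:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_mbps * (1 - cache_hit_ratio)


total = peak_bandwidth_mbps(p99_payload_kb=50, peak_rps=2000)
print(total)  # 800.0 Mbps at peak
print(round(origin_bandwidth_mbps(total, 0.80), 1))  # 160.0 Mbps left for the origin
```

Sizing on P99 payload deliberately overestimates average traffic, which gives headroom for bursts of large responses.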

8. Load Testing for Capacity Planning

Never size based on assumptions. Use tools like k6, Locust, or JMeter.

  1. Stepping Test: Gradually increase users until latency spikes (The "Knee" of the curve).
  2. Soak Test: Run at 80% load for 24 hours to find memory leaks.
  3. Stress Test: Find the "Breaking Point" to configure failover/auto-scaling.
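
Once a stepping test has produced latency figures per load level, the knee can be located programmatically. The heuristic below is an assumption chosen for simplicity: it flags the first step where P95 latency more than doubles relative to the previous step. The input format and sample data are hypothetical and independent of any particular load-testing tool.

```python
def find_knee(steps, max_latency_growth: float = 2.0):
    """Return the last load level before latency jumps, or None if it never does.

    steps: list of (concurrent_users, p95_latency_ms) in increasing load order.
    """
    for (prev_users, prev_latency), (_, latency) in zip(steps, steps[1:]):
        if latency > prev_latency * max_latency_growth:
            return prev_users
    return None


results = [(100, 50), (200, 55), (400, 60), (800, 70), (1600, 300)]
print(find_knee(results))  # 800
```

The returned level is a candidate for the sustainable maximum; the soak test then checks whether the system can hold a fraction of it over time.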

9. Monitoring for Right-Sizing

The Dashboard Template (Grafana/Datadog)

  • CPU Heatmap: Identify idle periods (e.g., weekends).
  • RAM Saturation: Identify "Memory Bloat".
  • Disk Queue Depth: Identify IOPS bottlenecks.
  • Network In/Out: Identify efficient vs inefficient regions.

Automated Right-Sizing Tools

  • AWS Compute Optimizer: Provides JSON recommendations for instance types.
  • VPA (Vertical Pod Autoscaler): Automatically adjusts K8s requests/limits.
  • Goldilocks: A K8s dashboard that visualizes VPA recommendations.

10. Capacity Planning Template

| Component | Metric | Current Load | Growth (6mo) | Buffer | Target Spec |
|---|---|---|---|---|---|
| Web Tier | Peak Req/sec | 500 | 2x (1000) | 20% | 4x c6g.large |
| Database | Storage | 500GB | +100GB/mo | 30% | 1.5TB GP3 |
| Cache | Working Set | 8GB | 12GB | 10% | 16GB Node |
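
The rows of the template follow one pattern: project the load six months out, then add the buffer. A minimal sketch of that arithmetic, using the figures from the table; the function name is illustrative, and mapping the result to a concrete instance or volume type remains a judgment call.

```python
def required_capacity(projected_load: float, buffer: float) -> float:
    """Capacity to provision: projected load plus a fractional safety buffer."""
    return projected_load * (1 + buffer)


# Web tier: 500 req/s doubling to 1000, with a 20% buffer
print(round(required_capacity(1000, 0.20)))        # 1200 req/s
# Database: 500 GB plus 100 GB/month for 6 months, with a 30% buffer
print(round(required_capacity(500 + 100 * 6, 0.30)))  # 1430 GB, rounded up to 1.5TB
# Cache: working set growing to 12 GB, with a 10% buffer
print(round(required_capacity(12, 0.10), 1))       # 13.2 GB, fits a 16GB node
```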

11. Real Sizing Scenario: SaaS API

  • Initial Setup: 10 nodes of m5.xlarge (4 vCPU, 16GB RAM). Monthly cost: $1,400.
  • Observation: CPU average 12%, RAM average 40%.
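  • Analysis: CPU is nearly idle. Memory is the tighter of the two resources, but it is also well under capacity.
  • Action: Switched to 5 nodes of t3.large (2 vCPU, 8GB RAM each; $350/mo) + enabled Auto-scaling, after confirming that memory use fits on the smaller nodes.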
  • Result: 75% cost reduction while maintaining the same performance metrics.

Related Skills

  • 40-system-resilience/graceful-degradation
  • 42-cost-engineering/cloud-cost-models
  • 42-cost-engineering/budget-guardrails
