Install this agent skill to your Project
npx add-skill https://github.com/FortiumPartners/ai-mesh/tree/main/skills/flyio
Fly.io Infrastructure Skills
Version: 1.0.0 | Target Size: <25KB | Purpose: Fast reference for Fly.io deployments and global application distribution
Overview
What is Fly.io: Modern platform-as-a-service (PaaS) for deploying applications globally with minimal configuration. Fly.io transforms containers into micro-VMs that run on physical hardware across 30+ regions worldwide.
When to Use Fly.io:
- Simple to moderate applications requiring global distribution
- Fast deployments without complex Kubernetes orchestration
- Minimal operations overhead with PaaS simplicity
- Edge computing and low-latency requirements (anycast routing)
- Startup/SaaS applications with unpredictable traffic patterns
- Apps that need database read replicas near users (Fly Postgres: one writable primary plus read replicas in other regions)
When to Use Kubernetes Instead:
- Complex microservices architectures with 10+ interdependent services
- Existing Kubernetes expertise and tooling investment
- Hybrid cloud or multi-cloud requirements (cloud-agnostic)
- Advanced orchestration needs (service mesh, custom operators, advanced scheduling)
- Enterprise compliance requirements (HIPAA, PCI-DSS on-premises)
- Extensive third-party ecosystem integrations (Helm charts, operators)
Detection Criteria:
- Auto-loads when fly.toml is detected in the project root
- Manual: --tools=flyio flag
- Use Case: Global application deployment with PaaS simplicity
Progressive Disclosure:
- This file (SKILL.md): Quick reference for immediate use
- REFERENCE.md: Comprehensive guide with advanced patterns and deployment strategies
Table of Contents
- fly.toml Quick Reference
- Deployment Patterns
- Secrets Management
- Networking Basics
- Health Checks
- Scaling Patterns
- Common Commands Cheat Sheet
- Quick Troubleshooting
- Framework-Specific Examples
fly.toml Quick Reference
Minimal Configuration (Node.js Express)
# app name (must be globally unique on Fly.io)
app = "my-express-app"
[build]
# Dockerfile-based build (default)
dockerfile = "Dockerfile"
[env]
# Non-sensitive environment variables
NODE_ENV = "production"
PORT = "8080"
[[services]]
internal_port = 8080 # Container port
protocol = "tcp"
[[services.ports]]
handlers = ["http"]
port = 80 # External HTTP
[[services.ports]]
handlers = ["tls", "http"]
port = 443 # External HTTPS
# Health check
[[services.http_checks]]
interval = "10s"
timeout = "2s"
grace_period = "5s"
method = "GET"
path = "/health"
Key Concepts:
- app: Globally unique application name (DNS-safe)
- build: How to build your application (Dockerfile, buildpacks)
- env: Non-sensitive configuration (use secrets for sensitive data)
- services: Network services exposed to the internet
- internal_port: Port your app listens on inside container
- ports: External ports (80 HTTP, 443 HTTPS)
- http_checks: Health check configuration
Node.js (Express, Fastify, Koa)
app = "nodejs-api"
[build]
dockerfile = "Dockerfile"
[env]
NODE_ENV = "production"
PORT = "8080"
[[services]]
internal_port = 8080
protocol = "tcp"
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
[[services.ports]]
handlers = ["http"]
port = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
[[services.http_checks]]
interval = "10s"
timeout = "2s"
grace_period = "5s"
method = "GET"
path = "/health"
protocol = "http"
Dockerfile (minimal):
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 8080
CMD ["node", "server.js"]
Next.js Application
app = "nextjs-app"
[build]
dockerfile = "Dockerfile"
[env]
NODE_ENV = "production"
PORT = "3000"
[[services]]
internal_port = 3000
protocol = "tcp"
auto_stop_machines = true
auto_start_machines = true
[[services.ports]]
handlers = ["http"]
port = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
[[services.http_checks]]
interval = "15s"
timeout = "5s"
grace_period = "10s"
method = "GET"
path = "/"
Dockerfile (Next.js standalone; requires output: 'standalone' in next.config.js):
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/next.config.js ./
COPY --from=builder /app/public ./public
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
# The standalone server binds to $HOSTNAME; make it listen on all interfaces
ENV HOSTNAME=0.0.0.0
EXPOSE 3000
CMD ["node", "server.js"]
Python (Django, FastAPI, Flask)
app = "python-api"
[build]
dockerfile = "Dockerfile"
[env]
PYTHONUNBUFFERED = "1"
PORT = "8000"
[[services]]
internal_port = 8000
protocol = "tcp"
[[services.ports]]
handlers = ["http"]
port = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
[[services.http_checks]]
interval = "10s"
timeout = "2s"
grace_period = "5s"
method = "GET"
path = "/health"
Dockerfile (FastAPI example):
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Go Microservice
app = "go-service"
[build]
dockerfile = "Dockerfile"
[env]
PORT = "8080"
[[services]]
internal_port = 8080
protocol = "tcp"
auto_stop_machines = true
auto_start_machines = true
[[services.ports]]
handlers = ["http"]
port = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
[[services.tcp_checks]]
interval = "10s"
timeout = "2s"
grace_period = "5s"
Dockerfile (multi-stage Go build):
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.* ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app/server
FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=builder /app/server /app/server
EXPOSE 8080
CMD ["/app/server"]
Ruby on Rails
app = "rails-app"
[build]
dockerfile = "Dockerfile"
[env]
RAILS_ENV = "production"
PORT = "3000"
[[services]]
internal_port = 3000
protocol = "tcp"
[[services.ports]]
handlers = ["http"]
port = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
[[services.http_checks]]
interval = "10s"
timeout = "5s"
grace_period = "10s"
method = "GET"
path = "/health"
Elixir Phoenix
app = "phoenix-app"
[build]
dockerfile = "Dockerfile"
[env]
PORT = "4000"
MIX_ENV = "prod"
[deploy]
release_command = "/app/bin/migrate"
[[services]]
internal_port = 4000
protocol = "tcp"
[[services.ports]]
handlers = ["http"]
port = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
[[services.http_checks]]
interval = "10s"
timeout = "2s"
grace_period = "5s"
method = "GET"
path = "/health"
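Key Feature: release_command runs once, in a temporary machine using the new image, before any app machines are updated. If it fails, the deploy is aborted, which makes it the right place for database migrations.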
Deployment Patterns
Zero-Downtime Deployments
Rolling deploys (the default) avoid downtime as long as the app runs more than one machine and the new version passes its health checks:
[deploy]
strategy = "rolling" # Default
max_unavailable = 0.33 # 33% of machines can be down during deploy
How it works:
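- Machines are replaced in batches (at most max_unavailable of them at a time)
- Each machine in a batch receives its kill_signal (SIGINT by default) and has kill_timeout (5s by default) to finish in-flight requests
- The machine is updated to the new image and restarted
- Fly waits for its health checks to pass before moving on to the next batch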
Graceful Shutdown:
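// Express.js example: Fly sends SIGINT by default (set kill_signal in fly.toml to change it)
const express = require('express');
const app = express();
const PORT = process.env.PORT || 8080;
const server = app.listen(PORT);
function shutdown(signal) {
  console.log(`${signal} received, closing server gracefully`);
  // Stop accepting new connections; exit once in-flight requests finish
  server.close(() => {
    console.log('Server closed');
    process.exit(0);
  });
}
process.on('SIGINT', shutdown);
process.on('SIGTERM', shutdown);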
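server.close() waits for every open connection, which can take longer than Fly allows before it kills the machine (kill_timeout, 5 seconds by default). A minimal sketch, assuming the default kill_timeout, that adds a forced-exit deadline; createShutdownHandler, the 4000 ms deadline, and the injectable exit function are illustrative choices, not part of Fly or Express:

```javascript
// Sketch: drain connections on shutdown, but force an exit before Fly kills
// the machine. ASSUMPTIONS: kill_timeout is left at its 5s default, and the
// 4000ms deadline is our own choice to stay under it.
function createShutdownHandler(server, { timeoutMs = 4000, exit = (code) => process.exit(code) } = {}) {
  let shuttingDown = false;
  return function shutdown(signal) {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`${signal} received, draining connections`);
    // Fallback: force exit if in-flight requests outlive the deadline
    const timer = setTimeout(() => {
      console.error('Drain deadline reached, forcing exit');
      exit(1);
    }, timeoutMs);
    timer.unref(); // this timer alone should not keep the process alive
    // Stop accepting connections; the callback runs once open ones have closed
    server.close(() => {
      clearTimeout(timer);
      exit(0);
    });
    // Node 18.2+: drop idle keep-alive sockets so they don't hold close() open
    if (typeof server.closeIdleConnections === 'function') server.closeIdleConnections();
  };
}

module.exports = { createShutdownHandler };
```

Wire it up after app.listen() with process.on('SIGINT', shutdown) and process.on('SIGTERM', shutdown); Express's app.listen() returns the http.Server this function expects.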
Health Check Requirements:
[[services.http_checks]]
interval = "10s"
timeout = "2s"
grace_period = "5s" # Wait 5s before first check
method = "GET"
path = "/health"
Blue-Green Deployment
# Boot a full set of new machines alongside the current ones, wait for all of
# them to pass health checks, then switch traffic over and destroy the old ones.
# Requires health checks in fly.toml; not supported for machines with volumes.
fly deploy --strategy=bluegreen
# Confirm every machine runs the new version
fly status
When to use:
- Critical production deployments where every new machine must pass health checks before receiving traffic
- Changes that should not run old and new versions side by side for long (the cutover is a single switch)
- High-risk changes where a failed deploy should leave the current machines serving traffic
Canary Releases
Single-machine canary before a rolling rollout:
# Boot one machine with the new version; if it passes health checks,
# the deploy continues as a rolling update of the remaining machines.
# There is no percentage-based traffic split, and canary is not supported for machines with volumes.
fly deploy --strategy=canary
# Monitor metrics, errors, logs while the deploy progresses
fly logs
When to use:
- New features with uncertain performance impact
- Catching startup or configuration errors on one machine before touching the rest
Rollback Procedures
# View release history
fly releases
# Example output:
# VERSION STATUS DESCRIPTION
# v3 current Deploy by user@example.com
# v2 complete Deploy by user@example.com
# v1 complete Deploy by user@example.com
# Roll back to v2 by redeploying its image; `fly releases --image` shows each
# release's image reference (tags are generated per deployment, not :v2)
fly deploy --image <v2-image-ref>
# Emergency rollback: replace all machines at once without waiting for health checks
fly deploy --strategy=immediate --image <v2-image-ref>
Secrets Management
Setting Secrets
# Set single secret
fly secrets set DATABASE_URL="postgres://user:pass@host/db"
# Set multiple secrets
fly secrets set \
API_KEY="abc123" \
JWT_SECRET="xyz789" \
STRIPE_KEY="sk_live_..."
# Set secret from file
fly secrets set SSL_CERT="$(cat cert.pem)"
# Import from .env file
fly secrets import < .env
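Important: Secrets are stored encrypted and reach the app as environment variables at runtime. Setting or unsetting a secret restarts the app's machines to apply it; add --stage to fly secrets set to defer the change to the next deploy.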
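Because secrets are plain environment variables inside the machine, a missing one tends to surface later as a confusing runtime error. A minimal sketch that validates them at startup; loadConfig is a hypothetical helper and the required names are just the examples used in this section:

```javascript
// Sketch: Fly secrets arrive as environment variables. Validate the ones the
// app needs at startup and fail fast with a clear message. The required
// names are examples from this guide; adjust them for your app.
function loadConfig(env = process.env, required = ['DATABASE_URL', 'JWT_SECRET']) {
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report names only; never print secret values
    throw new Error(`Missing required secrets: ${missing.join(', ')}`);
  }
  return Object.fromEntries(required.map((name) => [name, env[name]]));
}

module.exports = { loadConfig };
```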
Multi-Environment Segregation
Development:
fly secrets set -a my-app-dev \
DATABASE_URL="postgres://dev-host/myapp_dev" \
STRIPE_KEY="sk_test_..."
Staging:
fly secrets set -a my-app-staging \
DATABASE_URL="postgres://staging-host/myapp_staging" \
STRIPE_KEY="sk_test_..."
Production:
fly secrets set -a my-app-prod \
DATABASE_URL="postgres://prod-host/myapp_prod" \
STRIPE_KEY="sk_live_..."
Best Practice: Use separate apps for each environment (-dev, -staging, -prod).
Viewing and Removing Secrets
# List secret names (values hidden)
fly secrets list
# Unset secret
fly secrets unset API_KEY
# Unset multiple secrets
fly secrets unset API_KEY JWT_SECRET
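Security Note: fly secrets list shows names and digests only; values are visible only inside a running machine's environment. Keep the authoritative copy in a password manager or secrets vault.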
Secrets Rotation Strategy
# 1. Set new secret with different name
fly secrets set NEW_DATABASE_URL="postgres://new-host/db"
# 2. Update application code to use NEW_DATABASE_URL
fly deploy
# 3. Verify application works with new secret
fly logs
# 4. Remove old secret
fly secrets unset DATABASE_URL
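# 5. Rename back (optional): fly secrets list shows only names and digests,
#    so re-enter the value instead of trying to read it back
fly secrets set DATABASE_URL="postgres://new-host/db"
# Point the code back at DATABASE_URL and deploy before removing the temporary name
fly deploy
fly secrets unset NEW_DATABASE_URL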
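Step 2 leaves the code choosing between two secret names. A minimal sketch (databaseUrl is a hypothetical helper) that prefers the new secret and falls back to the old one, so the same build keeps working at every step of the rotation:

```javascript
// Sketch for step 2: prefer the new secret, fall back to the old one, so one
// build keeps working while both names exist and after either is removed.
function databaseUrl(env = process.env) {
  const url = env.NEW_DATABASE_URL || env.DATABASE_URL;
  if (!url) {
    throw new Error('Neither NEW_DATABASE_URL nor DATABASE_URL is set');
  }
  return url;
}

module.exports = { databaseUrl };
```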
Networking Basics
Internal Service Communication
Fly.io provides internal DNS for private service communication:
<app-name>.internal
Example (microservices):
# api-service fly.toml
app = "api-service"
[[services]]
internal_port = 8080
protocol = "tcp"
# worker-service fly.toml
app = "worker-service"
[env]
API_URL = "http://api-service.internal:8080"
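Key Benefits:
- Traffic stays on Fly's private, WireGuard-encrypted network (6PN) and is never exposed publicly
- Automatic service discovery: <app-name>.internal resolves to the IPv6 addresses of the app's running machines
- No proxy in the path: the client picks an address, so there is no proxy-side load balancing and stopped machines are not woken (Flycast, <app-name>.flycast, provides both)
Requirements: both apps must be in the same organization, and the target app must listen on IPv6 (e.g. bind to ::), because 6PN addresses are IPv6-only.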
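The worker-service example above reads API_URL from [env]. A minimal sketch of the calling side; callInternal is a hypothetical helper, and the 2 s timeout is an assumption chosen so an unreachable target fails fast instead of hanging:

```javascript
// Sketch: call another app over Fly's private network. API_URL comes from
// [env] (e.g. http://api-service.internal:8080). The 2000ms timeout is an
// assumption. Requires Node 18+ for global fetch and AbortSignal.timeout.
async function callInternal(path, { env = process.env, timeoutMs = 2000 } = {}) {
  if (!env.API_URL) throw new Error('API_URL is not set');
  const url = new URL(path, env.API_URL);
  // Abort instead of hanging if the target app is stopped or unreachable
  const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  if (!res.ok) throw new Error(`${url} responded with ${res.status}`);
  return res.json();
}

module.exports = { callInternal };
```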
External Access Configuration
HTTP/HTTPS:
[[services]]
internal_port = 8080
protocol = "tcp"
[[services.ports]]
handlers = ["http"]
port = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
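Raw TCP (no handlers, so the Fly Proxy passes the TCP stream through unchanged). Shared IPv4 addresses only work for services using the http/tls handlers, so raw TCP over IPv4 needs a dedicated address (fly ips allocate-v4):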
[[services]]
internal_port = 5432
protocol = "tcp"
[[services.ports]]
port = 5432
WebSocket:
[[services]]
internal_port = 8080
protocol = "tcp"
[[services.ports]]
handlers = ["http"]
port = 80
# WebSocket support included automatically
Private Networking
Secure communication between apps:
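# List the private (6PN) IPv6 addresses of the app's machines;
# they are assigned automatically, nothing needs to be allocated
fly ips private
# Example: API app connecting to a database app in the same organization
fly ips private -a my-postgres
# The API reaches it via internal DNS at my-postgres.internal:5432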
Use Cases:
- Database connections (Postgres, Redis)
- Backend-to-backend communication
- Admin tools accessing production systems
Fly Proxy and Anycast Routing
How it works:
- User request hits nearest Fly edge location (anycast)
- Fly Proxy forwards it to the nearest healthy machine with spare capacity
- Machine processes request
- Response returns to user
Benefits:
- Lower latency: connections and TLS terminate at the edge location nearest the user
- Automatic SSL/TLS termination
- Global load balancing
- DDoS protection
Regions:
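# List available regions
fly platform regions
# Add machines in a region (Machines-based apps place capacity per region)
fly scale count 2 --region lhr # London Heathrow
# Remove machines from a region
fly scale count 0 --region lhr
# Backup regions were an Apps V1 (Nomad) feature and do not apply to Machines-based apps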
Health Checks
HTTP Health Checks
[[services.http_checks]]
interval = "10s" # Check every 10 seconds
timeout = "2s" # Timeout after 2 seconds
grace_period = "5s" # Wait 5s before first check (app startup)
method = "GET"
path = "/health"
protocol = "http"
tls_skip_verify = false
headers = {}
Health Check Endpoint (Express.js):
app.get('/health', (req, res) => {
// Check database connection
db.ping()
.then(() => res.status(200).json({ status: 'healthy' }))
.catch(() => res.status(503).json({ status: 'unhealthy' }));
});
TCP Health Checks
For non-HTTP services (databases, caches):
[[services.tcp_checks]]
interval = "10s"
timeout = "2s"
grace_period = "5s"
When to use:
- Redis, Postgres, MongoDB
- gRPC services
- Custom TCP protocols
Comprehensive Health Check Endpoint
Use an HTTP endpoint that runs several dependency checks and reports which one failed:
app.get('/health', async (req, res) => {
const checks = {
database: false,
redis: false,
external_api: false
};
try {
// Database check
await db.query('SELECT 1');
checks.database = true;
// Redis check
await redis.ping();
checks.redis = true;
// External API check (optional)
await axios.get('https://api.example.com/ping', { timeout: 1000 });
checks.external_api = true;
// All checks passed
res.status(200).json({ status: 'healthy', checks });
} catch (error) {
res.status(503).json({ status: 'unhealthy', checks, error: error.message });
}
});
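The handler above returns 503 whenever the optional external API is down, which marks otherwise healthy machines as failing. A sketch that runs checks in parallel with Promise.allSettled and only fails on critical dependencies; runHealthChecks and the { critical, run } shape are hypothetical:

```javascript
// Sketch: run dependency checks in parallel and report each one, but only
// let *critical* failures turn the response into a 503. The check functions
// you pass in (db, redis, external API) are placeholders for your own.
async function runHealthChecks(checks) {
  const names = Object.keys(checks);
  // Promise.resolve().then(...) also captures checks that throw synchronously
  const results = await Promise.allSettled(
    names.map((name) => Promise.resolve().then(() => checks[name].run()))
  );
  const report = {};
  let healthy = true;
  results.forEach((result, i) => {
    const name = names[i];
    if (result.status === 'fulfilled') {
      report[name] = 'ok';
    } else {
      const reason = result.reason instanceof Error ? result.reason.message : result.reason;
      report[name] = `failed: ${reason}`;
      if (checks[name].critical) healthy = false;
    }
  });
  return {
    statusCode: healthy ? 200 : 503,
    body: { status: healthy ? 'healthy' : 'unhealthy', checks: report },
  };
}

module.exports = { runHealthChecks };
```

In Express, call it from the /health route (for example with database: { critical: true, run: () => db.query('SELECT 1') } and external_api marked critical: false) and reply with res.status(statusCode).json(body).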
Zero-Downtime Deployment Health Checks
Requirements:
- grace_period: Allow app startup time
- Fast response: <2s timeout (avoid slow queries)
- Critical dependencies only: Don't fail on non-critical services
[[services.http_checks]]
interval = "10s"
timeout = "2s"
grace_period = "10s" # Allow 10s for app startup
method = "GET"
path = "/health"
Best Practice:
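// Fast health check (critical dependencies only)
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1'); // Database only
    res.status(200).json({ status: 'ok' });
  } catch (error) {
    res.status(503).json({ status: 'error' });
  }
});
// Comprehensive readiness check (separate endpoint, not used for Fly health checks)
app.get('/ready', async (req, res) => {
  try {
    // Check all dependencies: database, caches, external APIs, etc.
    await Promise.all([db.query('SELECT 1'), redis.ping()]);
    res.status(200).json({ status: 'ready' });
  } catch (error) {
    res.status(503).json({ status: 'not ready', error: error.message });
  }
});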
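To keep /health inside the 2s check timeout even when a dependency hangs, each check can race against a deadline. A minimal sketch; withTimeout is a hypothetical helper and the 1500 ms default is an assumption that leaves headroom for the rest of the handler:

```javascript
// Sketch: bound each dependency check so /health answers within the check
// timeout configured in fly.toml (2s in these examples). The 1500ms default
// is an assumption that leaves headroom for the rest of the handler.
function withTimeout(promise, ms = 1500, label = 'check') {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Settle with whichever finishes first, then always clear the timer
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

module.exports = { withTimeout };
```

Usage: await withTimeout(db.query('SELECT 1'), 1500, 'database'). Note that timing out only stops waiting; it does not cancel the underlying query.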
Scaling Patterns
Horizontal Scaling
# Scale to 3 machines
fly scale count 3
# Scale per region
fly scale count 2 --region iad # US East
fly scale count 2 --region lhr # London
# View current scale
fly status
When to scale:
- Increased traffic (CPU >70%, memory >80%)
- Global distribution requirements
- High availability (multiple machines per region)
Auto-Scaling Configuration
[[services]]
internal_port = 8080
protocol = "tcp"
auto_stop_machines = true # Stop idle machines
auto_start_machines = true # Start on traffic
min_machines_running = 1 # Always keep 1 running
[services.concurrency]
type = "requests"
soft_limit = 200 # Above this, the proxy prefers other machines and may start a stopped one
hard_limit = 250 # The proxy never sends more than this many concurrent requests to one machine
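How it works:
- Traffic increases → the proxy starts machines that were stopped earlier; it never creates new ones, so fly scale count sets the ceiling
- Traffic decreases → idle machines are stopped, down to min_machines_running
- Cost savings: stopped machines are not billed for CPU and RAM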
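The interplay of soft_limit, hard_limit and auto_start_machines can be pictured with a deliberately simplified model. This is NOT how Fly Proxy is implemented (its real balancing also weighs region and latency); it only shows the order in which the limits come into play, and the machine objects are invented for the example:

```javascript
// TOY MODEL, not Fly Proxy's actual algorithm: illustrates how soft_limit,
// hard_limit and auto_start_machines interact when routing one request.
// The machine shape ({ id, state, inFlight }) is invented for the example.
function routeRequest(machines, { softLimit, hardLimit }) {
  const byLoad = (a, b) => a.inFlight - b.inFlight;
  const started = machines.filter((m) => m.state === 'started');

  // 1. A running machine below soft_limit has spare capacity: use the least loaded
  const spare = started.filter((m) => m.inFlight < softLimit).sort(byLoad);
  if (spare.length > 0) return { action: 'route', id: spare[0].id };

  // 2. Every running machine is at soft_limit: wake a stopped one (auto_start_machines)
  const stopped = machines.find((m) => m.state === 'stopped');
  if (stopped) return { action: 'start', id: stopped.id };

  // 3. Nothing left to start: keep routing until a machine reaches hard_limit
  const belowHard = started.filter((m) => m.inFlight < hardLimit).sort(byLoad);
  if (belowHard.length > 0) return { action: 'route', id: belowHard[0].id };

  // 4. All machines at hard_limit: the request waits for capacity
  return { action: 'queue', id: null };
}

module.exports = { routeRequest };
```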
Best Practice:
- Set min_machines_running = 1 for production (avoid cold starts)
- Set min_machines_running = 0 for dev/staging (cost optimization)
Regional Distribution
# Add machines in regions near your users (one command per region)
fly scale count 1 --region iad # US East
fly scale count 1 --region lhr # London
fly scale count 1 --region nrt # Tokyo
fly scale count 1 --region syd # Sydney
# View machines per region
fly status
# Remove a region by scaling it to zero
fly scale count 0 --region nrt
Strategy:
- Start with 1-2 regions near users
- Expand based on latency metrics
- Run machines in at least two regions for disaster recovery
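Fly sets FLY_REGION and FLY_MACHINE_ID inside every Machine. A small sketch that labels log lines with them, which helps compare latency per region before adding or removing one; logPrefix and its output format are our own choices:

```javascript
// Sketch: label log lines with the region and machine serving them, using
// FLY_REGION and FLY_MACHINE_ID, which Fly sets inside each Machine.
// Outside Fly (local dev) both fall back to "local".
function logPrefix(env = process.env) {
  const region = env.FLY_REGION || 'local';
  const machine = env.FLY_MACHINE_ID || 'local';
  return `[region=${region} machine=${machine}]`;
}

module.exports = { logPrefix };
```

For example, console.log(logPrefix(), 'GET /api 200 42ms') next to each request's duration.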
Resource Limits (CPU, Memory)
# View available machine types
fly platform vm-sizes
# Scale machine size
fly scale vm shared-cpu-1x # 1 shared CPU, 256MB RAM
fly scale vm shared-cpu-2x # 2 shared CPUs, 512MB RAM
fly scale vm performance-1x # 1 dedicated CPU, 2GB RAM
# Custom memory
fly scale memory 512 # 512MB RAM
Machine Types:
- shared-cpu-1x: Small apps, low traffic (256MB)
- shared-cpu-2x: Medium apps (512MB)
- performance-1x: Production apps needing consistent performance (1 dedicated CPU, 2GB)
- performance-2x: High-traffic apps (2 dedicated CPUs, 4GB)
Common Commands Cheat Sheet
Deployment
fly deploy # Deploy application
fly deploy --strategy rolling # Rolling deployment (zero downtime)
fly deploy --strategy immediate # Replace all machines at once, no health-check wait (may cause downtime)
fly deploy --remote-only # Build on Fly.io (not locally)
fly deploy --no-cache # Rebuild without cache
Scaling
fly scale count 3 # Scale to 3 machines
fly scale vm shared-cpu-2x # Change machine type
fly scale memory 512 # Set memory to 512MB
fly scale count 2 --region iad # Run 2 machines in a region
fly scale count 0 --region iad # Remove machines from a region
# Autoscaling: set auto_stop_machines / auto_start_machines / min_machines_running in fly.toml
Secrets
fly secrets set KEY=value # Set secret
fly secrets list # List secret names
fly secrets unset KEY # Remove secret
fly secrets import < .env # Import from file
Monitoring
fly logs # View logs
fly logs -a my-app # Logs for specific app
fly logs --region iad # Logs for specific region
fly status # Application status
fly machine list # Machines with their state and region
fly checks list # Health check status
Debugging
fly ssh console # SSH into machine
fly ssh console -C "ps aux" # Run command via SSH
fly ssh sftp shell # SFTP access
fly proxy 5432 # Port forward to local machine
fly doctor # Diagnose issues
App Management
fly apps list # List all apps
fly apps create my-app # Create new app
fly apps destroy my-app # Delete app
fly open # Open app in browser
fly info # Application info
Quick Troubleshooting
Top 10 Common Issues
- Deployment fails → Check health checks: fly logs, adjust grace_period
- App not starting → Review logs: fly logs, check Dockerfile CMD/ENTRYPOINT
- Network timeout → Verify internal_port matches the port the app listens on, and that the app binds to 0.0.0.0 rather than 127.0.0.1
- Database connection fails → Check secrets: fly secrets list, verify DATABASE_URL
- High memory usage → Scale machine: fly scale vm shared-cpu-2x, optimize app
- SSL certificate issues → Verify domain: fly certs show example.com, add certificate: fly certs add example.com
- Regional latency → Add machines near users: fly scale count 1 --region lhr, monitor with fly logs
- Health check failures → Adjust grace period: grace_period = "10s", simplify health endpoint
- Build failures → Review Dockerfile: fly deploy --local-only, check dependencies
- Cost concerns → Right-size machines: fly scale vm shared-cpu-1x, enable auto-stop: auto_stop_machines = true
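Network timeouts and failing health checks often come down to the app listening on the wrong port or only on 127.0.0.1. A minimal sketch with Node's http module, assuming PORT is set in [env] to match internal_port (startServer is a hypothetical helper):

```javascript
// Sketch: listen on the port fly.toml's internal_port points at, on all
// IPv4 interfaces. ASSUMPTION: PORT is set in [env], as in the examples above.
const http = require('http');

function startServer(handler, env = process.env) {
  const port = Number(env.PORT ?? 8080);
  const server = http.createServer(handler);
  // '0.0.0.0', not '127.0.0.1': the Fly Proxy does not connect over loopback.
  // Use '::' instead if other apps must reach this one via <app>.internal (IPv6).
  server.listen(port, '0.0.0.0', () => {
    console.log(`Listening on 0.0.0.0:${server.address().port}`);
  });
  return server;
}

module.exports = { startServer };
```

With Express, the equivalent is app.listen(Number(process.env.PORT ?? 8080), '0.0.0.0').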
Deployment Troubleshooting
Failed health checks:
# View health check status
fly checks list
# Increase grace period
# fly.toml
[[services.http_checks]]
grace_period = "15s" # Increase from 5s
# Check logs during deployment
fly logs --region iad
Build failures:
# Build locally to debug
fly deploy --local-only
# Clear cache and rebuild
fly deploy --no-cache
# Build output streams to the terminal during fly deploy; for more detail:
LOG_LEVEL=debug fly deploy
Runtime Troubleshooting
High memory usage:
# Check memory from inside a machine (or use the app's metrics dashboard)
fly ssh console -C "cat /proc/meminfo"
# Scale up memory
fly scale memory 1024
# Or upgrade machine type
fly scale vm performance-1x
Connection issues:
# Test internal networking
fly ssh console -C "curl http://other-app.internal:8080/health"
# Check external connectivity
fly ssh console -C "curl https://api.example.com"
# Port forward for local debugging
fly proxy 8080
Logs not appearing:
# fly logs only shows what the app writes to stdout/stderr, not log files
fly logs --region iad
# If the app writes to a file instead, read it on the machine
fly ssh console -C "tail -n 100 /var/log/app.log"
Framework-Specific Examples
NestJS Backend
app = "nestjs-api"
[build]
dockerfile = "Dockerfile"
[env]
NODE_ENV = "production"
PORT = "3000"
[[services]]
internal_port = 3000
protocol = "tcp"
[[services.ports]]
handlers = ["http"]
port = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
[[services.http_checks]]
interval = "10s"
timeout = "2s"
grace_period = "10s"
method = "GET"
path = "/health"
NestJS Health Check:
// health.controller.ts (register it in a module's controllers array)
import { Controller, Get } from '@nestjs/common';
@Controller('health')
export class HealthController {
  @Get()
  check() {
    return { status: 'ok', timestamp: Date.now() };
  }
}
Phoenix LiveView
app = "phoenix-liveview"
[build]
dockerfile = "Dockerfile"
[env]
PORT = "4000"
MIX_ENV = "prod"
# SECRET_KEY_BASE is sensitive: set it with fly secrets set SECRET_KEY_BASE=..., never in [env]
[deploy]
release_command = "/app/bin/migrate"
[[services]]
internal_port = 4000
protocol = "tcp"
[[services.ports]]
handlers = ["http"]
port = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
Phoenix Health Check:
# router.ex (inside the `scope "/", MyAppWeb do` block)
get "/health", HealthController, :check
# health_controller.ex
defmodule MyAppWeb.HealthController do
  use MyAppWeb, :controller
  def check(conn, _params) do
    json(conn, %{status: "ok"})
  end
end
Rails Backend
app = "rails-api"
[build]
dockerfile = "Dockerfile"
[env]
RAILS_ENV = "production"
PORT = "3000"
[deploy]
release_command = "bin/rails db:migrate"
[[services]]
internal_port = 3000
protocol = "tcp"
[[services.ports]]
handlers = ["http"]
port = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
Next Steps
For Advanced Patterns:
- See REFERENCE.md for comprehensive guide with deployment strategies, Fly Postgres, Redis, monitoring
- Covers: Multi-region databases, CI/CD integration, metrics, logging, cost optimization
Common Use Cases:
- Multi-region apps → REFERENCE.md § Global Distribution
- Database setup → REFERENCE.md § Fly Postgres
- CI/CD pipelines → REFERENCE.md § GitHub Actions Integration
- Production monitoring → REFERENCE.md § Observability
Progressive Disclosure: Start here for quick reference, load REFERENCE.md for comprehensive patterns and production examples.
Performance Target: <100ms skill loading (this file ~22KB)
Last Updated: 2025-10-25 | Version: 1.0.0