---
name: devops-patterns
description: DevOps patterns including CI/CD pipeline design, GitHub Actions, Infrastructure as Code, Docker, Kubernetes, deployment strategies, monitoring, and disaster recovery. Use when setting up CI/CD, deploying applications, managing infrastructure, or creating pipelines.
---

# DevOps Patterns

This skill provides comprehensive guidance for implementing DevOps practices, automation, and deployment strategies.

## CI/CD Pipeline Design

### Pipeline Stages

```yaml
# Complete CI/CD Pipeline
stages:
  - lint           # Code quality checks
  - test           # Run test suite
  - build          # Build artifacts
  - scan           # Security scanning
  - deploy-dev     # Deploy to development
  - deploy-staging # Deploy to staging
  - deploy-prod    # Deploy to production
```

### Pipeline Best Practices

1. **Fast Feedback**: Run the fastest checks first.

```yaml
jobs:
  # Quick checks first (1-2 minutes)
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm run lint

  type-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm run type-check

  # Longer tests after (5-10 minutes)
  test:
    needs: [lint, type-check]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm test
```

2. **Fail Fast**: Stop the pipeline on the first failure.
3. **Idempotent**: Running the pipeline twice produces the same result.
4. **Versioned**: Keep pipeline configuration in version control.
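One way to reinforce fail-fast behavior in GitHub Actions is a `concurrency` group that cancels superseded runs for the same branch (a sketch; the group expression is one common convention):

```yaml
# Cancel in-progress runs for the same workflow and branch
# when a newer commit arrives
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```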

## GitHub Actions Patterns

### Basic Workflow Structure

```yaml
# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  NODE_VERSION: '18'

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/coverage-final.json
```

### Reusable Workflows

```yaml
# .github/workflows/reusable-test.yml
name: Reusable Test Workflow

on:
  workflow_call:
    inputs:
      node-version:
        required: true
        type: string
    secrets:
      DATABASE_URL:
        required: true

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: ${{ inputs.node-version }}
      - run: npm ci
      - run: npm test
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
```

```yaml
# Use in another workflow
# .github/workflows/main.yml
jobs:
  call-test:
    uses: ./.github/workflows/reusable-test.yml
    with:
      node-version: '18'
    secrets:
      DATABASE_URL: ${{ secrets.DATABASE_URL }}
```

### Matrix Strategy

```yaml
# Test across multiple versions and operating systems
jobs:
  test:
    strategy:
      matrix:
        node-version: [16, 18, 20]
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm test
```
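The matrix can be pruned. A sketch using `exclude` to skip one combination and `fail-fast` to cancel the remaining matrix jobs once any job fails:

```yaml
strategy:
  fail-fast: true   # Cancel remaining matrix jobs on first failure
  matrix:
    node-version: [16, 18, 20]
    os: [ubuntu-latest, windows-latest]
    exclude:
      - os: windows-latest
        node-version: 16   # Skip this combination
```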

### Custom Actions

```yaml
# .github/actions/deploy/action.yml
name: 'Deploy Application'
description: 'Deploy to specified environment'
inputs:
  environment:
    description: 'Target environment'
    required: true
  api-key:
    description: 'Deployment API key'
    required: true

runs:
  using: 'composite'
  steps:
    - run: |
        echo "Deploying to ${{ inputs.environment }}"
        ./deploy.sh ${{ inputs.environment }}
      env:
        API_KEY: ${{ inputs.api-key }}
      shell: bash
```

```yaml
# Usage (checkout is required so the local action is available)
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: ./.github/actions/deploy
        with:
          environment: production
          api-key: ${{ secrets.DEPLOY_KEY }}
```

### Conditional Execution

```yaml
jobs:
  deploy:
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to production
        run: ./deploy.sh production

  notify:
    needs: deploy
    if: failure()
    runs-on: ubuntu-latest
    steps:
      - name: Send failure notification
        uses: slack/notify@v2  # Placeholder: substitute your notification action
        with:
          message: 'Build failed!'
```

## Infrastructure as Code (Terraform)

### Project Structure

```
terraform/
├── modules/
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── eks/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   │   ├── main.tf
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       └── terraform.tfvars
└── global/
    └── s3/
        └── main.tf
```

### VPC Module Example

```hcl
# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.environment}-vpc"
    Environment = var.environment
  }
}

resource "aws_subnet" "public" {
  count             = length(var.public_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.public_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.environment}-public-${count.index + 1}"
  }
}

# modules/vpc/variables.tf
variable "environment" {
  description = "Environment name"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
}

variable "public_subnet_cidrs" {
  description = "CIDR blocks for public subnets"
  type        = list(string)
}

variable "availability_zones" {
  description = "Availability zones"
  type        = list(string)
}

# modules/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}
```

### Using Modules

```hcl
# environments/prod/main.tf
terraform {
  required_version = ">= 1.0"

  backend "s3" {
    bucket = "my-terraform-state"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

module "vpc" {
  source = "../../modules/vpc"

  environment         = "prod"
  vpc_cidr            = "10.0.0.0/16"
  public_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24"]
  availability_zones  = ["us-east-1a", "us-east-1b"]
}

module "eks" {
  source = "../../modules/eks"

  cluster_name       = "prod-cluster"
  vpc_id             = module.vpc.vpc_id
  subnet_ids         = module.vpc.public_subnet_ids
  node_count         = 3
  node_instance_type = "t3.large"
}
```
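For shared state, the S3 backend should also enable encryption and locking. A sketch of the backend block above extended with both; the `terraform-locks` DynamoDB table is a hypothetical name and must exist beforehand (with a `LockID` string partition key):

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true              # Encrypt state at rest
    dynamodb_table = "terraform-locks" # Hypothetical lock table
  }
}
```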

## Docker Best Practices

### Multi-Stage Builds

```dockerfile
# Build stage
FROM node:18-alpine AS builder

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install all dependencies (dev dependencies are needed for the build)
RUN npm ci

# Copy source code
COPY . .

# Build application
RUN npm run build

# Production stage
FROM node:18-alpine AS production

WORKDIR /app

# Install production dependencies only
COPY package*.json ./
RUN npm ci --omit=dev

# Copy built output from the builder
COPY --from=builder /app/dist ./dist

# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001

USER nodejs

EXPOSE 3000

CMD ["node", "dist/index.js"]
```
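A container healthcheck can be added to the production stage so orchestrators detect a wedged process. A sketch, assuming the app serves a `/health` endpoint on port 3000 (busybox `wget` ships with Alpine):

```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1
```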

### Layer Optimization

```dockerfile
# ✅ GOOD - Dependencies cached separately
FROM node:18-alpine

WORKDIR /app

# Copy package files first (rarely change)
COPY package*.json ./
RUN npm ci

# Copy source code (changes frequently)
COPY . .
RUN npm run build
```

```dockerfile
# ❌ BAD - Everything in one layer
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm ci && npm run build
# Cache invalidated on every source change
```

### Security Best Practices

```dockerfile
# ✅ Use specific versions
FROM node:18.17.1-alpine

# ✅ Run as non-root user
RUN addgroup -g 1001 nodejs && \
    adduser -S nodejs -u 1001
USER nodejs

# ✅ Use .dockerignore
# .dockerignore:
#   node_modules
#   .git
#   .env
#   *.md
#   .github

# ✅ Scan for vulnerabilities
# docker scan myapp:latest

# ✅ Use minimal base images
FROM node:18-alpine  # Not node:18 (full)

# ❌ Don't bake secrets into the image
# ARG and ENV values persist in image layers and history:
#   ARG API_KEY
#   ENV API_KEY=${API_KEY}
# Pass secrets at runtime (or use BuildKit secret mounts) instead.
```
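When a secret is genuinely needed at build time (e.g. a private registry token), BuildKit secret mounts expose it to a single `RUN` step without writing it into any layer. A sketch; the `npm_token` id is arbitrary:

```dockerfile
# syntax=docker/dockerfile:1
# The secret is mounted only for this step and not stored in the image.
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci
```

Built with e.g. `docker build --secret id=npm_token,src=./npm_token.txt .`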

### Docker Compose for Development

```yaml
# docker-compose.yml
version: '3.8'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile.dev
    ports:
      - '3000:3000'
    volumes:
      - .:/app
      - /app/node_modules
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://user:pass@db:5432/mydb
    depends_on:
      - db
      - redis

  db:
    image: postgres:15-alpine
    ports:
      - '5432:5432'
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=mydb
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    ports:
      - '6379:6379'

volumes:
  postgres_data:
```
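Note that plain `depends_on` only waits for the container to start, not for Postgres to accept connections. A sketch extending the `db` and `app` services above with a readiness gate:

```yaml
services:
  db:
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U user -d mydb']
      interval: 5s
      timeout: 3s
      retries: 5

  app:
    depends_on:
      db:
        condition: service_healthy  # Wait until Postgres is ready
```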

## Kubernetes Patterns

### Deployment

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:1.0.0
        ports:
        - containerPort: 3000
        env:
        - name: NODE_ENV
          value: production
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: myapp-secrets
              key: database-url
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
```
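Because the Deployment sets CPU requests, a HorizontalPodAutoscaler can scale it on utilization. A sketch (the replica bounds and target are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Scale up above 70% of requested CPU
```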

### Service

```yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
  type: LoadBalancer
```

### ConfigMap and Secrets

```yaml
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
data:
  LOG_LEVEL: info
  MAX_CONNECTIONS: "100"
```

```yaml
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secrets
type: Opaque
data:
  database-url: cG9zdGdyZXNxbDovL3VzZXI6cGFzc0BkYjU0MzIvbXlkYg==
  api-key: c2tfbGl2ZV9hYmMxMjN4eXo=
```
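Secret `data` values are base64-encoded, not encrypted, so anyone with read access to the manifest can recover them. A sketch of producing an encoded value for the manifest (`printf` avoids a trailing newline sneaking into the secret):

```shell
db_url='postgresql://user:pass@db:5432/mydb'
encoded=$(printf '%s' "$db_url" | base64 | tr -d '\n')
echo "$encoded"
```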

### Ingress

```yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-service
            port:
              number: 80
```

## Deployment Strategies

### Blue-Green Deployment

```yaml
# Blue deployment (current production)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: myapp
        image: myapp:1.0.0

---
# Green deployment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: myapp
        image: myapp:2.0.0

---
# Service (switch by changing selector)
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
    version: blue  # Change to 'green' to switch
  ports:
  - port: 80
    targetPort: 3000
```
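The cutover is then a one-line selector change. One way to do it from the CLI (a sketch; patching back to `blue` rolls back instantly):

```bash
# Point the Service at the green pods
kubectl patch service myapp-service \
  -p '{"spec":{"selector":{"app":"myapp","version":"green"}}}'
```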

### Canary Deployment

```yaml
# Stable deployment (roughly 90% of pods)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
      - name: myapp
        image: myapp:1.0.0

---
# Canary deployment (roughly 10% of pods)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
    spec:
      containers:
      - name: myapp
        image: myapp:2.0.0

---
# Service routes to both
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp  # Matches both stable and canary pods
  ports:
  - port: 80
    targetPort: 3000
```

### Rolling Update

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # Max 2 extra pods during update
      maxUnavailable: 1  # Max 1 pod unavailable during update
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:2.0.0
```
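A PodDisruptionBudget complements the rolling-update settings by bounding voluntary disruptions (node drains, cluster upgrades) as well. A sketch for the Deployment above:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 8   # Of the 10 replicas above
  selector:
    matchLabels:
      app: myapp
```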

## Database Migration Strategies

### Forward-Only Migrations

```typescript
// ✅ GOOD - Backwards compatible
// Step 1: Add new column (nullable)
await db.schema.alterTable('users', (table) => {
  table.string('phone_number').nullable();
});

// Step 2: Populate data
await db('users').update({
  phone_number: db.raw('contact_info'),
});

// Step 3: Make non-nullable (separate deployment)
await db.schema.alterTable('users', (table) => {
  table.string('phone_number').notNullable().alter();
});

// Step 4: Drop old column (separate deployment)
await db.schema.alterTable('users', (table) => {
  table.dropColumn('contact_info');
});
```

### Zero-Downtime Migrations

```typescript
// Rename column without downtime

// Migration 1: Add new column
await db.schema.alterTable('users', (table) => {
  table.string('email_address').nullable();
});

// Update application code to write to both columns
class User {
  async save() {
    await db('users').update({
      email: this.email,
      email_address: this.email, // Write to both
    });
  }
}

// Migration 2: Backfill data
await db.raw(`
  UPDATE users
  SET email_address = email
  WHERE email_address IS NULL
`);

// Migration 3: Update app to read from new column
class User {
  get email() {
    return this.email_address; // Read from new column
  }
}

// Migration 4: Drop old column
await db.schema.alterTable('users', (table) => {
  table.dropColumn('email');
});
```

## Environment Management

### Environment Configuration

```typescript
// config/environments.ts
interface EnvironmentConfig {
  database: {
    host: string;
    port: number;
    name: string;
  };
  api: {
    baseUrl: string;
    timeout: number;
  };
  features: {
    enableNewFeature: boolean;
  };
}

const environments: Record<string, EnvironmentConfig> = {
  development: {
    database: {
      host: 'localhost',
      port: 5432,
      name: 'myapp_dev',
    },
    api: {
      baseUrl: 'http://localhost:3000',
      timeout: 30000,
    },
    features: {
      enableNewFeature: true,
    },
  },
  staging: {
    database: {
      host: 'staging-db.example.com',
      port: 5432,
      name: 'myapp_staging',
    },
    api: {
      baseUrl: 'https://staging-api.example.com',
      timeout: 10000,
    },
    features: {
      enableNewFeature: true,
    },
  },
  production: {
    database: {
      host: process.env.DB_HOST!,
      port: parseInt(process.env.DB_PORT!, 10),
      name: 'myapp_prod',
    },
    api: {
      baseUrl: 'https://api.example.com',
      timeout: 5000,
    },
    features: {
      enableNewFeature: false,
    },
  },
};

export const config = environments[process.env.NODE_ENV || 'development'];
```
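The production block above relies on `process.env.DB_HOST!`, which silently yields `undefined` at runtime if the variable is unset. A minimal sketch (the helper name is illustrative) that fails fast at startup instead:

```typescript
// Fail fast at startup when a required variable is missing,
// rather than crashing later with an undefined value.
export function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```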

## Monitoring

### Prometheus Metrics

```typescript
import express from 'express';
import prometheus from 'prom-client';

const app = express();

// Create metrics
const httpRequestDuration = new prometheus.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
});

const httpRequestTotal = new prometheus.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status'],
});

// Middleware to track metrics
app.use((req, res, next) => {
  const start = Date.now();

  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;

    httpRequestDuration
      .labels(req.method, req.route?.path || req.path, res.statusCode.toString())
      .observe(duration);

    httpRequestTotal
      .labels(req.method, req.route?.path || req.path, res.statusCode.toString())
      .inc();
  });

  next();
});

// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', prometheus.register.contentType);
  res.end(await prometheus.register.metrics());
});
```

### Grafana Dashboard

```json
{
  "dashboard": {
    "title": "Application Metrics",
    "panels": [
      {
        "title": "Request Rate",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])"
          }
        ]
      },
      {
        "title": "Response Time (p95)",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, http_request_duration_seconds_bucket)"
          }
        ]
      },
      {
        "title": "Error Rate",
        "targets": [
          {
            "expr": "rate(http_requests_total{status=~\"5..\"}[5m])"
          }
        ]
      }
    ]
  }
}
```

### Log Aggregation

```typescript
// Winston logger with JSON format
import winston from 'winston';

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: {
    service: 'myapp',
    environment: process.env.NODE_ENV,
  },
  transports: [
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    new winston.transports.File({ filename: 'combined.log' }),
  ],
});

// Structured logging
logger.info('User logged in', {
  userId: user.id,
  email: user.email,
  ip: req.ip,
});
```

## Disaster Recovery

### Backup Strategy

```bash
#!/bin/bash
# backup-database.sh
set -euo pipefail  # Abort on errors and unset variables

# Configuration
DB_HOST="${DB_HOST}"
DB_NAME="${DB_NAME}"
BACKUP_DIR="/backups"
S3_BUCKET="s3://my-backups"
RETENTION_DAYS=30

# Create backup
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${TIMESTAMP}.sql.gz"

# Dump database
pg_dump -h "${DB_HOST}" -U postgres "${DB_NAME}" | gzip > "${BACKUP_FILE}"

# Upload to S3
aws s3 cp "${BACKUP_FILE}" "${S3_BUCKET}/"

# Remove local backup
rm "${BACKUP_FILE}"

# Delete old backups from S3
aws s3 ls "${S3_BUCKET}/" | while read -r line; do
  FILE_DATE=$(echo "$line" | awk '{print $1}')
  FILE_NAME=$(echo "$line" | awk '{print $4}')

  FILE_EPOCH=$(date -d "$FILE_DATE" +%s)
  CURRENT_EPOCH=$(date +%s)
  DAYS_OLD=$(( (CURRENT_EPOCH - FILE_EPOCH) / 86400 ))

  if [ "$DAYS_OLD" -gt "$RETENTION_DAYS" ]; then
    aws s3 rm "${S3_BUCKET}/${FILE_NAME}"
  fi
done
```
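The manual retention loop can also be replaced by an S3 lifecycle rule, which expires objects server-side. A sketch of the configuration (rule ID is arbitrary), applied with `aws s3api put-bucket-lifecycle-configuration`:

```json
{
  "Rules": [
    {
      "ID": "expire-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 30 }
    }
  ]
}
```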

### Recovery Plan

````markdown
## Disaster Recovery Plan

### RTO (Recovery Time Objective): 4 hours
### RPO (Recovery Point Objective): 1 hour

### Recovery Steps:

1. **Assess the situation**
   - Identify scope of failure
   - Notify stakeholders

2. **Restore database**
   ```bash
   # Download latest backup
   aws s3 cp s3://my-backups/latest.sql.gz /tmp/

   # Restore database
   gunzip -c /tmp/latest.sql.gz | psql -h new-db -U postgres myapp
   ```

3. **Deploy application**
   ```bash
   # Deploy to new infrastructure
   kubectl apply -f k8s/production/

   # Update DNS
   aws route53 change-resource-record-sets ...
   ```

4. **Verify recovery**
   - Run smoke tests
   - Check monitoring dashboards
   - Verify critical features

5. **Post-mortem**
   - Document incident
   - Identify root cause
   - Create action items
````

## When to Use This Skill

Use this skill when:
- Setting up CI/CD pipelines
- Deploying applications
- Managing infrastructure
- Implementing deployment strategies
- Configuring monitoring
- Planning disaster recovery
- Containerizing applications
- Orchestrating with Kubernetes
- Automating workflows
- Scaling infrastructure

---

**Remember**: DevOps is about automation, reliability, and continuous improvement. Invest in your infrastructure and deployment processes to enable faster, safer releases.
