Agent skills
ops-platform-onboarding

Agent skill

ops-platform-onboarding

Structured workflow for onboarding new services to the internal platform including infrastructure provisioning, observability setup, and documentation.

View SKILL.md on GitHub Repository

Stars 169

Forks 18

Install this agent skill to your Project

npx add-skill https://github.com/LerianStudio/ring/tree/main/.archive/ops-team/skills/ops-platform-onboarding

SKILL.md

Platform Onboarding Workflow

This skill defines the structured process for onboarding services to the internal developer platform. Use it to ensure consistent, compliant service deployments.

Onboarding Phases

Phase	Focus	Output
1. Requirements	Gather service requirements	Requirements doc
2. Golden Path Selection	Choose deployment pattern	Selected template
3. Infrastructure Provisioning	Create service resources	Infrastructure ready
4. Observability Setup	Configure monitoring	Dashboards/alerts
5. Security Configuration	Apply security controls	Security validated
6. Documentation	Complete service docs	Runbook ready
7. Handoff	Transfer to service team	Ownership confirmed

Phase 1: Requirements Gathering

Service Requirements Checklist

markdown

## Service Onboarding Request

**Service Name:** [name]
**Team:** [owning team]
**Requested By:** [name]
**Target Date:** YYYY-MM-DD

### Service Information

| Attribute | Value |
|-----------|-------|
| Service type | [API / Worker / Batch / Frontend] |
| Language/runtime | [Go / Node.js / Python / etc.] |
| Criticality | [Tier 1/2/3/4] |
| External traffic | [Yes / No] |
| Data sensitivity | [PII / Financial / Public] |

### Resource Requirements

| Resource | Requirement | Notes |
|----------|-------------|-------|
| CPU | [cores] | [peak/average] |
| Memory | [GB] | [peak/average] |
| Storage | [GB] | [type: SSD/HDD] |
| Database | [type] | [shared/dedicated] |
| Cache | [type] | [shared/dedicated] |

### Dependencies

| Dependency | Type | SLA Required |
|------------|------|--------------|
| [service] | Internal | [Yes/No] |
| [external] | External | [Yes/No] |

### Compliance Requirements

- [ ] SOC2
- [ ] PCI-DSS
- [ ] GDPR
- [ ] HIPAA
- [ ] Other: ____________

Phase 2: Golden Path Selection

Available Golden Paths

Golden Path	Use Case	Includes
api-service	REST/GraphQL APIs	ALB, EKS, RDS, ElastiCache
worker-service	Background processing	SQS, EKS, auto-scaling
batch-job	Scheduled jobs	EventBridge, Lambda/Fargate
frontend-app	Static sites, SPAs	CloudFront, S3, API Gateway
data-pipeline	ETL, streaming	Kinesis, Glue, S3

Golden Path Selection Matrix

Requirement	api-service	worker-service	batch-job
HTTP traffic	Yes	No	No
Queue processing	Optional	Yes	Optional
Scheduled runs	No	No	Yes
Real-time	Yes	Near-real-time	No
Auto-scaling	Yes	Yes	N/A

Selection Template

markdown

## Golden Path Selection

**Service:** [name]
**Selected Path:** [api-service / worker-service / etc.]

### Rationale

1. Service type [X] matches [golden path] pattern
2. Traffic requirements of [X] supported by [features]
3. Compliance requirements met by built-in [controls]

### Customizations Required

| Standard Component | Customization | Reason |
|--------------------|---------------|--------|
| [component] | [change] | [why] |

### Approval

- [ ] Platform team reviewed
- [ ] Security team reviewed (if customizations)
- [ ] Architecture team reviewed (if non-standard)

Phase 3: Infrastructure Provisioning

Provisioning Checklist

Namespace/project created
Compute resources provisioned
Database provisioned (if required)
Cache provisioned (if required)
Load balancer configured
DNS entries created
SSL certificates provisioned
Secrets stored in vault
IAM roles/service accounts created

Terraform/IaC Template

hcl

# Example service provisioning
module "service" {
  source = "platform/service-template"

  service_name    = var.service_name
  team            = var.team
  environment     = var.environment
  golden_path     = "api-service"

  # Compute
  cpu_request     = "500m"
  memory_request  = "512Mi"
  replicas_min    = 2
  replicas_max    = 10

  # Database
  database_enabled = true
  database_class   = "db.t3.medium"

  # Tags
  tags = {
    Team        = var.team
    Environment = var.environment
    CostCenter  = var.cost_center
  }
}

Provisioning Verification

bash

# Verify namespace
kubectl get namespace [service-name]

# Verify compute
kubectl get deployment -n [service-name]

# Verify database
aws rds describe-db-instances --db-instance-identifier [service-db]

# Verify DNS
dig [service-name].internal.example.com

Phase 4: Observability Setup

Observability Checklist

Structured logging configured
Tracing instrumentation added
Metrics endpoints exposed
Service dashboard created
SLI/SLO defined
Alerts configured
On-call integration set up

Dashboard Template

Standard service dashboard includes:

Panel	Metrics
Request rate	requests/sec, by status code
Error rate	5xx rate, 4xx rate
Latency	p50, p95, p99
Saturation	CPU, memory utilization
Dependencies	Upstream/downstream health

Alert Configuration

Alert	Condition	Severity	Response
High error rate	5xx > 1% for 5m	Critical	Page on-call
High latency	p99 > 1s for 5m	Warning	Alert team
Low availability	uptime < 99.9%	Critical	Page on-call
Resource saturation	CPU > 85% for 10m	Warning	Alert team

SLI/SLO Definition

markdown

## Service Level Objectives

**Service:** [name]
**SLO Version:** 1.0

| SLI | Target | Measurement |
|-----|--------|-------------|
| Availability | 99.9% | Successful requests / total requests |
| Latency | p99 < 500ms | Request duration percentile |
| Error rate | < 0.1% | 5xx responses / total responses |

### Error Budget

- Monthly budget: 43.2 minutes downtime
- Current consumption: [X]%
- Actions if budget exceeded: [escalation process]

Phase 5: Security Configuration

Security Checklist

Network policies applied
Service mesh mTLS configured
Secrets management configured
IAM permissions follow least privilege
Security scanning in CI/CD
Dependency scanning enabled
WAF rules applied (if external)

Network Policy Template

yaml

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: service-policy
  namespace: [service-name]
spec:
  podSelector:
    matchLabels:
      app: [service-name]
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: istio-system
      ports:
        - port: 8080
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: database
      ports:
        - port: 5432

Security Review

markdown

## Security Configuration Review

**Service:** [name]
**Reviewer:** @security-team

| Control | Status | Notes |
|---------|--------|-------|
| mTLS enabled | PASS | Istio strict mode |
| Network policies | PASS | Ingress/egress restricted |
| Secrets management | PASS | Using Vault |
| Least privilege IAM | PASS | Scoped to required resources |
| Vulnerability scanning | PASS | Trivy in CI/CD |

Phase 6: Documentation

Required Documentation

Document	Purpose	Template
Service Overview	What the service does	README.md
Runbook	Operational procedures	runbook.md
Architecture	Design decisions	architecture.md
API Docs	Interface documentation	OpenAPI spec
On-call Guide	Incident handling	oncall.md

Runbook Template

markdown

## [Service Name] Runbook

### Service Overview

[Brief description of what the service does]

### Quick Reference

| Item | Value |
|------|-------|
| Repository | [link] |
| Dashboard | [link] |
| Logs | [query link] |
| On-call | [PagerDuty service] |

### Common Operations

#### Restart Service

```bash
kubectl rollout restart deployment/[service] -n [namespace]

Scale Service

bash

kubectl scale deployment/[service] -n [namespace] --replicas=X

Check Logs

bash

kubectl logs -l app=[service] -n [namespace] --tail=100

Troubleshooting

Symptom	Possible Cause	Resolution
High latency	DB connection pool	Scale DB or optimize queries
5xx errors	Dependency down	Check upstream services
OOM kills	Memory leak	Investigate heap, restart

Escalation

Level	Contact	When
L1	[team Slack channel]	First response
L2	[on-call engineer]	Cannot resolve in 15m
L3	[service owner]	Critical/extended outage


---

## Phase 7: Handoff

### Handoff Checklist

- [ ] Service owner identified and trained
- [ ] On-call rotation set up
- [ ] Access provisioned to team
- [ ] Documentation reviewed by team
- [ ] Shadowing session completed
- [ ] Ownership officially transferred

### Handoff Template

```markdown
## Service Handoff Confirmation

**Service:** [name]
**Date:** YYYY-MM-DD
**Platform Team:** @[name]
**Service Owner:** @[name]

### Completed Items

- [x] Infrastructure provisioned and documented
- [x] Observability configured
- [x] Security controls applied
- [x] Runbook created and reviewed
- [x] On-call rotation configured
- [x] Training session completed

### Outstanding Items

| Item | Owner | Due Date |
|------|-------|----------|
| [item] | [owner] | YYYY-MM-DD |

### Acknowledgment

By signing below, the service owner confirms:
1. Receipt of all documentation
2. Understanding of operational procedures
3. Acceptance of on-call responsibilities

**Service Owner:** _________________ Date: _______
**Platform Team:** _________________ Date: _______

Anti-Rationalization Table

Rationalization	Why It's WRONG	Required Action
"Skip documentation, code is self-explanatory"	On-call != developers	Complete runbook
"We'll add observability later"	Blind deployments = incidents	Observability on day 1
"Golden path doesn't fit exactly"	Customizations add complexity	Justify every deviation
"Security can come later"	Later = never for security	Security from start
"Team can figure it out"	Assumptions cause outages	Complete handoff process

Dispatch Specialist

For platform onboarding tasks, dispatch:

Task tool:
  subagent_type: "ring:platform-engineer"
  prompt: |
    SERVICE ONBOARDING REQUEST
    Service: [name]
    Team: [team]
    Type: [API/Worker/Batch]
    Requirements: [summary]
    Golden Path: [if known]

Maintainer

LerianStudio Core maintainer

Source details

Full Name: LerianStudio/ring
Branch: main
Path in repo: .archive/ops-team/skills/ops-platform-onboarding
License: Apache License 2.0
Topics: claude-code workflow-automation ai-agents ai-coding developer-tools prompt-engineering code-quality code-review software-engineering plugin devops finops tdd technical-writing skills-library

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

LerianStudio/ring

ring:regulatory-templates

5-stage regulatory template orchestrator - manages setup, Gate 1 (analysis + auto-save), Gate 2 (validation), Gate 3 (generation), optional Test Gate, optional Contribution Gate. Supports any regulatory template (BACEN, RFB, CVM, SUSEP, COAF, or other).

169 18

Explore

LerianStudio/ring

ring:using-finops-team

3 FinOps agents: 2 for Brazilian financial regulatory compliance (BACEN, RFB, Open Banking), 1 for infrastructure cost estimation when onboarding customers. Supports any regulatory template via open intake system.

169 18

Explore

LerianStudio/ring

ring:regulatory-templates-gate1

Gate 1 sub-skill - performs regulatory compliance analysis, field mapping, batch approval by confidence level, and auto-saves dictionary after approval. Supports both pre-defined templates (dictionary exists) and new templates (any spec).

169 18

Explore

LerianStudio/ring

ring:regulatory-templates-gate2

Gate 2 sub-skill - validates uncertain mappings from Gate 1 and confirms all field specifications through testing.

169 18

Explore

LerianStudio/ring

ring:regulatory-templates-gate3

Gate 3 sub-skill - generates complete .tpl template file with all validated mappings from Gates 1-2.

169 18

Explore

LerianStudio/ring

ring:infrastructure-cost-estimation

Orchestrates infrastructure cost estimation with tier-based or custom TPS sizing. Offers pre-configured tiers (Starter/Growth/Business/Enterprise) or custom TPS input. Skill discovers components, asks shared/dedicated for EACH, selects environment(s), reads actual Helm chart configs, then dispatches agent for accurate calculations.

169 18

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Platform Onboarding Workflow

Onboarding Phases

Phase 1: Requirements Gathering

Service Requirements Checklist

Phase 2: Golden Path Selection

Available Golden Paths

Golden Path Selection Matrix

Selection Template

Phase 3: Infrastructure Provisioning

Provisioning Checklist

Terraform/IaC Template

Provisioning Verification

Phase 4: Observability Setup

Observability Checklist

Dashboard Template

Alert Configuration

SLI/SLO Definition

Phase 5: Security Configuration

Security Checklist

Network Policy Template

Security Review

Phase 6: Documentation

Required Documentation

Runbook Template

Scale Service

Check Logs

Troubleshooting

Escalation

Anti-Rationalization Table

Dispatch Specialist

Recommended Agent Skills

ring:regulatory-templates

ring:using-finops-team

ring:regulatory-templates-gate1

ring:regulatory-templates-gate2

ring:regulatory-templates-gate3

ring:infrastructure-cost-estimation