server-management

Server management principles and decision-making. Process management, monitoring strategy, and scaling decisions. Teaches thinking, not commands.

Install this agent skill to your Project

npx add-skill https://github.com/davila7/claude-code-templates/tree/main/cli-tool/components/skills/development/server-management

SKILL.md

Server Management

Server management principles for production operations. Learn to THINK, not memorize commands.

1. Process Management Principles

Tool Selection

Scenario	Tool
Node.js app	PM2 (clustering, reload)
Any app	systemd (Linux native)
Containers	Docker/Podman
Orchestration	Kubernetes, Docker Swarm

Process Management Goals

Goal	What It Means
Restart on crash	Auto-recovery
Zero-downtime reload	No service interruption
Clustering	Use all CPU cores
Persistence	Survive server reboot
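To make the "restart on crash" goal concrete, here is a minimal sketch of what a process manager does on your behalf. It is an illustration only: in production this job belongs to systemd, PM2, or a container runtime, and the function name `supervise` and its parameters are hypothetical.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, base_delay=1.0, max_delay=30.0):
    """Run cmd, restarting it on a non-zero exit with exponential backoff.

    Returns the number of times the process was started.
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.run(cmd).returncode
        if exit_code == 0:
            return starts  # clean exit: nothing to recover from
        if starts > max_restarts:
            return starts  # give up so a crash loop stays visible
        # Wait 1x, 2x, 4x ... the base delay, capped at max_delay.
        time.sleep(min(base_delay * 2 ** (starts - 1), max_delay))


# A command that always crashes is started once, then restarted twice.
crashing = [sys.executable, "-c", "raise SystemExit(1)"]
print(supervise(crashing, max_restarts=2, base_delay=0.1))
```

The restart cap and the backoff are the design points worth carrying over to real tooling: unlimited instant restarts hide a crash loop and can saturate the machine.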

2. Monitoring Principles

What to Monitor

Category	Key Metrics
Availability	Uptime, health checks
Performance	Response time, throughput
Errors	Error rate, types
Resources	CPU, memory, disk

Alert Severity Strategy

Level	Response
Critical	Immediate action
Warning	Investigate soon
Info	Review daily
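The severity levels above can be sketched as a small threshold classifier. The threshold values and the names `classify` and `RESPONSES` are assumptions for illustration; real thresholds depend on the metric and the service.

```python
# Response expected for each severity level, taken from the table above.
RESPONSES = {
    "critical": "Immediate action",
    "warning": "Investigate soon",
    "info": "Review daily",
}


def classify(value, warning, critical):
    """Map a metric reading to a severity level using two thresholds."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "info"


# Example: disk usage in percent, with illustrative thresholds of 80 and 90.
level = classify(93.0, warning=80.0, critical=90.0)
print(level, "->", RESPONSES[level])
```

Keeping the level-to-response mapping in one place makes it easier to check that every alert has an agreed reaction.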

Monitoring Tool Selection

Need	Options
Simple/Free	PM2 metrics, htop
Full observability	Grafana, Datadog
Error tracking	Sentry
Uptime	UptimeRobot, Pingdom

3. Log Management Principles

Log Strategy

Log Type	Purpose
Application logs	Debug, audit
Access logs	Traffic analysis
Error logs	Issue detection

Log Principles

Rotate logs so they cannot fill the disk
Use structured logging (JSON) so logs can be parsed by tools
Use appropriate levels (error/warn/info/debug)
Never write sensitive data (passwords, tokens, personal data) to logs
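The log principles can be sketched together using Python's standard `logging` module. This is one possible arrangement, not a prescribed setup: the `context` attribute, the `SENSITIVE_KEYS` list, and the size limits are assumptions chosen for the example.

```python
import json
import logging
from logging.handlers import RotatingFileHandler

# Illustrative list of keys whose values must never reach the log file.
SENSITIVE_KEYS = {"password", "token", "authorization"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra fields arrive via logger.info(..., extra={"context": {...}}).
        for key, value in getattr(record, "context", {}).items():
            redact = key.lower() in SENSITIVE_KEYS
            entry[key] = "[REDACTED]" if redact else value
        return json.dumps(entry)


def build_logger(path, max_bytes=10_000_000, backups=5):
    """Create a logger that rotates its file once it reaches max_bytes."""
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)  # debug messages are dropped
    logger.addHandler(handler)
    return logger
```

A call such as `build_logger("app.log").info("login", extra={"context": {"user": "ana", "password": "hunter2"}})` would write the user name but replace the password with `[REDACTED]`.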

4. Scaling Decisions

When to Scale

Symptom	Solution
High CPU	Add instances (horizontal)
High memory	Increase RAM or fix leak
Slow response	Profile first, then scale
Traffic spikes	Auto-scaling

Scaling Strategy

Type	When to Use
Vertical	Quick fix, single instance
Horizontal	Sustainable, distributed
Auto	Variable traffic
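For the horizontal and auto-scaling cases, a common starting point is a proportional rule: scale the instance count by the ratio of observed load to target load. The sketch below assumes CPU utilization as the signal and a 60% target, both of which are illustrative; the rule has the same shape as the formula documented for the Kubernetes Horizontal Pod Autoscaler.

```python
import math


def desired_instances(current, cpu_percent, target_percent=60.0,
                      min_instances=1, max_instances=10):
    """Return the instance count that would bring CPU back to the target."""
    raw = math.ceil(current * cpu_percent / target_percent)
    # Clamp so a bad reading cannot scale to zero or without bound.
    return max(min_instances, min(raw, max_instances))


print(desired_instances(4, cpu_percent=90.0))  # scale out
print(desired_instances(4, cpu_percent=30.0))  # scale in
```

The minimum and maximum bounds matter as much as the formula: they cap cost during a spike and keep at least one instance serving when traffic is idle.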

5. Health Check Principles

What Constitutes Healthy

Check	Meaning
HTTP 200	Service responding
Database connected	Data accessible
Dependencies OK	External services reachable
Resources OK	CPU/memory not exhausted

Health Check Implementation

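Simple: return 200 whenever the process is up and able to answer requests
Deep: check every dependency (database, external services) before returning 200
Choose based on what the load balancer needs: a simple check detects dead processes, a deep check stops traffic to an instance whose dependencies are down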
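The difference between a simple and a deep health check can be sketched as two functions that return a status code and a body. The names `simple_health` and `deep_health` and the shape of the response body are assumptions; wiring them into a web framework is left out.

```python
def simple_health():
    """Simple check: the process is running and can answer."""
    return 200, {"status": "ok"}


def deep_health(checks):
    """Deep check: run every dependency probe and report each result.

    checks maps a dependency name to a zero-argument callable that
    returns True when the dependency is usable.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A probe that raises counts as an unhealthy dependency.
            results[name] = False
    healthy = all(results.values())
    status = "ok" if healthy else "degraded"
    return (200 if healthy else 503), {"status": status, "checks": results}


# Hypothetical probes; real ones would query the database or call an API.
code, body = deep_health({"database": lambda: True, "payments_api": lambda: False})
print(code, body)
```

Returning 503 for a failed deep check lets a load balancer take the instance out of rotation without restarting it.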

6. Security Principles

Area	Principle
Access	SSH keys only, no passwords
Firewall	Only needed ports open
Updates	Regular security patches
Secrets	Environment vars or a secrets manager, never files committed to the repo
Audit	Log access and changes
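As a small illustration of the secrets principle, the sketch below reads a secret from the environment and fails fast when it is missing. The helper name `require_secret` and the variable name `EXAMPLE_DB_PASSWORD` are hypothetical.

```python
import os


def require_secret(name):
    """Return the secret stored in environment variable `name`.

    Raises at startup instead of letting the app run half-configured.
    The error message names the variable but never includes a value.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required secret: {name}")
    return value


# Set here only so the example runs; normally the process manager or
# deployment platform injects it.
os.environ["EXAMPLE_DB_PASSWORD"] = "s3cret"
password = require_secret("EXAMPLE_DB_PASSWORD")
```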

7. Troubleshooting Priority

When something's wrong:

1. Check if running (process status)
2. Check logs (error messages)
3. Check resources (disk, memory, CPU)
4. Check network (ports, DNS)
5. Check dependencies (database, APIs)
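The priority order above can be sketched as a list of checks that runs top to bottom and stops at the first failure. Only the resource and network steps are implemented here, using the standard library; the helper names and the 90% disk threshold are assumptions.

```python
import shutil
import socket


def disk_percent_used(path="/"):
    """Return how full the filesystem holding `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failure(steps):
    """Run (name, check) pairs in order; return the first failing name."""
    for name, check in steps:
        if not check():
            return name
    return None


# Hypothetical run: the port number is a placeholder for your service.
steps = [
    ("resources", lambda: disk_percent_used(".") < 90.0),
    ("network", lambda: port_open("127.0.0.1", 8080, timeout=0.5)),
]
print(first_failure(steps))
```

Stopping at the first failure mirrors the priority list: there is little value in probing dependencies while the disk is full or the process is not listening.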

8. Anti-Patterns

❌ Don't	✅ Do
Run as root	Use non-root user
Ignore logs	Review logs and set up log rotation
Skip monitoring	Monitor from day one
Manual restarts	Auto-restart config
No backups	Regular backup schedule

Remember: A well-managed server is boring. That's the goal.

Maintainer

davila7 Core maintainer

Source details

Full Name: davila7/claude-code-templates
Branch: main
Path in repo: cli-tool/components/skills/development/server-management
License: MIT License
Topics: claude-code anthropic anthropic-claude claude
