Agent skill
enterprise-agent-ops
Operate long-running agent workloads with observability, security boundaries, and lifecycle management.
Install this agent skill to your Project
npx add-skill https://github.com/affaan-m/everything-claude-code/tree/main/docs/zh-CN/skills/enterprise-agent-ops
SKILL.md
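Enterprise Agent Ops
Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond a single CLI session.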
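Operational Domains
- Runtime lifecycle (start, pause, stop, restart)
- Observability (logs, metrics, traces)
- Safety controls (scopes, permissions, kill switches)
- Change management (rollout, rollback, audit)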
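The lifecycle actions and the kill switch listed above can be modeled as a small state machine. The sketch below is illustrative only: the `State` enum, the `TRANSITIONS` table, and the `AgentLifecycle` class are hypothetical names, and the transition policy (for example, that a kill switch blocks `start` and `restart` until an operator resets it) is an assumption, not something this skill prescribes.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"


# Allowed (action, current state) -> next state pairs. Hypothetical policy.
TRANSITIONS = {
    ("start", State.STOPPED): State.RUNNING,
    ("start", State.PAUSED): State.RUNNING,  # resume from pause
    ("pause", State.RUNNING): State.PAUSED,
    ("stop", State.RUNNING): State.STOPPED,
    ("stop", State.PAUSED): State.STOPPED,
}


class AgentLifecycle:
    def __init__(self):
        self.state = State.STOPPED
        self.kill_switch = False

    def apply(self, action):
        """Apply a lifecycle action and return the resulting state."""
        if action == "kill":
            # Emergency stop: halt now and block further starts.
            self.kill_switch = True
            self.state = State.STOPPED
            return self.state
        if self.kill_switch and action in ("start", "restart"):
            raise PermissionError("kill switch engaged; manual reset required")
        if action == "restart":
            # Treated as stop-then-start from any state.
            self.state = State.RUNNING
            return self.state
        key = (action, self.state)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} while {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state

    def reset_kill_switch(self):
        """Operator action: re-enable starts after an emergency stop."""
        self.kill_switch = False
```

Rejecting invalid transitions explicitly, rather than ignoring them, keeps operator mistakes visible in logs and audit trails.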
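Baseline Controls
- Immutable deployment artifacts
- Least-privilege credentials
- Environment-level secret injection
- Hard timeouts and retry budgets
- Audit logs for high-risk actions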
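As one way to picture the timeout and retry-budget control, the sketch below wraps a task in a bounded retry loop with an overall deadline. The function `run_with_budget`, its parameters, and the `RetryBudgetExceeded` exception are hypothetical names. Note the simplification: the deadline is only checked between attempts, so it does not interrupt a task that is already running. A truly hard timeout would typically need process-level enforcement, which is out of scope here.

```python
import time


class RetryBudgetExceeded(Exception):
    """Raised when every allowed attempt has failed."""


def run_with_budget(task, max_retries=2, deadline_seconds=30.0, clock=time.monotonic):
    """Call task() until it succeeds, the retry budget is spent, or the deadline passes.

    Returns a (result, attempts) tuple on success.
    """
    start = clock()
    attempts = 0
    last_error = None
    while attempts <= max_retries:
        # Deadline is checked before each attempt, not during one.
        if clock() - start >= deadline_seconds:
            raise TimeoutError(
                f"deadline of {deadline_seconds}s exceeded after {attempts} attempts"
            )
        attempts += 1
        try:
            return task(), attempts
        except Exception as exc:  # broad on purpose: any failure consumes budget
            last_error = exc
    raise RetryBudgetExceeded(f"gave up after {attempts} attempts") from last_error
```

Returning the attempt count alongside the result makes it straightforward to feed a retries-per-task metric without extra instrumentation.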
Metrics to Track
- Success rate
- Mean retries per task
- Time to recovery
- Cost per successful task
- Failure class distribution
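Most of these metrics can be derived from per-task records. The sketch below assumes a hypothetical `TaskRecord` shape with a USD cost field and free-form failure class labels, none of which this skill defines. Time to recovery is left out because it depends on incident start and end timestamps rather than on task records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRecord:
    succeeded: bool
    retries: int
    cost_usd: float
    failure_class: Optional[str] = None  # e.g. "timeout"; None on success


def summarize(records):
    """Aggregate task records into the tracked metrics."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to summarize")
    successes = sum(1 for r in records if r.succeeded)
    total_cost = sum(r.cost_usd for r in records)
    failures = Counter(r.failure_class for r in records if not r.succeeded)
    return {
        "success_rate": successes / total,
        "mean_retries_per_task": sum(r.retries for r in records) / total,
        # Failed tasks still cost money, so total spend is divided by successes only.
        "cost_per_successful_task": total_cost / successes if successes else None,
        "failure_class_distribution": dict(failures),
    }
```

Dividing total spend by successful tasks, rather than averaging cost over all tasks, means a rising failure rate shows up as a rising cost figure.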
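Incident Pattern
When failures spike:
- Freeze new rollouts
- Capture representative traces
- Isolate the failing path
- Patch with the smallest safe change
- Run regression + security checks
- Resume gradually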
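The first step, freezing rollouts when failures spike, can be automated with a sliding-window check. The sketch below is a minimal illustration: the `RolloutGate` class, the window size, and the 50% threshold are all assumed values, and the remaining steps (trace capture, isolation, patching, staged resume) stay manual.

```python
from collections import deque


class RolloutGate:
    """Freeze rollouts when the failure rate over the last `window` tasks exceeds `threshold`."""

    def __init__(self, window=20, threshold=0.5):
        self.outcomes = deque(maxlen=window)  # True = success, False = failure
        self.threshold = threshold
        self.frozen = False

    def failure_rate(self):
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def record(self, succeeded):
        self.outcomes.append(succeeded)
        # Wait for a full window so a single early failure does not trip the gate.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        if window_full and self.failure_rate() > self.threshold:
            self.frozen = True  # stays frozen until an operator calls unfreeze()

    def unfreeze(self):
        """Operator action after the fix has passed regression and security checks."""
        self.outcomes.clear()
        self.frozen = False

    def allow_rollout(self):
        return not self.frozen
```

The gate deliberately does not reopen on its own when outcomes improve, so resuming remains an explicit operator decision.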
Deployment Integrations
This skill pairs with:
- PM2 workflows
- systemd services
- Container orchestrators
- CI/CD gates