Agent skill
model-fallback
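Automatic model fallback and failover. When a request to the primary model fails, times out, hits a rate limit, or exhausts its quota, the skill switches to a backup model automatically to keep the service available. It supports priority-based model selection across multiple providers, and provides health monitoring, automatic retries, and error recovery.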
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/model-fallback
SKILL.md
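Automatic Model Fallback and Failover
Overview
This skill provides a complete solution for automatic model fallback and failover: when the primary model is unavailable, it switches to a backup model so that the AI assistant keeps serving requests.
Core Features
1. Intelligent model selection
Selects the most suitable model based on task type and model status:
Priority order:
1. anapi/opus-4.5 # most capable, highest priority
2. zai/glm-4.7 # optimized for Chinese, cost-effective
3. openrouter-vip/gpt-5.2-codex # dedicated to coding
4. github-copilot/claude-sonnet-4-5 # free backup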
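The priority order above could be applied roughly as in the following minimal sketch. The helper name `pick_model` and the `unavailable` set are hypothetical and are not part of the skill's scripts; only the model identifiers come from the list above.

```python
# Hypothetical helper: walk the priority list and return the first usable model.
PRIORITY = [
    "anapi/opus-4.5",
    "zai/glm-4.7",
    "openrouter-vip/gpt-5.2-codex",
    "github-copilot/claude-sonnet-4-5",
]


def pick_model(unavailable=frozenset(), priority=PRIORITY):
    """Return the highest-priority model not marked unavailable, or None."""
    for model in priority:
        if model not in unavailable:
            return model
    return None


print(pick_model())                    # anapi/opus-4.5
print(pick_model({"anapi/opus-4.5"}))  # zai/glm-4.7
```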
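2. Fault detection and handling
The following faults are detected automatically and trigger a switch:
- Timeout (timeout): the request exceeds the configured time limit
- Rate limit (rate_limit): API call frequency exceeds the limit
- Quota exceeded (quota_exceeded): the token quota is used up
- Authentication error (authentication): the API key is invalid
- Service unavailable (service_unavailable): the provider's service is down
- Network error (network_error): the connection fails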
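One way to map raw failures onto the six categories above is sketched below. This is an illustration under assumptions, not the skill's actual detection logic: the function name `classify_failure` and the `quota_exhausted` flag are hypothetical, and the status-code mapping (429, 401/403, 5xx) follows common HTTP conventions rather than any specific provider's documented behavior.

```python
from typing import Optional


def classify_failure(status: Optional[int] = None,
                     exc: Optional[BaseException] = None,
                     quota_exhausted: bool = False) -> str:
    """Map an HTTP status or exception to one of the fault categories."""
    # TimeoutError is a subclass of OSError, so it must be tested first.
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, OSError):  # includes ConnectionError and its subclasses
        return "network_error"
    # Assumption: the caller decides whether the provider reported an
    # exhausted quota, because providers signal this in different ways.
    if quota_exhausted:
        return "quota_exceeded"
    if status == 429:
        return "rate_limit"
    if status in (401, 403):
        return "authentication"
    if status is not None and status >= 500:
        return "service_unavailable"
    return "unknown"


print(classify_failure(status=429))                  # rate_limit
print(classify_failure(exc=ConnectionResetError()))  # network_error
```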
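3. Retry mechanism
Intelligent retry strategy with exponential backoff:
Retry configuration:
Max retries: 3
Initial delay: 1000ms
Max delay: 10000ms
Backoff multiplier: 2.0
Use fallback model: true
Retry timeline:
Failure 1 → wait 1s → retry
Failure 2 → wait 2s → retry
Failure 3 → wait 4s → switch model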
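The retry timeline follows from the configuration values: each wait is the initial delay multiplied by the backoff multiplier once per earlier failure, capped at the maximum delay. A minimal sketch of that arithmetic (the function name `backoff_delays` is hypothetical):

```python
def backoff_delays(max_attempts=3, initial_ms=1000, max_ms=10000, multiplier=2.0):
    """Delay in ms to wait after each failed attempt, capped at max_ms."""
    return [min(int(initial_ms * multiplier ** i), max_ms)
            for i in range(max_attempts)]


print(backoff_delays())   # [1000, 2000, 4000]
print(backoff_delays(6))  # [1000, 2000, 4000, 8000, 10000, 10000]
```

With the default settings the 10000ms cap is never reached; it only takes effect from the fifth failure onward.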
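4. Health monitoring
Continuously monitors the health of all model providers:
# Checked every 5 minutes
Checks:
- API endpoint connectivity
- Response time
- Error rate
- Quota usage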
Quick Start
1. Configure model fallback
Configuration file: ~/.openclaw/agents/main/agent/agent.json
{
  "model": "anapi/opus-4.5",
  "modelFallback": [
    "zai/glm-4.7",
    "openrouter-vip/gpt-5.2-codex",
    "github-copilot/claude-sonnet-4-5"
  ],
  "retry": {
    "maxAttempts": 3,
    "initialDelayMs": 1000,
    "maxDelayMs": 10000,
    "backoffMultiplier": 2.0,
    "useFallbackOnFailure": true
  }
}
2. Start monitoring
# Start the background monitor
~/.openclaw/scripts/monitor-models.sh start
# Check monitor status
~/.openclaw/scripts/monitor-models.sh status
# Stop the monitor
~/.openclaw/scripts/monitor-models.sh stop
3. Trigger a switch manually
# Run the fallback check script
~/.openclaw/scripts/model-fallback.sh
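Workflow
Normal request flow
User request
↓
Select primary model (anapi/opus-4.5)
↓
Send request to API
↓
Success → return result
Failover flow
User request
↓
Select current model
↓
Send request → failure/timeout
↓
Retry (up to 3 times)
↓
Still failing?
↓
Switch to fallback model (zai/glm-4.7)
↓
Send request
↓
Success → return result and log the switch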
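The failover flow can be sketched as a nested loop: retry the current model with backoff, then move to the next model in the chain. Everything here is illustrative; `ModelError`, `call_with_fallback`, and the caller-supplied `send` function are hypothetical names, and the real scripts may work differently.

```python
import time


class ModelError(Exception):
    """Hypothetical error raised by the caller-supplied `send` function."""


def call_with_fallback(send, prompt, models, max_attempts=3,
                       initial_delay=1.0, max_delay=10.0, multiplier=2.0,
                       sleep=time.sleep):
    """Try each model in order, retrying with backoff before moving on."""
    failures = []
    for model in models:
        delay = initial_delay
        for attempt in range(1, max_attempts + 1):
            try:
                return model, send(model, prompt)
            except ModelError as err:
                failures.append((model, attempt, str(err)))
                sleep(min(delay, max_delay))  # wait before retry or switch
                delay *= multiplier
    raise ModelError(f"all models failed: {failures}")


# Demo with a stub in which the primary model always times out.
def fake_send(model, prompt):
    if model == "anapi/opus-4.5":
        raise ModelError("timeout")
    return f"{model}: ok"


waits = []  # records the waits instead of actually sleeping
used, reply = call_with_fallback(
    fake_send, "hello", ["anapi/opus-4.5", "zai/glm-4.7"], sleep=waits.append)
print(used, waits)  # zai/glm-4.7 [1.0, 2.0, 4.0]
```

Injecting `sleep` keeps the sketch testable without real delays; the recorded waits match the 1s, 2s, 4s retry timeline.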
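Monitoring flow
Monitor daemon (every 5 minutes)
↓
Check the health of all models
↓
├─ Primary model healthy → keep current model
└─ Primary model unhealthy → switch to the best available model
↓
Update status file
↓
Write log entry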
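A single pass of the monitoring flow might look like the sketch below. The function name `monitor_pass` and the fields written to the status file (`checkedAt`, `current`, `switched`, `health`) are assumptions made for illustration; the actual schema of model-status.json is not documented here.

```python
import json
import time
from pathlib import Path


def monitor_pass(health, priority, current, state_path):
    """One monitoring cycle: keep `current` if healthy, else pick the
    highest-priority healthy model, then write a status file."""
    if health.get(current, False):
        chosen = current
    else:
        # Stay on the current model if nothing else is healthy either.
        chosen = next((m for m in priority if health.get(m, False)), current)
    state = {
        "checkedAt": int(time.time()),
        "current": chosen,
        "switched": chosen != current,
        "health": health,
    }
    Path(state_path).write_text(json.dumps(state, indent=2), encoding="utf-8")
    return chosen
```

A daemon would call this every 5 minutes with fresh health results and append a log line whenever `switched` is true.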
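Error Handling Strategy
1. Timeout
{
  "timeout": {
    "switchModel": true,
    "retryCount": 2,
    "timeoutMs": 60000,
    "fallbackTo": "zai/glm-4.7"
  }
}
Behavior:
- A request that takes longer than 60 seconds is treated as a timeout
- If the request still times out after 2 retries, the model is switched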
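2. Rate limit
{
  "rateLimit": {
    "switchModel": true,
    "cooldownMs": 60000,
    "alert": true,
    "fallbackTo": "zai/glm-4.7"
  }
}
Behavior:
- Triggered by an HTTP 429 response
- Switches to the fallback model immediately
- Tries to return to the rate-limited model after a 60-second cooldown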
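The cooldown behavior can be tracked with a small per-model timer, as in this sketch. The class name `Cooldowns` and its methods are hypothetical; only the 60000ms default comes from the `cooldownMs` setting above.

```python
import time


class Cooldowns:
    """Track models that are cooling down after a rate-limit response."""

    def __init__(self, cooldown_ms=60000, clock=time.monotonic):
        self.cooldown_s = cooldown_ms / 1000
        self.clock = clock
        self._until = {}  # model name -> time at which it may be used again

    def mark_rate_limited(self, model):
        self._until[model] = self.clock() + self.cooldown_s

    def is_available(self, model):
        until = self._until.get(model)
        return until is None or self.clock() >= until


# Demo with a fake clock so no real waiting is needed.
now = [0.0]
cd = Cooldowns(clock=lambda: now[0])
cd.mark_rate_limited("anapi/opus-4.5")
print(cd.is_available("anapi/opus-4.5"))  # False
now[0] = 60.0
print(cd.is_available("anapi/opus-4.5"))  # True
```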
3. Quota exceeded
{
  "quotaExceeded": {
    "switchModel": true,
    "alert": true,
    "fallbackTo": "zai/glm-4.7",
    "checkInterval": 3600000
  }
}
Behavior:
- Switches models when the quota is used up
- Checks once an hour whether the primary model has recovered
- Sends an alert notification
4. Authentication error
{
  "authenticationError": {
    "switchModel": true,
    "alert": true,
    "disableModel": true
  }
}
Behavior:
- Switches immediately when the API key is invalid
- Disables the failing model (no automatic recovery)
- Sends an urgent alert
Intelligent Routing Rules
Automatically selects the most suitable model based on task type:
{
  "routing": {
    "strategy": "priority-fallback",
    "rules": [
      {
        "name": "coding-task",
        "match": {
          "contentContains": ["代码", "code", "编程", "函数"]
        },
        "preferModels": [
          "openrouter-vip/gpt-5.2-codex",
          "anapi/opus-4.5"
        ]
      },
      {
        "name": "chinese-task",
        "match": {
          "language": "zh"
        },
        "preferModels": [
          "zai/glm-4.7",
          "anapi/opus-4.5"
        ]
      },
      {
        "name": "vision-task",
        "match": {
          "hasImage": true
        },
        "preferModels": [
          "anapi/opus-4.5"
        ]
      }
    ]
  }
}
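To make the rule semantics concrete, here is a minimal sketch of how such rules might be evaluated. It assumes that rules are tried in order and the first match wins, and that a request is a dict with `content`, `language`, and `hasImage` keys; both assumptions, along with the function names, are illustrative and not confirmed by the skill.

```python
RULES = [
    {"name": "coding-task",
     "match": {"contentContains": ["代码", "code", "编程", "函数"]},
     "preferModels": ["openrouter-vip/gpt-5.2-codex", "anapi/opus-4.5"]},
    {"name": "chinese-task",
     "match": {"language": "zh"},
     "preferModels": ["zai/glm-4.7", "anapi/opus-4.5"]},
    {"name": "vision-task",
     "match": {"hasImage": True},
     "preferModels": ["anapi/opus-4.5"]},
]


def rule_matches(match, request):
    """True if every condition in `match` holds for the request."""
    for key, expected in match.items():
        if key == "contentContains":
            text = request.get("content", "").lower()
            if not any(word.lower() in text for word in expected):
                return False
        elif request.get(key) != expected:
            return False
    return True


def preferred_models(request, rules=RULES, default=("anapi/opus-4.5",)):
    """Return the preferModels of the first matching rule (assumed order)."""
    for rule in rules:
        if rule_matches(rule["match"], request):
            return list(rule["preferModels"])
    return list(default)


print(preferred_models({"content": "Fix this code"})[0])
# openrouter-vip/gpt-5.2-codex
```

Plain substring matching is crude: a request that merely mentions the word "code" is routed as a coding task.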
Monitoring and Logs
Log file locations
~/.openclaw/logs/model-fallback.log # switch log
~/.openclaw/logs/model-monitor.log # monitor log
~/.openclaw/logs/model-status.json # status report
View live logs
# Follow the switch log
tail -f ~/.openclaw/logs/model-fallback.log
# Follow the monitor log
tail -f ~/.openclaw/logs/model-monitor.log
# Follow all logs
tail -f ~/.openclaw/logs/*.log
Status report
# Show current status
~/.openclaw/scripts/monitor-models.sh status
# Status in JSON format
python3 -m json.tool ~/.openclaw/logs/model-status.json
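Script Reference
model-fallback.sh
Model fallback switching script. It:
- Tests all configured models
- Selects the best available model
- Performs the model switch
- Logs the switch
Usage:
~/.openclaw/scripts/model-fallback.sh
monitor-models.sh
Health monitoring daemon. It:
- Checks model health periodically
- Triggers failover automatically
- Generates status reports
- Manages the PID file
Usage:
~/.openclaw/scripts/monitor-models.sh {start|stop|restart|status|check}
test-model-fallback.sh
Test script. It:
- Validates the configuration
- Tests the switching logic
- Simulates failure scenarios
- Generates a test report
Usage:
~/clawd/scripts/test-model-fallback.sh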
Configuration File Reference
Complete agent.json configuration
{
  "model": "anapi/opus-4.5",
  "modelFallback": [
    "zai/glm-4.7",
    "openrouter-vip/gpt-5.2-codex",
    "github-copilot/claude-sonnet-4-5"
  ],
  "retry": {
    "maxAttempts": 3,
    "initialDelayMs": 1000,
    "maxDelayMs": 10000,
    "backoffMultiplier": 2.0,
    "useFallbackOnFailure": true
  },
  "errorHandling": {
    "rateLimit": {
      "switchModel": true,
      "cooldownMs": 60000,
      "alert": true
    },
    "timeout": {
      "switchModel": true,
      "retryCount": 2,
      "timeoutMs": 60000
    },
    "quotaExceeded": {
      "switchModel": true,
      "alert": true,
      "fallbackTo": "zai/glm-4.7"
    },
    "authenticationError": {
      "switchModel": true,
      "alert": true,
      "disableModel": true
    }
  },
  "models": {
    "anapi/opus-4.5": {
      "provider": "anapi",
      "alias": "opus45",
      "maxTokens": 200000,
      "timeoutMs": 60000,
      "priority": 1,
      "supports": ["vision", "tools", "long-context"],
      "costFactor": "high"
    },
    "zai/glm-4.7": {
      "provider": "zai",
      "alias": "zai47",
      "maxTokens": 200000,
      "timeoutMs": 60000,
      "priority": 2,
      "supports": ["tools", "long-context"],
      "costFactor": "medium",
      "bestFor": ["chinese", "general-purpose"]
    },
    "openrouter-vip/gpt-5.2-codex": {
      "provider": "openrouter-vip",
      "alias": "codex52",
      "maxTokens": 100000,
      "timeoutMs": 30000,
      "priority": 3,
      "supports": ["coding"],
      "costFactor": "low",
      "bestFor": ["coding", "code-generation"]
    },
    "github-copilot/claude-sonnet-4-5": {
      "provider": "github-copilot",
      "alias": "sonnet",
      "maxTokens": 200000,
      "timeoutMs": 60000,
      "priority": 4,
      "supports": ["tools", "long-context"],
      "costFactor": "free",
      "bestFor": ["fallback", "general-purpose"]
    }
  },
  "monitoring": {
    "enabled": true,
    "checkIntervalMs": 300000,
    "logFile": "$HOME/.openclaw/logs/model-fallback.log",
    "alertOnFailure": true
  }
}
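Because the configuration repeats model names in several places, a quick consistency check can catch mistakes before the Gateway is restarted. The sketch below is a hypothetical validator, not something the skill ships; the checks it performs (no duplicates in the chain, every chained model defined under "models", unique priorities, sane delays) are assumptions about what a valid file should look like.

```python
def validate_config(cfg):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    chain = [cfg.get("model")] + list(cfg.get("modelFallback", []))
    models = cfg.get("models", {})
    if len(set(chain)) != len(chain):
        problems.append("duplicate model in primary/fallback chain")
    for name in chain:
        if name not in models:
            problems.append(f"{name} has no entry under 'models'")
    priorities = [m.get("priority") for m in models.values()]
    if len(set(priorities)) != len(priorities):
        problems.append("model priorities are not unique")
    retry = cfg.get("retry", {})
    if retry.get("initialDelayMs", 0) > retry.get("maxDelayMs", float("inf")):
        problems.append("initialDelayMs exceeds maxDelayMs")
    return problems


example = {
    "model": "anapi/opus-4.5",
    "modelFallback": ["zai/glm-4.7"],
    "models": {
        "anapi/opus-4.5": {"priority": 1},
        "zai/glm-4.7": {"priority": 2},
    },
    "retry": {"initialDelayMs": 1000, "maxDelayMs": 10000},
}
print(validate_config(example))  # []
```

To check a real file, load it with `json.load` and pass the resulting dict to `validate_config`.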
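Troubleshooting
Problem 1: The model does not switch automatically
Check:
# Inspect the fallback list in the configuration file
grep -A 4 '"modelFallback"' ~/.openclaw/agents/main/agent/agent.json
# View the log
tail -20 ~/.openclaw/logs/model-fallback.log
# Run the switch script manually
~/.openclaw/scripts/model-fallback.sh
Problem 2: The monitor is not running
Check:
# Look for the process (the bracket keeps grep from matching itself)
ps aux | grep '[m]onitor-models'
# Read the PID file
cat ~/.openclaw/logs/model-monitor.pid
# Restart the monitor
~/.openclaw/scripts/monitor-models.sh restart
Problem 3: All models are unavailable
Check:
# View the status report
~/.openclaw/scripts/monitor-models.sh status
# Check the API keys
cat ~/.openclaw/agents/main/agent/auth-profiles.json
# Test network connectivity
ping -c 3 anapi.9w7.cn
ping -c 3 open.bigmodel.cn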
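Performance Tuning
Reduce switching frequency
Increase the retry count and the initial delay so that transient errors are less likely to trigger a switch:
{
  "retry": {
    "maxAttempts": 5,
    "initialDelayMs": 2000
  }
}
Improve response time
Prefer the fastest models for speed-sensitive tasks (github-copilot/claude-sonnet-4-5 usually responds fastest):
{
  "routing": {
    "rules": [
      {
        "name": "quick-response",
        "match": {
          "priority": "speed"
        },
        "preferModels": [
          "github-copilot/claude-sonnet-4-5",
          "zai/glm-4.7"
        ]
      }
    ]
  }
}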
Integrating with OpenClaw
Once configured, model fallback is integrated into the OpenClaw Gateway automatically:
- Restart the Gateway:
openclaw gateway restart
- Verify the configuration:
openclaw status | grep Model
- View the logs:
journalctl -u openclaw-gateway -f | grep model
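Best Practices
1. Check regularly
Run a full check once a week:
~/clawd/scripts/test-model-fallback.sh
2. Monitor the logs
Review the switch log daily (the search pattern "切换模型" means "switch model"):
grep "切换模型" ~/.openclaw/logs/model-fallback.log | tail -10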
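3. Keep the configuration up to date
When adding a new model, update agent.json and place the new entry (the placeholder new-model-here in this example) at the desired position in the list:
{
  "modelFallback": [
    "anapi/opus-4.5",
    "zai/glm-4.7",
    "new-model-here",
    "github-copilot/claude-sonnet-4-5"
  ]
}
4. Back up the configuration
Back up the configuration file regularly:
cp ~/.openclaw/agents/main/agent/agent.json \
  ~/.openclaw/agents/main/agent/agent.json.backup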
Related Files
- ~/.openclaw/agents/main/agent/agent.json - main configuration file
- ~/.openclaw/agents/main/agent/auth-profiles.json - API keys
- ~/.openclaw/scripts/model-fallback.sh - switch script
- ~/.openclaw/scripts/monitor-models.sh - monitor script
- ~/clawd/scripts/test-model-fallback.sh - test script
- ~/clawd/docs/model-fallback-strategy.md - technical documentation
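Supported Models
Currently configured models and their characteristics:
| Model | Provider | Priority | Max tokens | Strengths |
|---|---|---|---|---|
| opus-4.5 | anapi | 1 | 200k | Most capable, vision |
| glm-4.7 | zai | 2 | 200k | Optimized for Chinese |
| gpt-5.2-codex | openrouter-vip | 3 | 100k | Dedicated to coding |
| claude-sonnet-4-5 | github-copilot | 4 | 200k | Free backup |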
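TODO
- Add more providers
- Implement cost-based model selection
- Collect model performance metrics
- Implement predictive model switching
- Integrate alert notifications (Telegram/email)
- Add a WebUI monitoring dashboard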