model-fallback

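Automatic model fallback and failover. When a request to the primary model fails, times out, hits a rate limit, or exhausts its quota, the skill switches to a fallback model automatically to keep the service running. It supports intelligent model selection across multiple providers and priority levels, with health monitoring, automatic retries, and error recovery.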

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/model-fallback

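Automatic Model Fallback and Failover

Overview

This skill provides a complete model fallback and failover solution: when the primary model is unavailable, it switches to a fallback model automatically so the AI assistant keeps serving requests.

Core Features

1. Intelligent Model Selection

Selects the most suitable model based on task type and model status:

yaml

Priority order:
  1. anapi/opus-4.5         # most capable, highest priority
  2. zai/glm-4.7            # optimized for Chinese, cost-effective
  3. openrouter-vip/gpt-5.2-codex  # coding specialist
  4. github-copilot/claude-sonnet-4-5  # free fallback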
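The priority order above can be read as "use the first model in the list that is currently healthy". The following is a minimal sketch of that reading, not the skill's actual implementation; the `select_model` function and the `healthy` set are hypothetical names introduced for illustration.

```python
# Priority order copied from the list above (earlier entry = higher priority).
PRIORITY_ORDER = [
    "anapi/opus-4.5",
    "zai/glm-4.7",
    "openrouter-vip/gpt-5.2-codex",
    "github-copilot/claude-sonnet-4-5",
]


def select_model(healthy, order=PRIORITY_ORDER):
    """Return the highest-priority model found in `healthy`, or None.

    `healthy` is a hypothetical set of model names that passed the
    most recent health check.
    """
    for model in order:
        if model in healthy:
            return model
    return None


# Example: the primary model is down, two fallbacks are up.
chosen = select_model({"zai/glm-4.7", "github-copilot/claude-sonnet-4-5"})
print(chosen)
```

Returning `None` when nothing is healthy leaves the decision (raise, queue, or alert) to the caller.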

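2. Fault Detection and Handling

The following faults are detected automatically and trigger a switch:

Timeout (timeout): the request exceeds the configured time limit
Rate limit (rate_limit): API call frequency exceeds the limit
Quota exceeded (quota_exceeded): the token quota is used up
Authentication error (authentication): the API key is invalid or expired
Service unavailable (service_unavailable): the provider's service is down
Network error (network_error): the connection fails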
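One way to arrive at these six categories is to classify each failed call by its HTTP status code or by the exception that was raised. The sketch below is an assumption for illustration only: the source confirms just the 429 mapping, the use of 402 for quota exhaustion is a guess that varies by provider, and `classify_fault` is a hypothetical helper.

```python
import socket


def classify_fault(status=None, exc=None):
    """Map an HTTP status or an exception to one of the fault categories.

    Returns None when the input does not look like a fault.
    The status-code mapping is an assumption, not taken from the skill.
    """
    if exc is not None:
        # Check timeouts first: TimeoutError is itself a subclass of OSError.
        if isinstance(exc, (TimeoutError, socket.timeout)):
            return "timeout"
        if isinstance(exc, OSError):  # includes ConnectionError and its subclasses
            return "network_error"
        return None
    if status == 429:
        return "rate_limit"
    if status in (401, 403):
        return "authentication"
    if status == 402:  # assumption: some providers signal quota differently
        return "quota_exceeded"
    if status is not None and 500 <= status < 600:
        return "service_unavailable"
    return None


print(classify_fault(status=429))
print(classify_fault(exc=ConnectionResetError()))
```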

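3. Retry Mechanism

Retries use exponential backoff:

yaml

Retry configuration:
  Maximum retries: 3
  Initial delay: 1000ms
  Maximum delay: 10000ms
  Backoff multiplier: 2.0
  Use fallback model: true

Retry timeline:

1st failure → wait 1s → retry
2nd failure → wait 2s → retry
3rd failure → wait 4s → switch model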
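The timeline follows from the retry configuration: each delay is the previous one multiplied by the backoff multiplier, capped at the maximum delay. A small sketch of that arithmetic, with `backoff_delays` as a hypothetical helper name:

```python
def backoff_delays(max_attempts=3, initial_ms=1000, max_ms=10000, multiplier=2.0):
    """Return the wait (in ms) that follows each failed attempt."""
    delays = []
    delay = float(initial_ms)
    for _ in range(max_attempts):
        delays.append(int(min(delay, max_ms)))  # never exceed the cap
        delay *= multiplier
    return delays


print(backoff_delays())                # [1000, 2000, 4000]
print(backoff_delays(max_attempts=6))  # [1000, 2000, 4000, 8000, 10000, 10000]
```

With the default values the cap only comes into play from the fifth attempt onward.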

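4. Health Monitoring

Continuously monitors the health of every model provider:

yaml

# Checked every 5 minutes
Checks:
  - API endpoint connectivity
  - Response time
  - Error rate
  - Quota usage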

Quick Start

1. Configure Model Fallback

Configuration file: ~/.openclaw/agents/main/agent/agent.json

json

{
  "model": "anapi/opus-4.5",
  "modelFallback": [
    "zai/glm-4.7",
    "openrouter-vip/gpt-5.2-codex",
    "github-copilot/claude-sonnet-4-5"
  ],
  "retry": {
    "maxAttempts": 3,
    "initialDelayMs": 1000,
    "maxDelayMs": 10000,
    "backoffMultiplier": 2.0,
    "useFallbackOnFailure": true
  }
}
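To illustrate how this configuration might be consumed, the sketch below parses it and builds the ordered list of models to try: the primary model first, then the fallbacks. `model_chain` is a hypothetical helper, and skipping duplicates is an assumption rather than documented behavior.

```python
import json

CONFIG_TEXT = """
{
  "model": "anapi/opus-4.5",
  "modelFallback": [
    "zai/glm-4.7",
    "openrouter-vip/gpt-5.2-codex",
    "github-copilot/claude-sonnet-4-5"
  ],
  "retry": {
    "maxAttempts": 3,
    "initialDelayMs": 1000,
    "maxDelayMs": 10000,
    "backoffMultiplier": 2.0,
    "useFallbackOnFailure": true
  }
}
"""


def model_chain(config):
    """Primary model first, then fallbacks in order, skipping duplicates."""
    chain = []
    for name in [config["model"], *config.get("modelFallback", [])]:
        if name not in chain:
            chain.append(name)
    return chain


config = json.loads(CONFIG_TEXT)
print(model_chain(config))
```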

2. Start Monitoring

bash

# Start the background monitor
~/.openclaw/scripts/monitor-models.sh start

# Check the monitor status
~/.openclaw/scripts/monitor-models.sh status

# Stop the monitor
~/.openclaw/scripts/monitor-models.sh stop

3. Trigger a Switch Manually

bash

# Run the fallback check script
~/.openclaw/scripts/model-fallback.sh

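Workflow

Normal Request Flow

User request
    ↓
Select primary model (anapi/opus-4.5)
    ↓
Send request to API
    ↓
Success → return result

Failover Flow

User request
    ↓
Select current model
    ↓
Send request → failure/timeout
    ↓
Retry (up to 3 times)
    ↓
Still failing?
    ↓
Switch to fallback model (zai/glm-4.7)
    ↓
Send request
    ↓
Success → return result and log the switch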
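The failover flow can be sketched as two nested loops: an outer loop over the model chain and an inner retry loop with backoff. This is an illustrative sketch under stated assumptions, not the skill's code: `ModelError`, `call_with_fallback`, and the `send` callable are hypothetical, and the sketch moves to the next model immediately after the final failed attempt instead of waiting once more.

```python
import time


class ModelError(Exception):
    """Hypothetical error raised when a model call fails."""


def call_with_fallback(prompt, chain, send, max_attempts=3, initial_ms=1000,
                       max_ms=10000, multiplier=2.0, sleep=time.sleep):
    """Try each model in `chain`; retry with backoff, then move to the next.

    `send(model, prompt)` is a caller-supplied function that performs the
    request and raises ModelError on failure.
    Returns (model_used, result).
    """
    last_error = None
    for model in chain:
        delay_ms = initial_ms
        for attempt in range(1, max_attempts + 1):
            try:
                return model, send(model, prompt)
            except ModelError as err:
                last_error = err
                if attempt < max_attempts:
                    sleep(min(delay_ms, max_ms) / 1000)  # back off before retrying
                    delay_ms *= multiplier
    raise RuntimeError("all models failed") from last_error


def demo_send(model, prompt):
    # Simulate an outage of the primary model only.
    if model == "anapi/opus-4.5":
        raise ModelError("timeout")
    return f"{model} answered: {prompt}"


used, answer = call_with_fallback(
    "hello", ["anapi/opus-4.5", "zai/glm-4.7"], demo_send, sleep=lambda s: None
)
print(used)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.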

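Monitoring Flow

Monitor daemon (every 5 minutes)
    ↓
Check the health of all models
    ↓
├─ Primary model healthy → keep current model
└─ Primary model unhealthy → switch to the best available model
    ↓
Update status file
    ↓
Write log entry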
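A single pass of the monitoring flow might look like the sketch below. It is written under several assumptions: `monitor_pass` and the `is_healthy` callable are hypothetical, the report layout is invented for illustration and is not the format of model-status.json, and a healthy primary model always becomes the active model again, which is only one possible reading of "keep current model".

```python
def monitor_pass(chain, is_healthy, current):
    """Run one health-check pass and return (active_model, report).

    `chain` is ordered by priority with the primary model first.
    `is_healthy(model)` is a caller-supplied probe returning True or False.
    """
    status = {model: bool(is_healthy(model)) for model in chain}
    primary = chain[0]
    if status[primary]:
        active = primary
    else:
        # Best available = first healthy model in priority order;
        # if nothing is healthy, stay on the current model.
        active = next((m for m in chain if status[m]), current)
    report = {"active": active, "switched": active != current, "models": status}
    return active, report


chain = ["anapi/opus-4.5", "zai/glm-4.7", "github-copilot/claude-sonnet-4-5"]
active, report = monitor_pass(chain, lambda m: m != "anapi/opus-4.5", chain[0])
print(active, report["switched"])
```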

Error Handling Strategies

1. Timeout Errors (Timeout)

json

{
  "timeout": {
    "switchModel": true,
    "retryCount": 2,
    "timeoutMs": 60000,
    "fallbackTo": "zai/glm-4.7"
  }
}

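Behavior:

A request that takes longer than 60 seconds counts as a timeout
If the request still times out after 2 retries, the model is switched

2. Rate Limiting (Rate Limit)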

json

{
  "rateLimit": {
    "switchModel": true,
    "cooldownMs": 60000,
    "alert": true,
    "fallbackTo": "zai/glm-4.7"
  }
}

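Behavior:

Triggered by an HTTP 429 response
Switches to the fallback model immediately
After a 60-second cooldown, tries to restore the rate-limited model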
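The cooldown can be tracked with a timestamp per model: a rate-limited model is skipped until its cooldown has elapsed. The following is a minimal sketch; `CooldownTracker` is a hypothetical class, and only the 60000 ms value comes from the configuration above.

```python
import time


class CooldownTracker:
    """Remember which models are cooling down after a rate-limit response."""

    def __init__(self, cooldown_ms=60000, clock=time.monotonic):
        self.cooldown_s = cooldown_ms / 1000
        self.clock = clock          # injectable so tests can control time
        self._until = {}            # model name -> time when it may be used again

    def mark_rate_limited(self, model):
        self._until[model] = self.clock() + self.cooldown_s

    def is_available(self, model):
        return model not in self._until or self.clock() >= self._until[model]


tracker = CooldownTracker()
tracker.mark_rate_limited("anapi/opus-4.5")
print(tracker.is_available("anapi/opus-4.5"))  # False until 60 s have passed
print(tracker.is_available("zai/glm-4.7"))     # True
```

Using a monotonic clock avoids surprises if the system wall clock is adjusted during the cooldown.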

3. Quota Exhaustion (Quota Exceeded)

json

{
  "quotaExceeded": {
    "switchModel": true,
    "alert": true,
    "fallbackTo": "zai/glm-4.7",
    "checkInterval": 3600000
  }
}

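Behavior:

Switches models when the quota is used up
Checks once an hour whether the primary model has recovered
Sends an alert notification

4. Authentication Errors (Authentication)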

json

{
  "authenticationError": {
    "switchModel": true,
    "alert": true,
    "disableModel": true
  }
}

Behavior:

Switches immediately when the API key is invalid
Disables the failing model (no automatic recovery)
Sends an urgent alert
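Taken together, the four policies can be treated as a lookup table that turns an error type and a failure count into a list of actions. The sketch below assumes exactly that; the policy values are copied from the JSON fragments above, while `plan_actions` and the action names (`retry`, `disable_model`, `switch_model`, `alert`) are hypothetical.

```python
ERROR_POLICIES = {
    "timeout": {"switchModel": True, "retryCount": 2, "timeoutMs": 60000},
    "rateLimit": {"switchModel": True, "cooldownMs": 60000, "alert": True},
    "quotaExceeded": {"switchModel": True, "alert": True, "checkInterval": 3600000},
    "authenticationError": {"switchModel": True, "alert": True, "disableModel": True},
}


def plan_actions(error_kind, failures_so_far, policies=ERROR_POLICIES):
    """Return the actions to take after a failure.

    `failures_so_far` counts failures of the current request, including
    the one being handled now.
    """
    policy = policies[error_kind]
    if failures_so_far <= policy.get("retryCount", 0):
        return ["retry"]
    actions = []
    if policy.get("disableModel"):
        actions.append("disable_model")
    if policy.get("switchModel"):
        actions.append("switch_model")
    if policy.get("alert"):
        actions.append("alert")
    return actions


print(plan_actions("timeout", 1))              # ['retry']
print(plan_actions("timeout", 3))              # ['switch_model']
print(plan_actions("authenticationError", 1))  # ['disable_model', 'switch_model', 'alert']
```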

Intelligent Routing Rules

Selects the most suitable model automatically based on task type:

json

{
  "routing": {
    "strategy": "priority-fallback",
    "rules": [
      {
        "name": "coding-task",
        "match": {
          "contentContains": ["代码", "code", "编程", "函数"]
        },
        "preferModels": [
          "openrouter-vip/gpt-5.2-codex",
          "anapi/opus-4.5"
        ]
      },
      {
        "name": "chinese-task",
        "match": {
          "language": "zh"
        },
        "preferModels": [
          "zai/glm-4.7",
          "anapi/opus-4.5"
        ]
      },
      {
        "name": "vision-task",
        "match": {
          "hasImage": true
        },
        "preferModels": [
          "anapi/opus-4.5"
        ]
      }
    ]
  }
}
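A sketch of how rules like these could be evaluated is shown below. It makes several assumptions that the source does not confirm: the first matching rule wins, `contentContains` is a case-insensitive substring test, and a task is a plain dict with hypothetical `content`, `language`, and `hasImage` keys. `rule_matches` and `preferred_models` are illustrative names.

```python
RULES = [
    {"name": "coding-task",
     "match": {"contentContains": ["代码", "code", "编程", "函数"]},
     "preferModels": ["openrouter-vip/gpt-5.2-codex", "anapi/opus-4.5"]},
    {"name": "chinese-task",
     "match": {"language": "zh"},
     "preferModels": ["zai/glm-4.7", "anapi/opus-4.5"]},
    {"name": "vision-task",
     "match": {"hasImage": True},
     "preferModels": ["anapi/opus-4.5"]},
]


def rule_matches(match, task):
    """True when every condition in `match` holds for `task`."""
    for key, expected in match.items():
        if key == "contentContains":
            text = task.get("content", "").lower()
            if not any(word.lower() in text for word in expected):
                return False
        elif task.get(key) != expected:
            return False
    return True


def preferred_models(task, rules=RULES, default=("anapi/opus-4.5",)):
    """Return the model preference list of the first matching rule."""
    for rule in rules:
        if rule_matches(rule["match"], task):
            return list(rule["preferModels"])
    return list(default)


print(preferred_models({"content": "Write code to sort a list", "language": "en"}))
print(preferred_models({"content": "你好", "language": "zh"}))
```

The returned list would still need to be filtered by model health before a request is sent.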

Monitoring and Logs

Log File Locations

bash

~/.openclaw/logs/model-fallback.log    # switch log
~/.openclaw/logs/model-monitor.log     # monitor log
~/.openclaw/logs/model-status.json     # status report

Follow Logs in Real Time

bash

# Follow the switch log
tail -f ~/.openclaw/logs/model-fallback.log

# Follow the monitor log
tail -f ~/.openclaw/logs/model-monitor.log

# Follow all logs
tail -f ~/.openclaw/logs/*.log

Status Report

bash

# Show the current status
~/.openclaw/scripts/monitor-models.sh status

# Pretty-print the status as JSON
python3 -m json.tool ~/.openclaw/logs/model-status.json

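Scripts

model-fallback.sh

The fallback switching script. It:

Tests every configured model
Selects the best available model
Performs the model switch
Logs the switch

Usage:

bash

~/.openclaw/scripts/model-fallback.sh

monitor-models.sh

The health-monitoring daemon. It:

Checks model health periodically
Triggers failover automatically
Generates status reports
Manages the PID file

Usage:

bash

~/.openclaw/scripts/monitor-models.sh {start|stop|restart|status|check}

test-model-fallback.sh

The test script. It:

Validates the configuration
Tests the switching logic
Simulates failure scenarios
Generates a test report

Usage: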

bash

~/clawd/scripts/test-model-fallback.sh

Configuration Reference

Complete agent.json Configuration

json

{
  "model": "anapi/opus-4.5",
  "modelFallback": [
    "zai/glm-4.7",
    "openrouter-vip/gpt-5.2-codex",
    "github-copilot/claude-sonnet-4-5"
  ],
  "retry": {
    "maxAttempts": 3,
    "initialDelayMs": 1000,
    "maxDelayMs": 10000,
    "backoffMultiplier": 2.0,
    "useFallbackOnFailure": true
  },
  "errorHandling": {
    "rateLimit": {
      "switchModel": true,
      "cooldownMs": 60000,
      "alert": true
    },
    "timeout": {
      "switchModel": true,
      "retryCount": 2,
      "timeoutMs": 60000
    },
    "quotaExceeded": {
      "switchModel": true,
      "alert": true,
      "fallbackTo": "zai/glm-4.7"
    },
    "authenticationError": {
      "switchModel": true,
      "alert": true,
      "disableModel": true
    }
  },
  "models": {
    "anapi/opus-4.5": {
      "provider": "anapi",
      "alias": "opus45",
      "maxTokens": 200000,
      "timeoutMs": 60000,
      "priority": 1,
      "supports": ["vision", "tools", "long-context"],
      "costFactor": "high"
    },
    "zai/glm-4.7": {
      "provider": "zai",
      "alias": "zai47",
      "maxTokens": 200000,
      "timeoutMs": 60000,
      "priority": 2,
      "supports": ["tools", "long-context"],
      "costFactor": "medium",
      "bestFor": ["chinese", "general-purpose"]
    },
    "openrouter-vip/gpt-5.2-codex": {
      "provider": "openrouter-vip",
      "alias": "codex52",
      "maxTokens": 100000,
      "timeoutMs": 30000,
      "priority": 3,
      "supports": ["coding"],
      "costFactor": "low",
      "bestFor": ["coding", "code-generation"]
    },
    "github-copilot/claude-sonnet-4-5": {
      "provider": "github-copilot",
      "alias": "sonnet",
      "maxTokens": 200000,
      "timeoutMs": 60000,
      "priority": 4,
      "supports": ["tools", "long-context"],
      "costFactor": "free",
      "bestFor": ["fallback", "general-purpose"]
    }
  },
  "monitoring": {
    "enabled": true,
    "checkIntervalMs": 300000,
    "logFile": "$HOME/.openclaw/logs/model-fallback.log",
    "alertOnFailure": true
  }
}
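With a configuration this large, a quick consistency check can catch mistakes before they cause a failed switch. The sketch below shows a few checks one might run on the parsed file; `validate_config` is a hypothetical helper and the specific checks are suggestions, not rules enforced by the skill.

```python
def validate_config(config):
    """Return a list of human-readable problems found in an agent.json dict."""
    problems = []
    models = config.get("models", {})

    # Every model in the chain should have an entry under "models".
    chain = [config.get("model"), *config.get("modelFallback", [])]
    for name in chain:
        if name not in models:
            problems.append(f"{name!r} is not defined under 'models'")

    # Priorities should be unique so that the ordering is unambiguous.
    priorities = [spec.get("priority") for spec in models.values()]
    if len(priorities) != len(set(priorities)):
        problems.append("duplicate 'priority' values")

    # The initial backoff delay should not exceed the cap.
    retry = config.get("retry", {})
    if retry.get("initialDelayMs", 0) > retry.get("maxDelayMs", float("inf")):
        problems.append("initialDelayMs exceeds maxDelayMs")
    return problems


example = {
    "model": "anapi/opus-4.5",
    "modelFallback": ["zai/glm-4.7", "typo/model"],
    "retry": {"initialDelayMs": 1000, "maxDelayMs": 10000},
    "models": {
        "anapi/opus-4.5": {"priority": 1},
        "zai/glm-4.7": {"priority": 2},
    },
}
print(validate_config(example))
```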

Troubleshooting

Problem 1: The Model Does Not Switch Automatically

Check:

bash

# Inspect the configuration file
grep modelFallback ~/.openclaw/agents/main/agent/agent.json

# View the log
tail -20 ~/.openclaw/logs/model-fallback.log

# Run the switch script manually
~/.openclaw/scripts/model-fallback.sh

Problem 2: The Monitor Is Not Running

Check:

bash

# Look for the process (the [m] keeps grep from matching itself)
ps aux | grep '[m]onitor-models'

# Check the PID file
cat ~/.openclaw/logs/model-monitor.pid

# Restart the monitor
~/.openclaw/scripts/monitor-models.sh restart

Problem 3: No Model Is Available

Check:

bash

# View the status report
~/.openclaw/scripts/monitor-models.sh status

# Check the API keys
cat ~/.openclaw/agents/main/agent/auth-profiles.json

# Test network connectivity
ping -c 3 anapi.9w7.cn
ping -c 3 open.bigmodel.cn

Performance Tuning

Reduce Switching Frequency

json

{
  "retry": {
    "maxAttempts": 5,        // more retry attempts
    "initialDelayMs": 2000   // longer initial delay
  }
}
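Fewer switches come at a price: the request spends longer on a failing model before falling back. The sketch below estimates that cost, assuming a wait after every failed attempt (as in the retry timeline), a 10000 ms delay cap, and a backoff multiplier of 2.0 carried over from the Quick Start configuration. `time_before_fallback_ms` is a hypothetical helper.

```python
def time_before_fallback_ms(max_attempts, initial_ms, max_ms=10000, multiplier=2.0):
    """Total backoff wait spent on one model before switching away from it."""
    total = 0.0
    delay = float(initial_ms)
    for _ in range(max_attempts):
        total += min(delay, max_ms)  # each wait is capped at max_ms
        delay *= multiplier
    return int(total)


default_wait = time_before_fallback_ms(3, 1000)  # 1000 + 2000 + 4000
tuned_wait = time_before_fallback_ms(5, 2000)    # 2000 + 4000 + 8000 + 10000 + 10000
print(default_wait, tuned_wait)                  # 7000 34000
```

Under these assumptions the tuned settings add roughly 27 seconds of waiting per failing model, not counting the time taken by the requests themselves.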

Optimize Response Time

Select the fastest model for each kind of task:

json

{
  "routing": {
    "rules": [
      {
        "name": "quick-response",
        "match": {
          "priority": "speed"
        },
        "preferModels": [
          "github-copilot/claude-sonnet-4-5",  // usually the fastest to respond
          "zai/glm-4.7"
        ]
      }
    ]
  }
}

Integrating with OpenClaw

Once configured, model fallback is integrated into the OpenClaw Gateway automatically:

Restart the Gateway:

bash

openclaw gateway restart

Verify the configuration:

bash

openclaw status | grep Model

View the logs:

bash

journalctl -u openclaw-gateway -f | grep model

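Best Practices

1. Check Regularly

Run a full check once a week:

bash

~/clawd/scripts/test-model-fallback.sh

2. Monitor the Logs

Review the switch log every day:

bash

# The pattern "切换模型" means "switch model"
grep "切换模型" ~/.openclaw/logs/model-fallback.log | tail -10

3. Update the Configuration

When you add a new model, update agent.json:

json

{
  "modelFallback": [
    "anapi/opus-4.5",
    "zai/glm-4.7",
    "new-model-here",  // new model
    "github-copilot/claude-sonnet-4-5"
  ]
}

4. Back Up the Configuration

Back up the configuration file regularly: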

bash

cp ~/.openclaw/agents/main/agent/agent.json \
   ~/.openclaw/agents/main/agent/agent.json.backup

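Supported Models

Currently configured models and their characteristics:

Model	Provider	Priority	Max Tokens	Strengths
opus-4.5	anapi	1	200k	Most capable, vision
glm-4.7	zai	2	200k	Optimized for Chinese
gpt-5.2-codex	openrouter-vip	3	100k	Coding specialist
sonnet-4.5	github-copilot	4	200k	Free fallback

TODO

Add more providers
Implement cost-based model selection
Collect model performance metrics
Implement predictive model switching
Integrate alert notifications (Telegram/email)
Add a WebUI monitoring dashboard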

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/model-fallback
License: MIT License
