Agent skill
workflow-fixer
CI/CD workflow failure diagnosis and automated repair skills
Stars
163
Forks
31
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/workflow-fixer
SKILL.md
Workflow Fixer Skills
Overview
This skill provides expertise in diagnosing CI/CD workflow failures and implementing automated fixes. It combines log analysis, pattern recognition, and targeted repairs to quickly resolve pipeline issues.
Activation Triggers
The workflow-fixer skill activates when:
- User mentions workflow/CI/build failures
- A GitHub Actions run has failed status
- Error patterns are detected in discussion
- Keywords match:
failed,broken,ci,workflow,pipeline,fix
Log Analysis
Fetching Logs
bash
# Via GitHub CLI
gh run list --status failure --limit 5
gh run view <run-id> --log-failed
# Via MCP Tools (preferred when available)
# Use github-mcp-server-actions_list with method: "list_workflow_runs"
# Use github-mcp-server-get_job_logs with run_id and failed_only: true
Error Pattern Matching
| Pattern | Category | Common Fix |
|---|---|---|
npm ERR! ERESOLVE |
Dependency | Update lockfile, use --legacy-peer-deps |
pip: No matching distribution |
Dependency | Pin version, check Python version |
nix: error: path .* does not exist |
Build | Check flake inputs, rebuild |
colcon build returned nonzero |
Build | Fix ROS2 package CMakeLists/setup.py |
FAILED.*test_ |
Test | Fix test logic or mark as flaky |
Error: Process completed with exit code 137 |
Resource | OOM - optimize or increase memory |
Error: The operation was canceled |
Resource | Timeout - increase timeout-minutes |
Error: Resource not accessible |
Permission | Check GITHUB_TOKEN permissions |
Log Parsing Commands
bash
# Extract error lines
grep -E "error|ERROR|Error|failed|FAILED|Failed" logs.txt | head -20
# Find stack traces
grep -A 10 "Traceback\|at .*\.ts:\|at .*\.js:" logs.txt
# Nix-specific errors
grep -E "error: |builder for .* failed" logs.txt
Fix Strategies
Dependency Issues
bash
# NPM
rm -rf node_modules package-lock.json
npm install
# Pip/Python
pip install --upgrade pip
pip install -r requirements.txt --upgrade
# Nix
nix flake update
nix flake check --no-build
# Pixi
rm -f pixi.lock
pixi install
Build Failures
bash
# Clean rebuild
rm -rf build install log
colcon build --symlink-install
# Nix rebuild
nix develop --command echo "rebuilt"
# Check for missing dependencies
nix flake show
Test Failures
bash
# Run specific test with verbose output
pytest -xvs path/to/test.py::test_name
# Run with coverage to find issues
coverage run -m pytest path/to/test.py
coverage report -m
# Mark flaky test (last resort)
@pytest.mark.flaky(reruns=3)
def test_sometimes_fails():
...
Environment Issues
yaml
# Add missing environment variable
env:
MY_VAR: ${{ secrets.MY_SECRET }}
# Fix permission issues
permissions:
contents: read
packages: write
# Add required setup step
- uses: actions/setup-python@v5
with:
python-version: '3.11'
Workflow Debugging
Step-by-step Investigation
-
Identify the failed run
bashgh run list --status failure --limit 1 -
Get failed job details
bashgh run view <run-id> --json jobs --jq '.jobs[] | select(.conclusion=="failure")' -
Extract relevant logs
bashgh run view <run-id> --log-failed 2>&1 | tail -100 -
Identify error pattern
- Look for
Error:,error:,FAILED,Exception - Check exit codes (137 = OOM, 143 = SIGTERM)
- Look for
-
Trace to source
- Find file and line number from error
- Check recent commits that touched that file
-
Implement minimal fix
- Change only what's necessary
- Test locally if possible
-
Verify fix
- Push and monitor new run
- Check related jobs don't break
Common Workflow Patterns
Nix + Pixi (This Repository)
yaml
steps:
- uses: actions/checkout@v4
- uses: DeterminateSystems/nix-installer-action@v21
- uses: DeterminateSystems/magic-nix-cache-action@v13
- run: nix develop .#default --command pixi install
- run: nix develop .#default --command pixi run <command>
Common Fixes for This Repository
| Issue | Fix |
|---|---|
pixi.lock mismatch |
Remove pixi.lock, run pixi install |
| Flake evaluation error | Check flake.nix, run nix flake check |
| ROS2 build failure | Check package.xml, CMakeLists.txt |
| macOS CUDA error | Ensure CUDA shell only on Linux |
| Disk space | Add cleanup step before build |
Integration with MCP
When GitHub MCP tools are available, use them for structured data:
python
# List failed workflow runs
actions_list(
method="list_workflow_runs",
owner="FlexNetOS",
repo="ripple-env",
workflow_runs_filter={"status": "failure"}
)
# Get failed job logs
get_job_logs(
owner="FlexNetOS",
repo="ripple-env",
run_id=12345,
failed_only=True,
return_content=True
)
Related Skills
- DevOps - CI/CD pipeline management
- Nix Environment - Nix-specific debugging
- ROS2 Development - ROS2 build issues
Best Practices
- Always read logs first - Don't guess at the problem
- Look for the first error - Later errors are often cascading failures
- Check recent changes - Most failures come from recent commits
- Test fixes locally - When possible, verify before pushing
- Document non-obvious fixes - Add comments explaining why
- Don't mask errors - Fix root causes, not symptoms
- Consider flakiness - Some failures are intermittent
Didn't find tool you were looking for?