Agent skill
regression-search
Search phone-call history for when a feature regressed (find-regression.py) and drill into a single call to see what went wrong (diagnose-call.py). Skips reading 100+ transcripts by hand.
Install this agent skill to your Project
npx add-skill https://github.com/sonichi/sutando/tree/main/skills/regression-search
SKILL.md
Regression Search
Two scripts for hunting down bad calls without reading every transcript:
- find-regression.py: search results/calls/calls.jsonl for calls touching a feature, classify each as working/broken, print a sorted timeline.
- diagnose-call.py: drill into a single call by SID, report refusals/errors/silences/repeated requests, optionally show metrics from data/call-metrics.jsonl.
Closes #188.
When to use
- "When did the X feature stop working?" — pass the feature keyword.
- "Has feature Y improved?" — see the broken/working trend over time.
- Before shipping a fix — sanity check that the regression is reproducible.
Usage
python3 skills/regression-search/scripts/find-regression.py "record"
python3 skills/regression-search/scripts/find-regression.py "summon" --since 2026-04-01
python3 skills/regression-search/scripts/find-regression.py "play" --json
Flags:
- --since YYYY-MM-DD: only show calls on/after this date
- --json: machine-readable output
- --show-snippet: print a one-line transcript snippet for each call
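As a rough illustration of what the --since filter and the sorted timeline amount to, here is a minimal sketch. It is not the code in find-regression.py. The record fields `sid` and `started_at` (an ISO 8601 timestamp without a timezone suffix) are hypothetical; the real schema of results/calls/calls.jsonl may differ.

```python
import json
from datetime import date, datetime


def load_calls_since(lines, since=None):
    """Parse JSONL lines, keep calls on/after `since`, and sort them by start time.

    Assumes (hypothetically) that each record has a "started_at" ISO timestamp.
    """
    calls = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        started = datetime.fromisoformat(record["started_at"])
        if since is None or started.date() >= since:
            calls.append((started, record))
    calls.sort(key=lambda pair: pair[0])  # oldest call first
    return [record for _, record in calls]


if __name__ == "__main__":
    sample = [
        '{"sid": "a", "started_at": "2026-04-02T10:00:00"}',
        '{"sid": "b", "started_at": "2026-03-30T09:00:00"}',
        '{"sid": "c", "started_at": "2026-04-01T00:00:00"}',
    ]
    for call in load_calls_since(sample, since=date(2026, 4, 1)):
        print(call["started_at"], call["sid"])
```

In practice the `lines` argument could be an open file handle, since iterating a file yields one line at a time.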
Heuristics
A call is broken for a query if any of:
- Sutando refuses ("I can't", "I'm not able", "I'm unable", "sorry I cannot")
- Sutando reports an error ("error", "failed", "didn't work", "something went wrong")
- The user repeats the same request 2+ times in a row (Sutando didn't respond usefully)
- Sutando says "(Silence)" after the user mentions the feature
Otherwise the call is working if Sutando's response includes the feature keyword and isn't flagged broken.
These are intentionally crude — the goal is "good enough to find the regression window without reading 163 transcripts." Tune as you find false positives.
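The heuristics above can be sketched roughly as follows. This is an illustration under assumptions, not the logic shipped in find-regression.py: the transcript is assumed to be a list of (speaker, text) pairs with speakers "user" and "sutando", "in a row" is read as two consecutive identical user turns, and the "unknown" label for calls that are neither broken nor working is an addition for the sketch.

```python
import re

REFUSAL_PHRASES = ("i can't", "i'm not able", "i'm unable", "sorry i cannot")
ERROR_PHRASES = ("error", "failed", "didn't work", "something went wrong")


def _norm(text):
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    text = text.replace("\u2019", "'").strip().lower()
    return re.sub(r"\s+", " ", text)


def classify_call(turns, keyword):
    """Return "broken", "working", or "unknown" (hypothetical label) for one call."""
    keyword = keyword.lower()
    prev_user = None            # last user utterance, normalized
    feature_mentioned = False   # has the user mentioned the keyword yet?
    agent_used_keyword = False  # has Sutando mentioned the keyword?
    for speaker, text in turns:
        t = _norm(text)
        if speaker == "user":
            if prev_user is not None and t == prev_user:
                return "broken"  # user repeated the same request
            prev_user = t
            if keyword in t:
                feature_mentioned = True
        else:
            if any(p in t for p in REFUSAL_PHRASES + ERROR_PHRASES):
                return "broken"  # refusal or reported error
            if feature_mentioned and t == "(silence)":
                return "broken"  # silence after the feature came up
            if keyword in t:
                agent_used_keyword = True
    return "working" if agent_used_keyword else "unknown"


if __name__ == "__main__":
    call = [("user", "start recording"), ("sutando", "Recording started.")]
    print(classify_call(call, "record"))
```

Substring matching keeps the sketch as crude as the heuristics it mirrors: "error" inside any Sutando turn flags the call, which is one obvious source of false positives to tune.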
Limitations
- Keyword matching only. "recording doesn't stop" vs "recording won't start" both match record. The issue calls this out as future work.
- No semantic understanding. A call where Sutando talks about recording but the user wanted something else still matches.
- Doesn't correlate with git commits — manual step for now.
diagnose-call.py
python3 skills/regression-search/scripts/diagnose-call.py de1f04733fc2
python3 skills/regression-search/scripts/diagnose-call.py CA701fc4129779... --metrics
python3 skills/regression-search/scripts/diagnose-call.py de1f04733fc2 --json
Accepts a full SID or just the last 12 characters. Reports turn counts, refusals, errors, silences, repeated user requests, and the ending style (normal, abrupt user end, or Sutando silence). With --metrics, also pulls a per-event tool-call timeline from data/call-metrics.jsonl (requires PR #223). Exits with code 1 if any issues are found and 0 if the call is clean, which makes it usable as a CI check.
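The SID lookup and exit-code convention described above might look something like the sketch below. The function names `resolve_sid` and `exit_code` and the SIDs in the example are hypothetical; diagnose-call.py may handle ambiguous or missing matches differently.

```python
def resolve_sid(query, known_sids):
    """Match `query` against full SIDs, or against their last 12 characters."""
    matches = [
        sid for sid in known_sids
        if sid == query or (len(query) == 12 and sid[-12:] == query)
    ]
    if len(matches) != 1:
        # Assumption: zero or multiple matches is treated as an error.
        raise LookupError(f"{len(matches)} calls match {query!r}")
    return matches[0]


def exit_code(issues):
    """Return 1 if any issues were found, 0 if the call is clean."""
    return 1 if issues else 0


if __name__ == "__main__":
    # Made-up SIDs for illustration only.
    sids = ["CA" + "0" * 20 + "abcdef123456", "CA" + "1" * 20 + "123456abcdef"]
    print(resolve_sid("abcdef123456", sids))
    print(exit_code(["refusal"]))
```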
Typical workflow: run find-regression.py to surface broken candidates, then diagnose-call.py <sid> to drill into the worst one.
Future work
- Auto-correlate regression windows with git log
- Smarter NLP-based query matching (query: "recording doesn't stop" vs "recording won't start")