Agent skill
visual-verify
This skill should be used when the user asks to 'verify visual output', 'check how it looks', 'render and review', 'visual verify', 'check the slide', 'does this look right', or when any task produces rendered visual output (slides, charts, documents, UI). Starts a render-vision-fix loop using Gemini vision.
Install this agent skill to your Project
npx add-skill https://github.com/edwinhu/workflows/tree/main/skills/visual-verify
SKILL.md
Announce: "I'm using visual-verify to set up a render-vision-fix loop."
NO VISUAL TASK IS COMPLETE WITHOUT RENDERING, SCORING, AND MEETING THE THRESHOLD.
Source code correctness does NOT imply visual correctness. You MUST render to PNG, score with vision (0-10) — Gemini via look-at, or a subagent with Read if Gemini is unavailable — and iterate until score >= 9.5. Claiming "done" with a score below threshold delivers broken visuals to the user.
Skipping the score check is NOT HELPFUL — the user gets a visual artifact with defects you didn't verify. </EXTREMELY-IMPORTANT>
Agentic Vision Routing
Route based on image complexity, not source language.
Agentic vision (--agentic) enables Gemini's Think-Act-Observe loop: it can crop, zoom, annotate, and re-examine regions of the image autonomously. This catches fine-grained defects that full-image viewing misses.
| Image characteristic | --agentic? |
Why |
|---|---|---|
| Dense diagram (5+ nodes, many arrows/labels) | YES | Can crop individual arrow paths, zoom into small labels to check connections |
| Small text or labels near container edges | YES | Implicit zoom catches clipping that full-image view misses |
| Side-by-side or before/after comparison | YES | Can crop matching regions and annotate differences |
| Simple layout (2-3 large elements) | NO | Full image is sufficient; agentic adds latency for no gain |
| Single chart with large text | NO | No fine-grained details to investigate |
Decision rule: If the image has details that are easy to miss at full resolution, use --agentic. If you can see everything clearly at a glance, skip it.
The Loop
0. PAGE MAP -> If Typst + Touying: Skill("teaching:find-slide-page")
| Returns heading → physical page mapping
| Skip if: single-page file, non-Typst, or page already known
|
1. CHANGE -> Modify source code (Task agent)
|
2. RENDER -> Produce PNG (see references/render-commands.md)
| Render fails? -> fix source, back to step 1
|
3. VISION -> Two-part check:
| a) INTENT — Does the render match the design intent?
| (Does the visual structure argue what it should?)
| b) DEFECTS — Scan for the 9 visual defect categories:
| 1. Text clipped by or overflowing its container
| 2. Text or shapes overlapping other elements
| 3. Arrows crossing through elements instead of routing around
| 4. Arrows landing on wrong element or pointing into empty space
| 5. Labels floating ambiguously (not anchored to what they describe)
| 6. Labels squeezed between adjacent elements without clearance
| 7. Uneven spacing (cramped sections next to spacious ones)
| 8. Text too small to read at rendered size
| 9. Parallel sub-diagrams with inconsistent layout
| Complexity-routed look-at call with SCORING:
| Dense/fine-grained? -> --agentic (Gemini crops/zooms/annotates)
| Simple/large elements? -> vision-only (structured pixel feedback)
| → Score 0-10 against checklist items
| → Record in SCORES.md
|
4. DECIDE -> Score >= 9.5 AND exit criteria met? → DONE
Score < 9.5 OR defects remain? → extract fixes, back to step 1
When to Stop
The loop is done when ALL of these hold:
- Score >= 9.5
- The rendered output matches the design intent (not just defect-free but correct)
- No text is clipped, overlapping, or unreadable
- All arrows/lines connect to the right elements and route cleanly
- Spacing is consistent and composition is balanced
- You'd show it to someone without caveats
Don't stop after one clean pass just because there are no critical bugs — if the composition could be better, improve it. Conversely, don't loop forever on cosmetics — 5 iterations max before escalating to the user.
Invocation
Skill(skill="ralph-loop:ralph-loop", args="Visual Task N: [TASK NAME] --max-iterations 5 --completion-promise VTASKN_9_5")
Score Tracking
Initialize SCORES.md before the first iteration:
# Visual Verify Scores
| Iteration | Score | Threshold | BLOCKING | COSMETIC | Delta |
|-----------|-------|-----------|----------|----------|-------|
Each vision call must score the output 0-10:
- 10.0 = all checklist items pass, zero issues
- 9.5 = 95% pass, 1-2 cosmetic issues remain (default threshold)
- < 9.0 = BLOCKING issues present
The score reflects the fraction of checklist items that pass. Gemini counts BLOCKING and COSMETIC issues against the domain-specific checklist, and the score = (items passing / total items) * 10.
Vision Calls
Primary: Gemini via look-at
Agentic (dense diagrams, fine-grained details, comparisons):
python3 "${CLAUDE_SKILL_DIR}/../../skills/look-at/scripts/look_at.py" \
--file "/tmp/visual-verify.png" \
--goal "[CONTEXT-ENRICHED GOAL]" \
--agentic
Gemini will autonomously crop, zoom, and annotate regions it wants to inspect more closely. This is most valuable for:
- Verifying arrow connections in dense node diagrams
- Checking whether small labels are fully visible (not clipped)
- Comparing spatial consistency across sub-diagrams
Vision-only (simple layouts, large elements):
python3 "${CLAUDE_SKILL_DIR}/../../skills/look-at/scripts/look_at.py" \
--file "/tmp/visual-verify.png" \
--goal "[CONTEXT-ENRICHED GOAL]"
If look-at fails because the Google API key is unavailable, missing, or returns an API error, DO NOT fall back to pdftotext. pdftotext cannot detect overlap, misalignment, or spatial defects — it only extracts text. Using it as a visual check produces false passes.
Instead, spawn a subagent that uses the Read tool directly on the rendered PNG:
Agent(
prompt="Read the image at /tmp/visual-verify.png and score it. [CONTEXT-ENRICHED GOAL]. Score 0-10 against the checklist. Report BLOCKING and COSMETIC issues separately.",
description="Vision fallback: score rendered PNG",
subagent_type="general-purpose"
)
Why this is safe: The Iron Law against Read on images exists to protect the main conversation's context window (images cost 1000-5000 tokens). Subagent context is throwaway — the tokens are discarded when the agent returns its short summary. The cost rationale does not apply.
Why pdftotext is NOT safe: In the lecture 22 incident (2026-03-22), Gemini was unavailable. The agent fell back to pdftotext -layout, checked only VIS-17 (a table), declared "only one diagram in 22.typ," and missed VIS-18 entirely — a CeTZ timeline with anchor: "south" causing severe text overlap. pdftotext showed all text present (no clipping), so the agent declared it clean. The overlap was invisible to text extraction.
</EXTREMELY-IMPORTANT>
Goal Assembly
NEVER call look-at with a generic goal. Goals must reference the spec, checklist items, and prior feedback.
| Context Piece | Source |
|---|---|
spec_text |
SPEC.md, PLAN.md task, or user request |
checklist_items |
Domain + task specific |
previous_feedback |
Gemini's output from prior iteration |
For diagrams (fletcher, CeTZ): use describe-first strategy, NOT judgment prompts.
- Step 1: Ask Gemini to transcribe all text elements with
[FULL | PARTIAL]annotations - Step 2: Claude diffs transcription against expected elements from source code
- Step 3 (optional): Run spatial/overlap check for non-clipping issues
- Why: Judgment prompts ("is X visible?") invite yes-man answers. Transcription forces the model to report what it actually sees — clipping becomes objectively detectable via text mismatches.
See references/goal-templates.md for full copy-paste templates per domain.
Translating Non-Python Feedback
Before editing Typst source to apply fixes, load the tinymist skill: Skill(skill="tinymist:typst")
It has Fletcher/CeTZ reference docs with correct parameter names (label-side, label-pos, spacing, inset). Without it, you will guess at parameter names and waste iterations.
Gemini returns structured pixel measurements. Claude translates to source code:
| Gemini says | Claude translates to (Typst example) |
|---|---|
| "Move label 15px left" | Adjust label-pos or node coordinates by ~0.5em |
| "Text clipped at right edge" | Increase inset or reduce scale() percentage |
| "Node/diagram cut off at edge" | Reduce canvas length, node inset, or spacing; or shift coordinates toward center |
| "Elements overlap vertically" | Increase spacing parameter |
| "Font too small to read" | Increase #set text(Npt) value |
Complex Diagrams (3+ Failed Iterations)
If the same spatial issue persists after 3 iterations, escalate to the reference sketch approach: have Gemini draw an ideal layout in matplotlib and translate coordinates.
See references/complex-diagram-strategy.md for the full approach.
Quick Render Reference
| Domain | Command |
|---|---|
| Typst | tinymist compile input.typ /tmp/visual-verify.png --pages N --ppi 144 (use /find-slide-page to get N) |
| Python | python3 script.py (script saves to known path) |
| Screenshot | screencapture -x /tmp/visual-verify.png |
See references/render-commands.md for the full reference.
Red Flags — STOP If You Catch Yourself:
| Action | Why Wrong | Do Instead |
|---|---|---|
Falling back to pdftotext when Gemini is down |
pdftotext extracts text only — it cannot detect overlap, misalignment, or spatial defects. It produces false passes. | Spawn a subagent with Read on the rendered PNG |
Declaring "only N diagrams" without grepping for all VIS-* items |
You will miss diagrams and skip their visual check | rg 'VIS-' file.typ to enumerate ALL diagrams before checking any |
| Skipping visual check because "the code looks correct" | The anchor: "south" bug was syntactically valid Typst that produced severe overlap | Render and score EVERY diagram, no exceptions |
When NOT to Use
- One-off visual checks: Use
look-atdirectly, not the full loop - Text-only verification: Use standard dev-verify
- Compilation checks only: Just run the compile command
Reference Files
references/goal-templates.md-- copy-paste goal templates per domainreferences/render-commands.md-- render commands for all supported domainsreferences/rationalization-prevention.md-- excuses, red flags, honesty framing, drive-aligned consequencesreferences/complex-diagram-strategy.md-- reference sketch approach for persistent layout failuresreferences/examples.md-- worked examples (Typst slide, matplotlib chart, diagram escalation)
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
audit-fix-loop
This skill should be used when the user asks to 'iteratively improve', 'audit and fix', 'hill-climb quality', 'grade and improve', 'score and fix', 'audit loop', 'quality loop', or needs structured iterative improvement of an artifact using scored independent audits. Also use when the user invokes a ralph loop for quality improvement rather than task completion.
ds-spec-reviewer
Internal skill used by ds-brainstorm at Phase 1 exit gate. Dispatches a reviewer subagent to verify SPEC.md completeness before planning. NOT user-facing.
pptx-render
Use when the user asks to "render pptx", "show pptx slide", "compare with pptx", "pptx to image", "export pptx slide", "original slide", "show me the original", "what does the pptx look like", or needs to extract a specific PPTX slide's content for visual comparison.
obsidian-organize
Organize Obsidian notes according to clawd's preferences. Use when user asks to "organize notes", "move notes to right folder", "clean up vault", "tidy vault", "file this note", or when creating new notes in the Obsidian vault. Also use when moving, renaming, or categorizing notes, or when the vault root has stray files.
dev-verify
This skill should be used when the user asks to 'verify completion', 'check that tests pass', 'confirm feature works', or REQUIRED Phase 7 of /dev workflow (final). Enforces fresh runtime evidence before claiming completion.
dev
This skill should be used when the user asks to 'start a feature', 'build a feature', 'implement a feature', 'develop', 'new feature', or needs the full 7-phase development workflow with TDD enforcement.
Didn't find tool you were looking for?