Agent skill
ironbee-verify
Trigger browser verification of code changes. Args: (default), full, visual, functional
Install this agent skill to your Project
npx add-skill https://github.com/ironbee-ai/ironbee-cli/tree/main/src/clients/cursor/commands/ironbee-verify
SKILL.md
IronBee Verify
Verify the current code changes in the browser.
Usage
- /ironbee-verify: default. Focus on what changed, visual + functional checks on affected areas
- /ironbee-verify full: full scope. Entire application, all checklists, edge cases, responsive, accessibility deep dive
- /ironbee-verify visual: visual only. Contrast, layout, spacing, fonts, images, theming
- /ironbee-verify functional: functional only. Clicks, forms, navigation, data flow, error handling
If no argument is given, use default mode.
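The argument handling above can be sketched as a small shell function. This is only an illustration: `resolve_mode` is a hypothetical name, and rejecting unrecognized arguments is an assumption, since the skill only says what to do when no argument is given.

```shell
# Hypothetical helper: map the optional /ironbee-verify argument to a mode name.
# Assumption: an unrecognized argument is treated as an error.
resolve_mode() {
  case "${1:-}" in
    "" | default) echo "default" ;;
    full | visual | functional) echo "$1" ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

resolve_mode          # no argument, falls back to default mode
resolve_mode visual   # explicit mode is passed through
```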
Steps (all modes)
1. Start verification. Run via terminal:
   echo '{"session_id":"<your-session-id>"}' | ironbee hook verification-start
2. Build and start the application if not already running
3. For EVERY page you visit, repeat this cycle:
   a. Navigate using browser-devtools MCP tools
   b. Take a FULL PAGE screenshot with fullPage: true
   c. Take an ARIA snapshot to capture the page structure
   d. STOP and visually analyze the screenshot. Switch your focus entirely to finding visual problems. WARNING: ARIA reports DOM content, not what the user actually sees. Do NOT assume the page looks correct just because ARIA shows the right content. Only the screenshot tells you what the user actually sees. Look at this screenshot as if your ONLY job is to find visual defects:
      - Text readability: is it readable against its background? Look for text that blends in or poor contrast
      - Layout: overlapping elements, unexpected gaps, overflow, content cut off
      - Spacing: consistent padding/margin? Too cramped or too far apart?
      - Colors: intentional and consistent? Any jarring mismatches?
      - Typography: right sizes? Clipped or truncated text?
      - Images/icons: loaded? Right size and aspect ratio?
      - States: empty, loading, disabled, error states rendered properly?
      Report your visual findings before continuing.
   e. Read the ARIA snapshot: verify headings, labels, landmarks, and structure
   f. If anything looks wrong → note it as an issue
4. Functionally test: run the checklist for your mode (see below). After each significant interaction, take another screenshot and repeat the visual analysis.
5. Check console for errors
6. Stop the dev server when verification is complete
7. Submit your verdict via terminal:
   - Pass:
     echo '{"session_id":"...","status":"pass","pages_tested":[...],"checks":[...],"console_errors":0,"network_failures":0}' | ironbee hook submit-verdict
   - Fail:
     echo '{"session_id":"...","status":"fail","pages_tested":[...],"checks":[...],"console_errors":N,"network_failures":N,"issues":["describe what failed"]}' | ironbee hook submit-verdict
8. If failed → collect ALL issues first (finish testing all affected pages), submit one fail verdict with all issues, then fix everything, rebuild, and re-verify. Do not fix one issue at a time; batch fixes to avoid repeated build/restart cycles.
9. If pass after a previous fail, include "fixes" in the verdict describing what was fixed
Default Mode
Focus on the code you changed — not the entire application.
1. Study the changes
- Run git diff --name-only and git diff --name-only HEAD~1
- Ignore .ironbee/, .claude/, .cursor/ (tool config, not application code)
- Read the full diff (git diff and/or git diff HEAD~1) and understand every change: what was added, removed, modified. Note specific values (colors, sizes, conditions, logic, API endpoints, component props).
- Before opening the browser, you should be able to answer: what exactly changed, what should look or behave differently, and what could go wrong?
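As a rough sketch of the file-listing step above, the tool-config directories can be filtered out of the changed-file list with `grep`. The function name `filter_app_files` is hypothetical, and the sample paths are made up for illustration.

```shell
# Hypothetical helper: drop tool-config paths (.ironbee/, .claude/, .cursor/)
# from a newline-separated list of changed files read on stdin.
filter_app_files() {
  # grep exits 1 when nothing is left; "|| true" keeps that from looking like a failure.
  grep -v -E '^\.(ironbee|claude|cursor)/' || true
}

# Demo with made-up paths; only src/App.tsx is printed.
printf '%s\n' src/App.tsx .ironbee/config.json | filter_app_files

# Inside a git repository, both diffs from the step above could be combined like this:
#   { git diff --name-only; git diff --name-only HEAD~1; } | sort -u | filter_app_files
```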
2. Verify in the browser
- Cross-reference the diff against what you see. For each change in the diff, verify it is correctly reflected in the browser. If the diff changes a color → check that color. If it changes a calculation → verify the result. If it adds a component → confirm it renders.
- Test the flow end-to-end — navigate, click, fill forms, submit, verify the outcome
- Check one edge case — empty input, invalid data, or double-click
- Console — any new errors or warnings?
Full Mode (/ironbee-verify full)
Comprehensive verification of the entire application. Do NOT run git diff or scope to recent changes. Test every page, every flow, every visual detail. Any issue is a failure, regardless of when it was introduced.
Visual Checklist
In addition to the per-page visual analysis in step 3d:
- Responsiveness — does the layout adapt if viewport changes? No horizontal scrolling on standard widths
- Borders & separators — visible and consistent? Not too faint or missing
- Scroll behavior — does the page scroll smoothly? No content hidden behind sticky headers/footers?
Functional Checklist
- Navigation — do links and buttons navigate to the correct pages?
- Forms — fill inputs with real data, select options, submit. Do validation messages appear correctly?
- Buttons & interactions — do click handlers fire? Do toggles, dropdowns, and modals work?
- Data flow — does submitted data appear where expected?
- Error handling — what happens with invalid input? Does the UI handle errors gracefully?
- Authentication — if applicable, do protected routes redirect correctly?
- API calls — do network requests succeed? Check for failed requests in console/network
- State persistence — does state survive page refresh where expected?
- Edge cases — empty inputs, very long text, special characters, rapid clicks
Accessibility (deep dive)
- Are headings hierarchical? Do form inputs have labels? Are landmarks present?
- Check for missing alt text on images
Visual Mode (/ironbee-verify visual)
Focus exclusively on visual quality. Run the per-page visual analysis from step 3d on every page, plus:
- Responsiveness — viewport changes, no horizontal scrolling
- Borders & separators — visible and consistent?
- Scroll behavior — smooth scrolling, no hidden content
Take screenshots of multiple states if applicable (hover, active, disabled, empty, populated).
Functional Mode (/ironbee-verify functional)
Focus exclusively on behavior. Use the same functional checklist as Full Mode above.
Test the complete user flow, not just the single step you changed.
When to FAIL
If you observe ANY problem — wrong data, unexpected errors, visual defects, broken interactions, console errors, data inconsistency between pages — you MUST submit a fail verdict.
Do NOT rationalize away problems. If something looks wrong or behaves unexpectedly, it IS wrong. In full mode, there is no such thing as "pre-existing" — if it's broken, fail it.
After a fail verdict, you MUST fix the issues and re-verify. Do not just report and stop.
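The fail rule can be read as a simple gate: anything other than a fully clean run is a fail. A minimal sketch follows, assuming the three counts are tracked during the session; `decide_status` is a hypothetical name, and treating any network failure as a fail is an assumption based on the pass example reporting zero.

```shell
# Hypothetical helper: derive the verdict status from what was observed.
# Arguments: console error count, network failure count, number of issues noted.
# Assumption: any non-zero count means "fail", per the "ANY problem" rule above.
decide_status() {
  local console_errors="$1"
  local network_failures="$2"
  local issue_count="$3"
  if [ "$console_errors" -eq 0 ] && [ "$network_failures" -eq 0 ] && [ "$issue_count" -eq 0 ]; then
    echo "pass"
  else
    echo "fail"
  fi
}

decide_status 0 0 0   # clean run
decide_status 0 0 1   # one visual issue noted, so the run fails
```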
Verdict Quality
Your checks array must list specific observations, not generic statements:
- GOOD: ["login form renders with email and password fields", "submitted valid credentials, redirected to /dashboard", "console clean, 0 errors"]
- BAD: ["it works", "looks good", "feature implemented"]
Important
- ALWAYS submit a verdict after every verification attempt — both pass AND fail
- Do NOT edit code before submitting a fail verdict
- Noticing a bug and submitting pass is the #1 violation — if you see it, fail it
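Since a verdict is required after every attempt, it may help to assemble the payload in one place. The sketch below builds the JSON shown in the verdict commands from the steps; `build_verdict` and the sample values are hypothetical, and the array arguments are assumed to be valid JSON already, since nothing is escaped here.

```shell
# Hypothetical helper: assemble the verdict payload from the fields shown in the steps.
# pages, checks, and issues must be passed as ready-made JSON arrays (no escaping is done).
build_verdict() {
  local session_id="$1"
  local status="$2"
  local pages="$3"
  local checks="$4"
  local console_errors="$5"
  local network_failures="$6"
  local issues="${7:-}"
  printf '{"session_id":"%s","status":"%s","pages_tested":%s,"checks":%s,"console_errors":%s,"network_failures":%s' \
    "$session_id" "$status" "$pages" "$checks" "$console_errors" "$network_failures"
  # Only fail verdicts carry the issues array.
  if [ "$status" = "fail" ]; then
    printf ',"issues":%s' "${issues:-[]}"
  fi
  printf '}\n'
}

# Demo with made-up values; prints a pass verdict on one line.
build_verdict "abc123" pass '["/login"]' '["login form renders with email and password fields"]' 0 0

# To submit, pipe the output into the CLI as in the steps:
#   build_verdict ... | ironbee hook submit-verdict
```

The "fixes" field for a pass after a previous fail is not covered by this sketch.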