Agent skill
qa-test
Browser-based QA verification after any implementation. Use when someone says "QA this", "test this in browser", "verify the feature", "qa test", "browser test", or after completing an /implement-change to verify acceptance criteria in a real browser. Opens Chrome via MCP, exercises each acceptance criterion, verifies via DOM snapshots, and reports pass/fail. The "closer" for every implementation — proof it works, not just that tests pass.
Install this agent skill to your Project
npx add-skill https://github.com/teambrilliant/dev-skills/tree/main/skills/qa-test
SKILL.md
QA Test
Verify implemented features in a real browser. Exercise each acceptance criterion, verify via snapshots, report results.
Context-efficient design: browser testing runs in a sub-agent so snapshot/interaction data stays out of the main thread. Main thread only sees compact pass/fail summaries.
Process
- Pre-flight (sub-agent) — gather criteria, resolve URL, check environment
- Interactive setup — human steers browser for hard-to-automate steps (login, drag, etc.)
- Browser testing (sub-agent) — exercises all criteria in isolated context
- Report results — main thread receives compact summary only
- Handle failures — retry failed criteria after manual intervention if needed
1. Pre-flight Sub-agent
Launch an Explore sub-agent before any browser interaction to gather all context in parallel.
Sub-agent prompt:
Gather QA pre-flight context for testing. Return a structured JSON block with:
1. **acceptance_criteria**: List of testable criteria. Check these sources in order,
stop at the first that has criteria:
- The user's prompt (if criteria were given explicitly)
- Current PR description: run `gh pr view --json body` via Bash
- Current branch diff: run `git diff main...HEAD --stat` then read changed files
to infer what user-visible behavior changed
- Linked issue: check PR body for issue references, fetch with `gh issue view`
2. **test_url**: Where to test. Check in order:
- `.tap/tap-audit.md` Environments section
- `package.json` scripts for `dev`, `start`, or similar
- Common defaults: localhost:3000, :5173, :4321, :6886
3. **app_running**: Try to fetch the test_url via `curl -s -o /dev/null -w '%{http_code}'`.
Return the status code. If not running, return the dev command that would start it.
4. **test_pages**: List of specific page URLs/routes to visit based on the changed files
(e.g., if `modules/campaigns/` changed, the test page is likely `/campaigns/...`)
5. **db_available**: Check if postgres MCP tools are available (search for
`mcp__postgres__execute_sql` or similar). Return true/false.
6. **has_async_flows**: Based on the changed files, flag whether the feature involves
background jobs (Temporal workflows, queues, webhooks) that need async verification.
7. **needs_login**: Whether the app requires authentication. Check for login pages,
auth middleware, or session requirements in the codebase.
Return results as a structured summary, not raw tool output.
Using the pre-flight results:
- If
app_runningis not 200, start the dev server (background) and wait for it - Use
acceptance_criteriaas the test plan - Use
test_pagesto know where to navigate first - If
db_available, include database verification steps - If
has_async_flows, use the async testing pattern - If
needs_login, prompt user for interactive setup before launching browser sub-agent
Fallback (no sub-agent): If sub-agents are unavailable, gather criteria and resolve URL sequentially.
Gather acceptance criteria from (in priority order):
- Explicit criteria provided in the prompt
- Current ticket/issue (if referenced)
- PR description
.tap/tap-audit.mdfor environment context
If no criteria found, ask in human mode. In agent mode, infer from the diff.
Resolve test URL (in priority order):
- URL provided in the prompt
.tap/tap-audit.md→ Environments sectionpackage.jsonscripts →dev,start, or similar- Common defaults:
http://localhost:3000,http://localhost:5173,http://localhost:4321
Verify the app is running before proceeding.
2. Interactive Setup (Main Thread)
Before launching the browser testing sub-agent, handle anything that's hard to automate in the main thread. The browser state persists since the sub-agent connects to the same Chrome instance.
When to prompt for interactive setup:
needs_loginis true → ask user: "App requires login. Want me to navigate to login page so you can sign in, or should I attempt automated login?"- Complex drag-and-drop or gesture-based preconditions
- Multi-factor auth, CAPTCHAs, OAuth popups
What to do:
- Navigate to the relevant page via Chrome MCP
- Tell the user what action is needed
- Wait for user confirmation that setup is complete
- Then launch the browser testing sub-agent
If no interactive setup is needed, skip directly to step 3.
3. Browser Testing (Sub-agent)
Launch a general-purpose sub-agent for all browser interaction. This keeps snapshot/interaction data out of the main thread context.
Sub-agent prompt template:
You are running browser-based QA tests. The browser is already open and may already
be logged in / set up.
Test URL: {test_url}
Acceptance criteria to verify:
{numbered list of criteria}
Additional context:
- DB tools available: {db_available}
- Has async flows: {has_async_flows}
- Test pages: {test_pages}
## How to test
Use Chrome MCP tools (`mcp__chrome-devtools__*`).
**Snapshot-first workflow** — use `take_snapshot` for BOTH finding elements AND
verifying results. Do NOT use `take_screenshot` unless a criterion fails and you
need visual debugging evidence.
**For each criterion:**
1. Navigate to the relevant page
2. `take_snapshot` → get element UIDs and current state
3. Interact via UIDs (`click`, `fill`, `hover`)
4. `take_snapshot` → verify state changed as expected
5. Check `list_console_messages` for errors
6. Check `list_network_requests` for failed requests (4xx, 5xx)
**Important**: UIDs are ephemeral — always take a fresh snapshot before interacting.
**On failure only**: `take_screenshot` and save to `./qa-evidence/` for debugging.
**React/SPA hover interactions:**
Chrome DevTools `hover` only triggers CSS `:hover`, NOT JS `mouseenter`/`mouseover`.
If a UI element only appears via React's `onMouseEnter`:
1. Try `click` directly in the area
2. If that fails, `evaluate_script` to dispatch mouseenter event
3. `take_snapshot` to confirm
**Testing patterns:**
- Form submission: fill → submit → snapshot to verify success + check no errors
- Navigation: click → snapshot to verify new state + check URL
- State changes: trigger action → snapshot to verify → reload → snapshot to verify persistence
- Async: trigger → snapshot for intermediate state → poll snapshots → verify final state
- Error states: trigger invalid input → snapshot to verify error messaging
**Always check:**
- Console errors (JS exceptions)
- Failed network requests (4xx, 5xx)
## Report format
Return ONLY a compact summary in this exact format:
RESULT: [PASS / FAIL / PARTIAL]
CRITERIA:
1. [criterion] — PASS/FAIL — [one-line observation]
2. [criterion] — PASS/FAIL — [one-line observation]
...
ERRORS: [any console errors or failed network requests, or "none"]
FAILURES: [for any failed criterion: what happened, what was expected,
screenshot path if captured]
NEEDS_MANUAL: [any criteria that couldn't be tested due to automation
limitations — e.g., drag-and-drop, complex gestures]
4. Report Results
The sub-agent returns a compact summary. Present it to the user.
Human mode: Show the summary. If any failures, ask: "Want me to fix this and re-test, or is this expected?"
Agent mode: If all pass, proceed (e.g., open PR). If any fail, attempt fix-and-retest.
5. Failure Handling
Automation failures (NEEDS_MANUAL):
- The user performs the manual action in the browser (main thread)
- Launch a new sub-agent to verify only the remaining criteria
- The new sub-agent picks up the browser state left by the user
Code failures (FAIL):
Agent mode:
- Fix the code
- Launch new sub-agent to re-test only failed criteria
- Max 2 fix-and-retest cycles
Human mode:
- Present failures
- Ask: "Want me to fix this and re-test, or is this expected behavior?"
Optional: Database Verification
Include in the sub-agent prompt when db_available is true. For features that create or modify data:
- Record creation: Verify expected rows exist with correct values
- Relational data: Confirm junction table rows were created
- Status transitions: Confirm async workflows completed
Boundaries
- Does NOT write unit tests (that's implement-acceptance-tests)
- Does NOT review code quality (that's CLAUDE.md / code review)
- Does NOT assess blast radius (that's /blast-radius)
- Tests user-visible behavior in the browser, with optional database verification
- Does NOT modify acceptance criteria — tests what was specified
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
product-discovery
Validate whether a product idea is worth building before committing engineering investment. Use when someone says "should we build this", "validate this idea", "discovery", "run an experiment", "test this hypothesis", "what are the risks", "is this worth building", "feasibility check", "prototype plan", or when a team has a shaped feature or product idea and needs to assess risks and design experiments before building. Sits between product-thinker (should we?) and shaping-work (what exactly?) — this skill answers "will this actually work?" by identifying what you don't know, designing the cheapest way to find out, and defining evidence gates that justify (or kill) the investment. Also trigger when someone has a feature request and you sense high uncertainty — if the team is about to spend weeks building something nobody tested, this skill should intervene.
implementation-planning
Create technical implementation plans and architecture designs. Use when someone needs a detailed technical approach before coding begins — "create a plan", "plan this ticket", "how should we implement this", "technical design", "architect this", "design the approach", "plan the migration", "refactor plan", "how should we structure this", or when shaped work or a groomed ticket needs a concrete implementation strategy with phases, file changes, and verification steps.
shaping-work
Shape rough ideas into clear, actionable work definitions. Use this skill whenever someone has an unstructured idea that needs to become a concrete work definition — feature requests, bug reports, PRDs, customer feedback, Slack threads, stakeholder asks, or vague "we should do X" statements. Trigger phrases include "shape this", "scope this", "write a PRD", "define this work", "turn this into a ticket", "flesh this out", "spec this out", "what should we build for X", "I have an idea for...", or any rough input that needs structure before implementation can begin.
implement-change
Execute code changes from an implementation plan. Use when someone says "implement this", "build this", "code this", "start building", "let's implement", "execute the plan", "make the changes", "do the work", or has an approved implementation plan ready for coding. Takes implementation plans and produces working code, phase by phase with verification.
product-primitives
Break down complex products, features, or systems into fundamental primitives and building blocks from a software creator's perspective. Use when starting a new application, designing a large feature, or needing to understand a complex system's moving parts before building. Trigger phrases: "break down X", "decompose this", "what are the primitives", "building blocks of Z", "map the architecture", "what are the moving parts", "analyze this system", or any situation where you need to identify the atomic, reusable capabilities that compose a system. Complements product-thinker (user perspective) with the builder's perspective (system-level connections).
loop-check
Assess what's needed to make feedback loops autonomous in a repo. Use when someone says "loop check", "what do I need to work autonomously", "check my feedback loops", "what's manual here", "what should I automate", "can an agent iterate here", or before starting work in an unfamiliar repo to understand what's missing for autonomous iteration. Also use when the user asks "what do you need to make this autonomous?" or describes a workflow they want to close the loop on. NOT for: full repo audits (use tap-audit), coding, test writing, or implementation.
Didn't find tool you were looking for?