Agent skill
e2e-testing
Playwright-based end-to-end testing workflow.
Install this agent skill to your Project
npx add-skill https://github.com/notque/claude-code-toolkit/tree/main/skills/e2e-testing
SKILL.md
E2E Testing Skill (Playwright)
Playwright-based E2E testing across four phases: Scaffold, Build, Run, Validate. Each phase produces a saved artifact and must pass its gate before the next phase begins.
Instructions
PHASE 1: SCAFFOLD
Goal: Verify Playwright is installed, create the directory structure, and generate playwright.config.ts.
Actions:
- Check if `@playwright/test` is installed: `npx playwright --version`. If not, run `npm install -D @playwright/test` and `npx playwright install`.
- Create the directory structure:

      tests/
        e2e/
          auth/
          features/
          api/
      pages/          <- POM classes live here
      artifacts/
        screenshots/
        traces/
        videos/

- Write `playwright.config.ts` using the template below. The config bakes in failure diagnostics by default: `screenshot: 'only-on-failure'` and `video: 'retain-on-failure'` capture every failure, and `trace: 'on-first-retry'` records a trace whenever a failed test is retried, so failures produce actionable artifacts without manual setup. CI retries (`retries: process.env.CI ? 2 : 0`) absorb transient infrastructure flakiness without masking real bugs.
- Confirm `playwright.config.ts` is valid TypeScript: `npx tsc --noEmit`. Run this deterministic check before any subjective assessment of the config -- compiler errors are facts, opinions are not.
Artifact: playwright.config.ts + tests/e2e/ directory structure.
Gate: playwright.config.ts exists AND tests/e2e/ directory exists. If either is missing, do not proceed to Phase 2 -- diagnose and fix.
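The Phase 1 gate can be checked mechanically. The following is a minimal sketch, not part of the skill itself: the function name `missingScaffoldPaths` and the injectable `exists` parameter are hypothetical, chosen so the check can be exercised without touching the disk.

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

// Paths the Phase 1 gate requires, relative to the project root.
const REQUIRED_PATHS = ['playwright.config.ts', join('tests', 'e2e')];

/**
 * Returns the required paths that are missing under `root`.
 * An empty array means the gate passes.
 * `exists` defaults to fs.existsSync and is injectable for testing (hypothetical design).
 */
export function missingScaffoldPaths(
  root: string,
  exists: (path: string) => boolean = existsSync,
): string[] {
  return REQUIRED_PATHS.filter((rel) => !exists(join(root, rel)));
}
```

Calling `missingScaffoldPaths(process.cwd())` from the project root and refusing to continue when the result is non-empty would mirror the "do not proceed to Phase 2" rule.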
playwright.config.ts Template
    import { defineConfig, devices } from '@playwright/test';

    export default defineConfig({
      testDir: './tests/e2e',
      fullyParallel: true,
      // Fail the CI run if a stray test.only was committed.
      forbidOnly: !!process.env.CI,
      // Retry only in CI to absorb transient infrastructure failures.
      retries: process.env.CI ? 2 : 0,
      workers: process.env.CI ? 1 : undefined,
      reporter: [
        ['html', { outputFolder: 'playwright-report' }],
        // Phase 3 gates on this file existing at the project root.
        ['json', { outputFile: 'playwright-results.json' }],
      ],
      use: {
        baseURL: process.env.BASE_URL || 'http://localhost:3000',
        // Failure diagnostics: trace on retry, screenshot and video on failure.
        trace: 'on-first-retry',
        screenshot: 'only-on-failure',
        video: 'retain-on-failure',
      },
      outputDir: 'artifacts/',
      projects: [
        { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
        { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
        { name: 'webkit', use: { ...devices['Desktop Safari'] } },
        {
          name: 'Mobile Chrome',
          use: { ...devices['Pixel 5'] },
        },
      ],
    });
The multi-browser matrix (Chromium, Firefox, WebKit, plus a Pixel 5 Mobile Chrome profile) is the default because cross-browser bugs caught in CI are cheaper than cross-browser bugs caught in production. Remove browsers only when the project explicitly constrains the target set.
PHASE 2: BUILD
Goal: Write POM classes for target feature areas, then write spec files that use those POMs.
Every page or feature area gets a typed Page Object class. Spec files never contain inline locators -- all selectors live in the POM. This separation means a selector change is a one-line POM edit, not a grep-and-replace across dozens of specs.
Actions:
- Identify the feature areas under test (auth, checkout, dashboard, etc.).
- For each area, create a POM class in `pages/` (see POM Pattern below). All locators must use `data-testid` attributes via `page.getByTestId()`. CSS selectors (`page.locator('.btn-primary')`) break silently when styles change. XPath breaks on DOM restructuring. Text matching (`page.locator('text=Submit')`) breaks on copy changes. `data-testid` is a testing contract that survives all three.
- Write spec files in `tests/e2e/<area>/` using the POMs.
- Run `npx tsc --noEmit` to verify all files compile.
- Fix any TypeScript errors before proceeding.
Artifact: tests/e2e/**/*.spec.ts files + pages/*.ts POM classes, all compiling cleanly.
Gate: At least one .spec.ts exists under tests/e2e/ AND npx tsc --noEmit exits 0. If compile fails, fix errors -- do not proceed to Phase 3 with broken TypeScript.
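The rule that spec files never contain inline locators can be spot-checked with a text scan. This is a rough sketch under stated assumptions: `findInlineLocators` is a hypothetical helper, and the regular expression is a heuristic that only catches locators built directly on a variable named `page`. It is not a parser and will miss aliased page objects.

```typescript
// Matches direct locator construction on a `page` object,
// e.g. page.locator(...) or page.getByTestId(...). Heuristic only.
const INLINE_LOCATOR = /\bpage\s*\.\s*(locator|getBy[A-Z]\w*)\s*\(/;

/** Returns the 1-based line numbers in a spec source that appear to build locators inline. */
export function findInlineLocators(specSource: string): number[] {
  const hits: number[] = [];
  specSource.split(/\r?\n/).forEach((line, index) => {
    if (INLINE_LOCATOR.test(line)) hits.push(index + 1);
  });
  return hits;
}
```

Any reported line is a candidate for moving the selector into the matching POM class. Assertions on the page itself, such as `expect(page).toHaveURL(...)`, are not flagged.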
POM Pattern
    // pages/LoginPage.ts
    import { type Page, type Locator } from '@playwright/test';

    export class LoginPage {
      readonly page: Page;
      readonly emailInput: Locator;
      readonly passwordInput: Locator;
      readonly submitButton: Locator;
      readonly errorMessage: Locator;

      constructor(page: Page) {
        this.page = page;
        this.emailInput = page.getByTestId('login-email');
        this.passwordInput = page.getByTestId('login-password');
        this.submitButton = page.getByTestId('login-submit');
        this.errorMessage = page.getByTestId('login-error');
      }

      async goto() {
        await this.page.goto('/login');
        await this.page.waitForLoadState('networkidle');
      }

      async login(email: string, password: string) {
        await this.emailInput.fill(email);
        await this.passwordInput.fill(password);
        await this.submitButton.click();
      }
    }

    // tests/e2e/auth/login.spec.ts
    import { test, expect } from '@playwright/test';
    import { LoginPage } from '../../../pages/LoginPage';

    test.describe('Login Flow', () => {
      let loginPage: LoginPage;

      test.beforeEach(async ({ page }) => {
        loginPage = new LoginPage(page);
        await loginPage.goto();
      });

      test('successful login redirects to dashboard', async ({ page }) => {
        await loginPage.login('user@example.com', 'password123');
        await expect(page).toHaveURL('/dashboard');
      });

      test('invalid credentials shows error message', async () => {
        await loginPage.login('bad@example.com', 'wrong');
        await expect(loginPage.errorMessage).toBeVisible();
        await expect(loginPage.errorMessage).toContainText('Invalid credentials');
      });
    });
data-testid Convention
- Format: `<component>-<element>` -- e.g., `login-email`, `checkout-submit`, `nav-profile-link`
- Scope: Add `data-testid` to interactive elements and status regions the tests need to assert on
- Stability: `data-testid` attributes must not change with styling or refactoring -- they are a testing contract
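The naming format above could be enforced with a small validator. This is a sketch only: `isValidTestId` is a hypothetical helper, and the exact character rules (lowercase kebab-case, at least two segments) are an assumption inferred from the three examples, since the convention does not spell them out.

```typescript
// Lowercase kebab-case with at least two segments: <component>-<element>[-<more>].
// Assumption: the character rules are inferred from the examples, not specified by the skill.
const TEST_ID_PATTERN = /^[a-z][a-z0-9]*(-[a-z0-9]+)+$/;

/** Returns true when `id` follows the assumed <component>-<element> convention. */
export function isValidTestId(id: string): boolean {
  return TEST_ID_PATTERN.test(id);
}
```

A check like this might run in a lint step over the `data-testid` values collected from the POM classes, so naming drift is caught before it reaches a spec.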
Waiting and Timing
Never use waitForTimeout() or setTimeout() in tests. Arbitrary waits pass slowly on fast machines and fail on slow ones -- they encode a guess about timing instead of observing the actual condition. Use condition-based waiting instead:
| Instead of | Use |
|---|---|
| `await page.waitForTimeout(2000)` | `await expect(locator).toBeVisible()` or `await page.waitForResponse(...)` |
| `await page.waitForTimeout(0)` to "flush" | `await page.waitForLoadState('networkidle')` |
| `page.click('button')` without waiting | `locator.click()` -- Playwright auto-waits for actionability |
Each test must own its own setup in beforeEach. Tests sharing state via global variables break parallel execution because Playwright runs specs concurrently by default.
PHASE 3: RUN
Goal: Execute the test suite, capture the results JSON, and identify any failing or flaky tests.
Actions:
- Ensure the application under test is running (or document the `BASE_URL` required).
- Run the full suite with the JSON reporter configured in `playwright.config.ts`: `npx playwright test`
- If any tests fail, run them in isolation with `--repeat-each=5` to distinguish flaky from consistently failing: `npx playwright test tests/e2e/auth/login.spec.ts --repeat-each=5`
- Quarantine confirmed flaky tests with `test.fixme()`. Never delete a failing test -- deleted tests leave silent coverage gaps. Quarantined tests are visible debt with tracking references:

      test.fixme('flaky: login redirects intermittently', async ({ page }) => {
        // TODO: #123 -- investigate race condition with auth cookie
        ...
      });

- Do NOT use `test.skip()` to hide broken tests. `test.skip()` is for conditional environment guards (e.g., "skip on WebKit"), not for sweeping failures under the rug.
Artifact: playwright-results.json (presence is the gate -- pass rate is not).
Gate: playwright-results.json exists at the project root. The file must contain valid JSON. Pass rate does not block Phase 4 -- reporting on failures is Phase 4's job.
Flaky Test Quarantine Protocol
When a test fails intermittently:
- Reproduce: `npx playwright test <file> --repeat-each=5` -- if it fails in some of the 5 runs but not all, it is flaky; if it fails in every run, it is consistently failing.
- Quarantine: Replace `test(` with `test.fixme(` and add a comment with the symptom and a tracking reference.
- Do not delete: Deleted tests leave coverage gaps. Quarantined tests are visible debt.
- Fix criteria: Before removing `test.fixme`, the test must pass 10/10 with `--repeat-each=10`.
    // Before
    test('checkout completes successfully', async ({ page }) => { ... });

    // After quarantine
    test.fixme('checkout completes successfully', async ({ page }) => {
      // FLAKY: intermittent race on payment confirmation response
      // TODO: #456 -- investigate network timing in checkout flow
      ...
    });
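The quarantine decision can be expressed as a small classification over the outcomes of repeated runs. This is a sketch under stated assumptions: `classifyRuns`, `canUnquarantine`, and the `Verdict` labels are hypothetical names, and the input is assumed to be one boolean per run (true for a pass), however those outcomes are collected.

```typescript
type Verdict = 'stable' | 'flaky' | 'failing';

/** Classifies one test from the pass/fail outcome of each repeated run. */
export function classifyRuns(passed: readonly boolean[]): Verdict {
  if (passed.length === 0) throw new Error('at least one run is required');
  const passes = passed.filter(Boolean).length;
  if (passes === passed.length) return 'stable';
  // No passes at all means a consistent failure; a mix means flaky.
  return passes === 0 ? 'failing' : 'flaky';
}

/** Fix criteria from the protocol: at least 10 runs, all passing, before removing test.fixme. */
export function canUnquarantine(passed: readonly boolean[]): boolean {
  return passed.length >= 10 && passed.every(Boolean);
}
```

Under this sketch a test that passes 4 of 5 runs is `'flaky'` and goes to quarantine, while one that fails all 5 is `'failing'` and goes to triage as a real failure.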
PHASE 4: VALIDATE
Goal: Deterministic checks on test output, then structured report generation.
Actions:
- Deterministic checks first -- run these before any LLM summary because compiler output and JSON parsing are facts, not opinions:
  - `playwright-results.json` exists and parses as valid JSON.
  - Extract counts: `python3 -c "import json,sys; d=json.load(open('playwright-results.json')); print(d.get('stats', d))"`
  - Identify all `unexpected` (failed) and `flaky` result entries.
- LLM triage (only after deterministic checks pass):
  - For each failed test, identify whether it is: (a) a broken assertion, (b) a selector mismatch, (c) a timing/async issue, or (d) an application bug.
  - Categorize flaky tests for quarantine vs. fix.
- Write `e2e-report.md` using the report template below.
Artifact: e2e-report.md.
Gate: e2e-report.md exists. Skill is complete only when this file is written.
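The deterministic checks in Phase 4 could also be done in TypeScript, keeping the toolchain in one language. This is a minimal sketch: `summarizeReport` is a hypothetical helper, and the `stats` field names (`expected`, `unexpected`, `flaky`, `skipped`) are an assumption about the JSON reporter's output that should be verified against the installed Playwright version.

```typescript
// Assumed shape of the `stats` object in playwright-results.json.
interface ReportStats {
  expected: number;   // tests that passed as expected
  unexpected: number; // tests that failed
  flaky: number;      // tests that passed only after a retry
  skipped: number;
}

interface Summary extends ReportStats {
  total: number;
}

/** Parses the report text and returns the counts; throws if the JSON or stats are missing. */
export function summarizeReport(reportText: string): Summary {
  const parsed: unknown = JSON.parse(reportText); // throws SyntaxError on invalid JSON
  const stats =
    typeof parsed === 'object' && parsed !== null
      ? (parsed as { stats?: Partial<ReportStats> }).stats
      : undefined;
  if (!stats) throw new Error('report has no "stats" object');
  // Treat a missing or non-numeric counter as zero rather than failing.
  const count = (value: unknown): number => (typeof value === 'number' ? value : 0);
  const expected = count(stats.expected);
  const unexpected = count(stats.unexpected);
  const flaky = count(stats.flaky);
  const skipped = count(stats.skipped);
  return { expected, unexpected, flaky, skipped, total: expected + unexpected + flaky + skipped };
}
```

Passing the contents of `playwright-results.json` to this function yields the numbers for the Summary table of the report. A thrown error means the Phase 3 artifact is not valid and triage should not start.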
e2e-report.md Template
# E2E Test Report
**Date**: YYYY-MM-DD
**Playwright version**: X.X.X
**Base URL**: http://...
**Browsers tested**: Chromium, Firefox, WebKit
## Summary
| Status | Count |
|--------|-------|
| Passed | N |
| Failed | N |
| Flaky (quarantined) | N |
| Skipped | N |
| **Total** | N |
## Failed Tests
### <test name>
- **File**: `tests/e2e/.../file.spec.ts`
- **Error**: <assertion or timeout message>
- **Category**: broken-assertion | selector-mismatch | timing | app-bug
- **Action**: fix | quarantine | investigate
## Quarantined (test.fixme)
| Test | Issue | Tracking |
|------|-------|----------|
| <name> | <symptom> | <issue link or TODO> |
## Artifacts
| Type | Path |
|------|------|
| HTML Report | `playwright-report/index.html` |
| JSON Results | `playwright-results.json` |
| Screenshots | `artifacts/screenshots/` |
| Traces | `artifacts/traces/` |
| Videos | `artifacts/videos/` |
## Next Actions
- [ ] Fix broken assertions in: ...
- [ ] Investigate app bugs: ...
- [ ] Unquarantine after fix: ...
CI/CD Integration
GitHub Actions Workflow Template
    name: E2E Tests

    on:
      push:
        branches: [main, develop]
      pull_request:
        branches: [main]

    jobs:
      e2e:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4

          - name: Setup Node.js
            uses: actions/setup-node@v4
            with:
              node-version: '20'
              cache: 'npm'

          - name: Install dependencies
            run: npm ci

          - name: Install Playwright browsers
            run: npx playwright install --with-deps

          - name: Start application
            run: npm run build && npm run start &
            env:
              NODE_ENV: test

          - name: Wait for application
            run: npx wait-on http://localhost:3000 --timeout 60000

          - name: Run E2E tests
            run: npx playwright test
            env:
              BASE_URL: http://localhost:3000
              CI: true

          - name: Upload test artifacts
            uses: actions/upload-artifact@v4
            if: always()
            with:
              name: playwright-artifacts
              path: |
                playwright-report/
                playwright-results.json
                artifacts/
              retention-days: 30
Error Handling
| Symptom | Likely Cause | Fix |
|---|---|---|
| `npx tsc --noEmit` fails after Phase 1 | Bad config template or missing types | Check `@playwright/test` is in `devDependencies`, verify `tsconfig.json` includes the test directory |
| Tests pass locally, fail in CI | Missing browser deps or wrong `BASE_URL` | Use `npx playwright install --with-deps` in CI; verify `BASE_URL` env var matches the running app |
| `playwright-results.json` missing after run | Reporter not configured or test runner crashed | Verify `json` reporter is in `playwright.config.ts`; check for OOM or process kill signals |
| Locator timeout on element that exists | Element present but not actionable (hidden, disabled, covered) | Use `await expect(locator).toBeVisible()` before interaction; check for overlays or modals |
| `page.fill()` appears to append instead of replacing | Input retains or re-applies its existing value (`fill()` normally replaces the content, so suspect a masked or controlled input) | Use `locator.clear()` then `locator.fill()` |
| Flaky test passes 4/5 runs | Race condition, network timing, or animation interference | Quarantine with `test.fixme()`, reproduce with `--repeat-each=10`, check for missing `waitFor` conditions |
| Locators depending on `nth(0)` break randomly | DOM order is not stable | Add a `data-testid` to the specific element instead of relying on position |
References
- playwright-patterns.md -- POM examples, condition-based waiting, multi-browser config, financial skip guards
- wallet-testing.md -- Web3/MetaMask mock patterns with `addInitScript`
- financial-flows.md -- Production skip guards, blockchain confirmation waits
- flakiness-triage.md -- `--repeat-each`, `--retries`, quarantine decision tree
- ADR-107 -- Decision record for this skill
- Playwright docs -- Official API reference