Agent skill
e2e-outside-in-test-generator
Generates comprehensive end-to-end test scenarios using outside-in methodology. Supports 5 app types: Web (Playwright), CLI, TUI, API, and MCP (gadugi YAML). Auto-detects app type or accepts explicit override.
Install this agent skill to your Project
npx add-skill https://github.com/rysweet/amplihack/tree/main/.claude/skills/e2e-outside-in-test-generator
SKILL.md
E2E Outside-In Test Generator
Automatically generates comprehensive end-to-end Playwright tests for full-stack applications using outside-in testing methodology. Generates 40+ tests across 7 categories, finds real bugs, and validates through iterative fix loops.
LEVEL 1: Quick Start
Purpose
This skill analyzes your full-stack application and generates a complete Playwright test suite that:
- Validates user journeys from browser through backend
- Finds real bugs through systematic exploration
- Runs deterministically with no flakiness
- Integrates with CI out of the box
When to Use
Activate this skill when you:
- Need comprehensive browser testing for a full-stack app
- Want to validate critical user flows end-to-end
- Need regression coverage before major refactoring
- Are preparing for production deployment
Requirements: Your project must have both a frontend (Next.js, React, Vue, Angular) and backend API.
Quick Start
# In your project root
$ claude
> add e2e tests
The skill automatically:
- Analyzes your application (routes, API endpoints, database schema)
- Sets up infrastructure (Playwright config, test helpers, seed data)
- Generates 40+ tests across 7 categories
- Runs tests and fixes failures (up to 5 iterations)
- Reports coverage and recommendations
Expected Output
e2e/
├── playwright.config.ts # Playwright configuration
├── test-helpers/
│ ├── auth.ts # Authentication helpers
│ ├── navigation.ts # Navigation helpers
│ ├── assertions.ts # Custom assertions
│ └── data-setup.ts # Test data management
├── fixtures/
│ ├── users.json # Test user data
│ ├── products.json # Test product data
│ └── seed.sql # Database seed script
├── happy-path/
│ ├── user-registration.spec.ts
│ ├── user-login.spec.ts
│ └── checkout-flow.spec.ts
├── edge-cases/
│ ├── invalid-inputs.spec.ts
│ └── boundary-conditions.spec.ts
├── error-handling/
│ ├── network-failures.spec.ts
│ └── validation-errors.spec.ts
├── performance/
│ ├── page-load-times.spec.ts
│ └── api-response-times.spec.ts
├── security/
│ ├── unauthorized-access.spec.ts
│ └── xss-protection.spec.ts
├── accessibility/
│ ├── keyboard-navigation.spec.ts
│ └── screen-reader.spec.ts
└── integration/
  ├── database-persistence.spec.ts
  └── api-integration.spec.ts
Total: 42 tests across 7 categories
Run Your Tests
# Run all tests
npx playwright test
# Run specific category
npx playwright test e2e/happy-path
# Run in headed mode
npx playwright test --headed
# Run in debug mode
npx playwright test --debug
Success Criteria
After generation, your test suite achieves:
- ✓ 40+ tests across all 7 categories
- ✓ 100% pass rate after fix loop
- ✓ <2 minute total execution time
- ✓ ≥1 real bug discovered during generation
- ✓ Zero flakiness (deterministic test data)
LEVEL 2: Full Features
The 5 Phases
graph LR
A[Phase 1: Analysis] --> B[Phase 2: Infrastructure]
B --> C[Phase 3: Generation]
C --> D[Phase 4: Fix Loop]
D --> E[Phase 5: Coverage Audit]
Phase 1: Stack Analysis
The skill performs deep application analysis:
Frontend Analysis:
- Detects framework (Next.js, React, Vue, Angular)
- Maps routes and pages
- Identifies navigation patterns
- Extracts interactive elements
Backend Analysis:
- Discovers API endpoints (REST/GraphQL)
- Maps data models and relationships
- Identifies authentication mechanisms
- Detects validation rules
Database Analysis:
- Extracts schema and relationships
- Identifies required fields
- Determines foreign key constraints
- Maps enum types
Example Analysis Output:
StackConfig(
frontend_framework="nextjs",
frontend_dir="app/",
backend_framework="fastapi",
api_base_url="http://localhost:8000/api",
database_type="postgresql",
auth_mechanism="jwt",
routes=[
Route(path="/", component="Home"),
Route(path="/login", component="Login"),
Route(path="/products", component="ProductList"),
Route(path="/products/:id", component="ProductDetail"),
Route(path="/checkout", component="Checkout")
],
api_endpoints=[
APIEndpoint(path="/auth/login", method="POST"),
APIEndpoint(path="/auth/register", method="POST"),
APIEndpoint(path="/products", method="GET"),
APIEndpoint(path="/products/:id", method="GET"),
APIEndpoint(path="/orders", method="POST")
],
models=[
Model(name="User", fields=["id", "email", "password"]),
Model(name="Product", fields=["id", "name", "price", "stock"]),
Model(name="Order", fields=["id", "user_id", "product_id", "quantity"])
]
)
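The analysis reports parameterized routes such as /products/:id, while the generated tests navigate to concrete URLs such as /products/123. A minimal sketch of bridging the two follows. `fillRoute` is a hypothetical helper, not part of the skill's documented API, and it assumes the `:name` placeholder syntax shown in the analysis output.

```typescript
// Hypothetical helper: substitutes ":name" placeholders in a route pattern
// with concrete values so a test can navigate to the resulting URL.
function fillRoute(pattern: string, params: Record<string, string | number>): string {
  return pattern.replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, (_match: string, name: string) => {
    const value = params[name];
    if (value === undefined) {
      throw new Error(`missing route parameter: ${name}`);
    }
    // Encode the value so characters like "/" or spaces cannot alter the path.
    return encodeURIComponent(String(value));
  });
}

const productUrl = fillRoute("/products/:id", { id: 123 });
```

Throwing on a missing parameter keeps a test from silently navigating to a literal ":id" URL.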
Phase 2: Infrastructure Setup
Generates complete testing infrastructure:
Playwright Configuration:
// e2e/playwright.config.ts
import { defineConfig, devices } from "@playwright/test";
export default defineConfig({
testDir: ".", // Resolved relative to this config file, which already lives in e2e/
fullyParallel: false, // CRITICAL: workers=1 for deterministic test data
forbidOnly: !!process.env.CI,
retries: process.env.CI ? 2 : 0,
workers: 1, // NEVER > 1 to prevent data races
reporter: "html",
use: {
baseURL: "http://localhost:3000",
trace: "on-first-retry",
},
projects: [{ name: "chromium", use: { ...devices["Desktop Chrome"] } }],
webServer: {
command: "npm run dev",
url: "http://localhost:3000",
reuseExistingServer: !process.env.CI,
},
});
Test Helpers:
// e2e/test-helpers/auth.ts
import type { Page } from "@playwright/test";

export async function login(page: Page, email: string, password: string) {
  await page.goto("/login");
  await page.getByRole("textbox", { name: /email/i }).fill(email);
  // Password inputs expose no "textbox" role, so locate the field by its label.
  await page.getByLabel(/password/i).fill(password);
  await page.getByRole("button", { name: /sign in/i }).click();
  await page.waitForURL("/dashboard");
}

export async function logout(page: Page) {
  await page.getByRole("button", { name: /logout/i }).click();
  await page.waitForURL("/");
}
Seed Data:
// e2e/fixtures/users.json
[
  {
    "email": "test@example.com",
    "password": "Test123!",
    "name": "Test User",
    "role": "customer"
  },
  {
    "email": "admin@example.com",
    "password": "Admin123!",
    "name": "Admin User",
    "role": "admin"
  }
]
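Because the suite depends on deterministic seed data, it can help to check the fixtures before a run. The sketch below is illustrative only: `validateSeedUsers` is a hypothetical helper, the field names come from the users.json example above, and the 8-character minimum is assumed from the validation message used in the edge-case tests.

```typescript
// Shape of one entry in e2e/fixtures/users.json, as shown above.
interface SeedUser {
  email: string;
  password: string;
  name: string;
  role: string;
}

// Hypothetical helper: returns a list of problems; an empty list means the
// seed data is safe to load.
function validateSeedUsers(users: SeedUser[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const user of users) {
    const key = user.email.toLowerCase();
    if (seen.has(key)) {
      problems.push(`duplicate email: ${user.email}`);
    }
    seen.add(key);
    if (!user.email.includes("@")) {
      problems.push(`invalid email: ${user.email}`);
    }
    // Assumption: the app enforces an 8-character minimum password length.
    if (user.password.length < 8) {
      problems.push(`password too short for ${user.email}`);
    }
  }
  return problems;
}

const seedUsers: SeedUser[] = [
  { email: "test@example.com", password: "Test123!", name: "Test User", role: "customer" },
  { email: "admin@example.com", password: "Admin123!", name: "Admin User", role: "admin" },
];

const seedProblems = validateSeedUsers(seedUsers);
```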
Phase 3: Test Generation
Generates tests across 7 mandatory categories:
1. Happy Path Tests (Critical user journeys)
// e2e/happy-path/user-registration.spec.ts
import { test, expect } from "@playwright/test";
test("user can register with valid credentials", async ({ page }) => {
await page.goto("/register");
await page.getByRole("textbox", { name: /email/i }).fill("newuser@example.com");
// Password inputs expose no "textbox" role, so locate them by label instead.
await page.getByLabel(/^password$/i).fill("Password123!");
await page.getByLabel(/confirm password/i).fill("Password123!");
await page.getByRole("button", { name: /create account/i }).click();
await expect(page).toHaveURL("/dashboard");
await expect(page.getByText(/welcome/i)).toBeVisible();
});
2. Edge Case Tests (Boundary conditions)
// e2e/edge-cases/boundary-conditions.spec.ts
import { test, expect } from "@playwright/test";

test("rejects password that is too short", async ({ page }) => {
  await page.goto("/register");
  await page.getByRole("textbox", { name: /email/i }).fill("test@example.com");
  // Password inputs expose no "textbox" role, so locate the field by its label.
  await page.getByLabel(/^password$/i).fill("123"); // Too short
  await page.getByRole("button", { name: /create account/i }).click();
  await expect(page.getByText(/password must be at least 8 characters/i)).toBeVisible();
});
3. Error Handling Tests (Failure scenarios)
// e2e/error-handling/network-failures.spec.ts
import { test, expect } from "@playwright/test";
test("shows error when API is unavailable", async ({ page, context }) => {
// Simulate network failure
await context.route("**/api/**", (route) => route.abort());
await page.goto("/products");
await expect(page.getByText(/unable to load products/i)).toBeVisible();
await expect(page.getByRole("button", { name: /retry/i })).toBeVisible();
});
4. Performance Tests (Speed validation)
// e2e/performance/page-load-times.spec.ts
import { test, expect } from "@playwright/test";
test("homepage loads in under 2 seconds", async ({ page }) => {
const startTime = Date.now();
await page.goto("/");
await page.waitForLoadState("networkidle");
const loadTime = Date.now() - startTime;
expect(loadTime).toBeLessThan(2000);
});
5. Security Tests (Authorization/XSS)
// e2e/security/unauthorized-access.spec.ts
import { test, expect } from "@playwright/test";
test("redirects unauthenticated users from protected routes", async ({ page }) => {
await page.goto("/dashboard");
await expect(page).toHaveURL("/login");
await expect(page.getByText(/please sign in/i)).toBeVisible();
});
6. Accessibility Tests (WCAG compliance)
// e2e/accessibility/keyboard-navigation.spec.ts
import { test, expect } from "@playwright/test";
test("login form is fully keyboard accessible", async ({ page }) => {
await page.goto("/login");
await page.keyboard.press("Tab"); // Focus email
await page.keyboard.type("test@example.com");
await page.keyboard.press("Tab"); // Focus password
await page.keyboard.type("Test123!");
await page.keyboard.press("Tab"); // Focus submit button
await page.keyboard.press("Enter"); // Submit form
await expect(page).toHaveURL("/dashboard");
});
7. Integration Tests (Database/API)
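// e2e/integration/database-persistence.spec.ts
import { test, expect } from "@playwright/test";
import { login, logout } from "../test-helpers/auth";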
test("product order persists across sessions", async ({ page, context }) => {
// Create order
await login(page, "test@example.com", "Test123!");
await page.goto("/products/123");
await page.getByRole("button", { name: /add to cart/i }).click();
await page.goto("/checkout");
await page.getByRole("button", { name: /place order/i }).click();
const orderNumber = await page.getByText(/order #\d+/).textContent();
// Logout and login again
await logout(page);
await login(page, "test@example.com", "Test123!");
// Verify order exists
await page.goto("/orders");
await expect(page.getByText(orderNumber!)).toBeVisible();
});
Phase 4: Fix Loop
Automatically fixes failing tests through iterative debugging:
Fix Loop Process:
- Run tests → Collect failures
- Analyze failures → Categorize issues (locator, timing, data, logic)
- Apply fixes → Update tests based on issue type
- Rerun tests → Verify fixes
- Repeat → Max 5 iterations
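The process above can be sketched as a bounded loop. This is a simplified, synchronous illustration under stated assumptions, not the skill's actual implementation: `runFixLoop`, `categorize`, and the keyword rules are hypothetical, and a real runner would invoke Playwright asynchronously.

```typescript
type FailureCategory = "locator" | "timing" | "data" | "logic";

interface TestFailure {
  test: string;
  message: string;
}

interface FixLoopResult {
  iterations: number; // number of test runs performed
  passed: boolean;
  fixesByCategory: Record<FailureCategory, number>;
}

// Rough keyword-based categorization; the keywords are illustrative assumptions.
function categorize(failure: TestFailure): FailureCategory {
  const msg = failure.message.toLowerCase();
  if (msg.includes("locator") || msg.includes("strict mode")) return "locator";
  if (msg.includes("timeout")) return "timing";
  if (msg.includes("fixture") || msg.includes("seed")) return "data";
  return "logic";
}

// Runs the suite, applies one fix per failure, and reruns, up to maxIterations runs.
function runFixLoop(
  runTests: () => TestFailure[],
  applyFix: (failure: TestFailure, category: FailureCategory) => void,
  maxIterations = 5,
): FixLoopResult {
  const fixesByCategory: Record<FailureCategory, number> = {
    locator: 0,
    timing: 0,
    data: 0,
    logic: 0,
  };
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const failures = runTests();
    if (failures.length === 0) {
      return { iterations: iteration, passed: true, fixesByCategory };
    }
    for (const failure of failures) {
      const category = categorize(failure);
      applyFix(failure, category);
      fixesByCategory[category] += 1;
    }
  }
  // Budget exhausted: fixes applied after the final run were never verified.
  return { iterations: maxIterations, passed: false, fixesByCategory };
}
```

Capping the loop keeps a failure that cannot be fixed automatically from running indefinitely. In that case the result reports `passed: false` and leaves the remaining failures for manual review.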
Common Fix Patterns:
// Before: Flaky locator
await page.click(".submit-button");
// After: Role-based locator
await page.getByRole("button", { name: /submit/i }).click();
// Before: Race condition
await page.click("#login");
await page.fill("#email", "test@example.com");
// After: Wait for navigation
await page.click("#login");
await page.waitForLoadState("networkidle");
await page.fill("#email", "test@example.com");
// Before: Hardcoded data
await page.fill("#quantity", "5");
// After: Dynamic data from fixtures
const product = await getTestProduct();
await page.fill("#quantity", product.minQuantity.toString());
Fix Loop Results:
Iteration 1: 42 tests, 8 failures
- Fixed 5 locator issues
- Fixed 2 timing issues
- Fixed 1 data issue
Iteration 2: 42 tests, 3 failures
- Fixed 2 locator issues
- Fixed 1 timing issue
Iteration 3: 42 tests, 0 failures ✓
Fix loop completed in 3 iterations.
Phase 5: Coverage Audit
Generates comprehensive coverage report and recommendations:
Coverage Report:
## Test Coverage Report
### Tests Generated: 42
- Happy Path: 12 tests
- Edge Cases: 8 tests
- Error Handling: 6 tests
- Performance: 4 tests
- Security: 5 tests
- Accessibility: 4 tests
- Integration: 3 tests
### Routes Covered: 12/15 (80%)
Uncovered routes:
- /admin/settings (requires admin role)
- /api/webhooks/* (external integrations)
- /debug/* (development-only)
### API Endpoints Covered: 18/22 (82%)
Uncovered endpoints:
- POST /api/admin/users (admin-only)
- DELETE /api/products/:id (soft delete, needs verification)
- GET /api/analytics (complex aggregations)
- POST /api/webhooks/stripe (external trigger)
### Bugs Found: 2
1. **CRITICAL**: Login form allows SQL injection via email field
- Location: app/login/page.tsx:45
- Test: e2e/security/sql-injection.spec.ts
2. **MEDIUM**: Checkout fails with international phone numbers
- Location: lib/validation.ts:12
- Test: e2e/edge-cases/international-phone.spec.ts
### Recommendations
1. Add admin-role tests (requires admin user setup)
2. Add webhook integration tests (requires test webhook server)
3. Add load testing for high-traffic endpoints
4. Increase accessibility coverage (current: 9%, target: 15%)
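The percentages in a report like this are covered-over-total ratios rounded to a whole percent. A minimal sketch of that arithmetic follows. `summarizeCoverage` is a hypothetical helper, and the route list is the illustrative one from the Phase 1 analysis example, not real output.

```typescript
interface CoverageSummary {
  covered: number;
  total: number;
  percent: number; // rounded to the nearest whole percent
  uncovered: string[];
}

// Hypothetical helper: compares everything discovered during analysis with
// the subset exercised by at least one generated test.
function summarizeCoverage(all: string[], covered: Set<string>): CoverageSummary {
  const uncovered = all.filter((item) => !covered.has(item));
  const coveredCount = all.length - uncovered.length;
  // An empty list has nothing left to cover, so report it as fully covered.
  const percent = all.length === 0 ? 100 : Math.round((coveredCount / all.length) * 100);
  return { covered: coveredCount, total: all.length, percent, uncovered };
}

const routes = ["/", "/login", "/products", "/products/:id", "/checkout"];
const exercised = new Set(["/", "/login", "/products", "/products/:id"]);
const routeCoverage = summarizeCoverage(routes, exercised);
```

The same rounding gives the figures in the report: 12/15 is 80% and 18/22 is about 81.8%, shown as 82%.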
Customization Options
Custom Locator Strategies:
// e2e/playwright.config.ts
export default defineConfig({
use: {
// Prioritize test IDs
testIdAttribute: "data-testid",
},
});
Custom Test Categories:
# Add custom category during generation
custom_categories = [
"happy-path",
"edge-cases",
"error-handling",
"performance",
"security",
"accessibility",
"integration",
"custom-business-rules" # Your custom category
]
Custom Seed Data:
// e2e/fixtures/custom-data.json
{
"scenarios": [
{
"name": "bulk-order",
"users": [...],
"products": [...],
"expected_discount": 0.15
}
]
}
Configuration
Environment Variables:
# .env.test
DATABASE_URL=postgresql://test:test@localhost:5432/testdb # pragma: allowlist secret
API_BASE_URL=http://localhost:8000/api
FRONTEND_URL=http://localhost:3000
ADMIN_EMAIL=admin@example.com
ADMIN_PASSWORD=Admin123! # pragma: allowlist secret
Playwright Options:
// e2e/playwright.config.ts
export default defineConfig({
// Test execution
workers: 1, // MANDATORY: prevents data races
fullyParallel: false, // Sequential execution
retries: process.env.CI ? 2 : 0,
// Timeouts
timeout: 30000, // Per-test timeout
expect: { timeout: 5000 }, // Assertion timeout
// Artifacts
use: {
screenshot: "only-on-failure",
video: "retain-on-failure",
trace: "on-first-retry",
},
});
LEVEL 3: Advanced Usage
Integration with Other Skills
With test-gap-analyzer:
> analyze test gaps and generate e2e tests to fill them
The skill:
- Runs test-gap-analyzer to identify uncovered flows
- Prioritizes test generation based on gap analysis
- Focuses on high-risk uncovered areas
With shadow-testing:
> generate e2e tests and run shadow testing against production
The skill:
- Generates full test suite
- Configures shadow-testing with production mirror
- Runs tests against both environments
- Reports discrepancies
With qa-team methodology (formerly outside-in-testing):
The skill inherently follows outside-in testing:
- Starts from user-facing UI
- Tests through all layers (UI → API → DB)
- Validates end-to-end contract adherence
- No mocking of application layers (real integration); only the error-handling tests intercept requests, to simulate outages
Custom Template Development
Template Structure:
# e2e/templates/custom-template.py
CUSTOM_TEST_TEMPLATE = """
import {{ test, expect }} from '@playwright/test';
import {{ {helpers} }} from '../test-helpers/{helper_module}';
test.describe('{feature_name}', () => {{
test.beforeEach(async ({{ page }}) => {{
// Setup
{setup_code}
}});
test('{test_description}', async ({{ page }}) => {{
// Arrange
{arrange_code}
// Act
{act_code}
// Assert
{assert_code}
}});
}});
"""
Using Custom Templates:
from e2e_outside_in_test_generator.template_manager import TemplateManager
template_manager = TemplateManager()
template_manager.register_template("custom-flow", CUSTOM_TEST_TEMPLATE)
test_code = template_manager.render("custom-flow", {
    "feature_name": "Payment Processing",
    # The template references {test_description}, so a value is required; this one is illustrative.
    "test_description": "processes a payment successfully",
    "helpers": "login, checkout",
    "helper_module": "payment",
    "setup_code": "await setupPaymentGateway();",
    "arrange_code": "const cart = await createCart();",
    "act_code": "await processPayment(cart);",
    "assert_code": "await expect(page.getByText(/payment successful/i)).toBeVisible();"
})
Advanced Locator Strategies
Priority Hierarchy:
1. Role-based (preferred): getByRole('button', { name: /submit/i })
2. User-visible text: getByText(/welcome/i)
3. Test IDs: getByTestId('submit-button')
4. CSS selectors (last resort): locator('.submit-button')
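The hierarchy above amounts to a first-match rule over whatever is known about an element. The sketch below illustrates that ordering only. `ElementHints` and `chooseLocator` are hypothetical names, and the function returns the locator call as a string rather than calling Playwright.

```typescript
// What the analysis might know about an element; every field is optional.
interface ElementHints {
  role?: string; // ARIA role, e.g. "button"
  accessibleName?: string; // accessible name, e.g. "Submit"
  text?: string; // user-visible text
  testId?: string; // value of the test ID attribute
  css?: string; // CSS selector
}

// Returns the highest-priority locator expression the hints allow.
function chooseLocator(hints: ElementHints): string {
  // A role without an accessible name is too ambiguous, so it is skipped.
  if (hints.role && hints.accessibleName) {
    return `getByRole(${JSON.stringify(hints.role)}, { name: ${JSON.stringify(hints.accessibleName)} })`;
  }
  if (hints.text) return `getByText(${JSON.stringify(hints.text)})`;
  if (hints.testId) return `getByTestId(${JSON.stringify(hints.testId)})`;
  if (hints.css) return `locator(${JSON.stringify(hints.css)})`;
  throw new Error("no usable locator hint");
}

const submitLocator = chooseLocator({
  role: "button",
  accessibleName: "Submit",
  css: ".submit-button",
});
```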
Custom Locator Builders:
// e2e/test-helpers/locators.ts
export function findByDataAttribute(page: Page, attr: string, value: string) {
return page.locator(`[data-${attr}="${value}"]`);
}
export function findByAriaLabel(page: Page, label: string) {
return page.locator(`[aria-label*="${label}" i]`);
}
// Usage in tests
await findByDataAttribute(page, "action", "submit").click();
Performance Optimization
Test Execution Time:
// Group fast tests together
test.describe("Quick smoke tests", () => {
test("homepage renders", async ({ page }) => {
/* <1s */
});
test("navigation works", async ({ page }) => {
/* <1s */
});
});
// Isolate slow tests
test.describe("Full checkout flow @slow", () => {
test("complete purchase", async ({ page }) => {
/* 10s */
});
});
Run profiles:
// package.json
{
"scripts": {
"test:e2e": "playwright test",
"test:e2e:quick": "playwright test --grep-invert @slow",
"test:e2e:full": "playwright test",
"test:e2e:smoke": "playwright test e2e/happy-path"
}
}
CI/CD Integration
GitHub Actions:
# .github/workflows/e2e.yml
name: E2E Tests
on: [push, pull_request]
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "18"
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright
        run: npx playwright install --with-deps
      - name: Run E2E tests
        run: npm run test:e2e
        env:
          DATABASE_URL: ${{ secrets.TEST_DATABASE_URL }}
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report
          path: playwright-report/
Troubleshooting
Common Issues:
- Flaky tests due to parallel execution
  - Cause: workers > 1 causes database race conditions
  - Fix: Ensure workers: 1 in playwright.config.ts
- Locator timeouts
  - Cause: Element not yet rendered
  - Fix: Add explicit waits: await page.waitForLoadState('networkidle')
- Test data conflicts
  - Cause: Shared test database without cleanup
  - Fix: Use test.beforeEach() to reset database state
- Authentication failures
  - Cause: Session timeout or incorrect credentials
  - Fix: Verify test credentials in e2e/fixtures/users.json
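The "reset database state" fix can be pictured with an in-memory stand-in for the database. This is only a sketch of the idea: `SeedStore` is a hypothetical class, and a real suite would more likely truncate tables and re-run seed.sql than copy objects in memory.

```typescript
// In-memory stand-in for a seeded test database table.
class SeedStore<T extends object> {
  private readonly seed: readonly T[];
  private rows: T[];

  constructor(seed: readonly T[]) {
    this.seed = seed;
    this.rows = seed.map((row) => ({ ...row }));
  }

  // Restores the table to a fresh copy of the seed rows.
  reset(): void {
    this.rows = this.seed.map((row) => ({ ...row }));
  }

  insert(row: T): void {
    this.rows.push(row);
  }

  all(): readonly T[] {
    return this.rows;
  }
}

const users = new SeedStore([
  { email: "test@example.com", role: "customer" },
  { email: "admin@example.com", role: "admin" },
]);

// In a Playwright suite the reset would be wired in once per file, for example:
//   test.beforeEach(() => users.reset());
users.insert({ email: "leftover@example.com", role: "customer" }); // a test writes data
users.reset(); // the next test starts from the seed again
```

Resetting before each test rather than after means a crashed test cannot leave stale rows behind for the next one.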
Debug Commands:
# Run single test in headed mode
npx playwright test e2e/happy-path/user-login.spec.ts --headed
# Run with debugger
npx playwright test --debug
# Generate trace for failing test
npx playwright test --trace on
# View trace
npx playwright show-trace trace.zip
Best Practices
- Keep workers at 1 - Never increase it; parallel workers break deterministic test data
- Use role-based locators - Most resilient to UI changes
- Small seed datasets - 10-20 records max for predictability
- Explicit waits - Avoid race conditions with waitForLoadState()
- Cleanup between tests - Use beforeEach to reset state
- Test isolation - Each test should run independently
- Meaningful assertions - Test business logic, not implementation
Success Metrics
After implementing this skill, you should achieve:
- Test count: 40+ tests (all 7 categories)
- Pass rate: 100% after fix loop
- Execution time: <2 minutes total
- Bug detection: ≥1 real bug found
- Coverage: ≥80% of routes and endpoints
- Flakiness: 0% (deterministic)
- CI integration: Green checks on every PR
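The numeric targets above can be checked mechanically after a run. The sketch below is one way to do that. `RunSummary` and `unmetCriteria` are hypothetical, and the example values are the illustrative figures from the coverage report plus an assumed duration.

```typescript
interface RunSummary {
  testCount: number;
  passed: number;
  durationSeconds: number;
  bugsFound: number;
  routeCoveragePercent: number;
  endpointCoveragePercent: number;
}

// Returns the criteria a run failed to meet; an empty list means all targets were hit.
function unmetCriteria(run: RunSummary): string[] {
  const unmet: string[] = [];
  if (run.testCount < 40) unmet.push("test count below 40");
  if (run.passed !== run.testCount) unmet.push("pass rate below 100%");
  if (run.durationSeconds >= 120) unmet.push("execution time of 2 minutes or more");
  if (run.bugsFound < 1) unmet.push("no real bug found");
  if (run.routeCoveragePercent < 80 || run.endpointCoveragePercent < 80) {
    unmet.push("coverage below 80%");
  }
  return unmet;
}

const exampleRun: RunSummary = {
  testCount: 42,
  passed: 42,
  durationSeconds: 95, // assumed value for illustration
  bugsFound: 2,
  routeCoveragePercent: 80,
  endpointCoveragePercent: 82,
};

const exampleUnmet = unmetCriteria(exampleRun);
```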
See also:
- README.md - Developer documentation
- examples.md - Usage examples
- reference.md - API reference
- patterns.md - Common patterns