e2e-outside-in-test-generator

Generates comprehensive end-to-end test scenarios using outside-in methodology. Supports 5 app types: Web (Playwright), CLI, TUI, API, and MCP (gadugi YAML). Auto-detects app type or accepts explicit override.

Install this agent skill to your Project

npx add-skill https://github.com/rysweet/amplihack/tree/main/.claude/skills/e2e-outside-in-test-generator

SKILL.md

E2E Outside-In Test Generator

Automatically generates comprehensive end-to-end Playwright tests for full-stack applications using outside-in testing methodology. Generates 40+ tests across 7 categories, finds real bugs, and validates through iterative fix loops.

LEVEL 1: Quick Start

Purpose

This skill analyzes your full-stack application and generates a complete Playwright test suite that:

  • Validates user journeys from browser through backend
  • Finds real bugs through systematic exploration
  • Runs deterministically with no flakiness
  • Integrates with CI out of the box

When to Use

Activate this skill when you:

  • Need comprehensive browser testing for a full-stack app
  • Want to validate critical user flows end-to-end
  • Need regression coverage before major refactoring
  • Are preparing for production deployment

Requirements: Your project must have both a frontend (Next.js, React, Vue, or Angular) and a backend API.

Quick Start

bash
# In your project root
$ claude

> add e2e tests

The skill automatically:

  1. Analyzes your application (routes, API endpoints, database schema)
  2. Sets up infrastructure (Playwright config, test helpers, seed data)
  3. Generates 40+ tests across 7 categories
  4. Runs tests and fixes failures (up to 5 iterations)
  5. Reports coverage and recommendations

Expected Output

e2e/
├── playwright.config.ts          # Playwright configuration
├── test-helpers/
│   ├── auth.ts                   # Authentication helpers
│   ├── navigation.ts             # Navigation helpers
│   ├── assertions.ts             # Custom assertions
│   └── data-setup.ts             # Test data management
├── fixtures/
│   ├── users.json                # Test user data
│   ├── products.json             # Test product data
│   └── seed.sql                  # Database seed script
├── happy-path/
│   ├── user-registration.spec.ts
│   ├── user-login.spec.ts
│   └── checkout-flow.spec.ts
├── edge-cases/
│   ├── invalid-inputs.spec.ts
│   └── boundary-conditions.spec.ts
├── error-handling/
│   ├── network-failures.spec.ts
│   └── validation-errors.spec.ts
├── performance/
│   ├── page-load-times.spec.ts
│   └── api-response-times.spec.ts
├── security/
│   ├── unauthorized-access.spec.ts
│   └── xss-protection.spec.ts
├── accessibility/
│   ├── keyboard-navigation.spec.ts
│   └── screen-reader.spec.ts
└── integration/
    ├── database-persistence.spec.ts
    └── api-integration.spec.ts

Total: 42 tests across 7 categories

Run Your Tests

bash
# Run all tests
npx playwright test

# Run specific category
npx playwright test e2e/happy-path

# Run in headed mode
npx playwright test --headed

# Run in debug mode
npx playwright test --debug

Success Criteria

After generation, your test suite achieves:

  • 40+ tests across all 7 categories
  • 100% pass rate after fix loop
  • <2 minute total execution time
  • ≥1 real bug discovered during generation
  • Zero flakiness (deterministic test data)
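
The measurable criteria above can be checked mechanically. The following is a minimal sketch, not part of the skill itself: the `RunSummary` shape and the `unmetCriteria` function are hypothetical names introduced for illustration, and the flakiness criterion is left out because it cannot be judged from a single run.

```typescript
// Hypothetical summary of one generation run; field names are assumptions.
interface RunSummary {
  totalTests: number;
  categoriesCovered: number;
  passed: number;
  durationMs: number;
  bugsFound: number;
}

// Returns a human-readable list of the success criteria that were not met.
function unmetCriteria(run: RunSummary): string[] {
  const unmet: string[] = [];
  if (run.totalTests < 40) unmet.push("fewer than 40 tests");
  if (run.categoriesCovered < 7) unmet.push("not all 7 categories covered");
  if (run.passed !== run.totalTests) unmet.push("pass rate below 100%");
  if (run.durationMs >= 2 * 60 * 1000) unmet.push("execution took 2 minutes or longer");
  if (run.bugsFound < 1) unmet.push("no real bug discovered");
  return unmet;
}

const exampleRun: RunSummary = {
  totalTests: 42,
  categoriesCovered: 7,
  passed: 42,
  durationMs: 95000,
  bugsFound: 2,
};
console.log(unmetCriteria(exampleRun)); // prints []
```

An empty result means every checked criterion holds; anything else names the criteria to revisit.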

LEVEL 2: Full Features

The 5 Phases

mermaid
graph LR
    A[Phase 1: Analysis] --> B[Phase 2: Infrastructure]
    B --> C[Phase 3: Generation]
    C --> D[Phase 4: Fix Loop]
    D --> E[Phase 5: Coverage Audit]

Phase 1: Stack Analysis

The skill performs deep application analysis:

Frontend Analysis:

  • Detects framework (Next.js, React, Vue, Angular)
  • Maps routes and pages
  • Identifies navigation patterns
  • Extracts interactive elements

Backend Analysis:

  • Discovers API endpoints (REST/GraphQL)
  • Maps data models and relationships
  • Identifies authentication mechanisms
  • Detects validation rules

Database Analysis:

  • Extracts schema and relationships
  • Identifies required fields
  • Determines foreign key constraints
  • Maps enum types

Example Analysis Output:

python
StackConfig(
    frontend_framework="nextjs",
    frontend_dir="app/",
    backend_framework="fastapi",
    api_base_url="http://localhost:8000/api",
    database_type="postgresql",
    auth_mechanism="jwt",
    routes=[
        Route(path="/", component="Home"),
        Route(path="/login", component="Login"),
        Route(path="/products", component="ProductList"),
        Route(path="/products/:id", component="ProductDetail"),
        Route(path="/checkout", component="Checkout")
    ],
    api_endpoints=[
        APIEndpoint(path="/auth/login", method="POST"),
        APIEndpoint(path="/auth/register", method="POST"),
        APIEndpoint(path="/products", method="GET"),
        APIEndpoint(path="/products/:id", method="GET"),
        APIEndpoint(path="/orders", method="POST")
    ],
    models=[
        Model(name="User", fields=["id", "email", "password"]),
        Model(name="Product", fields=["id", "name", "price", "stock"]),
        Model(name="Order", fields=["id", "user_id", "product_id", "quantity"])
    ]
)
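
To illustrate how entries such as `/products/:id` in the `routes` list could be derived, here is a rough sketch that maps Next.js App Router file paths to route patterns. It is an assumption about one possible approach, not the skill's actual analyzer. `filePathToRoute` is a hypothetical helper, and it handles only plain `page` files, route groups, and single dynamic segments.

```typescript
// Hypothetical helper: converts an App Router file path into a route pattern.
// Returns null for files that do not define a page.
function filePathToRoute(filePath: string, appDir: string = "app/"): string | null {
  if (!filePath.startsWith(appDir)) return null;
  const segments = filePath.slice(appDir.length).split("/");
  const fileName = segments.pop();
  if (fileName === undefined || !/^page\.(tsx|ts|jsx|js)$/.test(fileName)) return null;
  const parts = segments
    .filter((segment) => !/^\(.+\)$/.test(segment)) // route groups like (shop) do not appear in the URL
    .map((segment) => segment.replace(/^\[(\w+)\]$/, ":$1")); // [id] becomes :id
  return "/" + parts.join("/");
}

console.log(filePathToRoute("app/page.tsx")); // prints "/"
console.log(filePathToRoute("app/products/[id]/page.tsx")); // prints "/products/:id"
console.log(filePathToRoute("app/(shop)/checkout/page.tsx")); // prints "/checkout"
console.log(filePathToRoute("app/products/layout.tsx")); // prints null
```

Catch-all segments, parallel routes, and the older `pages/` directory would need extra handling.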

Phase 2: Infrastructure Setup

Generates complete testing infrastructure:

Playwright Configuration:

typescript
// e2e/playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: ".", // Resolved relative to this config file, which already lives in e2e/
  fullyParallel: false, // CRITICAL: workers=1 for deterministic test data
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 2 : 0,
  workers: 1, // NEVER > 1 to prevent data races
  reporter: "html",
  use: {
    baseURL: "http://localhost:3000",
    trace: "on-first-retry",
  },
  projects: [{ name: "chromium", use: { ...devices["Desktop Chrome"] } }],
  webServer: {
    command: "npm run dev",
    url: "http://localhost:3000",
    reuseExistingServer: !process.env.CI,
  },
});

Test Helpers:

typescript
// e2e/test-helpers/auth.ts
import type { Page } from "@playwright/test";

export async function login(page: Page, email: string, password: string) {
  await page.goto("/login");
  await page.getByRole("textbox", { name: /email/i }).fill(email);
  // Password inputs do not expose the "textbox" role, so locate the field by its label
  await page.getByLabel(/password/i).fill(password);
  await page.getByRole("button", { name: /sign in/i }).click();
  await page.waitForURL("/dashboard");
}

export async function logout(page: Page) {
  await page.getByRole("button", { name: /logout/i }).click();
  await page.waitForURL("/");
}

Seed Data:

json
// e2e/fixtures/users.json
[
  {
    "email": "test@example.com",
    "password": "Test123!", // pragma: allowlist secret
    "name": "Test User",
    "role": "customer"
  },
  {
    "email": "admin@example.com",
    "password": "Admin123!", // pragma: allowlist secret
    "name": "Admin User",
    "role": "admin"
  }
]
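
Seed fixtures are easier to trust when they are validated before use. The sketch below is a hedged example of loading user fixtures of the shape shown above. `SeedUser`, `isSeedUser`, and `parseSeedUsers` are hypothetical names, not helpers the skill is known to generate. Strict JSON does not allow comments, so any `// pragma` markers must be removed before the file content reaches `JSON.parse`.

```typescript
// Shape of one record in users.json, taken from the example fixture.
interface SeedUser {
  email: string;
  password: string;
  name: string;
  role: string;
}

// Type guard: every expected field must be present and be a string.
function isSeedUser(value: unknown): value is SeedUser {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return ["email", "password", "name", "role"].every((key) => typeof record[key] === "string");
}

// Parses fixture text and rejects malformed or duplicate records.
function parseSeedUsers(json: string): SeedUser[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) throw new Error("seed file must contain a JSON array");
  const users = parsed.filter(isSeedUser);
  if (users.length !== parsed.length) throw new Error("seed file contains malformed user records");
  const uniqueEmails = new Set(users.map((user) => user.email));
  if (uniqueEmails.size !== users.length) throw new Error("seed users must have unique emails");
  return users;
}

const fixtureText = JSON.stringify([
  { email: "test@example.com", password: "Test123!", name: "Test User", role: "customer" },
  { email: "admin@example.com", password: "Admin123!", name: "Admin User", role: "admin" },
]);
console.log(parseSeedUsers(fixtureText).map((user) => user.role)); // prints [ 'customer', 'admin' ]
```

Failing fast on a bad fixture keeps data problems from surfacing later as confusing test failures.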

Phase 3: Test Generation

Generates tests across 7 mandatory categories:

1. Happy Path Tests (Critical user journeys)

typescript
// e2e/happy-path/user-registration.spec.ts
import { test, expect } from "@playwright/test";

test("user can register with valid credentials", async ({ page }) => {
  await page.goto("/register");

  await page.getByRole("textbox", { name: /email/i }).fill("newuser@example.com");
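  // Password inputs do not expose the "textbox" role; anchored label patterns keep
  // "Password" and "Confirm password" from matching each other (label text assumed)
  await page.getByLabel(/^password/i).fill("Password123!");
  await page.getByLabel(/^confirm password/i).fill("Password123!");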
  await page.getByRole("button", { name: /create account/i }).click();

  await expect(page).toHaveURL("/dashboard");
  await expect(page.getByText(/welcome/i)).toBeVisible();
});

2. Edge Case Tests (Boundary conditions)

typescript
// e2e/edge-cases/boundary-conditions.spec.ts
import { test, expect } from "@playwright/test";

test("rejects password that is too short", async ({ page }) => {
  await page.goto("/register");

  await page.getByRole("textbox", { name: /email/i }).fill("test@example.com");
  // Password inputs do not expose the "textbox" role, so locate the field by its label
  await page.getByLabel(/^password/i).fill("123"); // Too short
  await page.getByRole("button", { name: /create account/i }).click();

  await expect(page.getByText(/password must be at least 8 characters/i)).toBeVisible();
});

3. Error Handling Tests (Failure scenarios)

typescript
// e2e/error-handling/network-failures.spec.ts
import { test, expect } from "@playwright/test";

test("shows error when API is unavailable", async ({ page, context }) => {
  // Simulate network failure
  await context.route("**/api/**", (route) => route.abort());

  await page.goto("/products");

  await expect(page.getByText(/unable to load products/i)).toBeVisible();
  await expect(page.getByRole("button", { name: /retry/i })).toBeVisible();
});

4. Performance Tests (Speed validation)

typescript
// e2e/performance/page-load-times.spec.ts
import { test, expect } from "@playwright/test";

test("homepage loads in under 2 seconds", async ({ page }) => {
  const startTime = Date.now();
  await page.goto("/");
  await page.waitForLoadState("networkidle");
  const loadTime = Date.now() - startTime;

  expect(loadTime).toBeLessThan(2000);
});

5. Security Tests (Authorization/XSS)

typescript
// e2e/security/unauthorized-access.spec.ts
import { test, expect } from "@playwright/test";

test("redirects unauthenticated users from protected routes", async ({ page }) => {
  await page.goto("/dashboard");

  await expect(page).toHaveURL("/login");
  await expect(page.getByText(/please sign in/i)).toBeVisible();
});

6. Accessibility Tests (WCAG compliance)

typescript
// e2e/accessibility/keyboard-navigation.spec.ts
import { test, expect } from "@playwright/test";

test("login form is fully keyboard accessible", async ({ page }) => {
  await page.goto("/login");

  await page.keyboard.press("Tab"); // Focus email
  await page.keyboard.type("test@example.com");
  await page.keyboard.press("Tab"); // Focus password
  await page.keyboard.type("Test123!");
  await page.keyboard.press("Tab"); // Focus submit button
  await page.keyboard.press("Enter"); // Submit form

  await expect(page).toHaveURL("/dashboard");
});

7. Integration Tests (Database/API)

typescript
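// e2e/integration/database-persistence.spec.ts
import { test, expect } from "@playwright/test";
import { login, logout } from "../test-helpers/auth";

test("product order persists across sessions", async ({ page, context }) => {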
  // Create order
  await login(page, "test@example.com", "Test123!");
  await page.goto("/products/123");
  await page.getByRole("button", { name: /add to cart/i }).click();
  await page.goto("/checkout");
  await page.getByRole("button", { name: /place order/i }).click();

  const orderNumber = await page.getByText(/order #\d+/).textContent();

  // Logout and login again
  await logout(page);
  await login(page, "test@example.com", "Test123!");

  // Verify order exists
  await page.goto("/orders");
  await expect(page.getByText(orderNumber!)).toBeVisible();
});

Phase 4: Fix Loop

Automatically fixes failing tests through iterative debugging:

Fix Loop Process:

  1. Run tests → Collect failures
  2. Analyze failures → Categorize issues (locator, timing, data, logic)
  3. Apply fixes → Update tests based on issue type
  4. Rerun tests → Verify fixes
  5. Repeat → Max 5 iterations
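
The process above amounts to a bounded retry loop. The following is a simplified sketch under stated assumptions: `runTests` and `applyFix` are hypothetical callbacks standing in for the real test runner and fixer, and an iteration is counted as one test run.

```typescript
type IssueType = "locator" | "timing" | "data" | "logic";

interface Failure {
  test: string;
  issue: IssueType;
}

interface FixLoopResult {
  passed: boolean;
  iterations: number;
  remaining: Failure[];
}

// Runs the suite, applies fixes for each failure, and reruns,
// stopping when everything passes or the iteration cap is reached.
function fixLoop(
  runTests: () => Failure[],
  applyFix: (failure: Failure) => void,
  maxIterations: number = 5,
): FixLoopResult {
  let failures = runTests();
  let iterations = 1;
  while (failures.length > 0 && iterations < maxIterations) {
    failures.forEach(applyFix);
    failures = runTests();
    iterations++;
  }
  return { passed: failures.length === 0, iterations, remaining: failures };
}

// Simulated suite: two failing tests, each resolved by a single fix.
const pending = new Map<string, IssueType>([
  ["user-login", "locator"],
  ["checkout-flow", "timing"],
]);
const result = fixLoop(
  () => Array.from(pending, ([test, issue]) => ({ test, issue })),
  (failure) => {
    pending.delete(failure.test);
  },
);
console.log(result.passed, result.iterations); // prints true 2
```

Returning the remaining failures lets a caller report what still fails when the cap is hit instead of silently giving up.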

Common Fix Patterns:

typescript
// Before: Flaky locator
await page.click(".submit-button");

// After: Role-based locator
await page.getByRole("button", { name: /submit/i }).click();

// Before: Race condition
await page.click("#login");
await page.fill("#email", "test@example.com");

// After: Wait for navigation
await page.click("#login");
await page.waitForLoadState("networkidle");
await page.fill("#email", "test@example.com");

// Before: Hardcoded data
await page.fill("#quantity", "5");

// After: Dynamic data from fixtures
const product = await getTestProduct();
await page.fill("#quantity", product.minQuantity.toString());

Fix Loop Results:

Iteration 1: 42 tests, 8 failures
  - Fixed 5 locator issues
  - Fixed 2 timing issues
  - Fixed 1 data issue

Iteration 2: 42 tests, 3 failures
  - Fixed 2 locator issues
  - Fixed 1 timing issue

Iteration 3: 42 tests, 0 failures ✓

Fix loop completed in 3 iterations.

Phase 5: Coverage Audit

Generates comprehensive coverage report and recommendations:

Coverage Report:

markdown
## Test Coverage Report

### Tests Generated: 42

- Happy Path: 12 tests
- Edge Cases: 8 tests
- Error Handling: 6 tests
- Performance: 4 tests
- Security: 5 tests
- Accessibility: 4 tests
- Integration: 3 tests

### Routes Covered: 12/15 (80%)

Uncovered routes:

- /admin/settings (requires admin role)
- /api/webhooks/\* (external integrations)
- /debug/\* (development-only)

### API Endpoints Covered: 18/22 (82%)

Uncovered endpoints:

- POST /api/admin/users (admin-only)
- DELETE /api/products/:id (soft delete, needs verification)
- GET /api/analytics (complex aggregations)
- POST /api/webhooks/stripe (external trigger)

### Bugs Found: 2

1. **CRITICAL**: Login form allows SQL injection via email field
   - Location: app/login/page.tsx:45
   - Test: e2e/security/sql-injection.spec.ts

2. **MEDIUM**: Checkout fails with international phone numbers
   - Location: lib/validation.ts:12
   - Test: e2e/edge-cases/international-phone.spec.ts

### Recommendations

1. Add admin-role tests (requires admin user setup)
2. Add webhook integration tests (requires test webhook server)
3. Add load testing for high-traffic endpoints
4. Increase accessibility coverage (current: 9%, target: 15%)
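
The route and endpoint percentages in a report like this are covered-over-total ratios rounded to a whole percent. A small sketch of that arithmetic follows; `coveragePercent` and `meetsTarget` are hypothetical helper names, and the 80% default comes from the coverage target listed under Success Metrics.

```typescript
// Rounds covered/total to a whole percentage.
function coveragePercent(covered: number, total: number): number {
  if (total <= 0) throw new RangeError("total must be positive");
  return Math.round((covered / total) * 100);
}

// Compares a coverage ratio against a target percentage (default 80).
function meetsTarget(covered: number, total: number, targetPercent: number = 80): boolean {
  return coveragePercent(covered, total) >= targetPercent;
}

console.log(coveragePercent(12, 15)); // prints 80 (routes)
console.log(coveragePercent(18, 22)); // prints 82 (API endpoints)
console.log(meetsTarget(12, 15)); // prints true
```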

Customization Options

Custom Locator Strategies:

typescript
// e2e/playwright.config.ts
export default defineConfig({
  use: {
    // Prioritize test IDs
    testIdAttribute: "data-testid",
  },
});

Custom Test Categories:

python
# Add custom category during generation
custom_categories = [
    "happy-path",
    "edge-cases",
    "error-handling",
    "performance",
    "security",
    "accessibility",
    "integration",
    "custom-business-rules"  # Your custom category
]

Custom Seed Data:

json
// e2e/fixtures/custom-data.json
{
  "scenarios": [
    {
      "name": "bulk-order",
      "users": [...],
      "products": [...],
      "expected_discount": 0.15
    }
  ]
}

Configuration

Environment Variables:

bash
# .env.test
DATABASE_URL=postgresql://test:test@localhost:5432/testdb  # pragma: allowlist secret
API_BASE_URL=http://localhost:8000/api
FRONTEND_URL=http://localhost:3000
ADMIN_EMAIL=admin@example.com
ADMIN_PASSWORD=Admin123!  # pragma: allowlist secret

Playwright Options:

typescript
// e2e/playwright.config.ts
export default defineConfig({
  // Test execution
  workers: 1, // MANDATORY: prevents data races
  fullyParallel: false, // Sequential execution
  retries: process.env.CI ? 2 : 0,

  // Timeouts
  timeout: 30000, // Per-test timeout
  expect: { timeout: 5000 }, // Assertion timeout

  // Artifacts
  use: {
    screenshot: "only-on-failure",
    video: "retain-on-failure",
    trace: "on-first-retry",
  },
});

LEVEL 3: Advanced Usage

Integration with Other Skills

With test-gap-analyzer:

bash
> analyze test gaps and generate e2e tests to fill them

The skill:

  1. Runs test-gap-analyzer to identify uncovered flows
  2. Prioritizes test generation based on gap analysis
  3. Focuses on high-risk uncovered areas

With shadow-testing:

bash
> generate e2e tests and run shadow testing against production

The skill:

  1. Generates full test suite
  2. Configures shadow-testing with production mirror
  3. Runs tests against both environments
  4. Reports discrepancies

With qa-team methodology (formerly outside-in-testing):

The skill inherently follows outside-in testing:

  • Starts from user-facing UI
  • Tests through all layers (UI → API → DB)
  • Validates end-to-end contract adherence
  • No mocking (real integration)

Custom Template Development

Template Structure:

python
# e2e/templates/custom-template.py
CUSTOM_TEST_TEMPLATE = """
import {{ test, expect }} from '@playwright/test';
import {{ {helpers} }} from '../test-helpers/{helper_module}';

test.describe('{feature_name}', () => {{
  test.beforeEach(async ({{ page }}) => {{
    // Setup
    {setup_code}
  }});

  test('{test_description}', async ({{ page }}) => {{
    // Arrange
    {arrange_code}

    // Act
    {act_code}

    // Assert
    {assert_code}
  }});
}});
"""

Using Custom Templates:

python
from e2e_outside_in_test_generator.template_manager import TemplateManager

template_manager = TemplateManager()
template_manager.register_template("custom-flow", CUSTOM_TEST_TEMPLATE)

test_code = template_manager.render("custom-flow", {
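    "feature_name": "Payment Processing",
    "test_description": "processes a payment successfully",  # Fills the template's {test_description} placeholder; example value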
    "helpers": "login, checkout",
    "helper_module": "payment",
    "setup_code": "await setupPaymentGateway();",
    "arrange_code": "const cart = await createCart();",
    "act_code": "await processPayment(cart);",
    "assert_code": "await expect(page.getByText(/payment successful/i)).toBeVisible();"
})

Advanced Locator Strategies

Priority Hierarchy:

  1. Role-based (preferred): getByRole('button', { name: /submit/i })
  2. User-visible text: getByText(/welcome/i)
  3. Test IDs: getByTestId('submit-button')
  4. CSS selectors (last resort): locator('.submit-button')
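
The hierarchy can be read as a fallback chain: use the most user-facing locator the element supports, and drop to CSS only when nothing better is available. The sketch below is one way to express that, not the skill's implementation. `ElementInfo` and `chooseLocator` are hypothetical names, and the function returns the locator call as source text so it could be pasted into a generated test.

```typescript
// Hypothetical description of an element discovered during analysis.
interface ElementInfo {
  role?: string;
  accessibleName?: string;
  text?: string;
  testId?: string;
  cssSelector: string;
}

// Picks the highest-priority locator strategy available for the element.
function chooseLocator(el: ElementInfo): string {
  if (el.role && el.accessibleName) {
    return `getByRole(${JSON.stringify(el.role)}, { name: ${JSON.stringify(el.accessibleName)} })`;
  }
  if (el.text) return `getByText(${JSON.stringify(el.text)})`;
  if (el.testId) return `getByTestId(${JSON.stringify(el.testId)})`;
  return `locator(${JSON.stringify(el.cssSelector)})`;
}

console.log(chooseLocator({ role: "button", accessibleName: "Submit", cssSelector: ".submit-button" }));
// prints getByRole("button", { name: "Submit" })
console.log(chooseLocator({ testId: "submit-button", cssSelector: ".submit-button" }));
// prints getByTestId("submit-button")
console.log(chooseLocator({ cssSelector: ".submit-button" }));
// prints locator(".submit-button")
```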

Custom Locator Builders:

typescript
// e2e/test-helpers/locators.ts
import type { Page } from "@playwright/test";

export function findByDataAttribute(page: Page, attr: string, value: string) {
  return page.locator(`[data-${attr}="${value}"]`);
}

export function findByAriaLabel(page: Page, label: string) {
  return page.locator(`[aria-label*="${label}" i]`);
}

// Usage in tests
await findByDataAttribute(page, "action", "submit").click();

Performance Optimization

Test Execution Time:

typescript
// Group fast tests together
test.describe("Quick smoke tests", () => {
  test("homepage renders", async ({ page }) => {
    /* <1s */
  });
  test("navigation works", async ({ page }) => {
    /* <1s */
  });
});

// Isolate slow tests
test.describe("Full checkout flow @slow", () => {
  test("complete purchase", async ({ page }) => {
    /* 10s */
  });
});

Run profiles:

json
// package.json
{
  "scripts": {
    "test:e2e": "playwright test",
    "test:e2e:quick": "playwright test --grep-invert @slow",
    "test:e2e:full": "playwright test",
    "test:e2e:smoke": "playwright test e2e/happy-path"
  }
}

CI/CD Integration

GitHub Actions:

yaml
# .github/workflows/e2e.yml
name: E2E Tests

on: [push, pull_request]

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright
        run: npx playwright install --with-deps

      - name: Run E2E tests
        run: npm run test:e2e
        env:
          DATABASE_URL: ${{ secrets.TEST_DATABASE_URL }}

      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report
          path: playwright-report/

Troubleshooting

Common Issues:

  1. Flaky tests due to parallel execution

    • Cause: workers > 1 causes database race conditions
    • Fix: Ensure workers: 1 in playwright.config.ts
  2. Locator timeouts

    • Cause: Element not yet rendered
    • Fix: Add explicit waits: await page.waitForLoadState('networkidle')
  3. Test data conflicts

    • Cause: Shared test database without cleanup
    • Fix: Use test.beforeEach() to reset database state
  4. Authentication failures

    • Cause: Session timeout or incorrect credentials
    • Fix: Verify test credentials in e2e/fixtures/users.json

Debug Commands:

bash
# Run single test in headed mode
npx playwright test e2e/happy-path/user-login.spec.ts --headed

# Run with debugger
npx playwright test --debug

# Generate trace for failing test
npx playwright test --trace on

# View trace (by default, traces are written under test-results/)
npx playwright show-trace path/to/trace.zip

Best Practices

  1. Keep workers at 1 - Parallel workers share one test database, so raising the count breaks deterministic test data
  2. Use role-based locators - Most resilient to UI changes
  3. Small seed datasets - 10-20 records max for predictability
  4. Explicit waits - Avoid race conditions with waitForLoadState()
  5. Cleanup between tests - Use beforeEach to reset state
  6. Test isolation - Each test should run independently
  7. Meaningful assertions - Test business logic, not implementation

Success Metrics

After implementing this skill, you should achieve:

  • Test count: 40+ tests (all 7 categories)
  • Pass rate: 100% after fix loop
  • Execution time: <2 minutes total
  • Bug detection: ≥1 real bug found
  • Coverage: ≥80% of routes and endpoints
  • Flakiness: 0% (deterministic)
  • CI integration: Green checks on every PR

See also:

  • README.md - Developer documentation
  • examples.md - Usage examples
  • reference.md - API reference
  • patterns.md - Common patterns
