Testing Workflow Skill

Quick Decision: Which Test Tier?

Ask yourself:

Fast local iteration? → Tier 0 (pytest --tier=0)
Before commit? → Tier 1 (pytest, default)
Integration validation? → Tier 2 (pytest --tier=2)
Pre-deployment? → Tier 3 (pytest --tier=3)
Release validation? → Tier 4 (pytest --tier=4)

Quick Decision: Which Test Strategy?

For Lambda/infrastructure testing (layers beyond pytest):

Quick dev iteration? → Unit tests only (just test-scheduler-unit, 15s)
Before commit? → Quick validation layers 1-5 (just test-scheduler, 2 min)
Lambda changes? → Docker tests (just test-scheduler-docker, 90s)
Step Functions changes? → Contract tests (just test-scheduler-contracts, 10s)
Pre-deployment? → Full validation (just test-scheduler-all, 5 min)
AWS integration? → Integration tests (just test-scheduler-integration, 60s)

See Progressive Testing Strategy for the 7-layer approach.

Docker-Based Testing for Lambda Functions

NEW: Docker-based testing prevents "filesystem unaware" deployment failures

For Lambda functions (LINE bot, Telegram API), run tests in Docker to match production runtime:

```bash
# LINE bot Docker import validation
./scripts/test_line_bot_docker.sh

# Pre-commit validation (syntax + unit tests + Docker imports)
./scripts/test_line_bot_pre_commit.sh
```

Why Docker tests matter:

✅ Runtime fidelity: Tests run in the exact Lambda Python 3.11 environment
✅ Filesystem aware: Validates the deployment package structure (/var/task)
✅ Catches import errors: "cannot import handle_webhook" is caught before production
✅ Two birds, one stone: Tests the logic and validates the deployment environment
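
The import check that such a script performs inside the container can be sketched in Python. The contents of test_line_bot_docker.sh are not shown in this document, so this is an assumption about the technique rather than a copy of the script. `check_handler_importable` is a hypothetical helper, and the module path of the LINE bot handler is left as a parameter because it is not given here. Only the name `handle_webhook` comes from the text above.

```python
import importlib


def check_handler_importable(module_name: str, attr_name: str) -> tuple:
    """Try to import module_name and look up attr_name; return (ok, message)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Covers ModuleNotFoundError, e.g. a file missing from the package.
        return False, f"cannot import module {module_name!r}: {exc}"
    if not hasattr(module, attr_name):
        # The module loads, but the handler name is not defined in it.
        return False, f"cannot import {attr_name!r} from {module_name!r}"
    return True, f"{module_name}.{attr_name} is importable"
```

Run inside the Lambda container image against the real handler module, a check like this fails for the same reason the deployed function would, which is the point of testing in Docker rather than in the dev environment.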

CI/CD integration:

GitHub Actions runs Docker import tests automatically (.github/workflows/deploy-line-dev.yml)
Tests block deployment if imports fail
Prevents false positive deployments (tests pass but Lambda fails)

Anti-pattern prevented:

❌ Running tests in the dev environment (setup-python) but deploying to Lambda (Docker container)
✅ Run tests in a Docker container that matches the deployed environment

See: .claude/specifications/workflow/2025-12-29-implement-test-workflow-to-reduce-false-positive-deployment.md

Loop Pattern: Synchronize Loop (Test-Code Alignment)

Escalation Trigger:

Tests pass but the code is still buggy (drift between test intent and reality)
/validate shows the tests don't actually test the claim
Knowledge drift: test assumptions are outdated

Tools Used:

/validate - Verify tests actually test what they claim (sabotage code, test should fail)
/consolidate - Align test intent with code reality (update tests or fix code)
/trace - Understand test failure causality (why did this test fail?)
/reflect - Assess test quality (are we testing outcomes or just execution?)

Why This Works: Testing naturally involves a synchronize loop: ensuring that tests align with code behavior, not just that they pass.
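
The sabotage check can be illustrated with a small sketch. Every name here (`apply_discount`, `weak_test`, `strong_test`, `detects_sabotage`) is hypothetical and exists only for illustration. The idea is that a test which still passes against a deliberately broken implementation is not testing the claim.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    return price * (1 - percent / 100)


def sabotaged_apply_discount(price: float, percent: float) -> float:
    """Deliberately broken variant: ignores the discount entirely."""
    return price


def weak_test(func) -> None:
    # Execution-only: passes as long as the call does not raise.
    func(200.0, 25.0)


def strong_test(func) -> None:
    # Outcome-based: checks the value the caller relies on.
    assert func(200.0, 25.0) == 150.0


def detects_sabotage(test, broken_func) -> bool:
    """Return True if the test fails when run against the broken code."""
    try:
        test(broken_func)
    except AssertionError:
        return True
    return False
```

Here `detects_sabotage(strong_test, sabotaged_apply_discount)` is true while `detects_sabotage(weak_test, sabotaged_apply_discount)` is false, which is the drift between test intent and reality described above.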

See Thinking Process Architecture - Feedback Loops for structural overview.

Test Structure

tests/
├── conftest.py         # Shared fixtures ONLY
├── shared/             # Agent, workflow, data tests
├── telegram/           # Telegram API tests
├── line_bot/           # LINE Bot tests (mark: legacy)
├── e2e/                # Playwright browser tests
├── integration/        # External API tests
└── infrastructure/     # S3, DynamoDB tests

When to Use Each Tier

| Tier | Command | Includes | Use Case |
|------|---------|----------|----------|
| 0 | `pytest --tier=0` | Unit only | Fast local |
| 1 | `pytest` (default) | Unit + mocked | Deploy gate |
| 2 | `pytest --tier=2` | + integration | Nightly |
| 3 | `pytest --tier=3` | + smoke | Pre-deploy |
| 4 | `pytest --tier=4` | + e2e | Release |
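
One way a `--tier` option could map onto markers is sketched below. The project's own conftest.py is not shown in this document, so this is an assumption about how such an option might work, not its actual implementation. `MARKER_MIN_TIER` and `required_tier` are hypothetical names. The sketch treats tiers 0 and 1 identically because the document names no marker for mocked tests, and it ignores the `legacy` and `ratelimited` markers.

```python
# conftest.py (sketch, not the project's actual file)

# Lowest tier at which each marker is included, read off the tier table.
MARKER_MIN_TIER = {"integration": 2, "smoke": 3, "e2e": 4}


def required_tier(marker_names) -> int:
    """Return the lowest tier that includes a test carrying these markers."""
    return max((MARKER_MIN_TIER.get(name, 0) for name in marker_names), default=0)


def pytest_addoption(parser):
    # Tier 1 is the default, matching a plain `pytest` invocation.
    parser.addoption("--tier", type=int, default=1,
                     help="Highest test tier to run (0-4)")


def pytest_collection_modifyitems(config, items):
    tier = config.getoption("--tier")
    selected, deselected = [], []
    for item in items:
        names = {mark.name for mark in item.iter_markers()}
        if required_tier(names) <= tier:
            selected.append(item)
        else:
            deselected.append(item)
    if deselected:
        # Report deselected tests to pytest, then shrink the run list in place.
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```

Deselecting rather than skipping keeps the lower-tier output short, at the cost of not listing the excluded tests individually.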

Writing a Test: Checklist

Choose test location based on component under test
Use class-based structure: class TestComponent:
Follow canonical pattern: See PATTERNS.md
Avoid anti-patterns: Check ANTI-PATTERNS.md
Apply defensive validation: See DEFENSIVE.md
Verify test can fail: Sabotage code, test should fail

Common Workflows

Writing a Unit Test

Create class TestComponent in appropriate test file
Add setup_method() if component needs initialization
Write test method: def test_behavior_description(self):
Use fixtures from conftest.py for shared data
Assert outcomes, not just execution
Sabotage code to verify test catches failures
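
The steps above can be sketched as follows. `TickerNormalizer` is a hypothetical component invented for this example and is not part of the project. The canonical pattern in PATTERNS.md may differ in detail.

```python
class TickerNormalizer:
    """Hypothetical component under test."""

    def normalize(self, raw: str) -> str:
        symbol = raw.strip().upper()
        if not symbol:
            raise ValueError("ticker symbol is empty")
        return symbol


class TestTickerNormalizer:
    def setup_method(self):
        # Runs before every test method, so each test gets a fresh instance.
        self.normalizer = TickerNormalizer()

    def test_normalize_strips_and_uppercases(self):
        result = self.normalizer.normalize("  aapl ")
        # Assert the outcome, not merely that the call returned.
        assert result == "AAPL"

    def test_normalize_rejects_empty_symbol(self):
        try:
            self.normalizer.normalize("   ")
        except ValueError as exc:
            assert "empty" in str(exc)
        else:
            raise AssertionError("expected ValueError for an empty symbol")
```

In a real test module, `pytest.raises(ValueError)` is the idiomatic way to write the second test. The explicit try/except only keeps the sketch free of imports.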

Adding Integration Tests

Mark with @pytest.mark.integration
Use real external APIs (LLM, yfinance, Aurora)
Validate multi-layer outcomes (status code → logs → data state)
Consider rate limits (@pytest.mark.ratelimited)
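
Multi-layer validation (status code → logs → data state) might look like the helper below. `validate_layers` and its parameters are hypothetical. How the status code, log lines, and stored row are obtained depends on the service under test and is not shown.

```python
def validate_layers(status_code: int, log_lines: list, stored_row) -> list:
    """Check status code, then logs, then data state; return all failures."""
    failures = []
    if status_code != 200:
        failures.append(f"status code was {status_code}, expected 200")
    if any("ERROR" in line for line in log_lines):
        failures.append("logs contain an ERROR entry")
    if not stored_row:
        # A 200 response with nothing persisted is the classic false positive.
        failures.append("no row was written to the data store")
    return failures
```

An integration test marked with @pytest.mark.integration would then end with `assert validate_layers(...) == []`, so that a successful status code alone cannot make the test pass.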

Improving Test Coverage

Run pytest --cov to see coverage report
Identify untested branches and edge cases
Write tests for failure modes (not just success)
Add boundary condition tests
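
As a sketch of boundary and failure-mode tests, consider a hypothetical `clamp_percent` function (not part of the project). The tests sit on each edge of the valid range and on one invalid input.

```python
def clamp_percent(value: float) -> float:
    """Hypothetical function: restrict a percentage to the range [0, 100]."""
    if value != value:  # NaN is the only float not equal to itself
        raise ValueError("percentage must be a number, got NaN")
    return max(0.0, min(100.0, value))


class TestClampPercent:
    def test_lower_boundary(self):
        assert clamp_percent(0.0) == 0.0    # exactly on the edge
        assert clamp_percent(-0.1) == 0.0   # just outside

    def test_upper_boundary(self):
        assert clamp_percent(100.0) == 100.0
        assert clamp_percent(100.1) == 100.0

    def test_failure_mode_nan(self):
        try:
            clamp_percent(float("nan"))
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError for NaN")
```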

Fixing Failing Tests

Read test failure message carefully
Check if code behavior changed (update test)
Check if test has anti-pattern (fix test)
Verify test isolation (no shared state between tests)
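
A common source of broken isolation is mutable state that lives on a class or module rather than on an instance. The two classes below are hypothetical and only illustrate the difference.

```python
class SharedStateExample:
    # Class-level list: shared by every instance, so it leaks between tests.
    seen = []

    def record(self, item):
        self.seen.append(item)
        return len(self.seen)


class IsolatedExample:
    def __init__(self):
        # Instance-level list: every newly built object starts empty.
        self.seen = []

    def record(self, item):
        self.seen.append(item)
        return len(self.seen)
```

With `SharedStateExample`, a second test that builds a new object still sees the first test's item, so its result depends on execution order. A test that passes when run alone but fails in the full run often points to this kind of leak.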

Test Markers

```python
@pytest.mark.integration   # External APIs (LLM, yfinance)
@pytest.mark.smoke         # Requires live server
@pytest.mark.e2e           # Requires browser
@pytest.mark.legacy        # LINE bot (skip in Telegram CI)
@pytest.mark.ratelimited   # API rate limited (--run-ratelimited to include)
pytestmark = pytest.mark.legacy  # Mark entire file
```

Quick Reference Commands