E2E Test Runner
Provides the ability to run and iterate on HyperShift e2e tests. Auto-applies when implementing features that require e2e validation, fixing e2e test failures, or working on tasks that need live cluster testing.
HyperShift E2E Test Runner
This skill enables autonomous iteration on e2e tests: running tests, analyzing failures, making fixes, and re-running until the tests pass.
When to Use This Skill
This skill automatically applies when:
- Implementing a feature that needs e2e test validation
- Fixing a failing e2e test
- Working on a task where the user wants you to iterate until tests pass
- Debugging test failures in the test/e2e/ directory
- The user mentions running e2e tests or validating changes against a live cluster
Prerequisites
Source the environment file before using this skill:
source dev/claude-env.sh
Environment Configuration
Environment variables from dev/claude-env.sh:
| Variable | Description |
|---|---|
| E2E_PLATFORM | Test platform (AWS, Azure, etc.) |
| AWS_CREDENTIALS | Path to AWS credentials file |
| OIDC_BUCKET | S3 bucket for OIDC |
| BASE_DOMAIN | Base DNS domain |
| PULL_SECRET | Path to pull secret file |
| AWS_REGION | AWS region |
| E2E_ARTIFACT_DIR | Directory for test artifacts |
| MGMT_KUBECONFIG | Path to management cluster kubeconfig |
| CPO_IMAGE_REPO | Custom CPO image repository |
| RUNTIME | Container runtime (podman/docker) |
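Before a long test run it can help to confirm that the variables above are actually set. The following is a minimal sketch, not part of the skill itself: the function name check_e2e_env is hypothetical, and it assumes that CPO_IMAGE_REPO and RUNTIME are only needed for image builds and can be left out of the check.

```shell
# Hypothetical helper (not part of the skill): report which of the variables
# listed above are unset or empty. Returns 0 if all are set, 1 otherwise.
check_e2e_env() {
  missing=0
  for name in E2E_PLATFORM AWS_CREDENTIALS OIDC_BUCKET BASE_DOMAIN \
              PULL_SECRET AWS_REGION E2E_ARTIFACT_DIR MGMT_KUBECONFIG; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would follow the source step, for example running check_e2e_env right after source dev/claude-env.sh and stopping if it returns non-zero.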
Running E2E Tests
Step 1: Check if Test Binary Needs Rebuilding
CRITICAL: Before running any e2e test, you MUST check if the test binary needs rebuilding:
# Check if binary exists
if [ ! -f ./bin/test-e2e ]; then
  echo "Test binary missing, building..."
  make e2e
fi
# Check if any test files are newer than the binary
NEWEST_TEST=$(find test/e2e -name "*.go" -newer ./bin/test-e2e 2>/dev/null | head -1)
if [ -n "$NEWEST_TEST" ]; then
  echo "Test files changed (e.g., $NEWEST_TEST), rebuilding..."
  make e2e
fi
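The two checks above can be folded into one reusable predicate. This is a sketch under the same assumptions as the original commands (binary at ./bin/test-e2e, sources under test/e2e); the function name needs_rebuild is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) when BINARY is missing or is older
# than at least one .go file under SRC_DIR; fail (exit 1) otherwise.
needs_rebuild() {
  binary=$1
  src_dir=$2
  # A missing binary always needs a build.
  [ -f "$binary" ] || return 0
  # find -newer prints files modified more recently than the binary.
  newer=$(find "$src_dir" -name '*.go' -newer "$binary" 2>/dev/null | head -n 1)
  [ -n "$newer" ]
}

# Usage with the paths from the checks above:
#   if needs_rebuild ./bin/test-e2e test/e2e; then make e2e; fi
```

Like the original check, this only looks at test/e2e, so edits to packages that the tests import from elsewhere in the repository would not trigger a rebuild. Running make e2e unconditionally is the safe fallback when in doubt.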
Step 2: Run the Test
Build and execute the test command:
KUBECONFIG="$MGMT_KUBECONFIG" \
./bin/test-e2e -test.v -test.timeout 2h \
-test.run "TEST_PATTERN" \
--e2e.platform "$E2E_PLATFORM" \
--e2e.aws-credentials-file "$AWS_CREDENTIALS" \
--e2e.aws-oidc-s3-bucket-name "$OIDC_BUCKET" \
--e2e.base-domain "$BASE_DOMAIN" \
--e2e.pull-secret-file "$PULL_SECRET" \
--e2e.aws-region "$AWS_REGION" \
--e2e.artifact-dir "$E2E_ARTIFACT_DIR"
Step 3: Add Custom CPO Image (When Testing Control Plane Changes)
If you've made changes to control-plane-operator code and built a custom image, add:
--e2e.control-plane-operator-image $CPO_IMAGE_REPO:TAG
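One way to keep the optional image flag from being forgotten or mistyped is to assemble the argument list in a function. The sketch below copies the flag names from Steps 2 and 3. The function name run_e2e and the E2E_BIN override (used only to swap in a different binary path) are assumptions and not part of the skill.

```shell
# Hypothetical wrapper: run_e2e PATTERN [CPO_TAG]
# Builds the flag list from Step 2 and appends the Step 3 flag only when a
# tag is given. E2E_BIN is an assumed override; it defaults to ./bin/test-e2e.
run_e2e() {
  pattern=$1
  cpo_tag=${2:-}
  # Reuse the positional parameters as the argument list.
  set -- -test.v -test.timeout 2h -test.run "$pattern" \
    --e2e.platform "$E2E_PLATFORM" \
    --e2e.aws-credentials-file "$AWS_CREDENTIALS" \
    --e2e.aws-oidc-s3-bucket-name "$OIDC_BUCKET" \
    --e2e.base-domain "$BASE_DOMAIN" \
    --e2e.pull-secret-file "$PULL_SECRET" \
    --e2e.aws-region "$AWS_REGION" \
    --e2e.artifact-dir "$E2E_ARTIFACT_DIR"
  if [ -n "$cpo_tag" ]; then
    set -- "$@" --e2e.control-plane-operator-image "$CPO_IMAGE_REPO:$cpo_tag"
  fi
  KUBECONFIG="$MGMT_KUBECONFIG" "${E2E_BIN:-./bin/test-e2e}" "$@"
}
```

For example, run_e2e "TestCreateCluster" runs with the default image, while run_e2e "TestCreateCluster" my-tag adds the custom CPO image (my-tag is a placeholder).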
Iteration Loop
When working autonomously on a task that requires e2e validation:
1. Initial Test Run
Run the test to establish baseline:
KUBECONFIG=$MGMT_KUBECONFIG ./bin/test-e2e -test.v -test.run "TestName" [flags...]
2. On Failure - Analyze
- Read the test output carefully
- Check artifacts in the $E2E_ARTIFACT_DIR/ directory for:
  - Pod logs
  - Events
  - Resource states
- Identify the root cause
3. Make Fixes
- Edit the relevant code (test code, operator code, etc.)
- If you modified test/e2e/*.go files, the Step 1 check rebuilds the binary on the next run
4. Rebuild Images (If Needed)
If you modified control-plane-operator code, build and push a new image, either with the build-cpo-image skill or manually:
$RUNTIME build -f Dockerfile.control-plane --platform linux/amd64 -t $CPO_IMAGE_REPO:NEW_TAG .
$RUNTIME push $CPO_IMAGE_REPO:NEW_TAG
5. Re-run Test
Run the test again with updated code/images. Repeat until passing.
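When repeating runs, it is useful to keep each attempt's output in its own file so that failures can be compared across iterations. The helper below is a sketch of one way to do that; the name run_and_log and the log path in the usage note are hypothetical.

```shell
# Hypothetical helper: run_and_log LOGFILE COMMAND [ARGS...]
# Runs the command, stores stdout and stderr in LOGFILE, prints the tail of
# the log on failure, and returns the command's own exit status.
run_and_log() {
  log=$1
  shift
  "$@" >"$log" 2>&1
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "FAILED (exit $status), last lines of $log:" >&2
    tail -n 40 "$log" >&2
  fi
  return "$status"
}
```

A possible invocation, after exporting KUBECONFIG, is run_and_log /tmp/e2e-attempt-1.log ./bin/test-e2e -test.v -test.run "TestName" followed by the same flags as in Step 2. Output is not streamed to the terminal in this sketch, so progress has to be followed by reading the log file.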
Common Test Patterns
| Test Pattern | Description |
|---|---|
| TestNodePool | All NodePool tests |
| TestNodePool/HostedCluster0/Main/TestSpotTerminationHandler | Specific spot test |
| TestNodePool.*Karpenter | All Karpenter-related tests |
| TestCreateCluster | Cluster creation tests |
| TestUpgrade | Upgrade tests |
Analyzing Test Failures
Check Test Output
The test output includes:
- Test name and status
- Assertion failures with expected vs actual
- Timeout information
- Resource creation/deletion logs
Check Artifact Directory
After a test failure, examine:
ls -la $E2E_ARTIFACT_DIR/
# Contains: cluster manifests, pod logs, events, resource dumps
Common Failure Patterns
| Pattern | Likely Cause |
|---|---|
| context deadline exceeded | Resource didn't reach expected state in time |
| not found | Resource wasn't created or was deleted prematurely |
| connection refused | Service not ready or network issue |
| forbidden | RBAC or permission issue |
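The table above can be applied mechanically to a saved test log as a first triage pass. The sketch below assumes the test output was written to a file. The function name classify_failures is hypothetical, and the case-insensitive substring matching is a simplification that can produce false positives (for example, "not found" in an unrelated message).

```shell
# Hypothetical helper: classify_failures LOGFILE
# Prints one line per failure pattern from the table that occurs in the log.
# Returns 0 if any pattern matched, 1 if none did, 2 if the log is missing.
classify_failures() {
  log=$1
  [ -f "$log" ] || return 2
  result=1
  while IFS='|' read -r pattern cause; do
    # -F: fixed string, -i: ignore case, -c: count matching lines.
    count=$(grep -c -i -F -- "$pattern" "$log")
    if [ "$count" -gt 0 ]; then
      echo "$pattern [matching lines: $count] -> $cause"
      result=0
    fi
  done <<'EOF'
context deadline exceeded|Resource didn't reach expected state in time
not found|Resource wasn't created or was deleted prematurely
connection refused|Service not ready or network issue
forbidden|RBAC or permission issue
EOF
  return "$result"
}
```

The output only narrows down where to look. The artifact directory and the surrounding log lines are still needed to confirm the root cause.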
Building Test Binary
When test code changes, rebuild:
make e2e
This compiles ./bin/test-e2e with all tests from test/e2e/.
Notes
- Tests typically take 10-30+ minutes depending on complexity
- Some tests create real AWS resources (costs money, needs cleanup on failure)
- Use -test.timeout to set an appropriate timeout (the Step 2 command uses 2h)
- The artifact directory is overwritten on each run
- For long tests, consider running in background and checking periodically
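Because the artifact directory is overwritten on each run, copying it aside before the next attempt preserves evidence from earlier failures. The following is a minimal sketch; the function name archive_artifacts and the timestamped naming scheme are assumptions.

```shell
# Hypothetical helper: archive_artifacts DIR
# Copies DIR to a timestamped sibling (DIR-YYYYMMDD-HHMMSS) and prints the
# path of the copy. Fails if DIR is missing or the destination already exists.
archive_artifacts() {
  src=${1%/}   # drop a trailing slash, if any
  [ -d "$src" ] || { echo "no artifact directory: $src" >&2; return 1; }
  dest="$src-$(date +%Y%m%d-%H%M%S)"
  [ ! -e "$dest" ] || { echo "already exists: $dest" >&2; return 1; }
  cp -R "$src" "$dest" && echo "$dest"
}
```

Calling archive_artifacts "$E2E_ARTIFACT_DIR" after a failed run and before re-running keeps the pod logs, events, and resource dumps from that attempt available for comparison.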