End-to-End Testing

Orchestrates E2E testing strategy and architecture — delegates to playwright and stagehand skills.

When to Use

Designing test suites for frontend or backend user flows
Automating browser or API flows across services
Integrating E2E tests with CI/CD pipelines
Don’t use for: unit tests, component tests in isolation, load/performance testing

Critical Patterns

Test User Flows, Not Implementation

Each test should walk through a real user scenario rather than verifying internal state.

// CORRECT: tests the outcome the user sees
test('customer completes purchase', async ({ page }) => {
  await page.goto('/products');
  await page.getByRole('button', { name: 'Add to cart' }).first().click();
  await page.getByRole('link', { name: 'Cart' }).click();
  await page.getByRole('button', { name: 'Checkout' }).click();
  await expect(page.getByText('Order confirmed')).toBeVisible();
});
// WRONG: testing internal state
expect(store.getState().cart.items).toHaveLength(1);

Stable Selectors

Use selectors that survive refactors — data-testid for complex components, ARIA roles for standard elements.

// CORRECT: resilient selectors
await page.getByTestId('product-card').first().click();
await page.getByRole('navigation').getByRole('link', { name: 'Cart' }).click();
// WRONG: structural selectors that break on layout changes
await page.locator('div > div:nth-child(3) > a.link-blue').click();

Handle Async UI

Never sleep — rely on auto-wait or explicit conditions tied to visible DOM changes.

// CORRECT: wait for a real DOM condition
await page.getByRole('button', { name: 'Save' }).click();
await expect(page.getByRole('alert')).toHaveText('Saved');
// WRONG: arbitrary delay
await page.waitForTimeout(2000);

Test Data Management

Each test creates its own data and cleans up — no shared mutable state.

test.beforeEach(async ({ request }, testInfo) => {
  // Include the worker index so parallel workers never generate the same user.
  await request.post('/api/test/seed', {
    data: { user: `e2e-user-${testInfo.workerIndex}-${Date.now()}`, role: 'customer' },
  });
});
test.afterEach(async ({ request }) => {
  await request.post('/api/test/cleanup');
});

Assert Both Presence and Absence

Each user flow has a success path and failure paths. Assert both visible outcomes and absent states — an E2E test that only checks success misses half the contract.

// ✅ POSITIVE: success outcome is visible
await expect(page.getByText('Order confirmed')).toBeVisible();
await expect(page.getByRole('link', { name: 'My orders' })).toBeVisible();

// ✅ NEGATIVE: error state appears on invalid input; success state absent
await page.getByLabel('Email').fill('not-an-email');
await page.getByRole('button', { name: 'Place order' }).click();
await expect(page.getByText('Invalid email')).toBeVisible();
await expect(page.getByText('Order confirmed')).not.toBeVisible();
await expect(page.getByRole('button', { name: 'Place order' })).toBeDisabled();

For the assertion matchers used here (toBeVisible, toBeDisabled, not.*), see the playwright skill.

CI Pipeline Integration

Run E2E as a dedicated CI stage after unit tests; upload artifacts on failure.

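e2e-tests:
  needs: [unit-tests, build]
  # Assumed runner and npm-based install; adjust to the project.
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
    - run: npx playwright install --with-deps
    - run: npx playwright test --retries=1 --reporter=html
    - uses: actions/upload-artifact@v4
      if: failure()
      with: { name: playwright-report, path: playwright-report/ }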
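
The retry and artifact behavior can also live in the Playwright config rather than on the command line. A minimal sketch, assuming a playwright.config.ts at the repository root and a CI provider that sets the CI environment variable (GitHub Actions does). The E2E_BASE_URL variable and the localhost URL are placeholders, not part of the original setup.

```typescript
// playwright.config.ts (sketch; values are assumptions, adjust per project)
import { defineConfig } from '@playwright/test';

const isCI = !!process.env.CI;

export default defineConfig({
  // Fail the CI run if a stray test.only was committed.
  forbidOnly: isCI,
  // Retry once in CI only, so local runs surface flakiness immediately.
  retries: isCI ? 1 : 0,
  // The HTML reporter writes to playwright-report/ by default.
  reporter: isCI ? [['html', { open: 'never' }]] : 'list',
  use: {
    // Hypothetical URL; point this at the environment under test.
    baseURL: process.env.E2E_BASE_URL ?? 'http://localhost:3000',
    // Record a trace only when a test is retried, keeping artifacts small.
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
  },
});
```

Traces are written under test-results/ by default, so uploading that directory alongside playwright-report/ lets a retried failure be opened with npx playwright show-trace.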

Decision Tree

Browser UI flow?
  → Delegate to the playwright skill

AI-driven automation?
  → Delegate to the stagehand skill

Need test data?
  → Seed via API in beforeEach, clean up in afterEach

Flaky in CI?
  → Add --retries=1, mock external services, upload traces

Testing auth flows?
  → Save storageState once per run and reuse it across tests

API-only flow?
  → Use Playwright's request fixture or a plain HTTP client

Slow suite?
  → Shard across CI workers with --shard=N/M
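
For the auth-flow branch above, one common arrangement is a setup project that signs in once and saves the browser state, with the real test projects depending on it. This sketch shows only the config side. It assumes a hypothetical auth.setup.ts that logs in through the UI and then calls page.context().storageState({ path }) with the same path; the file names and directory are assumptions.

```typescript
// playwright.config.ts (sketch; file names and paths are hypothetical)
import { defineConfig, devices } from '@playwright/test';

// Written by the setup project, read by every project that depends on it.
const authFile = 'playwright/.auth/user.json';

export default defineConfig({
  projects: [
    // Runs files such as auth.setup.ts once, before dependent projects.
    { name: 'setup', testMatch: /.*\.setup\.ts/ },
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'], storageState: authFile },
      dependencies: ['setup'],
    },
  ],
});
```

Because the setup project runs at the start of every invocation, the saved session is regenerated on each run. Keep the auth directory out of version control.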

Example

import { test, expect } from '@playwright/test';
test.describe('Checkout flow', () => {
  test.beforeEach(async ({ request }) => {
    await request.post('/api/test/seed', {
      data: { products: ['widget-a'], user: 'checkout-user' },
    });
  });
  test.afterEach(async ({ request }) => {
    await request.post('/api/test/cleanup');
  });
  test('guest completes checkout', async ({ page }) => {
    await page.goto('/products');
    await page.getByTestId('product-card').first().click();
    await page.getByRole('button', { name: 'Add to cart' }).click();
    await page.getByRole('link', { name: 'Cart (1)' }).click();
    await page.getByRole('button', { name: 'Checkout' }).click();
    await page.getByLabel('Email').fill('guest@example.com');
    await page.getByRole('button', { name: 'Place order' }).click();
    await expect(page.getByText('Order confirmed')).toBeVisible();
  });
});

Edge Cases

Flaky network: Mock external APIs with page.route() in CI
Data races: Isolate test data per worker; never share DB rows between parallel tests
CI differences: Pin the @playwright/test version (each release bundles specific browser builds); run npx playwright install --with-deps
Long suites: Shard across CI workers (--shard=1/4)
Auth expiry: Generate short-lived tokens per run; don’t cache sessions across runs
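
The flaky-network item can be sketched as a small helper around page.route() and route.fulfill(). The helper name mockJson, the URL pattern, and the payload are hypothetical. The two interfaces list only the methods the helper calls, so the sketch stands alone; Playwright's page fixture is expected to satisfy them.

```typescript
// Minimal shapes of the Playwright methods used below (assumption: declared
// locally so the sketch needs no imports; Playwright's Page and Route provide them).
interface RouteLike {
  fulfill(options: { status: number; contentType: string; body: string }): Promise<void>;
}
interface PageLike {
  route(url: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Answer every request matching urlPattern with a fixed JSON response,
// so the test never depends on the external service being reachable.
export async function mockJson(
  page: PageLike,
  urlPattern: string,
  payload: unknown,
  status = 200,
): Promise<void> {
  await page.route(urlPattern, (route) =>
    route.fulfill({
      status,
      contentType: 'application/json',
      body: JSON.stringify(payload),
    }),
  );
}
```

Inside a test this would be called as await mockJson(page, '**/api/shipping/quote', { price: 4.99 }) before page.goto(...), so the route is registered before the first request fires.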

Checklist

Each test covers a complete user flow from entry to outcome
All locators use getByRole, getByLabel, getByText, or getByTestId; no structural CSS or XPath selectors
No waitForTimeout or manual sleeps
Test data is created and torn down per test
CI uploads trace/report artifacts on failure
External services are mocked in CI
Suite runs under 10 minutes (shard if needed)