Skills

Install

$ npx ai-agents-skills add --skill e2e-testing
Domain v1.1

End-to-End Testing

Orchestrates E2E testing strategy and architecture — delegates to playwright and stagehand skills.

When to Use

  • Designing test suites for frontend or backend user flows
  • Automating browser or API flows across services
  • Integrating E2E tests with CI/CD pipelines
  • Don’t use for: unit tests, component tests in isolation, load/performance testing

Critical Patterns

Test User Flows, Not Implementation

Each test should walk through a real user scenario rather than verifying internal state.

// CORRECT: tests the outcome the user sees
test('customer completes purchase', async ({ page }) => {
  await page.goto('/products');
  await page.getByRole('button', { name: 'Add to cart' }).first().click();
  await page.getByRole('link', { name: 'Cart' }).click();
  await page.getByRole('button', { name: 'Checkout' }).click();
  await expect(page.getByText('Order confirmed')).toBeVisible();
});
// WRONG: testing internal state
expect(store.getState().cart.items).toHaveLength(1);

Stable Selectors

Use selectors that survive refactors — data-testid for complex components, ARIA roles for standard elements.

// CORRECT: resilient selectors
await page.getByTestId('product-card').first().click();
await page.getByRole('navigation').getByRole('link', { name: 'Cart' }).click();
// WRONG: structural selectors that break on layout changes
await page.locator('div > div:nth-child(3) > a.link-blue').click();

Handle Async UI

Never sleep — rely on auto-wait or explicit conditions tied to visible DOM changes.

// CORRECT: wait for a real DOM condition
await page.getByRole('button', { name: 'Save' }).click();
await expect(page.getByRole('alert')).toHaveText('Saved');
// WRONG: arbitrary delay
await page.waitForTimeout(2000);

Test Data Management

Each test creates its own data and cleans up — no shared mutable state.

test.beforeEach(async ({ request }) => {
  await request.post('/api/test/seed', {
    data: { user: 'e2e-user-' + Date.now(), role: 'customer' },
  });
});
test.afterEach(async ({ request }) => {
  await request.post('/api/test/cleanup');
});

Assert Both Presence and Absence

Each user flow has a success path and failure paths. Assert both visible outcomes and absent states — an E2E test that only checks success misses half the contract.

// ✅ POSITIVE: success outcome is visible
await expect(page.getByText('Order confirmed')).toBeVisible();
await expect(page.getByRole('link', { name: 'My orders' })).toBeVisible();

// ✅ NEGATIVE: error state appears on invalid input; success state absent
await page.getByLabel('Email').fill('not-an-email');
await page.getByRole('button', { name: 'Place order' }).click();
await expect(page.getByText('Invalid email')).toBeVisible();
await expect(page.getByText('Order confirmed')).not.toBeVisible();
await expect(page.getByRole('button', { name: 'Place order' })).toBeDisabled();

Playwright assertion matchers — see playwright skill for toBeVisible, toBeDisabled, not.*.

CI Pipeline Integration

Run E2E as a dedicated CI stage after unit tests; upload artifacts on failure.

e2e-tests:
  needs: [unit-tests, build]
  steps:
    - run: npx playwright install --with-deps
    - run: npx playwright test --retries=1 --reporter=html
    - uses: actions/upload-artifact@v4
      if: failure()
      with: { name: playwright-report, path: playwright-report/ }

Decision Tree

Browser UI flow?
  → Delegate to the playwright skill

AI-driven automation?
  → Delegate to the stagehand skill

Need test data?
  → Seed via API in beforeEach, clean up in afterEach

Flaky in CI?
  → Add --retries=1, mock external services, upload traces

Testing auth flows?
  → Store storageState and reuse across tests

API-only flow?
  → Use Playwright request fixture or HTTP client

Slow suite?
  → Shard across CI workers with --shard=N/M

Example

import { test, expect } from '@playwright/test';
test.describe('Checkout flow', () => {
  test.beforeEach(async ({ request }) => {
    await request.post('/api/test/seed', {
      data: { products: ['widget-a'], user: 'checkout-user' },
    });
  });
  test('guest completes checkout', async ({ page }) => {
    await page.goto('/products');
    await page.getByTestId('product-card').first().click();
    await page.getByRole('button', { name: 'Add to cart' }).click();
    await page.getByRole('link', { name: 'Cart (1)' }).click();
    await page.getByRole('button', { name: 'Checkout' }).click();
    await page.getByLabel('Email').fill('guest@example.com');
    await page.getByRole('button', { name: 'Place order' }).click();
    await expect(page.getByText('Order confirmed')).toBeVisible();
  });
});

Edge Cases

  • Flaky network: Mock external APIs with page.route() in CI
  • Data races: Isolate test data per worker; never share DB rows between parallel tests
  • CI differences: Pin browser versions; use playwright install --with-deps
  • Long suites: Shard across CI workers (--shard=1/4)
  • Auth expiry: Generate short-lived tokens per run; don’t cache sessions across runs

Checklist

  • Each test covers a complete user flow from entry to outcome
  • All selectors use getByRole, getByTestId, or getByLabel
  • No waitForTimeout or manual sleeps
  • Test data is created and torn down per test
  • CI uploads trace/report artifacts on failure
  • External services are mocked in CI
  • Suite runs under 10 minutes (shard if needed)

Resources