Stagehand

AI browser automation with natural language for interaction/extraction. Extends Playwright’s Page object — key concepts: page.goto(url) navigates, page.locator() finds elements, expect(locator).toBeVisible() asserts. Stagehand adds page.act() (natural language actions) and page.extract() (structured data extraction) on top.

When to Use

Automating browser flows where selectors are fragile or unknown
Extracting structured data from web pages
Discovering page elements before writing precise selectors
Don’t use for: deterministic tests (playwright), static scraping, non-browser

Critical Patterns

✅ REQUIRED: page.act() for AI-Driven Actions

Natural language instructions, Stagehand finds element and acts.

// CORRECT: descriptive, single action
await page.act('Click the "Sign in" button');
await page.act('Type "user@example.com" into the email field');
// WRONG: vague or compound instruction
await page.act("Do the login thing");
await page.act("Fill the form and submit it");

✅ REQUIRED: page.extract() for Structured Data

Zod schema extracts typed data from page.

import { z } from "zod";
const product = await page.extract({
  instruction: "Extract the product details from this page",
  schema: z.object({
    name: z.string(),
    price: z.number(),
    inStock: z.boolean(),
  }),
});
// WRONG: no schema (unstructured text)
const raw = await page.extract({ instruction: "Get product info" });

✅ REQUIRED: page.observe() for Element Discovery

Inspect AI view before act/extract calls.

const actions = await page.observe("What actions are available?");
console.log(actions); // [{ description: "Sign in button", selector: "..." }]
await page.act("Click the sign in button");

✅ REQUIRED: Precise Prompting

One action per act(), specific nouns, quoted literals.

// CORRECT: specific, single-step instructions
await page.act('Click the "Add to cart" button for "Wireless Mouse"');
// WRONG: ambiguous references
await page.act("Click the button");

✅ REQUIRED: Error Handling with Retries

Retry logic for AI actions on dynamic pages.

async function actWithRetry(page: Page, instruction: string, retries = 3) {
  for (let i = 0; i < retries; i++) {
    try {
      await page.act(instruction);
      return;
    } catch (e) {
      if (i === retries - 1) throw e;
      await new Promise((r) => setTimeout(r, 1000 * (i + 1)));
    }
  }
}

✅ REQUIRED: Combining with Playwright for Hybrid Approach

Use Stagehand for discovery, then Playwright for stable execution.

// Phase 1: Use Stagehand to discover elements
const actions = await page.observe("What buttons are on this page?");
console.log(actions); // [{ description: "Submit button", selector: "button[type=submit]" }]

// Phase 2: Use Playwright with discovered selectors for fast, stable tests
import { test as playwrightTest } from "@playwright/test";
playwrightTest("submit form", async ({ page }) => {
  await page.goto("/form");
  await page.locator("button[type=submit]").click(); // Fast, no LLM call
});

// WRONG: Using Stagehand in every test run (slow, costly, flaky)
test("submit form", async () => {
  await page.act("Click the submit button"); // LLM call on every run
});

✅ REQUIRED: Batch Extraction for Performance

Extract multiple data points in one call to reduce LLM requests.

// CORRECT: Single extract call with comprehensive schema
const pageData = await page.extract({
  instruction: "Extract all product information from this page",
  schema: z.object({
    product: z.object({
      name: z.string(),
      price: z.number(),
      inStock: z.boolean(),
      reviews: z.array(
        z.object({
          author: z.string(),
          rating: z.number(),
          text: z.string(),
        }),
      ),
    }),
  }),
});

// WRONG: Multiple extracts (slow, 4 LLM requests)
const name = await page.extract({
  instruction: "Get product name",
  schema: z.object({ name: z.string() }),
});
const price = await page.extract({
  instruction: "Get price",
  schema: z.object({ price: z.number() }),
});
const inStock = await page.extract({
  instruction: "Check if in stock",
  schema: z.object({ inStock: z.boolean() }),
});
const reviews = await page.extract({
  instruction: "Get reviews",
  schema: z.object({ reviews: z.array(z.any()) }),
});

Decision Tree

Known, stable selectors?
  → Use playwright directly instead

Selectors unknown or fragile?
  → Use page.observe() then page.act()

Need structured data?
  → Use page.extract() with a Zod schema

Action keeps failing?
  → Make the instruction more specific, add retries

Exploring unfamiliar page?
  → Start with page.observe() to map elements

Building deterministic tests?
  → Prototype with Stagehand, convert to Playwright

Example

import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand({ env: "LOCAL" });
await stagehand.init();
const page = stagehand.page;
await page.goto("https://news.ycombinator.com");
const stories = await page.extract({
  instruction: "Extract the top 5 story titles and their URLs",
  schema: z.object({
    stories: z
      .array(
        z.object({
          title: z.string(),
          url: z.string(),
        }),
      )
      .max(5),
  }),
});
console.log(stories);
await stagehand.close();

Edge Cases

Dynamic SPAs: Call page.observe() after navigation or state changes to re-index the DOM. Stagehand doesn’t automatically detect SPA route changes.
Ambiguous elements: Add context like “the first”, “in the header”, or quote exact visible text: await page.act('Click the "Sign in" link in the navigation bar').
Rate limiting: Stagehand makes LLM calls per act() and extract(). Batch multiple data points into one extract() call with a comprehensive schema to reduce API calls.
Long pages: Scroll first with page.act('Scroll down to the pricing section') before extracting off-screen content. Stagehand’s vision is limited to viewport.
Iframes/popups: Stagehand operates on the main frame by default. Use Playwright’s page.frame() or page.context().pages() to switch context manually for iframes or popups.
Non-English pages: Include the language in instructions: await page.act('Click the button labeled "Enviar" (Spanish for Submit)'). LLM models handle multilingual content well when prompted.
Authentication flows: For multi-step auth (2FA, CAPTCHA), combine Stagehand for initial steps with Playwright’s storageState to save auth tokens and skip login in subsequent runs.
Cost optimization: Each act() and extract() call costs LLM tokens. Prototype with Stagehand, then convert stable flows to Playwright selectors using page.observe() to discover selectors.
Stale element detection: If page updates after observe(), re-run observe() before act(). Stagehand doesn’t track DOM mutations automatically.
Complex multi-step forms: Break into discrete act() calls with verification between steps: await page.act('Fill email field with "user@example.com"'), then await page.act('Fill password field with "pass123"'), not await page.act('Fill and submit the form').

Checklist

Each act() call contains a single, specific instruction
All extract() calls include a Zod schema for type safety
Retry logic wraps actions on dynamic pages
observe() is used first when exploring unfamiliar pages
Literal text values are quoted in instructions
Stable flows use Playwright directly — Stagehand is for AI flexibility