Tags: ai, playwright, test-generation, llm, claude

AI-Powered Test Generation with Claude and Playwright

How to use Anthropic Claude to automatically generate Playwright test scripts from user stories and design specs — a practical guide with real code.

March 15, 2026 · InnovateBits

One of the most exciting applications of generative AI in quality engineering (QE) is automated test generation. Instead of manually writing test cases from requirements, you can use an LLM to do the heavy lifting — and then review, refine, and run them.

Here's a practical workflow I've been using with Anthropic Claude + Playwright.

The problem with manual test writing

Writing comprehensive test suites is time-consuming:

  • A single feature can require 20–50 test cases when you account for edge cases
  • Requirements change and tests go stale
  • Engineers often skip writing negative tests due to time pressure
  • Coverage gaps are discovered too late (in production)

The AI-assisted approach

The workflow looks like this:

Requirements / User Story
        ↓
   Prompt Claude with context
        ↓
   Generated test cases (JSON)
        ↓
   Generated Playwright scripts
        ↓
   Human review + refinement
        ↓
   CI/CD integration

Step 1: Structure your prompt

The quality of generated tests depends heavily on your prompt. Here's a template that works well:

const systemPrompt = `
You are an expert QA engineer specializing in Playwright test automation.
Given a user story or feature description, generate comprehensive test cases covering:
1. Happy path scenarios
2. Edge cases and boundary values  
3. Negative/error scenarios
4. Accessibility checks
 
Output format: TypeScript Playwright test file with proper describe/test structure.
Use data-testid selectors where possible. Include meaningful test names.
`;
 
const featureDescription = `
Feature: User Login
- Users can log in with email and password
- Email must be valid format
- Password must be at least 8 characters
- After 5 failed attempts, account is locked for 30 minutes
- Successful login redirects to /dashboard
`;
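The workflow diagram above includes an intermediate step: test cases as structured JSON before any Playwright code. A minimal sketch of what that layer can look like — the schema and field names here are my own assumptions, not a format the API enforces:

```typescript
// Hypothetical schema for the intermediate "test cases as JSON" step.
interface TestCase {
  id: string;
  title: string;
  category: 'happy-path' | 'edge-case' | 'negative' | 'accessibility';
  steps: string[];
  expected: string;
}

// The model often wraps JSON in a markdown fence; strip it before parsing.
function parseTestCases(raw: string): TestCase[] {
  const unfenced = raw.replace(/^```(?:json)?\s*/i, '').replace(/```\s*$/, '');
  const cases = JSON.parse(unfenced) as TestCase[];
  for (const c of cases) {
    if (!c.id || !c.title || !Array.isArray(c.steps)) {
      throw new Error(`Malformed test case: ${JSON.stringify(c)}`);
    }
  }
  return cases;
}

// Example response (abbreviated) as it might come back from the model:
const sample = '```json\n' + JSON.stringify([
  {
    id: 'TC-001',
    title: 'valid credentials redirect to dashboard',
    category: 'happy-path',
    steps: ['enter valid email', 'enter valid password', 'click login'],
    expected: 'user lands on /dashboard',
  },
]) + '\n```';

console.log(parseTestCases(sample)[0].id); // TC-001
```

Asking for JSON first makes the output reviewable and diffable before you spend tokens generating full scripts.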

Step 2: Call the API

import Anthropic from '@anthropic-ai/sdk';
 
const client = new Anthropic();
 
async function generateTests(featureDescription: string): Promise<string> {
  const response = await client.messages.create({
    model: 'claude-opus-4-6',
    max_tokens: 4096,
    system: systemPrompt,
    messages: [
      {
        role: 'user',
        content: `Generate Playwright tests for this feature:\n\n${featureDescription}`,
      },
    ],
  });
 
  const content = response.content[0];
  if (content.type === 'text') {
    return content.text;
  }
  throw new Error('Unexpected response type');
}
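One practical note: the model usually wraps the generated file in a markdown code fence, so extract it before writing to disk. A small helper (the output path below is illustrative, not a convention the SDK imposes):

```typescript
import { writeFileSync } from 'fs';

// Pull the first fenced code block out of a markdown response;
// fall back to the raw text if the model returned bare code.
function extractCode(markdown: string): string {
  const match = markdown.match(/```(?:typescript|ts)?\n([\s\S]*?)```/);
  return match ? match[1].trim() : markdown.trim();
}

// Usage with generateTests() from above:
// const raw = await generateTests(featureDescription);
// writeFileSync('tests/generated/login.spec.ts', extractCode(raw));
```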

Step 3: What gets generated

Claude typically generates something like this:

import { test, expect } from '@playwright/test';
 
test.describe('User Login', () => {
  test.beforeEach(async ({ page }) => {
    await page.goto('/login');
  });
 
  test('successful login redirects to dashboard', async ({ page }) => {
    await page.fill('[data-testid="email-input"]', 'user@example.com');
    await page.fill('[data-testid="password-input"]', 'SecurePass123');
    await page.click('[data-testid="login-button"]');
 
    await expect(page).toHaveURL('/dashboard');
    await expect(page.locator('[data-testid="user-greeting"]')).toBeVisible();
  });
 
  test('invalid email format shows validation error', async ({ page }) => {
    await page.fill('[data-testid="email-input"]', 'not-an-email');
    await page.fill('[data-testid="password-input"]', 'SecurePass123');
    await page.click('[data-testid="login-button"]');
 
    await expect(page.locator('[data-testid="email-error"]'))
      .toHaveText('Please enter a valid email address');
  });
 
  test('account locks after 5 failed attempts', async ({ page }) => {
    for (let i = 0; i < 5; i++) {
      await page.fill('[data-testid="email-input"]', 'user@example.com');
      await page.fill('[data-testid="password-input"]', 'WrongPassword');
      await page.click('[data-testid="login-button"]');
    }
 
    await expect(page.locator('[data-testid="lockout-message"]'))
      .toContainText('account has been locked');
  });
 
  // ... more tests generated automatically
});

Step 4: Review and refine

AI-generated tests need human review for:

  • Selector accuracy — the AI guesses data-testid values; you need to verify or adjust
  • Assertion completeness — add business-specific assertions the AI couldn't know
  • Test data — replace placeholder values with real test data
  • Flakiness — add appropriate wait strategies where needed
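On the flakiness bullet: generated tests sometimes lean on fixed sleeps. Playwright's web-first assertions (`expect(locator).toBeVisible()` and friends) already auto-retry, so prefer those for page state; for out-of-band checks (a test API, a database row) a small polling helper does the job. A sketch, with names of my own choosing:

```typescript
// Poll an async condition until it returns true or the timeout elapses.
// Prefer Playwright's built-in auto-retrying assertions for page state;
// use this only for checks outside the page (e.g. a test API or database).
async function retryUntil(
  check: () => Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 250 } = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs} ms`);
}

// e.g. await retryUntil(() => testApi.isAccountLocked('user@example.com'));
```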

Measuring the impact

In a recent project, we tested this approach on a 12-feature sprint:

Metric                 Manual    AI-Assisted
Time to first test     45 min    8 min
Tests per feature      8 avg     22 avg
Edge case coverage     40%       78%
Review time            n/a       15 min

What's next: Agentic test maintenance

The next evolution is self-healing tests — AI agents that:

  1. Detect when tests fail due to UI changes (not bugs)
  2. Identify the changed selector or element
  3. Automatically update the test script
  4. Open a PR for human approval
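Step 2 doesn't have to start sophisticated. A deliberately naive sketch: fuzzy-match the selector that stopped resolving against the data-testid values currently in the DOM, and propose the closest one (toy illustration only, not a production healer):

```typescript
// Classic Levenshtein edit distance between two strings.
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,            // deletion
        dp[i][j - 1] + 1,            // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Suggest the current test id closest to the one that stopped matching.
// Assumes currentIds is non-empty.
function suggestReplacement(broken: string, currentIds: string[]): string {
  return currentIds.reduce((best, id) =>
    editDistance(broken, id) < editDistance(broken, best) ? id : best
  );
}

console.log(
  suggestReplacement('login-button', ['email-input', 'login-btn', 'password-input'])
); // login-btn
```

A real healer would also weigh element roles, text content, and DOM position, but even this catches simple renames.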

That's a topic for a future post. For now, start with generation — the ROI is immediate.


Want to see this in action? Check the GitHub repo for a working example with the full pipeline.