Vibe Testing: The QA Answer to Vibe Coding
Vibe testing applies the same AI-first, natural-language approach of vibe coding to quality assurance — writing tests by describing intent, not scripting steps. Here's how QA engineers can adopt vibe testing workflows, which tools enable it, and where human judgment still matters most.
If vibe coding is writing software by describing what you want to an AI, vibe testing is validating it the same way. The phrase is new — coined in 2025 as the natural QA counterpart to vibe coding — but the concept represents something genuine: a shift from writing test scripts to describing test intent, and letting AI handle the execution.
For QA engineers who have spent years wrestling with brittle selectors, flaky wait conditions, and test maintenance overhead, vibe testing offers a compelling promise: focus on what needs to be validated, not how the automation is written.
What Vibe Testing Is
The core idea: instead of writing explicit automation steps, you describe your testing intent in natural language. An AI agent interprets that intent, navigates the application, executes the appropriate interactions, and reports findings.
Traditional test automation:
test('user can add item to cart', async ({ page }) => {
await page.goto('/products');
await page.locator('[data-testid="product-card"]').first().click();
await page.locator('[data-testid="add-to-cart-btn"]').click();
await expect(page.locator('[data-testid="cart-count"]')).toHaveText('1');
});
Vibe testing equivalent:
"Verify that a user can browse the product catalogue, add an item to their cart,
and see the cart count update correctly."
The AI agent navigates the product catalogue, identifies an item to add, clicks the appropriate elements, and verifies the cart count — without the QA engineer specifying a single selector or step.
This is not science fiction. It's available today, with real tools.
Why Vibe Testing Is Becoming Relevant
Two forces are driving adoption:
Vibe-coded applications need vibe testing. When a product manager builds a feature by talking to an AI, asking them to review a Playwright test suite is unrealistic. But asking them to describe what a user should be able to do — and validating that in natural language — is accessible. Vibe testing democratises QA, opening participation to non-technical team members.
Traditional automation has a maintenance problem. Commonly cited industry figures put 30–40% of automation effort into maintaining existing tests rather than writing new ones. Selector-based tests break every time the UI changes. Vibe testing approaches use semantic understanding rather than brittle selectors, reducing (though not eliminating) maintenance overhead.
Tools Enabling Vibe Testing
Stagehand
Stagehand (by Browserbase) adds AI-powered actions to Playwright. You write natural language instructions alongside your existing test code:
import { Stagehand } from '@browserbasehq/stagehand';
import { z } from 'zod';
const stagehand = new Stagehand({ env: 'LOCAL' });
await stagehand.init();
const page = stagehand.page;
await page.goto('https://staging.yourapp.com');
// Natural language action — Stagehand figures out the implementation
await page.act({ action: 'find the search bar and search for "wireless headphones"' });
await page.act({ action: 'click on the first search result' });
await page.act({ action: 'add the item to the shopping cart' });
// Extract structured data with AI
const cartState = await page.extract({
instruction: 'extract the cart item count from the header',
schema: z.object({
count: z.number(),
visible: z.boolean()
})
});
expect(cartState.count).toBe(1);
What makes Stagehand different from pure natural-language tools: it integrates with Playwright's existing infrastructure, so you can mix natural-language actions with traditional selectors where you need precision. This hybrid approach is the most practical for production test suites.
Browser Use
Browser Use is a Python library that connects LLMs to browser automation. It's designed for fully autonomous tasks — you give it an objective, and it figures out how to achieve it:
from browser_use import Agent
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model='claude-sonnet-4-20250514')
# Exploratory testing: agent explores the checkout flow
agent = Agent(
task="""
Test the checkout flow of our e-commerce application at staging.yourapp.com.
Login as test@example.com / TestPass123.
Please test:
1. Can you complete a purchase with a valid credit card (use 4242 4242 4242 4242)?
2. What happens when you use an expired card (4000 0000 0000 0069)?
3. Can you apply a discount code?
4. What happens if you try to checkout with an empty cart?
Report what you find for each scenario, including any errors or unexpected behaviour.
""",
llm=llm,
)
result = await agent.run()
print(result)
The agent navigates the application, executes the scenarios, and returns a structured report of its findings. This is genuinely useful for exploratory testing of complex flows.
testRigor
testRigor allows writing tests in plain English that non-technical stakeholders can understand and contribute to:
Open browser
Navigate to "https://staging.yourapp.com/login"
Enter "test@example.com" into "Email" field
Enter "TestPass123" into "Password" field
Click "Log In"
Check that page contains "Welcome"
Check that current URL contains "/dashboard"
The distinctive feature: testRigor's plain English tests are resilient to UI changes. If the login button is renamed from "Log In" to "Sign In," the test still works — the AI understands the semantic intent.
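To build intuition for that resilience, here is a deliberately crude sketch of semantic matching. Real tools use language models rather than token overlap, and this helper is purely illustrative — but the principle is the same: match meaning, not exact strings or selectors.

```python
def semantic_match(intent, labels):
    """Pick the UI element label that best overlaps the described intent.

    Toy stand-in for AI-based matching: Jaccard overlap of word tokens
    instead of a language model, but the idea -- resolve intent against
    meaning rather than exact strings -- is the same.
    """
    def tokens(s):
        return set(s.lower().replace('-', ' ').split())

    want = tokens(intent)
    scored = [(len(want & tokens(label)) / len(want | tokens(label)), label)
              for label in labels]
    score, best = max(scored)
    return best if score > 0 else None

# "Log In" was renamed to "Sign In" -- the intent still resolves
buttons = ['Search', 'Sign In', 'Help']
match = semantic_match('log in', buttons)
```

A selector like `button#log-in` would have broken at the rename; the intent-level match survives because the new label still shares meaning with the described action.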
Claude's Playwright MCP Server
When combined with Claude's computer use capabilities, the Playwright MCP server enables a conversation-based testing workflow:
You: "Can you test the registration flow on staging.ourapp.com?
Check that it works for a new user, and also try with an
already-registered email to see if the error handling is correct."
Claude: [Opens browser, navigates to registration, creates test user,
verifies success, then tests duplicate email scenario]
"I tested the registration flow. Here's what I found:
✅ New user registration: Works correctly, confirmation email sent
✅ Duplicate email: Shows 'Email already registered' error
⚠️ The error message disappears after 3 seconds — users might miss it
❌ After registration, the 'Log In' link in the confirmation message
leads to a 404 page"
The 404 link and the disappearing error message are exactly the kind of issues scripted tests miss.
A Practical Vibe Testing Workflow
Here's how to integrate vibe testing into a realistic QA workflow — not as a replacement for structured automation, but as a complement:
Phase 1: Describe test objectives, not steps
When a new feature arrives in your sprint, start by writing test objectives in plain English before thinking about implementation:
Feature: User can set delivery preferences
Test objectives:
- User can save a preferred delivery address
- User can set delivery time preferences (morning, afternoon, evening)
- Saved preferences appear pre-filled on the checkout page
- User can update preferences and the change persists
- User cannot save an invalid address format
- Preferences are specific to each user account
These objectives are the input for both vibe testing (immediate AI-driven validation) and structured automation (explicit tests for regression suite inclusion).
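One lightweight way to make the objectives dual-purpose is to keep them as plain data and derive both artefacts from the same list. A sketch — the helper names are illustrative, not from any tool:

```python
objectives = [
    "User can save a preferred delivery address",
    "Saved preferences appear pre-filled on the checkout page",
]

def objectives_to_prompt(objectives):
    """Format plain-English objectives as a task prompt for an exploratory agent."""
    lines = [f"{i}. {obj}" for i, obj in enumerate(objectives, 1)]
    return "Test the following and report findings:\n" + "\n".join(lines)

def objectives_to_test_names(objectives):
    """Derive structured-test identifiers from the same objectives."""
    return [obj.lower().replace(' ', '_').replace('-', '_') for obj in objectives]

prompt = objectives_to_prompt(objectives)
names = objectives_to_test_names(objectives)
```

Keeping one source of truth means the exploratory prompt and the regression suite can never drift apart on what the feature is supposed to do.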
Phase 2: Run AI-driven exploratory validation
Use Browser Use or the Playwright MCP to run the AI against the new feature with the objectives as the prompt. The agent will:
- Find ways to complete the happy path (and validate it works)
- Discover edge cases and error states you didn't think to specify
- Report unexpected UI behaviour
Review the AI's findings. Some will be genuine bugs. Some will be expected behaviour the AI interpreted incorrectly. This review is the QA judgment layer that vibe testing still requires.
Phase 3: Convert high-value findings to structured tests
The findings from Phase 2 inform your structured test suite. The AI's exploration has found the interesting cases — now encode them explicitly:
// Converted from AI exploratory finding:
// "After saving preferences, navigating away and returning
// doesn't always retain the saved values — intermittent"
test('saved delivery preferences persist across navigation', async ({ page }) => {
await loginAs(page, 'test@example.com');
await page.goto('/account/delivery-preferences');
await page.getByLabel('Preferred delivery time').selectOption('morning');
await page.getByRole('button', { name: 'Save' }).click();
// Navigate away
await page.goto('/dashboard');
await page.goto('/account/delivery-preferences');
// Verify persistence
await expect(page.getByLabel('Preferred delivery time')).toHaveValue('morning');
});
Where Vibe Testing Has Limits
Vibe testing is powerful, but it's not a complete replacement for structured test automation. Be clear-eyed about where it falls short:
Non-deterministic results. AI-driven test execution can produce different results on different runs, even for the same scenario. Structured tests are deterministic — they either pass or fail, repeatably. Your regression suite needs that determinism.
Performance and load testing. Vibe testing operates at the level of a single user's browser interactions. It cannot simulate 10,000 concurrent users or measure API response times at the percentile level.
Contract testing. API schema validation and contract tests require precise assertions against specific data types and structures — this is not a natural language task.
CI gate tests. Tests that must pass before a PR can merge need to be deterministic and fast. AI-driven tests are neither.
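One pragmatic mitigation for the non-determinism above is to run an AI-driven check several times and accept its verdict only when it is stable across runs. A minimal sketch, where `run_test` stands in for whatever invokes your agent:

```python
import random
from collections import Counter

def run_with_consensus(run_test, runs=5, threshold=0.8):
    """Run a non-deterministic check repeatedly; accept the majority
    verdict only if it accounts for at least `threshold` of the runs."""
    verdicts = [run_test() for _ in range(runs)]
    verdict, count = Counter(verdicts).most_common(1)[0]
    return verdict, count / runs >= threshold

# Simulated flaky AI check: passes ~90% of the time
random.seed(42)
def flaky_check():
    return 'pass' if random.random() < 0.9 else 'fail'

verdict, stable = run_with_consensus(flaky_check)
```

This trades execution time for stability, which is acceptable in exploratory runs but is exactly why AI-driven tests do not belong in a CI merge gate.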
The right model: use vibe testing for exploratory and discovery work — where the value is in finding the unexpected. Use structured automation for regression and contract work — where the value is in reliable, repeatable verification.
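As a rough heuristic, that routing can be sketched as a decision function. The categories and names are illustrative, not from any framework:

```python
def suggest_test_approach(purpose, must_be_deterministic=False, runs_in_ci_gate=False):
    """Rough triage: route a test idea to vibe testing or structured automation.

    Illustrative heuristic only -- real teams weigh more factors.
    """
    if runs_in_ci_gate or must_be_deterministic:
        return 'structured'   # CI gates and regression need repeatable pass/fail
    if purpose in ('exploratory', 'discovery', 'new-feature-validation'):
        return 'vibe'         # value is in finding the unexpected
    if purpose in ('regression', 'contract', 'load'):
        return 'structured'   # value is in reliable, repeatable verification
    return 'structured'       # default to determinism when unsure

choice = suggest_test_approach('exploratory')
```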
The Vibe Testing Mindset
Beyond specific tools, vibe testing represents a mindset shift that's worth adopting regardless of your stack:
Test objectives over test steps. The most valuable thing a QA engineer contributes is deciding what needs to be tested and why. The mechanics of test execution are increasingly automatable. Invest your attention accordingly.
Describe user intent, not system behaviour. "The user should be able to complete a purchase" is a better test objective than "the /checkout endpoint should return 200 with a transaction ID in the response body." Both are valid, but the user-intent framing catches more and breaks less.
Collaborate across roles. When tests can be written in plain English, product managers, designers, and customer success teams can contribute test cases. The QA engineer's role shifts to curating and prioritising this input, not owning all test creation.
Getting Started Today
The lowest-friction path into vibe testing: install Claude Desktop, connect the Playwright MCP server, and spend an hour having Claude explore your staging environment. Give it a feature to test, review what it finds, and assess how much of it is useful.
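As a sketch of that setup — package name and invocation current at the time of writing, so check the Playwright MCP README for the latest — Claude Desktop's `claude_desktop_config.json` would gain an entry like:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

After restarting Claude Desktop, the browser tools become available in conversation and you can ask Claude to explore a staging URL directly.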
You'll quickly develop a sense of where AI-driven exploration adds value for your application and team. That's your starting point.
For a deeper look at vibe coding and why it changes QA requirements, see our Vibe Coding Guide for QA Engineers. For the Claude-specific tools that enable this workflow, see the next article in this series.