Introduction to Agentic AI in Quality Engineering
What agentic AI means for QE teams, how autonomous testing agents work, and how to start building your first self-healing test pipeline.
We've gone through waves of automation in QE — from record-and-playback tools to code-based frameworks to AI-assisted test generation. The next wave is agentic AI: systems that don't just generate tests but autonomously plan, execute, adapt, and repair them.
What is agentic AI?
An AI agent is a system that:
- Perceives its environment (your application, test results, CI logs)
- Reasons about what to do next
- Acts by calling tools (browsers, APIs, code editors)
- Learns from outcomes and adjusts
The key difference from traditional AI assistance: agents are autonomous loops, not single-shot prompts. They keep going until the task is done.
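That perceive-reason-act loop can be sketched in a few lines. This is a toy illustration, not a real agent: the "environment" is just a counter standing in for your application and CI, and the reasoning step is a simple done-check rather than an LLM call.

```python
# A minimal sketch of the autonomous agent loop: perceive, reason, act,
# repeat until the task is done (or a step budget runs out).
from dataclasses import dataclass

@dataclass
class ToyEnvironment:
    """Stands in for the system under test; 'done' once all failures are fixed."""
    failures: int = 3

    def observe(self) -> int:   # perceive: read test results / CI state
        return self.failures

    def apply_fix(self) -> None:  # act: call a tool that changes the world
        self.failures -= 1

def run_agent(env: ToyEnvironment, max_steps: int = 10) -> int:
    steps = 0
    while steps < max_steps:
        observation = env.observe()   # 1. perceive
        if observation == 0:          # 2. reason: is the task done?
            break
        env.apply_fix()               # 3. act via a tool call
        steps += 1                    # 4. record the outcome and loop again
    return steps

print(run_agent(ToyEnvironment()))  # → 3 (stops as soon as the task is done)
```

The `max_steps` budget matters: a single-shot prompt can't run away, but a loop can, so every agent needs a termination guard.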
What agentic QE looks like in practice
Self-healing tests
The most immediate use case. When a Playwright test fails due to a selector change, an agent can:
1. Receive failing test + error message
2. Fetch the current DOM snapshot
3. Identify the element that moved/changed
4. Generate a new selector using semantic understanding
5. Update the test file
6. Run the test to verify the fix
7. Open a PR with the change
All without human intervention.
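The seven steps above can be sketched as a single repair function. Everything here is a stub: `fetch_dom`, `propose_selector`, `run_test`, and `open_pr` are hypothetical placeholders for the real tool calls (DOM capture, an LLM selector proposal, a Playwright run, and your Git host's API).

```python
# Sketch of the self-healing flow; the tool functions below are stubs so
# the sketch runs end to end -- swap in your real integrations.
def repair_failing_test(test_code: str, error: str) -> dict:
    dom = fetch_dom()                             # 2. current DOM snapshot
    new_selector = propose_selector(dom, error)   # 3-4. find the moved element, propose selector
    fixed_test = test_code.replace("#old-btn", new_selector)  # 5. update the test
    if run_test(fixed_test):                      # 6. run the test to verify
        return {"status": "fixed", "pr": open_pr(fixed_test)}  # 7. open a PR
    return {"status": "escalate"}                 # fall back to a human

# Stub tools (placeholders for real DOM/LLM/Playwright/Git integrations):
def fetch_dom():
    return '<button data-testid="submit-btn">Submit</button>'

def propose_selector(dom, error):
    return '[data-testid="submit-btn"]'

def run_test(code):
    return True

def open_pr(code):
    return "PR-123"

print(repair_failing_test('page.click("#old-btn")', "selector not found"))
```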
Autonomous test planning
Given a new feature spec, an agent can:
- Break the spec into testable scenarios
- Determine priority based on risk
- Generate test scripts for each scenario
- Schedule them in the appropriate test suite
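A planning agent's output boils down to a prioritized list of scenarios routed into suites. The sketch below fakes the reasoning step with a keyword heuristic (`plan_tests` is a hypothetical helper; a real agent would call an LLM to decompose the spec and assess risk):

```python
# Sketch: turn a feature spec into prioritized, suite-assigned scenarios.
# The risk heuristic is a stand-in for LLM-based risk assessment.
def plan_tests(spec: str) -> list[dict]:
    scenarios = []
    for line in spec.strip().splitlines():
        requirement = line.strip("- ").strip()
        risk = "high" if "payment" in requirement.lower() else "medium"
        scenarios.append({
            "scenario": requirement,
            "priority": risk,
            "suite": "smoke" if risk == "high" else "regression",
        })
    # High-risk scenarios first
    return sorted(scenarios, key=lambda s: s["priority"] != "high")

spec = """
- User can add items to cart
- Payment is charged exactly once
"""
for s in plan_tests(spec):
    print(s["priority"], "->", s["scenario"])
```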
Continuous monitoring agents
A background agent that:
- Watches your staging environment 24/7
- Runs smoke tests on every deployment
- Identifies regressions and creates tickets
- Correlates failures with recent code changes
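One iteration of such a monitor might look like this. Again, the tool functions are hypothetical stand-ins: `run_smoke_tests` would wrap your test runner and `create_ticket` your issue tracker's API; correlation here is simply attaching the deployment's commits as suspects.

```python
# Sketch of one pass of a monitoring agent: run smoke tests against each
# new deployment and file a ticket per failure, linked to suspect commits.
def monitor_once(deployments: list[dict]) -> list[str]:
    tickets = []
    for deploy in deployments:
        failures = run_smoke_tests(deploy["env"])
        for failure in failures:
            # Correlate the failure with the commits in this deployment
            tickets.append(create_ticket(failure, suspects=deploy["commits"]))
    return tickets

# Stubs standing in for the real test-runner and issue-tracker calls:
def run_smoke_tests(env):
    return ["checkout timeout"] if env == "staging" else []

def create_ticket(failure, suspects):
    return f"{failure} (suspects: {', '.join(suspects)})"

print(monitor_once([{"env": "staging", "commits": ["abc123"]}]))
```

In production this loop would run on a schedule (or be triggered by deployment webhooks) rather than once.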
Building your first agent with LangGraph
LangGraph is excellent for building stateful agents with clear decision loops.
```python
from typing import TypedDict

from langchain_anthropic import ChatAnthropic
from langgraph.graph import END, StateGraph


class TestAgentState(TypedDict):
    failing_test: str
    error_message: str
    dom_snapshot: str
    proposed_fix: str
    fix_verified: bool
    attempts: int


llm = ChatAnthropic(model="claude-opus-4-6")


def analyze_failure(state: TestAgentState) -> TestAgentState:
    """Agent node: understand what broke and propose a fix."""
    response = llm.invoke(f"""
Failing test:
{state['failing_test']}

Error:
{state['error_message']}

Current DOM (relevant section):
{state['dom_snapshot']}

Identify what changed and propose a fix to the test selector or assertion.
Output only the corrected test code.
""")
    return {**state, "proposed_fix": response.content}


def verify_fix(state: TestAgentState) -> TestAgentState:
    """Tool node: run the fixed test."""
    # In reality, this shells out (e.g. via subprocess) to run Playwright;
    # run_playwright_test is a placeholder for that wrapper.
    result = run_playwright_test(state["proposed_fix"])
    return {**state, "fix_verified": result.passed, "attempts": state["attempts"] + 1}


def should_retry(state: TestAgentState) -> str:
    if state["fix_verified"]:
        return "commit"
    if state["attempts"] >= 3:
        return "escalate"
    return "retry"


# Build the graph
graph = StateGraph(TestAgentState)
graph.add_node("analyze", analyze_failure)
graph.add_node("verify", verify_fix)
graph.add_edge("analyze", "verify")
graph.add_conditional_edges("verify", should_retry, {
    "commit": END,
    "retry": "analyze",
    "escalate": END,
})
graph.set_entry_point("analyze")
agent = graph.compile()
```
The agent observability problem
Autonomous agents are powerful but opaque. You need to know:
- What decisions did the agent make and why?
- Where did it go wrong?
- What tools did it call?
Always instrument your agents with tracing:
```python
from langsmith import traceable


@traceable(name="test-repair-agent")
def run_repair_agent(failing_test: str, error: str):
    # fetch_dom_snapshot is a placeholder for your DOM-capture helper
    return agent.invoke({
        "failing_test": failing_test,
        "error_message": error,
        "dom_snapshot": fetch_dom_snapshot(),
        "proposed_fix": "",
        "fix_verified": False,
        "attempts": 0,
    })
```
LangSmith gives you full visibility into every step, token, and decision.
Where to start
Don't try to build a full autonomous QE system on day one. The pragmatic path:
Week 1-2: AI test generation (single-shot, human reviews)
Week 3-4: Automated failure analysis (agent reads logs, suggests fixes)
Month 2: Self-healing selector repair (limited scope)
Month 3+: Full autonomous test maintenance pipeline
Start small, measure the time savings, and expand from there.
I'm running workshops on Agentic AI for QE teams. If your organization wants to explore this, reach out.