Shift-Right Testing: How to Embed Quality in Production
Shift-right testing goes beyond shift-left to embed continuous quality validation in production. Learn canary releases, synthetic monitoring, chaos engineering, and observability-driven QA strategies that catch what staging never will.
For the past decade, "shift-left" has been the dominant quality engineering mantra — test earlier, test closer to the requirements, find defects before they're expensive. The advice is sound and the results are real. But shift-left alone has a blind spot: staging environments are not production.
No matter how good your pre-deployment testing is, production has data, traffic patterns, infrastructure configurations, third-party integrations, and user behaviour that staging never fully replicates. The World Quality Report 2025–26 found that 38% of organisations are now running shift-right pilots — validating quality continuously in production, not just before deployment.
This guide covers the strategies, tools, and practices that make up a mature shift-right quality approach.
What Shift-Right Testing Is
Shift-right testing is the practice of continuing quality validation after deployment, using production systems and real user traffic as the test environment. It is not a replacement for pre-deployment testing — it's an additional layer that catches defects that only manifest at scale, in real conditions.
The core insight: the question "did our tests pass?" is less useful than "is our production system healthy for real users right now?"
Shift-right encompasses:
- Synthetic monitoring — scripted user journeys running continuously against production
- Canary releases — gradual rollouts with automated quality gates
- Chaos engineering — deliberate fault injection to validate resilience
- Observability-driven QA — deriving test insights from production telemetry
- A/B testing validation — quality checks across experiment variants
- Feature flag testing — validating behaviour across flag configurations
Synthetic Monitoring
Synthetic monitoring runs automated tests against production on a continuous schedule — every minute, every 5 minutes, every hour. Unlike passive alerting (which fires only after something broken has already reached users), synthetic monitoring proactively exercises critical flows so you find issues before users do.
Setting up Playwright-based synthetic monitoring with Checkly
Checkly executes Playwright scripts on a schedule and alerts on failures. You write tests once and they run from multiple geographic locations continuously.
// checkly.config.ts
import { defineConfig } from 'checkly'
import { Frequency } from 'checkly/constructs'

export default defineConfig({
  projectName: 'InnovateBits Production',
  logicalId: 'innovatebits-prod',
  repoUrl: 'https://github.com/your-org/your-repo',
  checks: {
    activated: true,
    muted: false,
    runtimeId: '2024.02',
    frequency: Frequency.EVERY_5M,
    locations: ['us-east-1', 'eu-west-1', 'ap-southeast-1'],
    tags: ['production'],
    alertChannels: [],
    checkMatch: '**/__checks__/**/*.check.ts',
  },
})

// __checks__/homepage.check.ts
import { BrowserCheck, Frequency } from 'checkly/constructs'

new BrowserCheck('homepage-check', {
  name: 'Homepage loads correctly',
  frequency: Frequency.EVERY_5M,
  locations: ['us-east-1', 'eu-west-1'],
  code: {
    entrypoint: './homepage.spec.ts',
  },
})

// homepage.spec.ts (standard Playwright test)
import { test, expect } from '@playwright/test'

test('homepage is accessible and functional', async ({ page }) => {
  await page.goto(process.env.ENVIRONMENT_URL ?? 'https://www.yourapp.com')

  // Core availability check
  await expect(page).toHaveTitle(/YourApp/)

  // Navigation functional
  await expect(page.getByRole('navigation')).toBeVisible()

  // Core CTA present
  await expect(page.getByRole('button', { name: /get started/i })).toBeVisible()

  // Performance check (basic)
  const navigationTiming = await page.evaluate(() => {
    const [entry] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[]
    return entry ? entry.loadEventEnd - entry.startTime : 0
  })
  expect(navigationTiming).toBeLessThan(5000) // Alert if load takes longer than 5s
})

What to monitor synthetically
Not everything warrants synthetic monitoring. Prioritise flows where:
- A failure would directly impact revenue (checkout, payment, auth)
- A failure would be invisible until many users experience it
- Dependencies on third-party services could silently degrade
Standard synthetic monitoring suite:
- Homepage availability + load time
- Login flow (creates a real session)
- Core user journey (add to cart, checkout initiation)
- API health endpoints
- Key integrations (payment provider, auth service, search)
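Beyond browser checks, the health-endpoint and integration items in that suite are usually covered with API-level checks in the same Checkly project. A sketch, assuming a hypothetical `https://api.yourapp.com/health` endpoint and the thresholds shown:

```typescript
// __checks__/api-health.check.ts
import { ApiCheck, AssertionBuilder, Frequency } from 'checkly/constructs'

// Runs every minute from two regions; alerts if the health endpoint
// returns a non-200 or responds slower than 1s.
new ApiCheck('api-health-check', {
  name: 'API health endpoint',
  frequency: Frequency.EVERY_1M,
  locations: ['us-east-1', 'eu-west-1'],
  request: {
    method: 'GET',
    url: 'https://api.yourapp.com/health', // hypothetical endpoint
    assertions: [
      AssertionBuilder.statusCode().equals(200),
      AssertionBuilder.responseTime().lessThan(1000),
    ],
  },
})
```

API checks are cheaper to run than browser checks, so they can be scheduled more aggressively than the 5-minute browser cadence above.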
Canary Releases with Quality Gates
A canary release deploys new code to a small percentage of production traffic (typically 1–10%) while keeping the majority on the stable version. Quality gates monitor the canary cohort and trigger automatic rollback if metrics degrade.
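The traffic split is typically done by the ingress or service mesh, but the core idea can be sketched as a deterministic hash of a stable user ID, so the same user always lands in the same cohort (an illustrative implementation, not tied to any particular proxy):

```typescript
// Map a stable user ID into [0, 100) with a simple rolling hash, then
// route users below the canary weight to the new version.
function hashToPercent(userId: string): number {
  let hash = 0
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0 // unsigned 32-bit hash
  }
  return hash % 100
}

function routeVersion(userId: string, canaryWeightPercent: number): 'canary' | 'stable' {
  return hashToPercent(userId) < canaryWeightPercent ? 'canary' : 'stable'
}
```

Hashing (rather than random assignment per request) matters: a user flapping between versions mid-session would hide exactly the kind of defect the canary is meant to surface.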
Quality gate metrics
Error rate gate: if the error rate in the canary cohort exceeds the baseline by more than 1 percentage point, pause or roll back.
Latency gate: if p95 latency increases by more than 20%, investigate before continuing the rollout.
Business metric gate: if conversion rate or key business events drop significantly in the canary cohort, roll back.
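The gates above condense into a small decision function. A sketch using the article's thresholds; the names and metric shape are illustrative, and a real controller (like Argo Rollouts below) would evaluate this against live Prometheus queries:

```typescript
interface CohortMetrics {
  errorRate: number // fraction of failed requests, e.g. 0.004 = 0.4%
  p95LatencyMs: number
}

type GateAction = 'promote' | 'pause' | 'rollback'

// Compare the canary cohort against the stable baseline and decide
// what the rollout controller should do next.
function evaluateCanary(baseline: CohortMetrics, canary: CohortMetrics): GateAction {
  // Error rate gate: more than 1 percentage point worse than baseline
  if (canary.errorRate - baseline.errorRate > 0.01) return 'rollback'
  // Latency gate: p95 more than 20% slower than baseline
  if (canary.p95LatencyMs > baseline.p95LatencyMs * 1.2) return 'pause'
  return 'promote'
}
```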
Implementation with Argo Rollouts (Kubernetes)
# argo-rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  strategy:
    canary:
      steps:
      - setWeight: 5 # 5% canary
      - pause: { duration: 10m }
      - analysis:
          templates:
          - templateName: error-rate-check
      - setWeight: 25 # 25% if analysis passed
      - pause: { duration: 10m }
      - analysis:
          templates:
          - templateName: latency-check
      - setWeight: 100 # Full rollout
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  metrics:
  - name: error-rate
    interval: 1m
    successCondition: result[0] < 0.01 # <1% error rate
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total[5m]))

Chaos Engineering
Chaos engineering is the practice of deliberately introducing failures into your production system to validate that it handles them gracefully. The goal is to find resilience gaps before real failures find them for you.
Netflix famously pioneered chaos engineering with Chaos Monkey, which randomly terminates production instances. For most teams, a more measured approach is appropriate.
Starting with "game days"
A game day is a scheduled chaos exercise where your team deliberately introduces a failure scenario and evaluates how the system responds. Run them in staging before production:
Example game days:
- Kill the primary database and verify failover completes within SLA
- Throttle the payment service to 10% of normal capacity and verify the checkout flow degrades gracefully
- Introduce 500ms latency to the search API and verify caching prevents user impact
- Terminate 50% of API server instances and verify auto-scaling replaces them before requests start failing
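For game days run from application code rather than infrastructure tooling, a tiny fault-injection wrapper is often enough to simulate a slow or failing dependency. An illustrative sketch, not from any specific chaos tool:

```typescript
// Wrap any async dependency call so that, during an exercise, it fails
// with probability `failRate` and is delayed by `latencyMs` on every call.
function withChaos<T>(
  fn: () => Promise<T>,
  opts: { latencyMs?: number; failRate?: number } = {}
): () => Promise<T> {
  const { latencyMs = 0, failRate = 0 } = opts
  return async () => {
    if (Math.random() < failRate) {
      throw new Error('chaos: injected failure')
    }
    await new Promise((resolve) => setTimeout(resolve, latencyMs))
    return fn()
  }
}
```

During the "throttle the payment service" game day above, you would wrap the payment client with `withChaos(callPayment, { latencyMs: 500, failRate: 0.9 })` behind a kill switch, then watch how checkout degrades.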
Chaos tools
- Chaos Monkey (Netflix, open source) — randomly terminates EC2 instances
- Litmus (CNCF) — cloud-native chaos engineering for Kubernetes
- Gremlin — commercial chaos-as-a-service with a broad attack library
- AWS Fault Injection Simulator — AWS-managed chaos for AWS workloads
# Litmus: inject pod failure in a namespace
kubectl apply -f - <<EOF
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: pod-failure-experiment
spec:
  engineState: 'active'
  appinfo:
    appns: 'production'
    applabel: 'app=payment-service'
    appkind: 'deployment'
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '60' # seconds
            - name: CHAOS_INTERVAL
              value: '10'
            - name: FORCE
              value: 'false'
EOF

Observability-Driven QA
Observability (logs, metrics, traces) is not just for operations — it's a quality engineering tool. Production telemetry reveals defects that never surface in testing.
Deriving test insights from production data
User error patterns → new test cases
If your logs show that users frequently encounter a specific error (e.g., "Invalid phone format" on the registration form), that's a signal your test suite lacks coverage for that input pattern. Mine your error logs for user-encountered failures and add test cases for the top patterns.
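As a sketch, mining structured logs for the most frequent user-facing errors might look like this (the log shape here is an assumption; adapt it to your logging schema):

```typescript
interface LogLine {
  level: string
  message: string
}

// Count error-level messages and return the top-n patterns with their
// frequencies: each one is a candidate for a new test case.
function topErrorPatterns(logs: LogLine[], n: number): [string, number][] {
  const counts = new Map<string, number>()
  for (const line of logs) {
    if (line.level !== 'error') continue
    counts.set(line.message, (counts.get(line.message) ?? 0) + 1)
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, n)
}
```

In practice you would group near-duplicate messages (e.g. strip IDs and timestamps) before counting, or the top patterns fragment into singletons.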
Performance regressions → performance tests
If your traces show that a specific API endpoint slowed from 50ms to 500ms after a deploy, that's a missing performance test. Add a latency assertion to your CI pipeline for that endpoint.
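A minimal version of that CI assertion, assuming you can export sampled durations from your traces; `percentile` uses the nearest-rank method, and the names are illustrative:

```typescript
// Nearest-rank percentile over a sample of request durations.
function percentile(durationsMs: number[], p: number): number {
  const sorted = [...durationsMs].sort((a, b) => a - b)
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1)
  return sorted[idx]
}

// Fail the build if the endpoint's p95 exceeds its latency budget.
function assertP95Under(durationsMs: number[], budgetMs: number): void {
  const p95 = percentile(durationsMs, 95)
  if (p95 > budgetMs) {
    throw new Error(`p95 latency ${p95}ms exceeds budget ${budgetMs}ms`)
  }
}
```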
Feature flag states → test matrix gaps
If your feature flags create 8 possible configuration states and your tests only cover 2 of them, production is testing the other 6. Map your flag states to your test matrix and close the gaps.
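Enumerating the flag space makes that gap concrete. A sketch for boolean flags (three flags yield 2^3 = 8 states to diff against your test matrix):

```typescript
// Build the cartesian product of on/off states for a set of boolean
// feature flags: every entry is one configuration production can be in.
function flagMatrix(flags: string[]): Record<string, boolean>[] {
  return flags.reduce<Record<string, boolean>[]>(
    (combos, flag) =>
      combos.flatMap((c) => [{ ...c, [flag]: false }, { ...c, [flag]: true }]),
    [{}]
  )
}
```

The full product grows exponentially, so most teams test all combinations only for interacting flags and fall back to pairwise coverage for the rest.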
Recommended observability stack for QE
# docker-compose for local observability (mirrors production stack)
services:
  jaeger: # Distributed tracing
    image: jaegertracing/all-in-one:latest
    ports: ["16686:16686", "6831:6831/udp"]
  prometheus: # Metrics
    image: prom/prometheus:latest
    ports: ["9090:9090"]
  grafana: # Dashboards
    image: grafana/grafana:latest
    ports: ["3001:3000"]
  loki: # Log aggregation
    image: grafana/loki:latest
    ports: ["3100:3100"]

Combining Shift-Left and Shift-Right
The most effective quality strategy combines both approaches into a continuous quality loop:
Requirements
↓
[Shift-Left] Acceptance criteria + test cases written
↓
[Shift-Left] Unit + API + E2E tests in CI
↓
Deploy to staging
↓
[Shift-Left] Full regression suite
↓
Canary deploy (5%)
↓
[Shift-Right] Quality gates on canary cohort
↓
Full deploy
↓
[Shift-Right] Synthetic monitoring (continuous)
↓
[Shift-Right] Observability data → new test cases
↓
(Back to requirements for next feature)
Each stage catches defects that the previous stage would miss. The system is self-improving: production monitoring generates new test cases that strengthen pre-deployment testing, which in turn reduces production incidents.
Getting Started with Shift-Right
If you're starting from zero, the highest-value first step is synthetic monitoring on your three most critical user flows. The setup takes a few hours. The value — continuous visibility into production quality — is immediate and ongoing.
From there: add a canary release process for your highest-risk deployments. Then build toward chaos engineering as your production confidence grows.
Shift-right doesn't replace shift-left. It completes it.
For the CI/CD pipeline foundations that shift-right builds on, see our CI/CD Pipeline Guide. For the QE strategy that ties both approaches together, see our Quality Engineering Strategy Roadmap.