QA in the Age of AI: Enhancing Agent Reliability Through Evaluation-Driven Development | Shadab Nazar | TestMu 2025

Testμ

About the talk

In this thought-provoking TestMu session, Shadab Nazar, Lead Generative AI Architect at Splunk, explores the future of QA in the age of AI, where ensuring agent reliability demands a new mindset.

Drawing on over two decades of experience across Generative AI, system assurance, and NLP observability, Shadab shares real-world lessons from building enterprise-scale AI platforms and explains why QA professionals are uniquely positioned to lead this new era of evaluation-first development.

Traditional QA wasn’t built for probabilistic, data-driven systems that evolve constantly. Shadab shows how the core QA skills of test planning, scenario coverage, and automation can be reimagined through an evaluation-driven development framework to make AI agents more reliable, transparent, and trustworthy.

Key Takeaways:

✔ Why AI agents require rethinking traditional QA practices, not replacing them.

✔ How QA teams can drive evaluation-first development workflows.

✔ Practical tools and techniques for automating AI agent testing and monitoring.

✔ Real-world examples of QA practices improving AI agent reliability and user experience.

More Videos from TestMu 2025

Opening Note by Joe Colantonio

Keynote: Air Fryers, Automation, and AI

Intent Over Scripts: Modernizing Software Testing with AI

CI and the Great Flakiness Adventure

AI for Accessibility: Empowering Inclusive Digital Experiences

Panel Discussion: AI and Community: Shifting Roles, Rising Impact

Ask Me Anything: Future-Proof Your Career: AI, Testing & Path Ahead

2025: The Agentic Shift – Are We Reasoning, Or Just Retrieving Smarter?

Accelerating Success: How to Optimize Value Delivery with DORA

Rapid Threat-to-Test for Agents

How to Build Enterprise-Grade AI Agents Using Robust Evaluation

Ship Code. Without Writing It.

Exploratory Testing with AI

AI-Powered Debugging & Browser Automation with Playwright MCP

Network Control for End-to-End Web Testing

Keynote: Zero-UI Engineering: Architecting Systems for Agent Experience (AX)

So you think a new tool will help? Here’s an idea-t to think about…

AI, Automation & DevEx: Fueling High-Velocity Engineering

Fast Doesn’t Mean Fragile: Delivering AI-Powered Software at Scale

How to Test LLM Agents

The Great Reckoning: How AI is Exposing the Existential Crisis of Software Testing

Your Test Suite Can’t Catch a Hallucination: Real Talk on AI in Production

Event Driven Architecture: Love Triangle in Integration Testing

When Life Gives You Lemons… Are You Counting Them or Making Lemonade?

AI-Driven Quality Engineering Practices

Transforming Retail with Quality Engineering for Seamless Digital Experiences

Role of Quality Engineering in Shaping the Future of Financial Services

Opening Note Day 2

Build Your Testing Sidekick: Custom Tools with Model Context Protocol

Reactive Browser Testing with WebDriver BiDi

How Software Testing can Increase Agent Autonomy

What can go wrong with AI in testing?

Code It Forward: Making Your Mark in Open Source

Testing the Untestable: Agent to Agent Testing

Testing Early, Testing Right - Balancing Early Testing with Real-World Reliability

The Enterprise AI Playbook: Strategies for Scaling AI in Quality Engineering

Advanced Playwright with AI

Generative to Agentic to Quantum - The Evolution of AI

Test Data Key to Effective Test Coverage

QE Strategic Shift: What's Changing with AI, Automation, and Speed?

Building AI Fluency: Leading Teams Through the Learning Curve

The Practical Automation Playbook

Building a Handwriting Recognition System for the New York Times Crossword

Agentic Cloud: Using Agents to Build Tomorrow’s Cloud

QA to QE: Scaling Quality with Ownership, Code, and Culture

Automated Test Data Portal for Financial Services

QA in the Age of AI: Enhancing Agent Reliability Through Evaluation-Driven Development

Ensuring quality testing in an AI-driven world

AI-Driven Strategies for Scalable & Resilient Performance Engineering

Day 3 Opening Note

Mastering Appium 3: Architecture, Gestures & Beyond

From Zero to MCP: Automating Test Environments for DevOps & QA

AI & GenAI in Quality Engineering: Crawl, Walk, Run

Embracing Agentic AI: From Autonomous Goals to Enterprise Guarantees

Oops, AI Did It Again: How to Get AI to Stop Being Weird and Actually Be Useful

Should We Let AI Take Over Test Automation Completely?

From Hours to Minutes: Run Thousands of CI Tests in Just Minute

Evaluating RAG Applications: From Retrieval to Response Quality

Stop Breaking Your Teams: Seeing the Whole Instead of Pieces

Surviving and Thriving with AI in QA

The Quality Leadership Shift: From Compulsiveness to Cautiousness

Full Court Quality: Lacing Up for Speed, Stability & Style

Navigating Mobile App Testing and App Store Rejection: From Review to Release

Randomized testing: Gotta Catch ‘Em All

Balancing release & sprint delivery speed with thorough testing

Building for AI at Scale: Infrastructure, Integrity, and Innovation

Trusting the Machine: Building Confidence in AI-Driven Testing Decisions

Observability - Holistic Quality across Software Systems

From SDLC to ADLC: The Enterprise Agent Development Lifecycle

Agentic Testing: Your Skilled Human Tester

Evolution of Quality Engineering in Financial Services

Closing Note Day 3