How to Build Enterprise-Grade AI Agents Using Robust Evaluation | Viktoria Semaan | TestMu 2025
In her insightful session at TestMu Conference 2025, Viktoria Semaan, Principal Technical Evangelist, Databricks, dives deep into the complexities of building enterprise-level AI agents. She emphasizes the importance of robust evaluation frameworks that go beyond basic benchmarks to measure the true performance of AI agents in real-world applications.
The session also covers key considerations for adapting AI agents to enterprise environments, ensuring they can handle multi-step tasks with minimal errors while maintaining efficiency. With a live demo and actionable insights, Viktoria equips you with the tools needed to assess and improve your AI agents for seamless integration into production environments.
✅ The journey from basic LLMs to sophisticated multi-agent systems.
✅ How to evaluate AI agents for scalability, reliability, and safety.
✅ A framework to define and measure key performance indicators for AI agents.
✅ A live demo using MLflow to implement and evaluate AI systems.
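To make the idea of defining and measuring KPIs for an agent concrete, here is a minimal, library-free sketch of the kind of evaluation harness the session demonstrates with MLflow. All names here (`EvalCase`, `evaluate_agent`, `toy_agent`) are illustrative assumptions, not Viktoria's actual demo code or the MLflow API:

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    """One test case: a prompt and the answer we expect. (Hypothetical name.)"""
    prompt: str
    expected: str

def evaluate_agent(agent: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run each case through the agent and aggregate two simple KPIs:
    exact-match accuracy and average latency."""
    correct = 0
    latencies = []
    for case in cases:
        start = time.perf_counter()
        answer = agent(case.prompt)
        latencies.append(time.perf_counter() - start)
        if answer.strip().lower() == case.expected.strip().lower():
            correct += 1
    return {
        "accuracy": correct / len(cases),
        "avg_latency_s": sum(latencies) / len(latencies),
    }

# Toy stand-in for an LLM-backed agent, so the harness is runnable offline.
def toy_agent(prompt: str) -> str:
    return "Paris" if "France" in prompt else "unknown"

cases = [
    EvalCase("What is the capital of France?", "Paris"),
    EvalCase("What is the capital of Peru?", "Lima"),
]
report = evaluate_agent(toy_agent, cases)
print(report["accuracy"])  # 0.5: one of the two cases matched
```

In a real setup, tools like MLflow let you log these per-run metrics so you can compare agent versions over time rather than eyeballing a single score.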
TestMu
TestMu Conference is LambdaTest's annual flagship event, one of the world's largest virtual software testing conferences dedicated to decoding the future of testing and development. Built by the community, for the community, it's a space where you're at the center, connecting, learning, and leading together. From deep-dive sessions on emerging trends in engineering, testing, and DevOps, to hands-on workshops and inspiring culture-driven talks, every experience is designed to keep you at the heart of the conversation.