
How to Scalably Test LLMs | Anand Kannappan

About the talk

Watch this session as Anand Kannappan, Co-Founder and CEO of Patronus AI, addresses the unpredictability of Large Language Models (LLMs) and the most reliable automated methods for testing them at scale. He discusses why intrinsic evaluation metrics like perplexity often don’t align with human judgments and why open-source LLM benchmarks may no longer be the best way to measure AI progress.

Key Highlights

35+ Sessions

60+ Speakers

20,000+ Attendees

2000+ Minutes

Live Q&As

Key Topics Covered

Intrinsic evaluation metrics like perplexity tend to be only weakly correlated with human judgments, so they shouldn't be used to evaluate LLMs post-training (a short sketch of how perplexity is computed appears after this list).

Creating test cases to measure LLM performance is as much an art as it is a science. Test sets should be diverse in distribution and cover as much of the use case scope as possible.

Open-source LLM benchmarks are no longer a trustworthy way to measure progress in AI, since most LLM developers have already trained on them.
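
For context on the perplexity point above, here is a minimal Python sketch (not from the talk) of how perplexity is computed from a model's per-token log-probabilities; it only measures how well the model predicts held-out text, which is why it can diverge from human judgments of output quality:

    import math

    def perplexity(token_log_probs):
        # Perplexity is the exponential of the average negative log-likelihood per token.
        avg_nll = -sum(token_log_probs) / len(token_log_probs)
        return math.exp(avg_nll)

    # Hypothetical per-token log-probabilities for a short sequence.
    log_probs = [-0.9, -2.3, -0.4, -1.7, -0.6]
    print(perplexity(log_probs))  # lower perplexity means the model found the text more predictable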

Testμ

Testμ (TestMu) Conference is LambdaTest’s annual flagship event, one of the world’s largest virtual software testing conferences dedicated to decoding the future of testing and development. Built by the community, for the community, it’s a space where you’re at the center, connecting, learning, and leading together. From deep-dive sessions on emerging trends in engineering, testing, and DevOps, to hands-on workshops and inspiring culture-driven talks, every experience is designed to keep you at the heart of the conversation.
