LLMs as a Judge: Governing Agentic AI
GenAI & Data
Can GenAI judge like a human?
As LLMs take on roles like scoring chatbot responses and assessing risk, their reliability, transparency, and fairness become critical. This ‘no-code’ workshop explores the paradigm of LLM-as-a-Judge, offering practical strategies and hands-on tactics for deploying LLMs as scalable, consistent evaluators. Participants will examine how LLMs replicate human judgement and where they fall short, especially around ambiguity, bias, and drift. Through a real-world use case and interactive exercise, you’ll learn to design robust evaluation frameworks, engineer effective prompts, and build reproducible systems that combine GenAI and human oversight.
Time & Place
Wed, March 25
13:00 - 14:30
Salon Tiergarten
Classroom Seating
Max. Capacity: 32 Seats
Meet Your Instructors

Speaker To Be Revealed
What To Expect
Who Is This For?
Product Managers
Data Analysts
Data Scientists
Contact Center Managers
Business Translators
Prerequisites
Basic understanding of chatbot evaluation, LLMs, and statistics or business analytics
What You'll Learn & Do
Understanding the role of LLMs as a judge
Key strategies for effective judgement with LLMs
Tactics for enhancing performance and reliability
Understanding the complexities and weaknesses of LLMs as a judge
Agenda & Activities
Agenda for this session:
Getting Settled - 5 min
Information Session - 40 min
Break - 5 min
Individual / Group Exercise - 20 min
Q&A / Discussion - 20 min
Reflection - 5 min