LLMs as a Judge: Governing Agentic AI

GenAI & Data

Can GenAI judge like a human?

As LLMs take on roles such as scoring chatbot responses and assessing risk, their reliability, transparency, and fairness become critical. This ‘no-code’ workshop explores the LLM-as-a-Judge paradigm, offering practical strategies and hands-on tactics for deploying LLMs as scalable, consistent evaluators. You'll examine how LLMs replicate human judgement and where they fall short, especially around ambiguity, bias, and drift. Through a real-world use case and an interactive exercise, you'll learn to design robust evaluation frameworks, engineer effective prompts, and build reproducible systems that combine GenAI with human oversight.
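
The workshop itself is no-code, but for orientation, here is a minimal sketch of the pattern in Python. Everything in it is illustrative: the rubric, the 1-5 scale, and the `call_llm` helper are assumptions standing in for whatever judge model and criteria you choose.

import json

# Example rubric prompt for an LLM judge. The criteria and the 1-5 scale
# are illustrative choices, not a prescribed standard.
JUDGE_PROMPT = """You are an impartial evaluator of chatbot responses.
Score the RESPONSE to the QUESTION from 1 (poor) to 5 (excellent) on:
- accuracy: is the answer factually correct?
- helpfulness: does it address the user's actual need?
- tone: is it clear and professional?
Return only JSON: {{"accuracy": 0, "helpfulness": 0, "tone": 0, "rationale": ""}}

QUESTION: {question}
RESPONSE: {response}"""

def judge(question: str, response: str, call_llm) -> dict:
    # `call_llm` is a hypothetical stand-in (str -> str) for your model
    # client; swap in the provider call you actually use.
    raw = call_llm(JUDGE_PROMPT.format(question=question, response=response))
    # A real system would validate the JSON and retry on failure;
    # malformed judge output is a common failure mode.
    return json.loads(raw)

Fixing the rubric in the prompt and requiring structured output is what makes the judge reproducible enough to compare scores across runs; human spot-checks of a sample of judgements supply the oversight half of the loop.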

Time & Place

Wed, March 25

13:00 - 14:30

Salon Tiergarten

Classroom Seating

Max. Capacity: 32 Seats

Meet Your Instructors

Speaker To Be Revealed

What To Expect

Who Is This For?

  • Product Managers

  • Data Analysts

  • Data Scientists

  • Contact Center Managers

  • Business Translators

Prerequisites

  • Basic understanding of chatbot evaluation, LLMs, and statistics or business analytics

What You'll Learn & Do

  • Understanding the role of LLMs as a judge

  • Key strategies for effective judgement with LLMs

  • Tactics for enhancing performance and reliability (see the sketch after this list)

  • Understanding the complexities and weaknesses of LLMs as a judge
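
One concrete example of the reliability tactics above, sketched under the same assumptions as the earlier snippet (the hypothetical `judge` helper): sample the judge several times and take the per-criterion median, so a single noisy judgement matters less.

import statistics

def stable_score(question: str, response: str, call_llm, k: int = 5) -> dict:
    # Run the (hypothetical) judge k times and aggregate by median;
    # k=5 is an arbitrary example value, traded off against cost.
    runs = [judge(question, response, call_llm) for _ in range(k)]
    return {
        crit: statistics.median(run[crit] for run in runs)
        for crit in ("accuracy", "helpfulness", "tone")
    }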

Agenda & Activities

Agenda for this session:

  • Getting Settled - 5 mins

  • Information Session - 40 mins

  • Break - 5 mins

  • Individual / Group Exercise - 20 mins

  • Q&A / Discussion - 20 mins

  • Reflection - 5 mins

Registration

To register for our workshops, you must purchase a Platinum Pass, which entitles you to select up to four workshops. If you are interested in attending only one workshop, you may purchase the Gold Plus Pass instead.
