LLM as a Judge: Strategies & Tactics

AI Security

Can GenAI judge like a human?

As LLMs take on roles like scoring chatbot responses and assessing risk, their reliability, transparency, and fairness become critical. This ‘no-code’ workshop explores the LLM-as-a-Judge paradigm, offering practical strategies and hands-on tactics for deploying LLMs as scalable, consistent evaluators. Participants will examine how LLMs replicate human judgement and where they fall short, especially around ambiguity, bias, and drift. Through a real-world use case and an interactive exercise, you’ll learn to design robust evaluation frameworks, engineer effective prompts, and build reproducible systems that combine GenAI and human oversight.

Time & Place

October 30, 2025

13:30 - 15:00

Hôtel Mövenpick Amsterdam City Centre

Matterhorn III

Limited to 40 participants.

Meet Your Instructor

Frank Ebbers

Data Scientist, ING

Frank Ebbers is a Data Scientist at ING Global Analytics with a strong focus on Natural Language Processing and Generative AI. After building a long career in the fashion industry, he made the bold move to study Computer and Data Science and joined ING in 2000 within the Customer Dialogue Analytics domain. Over the years, he has specialized in chatbots, speech analytics, NLP tooling, and most recently GenAI applications.
Today, Frank leads work on “LLM-as-a-Judge,” applying large language models to evaluate chatbot performance and bridging the gap between technical innovation and real-world business value.

What To Expect

Who Is This For?

  • Product Managers

  • Data Analysts

  • Data Scientists

  • Contact Center Managers

  • Business Translators

Pre-Requisites

  • Basic understanding of chatbot evaluation, LLMs, and statistics or business analytics

What You'll Learn & Do

  • The role of LLMs as a judge

  • Key strategies for effective judgement with LLMs

  • Tactics for enhancing performance and reliability

  • The complexities and weaknesses of LLMs as a judge

Agenda & Activities

Agenda for this session:

  • Getting Settled - 5 min

  • Information Session - 40 min

  • Break - 5 min

  • Individual / Group Exercise - 20 min

  • Q&A / Discussion - 20 min

  • Reflection - 5 min

Registration

To register for our workshops, you must purchase a Platinum Pass, which lets you select up to 4 workshops. If you want to attend only one workshop, you may purchase the Gold Plus Pass instead.
