Abstract
In this practical talk, we share how Booking.com built its AI Trip Planner - an LLM-powered experience that personalizes travel planning at scale. We’ll walk through real-world design decisions, technical challenges, and infrastructure optimizations involved in delivering real-time hotel and destination recommendations using large language models (LLMs).
We’ll cover key challenges like moderating user input, classifying intent, structuring dialogues, and generating grounded responses. Through prompt engineering and custom model development, we tailored LLM interactions to our product needs while ensuring speed and relevance.
To address inference latency, we implemented speculative decoding and integrated Medusa-1, a novel architecture that predicts multiple tokens in parallel, achieving a 1.8x speedup with no loss in quality. We’ll detail its design and training trade-offs.
Beyond acceleration, we’ll highlight our move toward agentic AI systems - modular components that orchestrate LLMs, retrieval services, and Booking.com APIs to solve complex travel queries. One example is a question-answering agent that combines LLMs, real-time data, and APIs to produce context-aware answers.
Finally, we’ll show how we evaluate quality in production using LLM-based evaluations, including Judge LLMs for automatic assessment of dialog quality and other criteria.
Topics To Be Covered
Design decisions behind Booking.com’s AI Trip Planner
How to balance speed, accuracy, and personalization in LLM products
Techniques for moderating user input and classifying intent at scale
How speculative decoding and Medusa-1 boost inference speed
Best practices for orchestrating LLMs, APIs, and retrieval agents
How to evaluate LLM quality using automated “Judge LLMs”
Perfect For
AI & ML Engineers
Product Managers
Data Scientists
Technical Leaders
Innovation Managers
Meet Your Speaker
Moran Beladev
Senior ML Manager, Booking.com
Moran is a Senior Machine Learning Manager at Booking.com, researching and developing GenAI, NLP, and CV models for the tourism domain.
Moran is a Ph.D. candidate in information systems engineering at Ben Gurion University, researching NLP aspects in temporal graphs.
Previously, Moran worked as a Data Science Team Leader at Diagnostic Robotics, building ML solutions for the medical domain and NLP algorithms to extract clinical entities from medical visit summaries.
ADDITIONAL INFORMATION
Time & Place
Thu, Nov 26
14:00 - 14:45
Mövenpick Amsterdam City Centre
Matterhorn I
Limited to 45 participants.
Secure your seat – registration required.
Notes
Agenda for this session
20-minute presentation + audience Q&A

