Building the Future of AI Trip Planning: LLMs, Inference Optimization, and Agentic Designs at Booking.com

Abstract

In this practical talk, we share how Booking.com built its AI Trip Planner - an LLM-powered experience that personalizes travel planning at scale. We’ll walk through real-world design decisions, technical challenges, and infrastructure optimizations involved in delivering real-time hotel and destination recommendations using large language models (LLMs).
We’ll cover key challenges like moderating user input, classifying intent, structuring dialogues, and generating grounded responses. Through prompt engineering and custom model development, we tailored LLM interactions to our product needs while ensuring speed and relevance.
To address inference latency, we implemented speculative decoding and integrated Medusa-1, a novel architecture that predicts multiple tokens in parallel, achieving a 1.8x speedup with no loss in quality. We’ll detail its design and training trade-offs.
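For readers new to the idea, the draft-and-verify loop behind speculative decoding can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system described in the talk: it uses greedy decoding, a separate toy draft model (Medusa instead attaches extra decoding heads to the main model), and the hypothetical names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft`.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def greedy_decode(target: NextToken, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[Token],
                       max_new_tokens: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies them."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2) The target checks each proposed position. In a real system these
        #    checks come from a single batched forward pass, which is where the
        #    speedup comes from; here they are sequential calls for clarity.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch and stop
                break
        else:
            # Every proposal matched, so the target contributes one extra token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:limit]


# Toy stand-ins for real models (purely illustrative; contexts must be non-empty).
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] * 3 + 1) % 11


def toy_draft(ctx: List[Token]) -> Token:
    guess = (ctx[-1] * 3 + 1) % 11
    # Disagree with the target whenever the context length is a multiple of 3.
    return guess if len(ctx) % 3 else (guess + 1) % 11
```

Because every kept token is one the target itself would have chosen, the output matches plain greedy decoding with the target model; in this greedy setting, that is the sense in which the technique trades no quality for speed.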
Beyond acceleration, we’ll highlight our move toward agentic AI systems - modular components that orchestrate LLMs, retrieval services, and Booking.com APIs to solve complex travel queries. One example is a Question-Answering Agent that fuses LLMs, real-time data, and APIs for context-aware answers.

Finally, we’ll show how we evaluate quality in production using LLM-based evaluations, including Judge LLMs for automatic assessment of dialog quality and more.

Topics To Be Covered

Design decisions behind Booking.com’s AI Trip Planner
How to balance speed, accuracy, and personalization in LLM products
Techniques for moderating user input and classifying intent at scale
How speculative decoding and Medusa-1 boost inference speed
Best practices for orchestrating LLMs, APIs, and retrieval agents
How to evaluate LLM quality using automated “Judge LLMs”

Perfect For

AI & ML Engineers
Product Managers
Data Scientists
Technical Leaders
Innovation Managers

Meet Your Speaker

Moran Beladev

Senior ML Manager, Booking.com

Moran is a Senior Machine Learning Manager at Booking.com, researching and developing GenAI, NLP, and CV models for the tourism domain.

Moran is a Ph.D. candidate in information systems engineering at Ben Gurion University, researching NLP aspects in temporal graphs.

Previously, Moran worked as a Data Science Team Leader at Diagnostic Robotics, building ML solutions for the medical domain and NLP algorithms to extract clinical entities from medical visit summaries.

ADDITIONAL INFORMATION

Time & Place

Thu, Nov 26

14:00 - 14:45

Mövenpick Amsterdam City Centre

Matterhorn I

Limited to 45 participants.

Secure your seat – registration required.

Notes

Agenda for this session

20 min presentation + Audience Q&A

REGISTRATION

To register for this session, you must hold a Pro or Max Pass.

Limited Seating Still Guaranteed