Why Rigorous Training Data Is the Key to Chatbot Success
Not so long ago, we would have been dazzled if a machine replied, "Hi," when we said its name. Nowadays, we get frustrated when we ask our 'personal assistants' to call mom, and they actually call Don.
As consumer expectations soar, the tech industry is working hard to keep up with the demand for near-perfect human-machine interaction. But despite all the advancements and the time and money invested, these interactions still leave us feeling frustrated. Why? Because the AI models behind them require highly specialized training data to work properly.
The best chatbots and personal assistants are powered by deep learning models that need vast amounts of data for training. However, it's currently estimated that 15-20% of the data used is garbage and that 80% of data scientists’ time is spent scrubbing and cleaning data. This means high-quality training data is often difficult to obtain, expensive, and hard to scale to new markets and domains.
This talk will discuss the challenges of collecting and annotating multilingual and multimodal datasets for conversational AI. We’ll also look at the importance of eliminating bias and how a human-in-the-loop process can help ensure a fluent and, as near as possible, flawless chatbot experience.
Dr. Rui Correia
Machine Learning Engineer, DefinedCrowd