Not so long ago, we would have been dazzled if a machine replied "Hi" when we said its name. Nowadays, we get frustrated when we ask our 'personal assistants' to call Mom and they call Don instead.
As consumer expectations soar, the tech industry is working hard to keep up with the demand for near-perfect human-machine interaction. But despite all the advancements, time, and money invested, these interactions still leave us feeling frustrated. Why? Because the AI models behind them require highly specialized training data to work properly.
The best chatbots and personal assistants are powered by deep learning models that need vast amounts of training data. However, it's currently estimated that 15-20% of the data used is garbage, and that 80% of data scientists' time is spent scrubbing and cleaning data. This means high-quality training data is often difficult to obtain, expensive, and hard to scale to new markets and domains.
This talk will discuss the challenges of collecting and annotating multi-language and multimodal datasets for conversational AI. We'll also look at the importance of eliminating bias, and at how a human-in-the-loop process can help ensure a fluent and – as near as possible – flawless chatbot experience.
Why Rigorous Training Data is the Key to Chatbot Success
Dr. Rui Correia is Lead Machine Learning Engineer at DefinedCrowd. He holds a PhD in Language Technologies from the CMU|Portugal program (2018), with a thesis on "Automatic Classification of Metadiscourse". His areas of interest include Computer-Assisted Language Learning, Crowdsourcing, NLP, and, more recently, the interaction between them. He has more than 10 years of experience in NLP and 8 years of experience in crowdsourcing, and has published more than 10 scientific papers in these research areas.