Multimedia Learning: Speech to Text to Language
To understand language is to understand how human beings experience the world around them, so its no wonder NLP is one of the most difficult problems in AI. If thats not difficult enough, it seems almost impossible to incorporate other domains like images, videos or voice into the mix and generate human-like interactions. However, learning from different modalities simultaneously is not only possible for human beings, it actually makes language easier to learn. In this talk well examine these difficulties and propose a different approach for multi-media learning inspired by how we humans learn.