The Age of Personal Assistants: More Machine Learning, Less Hand-Crafting!

Updated: Jul 10, 2018

Personal assistants such as Alexa and Siri are everywhere, but have trouble handling tasks more complex than setting an alarm or playing songs on Spotify. PolyAI is a new London-based startup with a next-generation platform for building voice-based agents. We use machine learning to handle complex tasks across different application domains, in a wide array of world languages.


Author: Nikola Mrkšić, CEO & Co-founder at PolyAI


The first phone call was placed on March 10, 1876. Alexander Graham Bell shouted at his assistant: “Mr Watson, come here, I want to see you!” This was a game changer. For the first time in history, we had real-time remote communication, and with it, the ability to remotely automate all kinds of services. In his case, Mr Bell could ask his assistant to run errands without ever seeing him in person. Of course, Mr Bell could have sent a letter, a telegram or a pigeon, but none of these would have been as fast or convenient as picking up his phone: they wouldn’t have been real time!

The first real-time remote service automation tool


Over time, one could dial an increasing number of businesses and procure services remotely and in real time. However, there was no easy way to discover these services. The solution came with the Yellow Pages: large print directories listing local businesses and their phone numbers.


After a long wait, the World Wide Web moved the service directory away from dusty Yellow Pages into sleek and minimalist search engines. Users could read the latest news, order books, or even check the balance of their bank account without waiting for the phone operator on the other end of the line. However, websites imposed structured interaction with backend systems, and companies had to spend millions on user interface design to hook users on their service. Since there were no uniform design standards, users had to adapt to a different interface for each service, instead of using their voice alone.


Yellow Pages: the first step towards wide-scale service discoverability.


Next stop, smartphones. With their limited screen real estate, smartphones posed new challenges for software and interface developers. Apple, in turn, set an amazing example for how to create extremely simple and intuitive mobile interfaces. These principles spread through the App Store that followed. Despite the plethora of apps available, users are left to their own devices when choosing which app to download, and this is not always straightforward. In fact, the majority of US smartphone owners install zero new apps each month.


Following on from smartphones, operating system vendors now offer us Virtual Personal Assistants. Recent advances in machine learning have led to huge improvements in speech recognition, allowing companies such as Google and Amazon to bring voice-powered personal assistants to every home, phone, watch or any other piece of hardware fitted with a microphone. Instead of adapting to interfaces of third-party apps, assistants abstract them away, allowing users to access a plethora of services using their voice alone.


Personal assistants like Siri, Alexa and Google Assistant aspire to be the de facto entry point for most actions that users might want to perform on their smartphones, smart homes, and other assistant-enabled devices. Rather than forcing users to choose the right app, personal assistants provide a natural conduit for accessing third-party services. Since they are voice-based, they allow users to bypass graphical user interfaces altogether. This is especially intuitive for younger generations who have grown up accustomed to smartphones and other connected devices.


From search engines to mobile OS platforms, the contest between big tech companies has revolved around the control of platforms because they are the central point of access to billions of customers. To turn that access into revenue, personal assistant platforms have to connect to third-party services. Otherwise, the assistants are empty-headed, like a Google without search results or an iPhone without its App Store. In fact, Amazon’s Echo in many ways resembles the iPhone circa 2007. The first iPhone’s touchscreen took the world by storm, and similarly, consumers can’t get enough of the Echo, Amazon’s top-selling item last Christmas.


Platform providers want third-party apps for their personal assistants. Alexa Skills, Actions on Google, Azure Bots… all the giants are trying to make their platform the best ecosystem for building your voice-powered application. However, building conversational apps that people want to use is proving very difficult. Despite tens of thousands of deployed Alexa Skills, 62% of them