With the emergence of powerful voice recognition systems such as Alexa and Siri, people increasingly prefer voice activation over physical input. Together with the rapid growth of the Internet of Things (IoT), an increasing number of embedded devices are being deployed in settings that could benefit from voice activation. However, embedded devices are often far less powerful, offering only a fraction of the memory and computational performance of typical consumer hardware. As a first step, we want to provide voice authentication that could be used to detect a person's presence in a room (e.g., to easily clock in and out of work) or to eliminate the need for bulky keyboard or PIN-code input. Instead of these input components, only a small and cheap microphone would need to be added.
Our goal is to bring voice authentication to deeply embedded devices with limited resources and to investigate the aforementioned use cases. To do this, we need to understand the key problem: voice authentication requires machine learning (ML), which in turn typically demands powerful computers and large amounts of memory. A vital step is therefore to understand and balance the requirements of voice authentication against the restrictions of deeply embedded devices.