With the emergence of powerful voice-controlled systems like Alexa and Siri, people increasingly prefer voice activation over physical input. Together with the rapid growth of the Internet of Things (IoT), an increasing number of embedded devices are being deployed in settings that could benefit from voice activation. However, embedded IoT devices are often far less powerful than the systems behind Alexa or Siri, and have only a fraction of their memory and processing power.
Our goal is to let constrained embedded IoT devices (i.e., deeply embedded devices with limited memory and computational resources) perform speaker recognition autonomously: this could be used to detect people's presence in a room (e.g., to easily clock in and out of work), or to eliminate the need for bulky keyboard or PIN code inputs. Instead, only a small and cheap microphone needs to be present on the IoT device. Towards this goal, we need to understand how state-of-the-art speaker recognition systems (which are commonly based on ML techniques) can be shrunk to fit the constraints of IoT devices and smart objects, and evaluate whether the shrunk models can sustain a recognition accuracy that is sufficient to build real-world applications.