Automated Class Discovery and One-Shot Interactions for Acoustic Activity Recognition

Smart devices are becoming more prevalent in people's living environments, accelerating the vision of ubiquitous computing and the Internet-of-Things (IoT). However, these devices still lack contextual sensing capabilities: they have minimal understanding of what is happening around them, which limits their potential to enable truly assistive computational experiences. In response, acoustic activity recognition has emerged as a practical modality for contextual sensing. This is chiefly due to the prevalence of microphones and their robustness to occlusion, as well as the availability of hardware that can process high-fidelity acoustic information on-device (i.e., at the edge).

A key challenge for acoustic activity recognition is building classifiers that can recognize highly localized events with minimal user intervention or in situ training. To train such classifiers, two predominant approaches have been proposed, each with different accuracy and user-burden implications. The first is to train a system manually, after it is deployed, most often by demonstrating different activities and having a user provide class labels. Because data is collected in situ, accuracy tends to be quite high; however, the burden on the user is also high. The other approach is to provide users with pre-trained general classifiers that work "out of the box." This technique is achieved by training a classifier on a large, preexisting corpus of acoustic data. Because the classifier has no data for a user's specific environment, it tends to be less accurate, but the burden on the user is very low.

We present Listen Learner, a middle-ground approach that seeks to provide high classification accuracy while minimizing user burden. Our approach requires no up-front training data and no tedious manual demonstration; instead, the system learns acoustic events in situ over time, and is thus highly tuned to its local environment and events of interest, offering superior accuracy to pre-trained classifiers. Our study results show that Listen Learner can accurately and automatically learn acoustic events across environments (e.g., 97% precision, 87% recall), while adhering to users' preferences for non-intrusive interactive behavior.
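The core idea of class discovery with one-shot interaction can be illustrated with a small sketch. The code below is not the authors' implementation; it is a hypothetical simplification that assumes incoming sounds have already been converted to fixed-length feature vectors. It clusters events by distance to a running centroid and, the first time a cluster grows stable, asks the user for a label exactly once:

```python
import numpy as np

class ListenLearnerSketch:
    """Illustrative sketch (not the published system): cluster incoming acoustic
    feature vectors by distance to a running centroid, then request a single
    user label ("one-shot interaction") once a cluster appears stable."""

    def __init__(self, distance_threshold=1.0, min_cluster_size=3):
        self.distance_threshold = distance_threshold
        self.min_cluster_size = min_cluster_size
        self.clusters = []  # each: {"centroid": ndarray, "count": int, "label": str|None}

    def observe(self, features, ask_user):
        """Assign one feature vector to the nearest cluster (or start a new one).
        Returns the cluster's label, or None while it is still unlabeled."""
        features = np.asarray(features, dtype=float)
        if self.clusters:
            dists = [np.linalg.norm(features - c["centroid"]) for c in self.clusters]
            i = int(np.argmin(dists))
            if dists[i] < self.distance_threshold:
                c = self.clusters[i]
                c["count"] += 1
                # Running-mean update keeps the centroid tracking the cluster.
                c["centroid"] += (features - c["centroid"]) / c["count"]
                # One-shot query: ask the user only once, when the cluster is stable.
                if c["label"] is None and c["count"] >= self.min_cluster_size:
                    c["label"] = ask_user(c["centroid"])
                return c["label"]
        # No sufficiently close cluster: a newly discovered event class.
        self.clusters.append({"centroid": features.copy(), "count": 1, "label": None})
        return None

# Example: four similar events form one cluster; the user is queried once.
learner = ListenLearnerSketch(distance_threshold=1.0, min_cluster_size=3)
queries = []
def ask(centroid):
    queries.append(centroid)
    return "microwave"  # the user's one-shot answer

result = None
for vec in [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.05, 0.05]]:
    result = learner.observe(vec, ask)
```

Here `ListenLearnerSketch`, `ask_user`, and the thresholds are all assumptions for illustration; the real system operates on audio and uses more robust clustering, but the interaction pattern (discover classes silently, then label each with a single question) is the same.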



Wu, J., Harrison, C., Bigham, J. and Laput, G. 2020. Automated Class Discovery and One-Shot Interactions for Acoustic Activity Recognition. In Proceedings of the 38th Annual SIGCHI Conference on Human Factors in Computing Systems. CHI '20. ACM, New York, NY.

© Chris Harrison