Crowd-AI Camera Sensing in the Real World

Smart appliances with built-in cameras, such as the Nest Cam and Amazon Echo Look, are becoming pervasive. They hold the promise of bringing high-fidelity, contextually rich sensing into our homes, workplaces, and other environments. Despite recent and impressive advances, computer vision systems are still limited in the types of sensing questions they can answer and, more importantly, do not easily generalize across diverse human environments. In response, researchers have investigated hybrid crowd- and AI-powered methods that collect human labels to bootstrap automatic processes. However, deployments have been small and mostly confined to institutional settings, leaving open questions about the scalability and generality of the approach. In this work, we describe our iterative development of Zensors++, a full-stack crowd-AI camera-based sensing system that moves significantly beyond prior work in scale, question diversity, accuracy, latency, and economic feasibility. We deployed Zensors++ in the wild, with real users, over many months and across many environments, generating 1.6 million answers for nearly 200 questions created by our participants, at a cost of roughly six-tenths of a cent per answer delivered. We share lessons learned, insights gleaned, and implications for future crowd-AI vision systems.



Guo, A., Jain, A., Ghose, S., Laput, G., Harrison, C. and Bigham, J. 2018. Crowd-AI Camera Sensing in the Real World. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (UbiComp '18), 2, 3, Article 111 (September 2018), 20 pages. ACM, New York, NY. DOI:

© Chris Harrison