Date of Award
Doctor of Philosophy (PhD)
Electrical and Computer Engineering
Applications for Internet-enabled devices use machine learning to process captured data to make intelligent decisions or provide information to users. Typically, the computation to process the data is executed in cloud-based backends. The devices are used for sensing data, offloading it to the cloud, receiving responses and acting upon them. However, this approach leads to high end-to-end latency due to communication over the Internet. This dissertation proposes reducing this response time by minimizing offloading and pushing computation close to the source of the data, i.e., to edge servers and the devices themselves. To adapt to the resource-constrained environment at the edge, it presents an approach that leverages spatiotemporal locality to push subparts of the model to the edge. This approach is embodied in a distributed caching framework, Cachier. Cachier is built upon a novel caching model for recognition, and is distributed across edge servers and devices. The analytical caching model for recognition provides a formulation for the expected latency of recognition requests in Cachier. The formulation incorporates the effects of compute time and accuracy. It also incorporates network conditions, thus providing a method to compute expected response times under various conditions. Cachier uses this formulation as a cost function at both edge servers and devices. By analyzing requests at the edge server, Cachier caches relevant parts of the trained model at edge servers, which are used to respond to requests, minimizing the number of requests that go to the cloud. Then, Cachier uses context-aware prediction to prefetch parts of the trained model onto devices. The requests can then be processed on the devices, thus minimizing the number of offloaded requests. Finally, Cachier enables cooperation between nearby devices to allow exchanging prefetched data, reducing the dependence on remote servers even further.
The efficacy of Cachier is evaluated by using it with an art recognition application. The application is driven using real-world traces gathered at museums. By conducting a large-scale study with different control variables, we show that Cachier can lower latency, increase scalability and decrease infrastructure resource usage, while maintaining high accuracy.
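The idea of using expected latency as a cost function for an edge cache can be illustrated with a minimal sketch. This is not the dissertation's formulation: it assumes a simple hit/miss model in which every request pays an edge lookup cost that grows linearly with the number of cached items, and every miss additionally pays a network round trip plus cloud compute time. It omits the effect of recognition accuracy that the abstract mentions. The function names, parameters, and the linear lookup cost are all hypothetical.

```python
def expected_latency(hit_ratio, edge_lookup_ms, network_ms, cloud_compute_ms):
    """Expected response time (ms) for one request under a hit/miss model.

    Assumption: the edge lookup is always paid; a miss additionally pays
    the network round trip and the cloud compute time.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_penalty_ms = network_ms + cloud_compute_ms
    return edge_lookup_ms + (1.0 - hit_ratio) * miss_penalty_ms


def best_cache_size(popularity, per_item_lookup_ms, network_ms, cloud_compute_ms):
    """Choose how many items to cache so that expected latency is minimal.

    popularity: request probabilities, sorted most popular first.
    Assumption: lookup time grows linearly with the number of cached items.
    Returns (number_of_items, expected_latency_ms).
    """
    # Baseline: empty cache, so every request goes to the cloud.
    best_k = 0
    best_ms = expected_latency(0.0, 0.0, network_ms, cloud_compute_ms)
    hit_ratio = 0.0
    for k, p in enumerate(popularity, start=1):
        hit_ratio = min(hit_ratio + p, 1.0)  # guard against rounding above 1
        ms = expected_latency(
            hit_ratio, k * per_item_lookup_ms, network_ms, cloud_compute_ms
        )
        if ms < best_ms:
            best_k, best_ms = k, ms
    return best_k, best_ms
```

Under these assumptions, caching more items raises the hit ratio but also lengthens every lookup, so the cost function has a minimum at some intermediate cache size; a slower network shifts that minimum toward larger caches.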
Drolia, Utsav, "Adaptive Distributed Caching for Scalable Machine Learning Services" (2017). Dissertations. 1004.