
The 27th Asia and South Pacific Design Automation Conference

Session 7C: Low-Energy Edge AI Computing
Time: 10:00 - 10:35, Thursday, January 20, 2022
Location: Room C
Chairs: Bing Li (Capital Normal University, China), Yaojun Zhang (Pimchip Technology Co., China)

7C-1
Title: Efficient Computer Vision on Edge Devices with Pipeline-Parallel Hierarchical Neural Networks
Authors: *Abhinav Goel, Caleb Tung, Xiao Hu (Purdue University, USA), George K. Thiruvathukal (Loyola University Chicago, USA), James C. Davis, Yung-Hsiang Lu (Purdue University, USA)
Pages: 532 - 537
Keywords: Neural network, low-power, computer vision, parallel computing, edge devices
Abstract: Computer vision on low-power edge devices enables applications such as search-and-rescue and security. State-of-the-art computer vision algorithms, such as Deep Neural Networks (DNNs), are too large for inference on low-power edge devices. To improve efficiency, some existing approaches parallelize DNN inference across multiple edge devices. However, these techniques introduce significant communication and synchronization overheads or fail to balance workloads across devices. This paper demonstrates that the hierarchical DNN architecture is well suited to parallel processing on multiple edge devices. We design a novel method that creates a parallel inference pipeline for computer vision problems that use hierarchical DNNs. The method balances loads across the collaborating devices and reduces communication costs, so that multiple video frames can be processed simultaneously at higher throughput. Our experiments consider a representative computer vision problem in which image recognition is performed on each video frame, running on multiple Raspberry Pi 4Bs. With four collaborating low-power edge devices, our approach achieves 3.21× higher throughput, 68% less energy consumption per device per frame, and a 58% reduction in memory use compared with existing single-device hierarchical DNNs.
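
The pipelining idea in the abstract, one hierarchy level per device with several frames in flight at once, can be illustrated with a minimal sketch. Everything below is hypothetical: the two toy stage functions and the queue "links" merely stand in for the paper's hierarchical DNN levels and inter-device communication, which the abstract does not detail.

```python
# Minimal sketch of pipeline-parallel hierarchical inference (hypothetical,
# not the authors' code). Each "device" runs one level of the hierarchy as a
# pipeline stage; queues stand in for the network links between edge devices.
import queue
import threading

def coarse_stage(frame):
    # Level 1: cheap coarse classifier that routes the frame to a fine model.
    return {"frame": frame, "coarse_label": frame % 2}  # toy routing decision

def fine_stage(item):
    # Level 2: finer classifier selected by the coarse label.
    return (item["frame"], f"class_{item['coarse_label']}_{item['frame'] % 3}")

def run_stage(fn, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:          # poison pill: propagate shutdown downstream
            outbox.put(None)
            return
        outbox.put(fn(item))

frames_in, mid, results = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=run_stage, args=(coarse_stage, frames_in, mid)),
    threading.Thread(target=run_stage, args=(fine_stage, mid, results)),
]
for t in threads:
    t.start()

for frame in range(8):            # several frames are in flight at once,
    frames_in.put(frame)          # which is where the throughput gain comes from
frames_in.put(None)

while (out := results.get()) is not None:
    print(out)
for t in threads:
    t.join()
```

While device 2 classifies frame t at the fine level, device 1 is already processing frame t+1 at the coarse level, which is the throughput benefit the abstract claims over running the whole hierarchy on one device.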

7C-2
Title: Efficient On-Device Incremental Learning by Weight Freezing
Authors: *Ze-Han Wang, Zhenli He, Hui Fang, Yi-Xiong Huang, Ying Sun, Yu Yang, Zhi-Yuan Zhang, Di Liu (Yunnan University, China)
Pages: 538 - 543
Keywords: deep learning, incremental learning
Abstract: On-device learning has become a new trend in edge intelligence systems. In this paper, we investigate the on-device incremental learning problem, which aims to learn new classes on top of a well-trained model already on the device. Incremental learning is known to suffer from catastrophic forgetting, i.e., a model learns new classes at the cost of forgetting the old ones. Inspired by model pruning techniques, we propose a new on-device incremental learning method based on weight freezing. Weight freezing plays two roles in our framework: 1) preserving the knowledge of the old classes; 2) accelerating the training procedure. By means of weight freezing, we build an efficient incremental learning framework that uses knowledge distillation to fine-tune the new model. We conduct extensive experiments on CIFAR100 and compare our method with two existing methods. The experimental results show that our method achieves higher accuracy after incrementally learning new classes.
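
A minimal PyTorch sketch of the two ingredients the abstract names, weight freezing and knowledge distillation, follows. The choice of which layers to freeze, the widened classifier head, and the temperature T are all illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of weight freezing plus knowledge distillation for
# class-incremental learning (PyTorch); the paper's actual freezing criterion
# and loss weighting are not given in the abstract.
import copy
import torch
import torch.nn.functional as F

old_model = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
old_model.eval()                         # frozen teacher for distillation

new_model = copy.deepcopy(old_model)
new_model[2] = torch.nn.Linear(64, 15)   # widen the head: 10 old + 5 new classes
with torch.no_grad():                    # carry over the old-class weights
    new_model[2].weight[:10] = old_model[2].weight
    new_model[2].bias[:10] = old_model[2].bias

for p in new_model[0].parameters():      # freeze the shared feature extractor:
    p.requires_grad = False              # preserves old knowledge and shrinks
                                         # the set of trainable weights

opt = torch.optim.SGD(
    [p for p in new_model.parameters() if p.requires_grad], lr=0.01)

x = torch.randn(16, 32)                  # stand-in batch of new-class samples
y = torch.randint(10, 15, (16,))
T = 2.0                                  # distillation temperature (assumed)

for _ in range(5):
    logits = new_model(x)
    with torch.no_grad():
        old_logits = old_model(x)
    ce = F.cross_entropy(logits, y)      # learn the new classes
    kd = F.kl_div(                       # stay close to the old model's
        F.log_softmax(logits[:, :10] / T, dim=1),   # predictions on the
        F.softmax(old_logits / T, dim=1),           # old classes
        reduction="batchmean") * T * T
    loss = ce + kd
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.3f}")
```

Freezing also delivers the second role the abstract mentions: the optimizer touches far fewer parameters, so each training step is cheaper, which matters on a resource-limited device.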

7C-3
Title: EdgenAI: Distributed Inference with Local Edge Devices and Minimum Latency
Authors: Maedeh Hemmat, *Azadeh Davoodi, Yu Hen Hu (University of Wisconsin-Madison, USA)
Pages: 544 - 549
Keywords: Distributed inference, Deep neural networks
Abstract: We propose EdgenAI, a framework to decompose a complex deep neural network (DNN) over n available local edge devices with minimal communication overhead and overall latency. Our framework creates small DNNs (SNNs) from an original DNN by partitioning its classes across the edge devices while taking into account their available resources. Class-aware pruning is then applied to aggressively reduce the size of the SNN mapped to each edge device. The SNNs perform inference in parallel and are additionally configured to generate a ‘Don’t Know’ response when presented with an unassigned class. Our experiments show up to a 17X speedup over a recent work when distributing a variant of VGG-16 over 20 parallel edge devices with at most 100MB of memory each, without much loss in accuracy.
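
The ‘Don’t Know’ aggregation step can be sketched as below. The random snn_predict stand-in, the uniform class split, and the confidence-based voting rule are all assumptions for illustration; EdgenAI's actual SNN construction and class-aware pruning are the paper's contribution and are not reproduced here.

```python
# Hypothetical sketch of 'Don't Know' aggregation in class-partitioned
# distributed inference.
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES, N_DEVICES = 20, 4
partitions = np.array_split(np.arange(N_CLASSES), N_DEVICES)  # classes per device

def snn_predict(device_id, x):
    """Stand-in for one small DNN (ignores x): returns softmax scores over its
    own class subset plus a final 'Don't Know' entry."""
    logits = rng.normal(size=len(partitions[device_id]) + 1)
    return np.exp(logits) / np.exp(logits).sum()

def aggregate(x):
    best_label, best_score = None, -1.0
    for d in range(N_DEVICES):                 # in the real system these run
        scores = snn_predict(d, x)             # in parallel on separate devices
        dont_know = scores[-1]
        k = int(np.argmax(scores[:-1]))
        # a device only votes if it is more confident than its 'Don't Know' output
        if scores[k] > dont_know and scores[k] > best_score:
            best_label, best_score = int(partitions[d][k]), scores[k]
    return best_label                          # None if every SNN said 'Don't Know'

print(aggregate(x=None))
```

The extra output lets each SNN abstain on classes it was never assigned, so the combined system recovers full-class coverage even though every device holds only a pruned fraction of the original DNN.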

7C-4
Title: Large Forests and Where to “Partially” Fit Them
Authors: *Andrea Damiani, Emanuele Del Sozzo, Marco D. Santambrogio (Politecnico di Milano, Italy)
Pages: 550 - 555
Keywords: Decision Trees, Random Forests, Field-Programmable Gate Arrays, Partial Dynamic Reconfiguration
Abstract: The Artificial Intelligence of Things (AIoT) calls for on-site Machine Learning inference to overcome the instability in latency and availability of networks. Hardware acceleration is therefore paramount for reaching the Cloud’s modeling performance within an embedded device’s resources. In this paper, we propose Entree, the first automatic design flow for deploying the inference of Decision Tree (DT) ensembles on Field-Programmable Gate Arrays (FPGAs) at the network’s edge. It exploits dynamic partial reconfiguration on modern FPGA-enabled Systems-on-a-Chip (SoCs) to accelerate arbitrarily large DT ensembles with a data latency a hundred times more stable than that of software alternatives. Moreover, since Entree suits both hardware designers and non-hardware-savvy developers, we believe it can help data scientists develop a non-Cloud-centric AIoT.
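
Dynamic partial reconfiguration here amounts to time-multiplexing one fixed-size FPGA region over a forest too large to fit at once. The following is a software analogy only, assuming a toy forest of decision stumps and a made-up region capacity; the real Entree flow generates FPGA bitstreams, and nothing below reflects its actual implementation.

```python
# Software analogy (hypothetical) of serving an arbitrarily large Decision
# Tree ensemble from a fixed-size reconfigurable region: split the forest
# into chunks that fit the region, "reconfigure" chunk by chunk, and
# accumulate votes across reconfigurations.
from collections import Counter

def make_stump(threshold, feature):
    # Stand-in for one synthesized decision tree: a depth-1 stump.
    return lambda x: int(x[feature] > threshold)

forest = [make_stump(t / 10.0, t % 3) for t in range(12)]   # 12 "trees"
REGION_CAPACITY = 4          # trees that fit in one reconfigurable region

def infer(x):
    votes = Counter()
    for start in range(0, len(forest), REGION_CAPACITY):
        chunk = forest[start:start + REGION_CAPACITY]  # "partial reconfiguration":
        for tree in chunk:                             # load the next chunk into
            votes[tree(x)] += 1                        # the region, then evaluate
    return votes.most_common(1)[0][0]                  # majority vote

print(infer([0.4, 0.9, 0.2]))
```

Because reconfiguration time on real hardware is deterministic, this chunked schedule trades some raw latency for the latency stability the abstract emphasizes over software execution.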