School of Electrical Engineering and Computer Science (EECS) researchers Harsh Sharma, Jana Doppa and Partha Pande recently received a best paper award at the International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES).
The conference is a premier forum for researchers, developers and practitioners in the area of compilers and architectures for high-performance, low-power embedded systems, according to the CASES website. The conference features leading research in embedded processor, memory, interconnect and storage architectures, as well as related compiler techniques targeting performance, power, predictability, security and reliability issues.
The WSU team received the award for their paper, "SWAP: A Server-scale Communication-aware Chiplet-based Manycore PIM Accelerator." Sharma, a Ph.D. student in the School of EECS, was the lead author. The research team also included Sumit Mandal and Umit Ogras of the Department of Electrical and Computer Engineering at the University of Wisconsin-Madison.
Abstract:
Processing-in-memory (PIM) is a promising technique to accelerate Deep Learning (DL) workloads. Emerging DL workloads (e.g., ResNet with 152 layers) consist of millions of parameters, which increase the area and fabrication cost of monolithic PIM accelerators. The fabrication cost challenge can be addressed by 2.5D systems integrating multiple PIM chiplets connected through a network-on-package (NoP). However, server-scale scenarios simultaneously execute multiple compute-heavy DL workloads, leading to significant inter-chiplet data volume. State-of-the-art NoP architectures proposed in the literature do not consider the nature of DL workloads.
In this paper, we propose a novel server-scale 2.5D manycore architecture called SWAP that accounts for the traffic characteristics of DL applications. Comprehensive experimental evaluations with different system sizes as well as diverse emerging DL workloads demonstrate that SWAP achieves significant performance and energy consumption improvements with much lower fabrication cost than state-of-the-art NoP topologies.