joined North Carolina State University in August 2014 as a Chancellor’s Faculty Excellence Program cluster hire in Data-Driven Science. He is a recipient of the DOE Early Career Award, NSF CAREER Award, Google Faculty Research Award, and IBM CAS Faculty Fellow Award. He is an ACM Distinguished Member, ACM Distinguished Speaker, and a senior member of IEEE. He was honored with the University Faculty Scholars Award "as an emerging academic leader who turns research into solutions to society’s most pressing issues".
lies in the broad fields of Programming Systems and Machine Learning, with an emphasis on enabling extreme-scale data-intensive computing and intelligent computing through innovations in compilers, runtime systems, and Machine Learning algorithms. His current research focuses on Heterogeneous Massively Parallel Computing, High Performance Machine Learning, and High-Level Large-Scale Program Optimizations. He leads the PICTure research group. He is part of the NCSU Systems Laboratory.
Convolutional Neural Network (CNN) pruning is an important method for adapting a large CNN model trained on general datasets to a more specialized task, or for fitting a device with stricter space or power constraints. Finding the best pruned network, however, is time-consuming. This work tackles the problem with a compiler-based framework named Wootz, which for the first time enables composability-based CNN pruning. Wootz shortens the state-of-the-art pruning process by up to 117.9X while producing significantly better pruning results. Details will appear in our PLDI'2019 paper.
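The intuition behind composability can be sketched with a back-of-the-envelope count. In this hypothetical illustration (not the Wootz implementation), each candidate pruned network is a sequence of layer blocks, each kept dense or pruned to one of several assumed sparsity levels; pre-training each distinct block once and reusing it across candidates avoids most redundant training:

```python
from itertools import product

# Hypothetical search space (illustrative numbers, not from the paper):
# each of 4 layer blocks may be dense or pruned to one of 3 sparsity levels.
NUM_BLOCKS = 4
SPARSITY_LEVELS = ["dense", "30%", "50%", "70%"]

# Enumerate every candidate pruned-network configuration.
candidates = list(product(SPARSITY_LEVELS, repeat=NUM_BLOCKS))

# Naive tuning fine-tunes every block of every candidate from scratch.
naive_block_trainings = len(candidates) * NUM_BLOCKS

# Composability-based tuning trains each distinct (position, sparsity)
# block once and assembles candidates from the cached blocks.
shared_blocks = {(pos, lvl) for cand in candidates
                 for pos, lvl in enumerate(cand)}
composable_block_trainings = len(shared_blocks)

print(len(candidates))               # 256 candidate networks
print(naive_block_trainings)         # 1024 block trainings, naive
print(composable_block_trainings)    # 16 block trainings, composable
```

Even in this toy setting the shared blocks cut block-level training work by two orders of magnitude, which is the kind of redundancy a composability-based framework can exploit.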
Pipelining is an important parallel computing model for applications running on CPU-GPU systems, and efficient communication among pipeline stages is key to performance. This work presents HiWayLib, a novel high-performance communication support that delivers 1.22-2.13X speedups for real-world applications. See our ASPLOS'19 paper for more...
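As a minimal CPU-only sketch (not HiWayLib itself), the pipeline model can be illustrated with two stages communicating through a bounded queue; in a CPU-GPU pipeline the analogous channel crosses the host-device boundary, which is the communication such a library optimizes:

```python
import threading
import queue

# Two-stage pipeline: stage 1 produces items, stage 2 consumes them
# through a bounded buffer. Names and sizes here are illustrative only.
SENTINEL = None
channel = queue.Queue(maxsize=8)   # bounded inter-stage buffer
results = []

def stage1(n):
    for i in range(n):
        channel.put(i * i)          # produce squared values
    channel.put(SENTINEL)           # signal end of stream

def stage2():
    while True:
        item = channel.get()
        if item is SENTINEL:
            break
        results.append(item + 1)    # consume: shift each value by one

t1 = threading.Thread(target=stage1, args=(5,))
t2 = threading.Thread(target=stage2)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)   # [1, 2, 5, 10, 17]
```

The bounded queue keeps the stages running concurrently while bounding buffering; the efficiency of exactly this kind of inter-stage channel is what determines end-to-end pipeline throughput.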
We investigate a series of designs to improve the pipeline flexibility and adaptivity of deep learning frameworks. We implement our designs in TensorFlow with Horovod and test them on training an ensemble of large DNNs on the Titan supercomputer at Oak Ridge National Lab. Our results show that with the new flexible communication schemes, the CPU time spent during training is reduced by 2-11X, training efficiency improves by up to 10X, and power consumption is reduced by 5-16%. See our SC'18 paper for more...
We propose the first known solution enabling direct document analytics on compressed data, which saves 90.8% of storage space and 77.5% of memory usage while halving the analytics time. It employs a hierarchical compression algorithm to convert analytics problems into graph traversal problems, and it provides a set of guidelines and assistant software modules that help developers effectively apply compression-based direct processing. See our VLDB'18 and ICS'18 papers for more.
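The idea of turning analytics into graph traversal can be sketched on a toy hierarchical (grammar-based) compression; this is a hypothetical illustration, not the actual implementation. Repeated sub-sequences are stored once as rules, and an analytics task such as word counting traverses the rule graph, processing each shared rule only once:

```python
from collections import Counter
from functools import lru_cache

# Toy grammar compressing "a b a b c a b a b c": each rule expands to
# a sequence of words and/or other rules, forming a DAG. Rule names
# and the input text are illustrative only.
grammar = {
    "R0": ["R1", "c", "R1", "c"],   # root rule
    "R1": ["a", "b", "a", "b"],     # shared sub-sequence, stored once
}

@lru_cache(maxsize=None)
def word_count(symbol):
    """Count words by traversing the grammar DAG; memoization ensures
    each shared rule is analyzed once, never re-expanded."""
    if symbol not in grammar:                 # terminal word
        return Counter([symbol])
    total = Counter()
    for child in grammar[symbol]:
        total += word_count(child)
    return total

print(dict(word_count("R0")))   # {'a': 4, 'b': 4, 'c': 2}
```

Because the shared rule `R1` is counted once and reused, the work scales with the compressed size rather than the original text size, which is the essence of processing data directly in its compressed form.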
It is an exciting time in computer systems research. New technologies such as machine learning and the Internet of Things (IoT) are rapidly enabling capabilities that were only dreamed of a few years ago. At the same time, technology discontinuities such as the end of Moore’s Law and the inescapable Energy Wall coincide with new challenges in security and privacy and with the rise of Artificial Intelligence (AI). Against this backdrop, an NSF-sponsored community visioning workshop convened about 150 researchers from multiple computer systems areas during ASPLOS'2018. The goal was to outline a few high-priority areas where interdisciplinary research is likely to have a high payoff in the next 10 years. This report summarizes the workshop’s findings. (ACM DL link here.)