Prof. Shen

joined North Carolina State University in August 2014 as a Chancellor’s Faculty Excellence Program cluster hire in Data-Driven Science. He is a recipient of the DOE Early Career Award, NSF CAREER Award, Google Faculty Research Award, and IBM CAS Faculty Fellow Award. He is an ACM Distinguished Member, an ACM Distinguished Speaker, and a senior member of IEEE. He was honored with the University Faculty Scholars Award "as an emerging academic leader who turns research into solutions to society’s most pressing issues".

His Research

lies in the broad fields of Programming Systems and Machine Learning, with an emphasis on enabling extreme-scale data-intensive computing and intelligent computing through innovations in compilers, runtime systems, and Machine Learning algorithms. His current research focuses on Heterogeneous Massively Parallel Computing, High Performance Machine Learning, and High-Level Large-Scale Program Optimizations. He leads the PICTure research group and is part of the NCSU Systems Laboratory.

Featured Articles (Full Publication List)

Wootz: A Compiler-Based Framework for Fast CNN Pruning via Composability (PLDI'19)

Convolutional Neural Network (CNN) pruning is an important method for adapting a large CNN model trained on general datasets to a more specialized task, or for fitting it to a device with stricter space or power constraints. Finding the best pruned network is time-consuming. This work tackles the problem by creating a compiler-based framework named Wootz, which for the first time enables composability-based CNN pruning. Wootz shortens the state-of-the-art pruning process by up to 117.9X while producing significantly better pruning results. See our PLDI'2019 paper for more...

HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations (ASPLOS'2019)

Pipelining is an important parallel computing model for applications running on CPU-GPU systems. Efficient communication among pipeline stages is key to performance. This work presents HiWayLib, a novel high-performance communication framework that produces 1.22-2.13X speedups for real-world applications. See our ASPLOS'19 paper for more...

Boosting TensorFlow with Better Communications for Supercomputers (SC'18)

We investigate a series of designs to improve the pipeline flexibility and adaptivity of Deep Learning frameworks. We implement our designs using TensorFlow with Horovod, and test them by training an ensemble of large DNNs on the Titan supercomputer at Oak Ridge National Lab. Our results show that with the new flexible communication schemes, the CPU time spent during training is reduced by 2-11X, training efficiency improves by up to 10X, and power consumption is reduced by 5-16%. See our SC'18 paper for more...

Document Analytics directly on Compressed Data (VLDB'18, ICS'18)

We propose the first known solution to enable direct document analytics on compressed data, which saves 90.8% of storage space and 77.5% of memory usage while halving the analytics time. It employs a hierarchical compression algorithm to convert analytics problems into graph traversal problems. The article presents a set of guidelines and assistant software modules for developers to effectively apply compression-based direct processing. See our VLDB'18 and ICS'18 papers for more.

Inter-Disciplinary Research Challenges in Computer Systems for the 2020s

It is now an exciting time in computer systems research. New technologies such as machine learning and the Internet of Things (IoT) are rapidly enabling new capabilities that were only dreamed of a few years ago. At the same time, technology discontinuities such as the end of Moore’s Law and the inescapable Energy Wall combine with new challenges in security and privacy, and the rise of Artificial Intelligence (AI). Against this backdrop, an NSF-sponsored community visioning workshop convened about 150 researchers from multiple computer systems areas during ASPLOS'2018. The goal was to outline a few high-priority areas where inter-disciplinary research is likely to have a high payoff in the next 10 years. This report summarizes the workshop’s findings. (ACM DL link here.)

Current Research Areas

High-Performance Machine Learning & Data Analytics

Meets the relentless demands of Machine Learning and Data Analytics for efficiency, responsiveness, quality, and scalability in all kinds of settings through innovations in algorithms, infrastructures, and implementations. More ...

Heterogeneous Massively Parallel Computing

Bridges the gap between the productivity needs of programmers and the extreme power and complexity of modern heterogeneous massively parallel computing devices (e.g., GPU) through innovations in programming systems. More ...

Foundations of Programming Systems & Languages

Tackles fundamental challenges that prevent modern software from tapping into the full potential of computing hardware by advancing compilers, runtime, and programming language implementations in general. More ...

Research Group: PICTure

Ph.D. graduates and their placements upon graduation

  • Yue Zhao (2018, Facebook)
  • Yufei Ding (2017, Assist. Prof @ UC Santa Barbara)
  • Guoyang Chen (2016, Qualcomm)
  • Zhijia Zhao (2015, Assist. Prof @ UC Riverside)
  • Mingzhou Zhou (2015, IBM)
  • Bo Wu (2014, Assist. Prof @ Colorado School of Mines)
  • Zheng (Eddy) Zhang (2012, Assist. Prof @ Rutgers University)
  • Kai (Kelvin) Tian (2012, Microsoft)
  • Yunlian Jiang (2011, Google)


Recent Professional Activities