Prof. Shen

joined North Carolina State University in August 2014 as a Chancellor’s Faculty Excellence Program cluster hire in Data-Driven Science. He is a recipient of the DOE Early Career Award, NSF CAREER Award, Google Faculty Research Award, and IBM CAS Faculty Fellow Award. He is an ACM Distinguished Member, an ACM Distinguished Speaker, and a senior member of IEEE. He was honored with the University Faculty Scholars Award "as an emerging academic leader who turns research into solutions to society’s most pressing issues".

His Research

lies in the broad fields of Programming Systems and Machine Learning, with an emphasis on enabling extreme-scale data-intensive computing and intelligent computing through innovations in compilers, runtime systems, and Machine Learning algorithms. His current research focuses on Heterogeneous Massively Parallel Computing, High Performance Machine Learning, and High-Level Large-Scale Program Optimizations. He leads the PICTure research group and is part of the NCSU Systems Laboratory.

Featured Articles (Full Publication List)

Inter-Disciplinary Research Challenges in Computer Systems for the 2020s

It is now an exciting time in computer systems research. New technologies such as machine learning and the Internet of Things (IoT) are rapidly enabling new capabilities that were only dreamed of a few years ago. At the same time, technology discontinuities such as the end of Moore’s Law and the inescapable Energy Wall combine with new challenges in security and privacy, and the rise of Artificial Intelligence (AI). Against this backdrop, an NSF-sponsored community visioning workshop convened about 150 researchers from multiple computer systems areas during ASPLOS'2018. The goal was to outline a few high-priority areas where inter-disciplinary research is likely to have a high payoff in the next 10 years. This report summarizes the workshop’s findings. (ACM DL link here.)

Boosting TensorFlow with Better Communications for Supercomputers (SC'18)

We investigate a series of designs to improve the pipeline flexibility and adaptivity of Deep Learning frameworks. We implement our designs using TensorFlow with Horovod, and test them by training an ensemble of large DNNs on the Titan supercomputer at Oak Ridge National Lab. Our results show that with the new flexible communication schemes, the CPU time spent during training is reduced by 2-11X, training efficiency improves by up to 10X, and power consumption is reduced by 5-16%. See our SC'18 paper for more...

Document Analytics directly on Compressed Data (VLDB'18, ICS'18)

We propose the first known solution to enable direct document analytics on compressed data, which saves 90.8% of storage space and 77.5% of memory usage, while halving the analytics time. It employs a hierarchical compression algorithm to convert analytics problems into graph traversal problems. The article presents a set of guidelines and assistant software modules for developers to effectively apply compression-based direct processing. See our VLDB'18 and ICS'18 papers for more.
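
As a rough illustration of how analytics on hierarchically compressed text can become a graph traversal, the sketch below counts words over a small grammar without ever expanding it. It is a minimal example under stated assumptions, not the implementation from the papers: the rule table `rules`, the root name `R0`, and the function `word_count` are all hypothetical, and rule names are assumed never to collide with words.

```python
from collections import Counter

def word_count(rules, root="R0"):
    """Count the words of the text a grammar expands to, without expanding it.

    `rules` maps a rule name to a list of symbols; a symbol that is itself a
    key of `rules` is a reference to that rule, anything else is a word.
    (Hypothetical representation, chosen only for this sketch.)
    """
    # Depth-first search to obtain a topological order of the rule DAG.
    order, seen = [], set()

    def visit(rule):
        if rule in seen:
            return
        seen.add(rule)
        for sym in rules[rule]:
            if sym in rules:
                visit(sym)
        order.append(rule)

    visit(root)
    order.reverse()  # every rule now precedes the rules it references

    # weight[r] = number of times rule r occurs in the fully expanded text.
    weight = Counter({root: 1})
    counts = Counter()
    for rule in order:
        for sym in rules[rule]:
            if sym in rules:
                weight[sym] += weight[rule]   # propagate occurrences downward
            else:
                counts[sym] += weight[rule]   # a word seen weight[rule] times
    return counts

# Hypothetical grammar standing in for a compressed document.
rules = {
    "R0": ["R1", "c", "R1", "d"],
    "R1": ["a", "b", "R2"],
    "R2": ["a", "a"],
}
print(word_count(rules))
```

Each rule is visited once regardless of how often it repeats in the original text, which is the intuition behind processing the compressed form directly.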

Deep Learning for Tackling Key HPC Problems (PPOPP'18, IPDPS'18)

We present a systematic exploration of deep learning for selecting the best sparse matrix format to maximize the performance of Sparse Matrix Vector Multiplication (SpMV). SpMV is one of the most critical kernels in High Performance Computing. We investigate how to effectively bridge the gap between deep learning and the special needs of this pillar HPC problem. The new solution cuts format selection errors by two thirds, and improves SpMV performance by 1.73× on average over the state of the art. We further propose the first known overhead-conscious design to effectively mitigate the overhead of machine learning models when they are applied to SpMV. See our PPOPP'18 and IPDPS'18 papers for more.
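
To make "sparse matrix format" concrete, the sketch below computes the same SpMV with two common formats, COO and CSR. It only illustrates what a format selector chooses between; it is not the deep learning method from the papers, and the function names and the small example matrix are assumptions made for illustration.

```python
def spmv_coo(rows, cols, vals, x, n_rows):
    """y = A @ x with A in coordinate (COO) format: one (row, col, val) per nonzero."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y

def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x with A in compressed sparse row (CSR) format.

    Row i owns the nonzeros at positions row_ptr[i] .. row_ptr[i + 1] - 1.
    """
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y

# Example matrix (hypothetical):
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 0]]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y_coo = spmv_coo([0, 0, 1, 2, 2], [0, 2, 2, 0, 1], vals, x, n_rows=3)
y_csr = spmv_csr([0, 2, 3, 5], [0, 2, 2, 0, 1], vals, x)
print(y_coo, y_csr)
```

Both formats give the same result but traverse memory differently, so which one runs fastest depends on the sparsity pattern and the hardware.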

VersaPipe: Making Pipeline Computations on GPU Easier and Faster (Micro'17)

By inventing a set of novel execution models, we build a software programming framework that greatly simplifies the programmer's job of developing efficient pipeline programs on Graphics Processing Units (GPUs) with thousands of cores. The framework, named VersaPipe, automatically adapts the software pipeline implementation to best meet the needs of a pipeline program, producing speedups of up to 6.90X (2.88X on average) over manual implementations of face recognition and other applications. See our Micro'17 paper for more...

Current Research Areas

High-Performance Machine Learning & Data Analytics

Meets the relentless demands of Machine Learning and Data Analytics for efficiency, responsiveness, quality, and scalability in all kinds of settings through innovations in algorithms, infrastructures, and implementations. More ...

Heterogeneous Massively Parallel Computing

Bridges the gap between the productivity needs of programmers and the extreme power and complexity of modern heterogeneous massively parallel computing devices (e.g., GPUs) through innovations in programming systems. More ...

Foundations of Programming Systems & Languages

Tackles fundamental challenges that prevent modern software from tapping into the full potential of computing hardware by advancing compilers, runtime systems, and programming language implementations in general. More ...

Research Group: PICTure

Ph.D. graduates and their placements upon graduation

  • Yue Zhao (2018, Facebook)
  • Yufei Ding (2017, Assist. Prof @ UC Santa Barbara)
  • Guoyang Chen (2016, Qualcomm)
  • Zhijia Zhao (2015, Assist. Prof @ UC Riverside)
  • Mingzhou Zhou (2015, IBM)
  • Bo Wu (2014, Assist. Prof @ Colorado School of Mines)
  • Zheng (Eddy) Zhang (2012, Assist. Prof @ Rutgers University)
  • Kai (Kelvin) Tian (2012, Microsoft)
  • Yunlian Jiang (2011, Google)


Recent Professional Activities