I am a Ph.D. student in the Department of Computer Science and Engineering at the University of Michigan.
I am advised by Prof. Trevor Mudge in the Computer Engineering Lab.
Prior to that, I received my M.S.E. degree in Computer Engineering from the University of Michigan
and my B.S.E. degree in Electrical Engineering from Zhejiang University, China.
My research interests lie in computer architecture, performance analysis, mobile systems, and machine learning in general.
You can find my resume and CV here.

Research

● Architecture design and user experience in mobile systems

-- User quality-of-experience metrics for android applications
Identified a set of metrics that measure the user experience of Android applications
Implemented a framework in Android to automate workload execution and metrics collection
Tested a set of benchmarks on a smartphone-like development board with the proposed metrics
-- A study of mobile device utilization
Evaluated the CPU and GPU utilization of a wide range of common mobile applications on real hardware
Identified diminishing returns from increasing core counts and suggested a more flexible system design

● Machine learning workload characterization and accelerator design

-- An ultra-low power non-uniform memory accelerator for wearable devices
Analyzed “always-on” applications in wearable devices, such as keyword spotting and motion detection
Developed the architecture, ISA, and compiler for the accelerator, which has been fabricated as a chip
Designed a framework that automatically generates an optimal memory layout based on the target applications
-- Accelerating deep learning algorithms on mobile platforms
Analyzed the characteristics of Deep Neural Network workloads on mobile / server GPUs
Participated in designing a framework that intelligently partitions workloads between the mobile device and the server
Investigated co-execution of graphics and computation workloads on mobile GPUs

● Graph analytics processing accelerator

Designed an accelerator architecture for billion-edge scale graph applications
Led four graduate students in characterizing applications and exploring algorithm and architecture choices

Publications

Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, Lingjia Tang. Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge. Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2017.

Suyoung Bang, Jingcheng Wang, Ziyun Li, Cao Gao, Yejoong Kim, Qing Dong, Yen-Po Chen, et al. A 288µW Programmable Deep Learning Processor with 270kB On-chip Weight Storage Using Non-uniform Memory Hierarchy for Mobile Intelligence. 2017 IEEE International Solid-State Circuits Conference (ISSCC), February 2017.

Qi Zheng, Cao Gao, Trevor Mudge, and Ronald G. Dreslinski. Leveraging Mobile GPUs for Flexible High-speed Wireless Communication. The 3rd International Workshop on Parallelism in Mobile Platforms (PRISM-3), June 2015.
[Paper: here][Talk: here]

Cao Gao, Anthony Gutierrez, Madhav Rajan, Ronald G. Dreslinski, Trevor Mudge, and Carole-Jean Wu. A Study of Mobile Device Utilization. 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2015.
[Paper: here][Talk: here]

Cao Gao, Anthony Gutierrez, Ronald G. Dreslinski, Trevor Mudge, Krisztian Flautner, and Geoffrey Blake. A Study of Thread Level Parallelism on Mobile Devices. 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2014. [Poster and text]
[Paper: here]

Experience

  • Was a research intern in the mobile systems group at ARM in Austin, TX, during summer 2014.
  • Mostly program in Python, C, and C++; familiar with CUDA, MATLAB, and Verilog
  • Experienced with Linux and Android environments, profiling tools, and Caffe
  • Fluent in English, native Mandarin speaker