Ph.D. Student
Paul G. Allen School of Computer Science and Engineering
University of Washington
cyulin [at]


I’m a Ph.D. student in the Paul G. Allen School of Computer Science and Engineering at the University of Washington. I’m broadly interested in efficient machine learning and particularly passionate about acceleration. I like to tackle inefficiency from a vertical perspective: from algorithms at the top, through system software in the middle, down to hardware architecture. I have worked on sparse CNNs, graph neural networks, and neural rendering, and I currently focus on LLMs. I’m fortunate to be advised by Prof. Luis Ceze.

Prior to UW, I obtained my Bachelor’s and Master’s degrees from the Department of Electronics Engineering, National Chiao-Tung University. I also completed a minor in Computer Science as an undergraduate. During my Master’s, I was fortunate to work with Prof. Bo-Cheng Lai.

Besides doing research, I enjoy tennis, hiking, and skiing.


NanoFlow: Towards Optimal Large Language Model Serving Throughput
Kan Zhu, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Yufei Gao, Qinyu Xu, Tian Tang, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci, Preprint, ArXiv 2024. [Paper] [Code]

Palu: Compressing KV-Cache with Low-Rank Projection
Chi-Chih Chang*, Wei-Cheng Lin*, Chien-Yu Lin*, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Kai-Chiang Wu, Preprint, ArXiv 2024 (*equal contribution). [Paper] [Code]

Efficient Encoder-Decoder Transformer Decoding for Decomposable Tasks
Bo-Ru Lu, Nikita Haduong, Chien-Yu Lin, Hao Cheng, Noah A. Smith, Mari Ostendorf, Preprint, ArXiv 2024. [Paper]

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci, in Conference on Machine Learning and Systems (MLSys) 2024 (acceptance rate 22%). [Paper] [Code]

FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline
Chien-Yu Lin, Qichen Fu, Thomas Merth, Karren Yang, Anurag Ranjan, IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, Oral (Top 2.6%).

SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks
Chien-Yu Lin*, Anish Prabhu*, Thomas Merth, Sachin Mehta, Anurag Ranjan, Maxwell Horton, and Mohammad Rastegari, in the 17th European Conference on Computer Vision (ECCV), 2022 (*equal contribution). [Paper] [Code] [Video]

Accelerating SpMM Kernel with Cache-First Edge Sampling for Graph Neural Networks
Chien-Yu Lin, Liang Luo, and Luis Ceze, Preprint, 2021. [Code]

Enhancing Utilization of SIMD-Like Accelerator for Sparse Convolutional Neural Networks
Bo-Cheng Lai, Jyun-Wei Pan, and Chien-Yu Lin, in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019. [Paper]

Supporting compressed-sparse activations and weights on SIMD-like accelerator for sparse convolutional neural networks
Chien-Yu Lin and Bo-Cheng Lai, in the 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), 2018. [Paper] [Slides]