Deep Learning Systems Postdoctoral Research Associate
Paul G. Allen School of Computer Science & Engineering, SAMPL Research Group
I am a Postdoctoral Research Associate at the University of Washington in Computer Science & Engineering. My research focuses on hardware and cross-stack optimizations for deep learning systems.
I am advised by Luis Ceze in the SAMPL Deep Learning Systems research group. I received my B.A.Sc. in Electrical and Computer Engineering from the University of Toronto in 2012, and my M.S. and Ph.D. in Computer Science and Engineering from the University of Washington in 2015 and 2018, respectively.
I am the technical lead behind VTA, an open-source hardware/software stack for deep learning acceleration. I also serve as a PMC member of Apache TVM, the open end-to-end compiler for deep learning systems.
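To give a rough feel for what the TVM stack automates, here is a minimal sketch of its declare/schedule/compile flow, along the lines of TVM's introductory vector-add tutorial. It assumes a recent TVM release where the tensor expression API lives under tvm.te; the tensor names and the "llvm" CPU target are illustrative choices.

```python
# A minimal sketch of TVM's declare/schedule/compile flow (vector add),
# assuming a recent TVM release with the tensor expression API in tvm.te.
import numpy as np
import tvm
from tvm import te

n = te.var("n")                                # symbolic vector length
A = te.placeholder((n,), name="A")             # declare the inputs
B = te.placeholder((n,), name="B")
C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")  # declare the math

s = te.create_schedule(C.op)                   # default schedule (how to run it)
fadd = tvm.build(s, [A, B, C], target="llvm")  # compile to native code

dev = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(1024).astype("float32"), dev)
b = tvm.nd.array(np.random.rand(1024).astype("float32"), dev)
c = tvm.nd.array(np.zeros(1024, dtype="float32"), dev)
fadd(a, b, c)                                  # c = a + b
np.testing.assert_allclose(c.numpy(), a.numpy() + b.numpy())
```

The split between the compute declaration and the schedule is the key design point: the same operator declaration can be re-scheduled for CPUs, GPUs, or accelerators such as VTA without rewriting the math.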
In 2018, I co-organized with Grigori Fursin the first ACM ReQuEST tournament on reproducible and Pareto-efficient deep learning systems research, co-located with the ASPLOS 2018 conference.
As part of my Master's degree, I developed SNNAP, a co-processor prototype that can improve the energy efficiency of programs by approximating regions of code with neural networks. This work was presented at HPCA 2015.
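To give a flavor of the idea behind neural acceleration (this sketch is illustrative, not the SNNAP toolchain), the Python below trains a tiny multilayer perceptron to mimic an error-tolerant region of code, then substitutes the network's forward pass for the original computation. The expensive_kernel function, the network shape, and the training loop are all hypothetical stand-ins.

```python
# A conceptual sketch of neural acceleration (not the SNNAP toolchain):
# train a tiny MLP to mimic an error-tolerant region of code, then call
# the network instead of the original code.
import numpy as np

def expensive_kernel(x):
    # Hypothetical stand-in for an approximable region of code.
    return np.sin(x) * np.exp(-0.1 * x)

rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 3.0, size=(4096, 1))   # sample the region's inputs
ys = expensive_kernel(xs)                    # record its outputs

# One-hidden-layer MLP trained by plain gradient descent on squared error.
W1 = rng.normal(scale=0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)
for _ in range(5000):
    h = np.tanh(xs @ W1 + b1)
    err = (h @ W2 + b2) - ys
    gW2 = h.T @ err / len(xs); gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1.0 - h ** 2)
    gW1 = xs.T @ gh / len(xs); gb1 = gh.mean(axis=0)
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= 0.5 * g                         # learning rate 0.5

def approx_kernel(x):
    # On the real prototype, this forward pass would run on an
    # FPGA-hosted neural processing unit rather than on the CPU.
    return np.tanh(x @ W1 + b1) @ W2 + b2

print(float(np.mean(np.abs(approx_kernel(xs) - ys))))  # mean abs. error
```

The efficiency win in SNNAP comes from running that forward pass on a neural processing unit implemented in the programmable logic of a system-on-chip, rather than executing the original region in software.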
In the spring of 2017, we introduced students taking graduate-level computer architecture to hardware-software co-design for machine learning. This lab is open-sourced on GitHub and is an approachable introduction to FPGA acceleration for machine learning. I have summarized the results from the open-ended Pareto-optimality competition here.
As a follow-up to that class, Luis and I co-taught a graduate-level class in Spring 2018 on hardware/software co-optimization for machine learning.
Learning to Optimize Tensor Programs. Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy. In NeurIPS 2018.
Towards Reproducible and Reusable Deep Learning Systems Research Artifact Evaluation. Thierry Moreau, Anton Lokhmotov, Grigori Fursin. In MLOSS 2018 (co-located with NeurIPS).
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy. In OSDI 2018.
MATIC: Learning Around Errors for Efficient Low-Voltage Neural Network Accelerators. Sung Kim, Patrick Howe, Thierry Moreau, Armin Alaghi, Luis Ceze and Visvesh Sathe. In DATE 2018 (application track best paper!).
TVM: End-to-End Compilation Stack for Deep Learning. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Haichen Shen, Eddie Yan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy. In SysML Conference 2018 (one of six contributed talks!).
Exploring Quality-Energy Tradeoffs with Arbitrary Quantization. Thierry Moreau, Felipe Augusto, Patrick Howe, Armin Alaghi, Luis Ceze. In CODES+ISSS 2017 (special session). [slides]
Exploring Computation-Communication Tradeoffs in Camera Systems. Amrita Mazumdar, Thierry Moreau, Sung Kim, Meghan Cowan, Armin Alaghi, Luis Ceze, Mark Oskin, Visvesh Sathe. In IISWC 2017.
Approximating to the Last Bit. Thierry Moreau, Adrian Sampson, Luis Ceze, and Mark Oskin. In WAX 2016 (co-located with ASPLOS). [slides]
Compilation and Hardware Support for Approximate Acceleration. Thierry Moreau, Adrian Sampson, Andre Baixo, Mark Wyse, Ben Ransford, Jacob Nelson, Luis Ceze, and Mark Oskin. In TECHCON 2015. [slides]
REACT: A Framework for Rapid Exploration of Approximate Computing Techniques. Mark Wyse, Andre Baixo, Thierry Moreau, Bill Zorn, James Bornholt, Adrian Sampson, Luis Ceze, and Mark Oskin. In WAX 2015 (co-located with PLDI).
SNNAP: Approximate Computing on Programmable SoCs via Neural Acceleration. Thierry Moreau, Mark Wyse, Jacob Nelson, Adrian Sampson, Hadi Esmaeilzadeh, Luis Ceze, and Mark Oskin. In HPCA 2015. [slides]
Energy-Efficient Neural Network Acceleration in the Presence of Bit-Level Memory Errors. Sung Kim, Patrick Howe, Thierry Moreau, Armin Alaghi, Luis Ceze and Visvesh Sathe. In IEEE Transactions on Circuits and Systems, December 2018.
A Taxonomy of Approximate Computing Techniques. Thierry Moreau, Joshua San Miguel, Mark Wyse, James Bornholt, Armin Alaghi, Luis Ceze, Natalie Enright Jerger and Adrian Sampson. In IEEE Embedded Systems Letters, October 2017.
Approximate Computing: Making Mobile Systems More Efficient. Thierry Moreau, Adrian Sampson, and Luis Ceze. In IEEE Pervasive Computing, April 2015.
A Hardware-Software Blueprint for Flexible Deep Learning Specialization. Thierry Moreau, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm, Ziheng Jiang, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy. arXiv:1807.04188.
Relay: A High-Level IR for Deep Learning. Jared Roesch, Steven Lyubomirsky, Marisa Kirisame, Josh Pollock, Logan Weber, Ziheng Jiang, Tianqi Chen, Thierry Moreau, Zachary Tatlock. arXiv:1904.08368.
Automating Generation of Low Precision Deep Learning Operators. Meghan Cowan, Thierry Moreau, Tianqi Chen, Luis Ceze. arXiv:1810.11066.
Exploiting Errors for Efficiency: A Survey from Circuits to Algorithms. Phillip Stanley-Marbell, Armin Alaghi, Michael Carbin, Eva Darulova, Lara Dolecek, Andreas Gerstlauer, Ghayoor Gillani, Djordje Jevdjic, Thierry Moreau, Mattia Cacciotti, Alexandros Daglis, Natalie Enright Jerger, Babak Falsafi, Sasa Misailovic, Adrian Sampson, Damien Zufferey. arXiv:1809.05859.
QAPPA: A Framework for Navigating Quality-Energy Tradeoffs with Arbitrary Quantization. Thierry Moreau, Felipe Augusto, Patrick Howe, Armin Alaghi, Luis Ceze. UW-CSE Tech Report (UW-CSE-17-03-02).
ACCEPT: A Programmer-Guided Compiler Framework for Practical Approximate Computing. Adrian Sampson, Andre Baixo, Benjamin Ransford, Thierry Moreau, Joshua Yip, Luis Ceze and Mark Oskin. UW-CSE Tech Report (UW-CSE-15-01-01).
CSE599S: Hardware/Software Co-Optimization for Machine Learning, Spring 2018, Co-Instructor, with Luis Ceze.
CSE548: Computer Architecture, Spring 2017, Head Teaching Assistant, with Luis Ceze.
CSE352: Hardware Design and Implementation, Spring 2013, Head Teaching Assistant, with Mark Oskin.
My research is generously supported by the Center for Future Architectures Research, the Qualcomm Innovation Fellowship, the Natural Sciences and Engineering Research Council of Canada, the Weil Family Endowed Fellowship, and gifts from Intel (under the CAPA program), Huawei, and Xilinx.