Zhihan Xiong

I am currently a fifth-year PhD student in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, where I am very fortunate to be advised by Prof. Maryam Fazel. I also collaborate closely on several projects with Prof. Simon S. Du and Prof. Kevin Jamieson at UW, and with Dr. Lin Xiao at Meta AI.

My research interests lie in the theory and applications of reinforcement learning and bandit problems. From 2022 to 2024, I was generously supported by the UW/Meta AI Mentorship Program.

Before joining UW, I received my Master's degree in Statistics from Stanford University in 2020 and my Bachelor's degree in Mathematics and Engineering Physics from the University of Illinois at Urbana-Champaign in 2018, where I was fortunate to be advised by Prof. Pierre Moulin.

Email / CV / Google Scholar / LinkedIn

Publications / Preprints

(* indicates equal contributions)

LoRe: Personalizing LLMs via Low-Rank Reward Modeling [arXiv]
Avinandan Bose, Zhihan Xiong, Yuejie Chi, Simon S. Du, Lin Xiao, Maryam Fazel
Preprint

Hybrid Preference Optimization for Alignment: Faster Convergence Rates by Combining Offline Preferences with Online Exploration [arXiv]
Avinandan Bose, Zhihan Xiong, Aadirupa Saha, Simon S. Du, Maryam Fazel
Preprint

Language Model Preference Evaluation with Multiple Weak Evaluators [arXiv]
Zhengyu Hu, Jieyu Zhang, Zhihan Xiong, Alexander Ratner, Hui Xiong, Ranjay Krishna
Preprint

Policy Mirror Descent with Dual Function Approximation [arXiv]
Zhihan Xiong, Maryam Fazel, Lin Xiao
Preprint

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity [arXiv] [paper]
Zhihan Xiong*, Romain Camilleri*, Maryam Fazel, Lalit Jain, Kevin Jamieson
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Conference on Digital Experimentation @ MIT (CODE@MIT), 2023

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [arXiv]
Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du
International Conference on Learning Representations (ICLR), 2024

Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement [arXiv]
Haozhe Jiang*, Qiwen Cui*, Zhihan Xiong, Maryam Fazel, Simon S. Du
International Conference on Learning Representations (ICLR), 2023

Learning in Congestion Games with Bandit Feedback [arXiv] [paper]
Qiwen Cui*, Zhihan Xiong*, Maryam Fazel, Simon S. Du
Advances in Neural Information Processing Systems (NeurIPS), 2022

Near-Optimal Randomized Exploration for Tabular Markov Decision Processes [arXiv] [paper]
Zhihan Xiong*, Ruoqi Shen*, Qiwen Cui*, Maryam Fazel, Simon S. Du
Advances in Neural Information Processing Systems (NeurIPS), 2022

Fourier Learning with Cyclical Data [paper]
Yingxiang Yang*, Zhihan Xiong*, Tianyi Liu*, Taiqing Wang, Chong Wang
International Conference on Machine Learning (ICML), 2022

Selective Sampling for Online Best-arm Identification [arXiv] [paper]
Romain Camilleri*, Zhihan Xiong*, Maryam Fazel, Lalit Jain, Kevin Jamieson
Advances in Neural Information Processing Systems (NeurIPS), 2021

Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning [arXiv] [paper]
Tian Tan*, Zhihan Xiong*, Vikranth R. Dwaracherla
Association for the Advancement of Artificial Intelligence (AAAI, Oral), 2020

Professional Experiences

Visiting Researcher, Meta (FAIR Labs), Oct 2022 -- Sep 2024
Research Intern, ByteDance (AML Group), Jun 2021 -- Sep 2021
Applied Scientist Intern, Zillow (Personalization Team), Jun 2019 -- Sep 2019

Reviewer for: ICML (2021, 2022, 2023, 2024), NeurIPS (2021, 2022, 2023) and ICLR (2022, 2023, 2024).

Teaching Experiences

CSE/EE/ME 578: Convex Optimization, Teaching Assistant, Winter 2025, University of Washington, WA
CS 229: Machine Learning, Teaching Assistant, Spring 2020, Stanford University, CA
CS 234: Reinforcement Learning, Teaching Assistant, Winter 2020, Stanford University, CA
CS 229: Machine Learning, Teaching Assistant, Autumn 2019, Stanford University, CA