Weihao Kong 
I am currently a postdoctoral researcher at University of Washington working with Sham M. Kakade. I recently got my PhD from Stanford University advised by Gregory Valiant. My thesis can be found here. I got my bachelor degree from Shanghai Jiao Tong University and did undergraduate research with WuJun Li.
Research interests:
 Learning from a large number of heterogeneous data sources: A common modern machine learning scenario involves a large amount of data contributed by a large number of heterogeneous data sources, with each data source providing a modest amount of data. How well can we learn in this setting? To what extent can a large number of sources compensate for the lack of data from each source? What is the fundamental limit of learning?
This problem has been studied under multitask learning, metalearning, federated learning, fewshot learning, empirical Bayesian by different communities.
Metalearning for linear models: [2, 3]; Learning populations of binomial parameters: [4, 6, 11].  Estimating learnability: Without enough data to learn a good model for prediction, is it possible to tell whether a good model exists? This is surprisingly possible under linear model assumptions [8]. An analogs question can be asked in the contextual bandits setting: Without enough rounds to learn a good policy, is it possible to estimate the value of the optimal policy [5]? Such techniques can be applied for model selection in contextual bandits [FKL 19].
 Robust machine learning: How to design learning algorithms that are provably robust against a small fraction of malicious data/users? The problem becomes more and more pressing with the advancement of federated learning.
Robust metalearning and robust PCA: [2]; Robust linear regression: [7].
Recent talks:
 The 2020 Junior Theorists Workshop at Northwestern CS: MetaLearning for Mixed Linear Regression [Video]
CV  ·  Teaching  ·  Talks 
Publications

Online Model Selection for Reinforcement Learning with Function Approximation
Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill.
In submission.A metaalgorithm that adapts to the optimal model complexity with regret bound $\tilde{O}(d_*T^{2/3})$.

Robust Metalearning for Mixed Linear Regression with Small Batches [Video]
Weihao Kong, Raghav Somani, Sham M. Kakade, Sewoong Oh.
NeurIPS 2020Smooth time/sample complexity tradeoff between the number of tasks and the dataset size of each task. Algorithmic contribution includes an optimal robust PCA algorithm.

MetaLearning for Mixed Linear Regression [Video]
Weihao Kong, Raghav Somani, Zhao Song, Sham M. Kakade, Sewoong Oh.
ICML 2020First poly time/sample complexity algorithm for $k$mixture linear regression by utilizing a small set of tasks each with $\tilde{O}(\sqrt{k})$ examples.

Optimal Estimation of Change in a Population of Parameters
Ramya Korlakai Vinayak, Weihao Kong, Sham M. Kakade.
ManuscriptEstimate the distribution of treatment effects.

Sublinear Optimal Policy Value Estimation in Contextual Bandits [Video]
Weihao Kong, Gregory Valiant, Emma Brunskill.
AISTATS 2020.Estimate optimal policy value of contextual bandits with $K$ disjoint $d$dimensional arms in $\tilde{O} (\sqrt{d}K)$ rounds, minimax optimal up to log factors.

Maximum Likelihood Estimation for Learning Populations of Parameters
Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham M. Kakade.
ICML 2019.$n$ different coins, each tossed $t$ times, estimate distribution of coin biases with Wasserstein error $O(1/\sqrt{t\log n})$ when $n<2^t$, minimax optimal.

Efficient Algorithms and Lower Bounds for Robust Linear Regression
Ilias Diakonikolas*, Weihao Kong*, Alistair Stewart*.
SODA 2019Solve $\epsilon$ corrupted linear regression up to $\tilde{O}(\epsilon)$ error under gaussianity assumption, minimax optimal, previous work achieves $\sqrt{\epsilon}$.

Estimating Learnability in the Sublinear Data Regime [Code]
Weihao Kong, Gregory Valiant.
NIPS 2018Estimate optimal prediction error of linear regression/classification with $O(\sqrt{d})$ examples, minimax optimal.

Approximating the Spectrum of a Graph [Code][Video]
David CohenSteiner*, Weihao Kong*, Christian Sohler*, Gregory Valiant*.
KDD 2018.Approximate graph Laplacian specturm in constant time.

Recovering Structured Probability Matrices
Qingqing Huang*, Sham M. Kakade*, Weihao Kong*, Gregory Valiant*.
ITCS 2018. 
Learning Populations of Parameters [Video]
Kevin Tian, Weihao Kong, Gregory Valiant.
NIPS 2017.$n$ different coins, each tossed $t$ times, estimate distribution of coin biases with Wasserstein error $O(1/{t})$ when $n\ge 2^t$, minimax optimal.

Spectrum Estimation from Samples [Code]
Weihao Kong, Gregory Valiant.
Annals of Statistics, 2017.First algorithm consistently estimates eigenvalues of covariance with $o(d)$ samples.

Optimal Allocation for ChunkedReward Advertising
Weihao Kong, Jian Li, Tao Qin, TieYan Liu.
WINE 2013. 
Isotropic Hashing [Code]
Weihao Kong, WuJun Li.
NIPS 2012.[Poster] 
Manhattan Hashing for LargeScale Image Retrieval [Code]
Weihao Kong, WuJun Li, Minyi Guo.
SIGIR 2012. 
Doublebit quantization for hashing [Code]
Weihao Kong, WuJun Li.
AAAI 2012.