Importance sampling and leverage scores

Intro to sparsification

Unbiased estimators and importance sampling

\(\ell_2\) regression and statistical leverage scores

Concentration

Once we’ve chosen our sampling probabilities \(\rho_i = \sigma_i(A)/n\), we can sample indices \(i_1, \ldots, i_M\) independently from this distribution, and we are left to analyze the random sum

\[\tilde{A} = \frac{1}{M}\left(\frac{a_{i_1} a_{i_1}^{\top}}{\rho_{i_1}} + \cdots + \frac{a_{i_M} a_{i_M}^{\top}}{\rho_{i_M}}\right).\]
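To make this concrete, here is a minimal numpy sketch (not from the lecture) on a toy Gaussian instance. It computes the leverage scores \(\sigma_i(A) = a_i^{\top} A^{-1} a_i\), assuming \(A\) is invertible so that \(\sum_i \sigma_i(A) = n\) and \(\rho_i = \sigma_i(A)/n\) is a probability distribution, and then forms \(\tilde{A}\) by averaging \(M\) reweighted rank-one terms. The helper name `sparsify` and the problem sizes are illustrative choices, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: m vectors a_1, ..., a_m in R^n, with A = sum_i a_i a_i^T.
m, n = 500, 20
a = rng.standard_normal((m, n))        # row i is a_i
A = a.T @ a                            # = sum_i a_i a_i^T

# Leverage scores sigma_i(A) = a_i^T A^{-1} a_i; since A is invertible,
# sum_i sigma_i(A) = tr(A^{-1} A) = n, so rho = sigma / n is a distribution.
sigma = np.einsum("ij,jk,ik->i", a, np.linalg.inv(A), a)
rho = sigma / n
assert np.isclose(rho.sum(), 1.0)

def sparsify(M):
    """Sample i_1, ..., i_M i.i.d. from rho, average a_i a_i^T / rho_i."""
    idx = rng.choice(m, size=M, p=rho)
    return (a[idx].T / rho[idx]) @ a[idx] / M
```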

We have already argued that \(\E[\tilde{A}] = A\), so the real question is one of concentration: how large must \(M\) be for \(\tilde{A}\) to be close to \(A\)? We will begin to address this in the next lecture.
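In the meantime, unbiasedness itself is easy to check empirically with the sketch above: averaging many independent copies of \(\tilde{A}\) recovers \(A\) up to simulation error. (This tests \(\E[\tilde{A}] = A\), not the concentration of a single sample; the trial counts below are arbitrary.)

```python
# Empirical check of E[tilde(A)] = A: average many independent copies of
# the estimator; the relative error of the average should be small.
samples = [sparsify(M=50) for _ in range(2000)]
avg = sum(samples) / len(samples)
print(np.linalg.norm(avg - A) / np.linalg.norm(A))
```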

There are two natural ways to approach this question. One is to prove concentration of the quadratic forms \(x^{\top} \tilde{A} x\) along every direction \(x \in \R^n\) simultaneously; this leads naturally to entropy bounds, covering numbers, and the theory of generic chaining. The second is to view \(\tilde{A}\) as a random operator and prove concentration for sums of independent random matrices. The former approach will be far more general, while the latter appears substantially more powerful in certain special cases.
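For a symmetric matrix, the two viewpoints meet in the spectral norm, since \(\sup_{\|x\|=1} |x^{\top}(\tilde{A}-A)x| = \|\tilde{A}-A\|\). As a quick illustration (again using the hypothetical sketch above, with arbitrary values of \(M\)), the spectral error of a single sample shrinks as \(M\) grows:

```python
# Relative spectral-norm error of a single sample tilde(A) for growing M.
for M in [100, 400, 1600, 6400]:
    err = np.linalg.norm(sparsify(M) - A, ord=2)
    print(f"M = {M:5d}: relative error {err / np.linalg.norm(A, ord=2):.3f}")
```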