$\ell_p$ sparsification for $1 \leq p \leq 2$ and uniform smoothness

Fix \(1 \leq p \leq 2\). We will now prove that if \(a_1,\ldots,a_m \in \R^n\) and \(f_i(x) = |\langle a_i,x\rangle|^p\), then the sum \(F(x) = f_1(x) + \cdots + f_m(x)\) admits \(s\)-sparse \(\e\)-approximations for

$$ s \lesssim \frac{n}{\e^2} \left(\log \frac{n}{\e}\right)^{O(1)}, $$

generalizing what we saw last lecture for \(p=2\).

As we have already seen, this will follow from an estimate of the form

$$\begin{equation}\label{eq:lp_ent_goal} e_h(B_F, d_{\nu}) \lesssim 2^{-h/2} \sqrt{\frac{n}{M}} \left(\log(M) \log(n)\right)^{O(1)} \left(\max_{F(x) \leq 1} \tilde{F}_{\nu}(x)\right)^{1/2}\,, \end{equation} $$


$$\begin{align*} B_F &= \{ x \in \R^n : \|Ax\|_p \leq 1 \} \\ \tilde{F}_{\nu}(x) &= \sum_{j=1}^M \frac{f_{\nu_j}(x)}{M \rho_{\nu_j}} \\ d_{\nu}(x,y) &= \left(\sum_{j=1}^M \left(\frac{f_{\nu_j}(x)-f_{\nu_j}(y)}{M \rho_{\nu_j}}\right)^2\right)^{1/2}\,, \end{align*} $$

for some choice of sampling probabilities \((\rho_1,\ldots,\rho_m) \in \R_+^m\).

Reduction to the consideration of \(d_{\nu,\infty}\)

$\ell_p$ importances from averages

Consider \(A : \R^n \to \R^m\) with rows \(a_1,\ldots,a_m \in \R^n\), and let us define importance scores

$$\begin{equation}\label{eq:lp_scores} \rho_i \seteq \frac{\int e^{-\norm{Ax}_p^p} \abs{\langle a_i,x\rangle}^p\,dx}{\int e^{-\norm{Ax}_p^p} \norm{Ax}_p^p\,dx}\,. \end{equation} $$

The shift lemma and uniform smoothness

In light of our choice of important scores in \eqref{eq:lp_scores}, we can think of the following setting. Let \(U\) be a matrix with rows \(u_1,\ldots,u_M \in \R^n\). Let \(\mathbf{Z}\) be a random variable whose distribution has density proportional to \(e^{-\|A x\|_p^p}\), and assume that

$$\begin{equation}\label{eq:pth-moments} \left(\E\left[\abs{\langle u_j, \mathbf{Z}\rangle}^p\right]\right)^{1/p} \leq K\,,\quad j=1,\ldots,M\,. \end{equation} $$

Note that if \(u_j = a_{\nu_j}/(M \rho_{\nu_j})^{1/p}\), then \(K \leq (\frac{n}{p M})^{1/p}\) using \eqref{eq:rho-alt}.

Our goal now is to bound the covering numbers of the set \(B_A \seteq \{x \in \R^n : \|A x\|_p \leq 1 \}\) in the metric \(d_{U}(x,y) \seteq \|U(x-y)\|_{\infty}\). Recall that we did this in the last lecture in the case \(p=2\) using the Shift Lemma for the gaussian measure. There we used the inequality

$$ \frac{\|x + z\|_2^2 +\|x-z\|_2^2}{2} \leq \|x\|_2^2 + \|z\|_2^2\,, $$

which actually holds with equality (by the Pythagorean theorem) for the Euclidean norm. We will need a generalization of this property.

\(\psi_1\) concentration for one-dimensional marginals of log-concave measures