Fix \(1 \leq p \leq 2\). We will now prove that if \(a_1,\ldots,a_m \in \R^n\) and \(f_i(x) = |\langle a_i,x\rangle|^p\), then the sum \(F(x) = f_1(x) + \cdots + f_m(x)\) admits \(s\)-sparse \(\e\)-approximations for
\[s \lesssim \frac{n}{\e^2} \left(\log (n/\e)\right)^p (\log n)^2\,,\]
generalizing what we saw last lecture for \(p=2\).
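Here an \(s\)-sparse \(\e\)-approximation means, as in the last lecture, a reweighted sum with at most \(s\) terms that approximates \(F\) multiplicatively: there is a \(c \in \R_+^m\) with at most \(s\) nonzero entries such that
\[(1-\e)\, F(x) \leq \sum_{i=1}^m c_i f_i(x) \leq (1+\e)\, F(x)\,,\quad \forall x \in \R^n\,.\]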
As we have already seen, this will follow from an estimate of the form
\[\begin{equation}\label{eq:lp_ent_goal} e_h(B_F, d_{\nu}) \lesssim 2^{-h/2} L\,,\quad h \geq 0\,, \end{equation}\]where \(B_F \seteq \{x \in \R^n : F(x) \leq 1\}\), the indices \(\nu_1,\ldots,\nu_M\) are chosen i.i.d. with \(\P[\nu_j = i] = \rho_i\), and
\[d_{\nu}(x,y) \seteq \left(\sum_{j=1}^M \left(\frac{f_{\nu_j}(x)-f_{\nu_j}(y)}{M \rho_{\nu_j}}\right)^2\right)^{1/2}, \qquad \tilde{F}_{\nu}(x) \seteq \sum_{j=1}^M \frac{f_{\nu_j}(x)}{M \rho_{\nu_j}}\,,\]
for some choice of sampling probabilities \((\rho_1,\ldots,\rho_m) \in \R_+^m\).
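Note that \(\tilde{F}_{\nu}\) is an unbiased estimator of \(F\): for every fixed \(x \in \R^n\),
\[\E\left[\tilde{F}_{\nu}(x)\right] = \sum_{j=1}^M \frac{1}{M} \sum_{i=1}^m \rho_i\, \frac{f_i(x)}{\rho_i} = \sum_{i=1}^m f_i(x) = F(x)\,.\]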
Let’s introduce a distance analogous to the one from our argument for \(p=2\):
\[d_{\nu,\infty}(x,y) \seteq \max_{j \in [M]} \frac{|\langle a_{\nu_j}, x-y\rangle|}{(M \rho_{\nu_j})^{1/p}}\,.\]Using the elementary inequality, valid for \(1 \leq p \leq 2\),
\[|a^p - b^p|^2 \leq 2 |a-b|^p \left(a^p + b^p\right)\,,\quad \forall a,b \geq 0\,,\]which follows from the factorization \(a^p - b^p = (a^{p/2}-b^{p/2})(a^{p/2}+b^{p/2})\), the subadditivity bound \(|a^{p/2}-b^{p/2}| \leq |a-b|^{p/2}\), and \((a^{p/2}+b^{p/2})^2 \leq 2(a^p+b^p)\), it holds that
\[d_{\nu}(x,y)^2 \leq 2 \sum_{j=1}^M \frac{|\langle a_{\nu_j}, x-y\rangle|^p \left(f_{\nu_j}(x)+f_{\nu_j}(y)\right)}{(M \rho_{\nu_j})^2} \leq 2\, d_{\nu,\infty}(x,y)^{p} \left(\tilde{F}_{\nu}(x) + \tilde{F}_{\nu}(y)\right),\]where we applied the inequality with \(a = |\langle a_{\nu_j},x\rangle|\) and \(b = |\langle a_{\nu_j},y\rangle|\) in each term, together with \(\big| |\langle a_{\nu_j},x\rangle| - |\langle a_{\nu_j},y\rangle| \big| \leq |\langle a_{\nu_j}, x-y\rangle|\). Hence for \(x,y \in B_F\),
\[d_{\nu}(x,y) \lesssim d_{\nu,\infty}(x,y)^{p/2} \left(\max_{F(x) \leq 1} \tilde{F}_{\nu}(x)\right)^{1/2}.\]Thus we need to prove a bound of the form
\[e_h(B_F, d_{\nu,\infty}^{p/2}) \lesssim 2^{-h/2} L\,,\quad h \geq 0\,.\]As we discussed last time, this is equivalent to the bound
\[\begin{equation}\label{eq:lp_cov_goal} \left(\log \mathcal{N}(B_F, d_{\nu,\infty}, \e^{2/p})\right)^{1/2} \lesssim \frac{L}{\e}\,,\quad \e > 0\,. \end{equation}\]Consider \(A : \R^n \to \R^m\) with rows \(a_1,\ldots,a_m \in \R^n\), and let us define importance scores
\[\begin{equation}\label{eq:lp_scores} \rho_i \seteq \frac{\int |\langle a_i,x\rangle|^p\, e^{-\norm{Ax}_p^p}\,dx}{\int \norm{Ax}_p^p\, e^{-\norm{Ax}_p^p}\,dx}\,,\quad i=1,\ldots,m\,, \end{equation}\]so that \(\rho_1 + \cdots + \rho_m = 1\).
Lemma: For any norm \(N\) on \(\R^n\) and \(1 \leq p \leq 2\), it holds that
\[\int e^{-N(x)^p} N(x)^p \,dx = \frac{n}{p} \int e^{-N(x)^p}\,dx\,.\]Proof:
Let \(B_N\) denote the unit ball of \(N\), and let \(\mu\) be the probability measure on \(\R^n\) with density proportional to \(e^{-N(x)^p}\). Define \(V(r) \seteq \mathrm{vol}_n(r B_N) = r^n \mathrm{vol}_n(B_N)\) so that \(\frac{d}{dr} V(r) = n r^{n-1} \mathrm{vol}_n(B_N)\). Therefore,
\[\begin{align*} \int_{\R^n} N(x)^p\,d\mu(x) &= \frac{\int_{\R^n} N(x)^p e^{-N(x)^p}\,dx}{\int_{\R^n} e^{-N(x)^p}\,dx} \\ &= \frac{\int_{0}^{\infty} r^p e^{-r^p}\, dV(r)}{\int_{0}^{\infty} e^{-r^p}\, dV(r)} = \frac{\int_{0}^{\infty} e^{-r^p} r^{n-1+p}\,dr}{\int_{0}^{\infty} e^{-r^p} r^{n-1}\,dr}\,. \end{align*}\]Make the substitution $u = r^p$, yielding
\[\int_{\R^n} N(x)^p\,d\mu(x) = \frac{\int_{0}^{\infty} e^{-u} u^{n/p}\,du}{\int_{0}^{\infty} e^{-u} u^{n/p-1}\,du} = \frac{n}{p}\,,\]where the latter equality follows from integration by parts.
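As a sanity check, take \(p=2\) and \(N = \norm{\cdot}_2\): then the probability measure with density proportional to \(e^{-\norm{x}_2^2}\) is the centered gaussian with covariance \(\frac{1}{2} I_n\), and indeed
\[\frac{\int_{\R^n} \norm{x}_2^2\, e^{-\norm{x}_2^2}\,dx}{\int_{\R^n} e^{-\norm{x}_2^2}\,dx} = \frac{n}{2} = \frac{n}{p}\,.\]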
Let \(\mathbf{Z}\) be a random variable with law \(\mu\), where \(\mu\) has density proportional to \(e^{-\norm{Ax}_p^p}\). Then the lemma tells us that
\[\begin{equation}\label{eq:rho-alt} \rho_i = \frac{p}{n} \E \left[\abs{\langle a_i,\mathbf{Z}\rangle}^p\right]\,, \quad i=1,\ldots,m\,. \end{equation}\]Check that in the case \(p=2\), it holds that \(\rho_i = \norm{(A^{\top} A)^{-1/2} a_i}_2^2/n\), i.e., the corresponding weights are precisely the statistical leverage scores.
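To verify this, note that for \(p=2\) the law of \(\mathbf{Z}\) is gaussian with covariance \(\frac{1}{2}(A^{\top} A)^{-1}\), so \eqref{eq:rho-alt} gives
\[\rho_i = \frac{2}{n}\, \E\left[\langle a_i,\mathbf{Z}\rangle^2\right] = \frac{2}{n} \cdot \frac{1}{2}\, a_i^{\top} (A^{\top} A)^{-1} a_i = \frac{1}{n} \norm{(A^{\top} A)^{-1/2} a_i}_2^2\,.\]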
In light of our choice of importance scores in \eqref{eq:lp_scores}, we can think of the following setting. Let \(U\) be a matrix with rows \(u_1,\ldots,u_M \in \R^n\). Let \(\mathbf{Z}\) be a random variable whose distribution has density proportional to \(e^{-\|A x\|_p^p}\), and assume that
\[\begin{equation}\label{eq:pth-moments} \left(\E \abs{\langle u_j, \mathbf{Z}\rangle}^p\right)^{1/p} \leq K\,,\quad j=1,\ldots,M\,. \end{equation}\]
Note that if \(u_j = a_{\nu_j}/(M \rho_{\nu_j})^{1/p}\), then \(K \leq (\frac{n}{p M})^{1/p}\) using \eqref{eq:rho-alt}.
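Indeed, in that case \eqref{eq:rho-alt} gives, for every \(j \in [M]\),
\[\E \abs{\langle u_j, \mathbf{Z}\rangle}^p = \frac{\E \abs{\langle a_{\nu_j}, \mathbf{Z}\rangle}^p}{M \rho_{\nu_j}} = \frac{(n/p)\, \rho_{\nu_j}}{M \rho_{\nu_j}} = \frac{n}{p M}\,.\]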
Our goal now is to bound the covering numbers of the set \(B_A \seteq \{x \in \R^n : \|A x\|_p \leq 1 \}\) in the metric \(d_{U}(x,y) \seteq \|U(x-y)\|_{\infty}\). Recall that we did this in the last lecture in the case \(p=2\) using the Shift Lemma for the gaussian measure. There we used the inequality
\[\frac{\norm{x+y}_2^2 + \norm{x-y}_2^2}{2} \leq \norm{x}_2^2 + \norm{y}_2^2\,,\]which actually holds with equality (the parallelogram law) for the Euclidean norm. We will need a generalization of this property.
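For the Euclidean norm, the equality is immediate from expanding the squares:
\[\norm{x+y}_2^2 + \norm{x-y}_2^2 = \left(\norm{x}_2^2 + 2\langle x,y\rangle + \norm{y}_2^2\right) + \left(\norm{x}_2^2 - 2\langle x,y\rangle + \norm{y}_2^2\right) = 2\norm{x}_2^2 + 2\norm{y}_2^2\,.\]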
Uniform smoothness.
A norm $N$ on $\R^n$ is said to be $p$-uniformly smooth with constant $S$ if it holds that
\[\begin{equation}\label{eq:p-smooth} \frac{N(x+y)^p + N(x-y)^p}{2} \leq N(x)^p + N(S y)^p,\qquad x,y \in \R^n\,. \end{equation}\]For \(1 \leq p \leq 2\), the \(\ell_p\) norm is \(p\)-uniformly smooth with constant \(1\). This reduces to the \(1\)-dimensional inequality (due to Clarkson):
\[\begin{equation}\label{eq:clarkson} \frac{|a+b|^p + |a-b|^p}{2} \leq |a|^p + |b|^p\,,\quad \forall a,b \in \R\,. \end{equation}\]To prove \eqref{eq:clarkson}, write
\[\frac{|a+b|^p + |a-b|^p}{2} \leq \left(\frac{|a+b|^2 + |a-b|^2}{2}\right)^{p/2} = \left(a^2+b^2\right)^{p/2} \leq |a|^p + |b|^p\,,\]where the first inequality is concavity of \(y \mapsto y^{q}\) for \(0 \leq q \leq 1\), and the second inequality uses $(u+v)^q \leq u^q + v^q$ for \(0 \leq q \leq 1\) and \(u,v \geq 0\), both applied with \(q = p/2\).
Uniform smoothness is a quantitative measure of how much the unit ball of a norm lacks “corners.” Consider the modulus of smoothness
\[\rho_N(t) \seteq \sup \left\{ \frac{N(x+y) + N(x-y)}{2} - 1 : N(x)=1,\ N(y) = t\right\}.\]Then \(N\) is \(p\)-uniformly smooth if \(\rho_N(t) \leq O(t^p)\) for all \(t > 0\).
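For example, for the Euclidean norm, if \(\norm{x}_2 = 1\) and \(\norm{y}_2 = t\), then
\[\frac{\norm{x+y}_2 + \norm{x-y}_2}{2} \leq \left(\frac{\norm{x+y}_2^2 + \norm{x-y}_2^2}{2}\right)^{1/2} = \sqrt{1+t^2}\,,\]with equality when \(x \perp y\), hence \(\rho_{\ell_2}(t) = \sqrt{1+t^2} - 1 \leq t^2/2\), consistent with \(2\)-uniform smoothness.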
To get some intuition for this notion, see the figure to the right. The blue arrow corresponds to \(x\) and the red arrows correspond to \(\pm y\). One of the directions is bad for the \(\ell_1\) ball, and the other for the \(\ell_{\infty}\) ball.
Note how following the red arrows takes one far outside the \(\ell_1\) (and \(\ell_{\infty}\)) ball, but almost stays inside the \(\ell_2\) ball. It turns out that \(\ell_p\) is \(p\)-uniformly smooth with constant \(1\) for \(1 \leq p \leq 2\), and \(2\)-uniformly smooth with constant \(O(\sqrt{p})\) for \(p > 2\).
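To see how a corner forces a large modulus, take the \(\ell_{\infty}\) norm on \(\R^2\) with \(x = (1,1)\) (a corner of the unit ball) and \(y = (t,-t)\), so that \(\norm{y}_{\infty} = t\). Then
\[\frac{\norm{x+y}_{\infty} + \norm{x-y}_{\infty}}{2} = \frac{(1+t) + (1+t)}{2} = 1+t\,,\]hence \(\rho_{\ell_{\infty}}(t) \geq t\), ruling out \(\rho_{\ell_{\infty}}(t) \leq O(t^p)\) for every \(p > 1\). A similar computation applies at the corners of the \(\ell_1\) ball.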
Shift Lemma for \(p\)-uniformly smooth norms:
Suppose $N$ is a norm on $\R^n$ that is $p$-uniformly smooth with constant $S$. Let \(\mu\) be the probability measure on \(\R^n\) with density proportional to \(e^{-N(x)^p}\). Then for any symmetric convex body \(W \subseteq \R^n\) and \(z\in \R^n\),
\[\begin{equation}\label{eq:shift-measure} \mu(W+z) \geq e^{- S^p N (z)^p}\mu(W)\,. \end{equation}\]Proof:
For any $z \in \R^n$, it holds that
\[\mu({W}+z) = \frac{\int_{W} \exp(-N(x+z)^p)\,dx}{\int_{W} \exp(-N(x)^p)\,dx} \mu({W})\,.\]Now we bound
\[\begin{align*} \int_{W} \exp(-N(x+z)^p)\,dx &= \int_{W} \E_{\sigma \in \{-1,1\}} \exp(-N(\sigma x+z)^p)\,dx \\ &\geq \int_{W} \exp\left(-\E_{\sigma \in \{-1,1\}} N(\sigma x+z)^p\right)\,dx \\ &\geq \int_{W} \exp\left(-(N(x)^p + S^p N(z)^p)\right)\,dx \\ &= \exp(-S^p N(z)^p) \int_{W} \exp(-N(x)^p)\,dx\,, \end{align*}\]where the equality uses symmetry of ${W}$, the first inequality uses convexity of $e^y$, and the second inequality uses $p$-uniform smoothness of $N$ (recall \eqref{eq:p-smooth}).
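Note that for \(p = 2\) and \(N = \norm{\cdot}_2\) (which is \(2\)-uniformly smooth with constant \(S = 1\)), this recovers the Shift Lemma for the gaussian measure from last lecture: if \(\mu\) has density proportional to \(e^{-\norm{x}_2^2}\), then
\[\mu(W+z) \geq e^{-\norm{z}_2^2} \mu(W)\,.\]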
Dual-Sudakov Lemma for \(p\)-smooth norms.
Let \(N\) and \(\hat{N}\) be norms on \(\R^n\). Suppose that $N$ is $p$-uniformly smooth with constant $S$, and let \(\mu\) be the probability measure on $\R^n$ with density proportional to \(e^{-N(x)^p}\). Then for any $\e > 0$,
\[\left(\log \mathcal{N}(B_{N}, \hat{N}, \e)\right)^{1/p} \lesssim \frac{S}{\e} \int \hat{N}(x)\,d\mu(x)\,.\]Proof: By scaling $\hat{N}$, we may assume that $\e = 1$. Suppose now that $x_1,\ldots,x_M \in B_{N}$ and $x_1+B_{\hat{N}},\ldots,x_M+B_{\hat{N}}$ are pairwise disjoint. To establish an upper bound on $M$, let $\lambda > 0$ be a number we will choose later and write
\[\begin{align*} 1 \geq \mu\left(\bigcup_{j=1}^M \lambda (x_j + B_{\hat{N}})\right) &= \sum_{j=1}^M \mu\left(\lambda x_j + \lambda B_{\hat{N}}\right) \\ &\geq \sum_{j=1}^M e^{-\lambda^p S^p N(x_j)^p} \mu(\lambda B_{\hat{N}}) \geq M e^{-S^p \lambda^p} \mu(\lambda B_{\hat{N}})\,, \end{align*}\]where the second inequality used the Shift Lemma \eqref{eq:shift-measure} with \(W = \lambda B_{\hat{N}}\) and \(z = \lambda x_j\), and the last inequality used $x_1,\ldots,x_M \in B_N$.
Now choose $\lambda \seteq 2 \int \hat{N}(x)\, d\mu(x)$ so that Markov’s inequality gives
\[\mu(\lambda B_{\hat{N}}) = \mu\left(\{x : \hat{N}(x) \leq \lambda \} \right) \geq 1/2\,.\]Combining with the preceding inequality yields the upper bound
\[\left(\log (M/2)\right)^{1/p} \leq S \lambda\,.\]
Control on the covering numbers.
Using the dual-Sudakov Lemma, we have
\[\left(\log \mathcal{N}(B_A, d_U, \e)\right)^{1/p} \lesssim \frac{1}{\e}\, \E \|U \mathbf{Z}\|_{\infty} = \frac{1}{\e}\, \E \max_{j \in [M]} \abs{\langle u_j, \mathbf{Z}\rangle}\,,\]where we applied the lemma with \(N(x) \seteq \norm{Ax}_p\), which is \(p\)-uniformly smooth with constant \(1\) since \(\norm{\cdot}_p\) is and \eqref{eq:p-smooth} is preserved under composition with a linear map. Since it holds that \(\E[\abs{\langle u_j,\mathbf{Z}\rangle}] \leq \left(\E \abs{\langle u_j,\mathbf{Z}\rangle}^p\right)^{1/p} \leq K\) by \eqref{eq:pth-moments}, we are left with the final piece, a concentration estimate for the random variables \(\{ \langle u_j, \mathbf{Z} \rangle \}\).
Exponential concentration.
Let’s first summarize the final piece: the density of \(\mathbf{Z}\) is proportional to \(e^{-\|Ax\|_p^p}\), which is a log-concave function, so \(\mathbf{Z}\) is a log-concave random variable. You will prove in HW#2 that the \(1\)-dimensional marginals of such a random variable always satisfy an exponential concentration inequality: For some constant \(c > 0\),
\[\P\left[\abs{\langle u_j, \mathbf{Z}\rangle} \geq t\, \E\abs{\langle u_j, \mathbf{Z}\rangle}\right] \leq e^{-c t}\,,\quad t \geq 1\,.\]
Finishing the argument.
In particular, this implies that
\[\E \max_{j \in [M]} \abs{\langle u_j, \mathbf{Z}\rangle} \lesssim \left(\max_{j \in [M]} \E \abs{\langle u_j, \mathbf{Z}\rangle}\right) \log M\,,\]which gives
\[\left(\log \mathcal{N}(B_A, d_U, \e)\right)^{1/p} \lesssim \frac{1}{\e} K \log M\,.\]Now for
\[u_j \seteq \frac{a_{\nu_j}}{M^{1/p} \rho_{\nu_j}^{1/p}}\,,\]we satisfy this with \(K \leq (n/M)^{1/p}\). Recalling \eqref{eq:lp_cov_goal}, let us write this as
\[\left(\log \mathcal{N}(B_A, d_U, \e^{2/p})\right)^{1/2} \lesssim \frac{1}{\e} \sqrt{\frac{n (\log M)^p}{M}}\,.\]Translating all the way back to our initial \eqref{eq:lp_ent_goal} gives
\[e_h(B_F, d_{\nu}) \lesssim 2^{-h/2} \sqrt{\frac{n (\log M)^p}{M}} \left(\max_{F(x) \leq 1} \tilde{F}_{\nu}(x)\right)^{1/2}\,.\]Now applying Dudley’s entropy bound as in the previous lecture gives
\[\E_{\e} \max_{F(x) \leq 1} \mag{\sum_{j=1}^M \e_j \frac{f_{\nu_j}(x)}{M \rho_{\nu_j}}} \lesssim \sqrt{\frac{n (\log M)^p}{M}} \log n\,.\]And using our symmetrization reduction, we should take \(M \asymp \frac{n}{\e^2} (\log (n/\e))^p (\log n)^2\) to obtain an \(\e\)-approximation with precisely \(M\) terms.
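As a quick check on this choice: substituting \(M \asymp \frac{n}{\e^2} (\log (n/\e))^p (\log n)^2\) into the preceding bound gives
\[\sqrt{\frac{n (\log M)^p}{M}}\, \log n \asymp \e \left(\frac{\log M}{\log (n/\e)}\right)^{p/2} \lesssim \e\,,\]since \(\log M \lesssim \log (n/\e)\).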