# Sparsifying sums of norms (Mon, Jan 23)

• Norms on $\R^n$. Recall that a norm $N$ on $\R^n$ is a nonnegative mapping $N : \R^n \to \R_+$ such that

1. $N(\lambda x) = \abs{\lambda} N(x)$ for all $x \in \R^n$ and $\lambda \in \R$.
2. $N(x+y) \leq N(x) + N(y)$ for all $x,y \in \R^n$.
3. $N(x) = 0 \iff x = 0$.

$N$ is a semi-norm if it only satisfies properties (1) and (2). We will allow semi-norms throughout the lecture and simply say “norm” even when only (1) and (2) hold.
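As a toy illustration (our own example, not from the lecture), the snippet below numerically checks properties (1) and (2) for the semi-norm $N(x) = \abs{\langle a, x\rangle}$, which fails property (3) on the hyperplane orthogonal to $a$:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)

def N(x):
    # Semi-norm N(x) = |<a, x>|: homogeneous and subadditive,
    # but it vanishes on the whole hyperplane {x : <a, x> = 0}.
    return abs(np.dot(a, x))

x, y = rng.standard_normal(5), rng.standard_normal(5)
lam = -2.7

assert np.isclose(N(lam * x), abs(lam) * N(x))   # property (1)
assert N(x + y) <= N(x) + N(y) + 1e-12           # property (2)

# Property (3) fails: a nonzero vector orthogonal to a has N = 0.
x_perp = x - (np.dot(a, x) / np.dot(a, a)) * a
assert np.linalg.norm(x_perp) > 0 and np.isclose(N(x_perp), 0.0)
```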

• An upper bound. Suppose that $N_1,N_2,\ldots,N_m$ are norms on $\R^n$ and $T \subseteq B_2^n$, where $B_2^n = \{ x \in \R^n : \|x\|_2 \leq 1 \}$ is the Euclidean unit ball.

Define $\kappa \seteq \E \max_{k=1,\ldots,m} N_k(g)$, where $g$ is a standard $n$-dimensional Gaussian.

Our goal is to prove the following: If $\e_1,\e_2,\ldots,\e_m$ are i.i.d. random signs, then

$$\label{eq:goal11} \E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim \kappa \log(n) \max_{x \in T} \sqrt{\sum_{k=1}^m N_k(x)^2}\,.$$
• Example: Operator norm for sums of random matrices. Another natural situation where a bound like this is useful is for sums of random matrices. Suppose that

$A = \sum_{k=1}^m \e_k A_k^T A_k\,,$

where each $A_k^T A_k$ is a PSD matrix. Then the operator norm of $A$ can be written as

$\|A\|_{\mathrm{op}} = \max_{\|x\|_2 \leq 1} \langle x, A x\rangle = \max_{\|x\|_2 \leq 1} \sum_{k=1}^m \e_k \langle x, A_k^T A_k x\rangle = \max_{\|x\|_2 \leq 1} \sum_{k=1}^m \e_k \|A_k x\|_2^2\,,$

which is the special case of our problem where $N_k(x) = \|A_k x\|_2$ (so that $N_k(x)^2 = \|A_k x\|_2^2$) and $T = B_2^n$.
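A quick numerical sanity check (dimensions and names are our own illustrative choices): for $A = \sum_k \e_k A_k^T A_k$, the quadratic form $\langle x, Ax\rangle$ agrees with $\sum_k \e_k \|A_k x\|_2^2$, and its maximum over $B_2^n$ is attained at the top eigenvector of $A$ (when the top eigenvalue is nonnegative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 10
A_blocks = [rng.standard_normal((3, n)) for _ in range(m)]
eps = rng.choice([-1.0, 1.0], size=m)

# A = sum_k eps_k A_k^T A_k is symmetric (not PSD, because of the signs).
A = sum(e * Ak.T @ Ak for e, Ak in zip(eps, A_blocks))

# Evaluate the quadratic form at the top eigenvector of A.
w, V = np.linalg.eigh(A)
x_top = V[:, -1]

quad = x_top @ A @ x_top
sign_sum = sum(e * np.linalg.norm(Ak @ x_top) ** 2
               for e, Ak in zip(eps, A_blocks))

assert np.isclose(quad, sign_sum)  # <x, A x> = sum_k eps_k ||A_k x||_2^2
assert np.isclose(quad, w[-1])     # the form at x_top equals lambda_max(A)
```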

• Applying Dudley’s entropy inequality. Recall that $\left\{ \sum_{k=1}^m \e_k N_k(x)^2 : x \in T \right\}$ is a subgaussian process when equipped with the distance

$d(x,y) \seteq \left(\sum_{k=1}^m \left(N_k(x)^2 - N_k(y)^2\right)^2\right)^{1/2}.$

Therefore Dudley’s entropy bound shows that

$$\label{eq:dudley3} \E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim \sum_{h \geq 0} 2^{h/2} e_h(T,d)\,.$$
• Controlling the subgaussian distance via a norm. Notice that both sides of the inequality \eqref{eq:goal11} are quadratic in the norms $N_k$. Therefore by rescaling we may assume that

$$\label{eq:L2N} \max_{x \in T} \sqrt{\sum_{k=1}^m N_k(x)^2} = 1\,.$$

Define the norm

$\|x\|_N \seteq \max_{k=1,\ldots,m} N_k(x)\,.$

Note that using $a^2-b^2 = (a-b)(a+b)$, for any $x,y \in T$, we have

$\begin{eqnarray*} d(x,y) &=& \left(\sum_{k=1}^m (N_k(x)-N_k(y))^2 (N_k(x)+N_k(y))^2 \right)^{1/2} \\ &\leq& \|x-y\|_N \left(\sum_{k=1}^m (N_k(x)+N_k(y))^2\right)^{1/2} \\ &\leq& 2 \|x-y\|_N\,, \end{eqnarray*}$

where the first inequality uses $|N_k(x)-N_k(y)| \leq N_k(x-y)$ (true for any semi-norm) and the second inequality uses the triangle inequality in $\ell_2^m$ together with \eqref{eq:L2N} and $x,y \in T$.
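Numerically, the unnormalized form of this bound reads $d(x,y) \leq \|x-y\|_N\,(S(x)+S(y))$ with $S(x) = (\sum_k N_k(x)^2)^{1/2}$; under \eqref{eq:L2N} the second factor is at most $2$. A small sketch, using an arbitrary illustrative family $N_k(x) = \|A_k x\|_2$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 8
A_blocks = [rng.standard_normal((3, n)) for _ in range(m)]

def norms(x):
    # N_k(x) = ||A_k x||_2 for each k (an illustrative family of semi-norms).
    return np.array([np.linalg.norm(Ak @ x) for Ak in A_blocks])

def d(x, y):
    # The subgaussian distance of the process.
    return np.sqrt(np.sum((norms(x) ** 2 - norms(y) ** 2) ** 2))

for _ in range(100):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    lhs = d(x, y)
    # ||x - y||_N * (S(x) + S(y)): under (eq:L2N) the second factor <= 2.
    rhs = np.max(norms(x - y)) * (np.linalg.norm(norms(x)) +
                                  np.linalg.norm(norms(y)))
    assert lhs <= rhs + 1e-9
```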

• Splitting the Dudley sum. Therefore we have $e_h(T,d) \leq 2 e_h(T, \|\cdot\|_N)$ and combining this with Dudley’s bound \eqref{eq:dudley3} gives

$$\label{eq:dudley2} \E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim \sum_{h \geq 0} 2^{h/2} e_h(T,\|\cdot\|_N)\,.$$

We will split this sum into two parts depending on whether $h \leq 4 \log (n)$ or $h > 4 \log (n)$.

• Controlling the large $h$ terms with a volume bound. Note that for $x \in T$, by \eqref{eq:L2N} we have

$\|x\|_N \leq \sqrt{\sum_{k=1}^m N_k(x)^2} \leq 1\,.$

In other words, $T \subseteq B_N \seteq \{ x \in \R^n : \|x\|_N \leq 1\}$.

Therefore $e_h(T,\|\cdot\|_N) \leq e_h(B_N,\|\cdot\|_N)$.

Claim: It holds that $e_h(B_N,\|\cdot\|_N) \leq 4 \cdot 2^{-2^h/n}$.

The proof works for any norm on $\R^n$. (For a semi-norm, one can first pass to the quotient of $\R^n$ by the subspace $\{x : \|x\|_N = 0\}$, so we may assume $\|\cdot\|_N$ is a genuine norm.)

Assume that $x_1,\ldots,x_{s} \in B_N$ are a maximal set of points satisfying $\|x_i-x_j\|_N \geq 2\delta$ for all $i \neq j$. By maximality, every point of $B_N$ lies within distance $2\delta$ of some $x_j$, i.e.,

$B_N \subseteq \bigcup_{j=1}^s (x_j + 2 \delta B_N).$

In other words, $B_N$ is covered by $s$ balls of radius $2 \delta$.

On the other hand, the balls $x_j + \delta B_N$ are pairwise disjoint (by the $2\delta$-separation), and for $\delta < 1$ they satisfy

$2 B_N \supseteq \bigcup_{j=1}^s (x_j + \delta B_N)\,,$

hence

$\mathrm{vol}_n(2 B_N) \geq s \,\mathrm{vol}_n(\delta B_N) = s \,(\delta/2)^n \mathrm{vol}_n(2 B_N),$

where $\mathrm{vol}_n$ is the standard $n$-dimensional volume on $\R^n$. It follows that $s \leq (2/\delta)^{n}$. Taking $\delta \seteq 2 \cdot 2^{-2^h/n}$ forces $s \leq 2^{2^h}$ (we may assume $2^h \geq n$, since otherwise the claimed bound exceeds the diameter of $B_N$ and is trivial), so the cover above yields

$e_h(B_N,\|\cdot\|_N) \leq 2 \delta = 4 \cdot 2^{-2^h/n}.$
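The volume argument can be sanity-checked in a tiny case (our own illustration): in $(\R^2, \|\cdot\|_\infty)$, any $2\delta$-separated subset of the unit ball has at most $(2/\delta)^n$ points.

```python
import itertools
import numpy as np

n, delta = 2, 0.5
# Exactly representable grid points of the sup-norm unit ball in R^2.
grid = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

chosen = []
for p in itertools.product(grid, repeat=n):
    p = np.array(p)
    # Greedily build a 2*delta-separated set in the sup norm.
    if all(np.max(np.abs(p - q)) >= 2 * delta for q in chosen):
        chosen.append(p)

# The packing bound from the claim: s <= (2/delta)^n = 16 here.
assert len(chosen) <= (2 / delta) ** n
```

The greedy set here is $\{-1,0,1\}^2$ ($9$ points), comfortably below the volume bound of $16$.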
• Recalling \eqref{eq:dudley2}, we can use the preceding claim to bound the large-$h$ part of the sum:

$\sum_{h > 4 \log n} 2^{h/2} e_h(T, \|\cdot\|_N) \leq 4 \sum_{h > 4 \log n} 2^{h/2} 2^{-2^h/n} \leq O(1)\,.$

Thus we have

$\E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim O(1) + \sum_{0 \leq h \leq 4 \log n} 2^{h/2} e_h(T,\|\cdot\|_N)\,.$
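The $O(1)$ bound on the tail can be verified numerically; the sketch below (taking $\log$ base $2$ for concreteness) sums the terms with $h > 4\log n$ and checks that they total at most a constant:

```python
import math

# For h > 4*log2(n) the term 2^(h/2) * 2^(-2^h / n) decays doubly
# exponentially in h, so the tail of the Dudley sum is O(1).
for n in [2, 10, 100, 1000]:
    h0 = math.ceil(4 * math.log2(n))
    tail = sum(2 ** (h / 2) * 2 ** (-(2 ** h) / n)
               for h in range(h0 + 1, h0 + 60))
    assert tail <= 1.0, (n, tail)
```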
• Controlling the interesting $h$ terms. To bound the second sum, let us use $T \subseteq B_2^n$ to write $e_h(T,\|\cdot\|_N) \leq e_h(B_2^n,\|\cdot\|_N)$.

(Dual) Sudakov Lemma: For any norm $\|\cdot\|$ on $\R^n$ and $h \geq 0$, it holds that

$e_h(B_2^n, \|\cdot\|) \lesssim 2^{-h/2} \E \|g\|\,.$

Note that $\E\, \|g\|_N = \E \max_k N_k(g) = \kappa$ (by definition), and therefore the second sum above is bounded by

$\sum_{0 \leq h \leq 4 \log n} 2^{h/2} e_h(T,\|\cdot\|_N) \lesssim \kappa \log n\,.$

This completes the proof of \eqref{eq:goal11} once we have established the Dual Sudakov Lemma.

• Gaussian Shift Lemma. We need the following

Lemma: Suppose that $K$ is a symmetric convex set and $\gamma_n$ is the $n$-dimensional Gaussian measure on $\R^n$. Then for any $x \in \R^n$,

$\gamma_n(K+x) \geq e^{-\|x\|_2^2/2} \gamma_n(K)\,.$

Proof: Write

$\begin{eqnarray*} \gamma_n(K+x) &=& (2\pi)^{-n/2} \int_K e^{-\|x+z\|_2^2/2}\,dz \\ &=& (2\pi)^{-n/2} \int_K \E_{\sigma \in \{-1,1\}} e^{-\|\sigma x+z\|_2^2/2}\,dz\,, \end{eqnarray*}$

where the second equality follows because $K$ is symmetric. Now note that $\E_{\sigma \in \{-1,1\}} \|\sigma x + z\|_2^2 = \|x\|_2^2 + \|z\|_2^2$ and use convexity of $y \mapsto e^y$ to write

$\begin{eqnarray*} (2\pi)^{-n/2} \int_K \E_{\sigma \in \{-1,1\}} e^{-\|\sigma x+z\|_2^2/2}\,dz &\geq & (2\pi)^{-n/2} \int_K e^{-\E_{\sigma \in \{-1,1\}} \|\sigma x+z\|_2^2/2}\,dz \\ &=& (2\pi)^{-n/2} \int_K e^{-(\|x\|_2^2+\|z\|_2^2)/2}\,dz = e^{-\|x\|_2^2/2} \gamma_n(K)\,. \end{eqnarray*}$
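In dimension one the lemma can be checked exactly (a sketch with our own helper names): for $K = [-a,a]$ we have $\gamma_1(K+x) = \Phi(x+a) - \Phi(x-a)$, and the inequality $\gamma_1(K+x) \geq e^{-x^2/2}\gamma_1(K)$ holds on a grid of values:

```python
import math

def Phi(t):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def gamma_shifted(a, x):
    # gamma_1(K + x) for the symmetric interval K = [-a, a].
    return Phi(x + a) - Phi(x - a)

# Check gamma_1(K + x) >= exp(-x^2/2) * gamma_1(K) for several (a, x).
for a in [0.1, 0.5, 1.0, 2.0]:
    for x in [0.0, 0.3, 1.0, 2.5, 4.0]:
        lower = math.exp(-x * x / 2) * gamma_shifted(a, 0.0)
        assert gamma_shifted(a, x) >= lower - 1e-12
```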
• Proof of Dual Sudakov. Let $\mathcal{B} \seteq \{ x \in \R^n : \|x\| \leq 1 \}$ denote the unit ball of the $\|\cdot\|$ norm and suppose $x_1,\ldots,x_s \in B_2^n$ is a maximal set of points such that the shifted balls $x_j + \delta \mathcal{B}$ are all pairwise disjoint. In that case, we have

$$\label{eq:Bcover} B_2^n \subseteq \bigcup_{j=1}^s (x_j + 2 \delta \mathcal{B})\,,$$

i.e., we have covered $B_2^n$ by $s$ balls (in $\|\cdot\|$) of radius at most $2\delta$.

Let $\lambda > 0$ be a parameter we will choose later and note that the sets $\lambda(x_j+\delta \mathcal{B})$ are also pairwise disjoint, and therefore

$\begin{eqnarray*} 1 &\geq& \gamma_n\left(\bigcup_{j=1}^s \lambda (x_j + \delta \mathcal{B}) \right) \\ &=& \sum_{j=1}^s \gamma_n\left(\lambda (x_j + \delta \mathcal{B}) \right) \\ &\geq& \sum_{j=1}^s e^{-\lambda^2 \|x_j\|_2^2/2} \gamma_n(\lambda \delta \mathcal{B}) \geq s \cdot e^{-\lambda^2/2} \gamma_n(\lambda \delta \mathcal{B})\,, \end{eqnarray*}$

where the first equality uses disjointness, the second inequality uses the Gaussian Shift Lemma, and the final inequality uses $x_1,\ldots,x_s \in B_2^n$.

If we define $\lambda \seteq \frac{2}{\delta} \E\, \|g\|$, then

$\gamma_n(\lambda \delta \mathcal{B}) = \P\left(g \in \lambda \delta \mathcal{B}\right) = \P\left(\|g\| \leq \lambda \delta\right) = \P\left(\|g\| \leq 2 \E\, \|g\|\right) \geq \frac12\,,$

where the last step is simply Markov’s inequality.

Combining this with our previous calculation yields

$1 \geq s \cdot \exp\left(-\frac{2}{\delta^2} (\E\, \|g\|)^2\right) \cdot \frac12\,,$

i.e.

$\sqrt{\log (s/2)} \leq \frac{\sqrt{2}}{\delta} \E\, \|g\| \leq \frac{2}{\delta} \E\, \|g\|\,.$

Taking $s=2^{2^h}$ and recalling \eqref{eq:Bcover} shows that we can cover $B_2^n$ by at most $s$ $\|\cdot\|$-balls of radius $2 \delta$ with

$\delta \leq \frac{2\, \E\, \|g\|}{\sqrt{\log (s/2)}} \lesssim 2^{-h/2} \E\,\|g\|\,.$

By definition, this yields our desired bound

$e_h(B_2^n, \|\cdot\|) \lesssim 2^{-h/2} \E\,\|g\|\,.$
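Finally, the Markov step in the proof ($\P(\|g\| \leq 2\,\E\|g\|) \geq 1/2$) is easy to test empirically; here is a Monte Carlo sketch with the sup norm playing the role of $\|\cdot\|$ (the dimension and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 8, 20000

# Empirical check of the Markov step: P(||g|| <= 2 E||g||) >= 1/2,
# here for the sup norm ||g||_inf of a standard Gaussian in R^8.
samples = rng.standard_normal((trials, n))
norms = np.max(np.abs(samples), axis=1)
mean_norm = norms.mean()                  # Monte Carlo estimate of E||g||
frac = np.mean(norms <= 2 * mean_norm)

assert frac >= 0.5
```

In practice the fraction is far above $1/2$; Markov's inequality is quite loose here, but $1/2$ is all the proof needs.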