Norms on \(\R^n\). Recall that a norm \(N\) on \(\R^n\) is a nonnegative mapping \(N : \R^n \to \R_+\) such that (1) \(N(\lambda x) = |\lambda|\, N(x)\) for all \(\lambda \in \R\) and \(x \in \R^n\); (2) \(N(x+y) \leq N(x) + N(y)\) for all \(x,y \in \R^n\); and (3) \(N(x) = 0\) implies \(x = 0\).
\(N\) is a semi-norm if it only satisfies properties (1) and (2). We will consider semi-norms throughout the lecture and say “norm” even when considering semi-norms.
An upper bound. Suppose that \(N_1,N_2,\ldots,N_m\) are norms on \(\R^n\) and \(T \subseteq B_2^n\), where \(B_2^n = \{ x \in \R^n : \|x\|_2 \leq 1 \}\) is the Euclidean unit ball.
Define \(\kappa \seteq \E \max_{k=1,\ldots,m} N_k(g)\), where \(g\) is a standard \(n\)-dimensional Gaussian.
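For intuition, \(\kappa\) is easy to estimate by Monte Carlo. A minimal sketch, using the hypothetical coordinate semi-norms \(N_k(x) = |x_k|\) (so \(m = n\)), for which \(\kappa = \E \max_k |g_k|\) grows like \(\sqrt{2 \log n}\):

```python
import numpy as np

# kappa = E max_k N_k(g) for a standard Gaussian g. As a toy example,
# take the coordinate semi-norms N_k(x) = |x_k| (so m = n), for which
# kappa = E max_k |g_k| grows like sqrt(2 log n).
rng = np.random.default_rng(0)
n = 1000
g = rng.standard_normal((2000, n))          # 2000 Gaussian samples in R^n
kappa_est = np.abs(g).max(axis=1).mean()    # Monte Carlo estimate of kappa
print(kappa_est)
```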
Our goal is to prove the following: If \(\e_1,\e_2,\ldots,\e_m\) are i.i.d. random signs, then
\[\begin{equation}\label{eq:goal11} \E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim \kappa \log(n) \max_{x \in T} \sqrt{\sum_{k=1}^m N_k(x)^2}\,. \end{equation}\]Example: Operator norm for sums of random matrices. Another natural situation where a bound like this is useful is for sums of random matrices. Suppose that
\[A = \sum_{k=1}^m \e_k A_k^T A_k\,,\]where each \(A_k^T A_k\) is a PSD matrix. Since the signs \(\e_k\) are symmetric, \(-A\) has the same distribution as \(A\), and \(\|A\|_{\mathrm{op}} = \max\{\lambda_{\max}(A), \lambda_{\max}(-A)\}\) for symmetric \(A\); hence \(\E \|A\|_{\mathrm{op}} \leq 2\, \E \max_{\|x\|_2 \leq 1} \langle x, A x\rangle\), and it suffices to bound
\[\max_{\|x\|_2 \leq 1} \langle x, A x\rangle = \max_{\|x\|_2 \leq 1} \sum_{k=1}^m \e_k \langle x, A_k^T A_k x\rangle = \max_{\|x\|_2 \leq 1} \sum_{k=1}^m \e_k \|A_k x\|_2^2\,,\]which is the special case of our problem where \(N_k(x) = \|A_k x\|_2\) and \(T = B_2^n\).
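As a quick sanity check of this identity, the following sketch (with hypothetical random matrices \(A_k\) and signs \(\e_k\)) verifies \(\langle x, A x\rangle = \sum_k \e_k \|A_k x\|_2^2\) at a random point \(x\):

```python
import numpy as np

# Check the pointwise identity <x, A x> = sum_k eps_k ||A_k x||_2^2
# for A = sum_k eps_k A_k^T A_k, with (hypothetical) random A_k and signs.
rng = np.random.default_rng(0)
n, m = 5, 7
As = [rng.standard_normal((3, n)) for _ in range(m)]
eps = rng.choice([-1.0, 1.0], size=m)

A = sum(e * Ak.T @ Ak for e, Ak in zip(eps, As))

x = rng.standard_normal(n)
lhs = x @ A @ x
rhs = sum(e * np.linalg.norm(Ak @ x) ** 2 for e, Ak in zip(eps, As))
```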
Applying Dudley’s entropy inequality. Recall that \(\left\{ \sum_{k=1}^m \e_k N_k(x)^2 : x \in T \right\}\) is a subgaussian process when equipped with the distance
\[d(x,y) \seteq \left(\sum_{k=1}^m \left(N_k(x)^2 - N_k(y)^2\right)^2\right)^{1/2}.\]Therefore Dudley’s entropy bound shows that
\[\begin{equation}\label{eq:dudley3} \E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim \sum_{h \geq 0} 2^{h/2} e_h(T,d)\,. \end{equation}\]Controlling the subgaussian distance via a norm. Notice that both sides of the inequality \eqref{eq:goal11} are quadratic in the norms \(N_k\). Therefore by rescaling we may assume that
\[\begin{equation}\label{eq:L2N} \max_{x \in T} \sqrt{\sum_{k=1}^m N_k(x)^2} = 1\,. \end{equation}\]Define the norm
\[\|x\|_N \seteq \max_{k=1,\ldots,m} N_k(x)\,.\]Note that using \(a^2-b^2 = (a-b)(a+b)\), for any \(x,y \in T\), we have
\[\begin{eqnarray*} d(x,y) &=& \left(\sum_{k=1}^m (N_k(x)-N_k(y))^2 (N_k(x)+N_k(y))^2 \right)^{1/2} \\ &\leq& \|x-y\|_N \left(\sum_{k=1}^m (N_k(x)+N_k(y))^2\right)^{1/2} \\ &\leq& 2 \|x-y\|_N\,, \end{eqnarray*}\]where the first inequality uses \(|N_k(x)-N_k(y)| \leq N_k(x-y)\) (true for any norm) and the second inequality uses \eqref{eq:L2N} and \(x,y \in T\).
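The inequality \(d(x,y) \leq \|x-y\|_N \big(\sum_{k=1}^m (N_k(x)+N_k(y))^2\big)^{1/2}\) can also be checked numerically; a minimal sketch with the hypothetical semi-norms \(N_k(x) = |\langle a_k, x\rangle|\):

```python
import numpy as np

# Verify d(x, y) <= ||x - y||_N * sqrt(sum_k (N_k(x) + N_k(y))^2)
# for the (hypothetical) semi-norms N_k(x) = |<a_k, x>|.
rng = np.random.default_rng(1)
n, m = 6, 10
a = rng.standard_normal((m, n))

def N(x):
    # Vector of semi-norm values (N_1(x), ..., N_m(x)).
    return np.abs(a @ x)

x, y = rng.standard_normal(n), rng.standard_normal(n)
d = np.sqrt(np.sum((N(x) ** 2 - N(y) ** 2) ** 2))
bound = np.max(N(x - y)) * np.sqrt(np.sum((N(x) + N(y)) ** 2))
```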
Splitting the Dudley sum. Therefore we have \(e_h(T,d) \leq 2 e_h(T, \|\cdot\|_N)\) and combining this with Dudley’s bound \eqref{eq:dudley3} gives
\[\begin{equation}\label{eq:dudley2} \E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim \sum_{h \geq 0} 2^{h/2} e_h(T,\|\cdot\|_N)\,. \end{equation}\]We will split this sum into two parts depending on whether \(h \leq 4 \log (n)\) or \(h > 4 \log (n)\).
Controlling the large \(h\) terms with a volume bound. Note that for \(x \in T\), by \eqref{eq:L2N} we have
\[\|x\|_N \leq \sqrt{\sum_{k=1}^m N_k(x)^2} \leq 1\,.\]In other words, \(T \subseteq B_N \seteq \{ x \in \R^n : \|x\|_N \leq 1\}\).
Therefore \(e_h(T,\|\cdot\|_N) \leq e_h(B_N,\|\cdot\|_N)\).
Claim: It holds that \(e_h(B_N,\|\cdot\|_N) \leq 4 \cdot 2^{-2^h/n}\).
The proof below works for any norm on \(\R^n\).
Assume that \(x_1,\ldots,x_{s} \in B_N\) is a maximal set of points such that the balls \(x_j + \delta B_N\) are pairwise disjoint. By maximality, every point of \(B_N\) lies within \(\|\cdot\|_N\)-distance \(2\delta\) of some \(x_j\), hence
\[B_N \subseteq \bigcup_{j=1}^s (x_j + 2 \delta B_N).\]In other words, \(B_N\) is covered by \(s\) balls of radius \(2 \delta\).
Since each \(x_j \in B_N\), each ball \(x_j + \delta B_N\) is contained in \((1+\delta) B_N\); hence (for \(\delta < 1\))
\[2 B_N \supseteq \bigcup_{j=1}^s (x_j + \delta B_N)\,,\]and since these balls are pairwise disjoint,
\[\mathrm{vol}_n(2 B_N) \geq s \,\mathrm{vol}_n(\delta B_N) = s \,(\delta/2)^n\, \mathrm{vol}_n(2 B_N)\,,\]where \(\mathrm{vol}_n\) is the standard \(n\)-dimensional volume on \(\R^n\). It follows that \(s \leq (2/\delta)^{n}\), i.e., \(\delta \leq 2 s^{-1/n}\). Taking \(s = 2^{2^h}\), and recalling that \(B_N\) is covered by \(s\) balls of radius \(2\delta\), yields
\[e_h(B_N,\|\cdot\|_N) \leq 2 \delta \leq 4 \cdot 2^{-2^h/n}.\]Recalling \eqref{eq:dudley2}, we can use the preceding claim to evaluate
\[\sum_{h > 4 \log n} 2^{h/2} e_h(T, \|\cdot\|_N) \leq 4 \sum_{h > 4 \log n} 2^{h/2} 2^{-2^h/n} \leq O(1)\,.\]Thus we have
\[\E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim O(1) + \sum_{0 \leq h \leq 4 \log n} 2^{h/2} e_h(T,\|\cdot\|_N)\,.\]Controlling the interesting \(h\) terms. To bound the second sum, let us use \(T \subseteq B_2^n\) to write \(e_h(T,\|\cdot\|_N) \leq e_h(B_2^n,\|\cdot\|_N)\).
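Returning briefly to the tail sum bounded above: a numerical sketch (assuming \(\log\) is taken base \(2\); any other base only changes constants) confirming that the doubly-exponential decay of \(2^{-2^h/n}\) makes the sum negligible:

```python
import math

# Tail sum over h > 4 log2(n) of 2^(h/2) * 2^(-2^h / n): the
# doubly-exponential decay of the second factor crushes the first.
n = 64
h0 = int(4 * math.log2(n))  # = 24
total = sum(2 ** (h / 2) * 2 ** (-(2 ** h) / n) for h in range(h0 + 1, h0 + 60))
```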
(Dual) Sudakov Lemma: For any norm \(\|\cdot\|\) on \(\R^n\) and \(h \geq 0\), it holds that
\[e_h(B_2^n, \|\cdot\|) \lesssim 2^{-h/2} \E \|g\|\,.\]Note that \(\E\, \|g\|_N = \E \max_k N_k(g) = \kappa\) (by definition), and therefore the second sum above is bounded by
\[\sum_{0 \leq h \leq 4 \log n} 2^{h/2} e_h(T,\|\cdot\|_N) \lesssim \kappa \log n\,.\]This completes the proof of \eqref{eq:goal11} once we have established the Dual Sudakov Lemma.
Gaussian Shift Lemma. We need the following
Lemma: Suppose that \(K\) is a symmetric convex set and \(\gamma_n\) is the \(n\)-dimensional Gaussian measure on \(\R^n\). Then for any \(x \in \R^n\),
\[\gamma_n(K+x) \geq e^{-\|x\|_2^2/2} \gamma_n(K)\,.\]Proof: Write
\[\begin{eqnarray*} \gamma_n(K+x) &=& (2\pi)^{-n/2} \int_K e^{-\|x+z\|_2^2/2}\,dz \\ &=& (2\pi)^{-n/2} \int_K \E_{\sigma \in \{-1,1\}} e^{-\|\sigma x+z\|_2^2/2}\,dz\,, \end{eqnarray*}\]where the second equality follows because \(K\) is symmetric. Now note that \(\E_{\sigma \in \{-1,1\}} \|\sigma x + z\|_2^2 = \|x\|_2^2 + \|z\|_2^2\) and use convexity of \(y \mapsto e^y\) to write
\[\begin{eqnarray*} (2\pi)^{-n/2} \int_K \E_{\sigma \in \{-1,1\}} e^{-\|\sigma x+z\|_2^2/2}\,dz &\geq & (2\pi)^{-n/2} \int_K e^{-\E_{\sigma \in \{-1,1\}} \|\sigma x+z\|_2^2/2}\,dz \\ &=& (2\pi)^{-n/2} \int_K e^{-(\|x\|_2^2+\|z\|_2^2)/2}\,dz = e^{-\|x\|_2^2/2} \gamma_n(K)\,. \end{eqnarray*}\]Proof of Dual Sudakov. Let \(\mathcal{B} \seteq \{ x \in \R^n : \|x\| \leq 1 \}\) denote the unit ball of the \(\|\cdot\|\) norm and suppose \(x_1,\ldots,x_s \in B_2^n\) is a maximal set of points such that the shifted balls \(x_j + \delta \mathcal{B}\) are all pairwise disjoint. In that case, we have
\[\begin{equation}\label{eq:Bcover} B_2^n \subseteq \bigcup_{j=1}^s (x_j + 2 \delta \mathcal{B})\,, \end{equation}\]i.e., we have covered \(B_2^n\) by \(s\) balls (in \(\|\cdot\|\)) of radius at most \(2\delta\).
Let \(\lambda > 0\) be a parameter we will choose later and note that the sets \(\lambda(x_j+\delta \mathcal{B})\) are also pairwise disjoint, and therefore
\[\begin{eqnarray*} 1 &\geq& \gamma_n\left(\bigcup_{j=1}^s \lambda (x_j + \delta \mathcal{B}) \right) \\ &=& \sum_{j=1}^s \gamma_n\left(\lambda (x_j + \delta \mathcal{B}) \right) \\ &\geq& \sum_{j=1}^s e^{-\lambda^2 \|x_j\|_2^2/2} \gamma_n(\lambda \delta \mathcal{B}) \geq s \cdot e^{-\lambda^2/2} \gamma_n(\lambda \delta \mathcal{B})\,, \end{eqnarray*}\]where the first equality uses disjointness, the second inequality uses the Gaussian Shift Lemma, and the final inequality uses \(x_1,\ldots,x_s \in B_2^n\).
If we define \(\lambda \seteq \frac{2}{\delta} \E\, \|g\|\), then
\[\gamma_n(\lambda \delta \mathcal{B}) = \P\left(g \in \lambda \delta \mathcal{B}\right) = \P\left(\|g\| \leq \lambda \delta\right) = \P\left(\|g\| \leq 2 \E\, \|g\|\right) \geq \frac12\,,\]where the last step is simply Markov's inequality: \(\P\left(\|g\| > 2 \E\, \|g\|\right) \leq \frac{\E\, \|g\|}{2 \E\, \|g\|} = \frac12\).
Combining this with our previous calculation yields
\[1 \geq s \cdot \exp\left(-\frac{(2/\delta)^2 (\E\, \|g\|)^2}{2}\right) \cdot \frac12\,,\]i.e.
\[\sqrt{\log (s/2)} \leq \frac{\sqrt{2}}{\delta}\, \E\, \|g\|\,.\]Taking \(s=2^{2^h}\), so that \(\log(s/2) = (2^h-1)\log 2 \gtrsim 2^h\) for \(h \geq 1\), and recalling \eqref{eq:Bcover} shows that we can cover \(B_2^n\) by at most \(s\) \(\|\cdot\|\)-balls of radius \(2 \delta\) with
\[\delta \leq \frac{\sqrt{2}\, \E\, \|g\|}{\sqrt{\log (s/2)}} \lesssim 2^{-h/2} \E\,\|g\|\,.\]By definition, this yields our desired bound
\[e_h(B_2^n, \|\cdot\|) \lesssim 2^{-h/2} \E\,\|g\|\,.\]
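As a final sanity check, the Gaussian Shift Lemma can be verified exactly for an axis-aligned box, since there \(\gamma_n\) factors into one-dimensional Gaussian probabilities computable via the error function:

```python
import math

# Exact check of the Gaussian Shift Lemma for an axis-aligned box:
# K = [-1,1]^2 (symmetric convex), shift x = (1, 0), so K + x = [0,2] x [-1,1].
def phi_interval(a, b):
    """P(a <= Z <= b) for a standard normal Z."""
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))

gamma_K = phi_interval(-1, 1) ** 2                        # gamma_2(K)
gamma_K_shift = phi_interval(0, 2) * phi_interval(-1, 1)  # gamma_2(K + x)
lower = math.exp(-1.0 / 2) * gamma_K                      # e^{-||x||_2^2/2} gamma_2(K)
```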