Wed, Jan 04
NO CLASS
|
|
| Generic chaining and extrema of random processes |
Mon, Jan 09
Examples, applications, Gaussian processes
|
Summary
-
A random process is a collection \(\{X_t : t \in T\}\) of random variables.
For the next few lectures, we will consider symmetric processes, i.e.,
those where \(X_t\) and \(-X_t\) have the same law for every \(t \in T\).
-
We equip the index set \(T\) of such a process with the natural \(\ell_2\) metric
\[d(s,t) \seteq \sqrt{\E(X_s-X_t)^2}.\]
-
A fundamental example is that of a Gaussian process, where the family \(\{X_t : t \in T\}\)
is a jointly normal family of random variables. When \(T\) is finite, any such family can be
described as follows: \(T \subseteq \R^n\) for some \(n \geq 1\), and \(X_t = \langle g,t\rangle\),
where \(g=(g_1,\ldots,g_n)\) is a standard \(n\)-dimensional gaussian, i.e., the coordinates
\(g_1,\ldots,g_n\) are i.i.d. \(N(0,1)\) random variables.
In this case the geometry of the metric space \((T,d)\) is Euclidean, since \(d(s,t) = \|s-t\|_2\).
Also note that the metric \(d(s,t)\) completely describes the covariance structure of the process,
and therefore characterizes it (this is a special property of jointly normal families of random variables).
-
We will be concerned with extrema of such a process, and to that end we study the quantity
\[\E \max_{t \in T} X_t\,.\]
-
A random process \(\{ X_t : t \in T \}\) is called sub-gaussian if there is some constant \(c > 0\) such that for all \(s,t \in T\) and \(\lambda > 0\),
\[\begin{equation}\label{eq:sg-tail}
\Pr[X_s - X_t > \lambda] \leq \exp\left(- c\lambda^2/d(s,t)^2\right).
\end{equation}\]
In particular, Gaussian processes are sub-gaussian with \(c=1/2\).
Another example is a Bernoulli process, i.e., a family of random variables defined as follows:
For some subset \(T \subseteq \R^n\),
\[\left\{ \e_1 t_1 + \e_2 t_2 + \cdots + \e_n t_n : t = (t_1,\ldots,t_n) \in T \right\},\]
where \(\e_1,\ldots,\e_n \in \{-1,1\}\) are i.i.d. uniformly random signs.
-
Diameter bound: When the index set \(T\) is finite, we can use the tail bound \eqref{eq:sg-tail} to give a simple
upper bound on \(\E \max_{t \in T} X_t\).
Fix some \(t_0 \in T\) and note that \(\E \max_{t \in T} X_t = \E \max_{t \in T} (X_t-X_{t_0})\), since \(\E X_{t_0} = 0\).
Now applying a union bound gives
\[\P\left[\max_{t \in T} (X_t-X_{t_0}) > \lambda\right] \leq |T| \exp\left(- c\lambda^2/\Delta^2\right)\,,\]
where \(\Delta \seteq \max \{ d(t,t_0) : t \in T \}\).
A simple calculation gives the conclusion
\[\E \max_{t \in T} X_t \lesssim \Delta \sqrt{\log |T|}\,.\]
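As a quick numerical illustration (not part of the notes; the index set and parameters below are arbitrary choices), one can compare the empirical value of \(\E \max_{t \in T} X_t\) for a finite Gaussian process \(X_t = \langle g,t\rangle\) against the diameter bound \(\Delta \sqrt{\log |T|}\):

```python
# Minimal sketch: empirical E max_t <g,t> for a random finite T in R^n,
# versus the diameter bound Delta * sqrt(log |T|).  Parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, num_points, trials = 20, 200, 2000
T = rng.normal(size=(num_points, n))             # a finite index set T in R^n
t0 = T[0]
Delta = np.max(np.linalg.norm(T - t0, axis=1))   # Delta = max_t d(t, t0) = max_t ||t - t0||_2

maxima = [np.max(T @ rng.normal(size=n)) for _ in range(trials)]  # max_t <g,t> per sample of g
print("empirical E max_t X_t :", round(float(np.mean(maxima)), 3))
print("Delta * sqrt(log|T|)  :", round(Delta * np.sqrt(np.log(num_points)), 3))
```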
DETAILED PRELIMINARY NOTES
Supplementary reading:
|
Wed, Jan 11
Generic chaining and Dudley's entropy bound
|
Summary
Recall that we consider a subgaussian process \(\{X_t : t \in T\}\) and equip \(T\)
with the distance \(d(s,t) = \sqrt{\E(X_s-X_t)^2}\).
-
Chaining: Instead of applying a naive union bound to control \(X_t - X_{t_0}\) for every \(t \in T\),
we can break every such difference into parts and then bound each part. This potentially reduces
the number of events we need to take a union bound over.
For instance, given four random variables \(X_0,X_1,X_2,X_3\), we could write
\[\begin{eqnarray*}
X_3 - X_0 &=& (X_3 - X_1) + (X_1 - X_0) \\
X_2 - X_0 &=& (X_2 - X_1) + (X_1 - X_0) \\
X_1 - X_0 &=& X_1 - X_0\,.
\end{eqnarray*}\]
-
Generic chaining upper bound: For sub-gaussian processes, we can control \(\E \max_{t \in T} X_t\) using chaining.
Fix some \(t_0 \in T\) and a sequence of finite sets
\[\{t_0\} = T_0 \subseteq T_1 \subseteq T_2 \subseteq \cdots \subseteq T_n \subseteq \cdots \subseteq T\,,\]
with cardinalities bounded by \(\abs{T_n} \leq 2^{2^n}\) for every \(n \geq 1\).
Then we have:
\[\begin{equation}\label{eq:gc-ub}
\E \max_{t \in T} X_t \lesssim \max_{t \in T} \sum_{n \geq 0} 2^{n/2} d(t,T_n)\,,
\end{equation}\]
where \(d(t,T_n) = \min_{s \in T_n} d(s,t)\).
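As an illustration only (my own sketch, with the sets \(T_n\) chosen greedily by farthest-point insertion rather than optimally), one can evaluate the right-hand side of \eqref{eq:gc-ub} for a finite \(T \subset \R^n\) with the Euclidean metric:

```python
# Sketch: evaluate max_t sum_n 2^{n/2} d(t, T_n) for a greedy (non-optimal)
# choice of nested sets T_n with |T_n| <= 2^(2^n) and T_0 = {t_0}.
import numpy as np

def chaining_bound(T, depth=6):
    m = len(T)
    D = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)   # pairwise distances
    order, mind = [0], D[0].copy()          # farthest-point insertion ordering, t_0 = T[0]
    for _ in range(m - 1):
        j = int(np.argmax(mind))
        order.append(j)
        mind = np.minimum(mind, D[j])
    total = np.zeros(m)
    for lvl in range(depth + 1):
        size = 1 if lvl == 0 else min(2 ** (2 ** lvl), m)       # |T_n| <= 2^(2^n)
        Tn = order[:size]
        total += 2 ** (lvl / 2) * D[:, Tn].min(axis=1)          # 2^{n/2} d(t, T_n)
    return float(total.max())

rng = np.random.default_rng(1)
print("greedy chaining bound:", round(chaining_bound(rng.normal(size=(300, 10))), 3))
```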
-
Dudley’s entropy bound: If \(\{X_t : t \in T\}\) is a subgaussian process, then
\[\begin{equation}\label{eq:dudley}
\E \max_{t \in T} X_t \lesssim \sum_{n \geq 1} 2^{n/2} e_n(T)\,,
\end{equation}\]
where \(e_n(T)\) is the largest \(\delta > 0\) such that there are \(2^{2^n}\)
points \(y_1,y_2,\ldots,y_{2^{2^n}}\) in \(T\) with \(d(y_i,y_j) \geq \delta\) for all \(i \neq j\).
Easy exercise: Show that Dudley’s entropy bound follows from the generic chaining upper bound \eqref{eq:gc-ub}.
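For intuition, here is a rough sketch (my own, not from the lecture) that lower-bounds the quantities \(e_n(T)\) for a finite point set by farthest-point insertion; the greedy estimate is not claimed to be exact:

```python
# Sketch: greedy lower estimates of e_n(T) for a finite T in R^d.
# The first k greedily chosen points are pairwise separated by at least the
# smallest of their insertion radii, so that value lower-bounds e_n(T)
# whenever k = 2^(2^n).
import numpy as np

def insertion_radii(T):
    D = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
    mind, radii = D[0].copy(), []
    for _ in range(len(T) - 1):
        j = int(np.argmax(mind))
        radii.append(float(mind[j]))   # distance of the new point to all previous ones
        mind = np.minimum(mind, D[j])
    return radii

rng = np.random.default_rng(2)
T = rng.normal(size=(500, 8))
radii = insertion_radii(T)
for n in range(4):
    k = 2 ** (2 ** n)                  # need 2^(2^n) separated points
    if k <= len(T):
        print(f"e_{n}(T) >= {min(radii[:k - 1]):.3f}   (greedy estimate)")
```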
-
Covering numbers: Let \(N(T,d,\e)\) denote the smallest number \(N\) such that the metric space \((T,d)\)
can be covered by \(N\) balls of radius \(\e\), and consider the quantity
\[C_T \seteq \sum_{h=-\infty}^{\infty} 2^h \sqrt{\log N(T,d,2^{h})}\,.\]
Up to a factor of \(2\), this is equivalent to the more elegant expression \(\int_0^{\infty} \sqrt{\log N(T,d,\e)} \,d\e\).
It is an exercise to show that for any metric space \((T,d)\),
\[\begin{equation}\label{eq:CT}
C_T \asymp \sum_{n \geq 1} 2^{n/2} e_n(T)\,,
\end{equation}\]
i.e., the covering-number sum defining \(C_T\) and the packing sum \(\sum_{n \geq 1} 2^{n/2} e_n(T)\) agree up to universal constant factors.
-
Analysis of the probability simplex
Let \(T = \{ x \in \R^n : x_1,\ldots,x_n \geq 0, x_1+\cdots+x_n = 1 \}\) denote the
probability simplex.
Lemma 1: For this \(T\), let \(X_t = \langle g,t\rangle\), where \(g=(g_1,\ldots,g_n)\) is a standard
\(n\)-dimensional Gaussian. Then \(\E \max_{t \in T} X_t \asymp \sqrt{\log n}\).
This is because the maximum of the linear functional \(t \mapsto \langle g,t\rangle\) is achieved at a vertex of the convex body \(T\)
which is equal to the convex hull of the standard basis vectors: \(T = \mathrm{conv}(e_1,\ldots,e_n)\).
Thus \(\max_{t \in T} \langle g,t\rangle = \max \{ g_i : i =1,\ldots,n\}\), and therefore
\[\E \max_{t \in T} \langle g,t\rangle = \E \max(g_1,\ldots,g_n) \asymp \sqrt{\log n}.\]
Let us now see that the entropy bound \eqref{eq:dudley} is far from tight in this simple case.
We will use the formulation via covering numbers coming from the equivalence \eqref{eq:CT}.
Consider vectors \(t \in T\) with \(t_i \in \{0,1/k\}\) for \(i=1,\ldots,n\). There are \({n \choose k} \geq \left(\frac{n}{k}\right)^k\) such vectors.
For \(k \leq \sqrt{n}\), we have \(\left(\frac{n}{k}\right)^k \geq n^{k/2}\), so the logarithm of the number of such vectors is \(\gtrsim k \log n\),
and it is possible to choose a proportional subset of those vectors (e.g., a random subset will suffice) such that
\[d(t_i,t_j) \gtrsim \left(k \cdot \frac{1}{k^2} \right)^{1/2} \asymp k^{-1/2}, \qquad i \neq j\,.\]
It follows that for \(k \leq \sqrt{n}\), we have \(\log N(T,d,k^{-1/2}) \gtrsim k \log n\).
Consider then the \(\asymp \log n\) values of \(h\) for which \(1 \leq 2^{h} \leq n^{1/4}\), and for each such \(h\) take \(k = 2^{2h} \leq \sqrt{n}\).
This gives us \(\asymp \log n\) values of \(h\) for which
\[2^{-h} \sqrt{\log N(T,d,2^{-h})} \gtrsim 2^{-h} \cdot 2^h \sqrt{\log n} = \sqrt{\log n}\,.\]
Therefore Dudley’s bound \eqref{eq:CT} is \(\gtrsim (\log n) \cdot \sqrt{\log n} = (\log n)^{3/2}\).
(Recall that the correct bound, given by Lemma 1, is \(\sqrt{\log n}\).)
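A quick numerical check of Lemma 1 (illustration only): sampling standard Gaussians confirms that \(\E\max(g_1,\ldots,g_n)\) grows like \(\sqrt{2\log n}\) up to lower-order corrections.

```python
# Sketch: empirical E max(g_1,...,g_n) versus sqrt(2 log n).
import numpy as np

rng = np.random.default_rng(3)
for n in [10, 100, 1000]:
    G = rng.normal(size=(4000, n))
    print(n, round(float(G.max(axis=1).mean()), 3), round(float(np.sqrt(2 * np.log(n))), 3))
```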
|
Mon, Jan 16
NO CLASS (MLK day)
|
|
Wed, Jan 18
Graph and hypergraph sparsification
|
Summary
-
We have seen that if \(\{X_t : t \in T \}\) is a symmetric sub-Gaussian process equipped
with the distance \(d(s,t) = \sqrt{\E(X_s-X_t)^2}\), then for any sequence
\(\{t_0\} = T_0 \subseteq T_1 \subseteq \cdots \subseteq T_h \subseteq \cdots \subseteq T\)
with \(|T_h| \leq 2^{2^h}\), we have a chaining upper bound
\[\E \max_{t \in T} X_t \lesssim \max_{t \in T} \sum_{h \geq 0} 2^{h/2} d(t,T_h)\,.\]
Let us define \(\gamma_2(T,d)\) as the best possible chaining upper bound, i.e., the infimum of the RHS
over all admissible sequences \(\{T_h\}\) as above. This is Talagrand’s famous \(\gamma_2\) functional, and it may be
surprising that it characterizes (up to constants) the expected supremum of a Gaussian process.
Majorizing-measures theorem (Fernique-Talagrand): If \(\{X_t : t\in T \}\) is a centered Gaussian process, then
\[\E \max_{t \in T} X_t \asymp \gamma_2(T,d)\,.\]
-
Random signed sums: Suppose that \(\e_1,\e_2,\ldots,\e_n\) are i.i.d. uniformly random signs in \(\{-1,1\}\) and
we define the random process:
\[\left\{ X_t = \e_1 t_1 + \e_2 t_2 + \cdots + \e_n t_n : t \in T \right\}\,,\]
where \(T \subseteq \R^n\). Then classical concentration bounds show that \(\{X_t : t\in T\}\) is subgaussian,
and it is straightforward to calculate the distance (by expanding the square):
\[d(s,t) = \|s-t\|_2\,.\]
The chaining upper bound gives \(\E \max_{t \in T} X_t \lesssim \gamma_2(T,d)\). Let us now use this to
understand the hypergraph sparsification problem.
-
Hypergraph sparsification. Consider a weighted hypergraph \(H=(V,E,w)\) where \(\{w_e : e \in E\}\) are nonnegative edge weights.
We associate to \(H\) the quadratic expression
\[Q_H(x) = \sum_{e \in E} w_e \max_{u,v \in e} (x_u-x_v)^2\,.\]
If \(H\) were a graph (where \(|e|=2\) for every \(e \in E\)), this would correspond to the quadratic form
of the graph Laplacian.
Our goal is to find another hypergraph \(\tilde{H} = (V,\tilde{E},\tilde{w})\) with \(\tilde{E} \subseteq E\) and such that
\[\begin{equation}\label{eq:sparse-approx}
|Q_H(x)-Q_{\tilde{H}}(x)| \leq \e Q_H(x),\qquad \forall x \in \R^n\,,
\end{equation}\]
where \(\e > 0\) is some accuracy parameter. Moreover, we would like \(\abs{\tilde{E}}\) to be small, ideally near-linear in \(\abs{V}\)
(while \(\abs{E}\) could be as large as \(2^{\abs{V}}\)).
-
Independent random sampling. Suppose we have a probability distribution \(\{\mu_e : e \in E\}\). Let us form \(\tilde{H}\) by
sampling \(M\) edges \(e_1,e_2,\ldots,e_M\) independently from \(\mu\) and setting the edge weights in \(\tilde{H}\) so that
\[Q_{\tilde{H}}(x) = \frac{1}{M} \sum_{k=1}^M \frac{w_{e_k}}{\mu_{e_k}} Q_{e_k}(x)\,,\]
where we define \(Q_e(x) = \max_{u,v \in e} (x_u-x_v)^2\).
Note that \(\E[Q_{\tilde{H}}(x)] = Q_H(x)\) for all \(x \in \R^n\), and to establish a bound like \eqref{eq:sparse-approx},
it makes sense to study the quantity
\[\E \max_{Q_H(x) \leq 1} |Q_H(x) - Q_{\tilde{H}}(x)|.\]
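Here is a minimal sketch of this sampling scheme (illustration only; the sampling distribution \(\mu\) below is simply taken proportional to the weights, which is not claimed to be the right choice of importance weights):

```python
# Sketch of independent random sampling for hypergraph sparsification:
# sample M hyperedges i.i.d. from mu and reweight so that E[Q_tilde(x)] = Q_H(x).
import numpy as np

def Q_edge(x, e):
    vals = x[list(e)]
    return (vals.max() - vals.min()) ** 2       # Q_e(x) = max_{u,v in e} (x_u - x_v)^2

rng = np.random.default_rng(4)
n, num_edges, M = 30, 500, 200
edges = [tuple(rng.choice(n, size=3, replace=False)) for _ in range(num_edges)]
w = rng.random(num_edges)
mu = w / w.sum()                                # a naive sampling distribution (placeholder)

idx = rng.choice(num_edges, size=M, p=mu)       # i.i.d. samples e_1, ..., e_M from mu
x = rng.normal(size=n)
Q_H = sum(w[i] * Q_edge(x, edges[i]) for i in range(num_edges))
Q_sparse = sum(w[i] / (M * mu[i]) * Q_edge(x, edges[i]) for i in idx)
print("Q_H(x) =", round(float(Q_H), 3), "  Q_sparse(x) =", round(float(Q_sparse), 3))
```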
-
Passing to a sub-Gaussian process.
Let \(\hat{H}\) be an independent copy of \(\tilde{H}\) and write
\[\begin{eqnarray*}
\E_{\tilde{H}} \max_{Q_H(x) \leq 1} |Q_H(x) - Q_{\tilde{H}}(x)| &=& \E_{\tilde{H}} \max_{Q_H(x) \leq 1} |\E_{\hat{H}}[Q_{\hat{H}}(x)]-Q_{\tilde{H}}(x)| \\
&=& \E_{\tilde{H}} \max_{Q_H(x) \leq 1} |\E_{\hat{H}} [Q_{\hat{H}}(x)-Q_{\tilde{H}}(x)]| \\
&\leq& \E_{\hat{H},\tilde{H}} \max_{Q_H(x) \leq 1} |Q_{\hat{H}}(x)-Q_{\tilde{H}}(x)|\,,
\end{eqnarray*}\]
where we have used convexity to pull out the expectation: \(\abs{\E X} \leq \E \abs{X}\) and \(\max(\E X_1,\ldots,\E X_k) \leq \E \max(X_1,\ldots,X_k)\).
Note that for any choice of signs \(\e_1,\ldots,\e_M \in \{-1,1\}\), we have
\[\begin{eqnarray*}
|Q_{\hat{H}}(x)-Q_{\tilde{H}}(x)|
&=& \frac{1}{M} \left|\sum_{k=1}^M \left(\frac{w_{\hat{e}_k}}{\mu_{\hat{e}_k}} Q_{\hat{e}_k}(x)- \frac{w_{\tilde{e}_k}}{\mu_{\tilde{e}_k}} Q_{\tilde{e}_k}(x)\right)\right| \\
&=& \frac{1}{M} \left|\sum_{k=1}^M \e_k \left(\frac{w_{\hat{e}_k}}{\mu_{\hat{e}_k}} Q_{\hat{e}_k}(x)- \frac{w_{\tilde{e}_k}}{\mu_{\tilde{e}_k}} Q_{\tilde{e}_k}(x)\right)\right|,
\end{eqnarray*}\]
where the second equality holds in distribution because \(\hat{e}_k\) and \(\tilde{e}_k\) are i.i.d., so swapping them (the effect of \(\e_k = -1\)) does not change the law. Define \(t_k(x) \seteq \frac{w_{e_k}}{\mu_{e_k}} Q_{e_k}(x)\) for hyperedges \(e_1,\ldots,e_M\) sampled i.i.d. from \(\mu\). Averaging over i.i.d. uniformly random signs and splitting the sum with the triangle inequality gives
\[\E_{\hat{H},\tilde{H}} \max_{Q_H(x) \leq 1} |Q_{\hat{H}}(x)-Q_{\tilde{H}}(x)| \leq \frac{2}{M} \E_{(e_1,\ldots,e_M)} \E_{(\e_1,\ldots,\e_M)} \max_{Q_H(x) \leq 1} \left|\sum_{k=1}^M \e_k t_k(x)\right|
\leq \frac{4}{M} \E_{(e_1,\ldots,e_M)} \E_{(\e_1,\ldots,\e_M)} \max_{Q_H(x) \leq 1} \sum_{k=1}^M \e_k t_k(x)\,,\]
where the inner expectation is over i.i.d. uniformly random signs, and
we removed the absolute values using symmetry of the process (note that \(x=0\) satisfies \(Q_H(0) \leq 1\), so both maxima are nonnegative).
So to bound the quantity we cared about initially, it suffices to bound \(\E_{(\e_1,\ldots,\e_M)} \max_{Q_H(x) \leq 1} \sum_{k=1}^M \e_k t_k(x)\) for every choice
of hyperedges \(e_1,\ldots,e_M\). Of course, this is the expected maximum of a sub-Gaussian process, and so we are led to the study of \(\gamma_2(T,d)\),
where \(T = \{ x \in \R^n : Q_H(x) \leq 1\}\), and
\[d(x,y) = \left(\sum_{k=1}^M \left(\frac{w_{e_k}}{\mu_{e_k}} (Q_{e_k}(x)-Q_{e_k}(y))\right)^2\right)^{1/2}.\]
-
Abstracting out the norm sparsification problem.
Note that each \(\sqrt{Q_e(x)}\) for \(e \in E\) is actually a norm on \(\R^n\) (technically it is only a seminorm since \(Q_e(x)=0\) does not entail \(x=0\)).
Thus we can consider the following related problem: Suppose that \(N_1,N_2,\ldots,N_m\) are (semi)norms on \(\R^n\) and we want to bound
\[\E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2\,,\]
where \(T \subseteq \R^n\) and \(\e_1,\ldots,\e_m\) are i.i.d. random signs. Again, this is a sub-Gaussian process equipped with the distance
\[d(x,y) = \left(\sum_{k=1}^m (N_k(x)^2 - N_k(y)^2)^2\right)^{1/2}.\]
In the next lecture, we will find upper bounds on \(\gamma_2(T,d)\) for this family of problems.
Supplementary reading:
|
Mon, Jan 23
Sparsifying sums of norms
|
Summary
-
Norms on \(\R^n\). Recall that a norm \(N\) on \(\R^n\) is a nonnegative mapping \(N : \R^n \to \R_+\) such that
- (1) \(N(\lambda x) = \abs{\lambda} N(x)\) for all \(x \in \R^n\) and \(\lambda \in \R\).
- (2) \(N(x+y) \leq N(x) + N(y)\) for all \(x,y \in \R^n\).
- (3) \(N(x) = 0 \iff x = 0\).
\(N\) is a semi-norm if it only satisfies properties (1) and (2). We will consider semi-norms throughout the lecture and say “norm”
even when considering semi-norms.
-
An upper bound. Suppose that \(N_1,N_2,\ldots,N_m\) are norms on \(\R^n\) and \(T \subseteq B_2^n\), where \(B_2^n = \{ x \in \R^n : \|x\|_2 \leq 1 \}\)
is the Euclidean unit ball.
Define \(\kappa \seteq \E \max_{k=1,\ldots,m} N_k(g)\), where \(g\) is a standard \(n\)-dimensional Gaussian.
Our goal is to prove the following: If \(\e_1,\e_2,\ldots,\e_m\) are i.i.d. random signs, then
\[\begin{equation}\label{eq:goal11}
\E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim \kappa \log(n) \max_{x \in T} \sqrt{\sum_{k=1}^m N_k(x)^2}\,.
\end{equation}\]
-
Example: Operator norm for sums of random matrices. Another natural situation where a bound like this is useful is for sums of random matrices. Suppose that
\[A = \sum_{k=1}^m \e_k A_k^T A_k\,,\]
where each \(A_k^T A_k\) is a PSD matrix. Then the operator norm of \(A\) can be written as
\[\|A\|_{\mathrm{op}} = \max_{\|x\|_2 \leq 1} \langle x, A x\rangle = \max_{\|x\|_2 \leq 1} \sum_{k=1}^m \e_k \langle x, A_k^T A_k x\rangle
= \max_{\|x\|_2 \leq 1} \sum_{k=1}^m \e_k \|A_k x\|_2^2\,,\]
which is the special case of our problem where \(N_k(x) = \|A_k x\|_2\) and \(T = B_2^n\).
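A quick sanity check of this identity (illustration only, with arbitrary small dimensions): the quadratic form of \(A\) at its top eigenvector coincides with \(\sum_k \e_k \|A_k x\|_2^2\) evaluated there.

```python
# Sketch: for A = sum_k eps_k A_k^T A_k, verify <x, A x> = sum_k eps_k ||A_k x||^2
# at the top eigenvector of A.
import numpy as np

rng = np.random.default_rng(5)
n, m = 15, 40
A_list = [rng.normal(size=(3, n)) for _ in range(m)]
eps = rng.choice([-1, 1], size=m)

A = sum(e * Ak.T @ Ak for e, Ak in zip(eps, A_list))
vals, vecs = np.linalg.eigh(A)
x = vecs[:, -1]                                   # unit vector maximizing <x, A x>
check = sum(e * np.linalg.norm(Ak @ x) ** 2 for e, Ak in zip(eps, A_list))
print(round(float(vals[-1]), 6), round(float(check), 6))   # these agree
```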
-
Applying Dudley’s entropy inequality. Recall that \(\left\{ \sum_{k=1}^m \e_k N_k(x)^2 : x \in T \right\}\) is a subgaussian process when equipped with the distance
\[d(x,y) \seteq \left(\sum_{k=1}^m \left(N_k(x)^2 - N_k(y)^2\right)^2\right)^{1/2}.\]
Therefore Dudley’s entropy bound shows that
\[\begin{equation}\label{eq:dudley3}
\E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim \sum_{h \geq 0} 2^{h/2} e_h(T,d)\,.
\end{equation}\]
-
Controlling the subgaussian distance via a norm. Notice that both sides of the inequality \eqref{eq:goal11} are quadratic in the norms \(N_k\). Therefore by rescaling we may assume that
\[\begin{equation}\label{eq:L2N}
\max_{x \in T} \sqrt{\sum_{k=1}^m N_k(x)^2} = 1\,.
\end{equation}\]
Define the norm
\[\|x\|_N \seteq \max_{k=1,\ldots,m} N_k(x)\,.\]
Note that using \(a^2-b^2 = (a-b)(a+b)\), for any \(x,y \in T\), we have
\[\begin{eqnarray*}
d(x,y) &=& \left(\sum_{k=1}^m (N_k(x)-N_k(y))^2 (N_k(x)+N_k(y))^2 \right)^{1/2} \\
&\leq& \|x-y\|_N \left(\sum_{k=1}^m (N_k(x)+N_k(y))^2\right)^{1/2} \\
&\leq& 2 \|x-y\|_N\,,
\end{eqnarray*}\]
where the first inequality uses \(|N_k(x)-N_k(y)| \leq N_k(x-y)\) (true for any norm) and the second inequality uses
\eqref{eq:L2N} and \(x,y \in T\).
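As a sanity check of the first inequality above (my own illustration, with the simple seminorms \(N_k(x) = |\langle a_k, x\rangle|\)), one can verify numerically that \(d(x,y) \leq \|x-y\|_N \left(\sum_k (N_k(x)+N_k(y))^2\right)^{1/2}\); this step needs no normalization.

```python
# Sketch: check d(x,y) <= ||x-y||_N * sqrt(sum_k (N_k(x)+N_k(y))^2) for the
# seminorms N_k(x) = |<a_k, x>| on random data.
import numpy as np

rng = np.random.default_rng(6)
n, m = 10, 25
a = rng.normal(size=(m, n))
N = lambda x: np.abs(a @ x)                      # the vector (N_1(x), ..., N_m(x))

for _ in range(1000):
    x, y = rng.normal(size=n), rng.normal(size=n)
    d = np.sqrt(np.sum((N(x) ** 2 - N(y) ** 2) ** 2))
    bound = np.max(N(x - y)) * np.sqrt(np.sum((N(x) + N(y)) ** 2))
    assert d <= bound + 1e-9
print("inequality held in all trials")
```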
-
Splitting the Dudley sum. Therefore we have \(e_h(T,d) \leq 2 e_h(T, \|\cdot\|_N)\) and combining this with Dudley’s bound \eqref{eq:dudley3} gives
\[\begin{equation}\label{eq:dudley2}
\E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim \sum_{h \geq 0} 2^{h/2} e_h(T,\|\cdot\|_N)\,.
\end{equation}\]
We will split this sum into two parts depending on whether \(h \leq 4 \log (n)\) or \(h > 4 \log (n)\).
-
Controlling the large \(h\) terms with a volume bound. Note that for \(x \in T\), by \eqref{eq:L2N} we have
\[\|x\|_N \leq \sqrt{\sum_{k=1}^m N_k(x)^2} \leq 1\,.\]
In other words, \(T \subseteq B_N \seteq \{ x \in \R^n : \|x\|_N \leq 1\}\).
Therefore \(e_h(T,\|\cdot\|_N) \leq e_h(B_N,\|\cdot\|_N)\).
Claim: It holds that \(e_h(B_N,\|\cdot\|_N) \leq 4 \cdot 2^{-2^h/n}\).
This proof works for any norm on \(\R^n\).
Assume that \(x_1,\ldots,x_{s} \in B_N\) are a maximal set of points
satisfying \(\|x_i-x_j\|_N \geq \delta\) for all \(i \neq j\). By maximality,
\[B_N \subseteq \bigcup_{j=1}^s (x_j + \delta B_N),\]
i.e., \(B_N\) is covered by \(s\) balls of radius \(\delta\), though below we only use the separation.
Since the points are \(\delta\)-separated, the balls \(x_j + \frac{\delta}{2} B_N\) have pairwise disjoint interiors, and since \(\delta \leq 2\) (any two points of \(B_N\) are at \(\|\cdot\|_N\)-distance at most \(2\)), they are all contained in \(2 B_N\). Hence
\[\mathrm{vol}_n(2 B_N) \geq s \,\mathrm{vol}_n\!\left(\tfrac{\delta}{2} B_N\right) = s \,(\delta/4)^n\, \mathrm{vol}_n(2 B_N),\]
where \(\mathrm{vol}_n\) is the standard \(n\)-dimensional volume on \(\R^n\). It follows that \(s \leq (4/\delta)^{n}\), i.e.,
\(\delta \leq 4 s^{-1/n}\). Taking \(s = 2^{2^h}\) yields
\[e_h(B_N,\|\cdot\|_N) \leq 4 \cdot 2^{-2^h/n}.\]
-
Recalling \eqref{eq:dudley2}, we can use the preceding claim to evaluate
\[\sum_{h > 4 \log n} 2^{h/2} e_h(T, \|\cdot\|_N) \leq 4 \sum_{h > 4 \log n} 2^{h/2} 2^{-2^h/n} \leq O(1)\,.\]
Thus we have
\[\E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim O(1) + \sum_{0 \leq h \leq 4 \log n} 2^{h/2} e_h(T,\|\cdot\|_N)\,.\]
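For a sense of scale (illustration only), the first tail term already has \(\log_2\) value \(h/2 + 2 - 2^h/n\), which is astronomically negative once \(2^h > n^4\):

```python
# Sketch: log2 of the first tail term 2^{h/2} * 4 * 2^{-2^h/n} at h just above 4*log2(n).
import numpy as np

for n in [10, 100, 1000]:
    h = int(np.ceil(4 * np.log2(n))) + 1
    print(f"n={n}: log2(first tail term) = {h / 2 + 2 - 2 ** h / n:.3e}")
```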
-
Controlling the interesting \(h\) terms. To bound the second sum, let us use \(T \subseteq B_2^n\) to write \(e_h(T,\|\cdot\|_N) \leq e_h(B_2^n,\|\cdot\|_N)\).
(Dual) Sudakov Lemma: For any norm \(\|\cdot\|\) on \(\R^n\) and \(h \geq 0\), it holds that
\[e_h(B_2^n, \|\cdot\|) \lesssim 2^{-h/2} \E \|g\|\,.\]
Note that \(\E\, \|g\|_N = \E \max_k N_k(g) = \kappa\) (by definition), and therefore the second sum above is bounded by
\[\sum_{0 \leq h \leq 4 \log n} 2^{h/2} e_h(T,\|\cdot\|_N) \lesssim \kappa \log n\,.\]
This completes the proof of \eqref{eq:goal11} once we have established the Dual Sudakov Lemma.
-
Gaussian Shift Lemma. We need the following
Lemma: Suppose that \(K\) is a symmetric convex set and \(\gamma_n\) is the \(n\)-dimensional Gaussian measure on \(\R^n\). Then for any \(x \in \R^n\),
\[\gamma_n(K+x) \geq e^{-\|x\|_2^2/2} \gamma_n(K)\,.\]
Proof: Write
\[\begin{eqnarray*}
\gamma_n(K+x) &=& (2\pi)^{-n/2} \int_K e^{-\|x+z\|_2^2/2}\,dz \\
&=&
(2\pi)^{-n/2} \int_K \E_{\sigma \in \{-1,1\}} e^{-\|\sigma x+z\|_2^2/2}\,dz\,,
\end{eqnarray*}\]
where the second equality follows because \(K\) is symmetric (substitute \(z \mapsto -z\)). Now note that \(\E_{\sigma \in \{-1,1\}} \|\sigma x + z\|_2^2 = \|x\|_2^2 + \|z\|_2^2\)
and use convexity of \(y \mapsto e^y\) (Jensen's inequality) to write
\[\begin{eqnarray*}
(2\pi)^{-n/2} \int_K \E_{\sigma \in \{-1,1\}} e^{-\|\sigma x+z\|_2^2/2}\,dz &\geq &
(2\pi)^{-n/2} \int_K e^{-\E_{\sigma \in \{-1,1\}} \|\sigma x+z\|_2^2/2}\,dz \\
&=&
(2\pi)^{-n/2} \int_K e^{-(\|x\|_2^2+\|z\|_2^2)/2}\,dz = e^{-\|x\|_2^2/2} \gamma_n(K)\,.
\end{eqnarray*}\]
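A Monte-Carlo sanity check of the lemma (illustration only, with \(K\) a symmetric box and an arbitrary shift \(x\)):

```python
# Sketch: estimate gamma_n(K + x) and e^{-||x||^2/2} gamma_n(K) for K = [-1,1]^n.
import numpy as np

rng = np.random.default_rng(7)
n, samples = 5, 200000
x = 0.5 * rng.normal(size=n)                                # an arbitrary shift
g = rng.normal(size=(samples, n))
gamma_K = np.all(np.abs(g) <= 1.0, axis=1).mean()           # ~ gamma_n(K)
gamma_K_shift = np.all(np.abs(g - x) <= 1.0, axis=1).mean() # ~ gamma_n(K + x)

print("gamma_n(K + x)            ~", round(float(gamma_K_shift), 4))
print("e^{-||x||^2/2} gamma_n(K) ~", round(float(np.exp(-x @ x / 2) * gamma_K), 4))
```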
-
Proof of Dual Sudakov. Let \(\mathcal{B} \seteq \{ x \in \R^n : \|x\| \leq 1 \}\) denote the unit ball of the \(\|\cdot\|\) norm and suppose
\(x_1,\ldots,x_s \in B_2^n\) is a maximal set of points such that the shifted balls \(x_j + \delta \mathcal{B}\) are all pairwise disjoint.
In that case, we have
\[\begin{equation}\label{eq:Bcover}
B_2^n \subseteq \bigcup_{j=1}^s (x_j + 2 \delta \mathcal{B})\,,
\end{equation}\]
i.e., we have covered \(B_2^n\) by \(s\) balls (in \(\|\cdot\|\)) of radius at most \(2\delta\).
Let \(\lambda > 0\) be a parameter we will choose later and note that the sets \(\lambda(x_j+\delta \mathcal{B})\) are also pairwise disjoint, and therefore
\[\begin{eqnarray*}
1 &\geq& \gamma_n\left(\bigcup_{j=1}^s \lambda (x_j + \delta \mathcal{B}) \right) \\
&=& \sum_{j=1}^s \gamma_n\left(\lambda (x_j + \delta \mathcal{B}) \right) \\
&\geq& \sum_{j=1}^s e^{-\lambda^2 \|x_j\|_2^2/2} \gamma_n(\lambda \delta \mathcal{B})
\geq s \cdot e^{-\lambda^2/2} \gamma_n(\lambda \delta \mathcal{B})\,,
\end{eqnarray*}\]
where the first equality uses disjointness, the second inequality uses the Gaussian Shift Lemma, and the final inequality uses
\(x_1,\ldots,x_s \in B_2^n\).
If we define \(\lambda \seteq \frac{2}{\delta} \E\, \|g\|\), then
\[\gamma_n(\lambda \delta \mathcal{B}) = \P\left(g \in \lambda \delta \mathcal{B}\right) = \P\left(\|g\| \leq \lambda \delta\right) = \P\left(\|g\| \leq 2 \E\, \|g\|\right) \geq \frac12\,,\]
where the last step is simply Markov’s inequality.
Combining this with our previous calculation yields
\[1 \geq s \cdot \exp\left(-(2/\delta)^2 (\E\, \|g\|)^2\right) \cdot \frac12\,,\]
i.e.
\[\sqrt{\log (s/2)} \leq \frac{2}{\delta} \E\, \|g\|\,.\]
Taking \(s=2^{2^h}\) and recalling \eqref{eq:Bcover} shows that we can cover \(B_2^n\) by at most \(s\) \(\|\cdot\|\)-balls of radius \(2 \delta\) with
\[\delta \leq \frac{2\, \E\, \|g\|}{\sqrt{\log (s/2)}} \lesssim 2^{-h/2} \E\,\|g\|\,.\]
By definition, this yields our desired bound
\[e_h(B_2^n, \|\cdot\|) \lesssim 2^{-h/2} \E\,\|g\|\,.\]
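To get a feel for the quantity \(\E\,\|g\|\) appearing in the lemma (illustration only), take \(\|x\| = \max_k |\langle a_k,x\rangle|\) for unit vectors \(a_k\); then \(\E\,\|g\|\) grows roughly like \(\sqrt{2\log m}\), so the lemma predicts \(e_h(B_2^n,\|\cdot\|) \lesssim 2^{-h/2}\sqrt{\log m}\).

```python
# Sketch: Monte-Carlo estimate of E ||g|| for ||x|| = max_k |<a_k, x>| with unit a_k,
# compared with sqrt(2 log m).  (Correlations between the a_k make the estimate a bit smaller.)
import numpy as np

rng = np.random.default_rng(8)
n = 50
g = rng.normal(size=(5000, n))
for m in [10, 100, 1000]:
    a = rng.normal(size=(m, n))
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    E_norm_g = np.abs(g @ a.T).max(axis=1).mean()
    print(m, round(float(E_norm_g), 3), round(float(np.sqrt(2 * np.log(m))), 3))
```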
Supplementary reading:
|
| The KLS conjecture and stochastic localization |
Wed, Jan 25
A Cheeger inequality for isotropic convex sets
|
|
|
Mon, Jan 30
|
|
|
Wed, Feb 01
|
|
|
Mon, Feb 06
|
|
|
Wed, Feb 08
|
|
|
Mon, Feb 13
|
|
|
Wed, Feb 15
|
|
|
Wed, Feb 22
|
|
|
Mon, Feb 27
|
|
|
Wed, Mar 01
|
|
|
Mon, Mar 06
|
|
|
Wed, Mar 08
|
|