CSE 599: Analytic and geometric methods in TCS

Winter 2023

MW 11:30am-12:50pm in CSE2 387

Instructor: James R. Lee

Office hours: By appointment

Course email list [archives]

Course description:

This is an advanced graduate-level theory course about tools from analysis and geometry that have interesting applications in algorithms, complexity theory, and ML theory. Evaluation will be based on a small number of homeworks and student presentations. We will cover roughly one topic per week; the choice of topics will be guided by class interest.

I taught a version of this course back in prehistory. The set of topics will be mostly disjoint.

Possible topics:

  • High-dimensional expanders
  • Stochastic localization
  • Metric embedding theory
  • Generic chaining theory
  • Quantum information
  • Discrete Fourier analysis
  • Algebraic topology
  • ML theory, e.g.,
    • The edge of stability phenomenon
    • Stable diffusion, Doob interpolation in feature space

Lectures

  • Wed, Jan 04: NO CLASS
  • Generic chaining and extrema of random processes
  • The KLS conjecture and stochastic localization

Wed, Jan 04
NO CLASS

Generic chaining and extrema of random processes
Mon, Jan 09
Examples, applications, Gaussian processes

  • A random process is a collection \(\{X_t : t \in T\}\) of random variables. For the next few lectures, we will consider symmetric processes, i.e., those where the families \(\{X_t : t \in T\}\) and \(\{-X_t : t \in T\}\) have the same joint law.

  • We equip the index set \(T\) of such a process with the natural \(\ell_2\) metric

    \[d(s,t) \seteq \sqrt{\E(X_s-X_t)^2}.\]
  • A fundamental example is that of a Gaussian process, where the family \(\{X_t : t \in T\}\) is a jointly normal family of random variables. When \(T\) is finite, any such family can be described as follows: \(T \subseteq \R^n\) for some \(n \geq 1\), and \(X_t = \langle g,t\rangle\), where \(g=(g_1,\ldots,g_n)\) is a standard \(n\)-dimensional gaussian, i.e., the coordinates \(g_1,\ldots,g_n\) are i.i.d. \(N(0,1)\) random variables.

    In this case the geometry of the metric space \((T,d)\) is Euclidean, since \(d(s,t) = \|s-t\|_2\).

    Also note that the metric \(d(s,t)\) completely describes the covariance structure of the process, and therefore characterizes it (this is a special property of jointly normal families of random variables).

  • We will be concerned with extrema of such a process, and to that end we study the quantity

    \[\E \max_{t \in T} X_t\,.\]
  • A random process \(\{ X_t : t \in T \}\) is called sub-gaussian if there is some constant \(c > 0\) such that for all \(s,t \in T\) and \(\lambda > 0\),

    \[\begin{equation}\label{eq:sg-tail} \Pr[X_s - X_t > \lambda] \leq \exp\left(- c\lambda^2/d(s,t)^2\right) \end{equation}\]

    In particular, Gaussian processes are sub-gaussian with \(c=1/2\).

    Another example is a Bernoulli process, i.e., a family of random variables defined as follows: for some subset \(T \subseteq \R^n\),

    \[\left\{ \e_1 t_1 + \e_2 t_2 + \cdots + \e_n t_n : t = (t_1,\ldots,t_n) \in T \right\},\]

    where \(\e_1,\ldots,\e_n \in \{-1,1\}\) are i.i.d. uniformly random signs.

  • Diameter bound: When the index set \(T\) is finite, we can use the tail bound \eqref{eq:sg-tail} to give a simple upper bound on \(\E \max_{t \in T} X_t\).

    Fix some \(t_0 \in T\) and note that \(\E \max_{t \in T} X_t = \E \max_{t \in T} (X_t-X_{t_0})\). Now applying a union bound gives

    \[\P\left[\max_{t \in T} (X_t-X_{t_0}) > \lambda\right] \leq |T| \exp\left(- c\lambda^2/\Delta^2\right)\,,\]

    where \(\Delta \seteq \max \{ d(t,t_0) : t \in T \}\).

    A simple calculation gives the conclusion

    \[\E \max_{t \in T} X_t \lesssim \Delta \sqrt{\log |T|}\,.\]
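
As a sanity check on the diameter bound, here is a minimal Monte Carlo sketch (not part of the lecture; the point set \(T\), its size, and the dimension are arbitrary illustrative choices) comparing \(\E \max_{t \in T} X_t\) for a finite Gaussian process \(X_t = \langle g,t\rangle\) against \(\Delta \sqrt{\log |T|}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, trials = 50, 200, 2000       # ambient dimension, |T|, number of Monte Carlo samples

# A finite Gaussian process: T is a random point set in R^n and X_t = <g, t>.
T = rng.normal(size=(m, n))

t0 = T[0]                                        # the fixed reference point t_0
Delta = np.linalg.norm(T - t0, axis=1).max()     # Delta = max_t d(t, t_0) = max_t ||t - t_0||_2

G = rng.normal(size=(trials, n))                 # i.i.d. standard Gaussian vectors g
emax = (G @ T.T).max(axis=1).mean()              # Monte Carlo estimate of E max_t X_t

print(f"E max_t X_t          ~ {emax:.2f}")
print(f"Delta * sqrt(log|T|) = {Delta * np.sqrt(np.log(m)):.2f}")   # diameter bound, up to the constant
```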

DETAILED PRELIMINARY NOTES

Supplementary reading:

Wed, Jan 11
Generic chaining and Dudley's entropy bound

Recall that we consider a subgaussian process \(\{X_t : t \in T\}\) and equip \(T\) with the distance \(d(s,t) = \sqrt{\E(X_s-X_t)^2}\).

  • Chaining: Instead of applying a naive union bound to control \(X_t - X_{t_0}\) for every \(t \in T\), we can break every such difference into parts and then bound each part. This potentially reduces the number of events we need to take a union bound over.

    For instance, given four random variables \(X_0,X_1,X_2,X_3\), we could write

    \[\begin{eqnarray*} X_3 - X_0 &=& (X_3 - X_1) + (X_1 - X_0) \\ X_2 - X_0 &=& (X_2 - X_1) + (X_1 - X_0) \\ X_1 - X_0 &=& X_1 - X_0\,. \end{eqnarray*}\]
  • Generic chaining upper bound: For sub-gaussian processes, we can control \(\E \max_{t \in T} X_t\) using chaining.

    Fix some \(t_0 \in T\) and a sequence of finite sets

    \[\{t_0\} = T_0 \subseteq T_1 \subseteq T_2 \subseteq \cdots \subseteq T_n \subseteq \cdots \subseteq T\,,\]

    with cardinalities bounded by \(\abs{T_n} \leq 2^{2^n}\) for every \(n \geq 1\).

    Then we have:

    \[\begin{equation}\label{eq:gc-ub} \E \max_{t \in T} X_t \lesssim \max_{t \in T} \sum_{n \geq 0} 2^{n/2} d(t,T_n)\,, \end{equation}\]

    where \(d(t,T_n) = \min_{s \in T_n} d(s,t)\).

  • Dudley’s entropy bound: If \(\{X_t : t \in T\}\) is a subgaussian process, then

    \[\begin{equation}\label{eq:dudley} \E \max_{t \in T} X_t \lesssim \sum_{n \geq 1} 2^{n/2} e_n(T)\,, \end{equation}\]

    where \(e_n(T)\) is the largest \(\delta\) such that there are \(2^{2^n}\) points \(y_1,y_2,\ldots,y_{2^{2^n}}\) in \(T\) with \(d(y_i,y_j) \geq \delta\) for all \(i \neq j\).

    Easy exercise: Show that Dudley’s entropy bound follows from the generic chaining upper bound \eqref{eq:gc-ub}.

  • Covering numbers: Let \(N(T,d,\e)\) denote the smallest number \(N\) such that the metric space \((T,d)\) can be covered by \(N\) balls of radius \(\e\), and consider the quantity

    \[C_T \seteq \sum_{h=-\infty}^{\infty} 2^h \sqrt{\log N(T,d,2^{h})}\,.\]

    Up to a factor of \(2\), this is equivalent to the more elegant expression \(\int_0^{\infty} \sqrt{\log N(T,d,\e)} \,d\e\).

    Exercise: Show that for any metric space \((T,d)\), it holds that

    \[\begin{equation}\label{eq:CT} \sum_{n \geq 1} 2^{n/2} e_n(T) \asymp \sum_{h=-\infty}^{\infty} 2^h \sqrt{\log N(T,d,2^h)}\,. \end{equation}\]
  • Analysis of the probability simplex

    Let \(T = \{ x \in \R^n : x_1,\ldots,x_n \geq 0, x_1+\cdots+x_n = 1 \}\) denote the probability simplex.

    Lemma 1: For this \(T\), let \(X_t = \langle g,t\rangle\), where \(g=(g_1,\ldots,g_n)\) is a standard \(n\)-dimensional Gaussian. Then \(\E \max_{t \in T} X_t \asymp \sqrt{\log n}\).

    This is because the maximum of the linear functional \(t \mapsto \langle g,t\rangle\) is achieved at a vertex of the convex body \(T\) which is equal to the convex hull of the standard basis vectors: \(T = \mathrm{conv}(e_1,\ldots,e_n)\). Thus \(\max_{t \in T} \langle g,t\rangle = \max \{ g_i : i =1,\ldots,n\}\), and therefore

    \[\E \max_{t \in T} \langle g,t\rangle = \E \max(g_1,\ldots,g_n) \asymp \sqrt{\log n}.\]

    Let us now see that the entropy bound \eqref{eq:dudley} is far from tight in this simple case. We will use the formulation via covering numbers coming from the equivalence \eqref{eq:CT}.

    Consider vectors \(t \in T\) with exactly \(k\) coordinates equal to \(1/k\) and the rest equal to \(0\). There are \({n \choose k} \geq \left(\frac{n}{k}\right)^k\) such vectors, and for \(k \leq \sqrt{n}\) this is at least \(n^{k/2}\), i.e., \(2^{\Omega(k \log n)}\) vectors. Among them one can choose \(2^{\Omega(k \log n)}\) vectors (e.g., a random subset will suffice) whose supports pairwise differ in \(\Omega(k)\) coordinates, so that any two chosen vectors \(s \neq t\) satisfy

    \[d(s,t) \gtrsim \left(k \cdot \frac{1}{k^2} \right)^{1/2} \asymp k^{-1/2}\,.\]

    It follows that for \(k \leq \sqrt{n}\), we have \(\log N(T,d,c k^{-1/2}) \gtrsim k \log n\) for some constant \(c > 0\). Taking \(k = 2^{2h}\), so that \(k^{-1/2} = 2^{-h}\), the condition \(k \leq \sqrt{n}\) holds for the \(\approx \log n\) values of \(h\) with \(1 \leq 2^h \leq n^{1/4}\).

    For each such \(h\),

    \[2^{-h} \sqrt{\log N(T,d,2^{-h})} \gtrsim 2^{-h} \cdot 2^h \sqrt{\log n} = \sqrt{\log n}\,.\]

    Therefore the right-hand side of Dudley’s bound \eqref{eq:dudley}, equivalently \eqref{eq:CT}, is \(\gtrsim (\log n) \cdot \sqrt{\log n} = (\log n)^{3/2}\). (Recall that the correct bound, given by Lemma 1, is \(\sqrt{\log n}\).)
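
As a numerical companion to Lemma 1 (a minimal Monte Carlo sketch, not part of the lecture), one can check that \(\E \max_{t \in T} \langle g,t\rangle = \E \max_i g_i\) indeed grows like \(\sqrt{\log n}\), while the \((\log n)^{3/2}\) rate of Dudley's bound grows noticeably faster:

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 2000
for n in [10, 100, 1000]:
    g = rng.normal(size=(trials, n))
    emax = g.max(axis=1).mean()        # E max_{t in simplex} <g,t> = E max_i g_i
    print(f"n = {n:5d}   E max_i g_i ~ {emax:4.2f}   "
          f"sqrt(2 log n) = {np.sqrt(2 * np.log(n)):4.2f}   "
          f"(log n)^1.5 = {np.log(n) ** 1.5:5.2f}")
```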

Mon, Jan 16
NO CLASS (MLK day)

Wed, Jan 18
Graph and hypergraph sparsification

  • We have seen that if \(\{X_t : t \in T \}\) is a symmetric sub-Gaussian process equipped with the distance \(d(s,t) = \sqrt{\E(X_s-X_t)^2}\), then for any sequence \(\{t_0\} = T_0 \subseteq T_1 \subseteq \cdots \subseteq T_h \subseteq \cdots \subseteq T\) with \(|T_h| \leq 2^{2^h}\), we have a chaining upper bound

    \[\E \max_{t \in T} X_t \lesssim \max_{t \in T} \sum_{h \geq 0} 2^{h/2} d(t,T_h)\,.\]

    Let us define \(\gamma_2(T,d)\) as the best possible chaining upper bound, i.e., the infimum of the RHS over all admissible sequences \(\{T_h\}\) (those with \(|T_h| \leq 2^{2^h}\)). This is Talagrand’s famous \(\gamma_2\) functional, and it may be surprising that it characterizes (up to constants) the expected supremum of a Gaussian process.

    Majorizing-measures theorem (Fernique-Talagrand): If \(\{X_t : t\in T \}\) is a centered Gaussian process, then

    \[\E \max_{t \in T} X_t \asymp \gamma_2(T,d)\,.\]
  • Random signed sums: Suppose that \(\e_1,\e_2,\ldots,\e_n\) are i.i.d. uniformly random signs (as in the Bernoulli process above) and we define the random process:

    \[\left\{ X_t = \e_1 t_1 + \e_2 t_2 + \cdots + \e_n t_n : t \in T \right\}\,,\]

    where \(T \subseteq \R^n\). Then classical concentration bounds (e.g., Hoeffding's inequality) show that \(\{X_t : t\in T\}\) is subgaussian, and it is straightforward to calculate the distance (by expanding the square):

    \[d(s,t) = \|s-t\|_2\,.\]

    The chaining upper bound gives \(\E \max_{t \in T} X_t \lesssim \gamma_2(T,d)\). Let us now use this to understand the hypergraph sparsification problem.

  • Hypergraph sparsification. Consider a weighted hypergraph \(H=(V,E,w)\) where \(\{w_e : e \in E\}\) are nonnegative edge weights. We associate to \(H\) the quadratic expression

    \[Q_H(x) = \sum_{e \in E} w_e \max_{u,v \in e} (x_u-x_v)^2\,.\]

    If \(H\) were a graph (where \(|e|=2\) for every \(e \in E\)), this would correspond to the quadratic form of the graph Laplacian.

    Our goal is to find another hypergraph \(\tilde{H} = (V,\tilde{E},\tilde{w})\) with \(\tilde{E} \subseteq E\) and such that

    \[\begin{equation}\label{eq:sparse-approx} |Q_H(x)-Q_{\tilde{H}}(x)| \leq \e Q_H(x),\qquad \forall x \in \R^n\,, \end{equation}\]

    where \(\e > 0\) is some accuracy parameter. Moreover, we would like \(\abs{\tilde{E}}\) to be small, ideally near-linear in \(\abs{V}\) (while \(\abs{E}\) could be as large as \(2^{\abs{V}}\)).

  • Independent random sampling. Suppose we have a probability distribution \(\{\mu_e : e \in E\}\). Let us form \(\tilde{H}\) by sampling \(M\) edges \(e_1,e_2,\ldots,e_M\) independently from \(\mu\) and setting the edge weights in \(\tilde{H}\) so that

    \[Q_{\tilde{H}}(x) = \frac{1}{M} \sum_{k=1}^M \frac{w_{e_k}}{\mu_{e_k}} Q_{e_k}(x)\,,\]

    where we define \(Q_e(x) = \max_{u,v \in e} (x_u-x_v)^2\).

    Note that \(\E[Q_{\tilde{H}}(x)] = Q_H(x)\) for all \(x \in \R^n\), and to establish a bound like \eqref{eq:sparse-approx}, it makes sense to study the quantity

    \[\E \max_{Q_H(x) \leq 1} |Q_H(x) - Q_{\tilde{H}}(x)|.\]
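
    (A small numerical sketch of this sampling scheme appears at the end of these notes.)
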
  • Passing to a sub-Gaussian process. Let \(\hat{H}\) be an independent copy of \(\tilde{H}\) and write

    \[\begin{eqnarray*} \E_{\tilde{H}} \max_{Q_H(x) \leq 1} |Q_H(x) - Q_{\tilde{H}}(x)| &=& \E_{\tilde{H}} \max_{Q_H(x) \leq 1} |\E_{\hat{H}}[Q_{\hat{H}}(x)]-Q_{\tilde{H}}(x)| \\ &=& \E_{\tilde{H}} \max_{Q_H(x) \leq 1} |\E_{\hat{H}} [Q_{\hat{H}}(x)-Q_{\tilde{H}}(x)]| \\ &\leq& \E_{\hat{H},\tilde{H}} \max_{Q_H(x) \leq 1} |Q_{\hat{H}}(x)-Q_{\tilde{H}}(x)|\,, \end{eqnarray*}\]

    where we have used convexity to pull out the expectation: \(\abs{\E X} \leq \E \abs{X}\) and \(\max(\E X_1,\ldots,\E X_k) \leq \E \max(X_1,\ldots,X_k)\).

    Note that for any fixed choice of signs \(\e_1,\ldots,\e_M \in \{-1,1\}\), the two expressions below have the same law (as processes indexed by \(x\), over the randomness of the sampled edges):

    \[\begin{eqnarray*} Q_{\hat{H}}(x)-Q_{\tilde{H}}(x) &=& \frac{1}{M} \sum_{k=1}^M \left(\frac{w_{\hat{e}_k}}{\mu_{\hat{e}_k}} Q_{\hat{e}_k}(x)- \frac{w_{\tilde{e}_k}}{\mu_{\tilde{e}_k}} Q_{\tilde{e}_k}(x)\right) \\ &\stackrel{d}{=}& \frac{1}{M} \sum_{k=1}^M \e_k \left(\frac{w_{\hat{e}_k}}{\mu_{\hat{e}_k}} Q_{\hat{e}_k}(x)- \frac{w_{\tilde{e}_k}}{\mu_{\tilde{e}_k}} Q_{\tilde{e}_k}(x)\right), \end{eqnarray*}\]

    because for each \(k\), flipping the sign \(\e_k\) amounts to swapping \(\hat{e}_k\) and \(\tilde{e}_k\), which leaves the joint law invariant. Define \(t_k(x) \seteq \frac{w_{e_k}}{\mu_{e_k}} Q_{e_k}(x)\) for a sampled edge \(e_k\). Averaging over i.i.d. uniformly random signs and splitting the difference into its two (identically distributed) halves via the triangle inequality gives

    \[\E_{\hat{H},\tilde{H}} \max_{Q_H(x) \leq 1} |Q_{\hat{H}}(x)-Q_{\tilde{H}}(x)| \leq \frac{2}{M}\, \E \max_{Q_H(x) \leq 1} \left|\sum_{k=1}^M \e_k t_k(x)\right| \leq \frac{4}{M}\, \E \max_{Q_H(x) \leq 1} \sum_{k=1}^M \e_k t_k(x)\,,\]

    where the expectations on the right are over both the sampled edges \(e_1,\ldots,e_M\) and the signs; the absolute values are removed in the last step using symmetry of the process (and the fact that the maximum is nonnegative, since \(x=0\) is feasible).

    So now to bound the quantity we cared about initially, it suffices to bound \(\E_{\e} \max_{Q_H(x) \leq 1} \sum_{k=1}^M \e_k t_k(x)\) for every fixed choice of hyperedges \(e_1,\ldots,e_M\). Of course, this is a sub-Gaussian process, and so we are led to the study of \(\gamma_2(T,d)\), where \(T = \{ x \in \R^n : Q_H(x) \leq 1\}\), and

    \[d(x,y) = \left(\sum_{k=1}^M \left(\frac{w_{e_k}}{\mu_{e_k}} (Q_{e_k}(x)-Q_{e_k}(y))\right)^2\right)^{1/2}.\]
  • Abstracting out the norm sparsification problem. Note that each \(\sqrt{Q_e(x)}\) for \(e \in E\) is actually a norm on \(\R^n\) (technically it is only a seminorm since \(Q_e(x)=0\) does not entail \(x=0\)).

    Thus we can consider the following related problem: Suppose that \(N_1,N_2,\ldots,N_m\) are (semi)norms on \(\R^n\) and we want to bound

    \[\E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2\,,\]

    where \(T \subseteq \R^n\) and \(\e_1,\ldots,\e_m\) are i.i.d. random signs. Again, this is a sub-Gaussian process equipped with the distance

    \[d(x,y) = \left(\sum_{k=1}^m (N_k(x)^2 - N_k(y)^2)^2\right)^{1/2}.\]

    In the next lecture, we will find upper bounds on \(\gamma_2(T,d)\) for this family of problems.
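
The following is a minimal numerical sketch (not from the lecture) of the independent random sampling scheme above on a small random hypergraph; the sampling distribution \(\mu_e \propto w_e\), the edge sizes, and all parameters are arbitrary illustrative choices (the lecture leaves \(\mu\) unspecified). It checks how well the unbiased estimator \(Q_{\tilde{H}}\) tracks \(Q_H\) on a few random test vectors; the chaining machinery is what one needs to make such statements uniform over all \(x\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_edges, M = 40, 500, 3000      # vertices, hyperedges, number of sampled edges

# A random weighted hypergraph: each hyperedge is a random subset of 3-6 vertices.
edges = [rng.choice(n, size=rng.integers(3, 7), replace=False) for _ in range(num_edges)]
w = rng.uniform(0.1, 1.0, size=num_edges)
mu = w / w.sum()                     # sampling distribution (illustrative choice: mu_e proportional to w_e)

def Q_e(e, x):                       # Q_e(x) = max_{u,v in e} (x_u - x_v)^2
    return (x[e].max() - x[e].min()) ** 2

def Q_H(x):                          # Q_H(x) = sum_e w_e Q_e(x)
    return sum(we * Q_e(e, x) for we, e in zip(w, edges))

# Sample M edges i.i.d. from mu and form the reweighted (unbiased) estimator Q_tilde.
idx = rng.choice(num_edges, size=M, p=mu)

def Q_tilde(x):
    return np.mean([w[k] / mu[k] * Q_e(edges[k], x) for k in idx])

for _ in range(5):
    x = rng.normal(size=n)
    qh, qt = Q_H(x), Q_tilde(x)
    print(f"Q_H = {qh:8.2f}   Q_tilde = {qt:8.2f}   relative error = {abs(qh - qt) / qh:.3f}")
```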

Supplementary reading:

Mon, Jan 23
Sparsifying sums of norms

Summary

  • Norms on \(\R^n\). Recall that a norm \(N\) on \(\R^n\) is a nonnegative mapping \(N : \R^n \to \R_+\) such that

    1. \(N(\lambda x) = \abs{\lambda} N(x)\) for all \(x \in \R^n\) and \(\lambda \in \R\).
    2. \(N(x+y) \leq N(x) + N(y)\) for all \(x,y \in \R^n\).
    3. \(N(x) = 0 \iff x = 0\) for all \(x \in \R^n\).

    \(N\) is a semi-norm if it only satisfies properties (1) and (2). We will consider semi-norms throughout the lecture and say “norm” even when considering semi-norms.

  • An upper bound. Suppose that \(N_1,N_2,\ldots,N_m\) are norms on \(\R^n\) and \(T \subseteq B_2^n\), where \(B_2^n = \{ x \in \R^n : \|x\|_2 \leq 1 \}\) is the Euclidean unit ball.

    Define \(\kappa \seteq \E \max_{k=1,\ldots,m} N_k(g)\), where \(g\) is a standard \(n\)-dimensional Gaussian.

    Our goal is to prove the following: If \(\e_1,\e_2,\ldots,\e_m\) are i.i.d. random signs, then

    \[\begin{equation}\label{eq:goal11} \E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim \kappa \log(n) \max_{x \in T} \sqrt{\sum_{k=1}^m N_k(x)^2}\,. \end{equation}\]
  • Example: Operator norm for sums of random matrices. Another natural situation where a bound like this is useful is for sums of random matrices. Suppose that

    \[A = \sum_{k=1}^m \e_k A_k^T A_k\,,\]

    where each \(A_k^T A_k\) is a PSD matrix. Then the largest eigenvalue of \(A\) can be written as

    \[\lambda_{\max}(A) = \max_{\|x\|_2 \leq 1} \langle x, A x\rangle = \max_{\|x\|_2 \leq 1} \sum_{k=1}^m \e_k \langle x, A_k^T A_k x\rangle = \max_{\|x\|_2 \leq 1} \sum_{k=1}^m \e_k \|A_k x\|_2^2\,,\]

    which is the special case of our problem where \(N_k(x) = \|A_k x\|_2\) and \(T = B_2^n\). (Since the process \(x \mapsto \sum_k \e_k \|A_k x\|_2^2\) is symmetric and vanishes at \(x=0\), bounding its expected maximum also controls \(\E\,\|A\|_{\mathrm{op}} = \E \max_{\|x\|_2 \leq 1} |\langle x, A x\rangle|\) up to a factor of \(2\).) A small numerical illustration appears at the end of these notes.

  • Applying Dudley’s entropy inequality. Recall that \(\left\{ \sum_{k=1}^m \e_k N_k(x)^2 : x \in T \right\}\) is a subgaussian process when equipped with the distance

    \[d(x,y) \seteq \left(\sum_{k=1}^m \left(N_k(x)^2 - N_k(y)^2\right)^2\right)^{1/2}.\]

    Therefore Dudley’s entropy bound shows that

    \[\begin{equation}\label{eq:dudley3} \E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim \sum_{h \geq 0} 2^{h/2} e_h(T,d)\,. \end{equation}\]
  • Controlling the subgaussian distance via a norm. Notice that both sides of the inequality \eqref{eq:goal11} are quadratic in the norms \(N_k\). Therefore by rescaling we may assume that

    \[\begin{equation}\label{eq:L2N} \max_{x \in T} \sqrt{\sum_{k=1}^m N_k(x)^2} = 1\,. \end{equation}\]

    Define the norm

    \[\|x\|_N \seteq \max_{k=1,\ldots,m} N_k(x)\,.\]

    Note that using \(a^2-b^2 = (a-b)(a+b)\), for any \(x,y \in T\), we have

    \[\begin{eqnarray*} d(x,y) &=& \left(\sum_{k=1}^m (N_k(x)-N_k(y))^2 (N_k(x)+N_k(y))^2 \right)^{1/2} \\ &\leq& \|x-y\|_N \left(\sum_{k=1}^m (N_k(x)+N_k(y))^2\right)^{1/2} \\ &\leq& 2 \|x-y\|_N\,, \end{eqnarray*}\]

    where the first inequality uses \(|N_k(x)-N_k(y)| \leq N_k(x-y)\) (true for any norm) and the second inequality uses \eqref{eq:L2N} and \(x,y \in T\).

  • Splitting the Dudley sum. Therefore we have \(e_h(T,d) \leq 2 e_h(T, \|\cdot\|_N)\) and combining this with Dudley’s bound \eqref{eq:dudley3} gives

    \[\begin{equation}\label{eq:dudley2} \E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim \sum_{h \geq 0} 2^{h/2} e_h(T,\|\cdot\|_N)\,. \end{equation}\]

    We will split this sum into two parts depending on whether \(h \leq 4 \log (n)\) or \(h > 4 \log (n)\).

  • Controlling the large \(h\) terms with a volume bound. Note that for \(x \in T\), by \eqref{eq:L2N} we have

    \[\|x\|_N \leq \sqrt{\sum_{k=1}^m N_k(x)^2} \leq 1\,.\]

    In other words, \(T \subseteq B_N \seteq \{ x \in \R^n : \|x\|_N \leq 1\}\).

    Therefore \(e_h(T,\|\cdot\|_N) \leq e_h(B_N,\|\cdot\|_N)\).

    Claim: It holds that \(e_h(B_N,\|\cdot\|_N) \leq 4 \cdot 2^{-2^h/n}\).

    This proof works for any norm on \(\R^n\).

    Assume that \(x_1,\ldots,x_{s} \in B_N\) are a maximal set of points satisfying \(\|x_i-x_j\|_N \geq \delta\) for all \(i \neq j\); note \(\delta \leq 2\) automatically, since all the points lie in \(B_N\). (By maximality, every point of \(B_N\) is within distance \(\delta\) of some \(x_j\), so \(B_N \subseteq \bigcup_{j=1}^s (x_j + \delta B_N)\); we will use this packing-to-covering trick again in the proof of the dual Sudakov lemma. For the claim, only the separation is needed.)

    The balls \(x_j + \frac{\delta}{2} B_N\) have pairwise disjoint interiors, and since \(\|x_j\|_N \leq 1\) and \(\delta/2 \leq 1\),

    \[2 B_N \supseteq \bigcup_{j=1}^s \left(x_j + \frac{\delta}{2} B_N\right),\]

    hence

    \[\mathrm{vol}_n(2 B_N) \geq s \,\mathrm{vol}_n\!\left(\frac{\delta}{2} B_N\right) = s \,(\delta/4)^n\, \mathrm{vol}_n(2 B_N),\]

    where \(\mathrm{vol}_n\) is the standard \(n\)-dimensional volume on \(\R^n\). It follows that \(s \leq (4/\delta)^{n}\), i.e., \(\delta \leq 4 s^{-1/n}\). Taking \(s = 2^{2^h}\) yields

    \[e_h(B_N,\|\cdot\|_N) \leq 4 \cdot 2^{-2^h/n}.\]
  • Recalling \eqref{eq:dudley2}, we can use the preceding claim to evaluate

    \[\sum_{h > 4 \log n} 2^{h/2} e_h(T, \|\cdot\|_N) \leq 4 \sum_{h > 4 \log n} 2^{h/2}\, 2^{-2^h/n} \leq O(1)\,,\]

    since for \(h > 4 \log n\) we have \(2^h/n \geq n\), so the summands decay doubly exponentially in \(h\).

    Thus we have

    \[\E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2 \lesssim O(1) + \sum_{0 \leq h \leq 4 \log n} 2^{h/2} e_h(T,\|\cdot\|_N)\,.\]
  • Controlling the interesting \(h\) terms. To bound the second sum, let us use \(T \subseteq B_2^n\) to write \(e_h(T,\|\cdot\|_N) \leq e_h(B_2^n,\|\cdot\|_N)\).

    (Dual) Sudakov Lemma: For any norm \(\|\cdot\|\) on \(\R^n\) and \(h \geq 0\), it holds that

    \[e_h(B_2^n, \|\cdot\|) \lesssim 2^{-h/2} \E \|g\|\,.\]

    Note that \(\E\, \|g\|_N = \E \max_k N_k(g) = \kappa\) (by definition), and therefore the second sum above is bounded by

    \[\sum_{0 \leq h \leq 4 \log n} 2^{h/2} e_h(T,\|\cdot\|_N) \lesssim \kappa \log n\,.\]

    This completes the proof of \eqref{eq:goal11} once we have established the Dual Sudakov Lemma.

  • Gaussian Shift Lemma. We need the following

    Lemma: Suppose that \(K\) is a symmetric convex set and \(\gamma_n\) is the \(n\)-dimensional Gaussian measure on \(\R^n\). Then for any \(x \in \R^n\),

    \[\gamma_n(K+x) \geq e^{-\|x\|_2^2/2} \gamma_n(K)\,.\]

    Proof: Write

    \[\begin{eqnarray*} \gamma_n(K+x) &=& (2\pi)^{-n/2} \int_K e^{-\|x+z\|_2^2/2}\,dz \\ &=& (2\pi)^{-n/2} \int_K \E_{\sigma \in \{-1,1\}} e^{-\|\sigma x+z\|_2^2/2}\,dz\,, \end{eqnarray*}\]

    where the second equality follows because \(K\) is symmetric. Now note that \(\E_{\sigma \in \{-1,1\}} \|\sigma x + z\|_2^2 = \|x\|_2^2 + \|z\|_2^2\) and use convexity of \(y \mapsto e^y\) to write

    \[\begin{eqnarray*} (2\pi)^{-n/2} \int_K \E_{\sigma \in \{-1,1\}} e^{-\|\sigma x+z\|_2^2/2}\,dz &\geq & (2\pi)^{-n/2} \int_K e^{-\E_{\sigma \in \{-1,1\}} \|\sigma x+z\|_2^2/2}\,dz \\ &=& (2\pi)^{-n/2} \int_K e^{-(\|x\|_2^2+\|z\|_2^2)/2}\,dz = e^{-\|x\|_2^2/2} \gamma_n(K)\,. \end{eqnarray*}\]
  • Proof of Dual Sudakov. Let \(\mathcal{B} \seteq \{ x \in \R^n : \|x\| \leq 1 \}\) denote the unit ball of the \(\|\cdot\|\) norm and suppose \(x_1,\ldots,x_s \in B_2^n\) is a maximal set of points such that the shifted balls \(x_j + \delta \mathcal{B}\) are all pairwise disjoint. In that case, we have

    \[\begin{equation}\label{eq:Bcover} B_2^n \subseteq \bigcup_{j=1}^s (x_j + 2 \delta \mathcal{B})\,, \end{equation}\]

    i.e., we have covered \(B_2^n\) by \(s\) balls (in \(\|\cdot\|\)) of radius at most \(2\delta\).

    Let \(\lambda > 0\) be a parameter we will choose later and note that the sets \(\lambda(x_j+\delta \mathcal{B})\) are also pairwise disjoint, and therefore

    \[\begin{eqnarray*} 1 &\geq& \gamma_n\left(\bigcup_{j=1}^s \lambda (x_j + \delta \mathcal{B}) \right) \\ &=& \sum_{j=1}^s \gamma_n\left(\lambda (x_j + \delta \mathcal{B}) \right) \\ &\geq& \sum_{j=1}^s e^{-\lambda^2 \|x_j\|_2^2/2} \gamma_n(\lambda \delta \mathcal{B}) \geq s \cdot e^{-\lambda^2/2} \gamma_n(\lambda \delta \mathcal{B})\,, \end{eqnarray*}\]

    where the first equality uses disjointness, the second inequality uses the Gaussian Shift Lemma, and the final inequality uses \(x_1,\ldots,x_s \in B_2^n\).

    If we define \(\lambda \seteq \frac{2}{\delta} \E\, \|g\|\), then

    \[\gamma_n(\lambda \delta \mathcal{B}) = \P\left(g \in \lambda \delta \mathcal{B}\right) = \P\left(\|g\| \leq \lambda \delta\right) = \P\left(\|g\| \leq 2 \E\, \|g\|\right) \geq \frac12\,,\]

    where the last step is simply Markov’s inequality.

    Combining this with our previous calculation yields

    \[1 \geq s \cdot \exp\left(-\frac{2}{\delta^2} \left(\E\, \|g\|\right)^2\right) \cdot \frac12\,,\]

    i.e.

    \[\sqrt{\log (s/2)} \leq \frac{2}{\delta} \E\, \|g\|\,.\]

    Taking \(s=2^{2^h}\) and recalling \eqref{eq:Bcover} shows that we can cover \(B_2^n\) by at most \(s\) \(\|\cdot\|\)-balls of radius \(2 \delta\) with

    \[\delta \leq \frac{2\, \E\, \|g\|}{\sqrt{\log (s/2)}} \lesssim 2^{-h/2} \E\,\|g\|\,.\]

    By definition, this yields our desired bound

    \[e_h(B_2^n, \|\cdot\|) \lesssim 2^{-h/2} \E\,\|g\|\,.\]
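
To close the loop on the operator-norm example from earlier in these notes, here is a minimal Monte Carlo sketch (not from the lecture; the matrices \(A_k\), their sizes, and all parameters are arbitrary illustrative choices) comparing \(\E\,\|\sum_{k} \e_k A_k^T A_k\|_{\mathrm{op}}\) against the right-hand side of \eqref{eq:goal11}, namely \(\kappa \log(n) \max_{\|x\|_2 \leq 1} \sqrt{\sum_k \|A_k x\|_2^2}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r, trials = 30, 200, 5, 200     # dimension, number of matrices, rows of each A_k, MC samples

# Fixed matrices A_k (random r x n matrices, an arbitrary illustrative choice).
A = [rng.normal(size=(r, n)) / np.sqrt(r) for _ in range(m)]
B = [Ak.T @ Ak for Ak in A]                       # the PSD summands A_k^T A_k

S = sum(B)                                        # sum_k A_k^T A_k
rhs = np.sqrt(np.linalg.norm(S, 2))               # max_{||x||_2 <= 1} sqrt(sum_k ||A_k x||_2^2)

# kappa = E max_k N_k(g) with N_k(x) = ||A_k x||_2, estimated by Monte Carlo.
kappa = np.mean([max(np.linalg.norm(Ak @ g) for Ak in A)
                 for g in rng.normal(size=(trials, n))])

# E || sum_k eps_k A_k^T A_k ||_op, estimated by Monte Carlo over i.i.d. random signs.
def signed_norm():
    eps = rng.choice([-1.0, 1.0], size=m)
    return np.linalg.norm(sum(e * Bk for e, Bk in zip(eps, B)), 2)

lhs = np.mean([signed_norm() for _ in range(trials)])
print(f"E || sum_k eps_k A_k^T A_k ||_op         ~ {lhs:.1f}")
print(f"kappa * log(n) * max_x sqrt(sum_k N_k^2) ~ {kappa * np.log(n) * rhs:.1f}")
```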

Supplementary reading:

The KLS conjecture and stochastic localization
Wed, Jan 25
A Cheeger inequality for isotropic convex sets

Mon, Jan 30

Wed, Feb 01

Mon, Feb 06

Wed, Feb 08

Mon, Feb 13

Wed, Feb 15

Wed, Feb 22

Mon, Feb 27

Wed, Mar 01

Mon, Mar 06

Wed, Mar 08