Recall the KLS constant of a probability measure \(\mu\) on \(\R^n\):
\[\psi_{\mu} \seteq \inf_{S \subseteq \R^n : \mu(S) \leq 1/2} \frac{\mu(\partial S)}{\mu(S)}\,,\]where the infimum is over measurable subsets and
\[\mu(\partial S) \seteq \lim_{\e \to 0+} \frac{\mu(S_{\e} \setminus S)}{\e}\,,\]where \(S_{\e} \seteq \{ x \in \R^n : d(x,S) \leq \e \}\) is the \(\e\)-neighborhood of \(S\).
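To make the definitions concrete, here is a small numerical sketch (an illustration, not part of the notes): it takes \(\mu\) to be the standard Gaussian on \(\R^2\) and \(S\) a halfspace, both arbitrary choices, and estimates \(\mu(\partial S)/\mu(S)\) by Monte Carlo with a small fixed \(\e\) in place of the limit.

```python
import numpy as np

# Minimal sketch (assumptions: mu = standard Gaussian on R^2, S = {x_1 <= 0},
# eps fixed at 0.01 instead of taking the limit eps -> 0+).
rng = np.random.default_rng(0)
X = rng.standard_normal((2_000_000, 2))        # samples from mu

eps = 1e-2
in_S = X[:, 0] <= 0.0                           # x in S
in_S_eps = X[:, 0] <= eps                       # x in the eps-neighborhood S_eps

mu_S = in_S.mean()
mu_boundary = (in_S_eps.mean() - mu_S) / eps    # ~ mu(dS) = lim mu(S_eps \ S)/eps

print("mu(S)          ~", mu_S)                 # ~ 0.5
print("mu(dS)         ~", mu_boundary)          # ~ 1/sqrt(2*pi) ~ 0.399
print("mu(dS)/mu(S)   ~", mu_boundary / mu_S)   # ~ 0.8
```

For this particular halfspace the exact values are \(\mu(S)=1/2\) and \(\mu(\partial S) = 1/\sqrt{2\pi}\).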
The KLS conjecture asserts that when \(\mu\) is a log-concave distribution, the infimum can be restricted to halfspaces at the loss of only a universal constant factor:
\[\psi_{\mu} \gtrsim \inf_{H \subseteq \R^n : \mu(H) \leq 1/2} \frac{\mu(\partial H)}{\mu(H)}\,,\]where a halfspace is defined by \(H = \{ x \in \R^n : \langle x,v \rangle \leq c \}\) for some \(v \in \R^n\) and \(c \in \R\).
Suppose that \(\mu\) is a probability measure on \(\R^n\). The covariance matrix of \(\mu\) is defined as
\[A \seteq \E[(X-\E[X])(X-\E[X])^{\top}]\,,\]where \(X\) is a random variable with law \(\mu\).
This describes the variance of \(\mu\) along every projection: If \(v \in \R^n\), then
\[\E[\langle v, X - \E[X]\rangle^2] = \langle v, A v\rangle\,.\]Lemma: If \(\mu\) is log-concave, then
\[\inf_{H \subseteq \R^n : \mu(H) \leq 1/2} \frac{\mu(\partial H)}{\mu(H)} \asymp \frac{1}{\sqrt{\|A\|_{op}}}\,,\]where \(\|A\|_{op}\) denotes the maximum eigenvalue of \(A\).
This lemma is not too difficult to prove given that if \(\mu\) is log-concave and \(X\) has law \(\mu\), then the law of \(\langle v,X\rangle\) is also log-concave for any \(v \in \R^n\).
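To illustrate the lemma numerically (a sketch, not part of the notes; the covariance matrix \(A\) below is an arbitrary choice and \(\mu\) is taken to be a centered Gaussian, which is log-concave), one can scan halfspaces \(H = \{x : \langle x,v\rangle \leq c\}\) and compare the resulting profile to \(1/\sqrt{\|A\|_{op}}\).

```python
import numpy as np
from scipy.stats import norm

# Sketch: mu = centered Gaussian on R^2 with covariance A (an arbitrary choice).
# For a unit vector v, <v, X> ~ N(0, v^T A v), so a halfspace H = {<x,v> <= c}
# has mu(H) = Phi(c/sigma) and mu(dH) equal to the marginal density at c.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

best = np.inf
for theta in np.linspace(0.0, np.pi, 181):       # unit directions v
    v = np.array([np.cos(theta), np.sin(theta)])
    sigma = np.sqrt(v @ A @ v)                   # std-dev of <v, X>
    c = np.linspace(-4.0 * sigma, 0.0, 200)      # c <= 0 ensures mu(H) <= 1/2
    ratio = norm.pdf(c, scale=sigma) / norm.cdf(c, scale=sigma)
    best = min(best, ratio.min())

print("halfspace profile   ~", best)
print("1/sqrt(||A||_op)    =", 1.0 / np.sqrt(np.linalg.eigvalsh(A).max()))
# The two agree up to the universal factor sqrt(2/pi) ~ 0.8.
```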
For distributions that are more log-concave than a Gaussian, one can prove that an analogous bound holds for the full KLS constant, not just for halfspaces.
Gaussian Factor Lemma: Suppose that the density of \(\mu\) is proportional to \(f(x) e^{-\langle x, B x\rangle}\) for some positive definite matrix \(B\), and a log-concave function \(f : \R^n \to \R_+\). Then,
\[\psi_{\mu} \gtrsim \frac{1}{\sqrt{\|B^{-1}\|_{op}}}\,.\]Since the product of log-concave functions is log-concave, the measure \(\mu\) is log-concave. This lemma can be proved using localization (the approach we sketched last lecture) to reduce it to the \(1\)-dimensional case. Note that \(B^{-1}\) is, up to a constant factor, the covariance matrix of the Gaussian with density proportional to \(e^{- \langle x,Bx\rangle}\):
\[\int e^{-\langle x,Bx\rangle} xx^{\top}\,dx = c_B \int e^{-\|x\|^2} (B^{-1/2} x) (B^{-1/2} x)^{\top}\,dx = c_B B^{-1/2} \left(\int e^{-\|x\|^2} xx^{\top} \,dx\right) B^{-1/2}\,,\]where \(c_B = \det(B)^{-1/2}\) is the Jacobian factor resulting from the substitution \(x \mapsto B^{-1/2} x\). The quantity in parentheses is a multiple of the identity.
Therefore if a measure is log-concave with respect to a Gaussian distribution, its KLS constant is at least (up to a universal constant factor) as good as that for the Gaussian.
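As a quick numerical sanity check of the computation above (an aside, with an arbitrarily chosen \(2\times 2\) positive definite \(B\)), one can integrate on a grid and compare the second-moment matrix of the density proportional to \(e^{-\langle x,Bx\rangle}\) with a multiple of \(B^{-1}\).

```python
import numpy as np

# Sketch: B is an arbitrary 2x2 positive definite matrix. The covariance of the
# Gaussian with density proportional to exp(-<x, Bx>) should be (2B)^{-1},
# i.e. proportional to B^{-1}.
B = np.array([[2.0, 0.5],
              [0.5, 1.0]])

grid = np.linspace(-8.0, 8.0, 801)               # grid wide enough to hold the mass
X1, X2 = np.meshgrid(grid, grid, indexing="ij")
pts = np.stack([X1.ravel(), X2.ravel()], axis=1)
w = np.exp(-np.einsum("ni,ij,nj->n", pts, B, pts))   # unnormalized density e^{-<x,Bx>}
w /= w.sum()                                          # normalize on the grid

cov = (pts * w[:, None]).T @ pts                      # sum_x w(x) x x^T

print("grid covariance:\n", cov)
print("(2B)^{-1}:\n", np.linalg.inv(2.0 * B))
```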
Brownian motion: An \(n\)-dimensional Brownian motion is a stochastic process \((B_t : t \geq 0)\) such that \(B_s = \int_0^s dB_t\) and \(dB_t\) can be thought of as a random variable independent of \((B_s : 0 \leq s \leq t)\) with law \(N(0, dt \cdot I)\) (the law of a spherical \(n\)-dimensional Gaussian with covariance \(dt \cdot I\)).
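In simulations one typically discretizes this description directly (a sketch; the dimension, step size, and horizon below are arbitrary choices): independent \(N(0, dt \cdot I)\) increments are generated and summed.

```python
import numpy as np

# Sketch of simulating n-dimensional Brownian motion with step size dt:
# B_{(k+1)dt} = B_{k dt} + (increment ~ N(0, dt * I)), starting from B_0 = 0.
rng = np.random.default_rng(0)
n, dt, steps, paths = 3, 1e-2, 100, 20_000       # horizon t = steps * dt = 1

dB = np.sqrt(dt) * rng.standard_normal((paths, steps, n))   # N(0, dt*I) increments
B_t = dB.sum(axis=1)                                         # value of B_t per path

print("t =", steps * dt)
print("empirical covariance of B_t (should be ~ t * I):\n", np.cov(B_t.T))
```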
Stochastic localization: Suppose that a log-concave measure \(\mu\) has density \(f : \R^n \to \R_+\). We will define a stochastic process on densities: Take \(f_0(x)=f(x)\) and
\[d f_t(x) = f_t(x) \langle x-a_t, dB_t\rangle\,,\]where \((B_t : t \geq 0)\) is an \(n\)-dimensional Brownian motion, and
\[a_t \seteq \int_{\R^n} x f_t(x)\,dx\]is the center of mass of \(f_t\).
Note that if \(f_0(x)=f(x)\) is a density, then so is \(f_t\) since
\[\int_{\R^n} d f_t(x)\,dx = \left\langle \int_{\R^n} f_t(x) (x-a_t)\,dx, dB_t\right\rangle = 0\,.\](That \(f_t\) also remains nonnegative can be seen from the explicit exponential formula for \(f_t\) derived below.) Let \(\mu_t\) be the corresponding measure. Note that \(f_t(x)\) is a martingale: \(\E[d f_t(x)] = 0\).
Martingale property: \(\E[\mu_t(S)] = \mu(S)\) for all \(S \subseteq \R^n\).
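Here is a discretized sketch of this process (an illustration, not part of the notes): \(\mu\) is taken to be the standard Gaussian on \(\R^2\), represented by a fixed set of sample points \(x_i\) with evolving weights \(w_i \propto f_t(x_i)/f(x_i)\), and \(S\) is a halfspace; averaging \(\mu_t(S)\) over many runs illustrates the martingale property.

```python
import numpy as np

# Discretized sketch of stochastic localization. Assumptions: mu = standard
# Gaussian on R^2, S = {x_1 <= 0}, Euler steps of size dt. The measure mu_t is
# approximated by fixed samples x_i from mu with weights w_i ~ f_t(x_i)/f(x_i);
# the update dw_i = w_i <x_i - a_t, dB_t> preserves total mass exactly and is a
# martingale, so averaging mu_t(S) over runs should return mu(S) ~ 1/2.
rng = np.random.default_rng(1)
N, dt, steps, runs = 2000, 1e-3, 500, 200        # t goes up to 0.5

x = rng.standard_normal((N, 2))                   # fixed samples from mu
in_S = x[:, 0] <= 0.0

vals = []
for _ in range(runs):
    w = np.full(N, 1.0 / N)                       # weights at time 0
    for _ in range(steps):
        a = w @ x                                 # center of mass a_t of mu_t
        dB = np.sqrt(dt) * rng.standard_normal(2)
        w = w + w * ((x - a) @ dB)                # dw_i = w_i <x_i - a_t, dB_t>
    vals.append(w[in_S].sum())                    # mu_t(S) for this run

print("mu(S)          ~", in_S.mean())            # ~ 0.5
print("E[mu_t(S)]     ~", np.mean(vals))          # ~ mu(S): the martingale property
print("std of mu_t(S) ~", np.std(vals))           # the mass is starting to localize
```

For small \(dt\) the Euler weights stay positive with overwhelming probability; the exact (positivity-preserving) update is the exponential formula for \(f_t\) derived below.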
For a sufficiently nice function \(f : \R \to \R\), we can approximate the change in \(f\) by its Taylor expansion
\[df(x) = f(x+dx) - f(x) = f'(x) \,dx + \frac12 f''(x) \,dx^2 + \cdots\,.\]Dividing both sides by \(dx\) and taking \(dx \to 0\), every term beyond the first-order one vanishes, leaving the usual derivative.
But now suppose \(B_t\) is a \(1\)-dimensional Brownian motion and we compute
\[d f(B_t) = f(B_{t+dt}) - f(B_t) = f'(B_t) \,dB_t + \frac12 f''(B_t) \,dB_t^2 + \cdots\,.\]Then \(dB_t\) has law \(N(0,dt)\), hence \(\E[dB_t^2]=dt\). In other words, \((B_t)\) has non-trivial quadratic variation. That means we cannot neglect the 2nd order terms. Ito’s lemma tells us that we need only go to second order so that
\[d f(B_t) = f'(B_t) \,dB_t + \frac12 f''(B_t) \,dt\,.\]Let’s consider a more general version for stochastic processes in higher dimensions.
Stochastic processes and Ito derivatives: Consider a process \((X_t : t \geq 0)\) defined by the stochastic differential equation
\[dX_t = u_t \,dt + \Sigma_t dB_t\,,\]where \(u_t \in \R^m\), \(\Sigma_t \in \R^{m \times n}\), and \((B_t : t \geq 0)\) is an \(n\)-dimensional Brownian motion.
Let \(f : \R^m \to \R\) be a sufficiently nice (e.g., twice continuously differentiable) function. Then Ito’s lemma tells us how to compute the time derivative of \(f(X_t)\):
\[df(X_t) = \sum_{i=1}^m \partial_i f(X_t) dX_t^i + \frac12 \sum_{i,j} \partial_{ij} f(X_t) d[X_t^i,X_t^j]_t\,,\]where \([X_t^i, X_t^j]_t\) is the quadratic variation
\[[X_t^i, X_t^j]_t = \int_0^t \left(\Sigma_s \Sigma_s^{\top}\right)_{i,j} \,ds\]so that
\[d [X_t^i, X_t^j]_t = \left(\Sigma_t \Sigma_t^{\top}\right)_{i,j}\,dt\,.\]Note that \(\Sigma_t \Sigma_t^{\top}\,dt\) is the covariance matrix of the random vector \(\Sigma_t \,dB_t\).
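A quick numerical illustration of the nontrivial quadratic variation (an aside, not part of the notes; one-dimensional, with \(f(x)=x^2\), for which Ito's lemma gives \(d(B_t^2) = 2B_t\,dB_t + dt\)):

```python
import numpy as np

# Sketch: one path of 1-dimensional Brownian motion on [0, 1] with step dt.
# Its quadratic variation sum (dB)^2 concentrates around t, and for f(x) = x^2
# Ito's lemma gives f(B_t) = 2 * int_0^t B_s dB_s + t.
rng = np.random.default_rng(0)
dt, steps = 1e-4, 10_000                          # horizon t = 1
dB = np.sqrt(dt) * rng.standard_normal(steps)
B = np.concatenate([[0.0], np.cumsum(dB)])

t = steps * dt
print("t                    =", t)
print("sum of (dB)^2        ~", np.sum(dB**2))                    # ~ t, not 0
print("f(B_t) = B_t^2       =", B[-1] ** 2)
print("2*int B dB + t       ~", 2.0 * np.sum(B[:-1] * dB) + t)    # matches B_t^2
```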
We can use this to figure out what \(f_t\) looks like:
Here for a fixed \(x \in \R^n\), we take \(X_t = f_t(x)\) so that \(\Sigma_t = f_t(x) (x-a_t)^{\top}\). Then using Ito’s lemma:
\[\begin{align*} d \log f_t(x) = d \log (X_t) &= \frac{d X_t}{X_t} - \frac12 \frac{d[X_t,X_t]_t}{X_t^2} \\ &= \frac{d f_t(x)}{f_t(x)} - \frac12 \frac{d[f_t(x),f_t(x)]_t}{f_t(x)^2} \\ &=\langle x-a_t,dB_t\rangle - \frac12 \|x-a_t\|^2\,dt\,, \end{align*}\]where we used
\[d[X_t,X_t]_t = f_t(x)^2\, (x-a_t)^{\top} (x-a_t)\,dt = f_t(x)^2\,\|x-a_t\|^2 \,dt\,.\]Let us simplify:
\[d \log f_t(x) = \langle x, a_t\,dt + dB_t\rangle - \frac12 \|x\|^2\,dt - \left[\langle a_t, dB_t\rangle + \tfrac12 \|a_t\|^2\,dt \right]\,.\]Note that the terms in brackets don’t depend on \(x\). Thus integrating the logarithm gives (up to scaling by a constant):
\[f_t(x) = \exp\left(- \frac{t}{2} \|x\|^2+ \int_0^t \langle x,a_s\,ds + dB_s\rangle \right) f(x)\,.\]Recalling that \(\mu_t\) denotes the corresponding measure, the Gaussian Factor Lemma (applied with \(B = \tfrac{t}{2} I\), so that \(\|B^{-1}\|_{op} = 2/t\)) says that (with probability one):
\[\psi_{\mu_t} \gtrsim \sqrt{t}\,.\]Now consider \(S \subseteq \R^n\) with \(\mu(S)=1/2\) and suppose there is a time \(t_0\) such that
\[\begin{equation}\label{eq:good-meas} \P[\tfrac14 \mu(S) \leq \mu_{t_0}(S) \leq \tfrac34 \mu(S)] \geq \frac{1}{10}\,. \end{equation}\]Use the Martingale Property of \(\mu_t\) to write
\[\mu(\partial S) = \E[\mu_{t_0}(\partial S)] \geq \frac18\, \E\left[\psi_{\mu_{t_0}}\, \mathbf{1}_{\{\frac14 \mu(S) \leq \mu_{t_0}(S) \leq \frac34 \mu(S)\}}\right] \gtrsim \sqrt{t_0}\; \P[\tfrac14 \mu(S) \leq \mu_{t_0}(S) \leq \tfrac34 \mu(S)] \gtrsim \sqrt{t_0}\,.\]Here the first inequality holds because on the event in \eqref{eq:good-meas} we have \(\mu_{t_0}(S) \in [\tfrac18, \tfrac38]\), so the definition of \(\psi_{\mu_{t_0}}\) gives \(\mu_{t_0}(\partial S) \geq \psi_{\mu_{t_0}}\, \mu_{t_0}(S) \geq \tfrac18 \psi_{\mu_{t_0}}\); the second uses the almost sure bound \(\psi_{\mu_{t_0}} \gtrsim \sqrt{t_0}\). Thus if we can find a lower bound on \(t_0\), we have proved a lower bound on \(\psi_{\mu}\).
In the next lecture, we will show that \eqref{eq:good-meas} holds for some \(t_0 \asymp (\tr(A^2))^{-1/2}\).
This yields
\[\psi_{\mu} \gtrsim \frac{1}{\tr(A^2)^{1/4}}\,.\]If \(\mu\) is isotropic, then \(A = I\) and \(\tr(A^2) = n\), so this bound is \(n^{-1/4}\), implying that \(\psi_n \gtrsim n^{-1/4}\).
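Finally, as an aside (not part of the notes), one can check the closed-form expression for \(f_t\) numerically: simulating \(d f_t(x) = f_t(x)\langle x-a_t, dB_t\rangle\) at finitely many points (here samples from a standard Gaussian on \(\R^2\), an arbitrary choice) and comparing the resulting log-weights with \(-\tfrac{t}{2}\|x\|^2 + \langle x, \int_0^t (a_s\,ds + dB_s)\rangle\), which should agree up to an \(x\)-independent shift and discretization error.

```python
import numpy as np

# Sketch: verify that Euler simulation of d f_t(x) = f_t(x) <x - a_t, dB_t>
# matches log f_t(x) - log f(x) = -(t/2)||x||^2 + <x, int_0^t (a_s ds + dB_s)>
# up to an x-independent constant (the bracketed terms). The points x_i are
# samples from a standard Gaussian (an arbitrary choice); w_i ~ f_t(x_i)/f(x_i).
rng = np.random.default_rng(2)
N, dt, steps = 2000, 1e-5, 20_000                 # horizon t = 0.2

x = rng.standard_normal((N, 2))
w = np.full(N, 1.0 / N)                           # weights at time 0
drift = np.zeros(2)                               # accumulates int (a_s ds + dB_s)
for _ in range(steps):
    a = w @ x                                     # center of mass a_t
    dB = np.sqrt(dt) * rng.standard_normal(2)
    w = w + w * ((x - a) @ dB)                    # Euler step for f_t(x_i)
    drift += a * dt + dB

t = steps * dt
closed_form = -0.5 * t * np.sum(x**2, axis=1) + x @ drift
diff = np.log(w * N) - closed_form                # should be ~ constant over i

print("spread of (log-weight - closed form):", diff.max() - diff.min())
# small: it measures only the Euler discretization error
```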