# Stochastic localization (Wed, Feb 01)

• Recall the KLS constant of a probability measure $\mu$ on $\R^n$:

$\psi_{\mu} \seteq \inf_{S \subseteq \R^n : \mu(S) \leq 1/2} \frac{\mu(\partial S)}{\mu(S)}\,,$

where the infimum is over measurable subsets and

$\mu(\partial S) \seteq \lim_{\e \to 0+} \frac{\mu(S_{\e} \setminus S)}{\e}\,,$

where $S_{\e} \seteq \{ x \in \R^n : d(x,S) \leq \e \}$ is the $\e$-neighborhood of $S$.

• The KLS conjecture asserts that when $\mu$ is a log-concave distribution, the infimum can be restricted to halfspaces at the loss of only a universal constant factor:

$\psi_{\mu} \gtrsim \inf_{H \subseteq \R^n : \mu(H) \leq 1/2} \frac{\mu(\partial H)}{\mu(H)}\,,$

where a halfspace is defined by $H = \{ x \in \R^n : \langle x,v \rangle \leq c \}$ for some $v \in \R^n$ and $c \in \R$.

• Suppose that $\mu$ is a measure on $\R^n$. The covariance matrix of $\mu$ is defined as

$A \seteq \E[(X-\E[X])(X-\E[X])^{\top}]\,,$

where $X$ is a random variable with law $\mu$.

This describes the variance of $\mu$ along every projection: If $v \in \R^n$, then

$\E[\langle v, X - \E[X]\rangle^2] = \langle v, A v\rangle\,.$
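
As a quick numerical sanity check (a sketch in Python with NumPy; the sampling distribution and the dimension are arbitrary choices), the empirical variance of a projection $\langle v, X\rangle$ matches $\langle v, Av\rangle$ for the empirical covariance matrix $A$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample from an anisotropic Gaussian playing the role of mu.
n = 3
M = rng.standard_normal((n, n))
X = rng.multivariate_normal(np.zeros(n), M @ M.T, size=200_000)

# Empirical covariance matrix A.
A = np.cov(X, rowvar=False)

# The variance of the projection <v, X> should match <v, A v>.
v = rng.standard_normal(n)
proj_var = np.var(X @ v)
```
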

• Lemma: If $\mu$ is log-concave, then

$\inf_{H \subseteq \R^n : \mu(H) \leq 1/2} \frac{\mu(\partial H)}{\mu(H)} \asymp \frac{1}{\sqrt{\|A\|_{op}}}\,,$

where $\|A\|_{op}$ denotes the maximum eigenvalue of $A$.

This lemma is not too difficult to prove using the fact that if $\mu$ is log-concave and $X$ has law $\mu$, then the law of $\langle v,X\rangle$ is also log-concave for any $v \in \R^n$.
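
A one-dimensional sanity check of the lemma (a sketch; taking $\mu = N(0,\sigma^2)$ is an assumption made for illustration): here $\|A\|_{op} = \sigma^2$, the halfspace $H = (-\infty, 0]$ through the mean has $\mu(H) = 1/2$, and its boundary measure is the Gaussian density at $0$, so the ratio scales like $1/\sigma = 1/\sqrt{\|A\|_{op}}$.

```python
import math

# For mu = N(0, sigma^2) on R, the halfspace H = (-inf, 0] has mu(H) = 1/2
# and boundary measure mu(partial H) equal to the density at 0.
def halfspace_ratio(sigma: float) -> float:
    boundary = 1.0 / (sigma * math.sqrt(2 * math.pi))  # density of N(0, sigma^2) at 0
    return boundary / 0.5  # mu(partial H) / mu(H)

# The ratio times sigma is the dimension-free constant sqrt(2/pi),
# so the ratio itself scales like 1/sigma, matching the lemma.
ratios = {s: halfspace_ratio(s) for s in (0.5, 1.0, 2.0)}
```
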

• For distributions that are more log-concave than a Gaussian, one can prove that an analogous bound holds.

Gaussian Factor Lemma: Suppose that the density of $\mu$ is proportional to $f(x) e^{-\langle x, B x\rangle}$ for some PSD matrix $B$, and a log-concave function $f : \R^n \to \R_+$. Then,

$\psi_{\mu} \gtrsim \frac{1}{\sqrt{\|B^{-1}\|_{op}}}\,.$

Since the product of log-concave functions is log-concave, the measure $\mu$ is log-concave. This lemma can be proved using localization (the approach we sketched last lecture) to reduce it to the $1$-dimensional case. Note that $B^{-1}$ is, up to a constant factor, the covariance matrix of the Gaussian with density proportional to $e^{-\langle x,Bx\rangle}$:

$\int e^{-\langle x,Bx\rangle} xx^{\top}\,dx = c_B \int e^{-\|x\|^2} (B^{-1/2} x) (B^{-1/2} x)^{\top}\,dx = c_B B^{-1/2} \left(\int e^{-\|x\|^2} xx^{\top} \,dx\right) B^{-1/2}\,,$

where $c_B$ is some constant resulting from the substitution $x \mapsto B^{-1/2} x$. The quantity in parentheses is a multiple of the identity.

Therefore if a measure is log-concave with respect to a Gaussian distribution, its KLS constant is at least (up to a universal constant factor) as good as that for the Gaussian.

• Brownian motion: An $n$-dimensional Brownian motion is a stochastic process $(B_t : t \geq 0)$ such that $B_s = \int_0^s dB_t$ and $dB_t$ can be thought of as a random variable independent of $(B_s : 0 \leq s \leq t)$ with law $N(0, dt \cdot I)$ (the law of a spherical $n$-dimensional Gaussian with covariance $dt \cdot I$).
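
This informal description translates directly into simulation (a sketch assuming NumPy; the step and path counts are arbitrary): sum independent $N(0, dt \cdot I)$ increments and check that $B_1$ has covariance close to the identity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate n-dimensional Brownian motion on [0, 1] by summing independent
# N(0, dt * I) increments.
n, steps, paths = 2, 200, 10_000
dt = 1.0 / steps
dB = rng.standard_normal((paths, steps, n)) * np.sqrt(dt)
B = dB.cumsum(axis=1)              # B_t as the running sum of increments

# At time t = 1, each coordinate of B_1 should have variance close to 1.
var = B[:, -1, :].var(axis=0)
```
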

• Stochastic localization: Suppose that a log-concave measure $\mu$ has density $f : \R^n \to \R_+$. We will define a stochastic process on densities: Take $f_0(x)=f(x)$ and

$d f_t(x) = f_t(x) \langle x-a_t, dB_t\rangle\,,$

where $(B_t : t \geq 0)$ is an $n$-dimensional Brownian motion, and

$a_t \seteq \int_{\R^n} x f_t(x)\,dx$

is the center of mass of the $f_t$.

Note that if $f_0(x)=f(x)$ is a density, then so is $f_t$ since

$\int_{\R^n} d f_t(x)\,dx = \left\langle \int_{\R^n} (x-a_t)\, f_t(x)\,dx,\; dB_t\right\rangle = 0\,.$

Let $\mu_t$ be the corresponding measure. Note that $f_t(x)$ is a martingale: $\E[d f_t(x)] = 0$.

Martingale property: $\E[\mu_t(S)] = \mu(S)$ for all $S \subseteq \R^n$.
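
The martingale property shows up in a toy simulation (a sketch with several assumptions: a measure supported on finitely many points rather than a density, an Euler scheme in exponential form to keep the weights positive, and arbitrary step sizes):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stochastic localization for a measure on 5 points of R^2. The update
# p <- p * exp(<x - a_t, dB> - |x - a_t|^2 dt / 2), then renormalize,
# agrees with d f_t = f_t <x - a_t, dB_t> to first order in dt.
pts = rng.standard_normal((5, 2))          # support points
p = np.full((20_000, 5), 0.2)              # 20,000 independent runs, uniform start
dt, steps = 1e-3, 200

for _ in range(steps):
    a = p @ pts                            # centers of mass a_t, shape (runs, 2)
    dB = rng.standard_normal((20_000, 2)) * np.sqrt(dt)
    diff = pts[None, :, :] - a[:, None, :] # x - a_t, shape (runs, 5, 2)
    expo = (diff * dB[:, None, :]).sum(-1) - 0.5 * (diff ** 2).sum(-1) * dt
    p *= np.exp(expo)
    p /= p.sum(axis=1, keepdims=True)

# mu_t(S) for S = first two support points; mu(S) = 0.4 at time 0,
# and its average over runs should stay near 0.4.
muS = p[:, :2].sum(axis=1)
```
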

• For a sufficiently nice function $f : \R \to \R$, we can approximate the change in $f$ by its Taylor expansion

$df(x) = f(x+dx) - f(x) = f'(x) \,dx + \frac12 f''(x) \,dx^2 + \cdots\,.$

Dividing both sides by $dx$ and taking $dx \to 0$, every term of order higher than $dx$ vanishes.

But now suppose $B_t$ is $1$-dimensional Brownian motion and we compute

$d f(B_t) = f(B_{t+dt}) - f(B_t) = f'(B_t) \,dB_t + \frac12 f''(B_t) \,dB_t^2 + \cdots\,.$

Then $dB_t$ has law $N(0,dt)$, hence $\E[dB_t^2]=dt$. In other words, $(B_t)$ has non-trivial quadratic variation. That means we cannot neglect the 2nd order terms. Ito’s lemma tells us that we need only go to second order so that

$d f(B_t) = f'(B_t) \,dB_t + \frac12 f''(B_t) \,dt\,.$
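
A pathwise check (a sketch assuming NumPy; the choice $f(x) = x^2$ and the discretization are arbitrary): for $f(x) = x^2$ the increment identity $B_{k+1}^2 - B_k^2 = 2 B_k \,\Delta B_k + (\Delta B_k)^2$ telescopes, and the sum of squared increments concentrates near $t$ rather than vanishing; this is exactly the $\frac12 f''\,dt$ correction.

```python
import numpy as np

rng = np.random.default_rng(3)

# Pathwise Ito's lemma for f(x) = x^2:
#   B_T^2 = sum_k (2 B_k dB_k + dB_k^2),
# where the second sum (the quadratic variation) tends to T, not 0.
steps, T = 100_000, 1.0
dB = rng.standard_normal(steps) * np.sqrt(T / steps)
B = np.concatenate([[0.0], dB.cumsum()])

ito_integral = (2 * B[:-1] * dB).sum()   # approximates 2 * int_0^T B dB
quad_var = (dB ** 2).sum()               # approximates [B, B]_T = T
```
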

Let’s consider a more general version for stochastic processes in higher dimensions.

• Stochastic processes and Ito derivatives: Consider a process $(X_t : t \geq 0)$ defined by the stochastic differential equation

$dX_t = u_t \,dt + \Sigma_t dB_t\,,$

where $u_t \in \R^m$ and $\Sigma_t \in \R^{m \times n}$.

Let $f : \R^m \to \R$ be a sufficiently nice (e.g., twice continuously differentiable) function. Then Ito’s lemma tells us how to compute the time derivative of $f(X_t)$:

$df(X_t) = \sum_{i=1}^m \partial_i f(X_t) dX_t^i + \frac12 \sum_{i,j} \partial_{ij} f(X_t) d[X_t^i,X_t^j]_t\,,$

where $[X_t^i, X_t^j]_t$ is the quadratic variation

$[X_t^i, X_t^j]_t = \int_0^t \left(\Sigma_s \Sigma_s^{\top}\right)_{i,j} \,ds$

so that

$d [X_t^i, X_t^j]_t = \left(\Sigma_t \Sigma_t^{\top}\right)_{i,j}\,dt\,.$

Note that $\Sigma_t \Sigma_t^{\top}$ is the covariance matrix of the random vector $\Sigma_t dB_t$.
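
This last identity can be verified empirically (a sketch assuming NumPy, with an arbitrary constant drift $u$ and matrix $\Sigma$): the sum of increment products $dX^i\,dX^j$ over a fine partition approaches $(\Sigma\Sigma^{\top})_{i,j}\, T$, while the drift contributes nothing in the limit.

```python
import numpy as np

rng = np.random.default_rng(4)

# Quadratic covariation of X_t = u t + Sigma B_t with constant u, Sigma.
m, n, steps, T = 2, 3, 200_000, 1.0
u = np.array([1.0, -2.0])
Sigma = rng.standard_normal((m, n))

dB = rng.standard_normal((steps, n)) * np.sqrt(T / steps)
dX = u * (T / steps) + dB @ Sigma.T      # increments dX_t = u dt + Sigma dB_t

# Sum of outer products dX dX^T approximates [X^i, X^j]_T = (Sigma Sigma^T) T.
QV = dX.T @ dX
```
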

• We can use this to figure out what $f_t$ looks like:

Here, for a fixed $x \in \R^n$, we take $X_t = f_t(x)$ so that $u_t = 0$ and $\Sigma_t = f_t(x) (x-a_t)^{\top}$. Then Ito’s lemma gives:

\begin{align*} d \log f_t(x) = d \log (X_t) &= \frac{d X_t}{X_t} - \frac12 \frac{d[X_t,X_t]_t}{X_t^2} \\ &= \frac{d f_t(x)}{f_t(x)} - \frac12 \frac{d[f_t(x),f_t(x)]_t}{f_t(x)^2} \\ &=\langle x-a_t,dB_t\rangle - \frac12 \|x-a_t\|^2\,dt\,, \end{align*}

where we used

$d[X_t,X_t]_t = f_t(x)^2 (x-a_t)^{\top} (x-a_t)\,dt = f_t(x)^2 \|x-a_t\|^2 \,dt\,.$

Let us simplify:

$d \log f_t(x) = \langle x, a_t\,dt + dB_t\rangle - \frac12 \|x\|^2\,dt - \left[\langle a_t, dB_t\rangle + \tfrac12 \|a_t\|^2\,dt \right]$

Note that the terms in brackets don’t depend on $x$. Thus integrating and exponentiating gives (up to a normalization factor that does not depend on $x$):

$f_t(x) = \exp\left(- \frac{t}{2} \|x\|^2+ \int_0^t \langle x,a_s\,ds + dB_s\rangle \right) f(x)\,.$

If we let $\mu_t$ denote the corresponding measure, then the Gaussian Factor Lemma says that (with probability one):

$\psi_{\mu_t} \gtrsim \sqrt{t}\,.$

• Now consider $S \subseteq \R^n$ with $\mu(S)=1/2$ and suppose there is a time $t_0$ such that

$\begin{equation}\label{eq:good-meas} \P[\tfrac14 \mu(S) \leq \mu_{t_0}(S) \leq \tfrac34 \mu(S)] \geq \frac{1}{10}\,. \end{equation}$

Use the Martingale Property of $\mu_t$, together with the fact that $\tfrac18 \leq \mu_{t_0}(S) \leq \tfrac38$ on the event above (so that $\mu_{t_0}(\partial S) \geq \tfrac18 \psi_{\mu_{t_0}}$), to write

$\mu(\partial S) = \E[\mu_{t_0}(\partial S)] \geq \frac18\, \E\left[\psi_{\mu_{t_0}}\, \mathbf{1}\!\left\{\tfrac14 \mu(S) \leq \mu_{t_0}(S) \leq \tfrac34 \mu(S)\right\}\right] \gtrsim \sqrt{t_0}\, \P[\tfrac14 \mu(S) \leq \mu_{t_0}(S) \leq \tfrac34 \mu(S)] \gtrsim \sqrt{t_0}\,.$

• Thus the larger the time $t_0$ at which \eqref{eq:good-meas} holds, the better the lower bound on $\psi_{\mu}$.

In the next lecture, we will show that \eqref{eq:good-meas} holds for some $t_0 \asymp (\tr(A^2))^{-1/2}$.

This yields

$\psi_{\mu} \gtrsim \frac{1}{\tr(A^2)^{1/4}}\,.$

If $\mu$ is isotropic, then $A = I$ and $\tr(A^2) = n$, so this bound becomes $\psi_{\mu} \gtrsim n^{-1/4}$, implying that $\psi_n \gtrsim n^{-1/4}$.