Recall from last time that \(f : \R^n \to \R_+\) is a log-concave density on \(\R^n\) and we use \(\mu\) to denote the associated probability measure.
We evolve \(f\) according to a stochastic differential equation (SDE) given by \(f_0(x)=f(x)\) and
\[d f_t(x) = f_t(x) \langle x - a_t, dB_t \rangle\,,\]where \((B_t : t \geq 0)\) is an \(n\)-dimensional Brownian motion and
\[a_t = \int_{\R^n} x f_t(x)\,dx\]is the center of mass of \(f_t\). Let \(\mu_t\) be the measure corresponding to the density \(f_t\).
We used Itô’s Lemma to calculate
\[\begin{equation}\label{eq:gfactor} f_t(x) \propto \exp\left(-\frac{t}{2} \|x\|^2 + \int_0^t \langle x,a_s \,ds + dB_s\rangle\right) f(x)\,. \end{equation}\]The Gaussian factor here is important for us because, according to the Gaussian Factor Lemma from last lecture, this yields \(\psi_{\mu_t} \gtrsim \sqrt{t}\).
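As a reminder of where this formula comes from, here is a sketch of the calculation for \(x\) with \(f(x) > 0\); terms that do not depend on \(x\) are absorbed into the normalization. Applying Itô’s lemma to \(\log f_t(x)\) and using \(d[f_t(x)]_t = f_t(x)^2 \|x-a_t\|^2\,dt\),
\[\begin{align*} d \log f_t(x) &= \frac{d f_t(x)}{f_t(x)} - \frac12 \frac{d[f_t(x)]_t}{f_t(x)^2} = \langle x-a_t, dB_t\rangle - \frac12 \|x-a_t\|^2\,dt \\ &= -\frac12 \|x\|^2\,dt + \langle x, a_t\,dt + dB_t\rangle - \left(\langle a_t, dB_t\rangle + \frac12 \|a_t\|^2\,dt\right). \end{align*}\]Integrating from \(0\) to \(t\), the last bracket contributes only a factor independent of \(x\), which gives the displayed form.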
Note that \(f_t(x)\) is a martingale for every \(x \in \R^n\): If $\mathcal{F}_t$ denotes the filtration generated by \((B_s : 0 \leq s \leq t)\), then
\[\E[df_t(x) \mid \mathcal{F}_t] = f_t(x) \left\langle x-a_t, \E[dB_t \mid \mathcal{F}_t]\right\rangle = 0.\]This means that for every set \(S \subseteq \R^n\), we have \(\E[\mu_t(S)]=\mu(S)\).
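To spell out the last step (a sketch, assuming enough integrability to exchange expectation and integral): by Fubini's theorem and the martingale property \(\E[f_t(x)] = f_0(x) = f(x)\),
\[\E[\mu_t(S)] = \E\left[\int_S f_t(x)\,dx\right] = \int_S \E[f_t(x)]\,dx = \int_S f(x)\,dx = \mu(S)\,.\]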
So now suppose we start with a set \(S \subseteq \R^n\) with \(\mu(S) = 1/2\). We want to prove a lower bound on \(\mu(\partial S)\) as follows: At any time \(t \geq 0\),
\[\E[\mu(\partial S)] = \E[\mu_t(\partial S)] \gtrsim \psi_{\mu_t} \P[\tfrac14 \leq \mu_t(S) \leq \tfrac34] \gtrsim \sqrt{t} \cdot \P[\tfrac14 \leq \mu_t(S) \leq \tfrac34]\,.\]So now we are left to solve the following problem: For how long does \(\mu_t(S)\) stay bounded between \(1/4\) and \(3/4\)?
So let’s analyze how \(\mu_t(S)\) evolves using Itô’s lemma. Define
\[m_t \seteq \mu_t(S) = \int_S f_t(x) \,dx\,.\]Using the SDE for $f_t$, we have
\[d m_t = \int_S df_t(x)\,dx = \left\langle \int_S (x-a_t) f_t(x)\,dx, dB_t\right\rangle.\]Dubins-Schwarz: While we don’t need this technically, it’s useful to keep in mind. Roughly speaking, all reasonable martingales are reparameterized Brownian motions.
Given a \(1\)-dimensional stochastic process \(dm_t = u_t \,dt + \Sigma_t dB_t\) (so \(\Sigma_t \in \R^{1 \times n}\)), recall the quadratic variation
\[[m_t]_t = \int_0^t \Sigma_s \Sigma_s^{\top} \,ds\,.\]Define the stopping time \(\tau_s \seteq \inf \{ t : [m_t]_t > s \}\), i.e., the first time at which we have collected quadratic variation \(s\). Then, when \(m_t\) is a martingale (i.e., \(u_t = 0\)), $m_{\tau_s}$ and a one-dimensional Brownian motion $B_s$ (with \(B_0 = m_0\)) have the same law.
Controlling \(\mu_t(S)\) by controlling the operator norm of the covariance. This means that in order to control \(\P(\tfrac14 \leq m_t \leq \frac34)\), we just need to control the quadratic variation:
\[[m_t]_t = \int_0^t \left\|\int_S (x-a_s) f_s(x)\,dx\right\|^2 \,ds\,.\]Note that
\[\begin{align*} \left\|\int_S (x-a_s) f_s(x)\,dx\right\|^2 &= \max_{\|v\| \leq 1} \left(\int_S \langle v, x-a_s \rangle f_s(x)\,dx\right)^2 \\ &=\max_{\|v\| \leq 1} \left(\int_{\R^n} \mathbf{1}_S(x) \langle v, x-a_s \rangle f_s(x)\,dx\right)^2 \\ &\leq \max_{\|v\| \leq 1} \left(\int_{\R^n} \langle v, x-a_s \rangle^2 f_s(x)\,dx\right) \left(\int_{\R^n} \mathbf{1}_S(x)^2 f_s(x)\,dx\right), \end{align*}\]where the last line is Cauchy-Schwarz. The second integral is precisely \(\mu_s(S) \leq 1\), and the first integral is \(\langle v, A_s v\rangle\), where \(A_s\) is the covariance matrix of \(\mu_s\). So we have
\[[m_t]_t \leq \int_0^t \max_{\|v\| \leq 1} \langle v, A_s v\rangle\,ds = \int_0^t \|A_s\|_{op}\,ds\,.\]So we need to see how large \(t\) can be so that \(\int_0^t \|A_s\|_{op}\,ds \leq 0.1\). The operator norm \(\|A_s\|_{op}\) (the maximum eigenvalue of \(A_s\)) is not easy to argue about as time evolves because it’s defined as a maximum over directions.
Thus we will use a crude upper bound, valid because \(A_t\) is PSD and so its largest eigenvalue is at most the \(\ell_2\) norm of its eigenvalues:
\[\|A_t\|_{op} \leq \tr(A_t^2)^{1/2}\,.\]Key Lemma: Let \(A\) be the covariance matrix of \(\mu\). Then for some constant $c > 0$ and \(0 \leq t \leq c (\tr(A^2))^{-1/2}\), we have
\[\tr(A_t^2) \lesssim \tr(A^2)\,.\]The Key Lemma implies that \([m_t]_t \lesssim \int_0^t \tr(A_s^2)^{1/2}\,ds \leq t \cdot \max_{0 \leq s \leq t} \tr(A_s^2)^{1/2} \lesssim t \cdot \tr(A^2)^{1/2}\,.\)
And therefore for some choice of \(T \gtrsim (\tr(A^2))^{-1/2}\), we have \(\Pr([m_{T}]_T \ll 1) \gtrsim 1\), and therefore \(\Pr(\tfrac14 \leq m_T \leq \tfrac34) \gtrsim 1\).
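One way to make the last implication quantitative is the following sketch; the constants \(1/32\) and \(1/2\) are illustrative and not from the lecture. By the time-change theorem above, \(m_t = W_{[m_t]_t}\) for a Brownian motion \(W\) with \(W_0 = m_0 = 1/2\). For any \(u > 0\), on the event \([m_T]_T \leq u\) the value \(m_T\) lies in \(\{W_s : 0 \leq s \leq u\}\), so Doob's maximal inequality gives
\[\P\left[m_T \notin [\tfrac14,\tfrac34]\right] \leq \P\left[[m_T]_T > u\right] + \P\left[\max_{0 \leq s \leq u} |W_s - \tfrac12| > \tfrac14\right] \leq \P\left[[m_T]_T > u\right] + 16 u\,.\]Taking \(u = 1/32\), the second term is at most \(1/2\), so it suffices that \([m_T]_T \leq 1/32\) holds with probability bounded away from \(1/2\).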
Now \eqref{eq:gfactor} and the Gaussian Factor Lemma together tell us that \(\psi_{\mu} \gtrsim \sqrt{T} \gtrsim \tr(A^2)^{-1/4}\). If \(\mu\) is in isotropic position, then \(A=I\) and we get \(\psi_{\mu} \gtrsim n^{-1/4}\).
We first need to compute the Itô derivative of the covariance
\[A_t = \int_{\R^n} (x-a_t)(x-a_t)^{\top} f_t(x)\,dx\,.\]For this it helps to extend Itô’s lemma notationally to handle a pair of stochastic processes: Suppose that \(dX_t = u_t \,dt + M_t\,dB_t\) and \(dY_t = v_t\,dt + N_t\,dB_t\), then
\[\begin{align*} d f(X_t, Y_t) &= \sum_i \partial_{x^i} f(X_t,Y_t)\, dX_t^i + \sum_j \partial_{y^j} f(X_t,Y_t)\, dY_t^j \\ &+ \frac12 \sum_{i,i'} \partial_{x^i,x^{i'}} f(X_t,Y_t)\, d[X_t^i,X_t^{i'}]_t + \frac12 \sum_{j,j'} \partial_{y^j,y^{j'}} f(X_t,Y_t)\, d[Y_t^{j},Y_t^{j'}]_t + \sum_{i,j} \partial_{x^i,y^j} f(X_t,Y_t)\, d[X_t^i, Y_t^j]_t\,, \end{align*}\]where
\[[X_t,Y_t]_t = \int_0^t M_s N_s^{\top}\,ds\,.\]Note that this is the same formula as before (by lifting to a single process \(Z_t = X_t \oplus Y_t\)), but it gives guidance to our next calculation.
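As a quick sanity check (a standard special case, not specific to this lecture), take scalar processes and \(f(x,y) = xy\), and lift to \(Z_t = X_t \oplus Y_t\). The pure second derivatives vanish, and the mixed derivative appears twice in the Hessian of \(f\) (as \(\partial_{x,y}\) and \(\partial_{y,x}\)), so the factor \(\tfrac12\) cancels and one recovers the Itô product rule
\[d(X_t Y_t) = X_t\,dY_t + Y_t\,dX_t + d[X_t,Y_t]_t = X_t\,dY_t + Y_t\,dX_t + M_t N_t^{\top}\,dt\,.\]The covariance \(A_t\) is a product of the three processes \((x-a_t)\), \((x-a_t)^{\top}\), and \(f_t(x)\), so its second-order terms are cross-variations of exactly this kind.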
Write:
\[d A_t = \int_{\R^n} (x-a_t)(x-a_t)^{\top} d f_t(x)\,dx - \int_{\R^n} \left((d a_t)(x-a_t)^{\top}\right) f_t(x)\,dx - \int_{\R^n} \left((x - a_t)(d a_t)^{\top}\right) f_t(x)\,dx + \cdots\,,\]where \(\cdots\) represents the second-order terms.
The first term is equal to \(\int_{\R^n} (x-a_t)(x-a_t)^{\top} \langle x-a_t, dB_t\rangle f_t(x)\,dx.\)
The second and third terms both evaluate to zero since \(\int (x-a_t) f_t(x)\,dx = 0\) by definition of \(a_t\).
Now let’s calculate the second-order derivatives:
\[\frac12 \cdot 2 \cdot d[a_t,a_t]_t \int f_t(x) \,dx - \frac{1}{2} \cdot 2 \cdot \int (x-a_t) \left(d[a_t,f_t(x)]_t\right)^{\top}\,dx - \frac{1}{2} \cdot 2 \cdot \int d[a_t,f_t(x)]_t\, (x-a_t)^{\top}\,dx\,.\]Recalling \(d a_t = A_t dB_t\) gives \(d[a_t,a_t]_t = A_t^2\,dt\).
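For reference, the identity \(d a_t = A_t\,dB_t\) follows directly from the SDE for \(f_t\) (a short sketch):
\[d a_t = \int_{\R^n} x \,d f_t(x)\,dx = \int_{\R^n} x \langle x - a_t, dB_t\rangle f_t(x)\,dx = \left(\int_{\R^n} (x-a_t)(x - a_t)^{\top} f_t(x)\,dx\right) dB_t = A_t\,dB_t\,,\]where the third equality uses \(\int (x-a_t) f_t(x)\,dx = 0\) to replace the leading \(x\) by \(x - a_t\). Since \(A_t\) is symmetric, \(d[a_t,a_t]_t = A_t A_t^{\top}\,dt = A_t^2\,dt\).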
We also have
\[d[a_t, f_t(x)]_t = A_t (x-a_t) f_t(x) \,dt\,,\]hence the second order terms evaluate to
\[A_t^2\,dt - 2 A_t \left(\int f_t(x) (x-a_t) (x-a_t)^{\top}\,dx\right) dt = - A_t^2\,dt\,.\]Altogether this gives
\[dA_t = \int_{\R^n} (x-a_t)(x-a_t)^{\top} \langle x-a_t, dB_t\rangle f_t(x)\,dx -A_t^2\,dt\,.\]Now we wish to analyze the evolution of \(\Phi_t = \tr(A_t^2)\), and another application of Itô’s lemma gives
\[\begin{align*} d \Phi_t = -2 \tr(A_t^3)\,dt &+ \left(\int \int \left((x-a_t)^{\top} (y-a_t)\right)^3 f_t(x) f_t(y)\,dx\,dy\right) dt \\ &+ 2 \int (x-a_t)^{\top} A_t (x-a_t) \langle x-a_t,dB_t\rangle f_t(x)\,dx\,. \end{align*}\]We want to ensure that \(\Phi_t\) stays small for \(t \in [0,T]\), and for this we only need an upper bound, so we can ignore the term \(-2 \tr(A_t^3)\,dt\) (since \(A_t\) is PSD, this term is never positive).
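To see where the three terms come from (a sketch; the auxiliary matrices \(M_k\) are introduced here only for bookkeeping), write \(dA_t = \sum_{k=1}^n M_k\,dB_t^k - A_t^2\,dt\) with \(M_k \seteq \int_{\R^n} (x-a_t)(x-a_t)^{\top} (x-a_t)_k\, f_t(x)\,dx\). Since \(\Phi_t = \tr(A_t^2)\) is quadratic in \(A_t\),
\[d\Phi_t = 2 \tr(A_t\,dA_t) + \sum_{k=1}^n \tr(M_k^2)\,dt\,.\]The first piece produces \(-2\tr(A_t^3)\,dt\) together with the martingale term, because \(\tr\left(A_t (x-a_t)(x-a_t)^{\top}\right) = (x-a_t)^{\top} A_t (x-a_t)\). For the second piece,
\[\sum_{k=1}^n \tr(M_k^2) = \int\int \langle x-a_t, y-a_t\rangle^2 \sum_{k=1}^n (x-a_t)_k (y-a_t)_k\, f_t(x) f_t(y)\,dx\,dy = \int\int \langle x-a_t,y-a_t\rangle^3 f_t(x) f_t(y)\,dx\,dy\,.\]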
If \(\hat{\mathbf{X}},\hat{\mathbf{Y}}\) are independent random variables with law \(\mu_t\) and \(\mathbf{X} \seteq \hat{\mathbf{X}} - \E[\hat{\mathbf{X}}]\), \(\mathbf{Y} \seteq \hat{\mathbf{Y}} - \E[\hat{\mathbf{Y}}]\), then the second term is \(\E \langle \mathbf{X}, \mathbf{Y}\rangle^3\,dt\) and the third is \(2\,\E \left[\langle \mathbf{X}, A_t \mathbf{X}\rangle \mathbf{X}^{\top} \right] dB_t\). Dropping the first term, we obtain
\[d \Phi_t \leq \E \langle \mathbf{X}, \mathbf{Y}\rangle^3\,dt + 2\,\E \left[\langle \mathbf{X}, A_t \mathbf{X}\rangle \mathbf{X}^{\top} \right] dB_t\,.\]Reverse Hölder inequality: For any log-concave random vector \(\mathbf{X}\) in \(\R^n\) and \(k \geq 1\),
\[(\E \|\mathbf{X}\|^k)^{1/k} \leq 2k\ \left(\E \|\mathbf{X}\|^2\right)^{1/2}\,.\]Using this inequality one obtains
\[d \Phi_t \lesssim \Phi_t^{3/2}\,dt + \Phi_t^{5/4} dB_t\,.\]To analyze this, let us assume that we run the process until the first time \(\tau_0\) such that \(\Phi_{\tau_0} = (1+c) \tr(A^2)\). Then for \(t \in [0,\tau_0]\),
\[d \Phi_t \lesssim \tr(A^2)^{3/2}\,dt + \tr(A^2)^{5/4} dB_t\,,\]and therefore
\[c\, \tr(A^2) = \Phi_{\tau_0} - \Phi_{0} = \int_{0}^{\tau_0} d\Phi_t \lesssim \tau_0 \tr(A^2)^{3/2} + W_{\tau}\,,\]where \(\tau \leq \tau_0 \tr(A^2)^{5/2}\), and \(W_{\tau}\) has the law of a Brownian motion at time \(\tau\) (recall Dubins-Schwarz). Thus with high probability, this upper bound is
\[\lesssim \tau_0\,\tr(A^2)^{3/2} + \tau_0^{1/2}\,\tr(A^2)^{5/4}\,.\]We therefore conclude that, with constant probability, \(\tau_0 \gtrsim \tr(A^2)^{-1/2}\), as desired.
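Two steps of this argument can be unpacked; what follows is a sketch, using the standard facts that linear images and one-dimensional marginals of log-concave vectors are log-concave. First, the bound \(d\Phi_t \lesssim \Phi_t^{3/2}\,dt + \Phi_t^{5/4}\,dB_t\). For the drift, condition on \(\mathbf{Y}\) and apply the reverse Hölder inequality with \(k=3\) twice:
\[\E \langle \mathbf{X},\mathbf{Y}\rangle^3 \leq \E_{\mathbf{Y}} \E_{\mathbf{X}} |\langle \mathbf{X},\mathbf{Y}\rangle|^3 \lesssim \E_{\mathbf{Y}} \left(\E_{\mathbf{X}} \langle \mathbf{X},\mathbf{Y}\rangle^2\right)^{3/2} = \E \|A_t^{1/2} \mathbf{Y}\|^3 \lesssim \left(\E \|A_t^{1/2}\mathbf{Y}\|^2\right)^{3/2} = \tr(A_t^2)^{3/2}\,.\]For the martingale term, Cauchy-Schwarz and the reverse Hölder inequality with \(k=4\) give, for any unit vector \(\theta\),
\[\E\left[\langle \mathbf{X}, A_t \mathbf{X}\rangle \langle \mathbf{X},\theta\rangle\right] \leq \left(\E \|A_t^{1/2}\mathbf{X}\|^4\right)^{1/2} \left(\E \langle \mathbf{X},\theta\rangle^2\right)^{1/2} \lesssim \tr(A_t^2)\, \|A_t\|_{op}^{1/2} \leq \Phi_t^{5/4}\,.\]Second, the final conclusion. Write \(s = \tr(A^2)\) and \(u = \tau_0 s^{1/2}\). Dividing \(c\,s \lesssim \tau_0 s^{3/2} + \tau_0^{1/2} s^{5/4}\) by \(s\) gives
\[c \lesssim u + u^{1/2}\,,\]which forces \(u \gtrsim \min(c, c^2)\), i.e., \(\tau_0 \gtrsim \tr(A^2)^{-1/2}\) for a fixed constant \(c\).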