# Tracking the covariance (Mon, Feb 06)

## Reducing to control of the covariance

• Recall from last time that $f : \R^n \to \R_+$ is a log-concave density on $\R^n$ and we use $\mu$ to denote the associated probability measure.

• We evolve $f$ according to a stochastic differential equation (SDE) given by $f_0(x)=f(x)$ and

$d f_t(x) = f_t(x) \langle x - a_t, dB_t \rangle\,,$

where $(B_t : t \geq 0)$ is an $n$-dimensional Brownian motion and

$a_t = \int_{\R^n} x f_t(x)\,dx$

is the center of mass of $f_t$. Let $\mu_t$ be the measure corresponding to the density $f_t$.

• We used Itô’s Lemma to calculate

$$$\label{eq:gfactor} f_t(x) \propto \exp\left(-\frac{t}{2} \|x\|^2 + \int_0^t \langle x,a_s \,ds + dB_s\rangle\right) f(x)\,.$$$

The Gaussian factor here is important for us because, according to the Gaussian Factor Lemma from last lecture, this yields $\psi_{\mu_t} \gtrsim \sqrt{t}$.

• Note that $f_t(x)$ is a martingale for every $x \in \R^n$: If $\mathcal{F}_t$ denotes the filtration generated by $(B_s : 0 \leq s \leq t)$, then

$\E[df_t(x) \mid \mathcal{F}_t] = f_t(x) \left\langle x-a_t, \E[dB_t \mid \mathcal{F}_t]\right\rangle = 0.$

This means that for every set $S \subseteq \R^n$, we have $\E[\mu_t(S)]=\mu(S)$.
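
The objects above can be simulated in dimension one. This is a minimal sketch, assuming a grid discretization and an Euler step; the function name `simulate_localization`, the grid, and the Gaussian starting density are illustrative choices, not part of the lecture. It uses the Gaussian-factor form \eqref{eq:gfactor}, which keeps $f_t$ positive, and records $a_t$, the variance of $\mu_t$, and $\mu_t(S)$ along one Brownian path.

```python
import numpy as np

def simulate_localization(x, log_f0, in_S, t_max, n_steps, rng):
    """Simulate the 1-D process on a grid; return rows (t, a_t, var_t, mu_t(S)).

    Uses f_t(x) proportional to exp(-t x^2 / 2 + theta_t x) f(x), where
    theta_t = int_0^t a_s ds + B_t is advanced by an Euler step.
    """
    dx = x[1] - x[0]
    dt = t_max / n_steps
    theta = 0.0
    history = []
    for k in range(n_steps + 1):
        t = k * dt
        log_ft = log_f0 - 0.5 * t * x**2 + theta * x
        ft = np.exp(log_ft - log_ft.max())       # subtract max for stability
        ft /= ft.sum() * dx                      # normalize to a density
        a_t = (x * ft).sum() * dx                # center of mass
        var_t = ((x - a_t) ** 2 * ft).sum() * dx # 1-D covariance
        m_t = ft[in_S].sum() * dx                # mu_t(S)
        history.append((t, a_t, var_t, m_t))
        # d(theta_t) = a_t dt + dB_t
        theta += a_t * dt + np.sqrt(dt) * rng.standard_normal()
    return np.array(history)

x = np.linspace(-12.0, 12.0, 4801)
log_f0 = -0.5 * x**2      # standard Gaussian start, up to a constant
in_S = x > 0              # S = (0, infinity), so mu(S) is about 1/2
path = simulate_localization(x, log_f0, in_S, t_max=3.0, n_steps=300,
                             rng=np.random.default_rng(0))
print(path[-1])           # final (t, a_t, var_t, mu_t(S)) on this path
```

For a standard Gaussian start the recorded variance equals $1/(1+t)$ on every path, which gives a simple check of the discretization. Averaging the last column over many independent paths should approximately reproduce $\mu(S)$.
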

• Now suppose we start with a set $S \subseteq \R^n$ with $\mu(S) = 1/2$. We want to prove a lower bound on $\mu(\partial S)$ as follows: At any time $t \geq 0$,

$\E[\mu(\partial S)] = \E[\mu_t(\partial S)] \gtrsim \psi_{\mu_t} \P[\tfrac14 \leq \mu_t(S) \leq \tfrac34] \gtrsim \sqrt{t} \cdot \P[\tfrac14 \leq \mu_t(S) \leq \tfrac34]\,.$

So now we are left to solve the following problem: For how long does $\mu_t(S)$ stay bounded between $1/4$ and $3/4$?

• So let’s analyze how $\mu_t(S)$ evolves using Itô’s lemma. Define

$m_t \seteq \mu_t(S) = \int_S f_t(x) \,dx\,.$

Using the SDE for $f_t$, we have

$d m_t = \int_S df_t(x)\,dx = \left\langle \int_S (x-a_t) f_t(x)\,dx, dB_t\right\rangle.$

• Dubins-Schwarz: While we don’t need this technically, it’s useful to keep in mind. Roughly speaking, every reasonable martingale is a reparameterized Brownian motion.

Given a $1$-dimensional stochastic process $dm_t = u_t \,dt + \Sigma_t dB_t$ (so $\Sigma_t \in \R^{1 \times n}$), recall the quadratic variation

$[m_t]_t = \int_0^t \Sigma_s \Sigma_s^{\top} \,ds\,.$

Define the stopping time $\tau_s \seteq \inf \{ t : [m_t]_t > s \}$, i.e., the first time at which we have collected quadratic variation $s$. When $u_t = 0$ (so that $m_t$ is a martingale), $m_{\tau_s}$ and $B_s$ (with $B_0 = m_0$) have the same law.
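
As a sanity check in the simplest case (a worked example added for illustration), take $u_t = 0$, $n = 1$, and a constant $\Sigma_t = \sigma > 0$, so that $m_t = m_0 + \sigma B_t$. Then

$[m_t]_t = \sigma^2 t\,, \qquad \tau_s = s/\sigma^2\,, \qquad m_{\tau_s} = m_0 + \sigma B_{s/\sigma^2}\,,$

and by Brownian scaling $(\sigma B_{s/\sigma^2} : s \geq 0)$ is again a standard Brownian motion. So $m_{\tau_s}$ is a Brownian motion started at $m_0$, as claimed.
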

• Controlling $\mu_t(S)$ by controlling the operator norm of the covariance. This means that in order to control $\P(\tfrac14 \leq m_t \leq \frac34)$, we just need to control the quadratic variation:

$[m_t]_t = \int_0^t \left\|\int_S (x-a_s) f_s(x)\,dx\right\|^2 \,ds\,.$

Note that

\begin{align*} \left\|\int_S (x-a_s) f_s(x)\,dx\right\|^2 &= \max_{\|v\| \leq 1} \left(\int_S \langle v, x-a_s \rangle f_s(x)\,dx\right)^2 \\ &=\max_{\|v\| \leq 1} \left(\int_{\R^n} \mathbf{1}_S(x) \langle v, x-a_s \rangle f_s(x)\,dx\right)^2 \\ &\leq \max_{\|v\| \leq 1} \left(\int_{\R^n} \langle v, x-a_s \rangle^2 f_s(x)\,dx\right) \left(\int_{\R^n} \mathbf{1}_S(x)^2 f_s(x)\,dx\right), \end{align*}

where the last line is Cauchy-Schwarz. The second integral is precisely $\mu_s(S) \leq 1$, and the first integral is $\langle v, A_s v\rangle$, where $A_s$ is the covariance matrix of $\mu_s$. So we have

$[m_t]_t \leq \int_0^t \max_{\|v\| \leq 1} \langle v, A_s v\rangle\,ds = \int_0^t \|A_s\|_{op}\,ds\,.$

• So we need to see how large $t$ can be so that $\int_0^t \|A_s\|_{op}\,ds \leq 0.1$. The operator norm $\|A_s\|_{op}$ (the maximum eigenvalue of $A_s$) is not easy to track as time evolves because it is defined as a maximum over directions.

Thus we will use a crude upper bound:

$\|A_t\|_{op} \leq \tr(A_t^2)^{1/2}\,.$

Key Lemma: Let $A$ be the covariance matrix of $\mu$. Then for some constant $c > 0$ and $0 \leq t \leq c (\tr(A^2))^{-1/2}$, we have

$\tr(A_t^2) \lesssim \tr(A^2)\,.$

The Key Lemma implies that $[m_t]_t \lesssim \int_0^t \tr(A_s^2)^{1/2}\,ds \leq t \cdot \max_{0 \leq s \leq t} \tr(A_s^2)^{1/2} \lesssim t \cdot \tr(A^2)^{1/2}\,.$

And therefore for some choice of $T \gtrsim (\tr(A^2))^{-1/2}$, we have $\Pr([m_{T}]_T \ll 1) \gtrsim 1$, and therefore $\Pr(\tfrac14 \leq m_T \leq \tfrac34) \gtrsim 1$.
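
The pointwise inequalities used in this reduction (Cauchy-Schwarz, then the crude bound on the operator norm) can be checked numerically on a toy example. The sketch below replaces $\mu_s$ by the empirical measure of random points and picks an arbitrary half-space $S$; both are assumptions made for illustration, and the inequalities hold for any probability measure with finite second moments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 2000
# Toy measure: the empirical measure of N points in R^n (illustrative choice).
X = rng.exponential(size=(N, n))
a = X.mean(axis=0)                    # center of mass
Xc = X - a
A = Xc.T @ Xc / N                     # covariance matrix of the empirical measure

in_S = X[:, 0] + X[:, 1] > 2.0        # S = a half-space, chosen arbitrarily
mu_S = in_S.mean()                    # mu(S)
coeff = Xc[in_S].sum(axis=0) / N      # int_S (x - a) dmu, the coefficient of dB_t

lhs = coeff @ coeff                   # squared norm of the integral over S
op_norm = np.linalg.norm(A, 2)        # spectral norm = top eigenvalue (A is PSD)
hs_norm = np.sqrt(np.trace(A @ A))    # tr(A^2)^{1/2}
print(lhs, mu_S * op_norm, op_norm, hs_norm)
```

The four printed numbers should be nondecreasing from left to right, mirroring $\left\|\int_S (x-a)\,d\mu\right\|^2 \leq \mu(S)\,\|A\|_{op} \leq \|A\|_{op} \leq \tr(A^2)^{1/2}$.
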

• Now \eqref{eq:gfactor} and the Gaussian Factor Lemma together tell us that $\psi_{\mu} \gtrsim \sqrt{T} \gtrsim \tr(A^2)^{-1/4}$. If $\mu$ is in isotropic position, then $A=I$ and we get $\psi_{\mu} \gtrsim n^{-1/4}$.

## Controlling the covariance

• We first need to compute the Itô derivative of the covariance

$A_t = \int_{\R^n} (x-a_t)(x-a_t)^{\top} f_t(x)\,dx\,.$

• For this it helps to extend Itô’s lemma notationally to handle a pair of stochastic processes: Suppose that $dX_t = u_t \,dt + M_t\,dB_t$ and $dY_t = v_t\,dt + N_t\,dB_t$. Then

\begin{align*} d f(X_t, Y_t) &= \sum_i \partial_{x^i} f(X_t,Y_t)\, dX_t^i + \sum_j \partial_{y^j} f(X_t,Y_t)\, dY_t^j \\ &\quad + \frac12 \sum_{i,i'} \partial_{x^i,x^{i'}} f(X_t,Y_t)\, d[X^i,X^{i'}]_t + \frac12 \sum_{j,j'} \partial_{y^j,y^{j'}} f(X_t,Y_t)\, d[Y^{j},Y^{j'}]_t + \sum_{i,j} \partial_{x^i,y^j} f(X_t,Y_t)\, d[X^i, Y^j]_t\,, \end{align*}

where

$[X_t,Y_t]_t = \int_0^t M_s N_s^{\top}\,ds\,.$

Note that this is the same formula as before (by lifting to a single process $Z_t = X_t \oplus Y_t$), but it gives guidance to our next calculation.

• Write:

$d A_t = \int_{\R^n} (x-a_t)(x-a_t)^{\top} d f_t(x)\,dx - \int_{\R^n} \left((d a_t)(x-a_t)^{\top}\right) f_t(x)\,dx - \int_{\R^n} \left((x - a_t)(d a_t)^{\top}\right) f_t(x)\,dx + \cdots\,,$

where $\cdots$ represents the second-order terms.

The first term is equal to $\int_{\R^n} (x-a_t)(x-a_t)^{\top} \langle x-a_t, dB_t\rangle f_t(x)\,dx.$

The second and third terms both evaluate to zero since $\int (x-a_t) f_t(x)\,dx = 0$ by definition of $a_t$.

Now let’s calculate the second-order derivatives:

$\frac12 \cdot 2 \cdot d[a_t,a_t]_t \int f_t(x) \,dx - \frac{1}{2} \cdot 2 \cdot \int (x-a_t) \left(d[a_t,f_t(x)]_t\right)^{\top}\,dx - \frac{1}{2} \cdot 2 \cdot \int d[a_t,f_t(x)]_t\, (x-a_t)^{\top}\,dx\,.$

• Recalling $d a_t = A_t dB_t$ gives $d[a_t,a_t]_t = A_t^2\,dt$.

• We also have

$d[a_t, f_t(x)]_t = A_t (x-a_t) f_t(x) \,dt\,,$

hence the second order terms evaluate to

$A_t^2\,dt - 2 A_t \left(\int f_t(x) (x-a_t) (x-a_t)^{\top}\,dx\right) dt = - A_t^2\,dt\,.$

Altogether this gives

$dA_t = \int_{\R^n} (x-a_t)(x-a_t)^{\top} \langle x-a_t, dB_t\rangle f_t(x)\,dx -A_t^2\,dt\,.$
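
As a quick consistency check (an illustrative special case, not taken from the lecture), suppose $f$ is the standard Gaussian density on $\R^n$. By \eqref{eq:gfactor}, each $f_t$ is then a Gaussian density with covariance $(1+t)^{-1} I$. It is symmetric about $a_t$, so the third-moment integral vanishes and the equation reduces to

$dA_t = -A_t^2\,dt\,, \qquad A_0 = I\,,$

whose solution is $A_t = (1+t)^{-1} I$, in agreement with the covariance read off from the Gaussian factor.
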
• Now we wish to analyze the evolution of $\Phi_t = \tr(A_t^2)$, and another application of Itô’s lemma gives

\begin{align*} d \Phi_t = -2 \tr(A_t^3)\,dt &+ \int \int \left((x-a_t)^{\top} (y-a_t)\right)^3 f_t(x) f_t(y)\,dx\,dy\,dt \\ &+ 2 \int (x-a_t)^{\top} A_t (x-a_t) \langle x-a_t,dB_t\rangle f_t(x)\,dx\,. \end{align*}

We want to ensure that $\Phi_t$ is small for $t \in [0,T]$, so we can ignore the term $-2 \tr(A_t^3)\,dt$ (since $A_t$ is PSD, this term is never positive).

If $\hat{\mathbf{X}},\hat{\mathbf{Y}}$ are independent random vectors with law $\mu_t$ and $\mathbf{X} \seteq \hat{\mathbf{X}} - \E[\hat{\mathbf{X}}]$, $\mathbf{Y} \seteq \hat{\mathbf{Y}} - \E[\hat{\mathbf{Y}}]$, then the second term is $\E \langle \mathbf{X}, \mathbf{Y}\rangle^3\,dt$ and the third term is $2\,\E \left[\langle \mathbf{X}, A_t \mathbf{X}\rangle \mathbf{X}^{\top} \right] dB_t$, so

$d \Phi_t \leq \E \langle \mathbf{X}, \mathbf{Y}\rangle^3\,dt + 2\,\E \left[\langle \mathbf{X}, A_t \mathbf{X}\rangle \mathbf{X}^{\top} \right] dB_t\,.$

• Reverse Hölder inequality: For any log-concave random vector $\mathbf{X}$ in $\R^n$ and any integer $k \geq 1$,

$(\E \|\mathbf{X}\|^k)^{1/k} \leq 2k\ \left(\E \|\mathbf{X}\|^2\right)^{1/2}\,.$
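
As a concrete one-dimensional instance (chosen here for illustration), the Laplace density $\tfrac12 e^{-|x|}$ on $\R$ is log-concave and has $\E|X|^k = k!$, so both sides of the inequality can be compared exactly:

```python
import math

# Laplace density e^{-|x|}/2 on R: E|X|^k = k!, hence E X^2 = 2.
second_moment_root = math.sqrt(2.0)

results = []
for k in range(1, 11):
    lhs = math.factorial(k) ** (1.0 / k)   # (E|X|^k)^{1/k}
    rhs = 2 * k * second_moment_root       # 2k (E|X|^2)^{1/2}
    results.append((k, lhs, rhs, lhs <= rhs))

for k, lhs, rhs, ok in results:
    print(f"k={k:2d}  lhs={lhs:7.3f}  rhs={rhs:7.3f}  holds={ok}")
```

Both sides grow linearly in $k$ here, which suggests that the linear dependence on $k$ in the inequality cannot be improved by more than a constant factor.
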
• Using this inequality one obtains

$d \Phi_t \lesssim \Phi_t^{3/2}\,dt + \Phi_t^{5/4} dB_t\,.$
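
A sketch of how the two bounds can be obtained; the intermediate steps are filled in here as an assumption about the intended argument, and constants are not tracked. For the drift term, condition on $\mathbf{Y}$: then $\langle \mathbf{X},\mathbf{Y}\rangle$ is a one-dimensional log-concave random variable with mean zero and variance $\langle \mathbf{Y}, A_t \mathbf{Y}\rangle$. Applying the reverse Hölder inequality with $k=3$, first to $\langle \mathbf{X},\mathbf{Y}\rangle$ and then to the log-concave vector $A_t^{1/2}\mathbf{Y}$, gives

\begin{align*} \E \langle \mathbf{X},\mathbf{Y}\rangle^3 \leq \E \left|\langle \mathbf{X},\mathbf{Y}\rangle\right|^3 &\lesssim \E \langle \mathbf{Y}, A_t \mathbf{Y}\rangle^{3/2} = \E \|A_t^{1/2} \mathbf{Y}\|^3 \\ &\lesssim \left(\E \|A_t^{1/2} \mathbf{Y}\|^2\right)^{3/2} = \tr(A_t^2)^{3/2} = \Phi_t^{3/2}\,. \end{align*}

For the martingale term, fix a unit vector $v$ and use Cauchy-Schwarz, then the inequality with $k=4$:

$\E \left[\langle \mathbf{X}, A_t \mathbf{X}\rangle \langle \mathbf{X}, v\rangle\right] \leq \left(\E \|A_t^{1/2}\mathbf{X}\|^4\right)^{1/2} \left(\E \langle \mathbf{X},v\rangle^2\right)^{1/2} \lesssim \tr(A_t^2) \cdot \|A_t\|_{op}^{1/2} \leq \Phi_t^{5/4}\,,$

where the last step uses $\|A_t\|_{op} \leq \tr(A_t^2)^{1/2}$.
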
• To analyze this, let us assume that we run the process until the first time $\tau_0$ such that $\Phi_{\tau_0} = (1+c) \tr(A^2)$. Then for $t \in [0,\tau_0]$,

$d \Phi_t \lesssim \tr(A^2)^{3/2}\,dt + \tr(A^2)^{5/4} dB_t\,,$

and therefore

$c\, \tr(A^2) = \Phi_{\tau_0} - \Phi_{0} = \int_{0}^{\tau_0} d\Phi_t \lesssim \tau_0 \tr(A^2)^{3/2} + W_{\tau}\,,$

where $\tau \lesssim \tau_0 \tr(A^2)^{5/2}$, and $W_{\tau}$ has the law of a Brownian motion (recall Dubins-Schwarz). Thus with high probability, this upper bound is

$\lesssim \tau_0\,\tr(A^2)^{3/2} + \tau_0^{1/2}\,\tr(A^2)^{5/4}\,.$

We therefore conclude that, with constant probability, $\tau_0 \gtrsim \tr(A^2)^{-1/2}$, as desired.
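
To see where this time scale comes from, here is a heuristic that ignores the martingale term; the symbol $C$ for the implicit constant in the drift bound is introduced here for illustration. Compare with the ODE $\phi'(t) = C \phi(t)^{3/2}$, $\phi(0) = \tr(A^2)$, whose solution is

$\phi(t) = \frac{\phi(0)}{\left(1 - \tfrac{C}{2}\,\phi(0)^{1/2}\, t\right)^2}\,.$

This blows up at $t = 2/(C \phi(0)^{1/2})$ but satisfies $\phi(t) \leq 4\,\phi(0)$ for $t \leq 1/(C \phi(0)^{1/2})$. So the drift alone already limits control of $\Phi_t$ to times $t \lesssim \tr(A^2)^{-1/2}$, matching the range in the Key Lemma.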