Spectral sparsification via Itô
These are preliminary notes for analyzing the Itô process.
Reis and Rothvoss gave an algorithm
for finding BSS sparsifiers that uses
a discretized random walk. It appears conceptually simpler to
use a modified Brownian motion directly.
Consider symmetric matrices $A_1,\ldots,A_m \succeq 0$ with
$|A_1|+\cdots+|A_m| \preceq \mathbb{I}_n$. Define a stochastic
process $X_t = (X_t^1,\ldots,X_t^m)$ by $X_0 = 0$, and
$$dX_t = \mathbb{I}_{\mathcal J_t}\, dB_t\,,$$
where \(\{B_t\}\) is a standard Brownian motion in \(\mathbb R^m\),
\(\mathbb{I}_{\mathcal J_t}\) denotes the diagonal projection onto the coordinates
in \(\mathcal J_t\), and \(\mathcal J_t \subseteq [m]\) is the set of good
coordinates at time \(t\), to be defined below.
Define the stopping times
\(T_i \mathrel{\mathop:}=\inf \{ t > 0 : |X_t^i| \geqslant 1 \}\) and the
set of stopped coordinates
$$\mathcal S_t \mathrel{\mathop:}=\{ i \in [m] : T_i \leqslant t \}\,.$$
We will ensure that \(\mathcal J_t \cap \mathcal S_t = \emptyset\) so that
\(|X_t^1|,\ldots,|X_t^m| \leqslant 1\) holds for all times
\(t \geqslant 0\).
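In discrete time, one step of this process is just a Gaussian increment on the driven coordinates. The following is a minimal numerical sketch (NumPy and the step size `dt` are my own illustrative choices); here the driven set is simply the complement of the frozen coordinates, whereas the actual process also excludes the bad coordinates \(\mathcal I_t\) defined later.

```python
import numpy as np

def step(X, frozen, dt, rng):
    """One Euler-Maruyama step of dX_t = I_{J_t} dB_t: add a Gaussian
    increment to every coordinate that is not frozen, then freeze (and
    clamp to +-1) any coordinate that has reached |X^i| >= 1."""
    dB = rng.normal(scale=np.sqrt(dt), size=len(X))
    X = X + np.where(frozen, 0.0, dB)
    hit = np.abs(X) >= 1.0
    X = np.where(hit, np.sign(X), X)
    return X, frozen | hit
```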
Denote
\(T \mathrel{\mathop:}=\sup \{ t > 0 : |\mathcal S_t| \leqslant m/2 \}\).
Our goal is to show that with good probability,
$$\label{eq:goal}
\sum_{i=1}^m X_T^i A_i \preceq C \left(\frac{n}{m}\right)^{1/2} \mathbb{I}_n\,.$$
Since the stopping time \(T\) entails that \(m/2\) coordinates have been
fixed to \(\pm 1\), this completes the argument: at least \(m/4\) of the
fixed coordinates share the same sign, so at least one of the two matrices
\(\sum_{i=1}^m (1+X_T^i) A_i\) or \(\sum_{i=1}^m (1-X_T^i) A_i\) has at most
\(3m/4\) non-zero coefficients, and
$$\sum_{i=1}^m (1 \pm X_T^i) A_i \preceq \left(1 + C \left(\frac{n}{m}\right)^{1/2}\right) \mathbb{I}_n\,.$$
Repeatedly reducing \(m\) by a constant factor until \(m \asymp n/\varepsilon^2\)
yields a \(1 \pm \varepsilon\) approximation with \(O(n/\varepsilon^2)\) sparsity. To
get a two-sided bound, replace \(A_i\) by
\(\begin{pmatrix} A_i & 0 \\ 0 & - A_i\end{pmatrix}\).
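For concreteness, the block embedding used for the two-sided bound can be written as a small helper; this is only an illustrative sketch (NumPy assumed, helper name mine).

```python
import numpy as np

def two_sided(As):
    """Replace each symmetric A_i by diag(A_i, -A_i), so a one-sided bound
    on sum_i x_i diag(A_i, -A_i) controls sum_i x_i A_i from both sides."""
    n = As[0].shape[0]
    out = []
    for A in As:
        B = np.zeros((2 * n, 2 * n))
        B[:n, :n], B[n:, n:] = A, -A
        out.append(B)
    return out
```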
To this end, let us define, for parameters $\theta,\lambda > 0$ to be
chosen later,
$$\begin{aligned}
U(x) &\mathrel{\mathop:}=\theta \mathbb{I}_n + \lambda \|x\|^2 \mathbb{I}_n - \sum_{i=1}^m x_i A_i \\
\Phi(x) &\mathrel{\mathop:}=\mathop{\mathrm{tr}}(U(x)^{-1})\,.\end{aligned}$$
Define $U_t \mathrel{\mathop:}=U(X_t)$. Note that
$\Phi(X_0)=\Phi(0) = n/\theta$ and $U_0 \succ 0$. The idea is that if
$\{ \Phi(X_t) : t \in [0,T] \}$ remains bounded, then $U_t \succ 0$ holds
for $t \in [0,T]$ (for the smallest eigenvalue of $U_t$ to reach $0$ along a
continuous path, $\Phi(X_t)$ would first have to blow up), and therefore at
time $T$,
$$\sum_{i=1}^m X_T^i A_i \preceq (\theta + \lambda \|X_T\|^2) \mathbb{I}_n \preceq (\theta + \lambda m)\mathbb{I}_n\,,$$
where the last inequality uses $|X_T^i| \leqslant 1$ for every $i$.
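As a concrete reference point, here is a small NumPy sketch of the barrier matrix $U(x)$ and the potential $\Phi(x)$ (the helper names `U_of` and `Phi` are mine); note that `Phi(np.zeros(m), As, theta, lam)` equals $n/\theta$, matching $\Phi(X_0) = n/\theta$.

```python
import numpy as np

def U_of(x, As, theta, lam):
    """U(x) = theta*I + lam*||x||^2*I - sum_i x_i A_i."""
    n = As[0].shape[0]
    return (theta + lam * np.dot(x, x)) * np.eye(n) - sum(
        xi * Ai for xi, Ai in zip(x, As))

def Phi(x, As, theta, lam):
    """Phi(x) = tr(U(x)^{-1}), taken to be +inf once U(x) loses positivity."""
    U = U_of(x, As, theta, lam)
    if np.linalg.eigvalsh(U)[0] <= 0:
        return np.inf
    return np.trace(np.linalg.inv(U))
```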
From Itô’s lemma, we know that if $dX_t = \Sigma_t dB_t$ and
$f : \mathbb R^m \to \mathbb R$, then
$$d f(X_t) = \langle \nabla f(X_t), dX_t\rangle + \frac12 \mathop{\mathrm{tr}}\left(\nabla^2 f(X_t) \Sigma_t \Sigma_t^{\top}\right) dt\,.$$
Since $\Sigma_t = \mathbb{I}_{\mathcal J_t}$,
$$\frac12 \mathop{\mathrm{tr}}\left(\nabla^2 f(X_t) \Sigma_t \Sigma_t^{\top}\right) dt
= \frac12 \sum_{i \in \mathcal J_t} \partial_{x_i}^2 f(X_t)\, dt\,.$$
Now denote $f(x) \mathrel{\mathop:}=\mathop{\mathrm{tr}}(U(x)^{-1})$ and
calculate:
$$\begin{aligned}
\partial_{x_i} f(x) &= - 2 \lambda x_i \mathop{\mathrm{tr}}(U(x)^{-2}) + \mathop{\mathrm{tr}}(U(x)^{-2} A_i) \\
\partial_{x_j} \partial_{x_i} f(x) &= - 2 \lambda \mathbf{1}_{\{i=j\}} \mathop{\mathrm{tr}}(U(x)^{-2}) + 8 \lambda^2 x_i x_j \mathop{\mathrm{tr}}(U(x)^{-3}) - 4 \lambda x_i \mathop{\mathrm{tr}}(U(x)^{-3} A_j) \\
&\quad + \partial_{x_j} \mathop{\mathrm{tr}}(U(x)^{-2} A_i) \\
&= - 2 \lambda \mathbf{1}_{\{i=j\}} \mathop{\mathrm{tr}}(U(x)^{-2}) + 8 \lambda^2 x_i x_j \mathop{\mathrm{tr}}(U(x)^{-3}) - 4 \lambda x_i \mathop{\mathrm{tr}}(U(x)^{-3} A_j) \\
&\quad - 4 \lambda x_j \mathop{\mathrm{tr}}(U(x)^{-3} A_i) + 2 \mathop{\mathrm{tr}}\left(U(x)^{-1} A_i U(x)^{-1} A_j U(x)^{-1}\right).\end{aligned}$$
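These formulas are easy to sanity-check against finite differences of $\Phi$; the sketch below does this for the gradient, reusing the hypothetical `U_of` and `Phi` helpers from above on a small random instance (all parameter values are arbitrary). The Hessian formula can be checked the same way with second differences.

```python
import numpy as np

def grad_formula(x, As, theta, lam):
    """Closed-form gradient: d_i f(x) = -2*lam*x_i*tr(U^-2) + tr(U^-2 A_i)."""
    Ui = np.linalg.inv(U_of(x, As, theta, lam))
    U2 = Ui @ Ui
    return np.array([-2 * lam * xi * np.trace(U2) + np.trace(U2 @ Ai)
                     for xi, Ai in zip(x, As)])

rng = np.random.default_rng(0)
n, m, theta, lam, eps = 3, 5, 1.0, 0.1, 1e-6
As = [(G + G.T) / (10 * m) for G in rng.normal(size=(m, n, n))]  # small symmetric A_i
x = rng.uniform(-0.5, 0.5, size=m)
# Central finite differences of Phi in each coordinate direction.
num = np.array([(Phi(x + eps * e, As, theta, lam) - Phi(x - eps * e, As, theta, lam))
                / (2 * eps) for e in np.eye(m)])
assert np.allclose(num, grad_formula(x, As, theta, lam), atol=1e-4)
```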
Substituting these derivatives into Itô’s formula gives
$$\begin{aligned}
d \Phi(X_t) = d f(X_t) &= {\color{Blue}{- 2 \lambda \langle X_t, d X_t\rangle \mathop{\mathrm{tr}}(U_t^{-2}) + \sum_{i=1}^m \mathop{\mathrm{tr}}(U_t^{-2} A_i) dX_t^i}}
+ \frac12 \mathop{\mathrm{tr}}\left(\nabla^2 f(X_t) \Sigma_t \Sigma_t^{\top}\right)\,dt \\
\frac12 \mathop{\mathrm{tr}}\left(\nabla^2 f(X_t) \Sigma_t \Sigma_t^{\top}\right) &=
{\color{Maroon}{- \lambda \mathop{\mathrm{tr}}(U_t^{-2}) |\mathcal J_t|}} - {\color{ForestGreen}{4 \lambda \sum_{i \in \mathcal J_t} X_t^i \mathop{\mathrm{tr}}\left(U_t^{-3} A_i\right)}}
+ {\color{ForestGreen}{4 \lambda^2 \|\mathbb{I}_{\mathcal J_t} X_t\|^2 \mathop{\mathrm{tr}}(U_t^{-3})}} \\
&\quad {\color{Maroon}{+ \sum_{i \in \mathcal J_t} \mathop{\mathrm{tr}}\left(U_t^{-1} A_i U_t^{-1} A_i U_t^{-1}\right)}}\,.\end{aligned}$$
The blue terms are the martingale part, which we ignore for the moment. The
green terms are error terms. The two maroon terms need to be balanced by
choosing $\theta,\lambda$ appropriately.
By von Neumann's trace inequality, for any $A,B$ with singular values
$\sigma_1(A) \geqslant\cdots \geqslant\sigma_n(A)$ and
$\sigma_1(B) \geqslant\cdots \geqslant\sigma_n(B)$, we have
$$\mathop{\mathrm{tr}}(A B) \leqslant\sum_{i=1}^n \sigma_i(A) \sigma_i(B)\,.$$
In particular, if $A,B \succeq 0$, then
$\mathop{\mathrm{tr}}(AB) \leqslant\mathop{\mathrm{tr}}(A) \mathop{\mathrm{tr}}(B)$.
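A quick numerical check of this inequality on random symmetric matrices (purely illustrative):

```python
import numpy as np

# Check tr(AB) <= sum_i sigma_i(A) sigma_i(B) on random symmetric matrices.
rng = np.random.default_rng(1)
for _ in range(1000):
    A = rng.normal(size=(4, 4)); A = (A + A.T) / 2
    B = rng.normal(size=(4, 4)); B = (B + B.T) / 2
    sA = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    sB = np.linalg.svd(B, compute_uv=False)
    assert np.trace(A @ B) <= sA @ sB + 1e-9
```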
Let us use this inequality to bound the green error terms:
$$\begin{aligned}
{\color{ForestGreen}{4 \lambda^2 \|\mathbb{I}_{\mathcal J_t} X_t\|^2 \mathop{\mathrm{tr}}(U_t^{-3})}} &\leqslant 4 \lambda^2 m \mathop{\mathrm{tr}}(U_t^{-2}) \mathop{\mathrm{tr}}(U_t^{-1}) \\
{\color{ForestGreen}{4 \lambda \sum_{i \in \mathcal J_t} X_t^i \mathop{\mathrm{tr}}(U_t^{-3} A_i)}} &\leqslant 4 \lambda (\theta+\lambda m) \mathop{\mathrm{tr}}(U_t^{-2}) \mathop{\mathrm{tr}}(U_t^{-1})\,.\end{aligned}$$
The same fact also yields, for any $U \succ 0$ and symmetric $V$,
$$\mathop{\mathrm{tr}}\left(U^{-1} V U^{-1} V U^{-1}\right) \leqslant\mathop{\mathrm{tr}}(U^{-2} |V|) \mathop{\mathrm{tr}}(U^{-1} |V|)\,.$$
Accordingly, we define the set of bad coordinates:
$$\mathcal I_t \mathrel{\mathop:}=\left\{ i \in [m] : \mathop{\mathrm{tr}}(U_t^{-1} |A_i|) > \frac{\gamma}{m} \mathop{\mathrm{tr}}(U_t^{-1}) \right\}\,,$$
and finally
$\mathcal J_t \mathrel{\mathop:}=[m] \setminus (\mathcal I_t \cup \mathcal S_t)$.
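In a discretized implementation, the bad set \(\mathcal I_t\) and the good set \(\mathcal J_t\) can be computed directly from \(U_t\); a hedged sketch (helper names mine, NumPy assumed), where `mat_abs` computes the matrix absolute value \(|A_i|\):

```python
import numpy as np

def mat_abs(V):
    """Matrix absolute value |V|, via the eigendecomposition of symmetric V."""
    w, Q = np.linalg.eigh(V)
    return (Q * np.abs(w)) @ Q.T

def good_coordinates(U, As, stopped, gamma):
    """Boolean mask of the good set J_t: keep coordinate i unless it is
    already stopped or tr(U^-1 |A_i|) > (gamma/m) tr(U^-1)."""
    Ui = np.linalg.inv(U)
    thresh = gamma / len(As) * np.trace(Ui)
    bad = np.array([np.trace(Ui @ mat_abs(A)) > thresh for A in As])
    return ~(bad | stopped)
```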
Observe that
$$\sum_{i \in \mathcal J_t} \mathop{\mathrm{tr}}\left(U_t^{-1} A_i U_t^{-1} A_i U_t^{-1}\right) \leqslant\frac{\gamma}{m} \mathop{\mathrm{tr}}(U_t^{-1}) \sum_{i=1}^m \mathop{\mathrm{tr}}(U_t^{-2} |A_i|) \leqslant\frac{\gamma}{m} \mathop{\mathrm{tr}}(U_t^{-1}) \mathop{\mathrm{tr}}(U_t^{-2})\,,$$
where the last inequality uses
$|A_1|+\cdots+|A_m| \preceq \mathbb{I}_n$.
The point is to maintain
$\Phi(X_t) \leqslant 2 \Phi(X_0) = 2 n/\theta$ by showing that the drift of
$\Phi(X_t)$ is non-positive whenever $\Phi(X_t) \leqslant 2n/\theta$. Ignoring
the martingale terms and the green error terms (which one expects to be of
lower order in the regime $m \gg n$), we calculate:
$$\begin{aligned}
d \Phi(X_t) &\leqslant\mathop{\mathrm{tr}}(U_t^{-2}) \left(- \lambda |\mathcal J_t| + \frac{\gamma}{m} \mathop{\mathrm{tr}}(U_t^{-1})\right) dt \\
&=\mathop{\mathrm{tr}}(U_t^{-2}) \left(- \lambda |\mathcal J_t| + \frac{\gamma}{m} \Phi(X_t)\right) dt \\
&\leqslant\mathop{\mathrm{tr}}(U_t^{-2}) \left(- \lambda |\mathcal J_t| + \frac{2 \gamma n}{\theta m}\right) dt\,.\end{aligned}$$
For $\gamma \mathrel{\mathop:}=8$, we have $|\mathcal I_t| \leqslant m/\gamma$ (since
$\sum_{i=1}^m \mathop{\mathrm{tr}}(U_t^{-1} |A_i|) \leqslant\mathop{\mathrm{tr}}(U_t^{-1})$), hence for $t \leqslant T$,
$|\mathcal J_t| \geqslant m - |\mathcal I_t| - |\mathcal S_t| \geqslant\frac{m}{2} - \frac{m}{\gamma} \geqslant\frac{m}{4}$.
So let us take
$$\theta \mathrel{\mathop:}=4 \left(\frac{n}{m}\right)^{1/2},\ \lambda \mathrel{\mathop:}=\frac{4}{m} \theta\,,$$
so that $\lambda |\mathcal J_t| \geqslant\lambda \frac{m}{4} = \theta = \frac{16 n}{\theta m} = \frac{2\gamma n}{\theta m}$, and the drift above is non-positive.
If we maintain $\max_{t \in [0,T]} \Phi(X_t) \leqslant 2n/\theta$, then
$U_T \succ 0$, so
$$\sum_{i=1}^m X_T^i A_i \preceq \left(\theta + \lambda \|X_T\|^2\right) \mathbb{I}_n \preceq 5 \theta \mathbb{I}_n \asymp \left(\frac{n}{m}\right)^{1/2} \mathbb{I}_n\,,$$
completing the proof.
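Finally, the pieces above can be assembled into an end-to-end toy simulation of the discretized process. This is only a sanity-check sketch (it reuses the hypothetical `U_of` and `good_coordinates` helpers from earlier, and the step size, instance, and seed are arbitrary), not the algorithm of Reis and Rothvoss.

```python
import numpy as np

def simulate(As, dt=1e-3, gamma=8.0, seed=0):
    """Run the discretized process until at least m/2 coordinates have been
    frozen at +-1; returns X_T and theta."""
    rng = np.random.default_rng(seed)
    m, n = len(As), As[0].shape[0]
    theta = 4.0 * np.sqrt(n / m)
    lam = 4.0 * theta / m
    X = np.zeros(m)
    stopped = np.zeros(m, dtype=bool)
    while stopped.sum() < m / 2:
        U = U_of(X, As, theta, lam)
        drive = good_coordinates(U, As, stopped, gamma)   # the set J_t
        dB = rng.normal(scale=np.sqrt(dt), size=m)
        X = X + np.where(drive, dB, 0.0)
        hit = np.abs(X) >= 1.0
        X, stopped = np.where(hit, np.sign(X), X), stopped | hit
    return X, theta

# Toy instance: A_i = v_i v_i^T / m for random unit vectors, so sum_i |A_i| <= I.
rng = np.random.default_rng(3)
n, m = 4, 200
V = rng.normal(size=(m, n)); V /= np.linalg.norm(V, axis=1, keepdims=True)
As = [np.outer(v, v) / m for v in V]
X_T, theta = simulate(As)
S = sum(x * A for x, A in zip(X_T, As))
print(np.linalg.eigvalsh(S).max(), "should be at most", 5 * theta, "with good probability")
```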