We have seen that if \(\{X_t : t \in T \}\) is a symmetric sub-Gaussian process equipped with the distance \(d(s,t) = \sqrt{\E(X_s-X_t)^2}\), then for any sequence \(\{t_0\} = T_0 \subseteq T_1 \subseteq \cdots \subseteq T_h \subseteq \cdots \subseteq T\) with \(|T_h| \leq 2^{2^h}\), we have the chaining upper bound
\[\E \max_{t \in T} X_t \lesssim \max_{t \in T} \sum_{h \geq 0} 2^{h/2} d(t,T_h)\,.\]Let us define \(\gamma_2(T,d)\) as the best possible chaining upper bound, i.e., the infimum of the right-hand side over all admissible sequences \(\{T_h\}\). This is Talagrand's famous \(\gamma_2\) functional, and it may be surprising that it characterizes (up to constants) the expected supremum of a Gaussian process.
Majorizing-measures theorem (Fernique-Talagrand): If \(\{X_t : t\in T \}\) is a centered Gaussian process, then
\[\E \max_{t \in T} X_t \asymp \gamma_2(T,d)\,.\]Random signed sums: Suppose that \(\e_1,\e_2,\ldots,\e_n\) is a sequence of i.i.d. uniformly random signs \(\e_i \in \{-1,1\}\) and we define the random process:
\[\left\{ X_t = \e_1 t_1 + \e_2 t_2 + \cdots + \e_n t_n : t \in T \right\}\,,\]where \(T \subseteq \R^n\). Then classical concentration bounds (e.g., Hoeffding's inequality) show that \(\{X_t : t\in T\}\) is sub-Gaussian, and it is straightforward to calculate the distance (by expanding the square):
\[d(s,t) = \|s-t\|_2\,.\]The chaining upper bound gives \(\E \max_{t \in T} X_t \lesssim \gamma_2(T,d)\). Let us now use this to understand the hypergraph sparsification problem.
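The distance formula can be checked directly: expanding \(\E(X_s-X_t)^2 = \E\big(\sum_i \e_i (s_i-t_i)\big)^2\), the cross terms \(\E[\e_i\e_j]\) vanish for \(i \neq j\), leaving \(\|s-t\|_2^2\). A minimal numerical sketch (the vectors \(s,t\) below are arbitrary illustrative choices), which computes the second moment exactly by enumerating all sign patterns:

```python
import itertools
import math

# Verify d(s,t)^2 = E[(X_s - X_t)^2] = ||s - t||_2^2 for uniform random
# signs, by averaging over all 2^n sign patterns exactly.
n = 4
s = [1.0, -2.0, 0.5, 3.0]   # arbitrary test vectors
t = [0.0, 1.0, 2.0, -1.0]

second_moment = 0.0
for signs in itertools.product([-1, 1], repeat=n):
    X_s = sum(e * si for e, si in zip(signs, s))
    X_t = sum(e * ti for e, ti in zip(signs, t))
    second_moment += (X_s - X_t) ** 2
second_moment /= 2 ** n

dist_sq = sum((si - ti) ** 2 for si, ti in zip(s, t))
assert math.isclose(second_moment, dist_sq)
```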
Hypergraph sparsification. Consider a weighted hypergraph \(H=(V,E,w)\) where \(\{w_e : e \in E\}\) are nonnegative edge weights. We associate to \(H\) the quadratic expression
\[Q_H(x) = \sum_{e \in E} w_e \max_{u,v \in e} (x_u-x_v)^2\,.\]If \(H\) were a graph (where \(|e|=2\) for every \(e \in E\)), this would correspond to the quadratic form of the graph Laplacian.
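To make the graph case concrete, here is a small sanity check (not from the notes; the 4-vertex weighted graph is an arbitrary example) that when \(|e|=2\) for every edge, \(Q_H(x)\) agrees with the Laplacian quadratic form \(x^\top L x\):

```python
import random

# For a graph (every edge has |e| = 2), max_{u,v in e} (x_u - x_v)^2 is just
# (x_u - x_v)^2, so Q_H(x) = sum_e w_e (x_u - x_v)^2 = x^T L x.
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 3.0), (0, 3, 0.5)]  # (u, v, weight)
n = 4

def Q_H(x):
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)

# Build the weighted Laplacian L = D - A.
L = [[0.0] * n for _ in range(n)]
for u, v, w in edges:
    L[u][u] += w; L[v][v] += w
    L[u][v] -= w; L[v][u] -= w

x = [random.gauss(0, 1) for _ in range(n)]
xLx = sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))
assert abs(Q_H(x) - xLx) < 1e-9
```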
Our goal is to find another hypergraph \(\tilde{H} = (V,\tilde{E},\tilde{w})\) with \(\tilde{E} \subseteq E\) and such that
\[\begin{equation}\label{eq:sparse-approx} |Q_H(x)-Q_{\tilde{H}}(x)| \leq \e Q_H(x),\qquad \forall x \in \R^n\,, \end{equation}\]where \(\e > 0\) is some accuracy parameter. Moreover, we would like \(\abs{\tilde{E}}\) to be small, ideally near-linear in \(\abs{V}\) (while \(\abs{E}\) could be as large as \(2^{\abs{V}}\)).
Independent random sampling. Suppose we have a probability distribution \(\{\mu_e : e \in E\}\). Let us form \(\tilde{H}\) by sampling \(M\) edges \(e_1,e_2,\ldots,e_M\) independently from \(\mu\) and setting the edge weights in \(\tilde{H}\) so that
\[Q_{\tilde{H}}(x) = \frac{1}{M} \sum_{k=1}^M \frac{w_{e_k}}{\mu_{e_k}} Q_{e_k}(x)\,,\]where we define \(Q_e(x) = \max_{u,v \in e} (x_u-x_v)^2\).
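The estimator is unbiased by construction: the \(\mu_{e_k}\) in the denominator exactly cancels the sampling probability, since \(\E\big[\tfrac{w_e}{\mu_e} Q_e(x)\big] = \sum_{e} \mu_e \tfrac{w_e}{\mu_e} Q_e(x) = Q_H(x)\). A minimal sketch verifying this identity exactly (the hypergraph, weights, and sampling distribution are arbitrary choices):

```python
import math

# Unbiasedness of the importance-sampling estimator:
# E[(w_e / mu_e) Q_e(x)] = sum_e mu_e (w_e / mu_e) Q_e(x) = Q_H(x).
hyperedges = [frozenset({0, 1, 2}), frozenset({1, 3}), frozenset({0, 2, 3, 4})]
w = [1.5, 2.0, 0.5]
mu = [0.2, 0.5, 0.3]   # any distribution with full support on E works

def Q_e(e, x):
    vals = [x[v] for v in e]
    return (max(vals) - min(vals)) ** 2   # equals max_{u,v in e} (x_u - x_v)^2

x = [0.3, -1.0, 2.0, 0.0, 1.2]
Q_H_x = sum(we * Q_e(e, x) for e, we in zip(hyperedges, w))
expected_estimate = sum(mu_e * (we / mu_e) * Q_e(e, x)
                        for e, we, mu_e in zip(hyperedges, w, mu))
assert math.isclose(Q_H_x, expected_estimate)
```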
Note that \(\E[Q_{\tilde{H}}(x)] = Q_H(x)\) for all \(x \in \R^n\), and to establish a bound like \eqref{eq:sparse-approx}, it makes sense to study the quantity
\[\E \max_{Q_H(x) \leq 1} |Q_H(x) - Q_{\tilde{H}}(x)|.\]Passing to a sub-Gaussian process. Let \(\hat{H}\) be an independent copy of \(\tilde{H}\) and write
\[\begin{eqnarray*} \E_{\tilde{H}} \max_{Q_H(x) \leq 1} |Q_H(x) - Q_{\tilde{H}}(x)| &=& \E_{\tilde{H}} \max_{Q_H(x) \leq 1} |\E_{\hat{H}}[Q_{\hat{H}}(x)]-Q_{\tilde{H}}(x)| \\ &=& \E_{\tilde{H}} \max_{Q_H(x) \leq 1} |\E_{\hat{H}} [Q_{\hat{H}}(x)-Q_{\tilde{H}}(x)]| \\ &\leq& \E_{\hat{H},\tilde{H}} \max_{Q_H(x) \leq 1} |Q_{\hat{H}}(x)-Q_{\tilde{H}}(x)|\,, \end{eqnarray*}\]where we have used convexity to pull out the expectation: \(\abs{\E X} \leq \E \abs{X}\) and \(\max(\E X_1,\ldots,\E X_k) \leq \E \max(X_1,\ldots,X_k)\).
Note that for any choice of signs \(\e_1,\ldots,\e_M \in \{-1,1\}\), we have
Note that for any fixed signs \(\e_1,\ldots,\e_M \in \{-1,1\}\), exchanging \(\hat{e}_k\) and \(\tilde{e}_k\) whenever \(\e_k = -1\) leaves the joint law of the sample unchanged, because \(\tilde{e}_k\) and \(\hat{e}_k\) are i.i.d. Therefore
\[\begin{eqnarray*} \E_{\hat{H},\tilde{H}} \max_{Q_H(x) \leq 1} |Q_{\hat{H}}(x)-Q_{\tilde{H}}(x)| &=& \E_{\hat{H},\tilde{H}} \max_{Q_H(x) \leq 1} \frac{1}{M} \left|\sum_{k=1}^M \left(\frac{w_{\hat{e}_k}}{\mu_{\hat{e}_k}} Q_{\hat{e}_k}(x)- \frac{w_{\tilde{e}_k}}{\mu_{\tilde{e}_k}} Q_{\tilde{e}_k}(x)\right)\right| \\ &=& \E_{\hat{H},\tilde{H}} \max_{Q_H(x) \leq 1} \frac{1}{M} \left|\sum_{k=1}^M \e_k \left(\frac{w_{\hat{e}_k}}{\mu_{\hat{e}_k}} Q_{\hat{e}_k}(x)- \frac{w_{\tilde{e}_k}}{\mu_{\tilde{e}_k}} Q_{\tilde{e}_k}(x)\right)\right|. \end{eqnarray*}\]Define \(t_k(x) \seteq \frac{w_{e_k}}{\mu_{e_k}} Q_{e_k}(x)\) for a generic sample \(e_1,\ldots,e_M \sim \mu\); by the triangle inequality we then have
\[\E_{\hat{H},\tilde{H}} \max_{Q_H(x) \leq 1} |Q_{\hat{H}}(x)-Q_{\tilde{H}}(x)| \leq \frac{2}{M}\, \E\, \E_{(\e_1,\ldots,\e_M)} \max_{Q_H(x) \leq 1} \left|\sum_{k=1}^M \e_k t_k(x)\right| \leq \frac{4}{M}\, \E\, \E_{(\e_1,\ldots,\e_M)} \max_{Q_H(x) \leq 1} \sum_{k=1}^M \e_k t_k(x)\,,\]where the inner expectation is over i.i.d. uniformly random signs and the outer one over the sampled edges \(e_1,\ldots,e_M\), and we removed the absolute values (at the cost of a factor \(2\)) using symmetry of the process and the fact that the maximum is nonnegative (take \(x\) constant).
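The distributional identity behind this symmetrization step can be checked exactly on a toy example (support, sample size, and sign pattern below are arbitrary): for i.i.d. pairs \((a_k, b_k)\), attaching any fixed signs to the differences does not change the law of \(\sum_k (a_k - b_k)\).

```python
import itertools
from collections import Counter

# If a_k, b_k are i.i.d., then for ANY fixed signs eps_k the law of
# sum_k eps_k (a_k - b_k) equals the law of sum_k (a_k - b_k):
# swapping a_k and b_k wherever eps_k = -1 is a law-preserving bijection.
values = [0.0, 1.0, 3.0]   # support of the common distribution (arbitrary)
M = 2
eps = [1, -1]              # a fixed, arbitrary sign pattern

def law(signs):
    # Exact distribution of sum_k signs[k] * (a_k - b_k), enumerating all
    # (uniform) samples of (a_1, ..., a_M, b_1, ..., b_M).
    c = Counter()
    for sample in itertools.product(values, repeat=2 * M):
        a, b = sample[:M], sample[M:]
        c[sum(s * (ai - bi) for s, ai, bi in zip(signs, a, b))] += 1
    return c

assert law([1] * M) == law(eps)
```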
So now, to bound the quantity we cared about initially, it suffices to bound \(\E_{\e} \max_{Q_H(x) \leq 1} \sum_{k=1}^M \e_k t_k(x)\) for every fixed choice of hyperedges \(e_1,\ldots,e_M\), where \(\E_{\e}\) averages over the random signs. This is a sub-Gaussian process, and so we are led to the study of \(\gamma_2(T,d)\), where \(T = \{ x \in \R^n : Q_H(x) \leq 1\}\), and
\[d(x,y) = \left(\sum_{k=1}^M \left(\frac{w_{e_k}}{\mu_{e_k}} (Q_{e_k}(x)-Q_{e_k}(y))\right)^2\right)^{1/2}\,.\]Abstracting out the norm sparsification problem. Note that each \(\sqrt{Q_e(x)}\) for \(e \in E\) is actually a norm on \(\R^n\) (technically it is only a seminorm, since \(Q_e(x)=0\) does not entail \(x=0\)).
Thus we can consider the following related problem: Suppose that \(N_1,N_2,\ldots,N_m\) are (semi)norms on \(\R^n\) and we want to bound
\[\E \max_{x \in T} \sum_{k=1}^m \e_k N_k(x)^2\,,\]where \(T \subseteq \R^n\) and \(\e_1,\ldots,\e_m\) are i.i.d. random signs. Again, this is a sub-Gaussian process equipped with the distance
\[d(x,y) = \left(\sum_{k=1}^m (N_k(x)^2 - N_k(y)^2)^2\right)^{1/2}.\]In the next lecture, we will find upper bounds on \(\gamma_2(T,d)\) for this family of problems.