Entropy numbers
Suppose now that \(\{ X_t : t \in T \}\) is subgaussian with respect to the distance \(d\). Our goal is to get some control on \(\E \max_{t \in T} X_t\), and one of the central tenets of the theory is that we can do this using just the geometry of the distance \(d\). Let’s suppose that \(d\) is symmetric, i.e., \(d(s,t)=d(t,s)\) for all \(s,t \in T\), and that it satisfies the triangle inequality:
\[d(s,t) \leq d(s,u) + d(u,t)\,,\qquad \forall u,s,t \in T\,.\]Define the quantity \(e_h(T,d)\) as the smallest radius \(r\) such that \((T,d)\) can be covered by at most \(2^{2^h}\) balls of radius \(r\), where a ball in \((T,d)\) is given by
\[B(t, r) \seteq \{ s \in T : d(s,t) \leq r \}\,.\]It follows by a greedy construction that there exists a “net” \(N_h \subseteq T\) such that \(|N_h| \leq 2^{2^h}\) and every point of \(T\) is within distance at most \(e_h(T,d)\) of a point in \(N_h\).
It is useful to think of \(N_h\) as the “best” uniform approximation of \((T,d)\) when one is only allowed to use \(2^{2^h}\) points. Let’s fix any such sets \(\{N_h : h=0,1,2,\ldots\}\).
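As a sanity check on the definition, here is a small worked example. The notation \(n \seteq |T|\) and \(\mathrm{diam}(T) \seteq \max_{s,t \in T} d(s,t)\) is introduced only for this illustration, and we assume \(T\) is finite with \(n \geq 2\) and that \(d(t,t)=0\) for all \(t \in T\). A single ball of radius \(\mathrm{diam}(T)\) centered at any point covers \(T\), and once \(2^{2^h} \geq n\) we can place a ball of radius \(0\) at every point, so
\[e_h(T,d) \leq \mathrm{diam}(T) \quad \forall h \geq 0\,, \qquad e_h(T,d) = 0 \quad \textrm{whenever } h \geq \log_2 \log_2 n\,.\]In particular, only the scales \(h < \log_2 \log_2 n\) can have a nonzero entropy number for a set of \(n\) points.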
Let \(\pi_h : T \to N_h\) be a map that satisfies
\[\begin{equation}\label{eq:pih} d(t, \pi_h(t)) \leq e_h(T,d)\,,\qquad \forall t \in T\,. \end{equation}\]For instance, a natural choice is for \(\pi_h\) to map \(t\) to the closest point of \(N_h\), but we will only need \eqref{eq:pih}.
A multilevel union bound
Now define
\[U_h \seteq \{ (t, t') : t \in N_{h}, t' \in N_{h+1}, d(t,t') \leq 2 e_h(T,d) \}\,.\]Think of these as the “edges” from points in \(N_{h+1}\) to nearby points in \(N_h\). Let us denote the event
\[\mathcal{E}_h(\lambda) \seteq \left\{|X_t - X_{t'}| \leq 2 \lambda e_h(T,d),\ \forall (t,t') \in U_h \right\}.\]If we think about \(U_h\) as containing “edges” that get stretched to random lengths, then this event entails no edge of \(U_h\) being stretched too much. Since every pair in \(U_h\) is within distance \(2 e_h(T,d)\), a union bound combined with \eqref{eq:subgaussian} gives
\[\begin{align*} \P\left(\mathcal{E}_h(\lambda) \right) &\geq 1 - |U_h| e^{-c \lambda^2} \\ &\geq 1 - |N_h| |N_{h+1}| e^{-c \lambda^2} \\ &\geq 1 - 2^{2^{h+2}} e^{-c \lambda^2}\,, \end{align*}\]where we have used the obvious bound \(\abs{U_h} \leq \abs{N_h} \abs{N_{h+1}}\) and the fact that \(\abs{N_h} \abs{N_{h+1}} \leq 2^{2^h} 2^{2^{h+1}} = 2^{3 \cdot 2^h} \leq 2^{2^{h+2}}\).
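A natural choice is now \(\lambda \seteq 4 \frac{\alpha}{c} 2^{h/2}\) for some number \(\alpha \geq 1\) we will choose later (we may also assume \(c \leq 1\), since the tail bound \(e^{-c \lambda^2}\) only gets weaker as \(c\) decreases). If we define \(\mathcal{E}_h \seteq \mathcal{E}_h(4 (\alpha/c) 2^{h/2})\), then \(c \lambda^2 = 16 (\alpha^2/c) 2^{h} \geq 2 \alpha 2^{h+2}\), and the preceding bound gives us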
\[\begin{equation}\label{eq:Ehtail} \P\left(\mathcal{E}_h\right) \geq 1 - 2^{2^{h+2}} e^{-2 \alpha 2^{h+2}} \geq 1 - e^{-\alpha 2^{h}}\,. \end{equation}\]Note that the leading constant is not so important; we just wanted a clean form for the lower bound. The real point here is that \(\lambda\) is scaling like \(2^{h/2}\), and the probability of \(\mathcal{E}_h\) not occurring can be summed over all \(h \geq 0\), since it decays doubly exponentially in \(h\).
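To see where the second inequality in \eqref{eq:Ehtail} comes from, one can write both factors as exponentials. The following is a quick check, under the assumption that \(\alpha \geq \ln 2\):
\[2^{2^{h+2}} e^{-2\alpha 2^{h+2}} = \exp\left(2^{h+2} (\ln 2 - 2\alpha)\right) \leq \exp\left(-\alpha 2^{h+2}\right) \leq e^{-\alpha 2^h}\,.\]The middle step uses \(\ln 2 \leq \alpha\), and the last step simply discards a factor of \(4\) in the exponent.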
If we now simply take a union bound over \(h=0,1,2,\ldots\), we get
\[\begin{equation}\label{eq:Ehall} \P\left(\mathcal{E}_0 \wedge \mathcal{E}_1 \wedge \mathcal{E}_2 \wedge \cdots \right) \geq 1 - \sum_{h \geq 0} e^{-\alpha 2^h} \geq 1 - 2 e^{-\alpha}\,, \end{equation}\]where the last inequality compares the sum with a geometric series using \(2^h \geq h+1\), and is valid once \(\alpha \geq \ln 2\).
The chaining argument
We’ve gone through all this effort because it turns out that if \(\mathcal{E}_h\) holds for every \(h \geq 0\), it gives us a bound on \(\E \max_{t \in T} (X_t - X_{t_0})\), where \(\{t_0\} = N_0\) (here we use the usual convention that the net at level \(h=0\) is a single point; note that \(t_0\) could be any fixed point of \(T\)). The argument goes via chaining. Consider first the infinite telescoping sum
\[X_t - X_{t_0} = \cdots + \left(X_{\pi_{h+2}(t)} - X_{\pi_{h+1}(t)}\right) + \left(X_{\pi_{h+1}(t)} - X_{\pi_h(t)}\right) + \cdots + \left(X_{\pi_1(t)} - X_{\pi_0(t)}\right)\,.\]If we think of this as a chain, then every link in the chain is going to lie in some set \(U_h\), and thus having control on the stretch of all the “edges” we considered previously will give us control on every term in this chain. You can assume the sum is finite if you like, because
\[\E \max_{t \in T} (X_t - X_{t_0}) = \sup_{S \subseteq T : |S| < \infty} \E\max_{t \in S} (X_t - X_{t_0})\,.\]Now, note that \(\pi_h(t) \in N_h\) and \(\pi_{h+1}(t) \in N_{h+1}\), and by the triangle inequality
\[d(\pi_h(t), \pi_{h+1}(t)) \leq d(\pi_h(t), t) + d(\pi_{h+1}(t), t) \leq e_h(T,d) + e_{h+1}(T,d) \leq 2 e_{h}(T,d)\,.\]In other words, \((\pi_h(t), \pi_{h+1}(t)) \in U_h\), and therefore
\[\mathcal{E}_h \implies |X_{\pi_{h+1}(t)} - X_{\pi_h(t)}| \leq 8 \frac{\alpha}{c} 2^{h/2} e_h(T,d)\,.\]Summing these bounds over the links of the telescoping representation of \(X_t - X_{t_0}\) gives us
\[\mathcal{E}_0 \wedge \mathcal{E}_1 \wedge \mathcal{E}_2 \wedge \cdots \implies |X_t - X_{t_0}| \leq 8 \frac{\alpha}{c} \sum_{h \geq 0} 2^{h/2} e_h(T,d)\,.\]Combining this with \eqref{eq:Ehall} gives
\[\P\left(|X_t - X_{t_0}| \leq 8 \frac{\alpha}{c} \sum_{h \geq 0} 2^{h/2} e_h(T,d),\quad \forall t \in T\right) \geq 1 - 2 e^{-\alpha}\,.\]By integrating over \(\alpha\), one immediately concludes that
\[\E \max_{t \in T} X_t \leq \E \max_{t \in T} |X_t - X_{t_0}| \lesssim \frac{1}{c} \sum_{h \geq 0} 2^{h/2} e_h(T,d)\,.\]This is known as Dudley’s entropy bound.
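The integration step can be spelled out as follows. This is a sketch with shorthand introduced only here: write \(Z \seteq \max_{t \in T} |X_t - X_{t_0}|\) and \(S \seteq \frac{8}{c} \sum_{h \geq 0} 2^{h/2} e_h(T,d)\), and assume that \(0 < S < \infty\) and that the preceding tail bound \(\P(Z > \alpha S) \leq 2 e^{-\alpha}\) is available for all \(\alpha \geq 1\). Then
\[\E Z = \int_0^{\infty} \P(Z > u)\,du = S \int_0^{\infty} \P(Z > \alpha S)\,d\alpha \leq S \left(1 + \int_1^{\infty} 2 e^{-\alpha}\,d\alpha\right) = \left(1 + \frac{2}{e}\right) S\,,\]where we bounded the probability by \(1\) on \([0,1]\). The constant is unimportant; the point is that the tail in \(\alpha\) is integrable.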