## General norms: Concentration analysis

• Recall that $N(x) = N_1(x) + \cdots + N_m(x)\,,$ and $B_N$ is the unit ball in the norm $N$. To apply our standard symmetrization/chaining analysis, define the metric
$$d_{\nu}(x,y) \mathrel{\mathop:}=\left(\sum_{j=1}^M \left(\frac{N_{\nu_j}(x)-N_{\nu_j}(y)}{M \rho_{\nu_j}}\right)^2 \right)^{1/2}.$$
• We need a bound on the covering numbers of the form

$(\log \mathcal{N}(B_N, d_{\nu}, r))^{1/2} \lesssim \frac{L}{r} \left(\max_{x \in B_N} \tilde{N}_{\nu}(x)\right)^{1/2}\,,$

with

$\tilde{N}_{\nu}(x) \mathrel{\mathop:}=\sum_{j=1}^M \frac{N_{\nu_j}(x)}{M \rho_{\nu_j}}\,.$
• As usual, we will bound $d_{\nu}$ by an $\ell_{\infty}$ metric: For $x,y \in B_N$,

$d_{\nu}(x,y) \lesssim \max_i \left(\frac{|N_i(x)-N_i(y)|}{M \rho_i}\right)^{1/2} \left(\max_{x \in B_N} \tilde{N}_{\nu}(x) \right)^{1/2}.$
• Define $\mathcal N_{\infty}(x) \mathrel{\mathop:}=\max_{i \in [m]} \frac{N_i(x)}{M \rho_i}$ so that

$d_{\nu}(x,y) \lesssim \mathcal N_{\infty}(x-y)^{1/2} \left(\max_{x \in B_N} \tilde{N}_{\nu}(x) \right)^{1/2},$

where we used $\abs{N_i(x)-N_i(y)} \leqslant N_i(x-y)$.

• Thus we have reduced to the simpler problem of bounding the covering numbers

\begin{aligned} (\log \mathcal{N}(B_N, \mathcal N_{\infty}^{1/2}, r))^{1/2} \lesssim \frac{L}{r}\,, \ \forall r > 0 &\iff (\log \mathcal{N}(B_N, \mathcal N_{\infty}, r^2))^{1/2} \lesssim \frac{L}{r}\,, \ \forall r > 0 \\ &\iff \log \mathcal{N}(B_N, \mathcal N_{\infty}, r) \lesssim \frac{L^2}{r}\,, \ \forall r > 0\,.\end{aligned}

Now the generalized dual-Sudakov Lemma (for $p=1$) gives

$\log \mathcal{N}(B_N, \mathcal N_{\infty}, r) \lesssim \frac{\mathbb{E}[\mathcal N_{\infty}(\mathbf{Z})]}{r}\,,$

where $\mathbf{Z}$ is the random variable with density $e^{-N(x)}$. Note that this applies without any assumptions on $N$ because every norm is $p$-uniformly smooth when $p=1$.

• Recall that in this setting we take $\rho_i = \frac{1}{n} \mathbb{E}[N_i(\mathbf{Z})]$, therefore

$\mathcal N_{\infty}(x) = \max_i \frac{N_i(x)}{\rho_i} = \frac{n}{M} \max_i \frac{N_i(x)}{\mathbb{E}N_i(\mathbf{Z})}\,.$

We will momentarily prove that

$$$\label{eq:max-ub} \mathbb{E}\max_{i} \frac{N_i(\mathbf{Z})}{\mathbb{E}N_i(\mathbf{Z})} \lesssim \psi_n \log M\,,$$$

where $\psi_n$ is the KLS constant in $n$ dimensions. In this case, we arrive at the bound

$\log \mathcal N(B_N, \mathcal N_{\infty}, r) \lesssim \frac{1}{r} \frac{n}{M} \psi_n \log M\,,$

which yields

$\left(\log \mathcal N(B_N, d_{\nu}, r)\right)^{1/2} \lesssim \frac{1}{r} \left(\frac{n}{M} \psi_n \log M\right)^{1/2} \left(\max_{x \in B_N} \tilde{N}_{\nu}(x)\right)^{1/2}\,,$
• Combined with Dudley’s entropy bound and Problem 4 on Homework #1, this yields: For every $\nu_1,\ldots,\nu_M \in [m]^M$,

$\mathbb{E}_{\varepsilon_1,\ldots,\varepsilon_M \in \{\pm 1\}} \max_{N(x) \leqslant 1} \left|\sum_{j=1}^M \varepsilon_j \frac{N_{\nu_j}(x)}{M \rho_{\nu_j}}\right| \lesssim \left(\frac{n}{M} \psi_n \log M\right)^{1/2} (\log n) \left(\max_{x \in B_N} \tilde{N}_{\nu}(x)\right)^{1/2}.$

If we choose $M \asymp \frac{n}{\delta^2} \psi_n \log(n/\delta) (\log n)^2$, then our sparsification framework yields

$\mathbb{E}\max_{N(x) \leqslant 1} |N(x)-\tilde{N}_{\nu}(x)| \lesssim \delta\,,$

completing the proof.

## Concentration of Lipschitz functions and KLS

• We are left to prove \eqref{eq:max-ub}. First, observe that the maximum need only be taken over indicies $i$ in the set ${ \nu_1,\ldots,\nu_M }$. So we are looking at the expected maximum of (at most) $M$ random variables. The claimed upper bound follows immediately (see Problem 2(A) on Homework #2) from an exponential concentration inequality:

$$$\label{eq:ni-con} \mathbb{P}\left[N_i(\mathbf{Z}) > (t+1) \mathbb{E}[N_i(\mathbf{Z})]\right] \leqslant e^{-c t/\psi_n}\,.$$$

Note that $N_i$ is only a small piece of the norm $N$, so there is essentially no exploitable relationship between $N_i$ and $\mathbf{Z}$. Therefore the concentration inequality will have to hold in some generality.

• Log-concave random vector: A random variable $\mathbf{X}$ on $\mathbb{R}^n$ is log-concave if its density is $e^{-g(x)}\,dx$ for some convex function $g : \mathbb{R}^n \to \mathbb{R}$.

• Isotropic random vector: A random variable $\mathbf{X}$ on $\mathbb{R}^n$ is isotropic if $\mathbb{E}[\mathbf{X}]=0$ and $\mathbb{E}[\mathbf{X} \mathbf{X}^{\top}] = I$.

• Lipschitz functions: A function $\varphi: \mathbb{R}^n \to \mathbb{R}$ is $L$-Lipschitz (with respect to the Euclidean distance) if

$|\varphi(x)-\varphi(y)\| \leqslant L \|x-y\|_2\,, \quad \forall x,y \in \mathbb{R}^n\,.$
• Lemma [Gromov-Milman]: If $\varphi: \mathbb{R}^n \to \mathbb{R}$ is $L$-Lipschitz and $\mathbf{X}$ is an isotropic, log-concave random variable, then

$$$\label{eq:kls-tail} \mathbb{P}\left(|\varphi(\mathbf{X})-\mathbb{E}[\varphi(\mathbf{X})]| > t L\right) \leqslant e^{-c t/\psi_n}\,,$$$

where $c > 0$ is an absolute constant and $\psi_n$ is the KLS constant in $n$ dimensions.

Note that the KLS constant has many definitions that are equivalent up to a constant factor, and one could take \eqref{eq:kls-tail} as the definition. See links to related material at the end.

• To prove \eqref{eq:ni-con}, first note that our random variable $\mathbf{Z}$ is log-concave because $N(x)$ is a convex function. To make it isotropic, we define $\mathbf{X} \mathrel{\mathop:}=A^{-1/2} \mathbf{Z}$, where $A \mathrel{\mathop:}=\mathbb{E}[\mathbf{Z} \mathbf{Z}^{\top}]$. We will apply the lemma to the map $\varphi(x) \seteq N_i(A^{1/2} x)$, and we need only evaluate the Lipschitz constant of $\varphi$.

In Homework #2 problem 4, you show that $\varphi$ is $L$-Lipschitz with $L \lesssim \mathbb{E}[N_i(\mathbf{Z})]$, yielding \eqref{eq:ni-con}.