General norm sparsification

CSE 599, Autumn 2023

The material in this lecture is taken from Sparsifying sums of norms.

Suppose that $N_1,\ldots,N_m : \R^n \to \R_+$ are semi-norms on $\R^n$, and define the semi-norm given by their $\ell_p$ sum, for $p > 1$:

$$\begin{equation}\label{eq:ns} N(x) \mathrel{\mathop:}=\left(N_1(x)^p + \cdots + N_m(x)^p\right)^{1/p}\,.\end{equation} $$

Our goal will be to sparification where the terms are $f_i(x) \mathrel{\mathop:}=N_i(x)^p$ in our usual notation, and $F(x) = N(x)^p$.

A semi-norm $N$ is nonnegative and satisfies $N(\lambda x) = |\lambda| N(x)$ and $N(x+y) \leqslant N(x)+N(y)$ for all $\lambda\in \R$, $x,y \in \R^n$, though possibly $N(x)=0$ for $x \neq 0$.

Note that, equivalently, a semi-norm is a homogeneous function: $N(\lambda x) = |\lambda| N(x)$ such that $N$ is also convex.

Examples

Spectral hypergraph sparsifiers. Consider subsets $S_1,\ldots,S_m \subseteq {1,\ldots,n}$ and
\[f_i(x) \mathrel{\mathop:}=\max_{u,v \in S_i} (x_u-x_v)^2\,.\]
Then sparsifying $F(x) = f_1(x) + \cdots + f_m(x)$ is called spectral hypergraph sparsification.

More generally, we can consider linear operators ${ A_i \in \R^{d_i \times n} }$ and the terms $f_i(x) \mathrel{\mathop:}=\norm{A_i x}_{\infty}^2$.
Symmetric submodular functions. A function $f : 2^V \to \R_+$ is submodular if
\[f(S \cup \{v\}) - f(S) \geqslant f(T \cup \{v\}) - f(T)\,,\quad \forall S \subseteq T \subseteq V, v \in V\,.\]
If one imagines $S \subseteq V$ to be a set of items and $f(S)$ to be the total utility, then this says that the marginal utility of an additional item $v \in V$ is non-increasing as the basket of items increases. As a simple example, consider $w_1,\ldots,w_n \geqslant 0$ and $f(S) \mathrel{\mathop:}=\min\left(B, \sum_{i \in S} w_i\right)$. Here, the additional utility of an item drops to $0$ as one approaches the “budget” $B$.
- A submodular function $f : 2^V \to \R_+$ is symmetric if $f(\emptyset)=0$ and $f(S) = f(V \setminus S)$ for all $S \subseteq V$. A canonical example is where $f(S)$ denote the set of edges leaving a cut $(S, V \setminus S)$ in an undirected graph.
- The Lovász extension of a submodular function is the function $\tilde{f} : [0,1]^V \to \R_+$ defined by
  \[\tilde{f}(x) \mathrel{\mathop:}=\mathop{\mathrm{\E}}[f(\mathbf{S}_x)]\,,\]
  where $\mathbf{S}_x$ is a random set such that $i \in \mathbf{S}_x$ with probability $x_i$, independently for every $i \in V$.
  
  Fact: A function $f : 2^V \to \R_+$ is submodular if and only if $\tilde{f} : [0,1]^V \to \R_+$ is convex.
- Because of this fact, it turns out that a function $f$ is symmetric and submodular if and only if $\tilde{f}$ is a semi-norm, where we extend $\tilde{f} : \R^V \to \R_+$ to all of $\R^V$ via the definition
  \[\begin{equation}\label{eq:lov-ext} \tilde{f}(x) \mathrel{\mathop:}=\int_{-\infty}^{\infty} f(\{ i : x_i \leqslant t \})\,dt\,.\end{equation}\]
  Note that if $x \in \{0,1\}^V$ is such that $x_i = \mathbf{1}_S(i)$, then we have
  \[\tilde{f}(x) = \int_{0}^1 f(\{ i : x_i \leqslant t \}\,dt\,,\]
  since $f(\emptyset)=f(V)=0$. Moreover, $\{ i : x_i \leqslant t \} = S$ for $0 < t < 1$, hence $\tilde{f}(x)=f(S)$. So $\tilde{f}$ is indeed an extension of $f$. A similar argument shows that for $x$ defined this way, $\tilde{f}(-x)=f(V \setminus S)$.
- Lemma: A function $f : 2^V \to \R_+$ is symmetric and submodular if and only if $\tilde{f}$ is a semi-norm.
- Proof: A semi-norm $\tilde{f}$ is convex, and therefore by the fact above, $f$ is submodular. Moreover $\tilde{f}(0)=0$ and $\tilde{f}(x)=\tilde{f}(-x)$ for a semi-norm, so if $x \in {0,1}^V$ is given by $x_i = \mathbf{1}_S(i)$, then $f(S)=\tilde{f}(x)=\tilde{f}(-x)=f(V\setminus S)$ and thus $f$ is symmetric.
  
  Conversely, if $f$ is symmetric and submodular, then $\tilde{f}(x)=\tilde{f}(-x)$ and $\tilde{f}$ is convex. Moreover, the defintion \eqref{eq:lov-ext} shows that $\tilde{f}(cx)=c \tilde{f}(x)$ for $c > 0$, hence $\tilde{f}$ is also homogeneous, implying that it’s a semi-norm.
- Thus the sparsification problem for sums $F(S) = f_1(S) + \cdots + f_m(S)$ of symmetric submodular functions ${f_i}$ reduces to the sparsification problem for semi-norms: $\tilde{F}(x) \mathrel{\mathop:}=\tilde{f}_1(x) + \cdots + \tilde{f}_m(x)$.

Importance scores

Let us consider now the sparsification problem \eqref{eq:ns} for $p=1$. As we discussed in previous lectures, a reasonable way of assigning sampling probabilities is to take
\[\begin{equation}\label{eq:ns-rhoi} \rho_i \mathrel{\mathop:}=\frac{\mathop{\mathrm{\E}}[N_i(\mathbf{Z})]}{\mathop{\mathrm{\E}}[N(\mathbf{Z})]}\end{equation}\]
for some random variable $\mathbf{Z}$ on $\R^n$, as this represents the “importance” of the $i$th term $N_i(x)$ under the distribution of $\mathbf{Z}$.
Let us also recall the notion of approximation we’re looking for:
\[\max_{N(x) \leqslant 1} |N(x)-\tilde{N}(x)| \leqslant\varepsilon\,.\]
Given the desired approximation guarantee, it’s natural to simply take $\mathbf{Z}$ to be uniform on the unit ball
\[B_N \mathrel{\mathop:}=\{ x \in \R^n : N(x) \leqslant 1 \}\,.\]
For the purposes of analysis, it will be easier to take $\mathbf{Z}$ as the random variable on $\R^n$ with density proportional to $e^{-N(x)}$, but let us see that gives the same probabilities \eqref{eq:ns-rhoi} as averaging uniformly over $B_N$. To compare the density $e^{-N(x)}$ and the density that is uniform on $B_N$, one can take $\varphi_1(y)=e^{-y}$ and $\varphi_2(y) = \mathbf{1}_{\{y \leqslant 1\}}$ in the next lemma.
Lemma: Suppose that the density of the random variable $\mathbf{Z}$ is $\varphi(N(x))$ for any sufficiently nice function $\varphi$. Then the probabilities defined in \eqref{eq:ns-rhoi} are independent of $\varphi$.
Proof: Define $V(r) \mathrel{\mathop:}=\mathrm{vol}_n(r B_N)$ and note that $V(r) = r^n \mathrm{vol}_n(B_n)$. Let $u : \R^n \to \R$ be a homogeneous function, i.e., $u(\lambda x) = |\lambda| u(x)$ for all $\lambda \in \R, x \in \R^n$. Then we have
\[\begin{aligned} \int_{\R^n} \varphi(N(x)) u(x)\,dx &= \int \varphi(r) V'(r) \int_{x : N(x)=r} u(x)\,dx \\ &= n \mathrm{vol}_n(B_N) \int \varphi(r) r^{n-1} \int_{x : N(x)=r} u(x)\,dx \,dr \\ &= n \mathrm{vol}_n(B_N) \int \varphi(r) r^{n} \,dr \int_{x : N(x)=1} u(x)\,dx\,. \end{aligned}\]
In other words, the integral only depends on the latter spherical integral, which is independent of $\varphi$.