Spectral clustering (Thu, May 05)

Graph conductance

As in the last lecture, consider an undirected graph \(\def\seteq{\mathrel{\vcenter{:}}=} \def\cE{\mathcal{E}} \def\cL{\mathcal{L}} \def\R{\mathbb{R}} \def\cR{\mathcal{R}} \def\argmin{\mathrm{argmin}} \def\llangle{\left\langle} \def\rrangle{\right\rangle} \def\1{\mathbf{1}} \def\e{\varepsilon} \def\vol{\mathrm{vol}} \def\avint{\mathop{\,\rlap{-}\!\!\int}\nolimits}\) \(G=(V,E)\) with \(n=|V|\) vertices. One way to capture the idea that a subset \(S \subseteq V\) of vertices is a good “cluster” is through the notion of conductance. Define the volume of \(S\) as the quantity

\[\vol(S) = \sum_{u \in S} \deg(u),\]

i.e., the sum of vertex degrees in \(S\). Then we define the conductance of the set \(S\) by

\[\phi_G(S) \seteq \frac{|E(S,\bar{S})|}{\min(\vol(S),\vol(\bar{S}))},\]

where \(E(S,\bar{S})\) denotes the set of edges that run from \(S\) to its complement.

Let us define \(\rho_2(G)\) to be the minimum conductance over all non-empty proper subsets of the vertex set \(V\):

\[\rho_2(G) \seteq \min_{\emptyset \neq S \subsetneq V} \phi_G(S)\,.\]
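
To make these definitions concrete, here is a minimal sketch in Python/NumPy (an illustration for these notes, not part of the lecture): a `conductance` routine and a brute-force `rho2` that enumerates subsets, which is only feasible for very small graphs. The 4-cycle example and the helper names are purely illustrative.

```python
import itertools
import numpy as np

def conductance(A, S):
    """phi_G(S) = |E(S, S-bar)| / min(vol(S), vol(S-bar)) for an undirected
    graph given by a symmetric 0/1 adjacency matrix A."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(S)] = True
    deg = A.sum(axis=1)
    cut = A[mask][:, ~mask].sum()              # each crossing edge counted once
    return cut / min(deg[mask].sum(), deg[~mask].sum())

def rho2(A):
    """Brute-force rho_2(G): minimum conductance over non-empty proper subsets.
    Exponential in n, so only sensible for tiny graphs."""
    n = A.shape[0]
    best = np.inf
    for k in range(1, n // 2 + 1):             # |S| <= n/2 suffices, since phi(S) = phi(S-bar)
        for S in itertools.combinations(range(n), k):
            best = min(best, conductance(A, S))
    return best

# Example: the 4-cycle.  A single vertex has conductance 2 / min(2, 6) = 1,
# while the split {0, 1} vs. {2, 3} gives 2 / 4 = 1/2.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], float)
print(conductance(A, {0, 1}), rho2(A))         # 0.5 0.5
```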

Spectral clusters

It turns out that the eigenvectors of the Laplacian of \(G\) encode information about sets of small conductance. It will be more useful to work with the normalized Laplacian. For simplicity, let’s assume that our graph is \(d\)-regular, which means that \(\deg(v)=d\) for every \(v \in V\). Then the normalized Laplacian is just the usual (“combinatorial”) Laplacian scaled down by \(d\):

\[\cL_G \seteq \frac{1}{d} L_G = I - \frac{1}{d} A\,,\]

where \(A\) is the adjacency matrix of \(G\). Recall that if we sort the eigenvalues of \(\cL_G\) as \(\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n\), then \(\lambda_1=0\) corresponds to the constant eigenvector \(\mathbf{x}_1 = (1/\sqrt{n},\ldots,1/\sqrt{n})\). The next claim gives the first connection between the 2nd smallest eigenvalue \(\lambda_2(G)\) of \(\cL_G\) and the sets of minimal conductance in \(G\).
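
As a quick numerical illustration (a sketch added here, not from the lecture), the following builds \(\cL_G\) for a small \(d\)-regular graph, the 3-dimensional hypercube, and checks that the smallest eigenvalue is \(0\) with a constant eigenvector.

```python
import numpy as np

# Illustrative example: the 3-dimensional hypercube, an 8-vertex, 3-regular graph.
n, d = 8, 3
A = np.zeros((n, n))
for u in range(n):
    for b in range(d):                  # neighbors of u: flip one of the 3 bits
        A[u, u ^ (1 << b)] = 1

L = np.eye(n) - A / d                   # normalized Laplacian I - A/d
eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order

print(np.round(eigvals, 4))             # [0, 2/3, 2/3, 2/3, 4/3, 4/3, 4/3, 2]
print(np.round(eigvecs[:, 0], 4))       # constant vector (1/sqrt(8), ..., 1/sqrt(8)), up to sign
```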

Claim: It holds that \(\lambda_2(G) \leq 2 \rho_2(G)\) for any graph \(G\).

To prove this claim, it will help to use the variational characterization of \(\lambda_2(G)\) from the last lecture:

\[\begin{align*} \lambda_2(G) &= \min \left\{ \langle \mathbf{x}, \cL_G \mathbf{x} \rangle : \mathbf{x} \perp \mathbf{x}_1, \|\mathbf{x}\|^2=1\right\} \nonumber \\ &= \min \left\{ \frac{\langle \mathbf{x}, \cL_G \mathbf{x} \rangle}{\|\mathbf{x}\|^2} : \mathbf{x} \perp \mathbf{x}_1\right\} \nonumber \\ &= \min \left\{ \frac{\langle \mathbf{x}, \cL_G \mathbf{x} \rangle}{\|\mathbf{x}-\bar{\mathbf{x}}\|^2} : \mathbf{x} \neq 0\right\}, \nonumber \end{align*}\]

where \(\bar{\mathbf{x}} \seteq \left(\frac{1}{n} \sum_{u \in V} x_u\right)(1,1,\ldots,1)\) is the vector containing the average value of the coordinates of \(\mathbf{x}\) in every coordinate. Note that we have removed the condition \(\mathbf{x} \perp \mathbf{x}_1\) by subtracting off the component of \(\mathbf{x}\) along \(\mathbf{x}_1\):

\[\mathbf{x} - \langle \mathbf{x}, \mathbf{x}_1\rangle \mathbf{x}_1 = \mathbf{x} - \bar{\mathbf{x}}.\]

Using the expression we derived last time for the quadratic form \(\langle \mathbf{x}, \cL_G \mathbf{x}\rangle\), we can rewrite the last line above as

\[\begin{equation}\label{eq:var} \lambda_2(G) = \min \left\{ \frac{\frac{1}{d} \sum_{\{u,v\} \in E} (x_u-x_v)^2}{\sum_{u \in V} (x_u-\bar{x})^2} : \mathbf{x} \neq 0 \right\}, \end{equation}\]

where \(\bar{x} = \frac{1}{n} \sum_{u \in V} x_u\) is the average value of the components of \(\mathbf{x}\).
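
As a sanity check of \eqref{eq:var} (again just an illustrative sketch, reusing the hypercube example from above), one can verify numerically that the quotient equals \(\lambda_2\) at the second eigenvector and is never smaller than \(\lambda_2\) at any other vector:

```python
import numpy as np

# Reuse the hypercube example: build I - A/d and compare lambda_2 with the quotient.
n, d = 8, 3
A = np.zeros((n, n))
for u in range(n):
    for b in range(d):
        A[u, u ^ (1 << b)] = 1
L = np.eye(n) - A / d
lam, X = np.linalg.eigh(L)

def quotient(x):
    # (1/d) * sum over edges {u,v} of (x_u - x_v)^2, divided by sum_u (x_u - xbar)^2
    num = sum((x[u] - x[v]) ** 2 for u in range(n) for v in range(u + 1, n) if A[u, v]) / d
    return num / np.sum((x - x.mean()) ** 2)

rng = np.random.default_rng(0)
print(lam[1], quotient(X[:, 1]))        # both ~ 2/3: equality at the second eigenvector
print(all(quotient(rng.standard_normal(n)) >= lam[1] - 1e-9 for _ in range(100)))  # True
```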

Proof: Note that for a \(d\)-regular graph, we have \(\vol(S) = d|S|\). Consider any non-empty subset \(S \subsetneq V\). We will show that \(\lambda_2(G) \leq 2 \phi_G(S)\). Since \(\phi_G(S) = \phi_G(\bar{S})\), we may assume that \(|S|\leq n/2\). Define the indicator vector \(\mathbf{v}_S\) so that \(\mathbf{v}_S(u)=1\) if \(u \in S\) and \(\mathbf{v}_S(u)=0\) otherwise.

Let us now plug \(\mathbf{v}_S\) into \eqref{eq:var} to get an upper bound on \(\lambda_2(G)\):

\[\begin{align*} \lambda_2(G) \leq \frac{\frac{1}{d} \sum_{\{u,v\} \in E} (\mathbf{v}_S(u)-\mathbf{v}_S(v))^2}{\sum_{u \in V} (\mathbf{v}_S(u)-\frac{|S|}{n})^2} &= \frac{\frac{1}{d} |E(S,\bar{S})|}{|S| \left(1-\frac{|S|}{n}\right)^2 + (n-|S|) \left(\frac{|S|}{n}\right)^2} \\ &= \frac{\frac{1}{d} |E(S,\bar{S})|}{\frac{|S|}{n} (n-|S|)}. \end{align*}\]

Since we have assumed that \(|S| \leq n/2\), we know that \(\frac{n-|S|}{n} = 1 - \frac{|S|}{n} \geq 1/2\), and therefore

\[\lambda_2(G) \leq 2 \frac{\frac{1}{d} |E(S,\bar{S})|}{|S|} = 2 \frac{|E(S,\bar S)|}{\vol(S)} = 2 \phi_G(S)\,.\]
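
For a concrete (purely illustrative) check of the claim, take the 8-cycle, which is 2-regular: the arc \(S=\{0,1,2,3\}\) has \(\phi_G(S) = 1/4\), while \(\lambda_2 = 1-\cos(2\pi/8) \approx 0.293 \leq 2\phi_G(S) = 0.5\).

```python
import numpy as np

# The 8-cycle (2-regular).  For the arc S = {0, 1, 2, 3}: |E(S, S-bar)| = 2,
# vol(S) = vol(S-bar) = 8, so phi_G(S) = 1/4, while lambda_2 = 1 - cos(2*pi/8) ~ 0.293.
n, d = 8, 2
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1

L = np.eye(n) - A / d
lam2 = np.linalg.eigvalsh(L)[1]

S = np.arange(n) < 4                    # boolean mask for {0, 1, 2, 3}
cut = A[S][:, ~S].sum()
phi = cut / min(d * S.sum(), d * (~S).sum())
print(lam2, 2 * phi, lam2 <= 2 * phi)   # ~0.293  0.5  True
```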

Cheeger’s inequality and sweep cuts

In the previous section, we saw that sets of small conductance force \(\lambda_2(G)\) to be small. It turns out that the other direction is true as well.

Discrete Cheeger inequality: It holds that \(\rho_2(G) \leq \sqrt{2 \lambda_2(G)}\) for any graph \(G\).

While we will not prove this inequality, we will now discuss the simple algorithm behind it:

Algorithm SWEEP(\(\mathbf{x}\)):

Sort the vertices \(V = \{ v_1, v_2, \ldots, v_n \}\) according to the value of the corresponding coordinate in \(\mathbf{x}\):

\[\mathbf{x}(v_1) \leq \mathbf{x}(v_2) \leq \cdots \leq \mathbf{x}(v_n).\]

This sorted order gives us \(n-1\) different cuts:

\[(\{v_1\}, \{v_2, \ldots, v_n\}), (\{v_1,v_2\}, \{v_3,\ldots,v_n\}), \ldots, (\{v_1,\ldots,v_{n-1}\}, \{v_n\})\]

Now return the cut \(S\) among these \(n-1\) candidates that has the smallest \(\phi_G(S)\) value.

The algorithm is well-defined for any vector \(\mathbf{x} \in \R^V\). If we additionally assume that \(\mathbf{x} \perp \mathbf{x}_1\), then this algorithm is guaranteed to output a cut \(S\) that satisfies

\[\phi_G(S) \leq \sqrt{2 \cR_G(\mathbf{x})},\]

where \(\cR_G(\mathbf{x})\)—called the “Rayleigh quotient of \(\mathbf{x}\)”—is defined by

\[\cR_G(\mathbf{x}) = \frac{\sum_{\{u,v\} \in E} \left(\mathbf{x}(u) - \mathbf{x}(v)\right)^2}{d \|\mathbf{x}\|^2}\,.\]
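
Here is one possible NumPy implementation of SWEEP for a \(d\)-regular graph (an illustrative sketch; the test graph and function names are not from the lecture). Running it on the second eigenvector \(\mathbf{x}_2\), for which \(\cR_G(\mathbf{x}_2) = \lambda_2(G)\), makes the guarantee \(\phi_G(S) \leq \sqrt{2\lambda_2(G)}\) easy to check.

```python
import numpy as np

def sweep(A, x, d):
    """SWEEP for a d-regular graph with adjacency matrix A: sort the vertices
    by x, try the n-1 prefix cuts, and return the best (set, conductance)."""
    n = A.shape[0]
    order = np.argsort(x)
    best_set, best_phi = None, np.inf
    for i in range(1, n):
        mask = np.zeros(n, dtype=bool)
        mask[order[:i]] = True
        cut = A[mask][:, ~mask].sum()
        phi = cut / (d * min(i, n - i))        # min(vol(S), vol(S-bar)) = d * min(i, n-i)
        if phi < best_phi:
            best_set, best_phi = set(order[:i].tolist()), phi
    return best_set, best_phi

# Run it on the second eigenvector of the normalized Laplacian of the 8-cycle.
n, d = 8, 2
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1
lam, X = np.linalg.eigh(np.eye(n) - A / d)
S, phi = sweep(A, X[:, 1], d)
print(S, phi, np.sqrt(2 * lam[1]))             # phi is at most sqrt(2 * lambda_2)
```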

Multi-way spectral clustering

If we are interested in finding a clustering into more pieces, we can use higher eigenvectors. Define the \(k\)-way conductance of \(G\) as the quantity

\[\rho_k(G) \seteq \min_{S_1, S_2, \ldots, S_k \subseteq V} \max \left\{ \phi_G(S_i) : i=1,\ldots,k \right\},\]

where the minimum is over all non-empty, pairwise disjoint subsets of \(V\). Then one has the following higher-order Cheeger inequality:

\[\frac{\lambda_k(G)}{2} \leq \rho_k(G) \leq O(k^2) \sqrt{\lambda_k(G)}\,.\]

It is not known whether one can reduce the factor \(O(k^2)\), but this is possible if we know there are at least \(1.1 k\) small eigenvalues:

\[\rho_k(G) \leq O\left(\sqrt{\lambda_{1.1 k}(G) \log k}\right).\]

One general way to recover a \(k\)-clustering is to first construct the spectral embedding along the first \(k\) eigenvectors: this is the map \(F : V \to \R^k\) defined by

\[F(u) = \left(\mathbf{x}_1(u), \mathbf{x}_2(u), \ldots, \mathbf{x}_k(u)\right),\]

where \(\mathbf{x}_1,\ldots,\mathbf{x}_k\) are the first \(k\) eigenvectors of \(\cL_G\).

Now one can look for geometric clusters, and using the right procedure, it is possible to give an algorithm that is guaranteed to produce \(k\) disjoint clusters with small conductance. In practice, this second geometric partitioning step is often done using some variant of the \(k\)-means algorithm.
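
Here is a minimal end-to-end sketch of this pipeline, assuming NumPy and scikit-learn's `KMeans` are available (the three-cycle test graph and all names are purely illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy graph: the disjoint union of three 4-cycles (2-regular, 12 vertices).
k, d = 3, 2
C4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], float)
A = np.kron(np.eye(k), C4)                 # block-diagonal adjacency matrix
n = A.shape[0]

L = np.eye(n) - A / d                      # normalized Laplacian
_, X = np.linalg.eigh(L)
F = X[:, :k]                               # spectral embedding F : V -> R^k

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(F)
print(labels)                              # each 4-cycle lands in its own cluster
```

On this toy graph the embedding is constant on each connected component, so \(k\)-means separates the three cycles exactly; on a connected graph the embedded points are only approximately clustered, which is where the guarantees above come in.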