**Date and Time**: Thursday, November 03, 2005, 12:15 pm

**Speaker**: Ulrike von Luxburg (Fraunhofer IPSI, Darmstadt)

Clustering is a basic tool in exploratory data analysis that tries to identify meaningful groups among given data points. As with all sample-based algorithms, an important question is whether a clustering algorithm is consistent: the more data points we get, the more stable and reliable the constructed partition should be. We discuss the question of consistency for the class of spectral clustering algorithms. Starting with a similarity graph representing the sample points, these algorithms use the first eigenvectors of the graph Laplacian matrix to construct a partition of the data points. By combining methods from graph theory, functional analysis, and empirical process theory, we can show that different versions of the graph Laplacian ("normalized" or "unnormalized") have very different large-sample behavior: while the spectral properties of the normalized Laplacian always converge as the sample size increases, for the unnormalized Laplacian the same is true only under strong additional requirements that are often not satisfied in practice. Hence, from a statistical point of view, we advocate the use of normalized rather than unnormalized graph Laplacians.
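
For readers unfamiliar with the construction, the pipeline described in the abstract (similarity graph, graph Laplacian, leading eigenvectors, partition) can be sketched as follows. This is only a minimal illustration and not the speaker's implementation: the Gaussian similarity function, the bandwidth `sigma`, the symmetric normalization `D^{-1/2} (D - W) D^{-1/2}`, and the function names `graph_laplacian` and `spectral_bipartition` are assumptions made here, since the abstract does not fix any of them. The sketch is also restricted to two clusters, obtained by thresholding the second eigenvector at zero.

```python
import numpy as np

def graph_laplacian(W, normalized=True):
    """Return the graph Laplacian of a symmetric similarity matrix W.

    Unnormalized: L = D - W.
    Normalized (symmetric form, assumed here): D^{-1/2} (D - W) D^{-1/2}.
    """
    d = W.sum(axis=1)                      # vertex degrees
    L = np.diag(d) - W
    if normalized:
        d_inv_sqrt = 1.0 / np.sqrt(d)      # assumes every degree is positive
        L = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    return L

def spectral_bipartition(X, sigma=1.0, normalized=True):
    """Split the rows of X into two groups using the second eigenvector."""
    # Fully connected similarity graph with Gaussian weights (an assumption).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)               # no self-loops

    L = graph_laplacian(W, normalized=normalized)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(L)
    second = eigvecs[:, 1]                 # eigenvector of the 2nd smallest eigenvalue
    # The overall sign of an eigenvector is arbitrary, so label names may swap.
    return (second > 0).astype(int)

# Toy data: two well-separated point clouds of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(3.0, 0.0), scale=0.3, size=(20, 2)),
])
labels_norm = spectral_bipartition(X, normalized=True)
labels_unnorm = spectral_bipartition(X, normalized=False)
```

On a small, well-separated sample like this one, both Laplacians would be expected to recover the same two groups. The difference discussed in the talk concerns the limit of growing sample size, which a toy example of this kind does not demonstrate.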

