Theory of Combinatorial Algorithms Institute of Theoretical Computer Science Department of Computer Science ETH Zurich

Approximate Methods in Geometry
(251-0456-00L) - SS 2005


Time & Place


Instructors

Lecturers: Bernd Gärtner, Joachim Giesen, Emo Welzl

Assistant: Andreas Razen
IFW B48.2 / Tel: (044) 632 74 22. e-mail: arazen@inf.ethz.ch

Course description

The course is concerned with approximate geometric methods for the analysis of large data sets represented by point clouds.

Data is being collected in order to draw conclusions from it, i.e. to discover relations, make extrapolations into the future, etc. More often than not, data comes as a set or sequence of points describing objects, with each coordinate representing some quantification of some feature. On a computer such data is just a sequence of 0's and 1's; the need to analyze and "understand" it calls for means to support the process. One way is to visualize the data. For example, a data set representing a number of people by their respective heights and weights can be drawn as a point set in the plane, and this drawing may reveal some correlation that could be approximated by a linear function.

For a wide range of applications (brain research, robotics, statistics, bioinformatics, character and speech recognition, computer graphics etc.) this approach is too simplistic, for various reasons. First of all, the size of the data may be huge (in the millions and billions, and sometimes so huge that we cannot even store it). And secondly, objects may have many features, giving rise to sets of dimension in the hundreds - and we know that simple visualization methods tend to fail starting in dimension 4. (Many features may in fact be redundant, but it is part of the endeavor to find out which ones.)

Many of the arising problems appear to be too hard to be solved exactly in an efficient fashion. The course discusses several approximate methods for the analysis of large high-dimensional data sets that have been developed over the last years in response to the issues indicated above. While we have applications in mind for the questions we address, we emphasize theoretical aspects in the solutions.

Methods we cover are random sampling, grid structures, core-sets, well separated pair decomposition, low distortion low-dimensional embeddings. Applications we address are shape and dimension analysis, nearest neighbor search, clustering etc.

Examples for specific questions arising in these applications are the following: for some point in d-space, what is its closest neighbor in the point cloud? What is the closest pair in the point cloud? What is the "best" grouping of the points in the cloud into k groups? Which subset of the point set of size k provides the "best" description of the point cloud? What is the dimensionality of the point cloud and what does dimensionality mean here? Can the point cloud be embedded into a lower-dimensional space while preserving many of its characteristics?

Procedures, Testat, Exercises

There is no obligation for you to solve the homework exercises, but we highly recommend that you do so. Writing the solutions up also teaches you to precisely formulate your ideas and communicate them such that others understand them. This ability is beneficial not just in mathematics but in (almost) any discipline.
Whatever you write up we will read and comment on.
Solving the exercises also helps you to prepare for the exam at the end of the course. Grading will solely be based on this oral exam (which will take place in fall 2005).

Language

English

Course material

(Updated Lecture Notes will be available soon!! --- Old version: [PDF] [PS])


Lectures Lecture Notes
(Handout)
Problems Solutions

Lecture 1
March 29
(Introduction - ANN)
(See literature) [No homework] [No homework]
(Apr 04) (Apr 04) (Apr 04)

Lecture 2
April 05
(ANN - Basic Geometry)
[PDF] [PS] [PDF] [PS]
(Apr 19) (Apr 04) (Apr 12)

Lecture 3
April 12
(Bounding volumes)
[PDF] [PS] [PDF] [PS]
(Apr 19) (Apr 12) (Apr 19)

Lecture 4
April 19
(Miniball-Approx.-Alg.)
[PDF] [PS] [PDF] [PS]
(Apr 19) (Apr 19) (Apr 26)

Lecture 5
April 26
(Quadratic Programming)
[PDF] [PS] [PDF] [PS]
(Apr 26) (Apr 26) (May 03)

Lecture 6
May 03
(Cuboids & Ellipsoids)
[PDF] [PS] [PDF] [PS]
(May 03) (May 03) (May 10)

Lecture 7
May 10
(Support Vector Machines)
[PDF] [PS] [PDF] [PS]
(May 31) (May 11) (May 25)

Lecture 8
May 17
(Quadtrees)
(See literature) [PDF] [PS] [discussed on May 24]
(May 17) (May 18) (May 24)

Lecture 9
May 24
(WSPD)
(See literature) [PDF] [PS]
(May 24) (May 24) (May 31)

Lecture 10
May 31
(Applications of WSPD)
(See literature) [PDF] [PS]
(May 31) (May 31) (Jun 07)

Lecture 11
June 07
(ε-Nets and VC-Dimension)
(Material covered by Notes [No homework] [No homework]
distributed on June 14) (Jun 07) (Jun 07)

Lecture 12
June 14
(Size of ε-Nets)
(Handout of Lecture Notes) [PDF] [PS]
(Jun 14) (Jun 13) (Jun 21)

Lecture 13
June 21
(ε-Nets - Directional width)
(Material covered by Notes [PDF] [PS]
below from June 28) (Jun 21) (Jun 28)

Lecture 14
June 28
(Core sets for direct. width)
[PDF] [PS] [No homework] [No homework]
(Jun 28) (Jun 28) (Jun 28)




Further Literature

References from the Lecture Notes

  • M.Badoiu and K.L.Clarkson. Optimal core-sets for balls.
    Submitted, 2002.

  • G.Barequet and S.Har-Peled. Efficiently approximating the minimum-volume bounding box of a point set in three dimensions
    J. of Algorithms (JoA), volume 38 (1), pages 91-109, January 2001.

  • K.Fischer and B.Gärtner. The smallest enclosing ball of balls: combinatorial structure and algorithms.
    Proceedings of the nineteenth annual symposium on Computational geometry, pages 292-301, 2003.

  • A.Goel, P.Indyk and K.R.Varadarajan. Reductions among high dimensional proximity problems.
    Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pages 769-778, 2001.

  • B.Schölkopf and A.Smola. Learning with Kernels.
    MIT Press, Cambridge Massachusetts, 2002.

  • E.Welzl. Smallest Enclosing Disks (Balls and Ellipsoids).
    In H.Maurer, editor, New Results and New Trends in Computer Science, volume 555 of Lecture Notes in Computer Science, pages 359-370, 1991.

  • Applications and data sets


    Last edited: Jun 28, 2005
    arazen@inf.ethz.ch