Stat 991, Spring 2010

Multivariate Analysis, Dimensionality Reduction, and Spectral Methods

Syllabus:

Modern statistical approaches to large datasets must directly analyze and manipulate data in matrix or vector form. This course focuses on the statistical theory and practice of manipulating such data. The topics covered will be: multivariate analysis, dimensionality reduction, convexity issues that arise when working with matrices, and spectral methods.

With regard to dimensionality reduction, we will cover PCA, CCA, and random projections (e.g. Johnson-Lindenstrauss) and examine potential applications. With regard to convexity issues, the course will examine the basic question of how accurate the SVD of a random matrix is (we will examine a generalization of the Chernoff method to matrices). Other potential topics include matrix completion (filling in the missing entries of a partially observed matrix), subspace identification (e.g. learning time series models such as Kalman filters from a multivariate covariance analysis), locality sensitive hashing (randomly projecting data for efficient storage and retrieval), matrix-based regularization methods (and related convexity issues), and kernel methods/Gaussian process regression.
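As a taste of the random projection material, here is a minimal sketch (not from the course notes) of a Johnson-Lindenstrauss-style Gaussian projection, checking how well pairwise distances are preserved after projecting from a high-dimensional space to a low-dimensional one. The dimensions and variable names (n_points, ambient_dim, target_dim) are illustrative choices, not part of the course.

    import numpy as np

    rng = np.random.default_rng(0)
    n_points, ambient_dim, target_dim = 200, 1000, 50

    # Data matrix: rows are points in the high-dimensional ambient space.
    X = rng.standard_normal((n_points, ambient_dim))

    # Random Gaussian projection, scaled so squared norms are preserved in expectation.
    R = rng.standard_normal((ambient_dim, target_dim)) / np.sqrt(target_dim)
    Y = X @ R

    # Pairwise Euclidean distances before and after projection.
    def pairwise_dists(A):
        sq = np.sum(A**2, axis=1)
        return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * A @ A.T, 0.0))

    D_orig = pairwise_dists(X)
    D_proj = pairwise_dists(Y)

    # Ratios of projected to original distances (off-diagonal entries only);
    # for a JL-style projection these concentrate near 1.
    mask = ~np.eye(n_points, dtype=bool)
    ratios = D_proj[mask] / D_orig[mask]
    print("distance ratio: mean %.3f, min %.3f, max %.3f"
          % (ratios.mean(), ratios.min(), ratios.max()))

Running this shows the distance ratios clustering near 1, which is the qualitative content of the Johnson-Lindenstrauss lemma: a random 50-dimensional projection approximately preserves the geometry of points living in 1000 dimensions.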

The major topics discussed in the course will include the following:

- Multivariate analysis
- Dimensionality reduction (PCA, CCA, random projections)
- Convexity issues when working with matrices
- Spectral methods

Prerequisites:

The course is appropriate for a graduate student with some background in statistics and machine learning. The course will assume a basic level of mathematical maturity, so please contact the instructor if you have concerns.

Requirements:

As this is an advanced graduate course, the goal is for you to learn the material largely on your own. For requirements, I'd like you to read the notes, send me corrections if you find any, and write a short, informal (typed) summary covering any subset of the following: 1) questions or insights about the course notes, 2) possible research directions, 3) how the ideas might relate to your work, 4) an alternative derivation you like, 5) a related paper you read, or 6) points that were unclear. You don't need to write more than a page. I'd also like you to find papers related to the material we cover that overlap with your research interests.

Instructor:

Sham Kakade
skakade at wharton.upenn.edu

Time and location:

Time: MW 1:30 - 3:00
Location: JMHH F88

Schedule and notes: