Stat 928, Spring 2011
Statistical Learning Theory
Syllabus:
Statistical learning theory studies the statistical aspects of machine
learning and automated reasoning through the use of (sampled) data.
In particular, the focus is on characterizing the generalization
ability of learning algorithms: how well they perform on "new" data
when trained on some given data set. The course focuses on: providing
the fundamental tools used in this analysis; understanding the
performance of widely used learning algorithms (with an emphasis on
regression and classification); and understanding the "art" of
designing good algorithms, in terms of both statistical and
computational properties. Potential topics include: concentration of
measure; empirical process theory; online learning; stochastic
optimization; margin-based algorithms; feature selection;
regularization; PCA.
Prerequisites:
The course is appropriate for graduate students with some background in
statistics and machine learning. It will assume a basic level of mathematical
maturity, so please contact the instructor if you have concerns.
Requirements:
Homework sets, readings, and a project.
Instructor:
Time and location:
Time: MW, 3:00-4:30
Location: G90 JMHH
Material:
Notes will be posted for each lecture.
Schedule and notes:
- Lecture 0
- Risk vs. Risk: Some terminology differences between Stats and ML
- (ML people have not defined risk analogously, causing some confusion)
- lecture notes pdf
- Lecture 1: 1/12/11
- Introduction; Bias-Variance Tradeoff
- lecture notes pdf
- Lecture 2: 1/19/11
- Fixed Design Regression and Ridge Regression
- lecture notes pdf
- Lecture 3: 1/24/11
- Ridge Regression and PCA
- lecture notes pdf
- Lecture 4: 1/26/11
- The Central Limit Theorem; Large Deviations; and Rate Functions
- lecture notes pdf
- Lecture 5: 1/31/11
- The Moment Method; Convex Duality; and Large/Medium/Small Deviations
- lecture notes pdf
- Lecture 6: 2/2/11
- Hoeffding, Chernoff, Bennett, and Bernstein Bounds
- lecture notes pdf
- Lecture 7: 2/7/11
- Feature Selection, Empirical Risk Minimization, and The Orthogonal Case
- lecture notes pdf
- Lecture 8: 2/9/11
- Feature Selection and Chi^2 Tail Bounds
- lecture notes pdf
- Lecture 9: 2/14/11
- Risk vs. Risk: Some terminology differences between Stats and ML
- lecture 0 notes pdf
- Empirical Processes
- lecture 9 notes pdf
- Lecture 10: 2/16/11
- Bracketing Covering Numbers
- lecture 10 notes pdf
- Lecture 11: 2/21/11
- Symmetrization and Rademacher Averages
- lecture 11 notes pdf
- Lecture 12: 2/23/11
- Rademacher Composition and Linear Prediction
- lecture 12 notes pdf
- Lecture 13: 2/28/11
- Review: Norms and Dual Norms