# Statistical Learning Theory

## Syllabus:

Statistical learning theory studies the statistical aspects of machine learning and automated reasoning through the use of (sampled) data. In particular, the focus is on characterizing the generalization ability of learning algorithms: how well they perform on "new" data when trained on some given data set. The course focuses on: providing the fundamental tools used in this analysis; understanding the performance of widely used learning algorithms (with a focus on regression and classification); and understanding the "art" of designing good algorithms, in terms of both statistical and computational properties. Potential topics include: concentration of measure; empirical process theory; online learning; stochastic optimization; margin-based algorithms; feature selection; regularization; and PCA.

## Prerequisites:

The course is appropriate for a graduate student with some background in statistics and machine learning. The course will assume a basic level of mathematical maturity, so please contact the instructor if you have concerns.

## Requirements:

Homework sets, readings, and a project.

## Time and location:

Time: MW 3:00-4:30. Location: G90 JMHH.

## Material:

Notes will be posted for each lecture.

## Schedule and notes:

- Lecture 0
  - Risk vs. Risk: Some terminology differences between Stats and ML (ML people have not defined risk analogously, causing some confusion)
  - lecture notes pdf
- Lecture 1: 1/12/11
  - lecture notes pdf
- Lecture 2: 1/19/11
  - Fixed Design Regression and Ridge Regression
  - lecture notes pdf
- Lecture 3: 1/24/11
  - Ridge Regression and PCA
  - lecture notes pdf
- Lecture 4: 1/26/11
  - The Central Limit Theorem, Large Deviations, and Rate Functions
  - lecture notes pdf
- Lecture 5: 1/30/11
  - The Moment Method, Convex Duality, and Large/Medium/Small Deviations
  - lecture notes pdf
- Lecture 6: 2/2/11
  - Hoeffding, Chernoff, Bennett, and Bernstein Bounds
  - lecture notes pdf
- Lecture 7: 2/7/11
  - Feature Selection, Empirical Risk Minimization, and the Orthogonal Case
  - lecture notes pdf
- Lecture 8: 2/9/11
  - Feature Selection and Chi-Squared Tail Bounds
  - lecture notes pdf
- Lecture 9: 2/14/11
  - Risk vs. Risk: Some terminology differences between Stats and ML
    - lecture 0 notes pdf
  - Empirical Processes
    - lecture 9 notes pdf
- Lecture 10: 2/16/11
  - Bracketing Covering Numbers
  - lecture 10 notes pdf
- Lecture 11: 2/21/11
  - lecture 11 notes pdf
- Lecture 12: 2/23/11
  - Rademacher Composition and Linear Prediction
  - lecture 12 notes pdf
- Lecture 13: 2/28/11
  - Review: Norms and Dual Norms
- Lecture 14: 3/2/11
  - Bounded Differences, Rademacher Averages, and L1 Regularization
  - lecture 14 notes pdf
- Lecture 15: 3/14/11
  - Rademacher Averages, Linear Prediction, and Convex Duality
  - lecture 15 notes pdf
  - S. M. Kakade, S. Shalev-Shwartz, A. Tewari. Regularization Techniques for Learning with Matrices. pdf
- Lecture 16: 3/16/11
  - Uniform and Empirical Covering Numbers
  - lecture 16 notes pdf
- Lecture 17: 3/21/11
  - Dudley's Theorem and Packing Numbers
  - lecture 17 notes pdf
- Lecture 18: 3/28/11
  - Mistake Bound Model, Halving Algorithm, Linear Classifiers, and Perceptron
  - lecture 18 notes pdf
- Lecture 19: 3/30/11
  - Perceptron Lower Bound and the Winnow Algorithm
  - lecture 19 notes pdf
- Lecture 20: 4/4/11
  - The Perceptron for Generalized Linear Models and Single Index Models
  - lecture 20 notes pdf
- Lecture 21: 4/6/11
  - Online Convex Programming and Gradient Descent
  - lecture 21 notes pdf
- Lecture 22: 4/11/11