Stat 928, Spring 2011
Statistical Learning Theory
Syllabus:
Statistical learning theory studies the statistical aspects of machine
learning and automated reasoning through the use of (sampled) data.
In particular, the focus is on characterizing the generalization
ability of learning algorithms: how well they perform on "new" data
when trained on some given data set. The course focuses on: providing
the fundamental tools used in this analysis; understanding the
performance of widely used learning algorithms (with an emphasis on
regression and classification); and understanding the "art" of
designing good algorithms, in terms of both statistical and
computational properties. Potential topics include: concentration of
measure; empirical process theory; online learning; stochastic
optimization; margin-based algorithms; feature selection;
regularization; PCA.
Prerequisites:
The course is appropriate for graduate students with some background in
statistics and machine learning. It will assume a basic level of mathematical
maturity, so please contact the instructor if you have concerns.
Requirements:
Homework sets, readings, and a project.
Instructor:
Time and location:
Time: MW, 3:00-4:30
Location: G90 JMHH
Material:
Notes will be posted for each lecture.
Schedule and notes:
- Lecture 0
- Risk vs. Risk: Some terminology differences between Stats and ML
- (ML people have not defined risk analogously, causing some confusion)
- lecture notes pdf
- Lecture 1: 1/12/11
- Introduction; Bias-Variance Tradeoff
- lecture notes pdf
- Lecture 2: 1/19/11
- Fixed Design Regression and Ridge Regression
- lecture notes pdf
- Lecture 3: 1/24/11
- Ridge Regression and PCA
- lecture notes pdf
- Lecture 4: 1/26/11
- The Central Limit Theorem; Large Deviations; and Rate Functions
- lecture notes pdf
- Lecture 5: 1/31/11
- The Moment Method; Convex Duality; and Large/Medium/Small Deviations
- lecture notes pdf
- Lecture 6: 2/2/11
- Hoeffding, Chernoff, Bennett, and Bernstein Bounds
- lecture notes pdf
- Lecture 7: 2/7/11
- Feature Selection, Empirical Risk Minimization, and The Orthogonal Case
- lecture notes pdf
- Lecture 8: 2/9/11
- Feature Selection and Chi^2 Tail Bounds
- lecture notes pdf
- Lecture 9: 2/14/11
- Risk vs. Risk: Some terminology differences between Stats and ML
- lecture 0 notes pdf
- Empirical Processes
- lecture 9 notes pdf
- Lecture 10: 2/16/11
- Bracketing Covering Numbers
- lecture 10 notes pdf
- Lecture 11: 2/21/11
- Symmetrization and Rademacher Averages
- lecture 11 notes pdf
- Lecture 12: 2/23/11
- Rademacher Composition and Linear Prediction
- lecture 12 notes pdf
- Lecture 13: 2/28/11
- Review: Norms and Dual Norms