Office hours: Tues, 330-430pm in CSE 574
This course provides a rigorous introduction to the principles of modern algorithm design, with a particular focus on the analysis of large, noisy data sets, and the algorithmic principles underlying modern statistics and machine learning. For most topics, there will be an associated assignment, where students will get their hands dirty, experimenting with underlying ideas.
Evaluation is based on nine weekly assignments. No late assignments are allowed, but your lowest score will be dropped.
Introduction | |
Thu, Jan 04 | |
Modern hashing | Assignment #1 [gradescope, dropbox for code], due Wed, Jan 17, 11:59pm |
Tue, Jan 09
Consistent hashing video |
|
Thu, Jan 11
Heavy hitters, Count-Min sketch video |
Mathematical analysis of balls in bins (uniform + two choices):
|
Metric data | Assignment #2 [gradescope, dropbox for code], due Wed, Jan 24th, 11:59pm |
Tue, Jan 16
Similarity search video |
Intrisinc dimensionality:
|
Thu, Jan 18
Curse of dimensionality video |
We discussed Euclidean dimension reduction based on linear maps. But there are some data sets with low-dimensional structure (for instance, points coming from a manifold) where one wants to do non-linear dimension reduction. |
Learning theory | Assignment #3 [gradescope, dropbox for code], due Wed, Jan 31, 11:59pm |
Mon, Jan 22
Generalization video |
|
Thu, Jan 25
Regularization video |
|
Linear algebra | Assignment #4 [gradescope, dropbox for code], due Wed, Feb 7, 11:59pm |
Tue, Jan 30
Principal component analysis (PCA) video |
|
Thu, Feb 01
Computing the principal components video |
|
Low-rank approximation | Assignment #5 [gradescope, dropbox for code], due Wed, Feb 14, 11:59pm |
Tue, Feb 06
Singular value decompositions video |
|
Fri, Feb 09
Tensors and low-rank tensor recovery video |
|
Spectral graph theory | Assignment #6 [gradescope, dropbox for code], due Wed, Feb 21, 11:59pm |
Mon, Feb 12
Graphs as matrices video |
|
Thu, Feb 15
Spectral clustering video |
|
Mathematical programming | Assignment #7 [gradescope, dropbox for code], due Wed, Feb 28, 11:59pm |
Tue, Feb 20
Compressive sensing video |
|
Thu, Feb 22
Linear programming video |
|
Sampling and estimation | Assignment #8 [dropbox for code], due Wed, Mar 6, 11:59pm |
Tue, Feb 27
Reservoir sampling video |
|
Thu, Feb 29
Markov chains video |
|
Blockchains | Assignment #9 [gradescope, dropbox for code], due Wed, Mar 13, 11:59pm |
Tue, Mar 05
Distributed consensus and blockchains video |
|
Thu, Mar 07
Bitcoin and proof of work video |
|
Completing and submitting assignments
Assignments are released on Tuesdays, and everything (code + written answers)
is due at 11:59pm on the following Wednesday.
Each assignment is based on the week's lecture and reading material, and can be done alone, or with one
other student enrolled in the course. In the latter case, only one student should
submit files.
You should typeset your homeworks using LaTeX or some other system that supports both code snippets and mathematical formulas.
Note that all your code files should be submitted in this way; some of your code will also be required in the written part, as specified in the assignment.
The formal requirements are CSE 311 and 312 (or equivalent background), but the course is intended for senior CSE students. Students should be comfortable with data structures, probability theory, and linear algebra. I will assume you know what a hash table is, what a balanced search tree is, what a cache does, etc., and that you are capable of translating high-level ideas into code. I will assume that you know basic probability (like linearity of expectation), that you know how to multiply a matrix times a vector, that you know what an eigenvalue is. (Or that you can learn very fast on the fly.)
You are allowed to refer to your course notes, as well any books, lecture notes, papers, and additional resources that are directly linked from the course web page. Remember to cite the source appropriately and always write discussions and homework answers in your own words.
It is also permissible to use general-purpose references for the programming languages and libraries that you employ.
You may discuss assignments at a high level with other groups in the class, as well as with the course staff during office hours, or via EdTech.