Office hours: Tues, 330430pm in CSE 574
This course provides a rigorous introduction to the principles of modern algorithm design, with a particular focus on the analysis of large, noisy data sets, and the algorithmic principles underlying modern statistics and machine learning. For most topics, there will be an associated assignment, where students will get their hands dirty, experimenting with underlying ideas.
Evaluation is based on nine weekly assignments. No late assignments are allowed, but your lowest score will be dropped.
Introduction  
Thu, Jan 04  
Modern hashing  Assignment #1 [gradescope, dropbox for code], due Wed, Jan 17, 11:59pm 
Tue, Jan 09
Consistent hashing video 

Thu, Jan 11
Heavy hitters, CountMin sketch video 
Mathematical analysis of balls in bins (uniform + two choices):

Metric data  Assignment #2 [gradescope, dropbox for code], due Wed, Jan 24th, 11:59pm 
Tue, Jan 16
Similarity search video 
Intrisinc dimensionality:

Thu, Jan 18
Curse of dimensionality video 
We discussed Euclidean dimension reduction based on linear maps. But there are some data sets with lowdimensional structure (for instance, points coming from a manifold) where one wants to do nonlinear dimension reduction. 
Learning theory  Assignment #3 [gradescope, dropbox for code], due Wed, Jan 31, 11:59pm 
Mon, Jan 22
Generalization video 

Thu, Jan 25
Regularization video 

Linear algebra  Assignment #4 [gradescope, dropbox for code], due Wed, Feb 7, 11:59pm 
Tue, Jan 30
Principal component analysis (PCA) video 

Thu, Feb 01
Computing the principal components video 

Lowrank approximation  Assignment #5 [gradescope, dropbox for code], due Wed, Feb 14, 11:59pm 
Tue, Feb 06
Singular value decompositions video 

Fri, Feb 09
Tensors and lowrank tensor recovery video 

Spectral graph theory  Assignment #6 [gradescope, dropbox for code], due Wed, Feb 21, 11:59pm 
Mon, Feb 12
Graphs as matrices video 

Thu, Feb 15
Spectral clustering video 

Mathematical programming  Assignment #7 [gradescope, dropbox for code], due Wed, Feb 28, 11:59pm 
Tue, Feb 20
Compressive sensing video 

Thu, Feb 22
Linear programming video 

Sampling and estimation  Assignment #8 [dropbox for code], due Wed, Mar 6, 11:59pm 
Tue, Feb 27
Reservoir sampling video 

Thu, Feb 29
Markov chains video 

Blockchains  Assignment #9 [gradescope, dropbox for code], due Wed, Mar 13, 11:59pm 
Tue, Mar 05
Distributed consensus and blockchains video 

Thu, Mar 07
Bitcoin and proof of work video 

You should typeset your homeworks using LaTeX or some other system that supports both code snippets and mathematical formulas.
Note that all your code files should be submitted in this way; some of your code will also be required in the written part, as specified in the assignment.
The formal requirements are CSE 311 and 312 (or equivalent background), but the course is intended for senior CSE students. Students should be comfortable with data structures, probability theory, and linear algebra. I will assume you know what a hash table is, what a balanced search tree is, what a cache does, etc., and that you are capable of translating highlevel ideas into code. I will assume that you know basic probability (like linearity of expectation), that you know how to multiply a matrix times a vector, that you know what an eigenvalue is. (Or that you can learn very fast on the fly.)
You are allowed to refer to your course notes, as well any books, lecture notes, papers, and additional resources that are directly linked from the course web page. Remember to cite the source appropriately and always write discussions and homework answers in your own words.
It is also permissible to use generalpurpose references for the programming languages and libraries that you employ.
You may discuss assignments at a high level with other groups in the class, as well as with the course staff during office hours, or via EdTech.