Toolkit for modern algorithms

Winter, 2024

T Th 11:30-12:50, SMI 211

Instructor: James R. Lee (jrl@cs)

Office hours: Tues, 330-430pm in CSE 574

Teaching assistants:

  • Xun Cao (whdecx@cs), Wed 330-430pm in CSE2 151
  • William Howard-Snyder (howarwil@cs), Mon 9-10am in CSE2 150/Zoom

Course description:

This course provides a rigorous introduction to the principles of modern algorithm design, with a particular focus on the analysis of large, noisy data sets, and the algorithmic principles underlying modern statistics and machine learning. For most topics, there will be an associated assignment, where students will get their hands dirty, experimenting with underlying ideas.

Evaluation is based on nine weekly assignments. No late assignments are allowed, but your lowest score will be dropped.


Thu, Jan 04
Modern hashing Assignment #1 [gradescope, dropbox for code], due Wed, Jan 17, 11:59pm
Tue, Jan 09
Consistent hashing
Thu, Jan 11
Heavy hitters, Count-Min sketch

Mathematical analysis of balls in bins (uniform + two choices):

Metric data Assignment #2 [gradescope, dropbox for code], due Wed, Jan 24th, 11:59pm
Tue, Jan 16
Similarity search

Intrisinc dimensionality:

k-d tree animation:

Thu, Jan 18
Curse of dimensionality

We discussed Euclidean dimension reduction based on linear maps. But there are some data sets with low-dimensional structure (for instance, points coming from a manifold) where one wants to do non-linear dimension reduction.

Learning theory Assignment #3 [gradescope, dropbox for code], due Wed, Jan 31, 11:59pm
Mon, Jan 22
Thu, Jan 25
Linear algebra Assignment #4 [gradescope, dropbox for code], due Wed, Feb 7, 11:59pm
Tue, Jan 30
Principal component analysis (PCA)

Thu, Feb 01
Computing the principal components
Low-rank approximation Assignment #5 [gradescope, dropbox for code], due Wed, Feb 14, 11:59pm
Tue, Feb 06
Singular value decompositions
Fri, Feb 09
Tensors and low-rank tensor recovery
Spectral graph theory Assignment #6 [gradescope, dropbox for code], due Wed, Feb 21, 11:59pm
Mon, Feb 12
Graphs as matrices
Thu, Feb 15
Spectral clustering
Mathematical programming Assignment #7 [gradescope, dropbox for code], due Wed, Feb 28, 11:59pm
Tue, Feb 20
Compressive sensing
Thu, Feb 22
Linear programming
Sampling and estimation Assignment #8 [dropbox for code], due Wed, Mar 6, 11:59pm
Tue, Feb 27
Reservoir sampling
Thu, Feb 29
Markov chains
Blockchains Assignment #9 [gradescope, dropbox for code], due Wed, Mar 13, 11:59pm
Tue, Mar 05
Distributed consensus and blockchains
Thu, Mar 07
Bitcoin and proof of work

Completing and submitting assignments

Assignments are released on Tuesdays, and everything (code + written answers) is due at 11:59pm on the following Wednesday. Each assignment is based on the week's lecture and reading material, and can be done alone, or with one other student enrolled in the course. In the latter case, only one student should submit files.

Written part
For the "written" part, you will submit your answers via Gradescope. If you are enrolled in the class, then during the first week, you should receive an email about your enrollment.

You should typeset your homeworks using LaTeX or some other system that supports both code snippets and mathematical formulas.

Programming part
For the programming part, you are welcome to use whatever programming language you choose, though you are encouraged to use Python together with the Numpy and Pyplot packages. You will turn in all your code files using a Canvas dropbox. These should be submitted as a single zip file. If you do not work with a partner, please name the file where "lastname" is replaced by your last name. If you work with a partner (recall that only one person should submit), please use the naming convention: You are allowed to submit multiple times. New submissions will replace the old submission.

Note that all your code files should be submitted in this way; some of your code will also be required in the written part, as specified in the assignment.


This should be considered an advanced undergraduate course.

The formal requirements are CSE 311 and 312 (or equivalent background), but the course is intended for senior CSE students. Students should be comfortable with data structures, probability theory, and linear algebra. I will assume you know what a hash table is, what a balanced search tree is, what a cache does, etc., and that you are capable of translating high-level ideas into code. I will assume that you know basic probability (like linearity of expectation), that you know how to multiply a matrix times a vector, that you know what an eigenvalue is. (Or that you can learn very fast on the fly.)

Collaboration policy

You are allowed to refer to your course notes, as well any books, lecture notes, papers, and additional resources that are directly linked from the course web page. Remember to cite the source appropriately and always write discussions and homework answers in your own words.

It is also permissible to use general-purpose references for the programming languages and libraries that you employ.

You may discuss assignments at a high level with other groups in the class, as well as with the course staff during office hours, or via EdTech.