Toolkit for modern algorithms

Winter, 2024

T Th 11:30-12:50, SMI 211

Instructor: James R. Lee (jrl@cs)

Office hours: Tues, 330-430pm in CSE 574

Teaching assistants:

  • Xun Cao (whdecx@cs), Wed 330-430pm in CSE2 151
  • William Howard-Snyder (howarwil@cs), Mon 9-10am in CSE2 150/Zoom

Course description:

This course provides a rigorous introduction to the principles of modern algorithm design, with a particular focus on the analysis of large, noisy data sets, and the algorithmic principles underlying modern statistics and machine learning. For most topics, there will be an associated assignment, where students will get their hands dirty, experimenting with underlying ideas.

Evaluation is based on nine weekly assignments. No late assignments are allowed, but your lowest score will be dropped.

Lectures

Introduction
Thu, Jan 04
Modern hashing Assignment #1 [gradescope, dropbox for code], due Wed, Jan 17, 11:59pm
Tue, Jan 09
Consistent hashing
video
Thu, Jan 11
Heavy hitters, Count-Min sketch
video

Mathematical analysis of balls in bins (uniform + two choices):

Metric data Assignment #2 [gradescope, dropbox for code], due Wed, Jan 24th, 11:59pm
Tue, Jan 16
Similarity search
video

Intrisinc dimensionality:

k-d tree animation:

Thu, Jan 18
Curse of dimensionality
video

We discussed Euclidean dimension reduction based on linear maps. But there are some data sets with low-dimensional structure (for instance, points coming from a manifold) where one wants to do non-linear dimension reduction.

Learning theory Assignment #3 [gradescope, dropbox for code], due Wed, Jan 31, 11:59pm
Mon, Jan 22
Generalization
video
Thu, Jan 25
Regularization
video
Linear algebra Assignment #4 [gradescope, dropbox for code], due Wed, Feb 7, 11:59pm
Tue, Jan 30
Principal component analysis (PCA)
video

Thu, Feb 01
Computing the principal components
video
Low-rank approximation Assignment #5 [gradescope, dropbox for code], due Wed, Feb 14, 11:59pm
Tue, Feb 06
Singular value decompositions
video
Fri, Feb 09
Tensors and low-rank tensor recovery
video
Spectral graph theory Assignment #6 [gradescope, dropbox for code], due Wed, Feb 21, 11:59pm
Mon, Feb 12
Graphs as matrices
video
Thu, Feb 15
Spectral clustering
video
Mathematical programming Assignment #7 [gradescope, dropbox for code], due Wed, Feb 28, 11:59pm
Tue, Feb 20
Compressive sensing
video
Thu, Feb 22
Linear programming
video
Sampling and estimation Assignment #8 [dropbox for code], due Wed, Mar 6, 11:59pm
Tue, Feb 27
Reservoir sampling
video
Thu, Feb 29
Markov chains
video
Blockchains Assignment #9 [gradescope, dropbox for code], due Wed, Mar 13, 11:59pm
Tue, Mar 05
Distributed consensus and blockchains
video
Thu, Mar 07
Bitcoin and proof of work
video

Completing and submitting assignments

Assignments are released on Tuesdays, and everything (code + written answers) is due at 11:59pm on the following Wednesday. Each assignment is based on the week's lecture and reading material, and can be done alone, or with one other student enrolled in the course. In the latter case, only one student should submit files.

Written part
For the "written" part, you will submit your answers via Gradescope. If you are enrolled in the class, then during the first week, you should receive an email about your enrollment.

You should typeset your homeworks using LaTeX or some other system that supports both code snippets and mathematical formulas.

Programming part
For the programming part, you are welcome to use whatever programming language you choose, though you are encouraged to use Python together with the Numpy and Pyplot packages. You will turn in all your code files using a Canvas dropbox. These should be submitted as a single zip file. If you do not work with a partner, please name the file lastname.zip where "lastname" is replaced by your last name. If you work with a partner (recall that only one person should submit), please use the naming convention: lastname1-lastname2.zip. You are allowed to submit multiple times. New submissions will replace the old submission.

Note that all your code files should be submitted in this way; some of your code will also be required in the written part, as specified in the assignment.

Prerequisites

This should be considered an advanced undergraduate course.

The formal requirements are CSE 311 and 312 (or equivalent background), but the course is intended for senior CSE students. Students should be comfortable with data structures, probability theory, and linear algebra. I will assume you know what a hash table is, what a balanced search tree is, what a cache does, etc., and that you are capable of translating high-level ideas into code. I will assume that you know basic probability (like linearity of expectation), that you know how to multiply a matrix times a vector, that you know what an eigenvalue is. (Or that you can learn very fast on the fly.)

Collaboration policy

You are allowed to refer to your course notes, as well any books, lecture notes, papers, and additional resources that are directly linked from the course web page. Remember to cite the source appropriately and always write discussions and homework answers in your own words.

It is also permissible to use general-purpose references for the programming languages and libraries that you employ.

You may discuss assignments at a high level with other groups in the class, as well as with the course staff during office hours, or via EdTech.