CSE 535: Theory of Optimization and Continuous Algorithms

Autumn 2021

Tues, Thurs 11:30am-12:50pm in CSE2 G04

Instructor: James R. Lee

Office hours: Tues 3-4pm in CSE 574
(or Zoom by appointment)

Teaching assistant(s):

Course email list [archives]

Class discussion: CSE 535 Piazza

Course evaluation: 100% Homework

Course description:

The design of algorithms is traditionally a discrete endeavor. However, many advances have come from a continuous viewpoint. Typically, a continuous process, deterministic or randomized, is designed (or shown) to have desirable properties, such as approaching an optimal solution or a desired distribution, and an algorithm is derived from this by appropriate discretization. In interesting and general settings, the current fastest methods are a consequence of this perspective. We will discuss several examples of algorithms for high-dimensional optimization and sampling, and use them to understand the following concepts in detail:

  • Elimination
  • Reduction
  • Geometrization
  • Acceleration
  • Sparsification
  • Decomposition
  • Discretization


Thu, Sep 30
Intro to convexity and optimization
LV Book: 1.1-1.5
Homework #0
  • Convex sets, convex functions, convex optimization
  • If $f : \mathbb{R}^n \to \mathbb{R}$ is convex and differentiable, then local optimality implies global optimality.
  • This holds because the following are equivalent (in particular, $\nabla f(x) = 0$ then gives $f(y) \geq f(x)$ for all $y$):

    • $f : \mathbb{R}^n \to \mathbb{R}$ convex
    • $f(y) \geq f(x) + \langle \nabla f(x), y-x\rangle$ for all $x,y \in \mathbb{R}^n$.
  • If $f$ is $\mathcal{C}^2$ (i.e., it has continuous second-order partial derivatives), then the Hessian matrix $\nabla^2 f(x)$ is defined for every $x \in \mathbb{R}^n$, with entries $(\nabla^2 f(x))_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x)$.

  • To check convexity, often it is useful to employ the following:

    Theorem: If $f : \mathbb{R}^n \to \mathbb{R}$ is $\mathcal{C}^2$, then $f$ is convex if and only if the Hessian $\nabla^2 f(x)$ is positive semidefinite for all $x \in \mathbb{R}^n$.

  • As a simple example, consider the bivariate function $f(u,v) = u \log(u/v)$ defined on the positive orthant $u,v > 0$. Then we have
    $\nabla^2 f(u,v) = \begin{pmatrix} 1/u & -1/v \\ -1/v & u/v^2 \end{pmatrix}$.

    The trace of this matrix is $1/u + u/v^2 > 0$ for $u,v > 0$. The determinant of this matrix is $1/v^2 - 1/v^2 = 0$ for $u,v > 0$. Therefore if $\lambda_1,\lambda_2$ are the eigenvalues of $\nabla^2 f(u,v)$, it holds that $\lambda_1 + \lambda_2 > 0$ and $\lambda_1 \lambda_2 = 0$. The only possibility is that both eigenvalues are nonnegative, meaning that $\nabla^2 f(u,v)$ is positive semidefinite on the positive orthant.
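As a numerical sanity check (a sketch, not part of the lecture notes), one can sample points of the positive orthant and verify positive semidefiniteness directly. The function $f(u,v) = u\log(u/v)$ used below is an illustrative choice whose Hessian has determinant zero and positive trace on the positive orthant, consistent with the computation above:

```python
import numpy as np

def hessian(u, v):
    # Hessian of f(u, v) = u * log(u / v) on the positive orthant
    return np.array([[1.0 / u, -1.0 / v],
                     [-1.0 / v, u / v**2]])

rng = np.random.default_rng(1)
for _ in range(1000):
    u, v = rng.uniform(0.1, 10.0, size=2)
    # trace > 0 and det = 0 force both eigenvalues to be nonnegative
    assert np.all(np.linalg.eigvalsh(hessian(u, v)) >= -1e-9)
print("Hessian is PSD at all sampled points of the positive orthant")
```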

Tue, Oct 05
Gradient descent, I
LV Book: 2.1-2.3
  • Gradient descent: given $f : \mathbb{R}^n \to \mathbb{R}$, parameters $\eta, \epsilon > 0$, and an initial point $x^{(0)} \in \mathbb{R}^n$, repeat for $k = 0, 1, 2, \ldots$:

    • If $\|\nabla f(x^{(k)})\| \leq \epsilon$, stop and output $x^{(k)}$.
    • Else update $x^{(k+1)} = x^{(k)} - \eta\, \nabla f(x^{(k)})$.
  • A function $f$ has $L$-Lipschitz gradients if $\|\nabla f(x) - \nabla f(y)\| \leq L\, \|x - y\|$ for all $x, y \in \mathbb{R}^n$.

  • The following are equivalent for $f \in \mathcal{C}^2$:

    • $f$ has $L$-Lipschitz gradients.
    • $-L\,I \preceq \nabla^2 f(x) \preceq L\,I$ for all $x \in \mathbb{R}^n$.
    • For all $x, y \in \mathbb{R}^n$,
      $|f(y) - f(x) - \langle \nabla f(x), y - x\rangle| \leq \frac{L}{2} \|y - x\|^2$.

  • Theorem: If $f$ has $L$-Lipschitz gradients, then gradient descent with step size $\eta = 1/L$ stops within $O\!\left(\frac{L\,(f(x^{(0)}) - \inf f)}{\epsilon^2}\right)$ steps.

    Note that in this case we have no guarantee of optimality, only that $\|\nabla f(x^{(k)})\| \leq \epsilon$.

  • Theorem: If $f$ is convex with $L$-Lipschitz gradients, then gradient descent with step size $\eta = 1/L$ satisfies $f(x^{(k)}) - f(x^*) \leq \frac{L\,\|x^{(0)} - x^*\|^2}{2k}$, where $x^*$ is any minimizer of $f$.
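The iteration above can be sketched in a few lines; the quadratic test function below is an illustrative choice (not from the notes):

```python
import numpy as np

def gradient_descent(grad_f, x0, eta, eps, max_iters=100_000):
    """Iterate x_{k+1} = x_k - eta * grad_f(x_k); stop when ||grad_f(x_k)|| <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:   # stopping rule from the lecture
            return x
        x = x - eta * g                # gradient step
    return x

# Example: f(x) = 0.5 ||x||^2 has grad f(x) = x, L = 1, minimizer x* = 0.
# (With eta = 1/L = 1 the method would jump straight to 0; eta = 0.5
# shows the geometric decay instead.)
x_out = gradient_descent(lambda x: x, x0=[4.0, -2.0], eta=0.5, eps=1e-8)
print(x_out)  # near the minimizer [0, 0]
```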

Thu, Oct 07
Gradient descent, II
LV Book: 2.3-2.7
Homework #1
  • A function $f : \mathbb{R}^n \to \mathbb{R}$ is $\alpha$-strongly convex if for any $x, y \in \mathbb{R}^n$,
    $f(y) \geq f(x) + \langle \nabla f(x), y - x\rangle + \frac{\alpha}{2} \|y - x\|^2$.

  • For $f \in \mathcal{C}^2$ and $\alpha > 0$, the following are equivalent:

    • $f$ is $\alpha$-strongly convex.

    • For all $x, y \in \mathbb{R}^n$, it holds that
      $\langle \nabla f(x) - \nabla f(y), x - y\rangle \geq \alpha\, \|x - y\|^2$.

    • $\nabla^2 f(x) \succeq \alpha\, I$ for all $x \in \mathbb{R}^n$.

  • Theorem: Suppose that $f$ is $\alpha$-strongly convex and has $L$-Lipschitz gradients. Then gradient descent with step size $\eta = 1/L$ satisfies $\|x^{(k)} - x^*\|^2 \leq \left(1 - \frac{\alpha}{L}\right)^k \|x^{(0)} - x^*\|^2$.

  • Gradient descent is a discretization of the continuous algorithm (gradient flow):
    $\frac{d}{dt} x(t) = -\nabla f(x(t))$.

    Note that:
    $\frac{d}{dt} f(x(t)) = \langle \nabla f(x(t)), x'(t)\rangle = -\|\nabla f(x(t))\|^2 \leq 0$.

    And if $f$ is $\alpha$-strongly convex, then
    $f(x(t)) - f(x^*) \leq e^{-2\alpha t} \left(f(x(0)) - f(x^*)\right)$.

  • Line search and the Wolfe conditions (Sec 2.5)
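A small numerical sketch of two facts from this lecture: gradient descent on a strongly convex quadratic contracts at rate $(1 - \alpha/L)$ per step, and the method is exactly the explicit-Euler discretization of the gradient flow. The quadratic and parameters below are illustrative choices, not from the notes:

```python
import numpy as np

# f(x) = 0.5 x^T A x with spectrum in [alpha, L]: alpha-strongly convex,
# L-Lipschitz gradients, minimizer x* = 0, and grad f(x) = A x.
alpha, L = 1.0, 10.0
A = np.diag([alpha, L])

# Gradient descent with eta = 1/L is the explicit Euler discretization
# of the gradient flow dx/dt = -grad f(x) with time step eta.
eta = 1.0 / L
x = np.array([1.0, 1.0])
d0 = np.linalg.norm(x)
for k in range(1, 51):
    x = x - eta * (A @ x)  # Euler step = gradient descent step
    # Theorem: ||x_k - x*||^2 <= (1 - alpha/L)^k ||x_0 - x*||^2
    assert np.linalg.norm(x)**2 <= (1 - alpha / L)**k * d0**2 + 1e-12
print("contraction verified; final distance:", np.linalg.norm(x))
```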

Tue, Oct 12
Cutting plane methods, the Ellipsoid algorithm
LV Book: 3.1-3.2
  • The hyperplane separation theorem
  • Cutting plane framework
  • Affine geometry of ellipsoids
  • Ellipsoid algorithm and analysis
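The update at the heart of the algorithm can be sketched as follows. These are the standard central-cut formulas under the convention $E = \{x : (x-c)^\top Q^{-1} (x-c) \leq 1\}$, written here as an illustration rather than taken from the notes; each step shrinks the volume by a factor of at least $e^{-1/(2(n+1))}$:

```python
import numpy as np

def ellipsoid_update(c, Q, g, n):
    """One central-cut ellipsoid step.

    E = {x : (x - c)^T Q^{-1} (x - c) <= 1}; keep the half-space
    {x : g^T (x - c) <= 0}.  Returns the new center and shape matrix.
    """
    b = Q @ g / np.sqrt(g @ Q @ g)
    c_new = c - b / (n + 1)
    Q_new = (n**2 / (n**2 - 1.0)) * (Q - (2.0 / (n + 1)) * np.outer(b, b))
    return c_new, Q_new

n = 2
c, Q = np.zeros(n), np.eye(n)          # start from the unit ball
g = np.array([1.0, 0.0])               # a separating direction
c_new, Q_new = ellipsoid_update(c, Q, g, n)
# vol(E') / vol(E) = sqrt(det Q' / det Q) <= exp(-1/(2(n+1)))
ratio = np.sqrt(np.linalg.det(Q_new) / np.linalg.det(Q))
assert ratio <= np.exp(-1.0 / (2 * (n + 1)))
print("volume shrinks by factor", ratio)
```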
Thu, Oct 14
Tue, Oct 19
From volume to function value
LV Book: 3.2-3.3
  • Using ellipsoid to minimize non-smooth convex functions
  • Volume shrinkage yields small function value
  • Application to Linear Programming
Thu, Oct 21
Center of gravity
LV Book: 3.4
Homework #2
  • Exact center of gravity converges to an $\varepsilon$-approximate minimum in $O(n \log(1/\varepsilon))$ iterations
  • Analysis using Grünbaum's Theorem
  • Approximate center of gravity suffices: Robust Grünbaum Theorem
  • Logconcave distributions
  • Isotropic distributions
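A one-dimensional instance of Grünbaum's theorem can be checked numerically: for a logconcave density, the mass on either side of the mean is at least $1/e$, and the exponential density is the tight case. This is an illustrative sketch, not part of the notes:

```python
import numpy as np

# Grünbaum in 1-D: for a logconcave density, the mass on either side of
# the mean is at least 1/e.  The exponential density e^{-x} on
# [0, infinity) is the tight case: its mean is 1 and P(X >= 1) = 1/e.
dx = 1e-4
x = np.arange(0.0, 50.0, dx)
density = np.exp(-x)                     # a logconcave density
mass = density.sum() * dx                # total mass (approx. 1)
mean = (x * density).sum() * dx / mass   # centroid (approx. 1)
right = density[x >= mean].sum() * dx / mass
assert right >= 1.0 / np.e - 1e-3        # Grünbaum bound, tight here
print("mass to the right of the mean:", right)
```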
Tue, Oct 26

Thu, Oct 28

Tue, Nov 02

Thu, Nov 04

Tue, Nov 09

Tue, Nov 16

Thu, Nov 18

Tue, Nov 23

Thu, Nov 25

Tue, Nov 30

Thu, Dec 02

Tue, Dec 07

Thu, Dec 09


  • You may discuss problems with your classmates, but when you write down the solutions, you should do so by yourself. You can use the internet and books for reference material but you should cite every source that you consulted (the name of the book or web page suffices). You should also cite any classmates with whom you discussed solutions.
  • Homework should be typeset.
  • In general, late submissions are not accepted.
  • We follow the standard UW CSE policy for academic integrity.
  • The office hours are for general questions about material from the lecture and homework problems.
  • Please refer to university policies regarding disability accommodations or religious accommodations.