The greedy algorithm for monotone submodular maximization

First, some context:

Submodular Functions

Submodular functions capture the intuitive diminishing returns property. Formally, a submodular function $f : 2^V \to \mathbb{R}$ assigns each subset $S \subseteq V$ a utility value $f(S)$ such that

$$f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)$$

for any $A \subseteq B \subseteq V$ and $x \in V \setminus B$. We call $V$ the ground set.

Note that this definition just means that adding an element $x$ to a subset $A$ of a set $B$ yields at least as much value as adding $x$ to $B$ itself. In other words, the marginal gain of adding $x$ to $A$ is greater than or equal to the marginal gain of adding $x$ to $B$. The notation for the marginal gain is:

$$\Delta(x \mid A) := f(A \cup \{x\}) - f(A)$$

We can also define the marginal gain for a set, which is essentially the same thing:

$$\Delta(B \mid A) := f(A \cup B) - f(A)$$

We say that a submodular function is monotone if for any $A \subseteq B \subseteq V$ we have $f(A) \le f(B)$. Intuitively, this means that adding more elements to a set cannot decrease its value.

An example

Let $f(S) = \left|\bigcup_{X \in S} X\right|$ be the coverage function, which counts the number of elements covered by the sets in $S$. We have the ground set $V = \{X_1, X_2, X_3\}$ with $X_1 = \{a, b\}$, $X_2 = \{b, c\}$ and $X_3 = \{c\}$, and we choose $A = \{X_1\}$ and $B = \{X_1, X_2\}$.

Note that $f(A) = 2$ and $f(B) = 3$. The marginal gain of item $X_3$ is:

$$\Delta(X_3 \mid A) = f(A \cup \{X_3\}) - f(A) = 3 - 2 = 1, \qquad \Delta(X_3 \mid B) = f(B \cup \{X_3\}) - f(B) = 3 - 3 = 0$$

Note that for any choice of $A \subseteq B$ and $x \in V \setminus B$, we get $\Delta(x \mid A) \ge \Delta(x \mid B)$ and $f(A) \le f(B)$. This is because $f$ is submodular and monotone.
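These properties are easy to verify exhaustively on a small instance. The script below checks monotonicity and diminishing returns for a coverage function over a tiny, made-up ground set (the sets `X1`, `X2`, `X3` are illustrative):

```python
from itertools import chain, combinations

def f(S):
    """Coverage function: size of the union of the chosen sets."""
    return len(set().union(*S)) if S else 0

def gain(x, S):
    """Marginal gain of adding x to S."""
    return f(S | {x}) - f(S)

def powerset(items):
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

X1, X2, X3 = frozenset("ab"), frozenset("bc"), frozenset("c")
V = [X1, X2, X3]

# Check monotonicity and the diminishing-returns property
# for every A <= B and every x outside B.
for B in map(set, powerset(V)):
    for A in map(set, powerset(B)):
        assert f(A) <= f(B)                      # monotone
        for x in V:
            if x not in B:
                assert gain(x, A) >= gain(x, B)  # submodular
print("f is monotone and submodular on this ground set")
```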

Maximization of monotone submodular functions

Since the functions we're dealing with are monotone, the set with maximum value is always the ground set $V$ itself. The problem becomes interesting once we impose a cardinality constraint - that is, we look for the set of size at most $k$ that maximizes the utility. Formally, we want to find

$$S^* = \underset{S \subseteq V,\, |S| \le k}{\arg\max}\; f(S)$$

Unfortunately, this problem is NP-hard. Fortunately, a simple greedy algorithm provides a solution with a nice approximation guarantee, which we will prove soon. The algorithm starts with the empty set $S_0 = \emptyset$, and then repeats the following step for $i = 0, \ldots, k - 1$:

$$S_{i+1} = S_i \cup \left\{\underset{x \in V \setminus S_i}{\arg\max}\; \Delta(x \mid S_i)\right\}$$

Note that $S_0 \subseteq S_1 \subseteq \cdots \subseteq S_k$, so the greedy algorithm builds a chain of sets.
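The greedy step can be sketched in a few lines of Python; `greedy`, `cover` and the toy data below are illustrative names, not a reference implementation:

```python
def greedy(f, V, k):
    """Greedily build a set S with |S| <= k by maximizing marginal gains."""
    S = set()
    for _ in range(min(k, len(V))):
        # Pick the element with the largest marginal gain Delta(x | S).
        best = max((x for x in V if x not in S), key=lambda x: f(S | {x}) - f(S))
        S.add(best)
    return S

# Toy coverage instance (hypothetical data, for illustration only).
sets = {"X1": {"a", "b"}, "X2": {"b", "c"}, "X3": {"c"}}
cover = lambda S: len(set().union(*(sets[name] for name in S))) if S else 0

print(greedy(cover, list(sets), 2))
```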

Greedy algorithm guarantee and proof

Let:

  • $S_0 \subseteq S_1 \subseteq \cdots \subseteq S_k$ be the chain formed by the greedy algorithm, as defined above
  • $S^* = \{x_1^*, \ldots, x_k^*\}$ be the optimal solution, in an arbitrary order
  • $f$ be a monotone submodular function, with $f \ge 0$ (Update on 04/25/2019: I thought this was w.l.o.g., but Andrey Kolobov pointed out that we actually need $f$ to be non-negative)
  • $OPT := f(S^*)$, the value of the optimal solution.

We will prove that

$$f(S_k) \ge \left(1 - \frac{1}{e}\right) OPT$$

Note that $1 - 1/e \approx 0.63$. We now proceed to the proof.

For all $i < k$, we have:

$$\begin{aligned}
OPT = f(S^*) &\le f(S^* \cup S_i) && \text{(monotonicity)} \\
&= f(S_i) + \sum_{j=1}^{k} \Delta\!\left(x_j^* \mid S_i \cup \{x_1^*, \ldots, x_{j-1}^*\}\right) && \text{(telescoping sum)} \\
&\le f(S_i) + \sum_{j=1}^{k} \Delta(x_j^* \mid S_i) && \text{(submodularity)} \\
&\le f(S_i) + \sum_{j=1}^{k} \left(f(S_{i+1}) - f(S_i)\right) && \text{(greedy choice of next element)} \\
&= f(S_i) + k \left(f(S_{i+1}) - f(S_i)\right)
\end{aligned}$$

Rearranging the terms, we have proved that

$$f(S_{i+1}) - f(S_i) \ge \frac{1}{k}\left(OPT - f(S_i)\right)$$

Now we define $\delta_i := OPT - f(S_i)$. This implies

$$\delta_i - \delta_{i+1} = f(S_{i+1}) - f(S_i)$$

Plugging this into our previous equation, we have:

$$\delta_i - \delta_{i+1} \ge \frac{1}{k}\,\delta_i$$

In other words, we have proved that the element added at iteration $i$ by the greedy algorithm reduces the gap to the optimal solution by a significant amount - by at least a $\frac{1}{k}$ fraction of the current gap $\delta_i$. Another way to write the same equation is

$$\delta_{i+1} \le \left(1 - \frac{1}{k}\right)\delta_i$$

If we recursively apply this definition, we have that

$$\delta_k \le \left(1 - \frac{1}{k}\right)^k \delta_0$$

Now, $\delta_0 = OPT - f(\emptyset) \le OPT$, since $f$ is non-negative. Thus, using the well-known bound $1 - x \le e^{-x}$ for all $x \in \mathbb{R}$, we have that

$$\delta_k \le \left(1 - \frac{1}{k}\right)^k OPT \le e^{-1}\, OPT$$
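The inequality $(1 - 1/k)^k \le e^{-1}$ used in this step is easy to check numerically:

```python
import math

# Check (1 - 1/k)^k <= 1/e for a few values of k; the left-hand side
# increases toward 1/e as k grows, so the bound is tight in the limit.
for k in (1, 2, 5, 10, 100, 1000):
    assert (1 - 1 / k) ** k <= math.exp(-1)
print("bound holds for the sampled k")
```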

We now plug the definition of $\delta_k$ back in, to get

$$OPT - f(S_k) \le e^{-1}\, OPT$$

Or equivalently:

$$f(S_k) \ge \left(1 - \frac{1}{e}\right) OPT$$

which concludes our proof.

Lazy greedy

The runtime of the greedy algorithm is $O(nk)$ function evaluations, where $n = |V|$, since at each of the $k$ steps we have to find the element from the ground set that maximizes the marginal gain. There is a trick we can use to make the running time faster in practice.

The trick involves using a max-heap ($O(1)$ lookup and $O(\log n)$ insertion) to keep an upper bound on the gain of each item. This upper bound comes straight from submodularity: since $S_j \subseteq S_i$ for $j \le i$, we have

$$\Delta(x \mid S_i) \le \Delta(x \mid S_j) \quad \text{for all } j \le i$$

Using this, at iteration $i$ we evaluate $\Delta(x \mid S_i)$ for the elements $x$ of the ground set in the max-heap order. As soon as some $\Delta(x \mid S_i)$ is greater than or equal to the upper bound on the top of the heap, we do not need to evaluate any more elements - we select $x$. If we evaluate an element at the top of the heap and it does not satisfy this condition, we just insert it again with $\Delta(x \mid S_i)$ as the new upper bound.

While the worst case is the same, in practice this trick usually yields enormous speedups over the standard greedy procedure.
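A minimal sketch of the lazy greedy procedure, using Python's `heapq` (a min-heap, so upper bounds are stored negated); the function and data names are illustrative:

```python
import heapq

def lazy_greedy(f, V, k):
    """Lazy greedy: pop the element with the largest stale upper bound,
    re-evaluate its gain, and select it once no other bound can beat it."""
    S, fS = set(), 0.0
    # Initial upper bounds: Delta(x | empty set) = f({x}) - f({}).
    heap = [(-(f({x}) - f(set())), x) for x in V]
    heapq.heapify(heap)
    for _ in range(min(k, len(heap))):
        while True:
            _, x = heapq.heappop(heap)
            gain = f(S | {x}) - fS  # fresh evaluation of Delta(x | S)
            if not heap or gain >= -heap[0][0]:
                # No remaining element can beat x: every other stale
                # upper bound is already below x's fresh gain.
                S.add(x)
                fS += gain
                break
            heapq.heappush(heap, (-gain, x))  # reinsert with updated bound
    return S

# Toy coverage instance (hypothetical data, for illustration only).
sets = {"X1": {"a", "b"}, "X2": {"b", "c"}, "X3": {"c"}}
cover = lambda S: len(set().union(*(sets[name] for name in S))) if S else 0

print(lazy_greedy(cover, list(sets), 2))
```

Note that each selected element is popped for good, while skipped elements are reinserted with their tighter bound, so later iterations often touch only a handful of candidates.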

Written on July 20, 2015