The greedy algorithm for monotone submodular maximization
First, some context:
Submodular Functions
Submodular functions have the intuitive diminishing returns property. Formally, a submodular function $f : 2^V \to \mathbb{R}$ assigns a subset $A \subseteq V$ a utility value such that

$$f(A \cup \{v\}) - f(A) \ge f(B \cup \{v\}) - f(B)$$

for any $A \subseteq B \subseteq V$ and $v \in V \setminus B$. We call $V$ the ground set.

Note that this definition just means that adding an element $v$ to a subset $A$ of set $B$ yields at least as much value as if we add $v$ to $B$. In other words, the marginal gain of adding $v$ to $A$ is greater than or equal to the marginal gain of adding $v$ to $B$. The notation for the marginal gain is:

$$\Delta(v \mid A) := f(A \cup \{v\}) - f(A)$$

We can also define the marginal gain for a set, which is basically the same thing:

$$\Delta(B \mid A) := f(A \cup B) - f(A)$$
We say that a submodular function is monotone if for any $A \subseteq B \subseteq V$ we have $f(A) \le f(B)$. Intuitively, this means that adding more elements to a set cannot decrease its value.
An example
Let $f$ be a coverage function: each element of the ground set is itself a set, and $f(A)$ counts the distinct items covered by the members of $A$. We have the ground set $V = \{W_1, W_2, W_3\}$ with $W_1 = \{a, b\}$, $W_2 = \{b, c\}$, $W_3 = \{c, d\}$, and we choose $A = \{W_1\}$ and $B = \{W_1, W_2\}$.

Note that $f(A) = 2$ and $f(B) = 3$. The marginal gain of item $W_3$ is:

$$\Delta(W_3 \mid A) = f(\{W_1, W_3\}) - f(A) = 4 - 2 = 2, \qquad \Delta(W_3 \mid B) = f(\{W_1, W_2, W_3\}) - f(B) = 4 - 3 = 1$$

Note that for any choice of $A \subseteq B$ and $v$, $\Delta(v \mid A) \ge \Delta(v \mid B)$ and $f(A) \le f(B)$. This is because $f$ is submodular and monotone.
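A small coverage function makes this concrete in code. The sketch below (the specific sets and names are illustrative, not from any particular library) checks the monotonicity and diminishing-returns claims directly:

```python
# Coverage function: f(A) counts the distinct items covered by the
# sets whose indices are in A. Coverage functions are a standard
# example of monotone submodular functions.
W = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}

def f(A):
    # Union of the chosen sets; empty A covers nothing.
    return len(set().union(*(W[i] for i in A))) if A else 0

def gain(v, A):
    # Marginal gain Delta(v | A) = f(A + {v}) - f(A).
    return f(A | {v}) - f(A)

A = {1}
B = {1, 2}
print(f(A), f(B))               # monotone: f(A) <= f(B)
print(gain(3, A), gain(3, B))   # diminishing returns: gain shrinks as the set grows
```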
Maximization of monotone submodular functions
Since the functions we're dealing with are monotone, it is obvious that the set with maximum value is always the ground set $V$. However, we impose a cardinality constraint: we look for the set of size at most $k$ that maximizes the utility. Formally,

$$A^* = \underset{A \subseteq V,\ |A| \le k}{\operatorname{argmax}}\ f(A)$$
Unfortunately, this problem is NP-hard. Fortunately, a simple greedy algorithm provides a solution with a nice approximation guarantee, which we will prove soon. The algorithm starts with the empty set $A_0 = \emptyset$, and then repeats the following step for $i = 1, \dots, k$:

$$A_i = A_{i-1} \cup \left\{ \underset{v \in V \setminus A_{i-1}}{\operatorname{argmax}}\ \Delta(v \mid A_{i-1}) \right\}$$

Note that $A_0 \subseteq A_1 \subseteq \dots \subseteq A_k$, and $|A_i| = i$.
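The greedy step can be sketched in a few lines of Python. Here `f` is any monotone submodular set function passed in as a callable; the coverage function used in the demo is illustrative:

```python
def greedy(f, V, k):
    """Greedy maximization of a monotone submodular f under |A| <= k."""
    A = set()
    for _ in range(k):
        # Pick the element with the largest marginal gain Delta(v | A).
        v = max(V - A, key=lambda u: f(A | {u}) - f(A))
        A.add(v)
    return A

# Demo with a small coverage function (sets are illustrative).
W = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}
def cover(A):
    return len(set().union(*(W[i] for i in A))) if A else 0

print(sorted(greedy(cover, set(W), 2)))
```

Each iteration scans all remaining elements once, which is the $O(nk)$ evaluation cost discussed in the lazy greedy section below.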
Greedy algorithm guarantee and proof
Let:
- $A_0 \subseteq A_1 \subseteq \dots \subseteq A_k$ be the chain formed by the greedy algorithm, as defined above
- $A^* = \{v_1^*, \dots, v_k^*\}$ be the optimal solution, in an arbitrary order
- $f$ be a monotone submodular function. Let $f(\emptyset) = 0$ (Update on 04/25/2019: I thought this was w.l.o.g., but Andrey Kolobov pointed out that we actually need $f$ to be non-negative)
- $\mathrm{OPT} := f(A^*)$, the value of the optimal solution.

We will prove that

$$f(A_k) \ge \left(1 - \frac{1}{e}\right) \mathrm{OPT}$$

Note that $1 - \frac{1}{e} \approx 0.63$. We now proceed to the proof.
For all $0 \le i < k$, we have:

$$\begin{aligned}
\mathrm{OPT} = f(A^*) &\le f(A^* \cup A_i) && \text{(monotonicity)} \\
&= f(A_i) + \sum_{j=1}^{k} \Delta\left(v_j^* \mid A_i \cup \{v_1^*, \dots, v_{j-1}^*\}\right) && \text{(telescoping sum)} \\
&\le f(A_i) + \sum_{v \in A^*} \Delta(v \mid A_i) && \text{(submodularity)} \\
&\le f(A_i) + \sum_{v \in A^*} \left( f(A_{i+1}) - f(A_i) \right) && \text{(greedy choice of the next element)} \\
&\le f(A_i) + k \left( f(A_{i+1}) - f(A_i) \right) && \text{(since } |A^*| \le k \text{)}
\end{aligned}$$

Rearranging the terms, we have proved that

$$f(A_{i+1}) - f(A_i) \ge \frac{1}{k} \left( \mathrm{OPT} - f(A_i) \right)$$
Now we define $\delta_i := \mathrm{OPT} - f(A_i)$. This implies

$$\delta_i - \delta_{i+1} = f(A_{i+1}) - f(A_i)$$

Plugging this into our previous equation, we have:

$$\delta_i - \delta_{i+1} \ge \frac{\delta_i}{k}$$

In other words, we have proved that the element added at iteration $i+1$ by the greedy algorithm reduces the gap to the optimal solution by a significant amount: by at least $\frac{1}{k}$ of the current gap $\delta_i$. Another way to write the same equation is

$$\delta_{i+1} \le \left(1 - \frac{1}{k}\right) \delta_i$$
If we recursively apply this inequality, we have that

$$\delta_k \le \left(1 - \frac{1}{k}\right)^k \delta_0$$
Now, $\delta_0 = \mathrm{OPT} - f(A_0) = \mathrm{OPT} - f(\emptyset) = \mathrm{OPT}$. Thus, using the well-known bound $1 - x \le e^{-x}$ for $x \in \mathbb{R}$, we have that

$$\delta_k \le \left(1 - \frac{1}{k}\right)^k \mathrm{OPT} \le e^{-1}\, \mathrm{OPT} = \frac{\mathrm{OPT}}{e}$$
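The bound $\left(1 - \frac{1}{k}\right)^k \le \frac{1}{e}$ is easy to sanity-check numerically; the quantity increases toward $\frac{1}{e}$ from below as $k$ grows:

```python
import math

# (1 - 1/k)^k approaches 1/e from below as k grows.
for k in range(1, 1001):
    assert (1 - 1 / k) ** k <= 1 / math.e

print((1 - 1 / 1000) ** 1000, 1 / math.e)
```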
We now plug the definition of $\delta_k$ back, to get

$$\mathrm{OPT} - f(A_k) \le \frac{\mathrm{OPT}}{e}$$

Or equivalently:

$$f(A_k) \ge \left(1 - \frac{1}{e}\right) \mathrm{OPT}$$

which concludes our proof.
Lazy greedy
The runtime of the greedy algorithm is $O(nk)$ function evaluations, where $n = |V|$, since at each of the $k$ steps we have to find the element from the ground set that maximizes the marginal gain. There is a trick we can do to make the running time faster in practice.
The trick involves using a max-heap ($O(1)$ lookup and $O(\log n)$ insertion) to keep an upper bound on the gain of each item. This upper bound comes straight from submodularity, and is as follows: for $i \ge j$,

$$\Delta(v \mid A_i) \le \Delta(v \mid A_j)$$

That is, marginal gains computed at earlier iterations are upper bounds on the current ones. Using this, at iteration $i$ we evaluate $\Delta(v \mid A_{i-1})$ for the elements $v$ in the ground set in the max-heap order. As soon as the freshly evaluated $\Delta(v \mid A_{i-1})$ is greater than or equal to the upper bound on the top of the heap, we do not need to evaluate any more elements: we select $v$. If we evaluate an element at the top of the heap and it does not satisfy this condition, we just insert it again with $\Delta(v \mid A_{i-1})$ as the new upper bound.
While the worst-case runtime is the same, in practice this trick usually yields enormous speedups over the standard greedy procedure.
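Lazy greedy can be sketched with the standard-library `heapq` (a min-heap, so the bounds are stored negated); `f` is any monotone submodular callable, and the coverage function in the demo is illustrative:

```python
import heapq

def lazy_greedy(f, V, k):
    """Lazy greedy maximization of a monotone submodular f under |A| <= k."""
    A = set()
    fA = f(A)
    # Heap of (negated) upper bounds on each element's marginal gain,
    # initialized with the exact gains Delta(v | {}).
    heap = [(-(f({v}) - fA), v) for v in V]
    heapq.heapify(heap)
    for _ in range(k):
        while True:
            _, v = heapq.heappop(heap)
            g = f(A | {v}) - fA  # fresh marginal gain of v
            if not heap or g >= -heap[0][0]:
                # g beats every other element's upper bound: select v.
                A.add(v)
                fA += g
                break
            # Stale bound: reinsert v with its refreshed gain.
            heapq.heappush(heap, (-g, v))
    return A

# Demo with a small coverage function (sets are illustrative).
W = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}
def cover(A):
    return len(set().union(*(W[i] for i in A))) if A else 0

print(sorted(lazy_greedy(cover, set(W), 2)))
```

Elements whose stale bound already falls below the freshly evaluated gain are never re-evaluated in that iteration, which is where the practical savings come from.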