This is the (much delayed) second post in a series on More Sums than Difference Sets. In this post, we'll take a first crack at the question, "How many MSTD sets are there?" To do so, we'll write a straightforward C program that counts MSTD sets. Then we'll run it to count MSTD sets and benchmark its performance.

Last time we introduced More Sums than Differences (MSTD) sets. Recall that given a set \(A\) of integers, \[A + A = \{x + y \mid x,y\in A\}\] and \[A - A = \{x - y \mid x,y \in A\},\] are called the sum set and difference set of \(A\), respectively. An MSTD set \(A\) is one such that \(|A + A| > |A - A|\).

We obtained tight upper and lower bounds on the size of the sum and difference sets of a set \(A\). Because the upper bound for difference sets is nearly twice as large as that for sum sets, one expects MSTD sets to be relatively rare. More intuitively:

Addition is commutative. Subtraction isn't. So we generally expect \(|A+A| < |A-A|\).

With this fact in mind, we were led to wonder whether any MSTD sets existed at all. It turned out that such sets do exist. Especially industrious readers constructed their own example, while everyone else looked at the answers at the bottom of the last post.

Finally, we closed with the question, "How many MSTD sets are there?" This post continues by examining this question.


How many MSTD sets are there?

What exactly do we mean by "how many"? After all, we showed last time that there were infinitely many! So I guess our work is done!

Not quite. There are several ways to make this question more interesting, some of which were mentioned in the previous post. Here we'll focus on a version of this question that can be explored using computation:

For each \(n\), how many MSTD sets are there in \(\{1, \dots, n\}\)?

Let's define \[a_n = \#\{A\subseteq\{1, \dots, n\} \mid A \text{ is MSTD}\}.\] Then the above question simply asks for the value of \(a_n\).

It is probably unreasonable to expect a closed form expression for \(a_n\) in terms of \(n\), so our approach will be to compute the answer for small \(n\) and see what we see.

Counting MSTD sets: A straightforward approach

Let's dive right in. To compute \(a_n\), we'll take the following straightforward approach:

    answer = 0
    for each subset A of {1, ..., n}:
        if A is MSTD:
            answer = answer + 1

To refine this into an actual implementation, we'll need a way of representing sets of integers, a way of considering all subsets of a range of integers, and a way to check if a particular set is MSTD.

To represent sets, we'll use the standard bitset representation. A set \(A \subseteq \{1, \dots, n\}\) is represented as an \(n\)-bit binary number, whose \(i\)-th bit is 1 if and only if \(i\in A\). With this representation, we can iterate over all subsets by iterating over all \(n\)-bit numbers. Our implementation is starting to take shape.

    typedef long long bitset;

    int a_n(int n) {
        int answer = 0;
      
        bitset s;
        for (s = 0; s < (1LL << n); s++) {
            if (is_mstd(n, s)) {
                answer++;
            }
        }
  
      return answer;
    }

This is a straightforward translation of the above pseudocode into C. One difference is that we represent all sets as 64-bit integers for convenience, which constrains us to \(n\le 64\). It now remains to define is_mstd.

We'll take a very straightforward approach to this as well. For all pairs of elements of \(A\), we compute their sum and both differences, adding them to a set. At the end we look at the sizes of the sets to decide whether \(A\) was MSTD.

    is_mstd(A):
        sums = {}
        diffs = {}
        for x in A:
            for y in A:
                add x + y to sums
                add x - y to diffs
        return |sums| > |diffs|

Again, a few details show up on the way to a C implementation of this algorithm. We need three new operations on bitsets: add an element, query for membership, and count the number of elements.

    int bitset_get(bitset s, int i) {
        return (s >> i) & 1;
    }
    
    bitset bitset_set(bitset s, int i) {
        return s | (1LL << i);  // this LL is important.
    }
    
    int bitset_count(int n, bitset s) {
        int result = 0;
        int i;
        for (i = 0; i < n; i++) {
            result += bitset_get(s, i);
        }
      
        return result;
    }

There is a small gotcha in the implementation of bitset_set. If one leaves off the LL, then the left shift occurs in 32-bit arithmetic, causing undefined behavior when i is 32 or larger. So, don't forget your LLs, folks!

Adding a member and checking for membership are simple bitwise operations. To count the number of elements, we simply loop over each possible element, incrementing a counter for each member.

We're now ready to translate is_mstd.


    int is_mstd(int n, bitset s) {
        bitset sums = 0; 
        bitset diffs = 0;
      
        int i, j;
        for (i = 0; i < n; i++) {
            if (!bitset_get(s, i)) continue;
            for (j = 0; j < n; j++) {
                if (!bitset_get(s, j)) continue;
          
                sums  = bitset_set(sums, i + j);
                diffs = bitset_set(diffs, n + i - j);
            }
        }
      
        return bitset_count(2*n, sums) > bitset_count(2*n, diffs);
    }

So, how many MSTD sets are there?

Now we can run our program to answer our question! Instead of just counting MSTD sets, we instead look at the proportion of sets in \(\{1, ..., n\}\) that are MSTD. This normalizes for the size of the interval we're searching.

Without further ado, the proportions for \(n = 1\) to \(23\).

Do you notice anything conspicuous? How about the fact that the proportions are strictly increasing? Perhaps this is always the case. To investigate this question, we can compute \(p_n\) for larger values of \(n\). Unfortunately, this gets slow, fast!

How fast is this program?

In short: hopelessly slow. Go ahead, try it for \(n > 20\). On my laptop it takes about 7 seconds to compute the answer for \(n=23\).

At first glance we might throw up our hands and say, "The problem is exponential, so there's no hope of an efficient implementation." You'd be partially right: there is (probably) no hope of finding an efficient implementation. But that doesn't mean there isn't a more efficient implementation. Next time we'll speed this program by a factor of 500 or so. Doing so will let us answer our question about whether the proportion always increases (you can probably already guess the answer...).