How to build a simple system in Verdi

In this long-awaited post, we'll show how to implement and verify a simple distributed system.

Last time we described how Verdi models the execution environment of distributed systems using network semantics. In this post, we'll use those semantics to verify a simple primary-backup distributed counter.

The system maintains a counter, which external clients can increment by sending a request. Internally, the system consists of two nodes, a primary and a backup, each of which maintains a copy of the counter. Clients send increment requests to the primary, which increments its local copy of the counter and forwards the increment request to the backup node. When the backup receives the request, it increments its local copy of the counter and responds to the primary with an acknowledgment. Finally, when the primary receives the acknowledgment, it informs the external client that the request has been completed.

This is a very simple system. Most glaringly, it is write-only; there is no way for the external clients to ask what the current value of the counter is. However, this system already illustrates the basic ideas and proof techniques required to use Verdi. After understanding the code and proof below, it's a good exercise to extend the system with reads.

The property we will prove of this system is that the backup's copy of the counter is always less than or equal to the primary's copy. The backup's copy might be strictly less than the primary's due to pending requests in the network, but the backup can never be ahead of the primary.

There are a few things to note about this property. First, it does not guarantee that the backup ever does any work. An implementation in which the primary immediately responded to client requests without contacting the backup would also satisfy the property. Second, the property is internal and state-based; it says nothing about how the system interacts with external clients. Verdi also supports proving external properties by reasoning about the trace, but we will not show how to do that here. Despite these shortcomings, this property illustrates the basics of reasoning about systems in Verdi.

This blog post is a literal Coq file. For the purposes of presentation, I've omitted some low-level details, but you can always refer to the source for the full story. Also, this series of blog posts presents a simplification of the actual Verdi sources. If you're curious the Verdi distribution contains an implementation of this system as an example.

Implementing the Primary-Backup Counter System

We will use the network semantics from last time, so we first import that code.

Require Import NetworkSemantics.

To implement a system using the network semantics, we have to declare types describing the nodes of the system, the messages exchanged internally by the system, the type of state kept at each node, the type of input requests made by external clients, and the type of output responses.

These types for the counter system are given below. There are two nodes, the primary and the backup. They exchange two messages: increments and acknowledgments. The local state of each node is just a natural number representing the counter. The system expects increment requests as inputs and sends increment responses as output.

Inductive node := primary | backup.

Inductive msg := inc | ack.

Definition state := nat.

Inductive input := request_inc.

Inductive output := inc_response.

With the types out of the way, we're ready to implement the system. The implementation consists of an initialization function that computes the initial state of each node, a function that processes external inputs, and a function that processes internal messages from other nodes.

The initial state of each node's copy of the counter is 0.

Definition initState (_ : node) : state := 0.

The functions for handling input and messages are written in a monadic style, with read-write access to the local state, the ability to send messages to other nodes, and the ability to send output to external clients.

Definition handler_monad A :=
  state -> A * state * list packet * list output.

Definition handler :=
  state -> state * list packet * list output.

Definition do {A : Type} (m : handler_monad A) : handler :=
  fun s => let '(a, s', ps, os) := m s in (s', ps, os).

handler_monad is the type of these monadic actions, while handler is isomorphic to handler_monad unit. The latter is used by the network semantics, which has no notion of the monad. The suggestively named function do converts a monadic action into a handler by ignoring the return value.

At this point in the code are definitions for the usual ret (aka return) and bind operations, and of monadic actions for reading and writing the local state, and for sending messages and external output. While these definitions are left out this post, you can view the source for the details.

The core action of the implementation is incrementing the local counter. We factor this out into a helper function do_inc, as follows.

Definition do_inc : handler_monad unit :=
  x <- get ;; set (x + 1).

do_inc uses the monad to get the local state and then set it to be one more, as expected.

The function processInput handles external inputs from clients.

Definition processInput (h : node) (i : input) : handler :=
  do
    match h with
    | primary => match i with
                | request_inc => do_inc;; send backup inc
                end
    | _ => nop
    end.

processInput takes the name of the node it's running on as its first argument, and the input being processed as its second argument. It returns a handler, which is defined (not shown) to be

It then does case analysis to see if it is running on the primary or the backup. If it is running on the primary, then it performs the increment locally and sends an inc message to the backup. On the other hand, if processInput is running on the backup, it does nothing, effectively ignoring the request. External clients are expected to communicate only with the primary.

Note that there is an unnecessary case analysis on the input above. In this simple system, there is only one possible input, but I've included the useless pattern matching because typically systems will have more than one possible input.

The function processMsg handles messages from other nodes in the system.

Definition processMsg (h : node) (m : msg) : handler :=
  do
    match h with
    | backup => match m with
               | inc => do_inc ;; send primary ack
               | _ => nop
               end

    | primary => match m with
                | ack => out inc_response
                | _ => nop
                end
    end.

If processMsg is running on the backup and receives an increment message, it performs an increment and sends an acknowledgment to the primary. If processMsg is running on the primary and receives an acknowledgment, it outputs an inc_response to the outside world. In all other cases, processMsg ignores the incoming message. It is an invariant of the system that the only messages in the network are increments to the backup or acknowledgments to the primary, so these cases never arise. When we get around to verifying the system we will either have to prove that these cases never arise, or at least show that they do not violate our specification.

This completes the implementation of the system.

Verifying the Counter System

We're now ready to formalize our correctness property, that the backup never gets ahead of the primary.

Definition backup_le_primary (w : world) : Prop :=
  localState w backup <= localState w primary.

We can then state the correctness theorem, which says that in any reachable world of the system, the property is true.

Theorem backup_le_primary_true :
  forall w,
    reachable w ->
    backup_le_primary w.

At this point we might be tempted to proceed by induction on reachable worlds, showing that the property is true of the initial world, and that it is preserved by every step of the system. However, a moment's thought shows that our property is not inductive! That is, it is not preserved by arbitrary steps of the system.

In particular, the property constrains only the state of the nodes, and says nothing at all about what messages are in flight in the network. For example, consider a world where the primary and backup each have the value 10 for their local copy of the counter, but where there are also 5 increment requests in the network on their way from the primary to the backup. Note that this world satisfies the property, so if we were to reason inductively, we would be obliged to show that any step from this world also satisfies the property. But this is not true, since a step that delivers one the requests in the network to the backup will cause it to get ahead of the primary! The resolution is that even though the world where both counters are 10 and there are 5 increments destined to the backup satisfies backup_le_primary, this world is not actually reachable.

In order to prove that the backup never gets ahead of the primary, we need to find an inductive invariant. That is, a property which is true in the initial world and is preserved by all possible steps of the system. We also need the inductive invariant to imply our desired property. The discussion above shows why our property itself cannot be used as an inductive invariant. Since the above counterexample exploited the fact that backup_le_primary placed no constraints on the messages in the network, it stands to reason that an inductive invariant for this property will constrain the network in some way.

If you've never done this kind of thing before, I'd recommend stopping here and trying to come up with an inductive invariant on your own. This is where all the creativity in program verification is; it's really fun, and it's a good thing to get good at.

Welcome back.

One way to come up with an inductive invariant in this case is to think about why the counterexample world above is not reachable. Recall that the primary and backup both had a counter value of 10, but there were 5 additional increments in the network. Something is already wrong here: in some sense the primary has already given the backup permission to get 5 steps ahead of it. But looking at the code for the primary, this can never happen. The primary never sends an increment message without also incrementing its local counter.

There are a couple ways of formalizing this intuition. One would be to say that the local state of the backup plus the number of increments in the network on their way to the backup is less than or equal to the local state of the primary. This would work fine (it is inductive and implies the desired property), but the proof is a little bit simpler if we use an even stronger invariant, namely, that the above inequality is actually an equality. That is, the local state of the backup plus the number of increments on their ways is equal to the local state of the primary.

You should take a moment to convince yourself informally that this really is an inductive invariant.

To sum up this discussion, to prove that the backup never gets ahead of the primary, we will first prove the inductive invariant, and then show that the property we actually care about follows. First, let's give up on completing the proof directly.

Abort.

To define the inductive invariant, we will need to count the number of increment messages in the network that are destined to the backup. This is the job of the function count.¹

Definition count (l : list packet) : nat :=
  length
    (filter (fun p : packet =>
               node_eqb (dest p) backup &&
               msg_eqb (payload p) inc)
            l).

The network semantics models the collection of in-flight packets as a list, so our packet-counting function takes a list of packets and returns a nat. It first filters the list, retaining only increment messages destined to the backup, and then takes the length of the list. I have omitted the definitions of node_eqb and msg_eqb, which check for equality of nodes and messages, but they are available in the source (and do what you expect).

Now we can state our inductive invariant.

Definition backup_plus_count_eq_primary (w : world) : Prop :=
  localState w backup + count (inFlightMsgs w) = localState w primary.

In practice, when developing an inductive invariant, it is a good idea at this point to first prove that the stated invariant implies the desired top-level specification. This is true in this case because our inductive invariant is an equation of the form a + b = c, and our specification is a <= c, which follows because a, b, and c are all nonnegative.

We're now ready to prove the inductive invariant.

Here's how I might convince a human of this. The invariant is true in the initial world because the local state of each node is 0 and there are no in flight packets, so the count is also 0. Now suppose we're in an arbitrary world satisfying the invariant and consider all possible steps of the system. There are only two interesting cases.

An external input is delivered to the primary. The primary increments its local state and sends an increment to the backup. This preserves the equality because it adds one to both sides.
An increment message is delivered to the backup. The backup increments its local state and sends an acknowledgment to the primary. This preserves the equality because it subtracts one from the count but adds one to the backup's local state. These cancel out because they are on the same side of the equation.

In all other cases, no quantities mentioned in the invariant change.

Let's now convince Coq of this argument.

Lemma backup_plus_count_eq_primary_true :
  forall w, reachable w ->
       backup_plus_count_eq_primary w.
Proof.
  intros w Hreach.
  induction Hreach as [|w1 w2 Hstep _ IH].
  - (* base case *)

Our goal is to prove that the inductive invariant is true in all reachable worlds. We proceed by induction on the execution leading to w. (You can hover over the proof script to see the context.)

    reflexivity.

The base case is easy because in the initial world, each node has a local state of 0, and the network is empty.

  - invc Hstep.
    + (* input step *)

We begin the inductive case by considering each possible step of the system. The first case to consider is when an external input i is delivered to a node n.

      unfold processInput in *.
      destruct i, n; invc H;
        unfold backup_plus_count_eq_primary, state in *;
        simpl; rewrite update_same, update_diff by congruence.
      * (* input delivered to primary *)

We proceed by case analysis on i and n, use inv to "execute" the monadic handler action, unfold the definition of the invariant, and then rewrite by some helper lemmas about update. (Recall from the network semantics that update takes a function modeling the global state name -> state, a host, and a new local state for that host, and returns a function modeling the new global state, where the local state of the given host has been updated.)

The first subgoal is when the input is delivered to the primary.

        rewrite count_cons_backup_inc.
        omega.

We use a lemma that states that count counts increments to the backup (lemma statement not shown).

The next case is when the input is delivered to the backup, and it's easy.

      * auto.
    + (* msg step *)

That takes care of all possible input steps. We now consider steps that deliver a message.

      unfold processMsg in *.
      destruct p. simpl in *.
      destruct dest, payload;
        invc H0; unfold backup_plus_count_eq_primary, state in *;
        simpl; rewrite update_same, update_diff by congruence.
      * (* primary, inc *)

We proceed by case analysis on the destination and payload of the message. The first case to consider is when an increment message arrives at the primary. In fact, this will never happen in any reachable world, but instead of explicitly ruling it out, we have stated our invariant so that it is preserved even in this case. This case follows formally from a lemma stating that increments to the primary are not counted (lemma statement not shown) and the induction hypothesis.

        now rewrite <- IH, count_remove_primary.

The next case is when an acknowledgment is delivered to the primary. This case follows from a lemma stating that acknowledgments are not counted and the induction hypothesis.

      * now rewrite <- IH, count_remove_ack.

The next case is the only interesting kind of message delivery: when an increment is delivered to the backup. In addition to incrementing the local state, the backup sends an acknowledgment to the primary, so we use a lemma stating that messages to the primary do not affect the count. The execution of this step implies that the world before the step contained an increment on its way to the backup, so we use a lemma stating that this contributes exactly 1 to the count, which justifies the backup incrementing its count by 1.

      * rewrite count_cons_primary.
        find_apply_lem_hyp count_remove_backup_inc.
        omega.

The very last case to consider is when an acknowledgment is delivered to the backup. Again, this cannot actually happen in a reachable world, but our inductive invariant is still preserved. The case follows from the induction hypothesis and a lemma stating that acknowledgments do not affect the count.

      * now rewrite <- IH, count_remove_ack.
Qed.

This completes the proof that the inductive invariant is true in all reachable worlds.

We can finally conclude that our desired top-level specification is true. The proof considers a reachable world w, uses the above lemma to establish the inductive invariant in w, and then concludes by showing that the inductive invariant implies the specification.

Theorem backup_le_primary_true :
  forall w,
    reachable w ->
    backup_le_primary w.
Proof.
  intros w Hreach.
  apply backup_plus_count_eq_primary_true in Hreach.
  unfold backup_plus_count_eq_primary, backup_le_primary in *.
  omega.
Qed.

Conclusion

In this post, we've seen how to implement and verify a simple system using network semantics in the style of Verdi. Please do look at the source for more detail. If you want to play around with extending the system in various ways, here are some suggested exercises:

Extend the system with an "increment by n" operation. Find a new state-based specification that characterizes the behavior of your system and prove it.
Extend the system with a decrement operation. Find a new state-based specification that characterizes the behavior of your system and prove it. What difficulties arise?
Extend the system with reads and prove that the specification is still true.
In a version of the system without the decrement extension, prove that the backup's state is always at least as large as the number of responses in the trace. This rules out a buggy implementation that doesn't actually replicate requests. You'll need to look at the network semantics to figure out how to prove properties of the execution trace, which I haven't discussed at all here.
Prove that the number of responses in the trace is less than or equal to the number of responses. This is a completely trace-based/external specification.
Notice that this system does not tolerate duplicate messages: if an increment message to the backup gets duplicated by the network, then the backup could get ahead of the primary. Implement a version of this system that tolerates duplicates and prove that it does so. (This is harder than the previous exercises. You'll need to complete the exercise from the network semantics post to get a semantics with duplication.) Is there anything about the way you tolerate duplicate messages that is specific to this counter system?

In the next post on Verdi, we'll discuss verified system transformers, which are a way of implementing and verifying fault tolerance mechanisms in a system-agnostic way.

Note that we only count increment messages that are destined to the backup, instead of all increment messages. While these quantities are actually equal in all reachable states of the system, this fact would require an additional (inductive) proof. By counting only increments to the backup, we avoid this complication.↩

How to build a simple system in Verdi

James Wilcox

April 24, 2016

Implementing the Primary-Backup Counter System

Verifying the Counter System

Conclusion