Emina Torlak

Part 2 of the Tutorial on Solver-Aided Programming

The following notes are the second (and last) part of the tutorial on solver-aided programming presented at CAV 2019. They are intended to be read together with the accompanying slides and code.

You might also want to try the exercises for this part of the tutorial and then take a look at the reference solutions.

The tutorial starts with Part 1, which covers the basic concepts of solver-aided programming.

Going pro with solver-aided programming

In this part of the tutorial, we’ll see how to build a solver-aided programming tool.

We’ll start by taking a brief look at the classic way to build solver-aided tools and why this is hard to do.

Next, we’ll see how Rosette helps you bypass the most difficult part of this process, by allowing you to define the semantics of your target language and get the tool(s) for free.

Third, we’ll go over how Rosette works; this is useful to know if and when you run into performance problems.

Finally, we’ll talk about some of the recent applications built with Rosette!

The classic (hard) way to build a solver-aided tool

Recall the solver-aided programming tool chain: the tool reduces a query about program behavior to a logical formula, which is then solved with an off-the-shelf SMT solver.

What all of these tools have in common is the need for an engine—a symbolic compiler—that can translate questions about program behavior into satisfiability queries.

Building this compiler is what takes a long time and a lot of expertise.

Wanted: an easier way to build tools

Rosette offers a new way to build verification and synthesis tools that saves you from the need to build a symbolic compiler.

Instead, you simply implement an interpreter or a library that provides the semantics for your language, and Rosette automatically compiles both your interpreter and programs in your language to constraints.

This is done with the help of a Symbolic Virtual Machine, which combines symbolic execution and bounded model checking in a new way. The technical challenge addressed by the SVM is how to efficiently translate both a program in some language and an interpreter for that language to constraints.

Tools as languages

So how do you create tools by creating languages?

We will first introduce the two basic approaches to language creation, and then see how to use one of these approaches to construct a set of solver-aided tools for a toy language.

Layers of languages

Using Rosette essentially involves building a domain-specific language (DSL) in which to describe the class of problems you want to solve.

A DSL is a formal language specialized to a given domain.

You build DSLs by implementing them in a general-purpose host language in one of two ways: by writing either a library (shallow embedding) or an interpreter (deep embedding).
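To make the two approaches concrete, here is a minimal sketch (a toy example of our own, not part of the tutorial code) of a tiny arithmetic DSL embedded both ways in Racket:

; Shallow embedding: DSL constructs are host-language procedures,
; so a DSL program is just a host expression.
(define (plus a b)  (+ a b))
(define (times a b) (* a b))

(plus 1 (times 2 3))         ; evaluates directly to 7

; Deep embedding: DSL programs are data, and an interpreter
; gives them meaning.
(define (run expr)
  (match expr
    [(? number? n)     n]
    [(list 'plus a b)  (+ (run a) (run b))]
    [(list 'times a b) (* (run a) (run b))]))

(run '(plus 1 (times 2 3)))  ; also 7

In the deep embedding, the program is a value we can inspect and analyze; the BV example below takes this route by defining an interpreter.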

Some examples of DSLs and host languages are listed on the accompanying slide, and you’ve probably used at least some of these languages.

So, why bother building DSLs? Why not just use general purpose languages to express and solve problems? There are two main reasons.

First, DSLs include high-level constructs that make it easy to express problems in the target domain succinctly. For example, consider matrix multiplication expressed in Eigen (C = A * B) versus in C (a triple-nested loop): the former is much shorter than the latter. And of course, the less code you write, the less opportunity there is to introduce bugs.

Second, tools that operate on DSLs can exploit high-level properties of the domain (such as associativity of matrix multiplication) for easier verification or for better performance. These properties cannot, in general, be recovered from general-purpose code.

Layers of solver-aided languages

Creating a solver-aided DSL, with built-in support for verification and synthesis, is done in exactly the same way as building a regular DSL, through a shallow or deep embedding.

The only difference is that the host language runs on top of a non-standard virtual machine, which can translate both your DSL implementation and programs in that DSL to constraints.

A few examples of tools built with Rosette are listed here. As you can see, they span a wide range of problem domains, from intelligent tutoring to compilation to radiotherapy control software. We will talk about three of these tools in detail later on.

But first, let’s see how to build these tools.

A tiny example SDSL (demo)

For our example, we will build a set of solver-aided tools for programs in BV, a toy assembly-like language. Such a language is at the core of many low-level synthesis and verification tools.

We want to be able to execute and test BV programs, but also to verify, debug, and synthesize them.

To do so, we will build an interpreter for BV in Rosette, with about 10 lines of code, and we will get everything else for free.

BV interpreter

We use a simple representation of BV programs in which an instruction is a list of symbols: the first element names the instruction’s output register, the second is its opcode, and the remaining elements name its input registers.

For example, bvmax is a BV program that computes the maximum of two 32-bit integers without doing any branching:

(def bvmax
  ([0 r0] [1 r1])  ; registers 0 and 1 hold the inputs r0 and r1
  (2 bvsge 0 1)    ; r2 = 1 if r0 >= r1, else 0
  (3 bvneg 2)      ; r3 = -r2 (all ones or all zeros)
  (4 bvxor 0 2)    ; r4 = r0 xor r2
  (5 bvand 3 4)    ; r5 = r3 and r4
  (6 bvxor 1 5))   ; r6 = r1 xor r5, the program’s output

The BV interpreter gives meaning to BV programs:

(define (interpret prog inputs)
  (make-registers prog inputs)
  (for ([stmt prog])
    (match stmt
      [(list out opcode in ...)
       (define op (eval opcode))       ; opcode symbol -> procedure
       (define args (map load in))     ; read the input registers
       (store out (apply op args))]))  ; write the output register
  (load (first (last prog))))          ; the last instruction's output register

Given a program and its inputs, the interpreter first allocates enough registers to hold the program’s inputs and the output of every instruction. The first $n$ registers are populated with the inputs to the program; in the case of bvmax, $n$ is two.
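The helpers make-registers, load, and store are not shown above. Here is one way they might look; this is a sketch of our own (the actual tutorial code may differ) that keeps the register file in a mutable vector with one slot per program input and per instruction:

(define registers (vector))   ; the register file for the current run

(define (make-registers prog inputs)
  (set! registers (make-vector (+ (length inputs) (length prog)) #f))
  (for ([(v i) (in-indexed inputs)])
    (vector-set! registers i v)))   ; registers 0 .. n-1 hold the inputs

(define (load idx)    (vector-ref registers idx))
(define (store idx v) (vector-set! registers idx v))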

Next, for each instruction in the program, we use Racket’s pattern matching to extract the instruction’s output register, opcode, and list of inputs into local variables out, opcode, and in.

We then use Racket’s eval to obtain the procedure that corresponds to this opcode; we load the values from the input registers into the args variable; we apply the procedure to these values, and store the result in the output register.

This process is repeated for every instruction.

(Note that we wouldn’t use eval in a robust SDSL implementation because it’s not lifted by Rosette and thus can only be used in limited ways; we’re including it here to demonstrate that it can be used with great (!) care.)

In the end, we return the value of the last register, which holds the output of the last instruction, as the program’s output.

Before we proceed with our implementation of BV tools, note that the BV interpreter uses advanced programming language features such as first-class procedures, higher-order procedures, pattern-matching, side effects, and dynamic evaluation. Each of these features is challenging to translate to constraints using standard symbolic evaluation techniques. We’ll see shortly how Rosette manages to make this work.

BV verifier

Suppose that we want to verify that bvmax behaves exactly like a reference max implementation, which uses a conditional.

(define (max x y)
  (if (equal? (bvsge x y) (int32 1)) x y))
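On concrete inputs, the reference behaves as expected. For example, assuming (as in the rest of the demo) that int32 wraps a 32-bit bitvector literal and that bvsge is BV’s comparison, which returns (int32 1) or (int32 0):

> (max (int32 3) (int32 7))
(bv #x00000007 32)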

To build a verifier for BV, we use Rosette’s verify construct as follows:

(define (ver impl spec)
  (define-symbolic* in int32? [(procedure-arity spec)])
  (evaluate in (verify (assert (equal? (apply impl in) (apply spec in))))))

First, we create $n$ fresh symbolic constants of type int32? (32-bit bitvectors, which represent finite-precision, 32-bit integers), where $n$ is the arity of the specification.

Next, we use Rosette’s verify form to search for a concrete interpretation of these symbolic constants that causes any assertion to fail.

In our case, the verifier says that bvmax differs from max on the inputs #x7ffefbdf and #x40000001.

> (ver bvmax max)
'((bv #x7ffefbdf 32) (bv #x40000001 32))

> (max (bv #x7ffefbdf 32) (bv #x40000001 32))  ; spec
(bv #x7ffefbdf 32)

> (bvmax (bv #x7ffefbdf 32) (bv #x40000001 32)) ; buggy impl
(bv #x3ffefbdf 32)

Indeed, when we execute bvmax on this input, we see that it produces the wrong result.

Where is the bug? Even in this very simple example, it’s hard to tell. So we’ll ask the solver to find and fix it for us.

BV debugger

Creating a debugger is analogous to creating a verifier.

(define (dbg impl spec in)
  (render (debug [register?] (assert (equal? (apply impl in) (apply spec in))))))

We implement a BV debugger using Rosette’s debug form. It asks the solver to find a minimal subset of BV register expressions that need to be changed to fix the bug.

Applying the debugger to our example, we see that it highlights the first input register at lines 3 and 6, and the last register at line 4:

> (dbg bvmax max (list (bv #x7ffefbdf 32) (bv #x40000001 32)))

In general, solver-aided debugging is slow and fragile, so using this feature in real applications is not recommended. Its main use case is for educational purposes—explaining and demonstrating this form of debugging on small examples.

BV synthesizer

We can now replace the highlighted registers with holes, written (??), and ask the synthesizer to fill them in with register indices so that bvmax?? is equivalent to max on all 32-bit integers.

(define (syn impl spec)
  (define-symbolic* in int32? [(procedure-arity spec)])
  (print-forms
   (synthesize 
     #:forall in
     #:guarantee (assert (equal? (apply impl in) (apply spec in))))))

(def bvmax??
  ([0 r0] [1 r1]) 
  (2 bvsge 0 1)
  (3 bvneg (??))
  (4 bvxor 0 (??))
  (5 bvand 3 4)
  (6 bvxor (??) 5))

> (syn bvmax?? max)
'(def
  bvmax??
  ((0 r0) (1 r1))
  (2 bvsge 0 1)
  (3 bvneg 2)
  (4 bvxor 0 1)
  (5 bvand 3 4)
  (6 bvxor 1 5))

The synthesizer then fills in the holes with the right registers.

The resulting program is now correct on all inputs.

Symbolic Virtual Machine (SVM)

Now that you know how to build an SDSL, let’s see how Rosette works.

How it all works: a big picture view

At a high level, Rosette takes as input a program in a DSL, an interpreter for that DSL, and a query about program behavior. It uses the SVM to translate all of this to constraints, which are then solved with Z3. The result produced by Z3 is then translated back into an answer that is meaningful at the DSL level.

But how can we perform this reduction to constraints in the presence of complex language features used by BV: that is, how do we go from Racket to SMT?

Translation to constraints by example

We will see how SVM works on the following example query, written in pseudocode:

vs = (a, b)
solve:
  ps = ()
  for v in vs:
    if v > 0:
      ps = insert(v, ps)
  assert len(ps) == len(vs)

Our goal is to translate this query into constraints that are satisfiable whenever the output list ps has the same length as the input list vs.

Suppose that the input list consists of two symbolic integer constants, a and b.

In this case, an ideal symbolic compiler would produce the formula $a > 0 \wedge b > 0$.
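For reference, here is the same query as runnable Rosette code (a sketch of our own: the variable names are ours, and cons plays the role of insert); it runs as written under #lang rosette:

(define-symbolic a b integer?)
(define vs (list a b))

(define sol
  (solve
   (let ()
     (define ps '())
     (for ([v vs])
       (when (> v 0)
         (set! ps (cons v ps))))   ; insert(v, ps)
     (assert (= (length ps) (length vs))))))

(evaluate vs sol)   ; some positive values for a and b, e.g. '(1 1)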

Design space of precise symbolic encodings

There are two standard ways to translate a finite program to constraints: symbolic execution and bounded model checking.

Symbolic execution translates each path through the code individually, leading to path explosion. That is, the size of the encoding is, conceptually, exponential in program size.

However, the SE encoding has one very nice property. The output formula for our example is expressed solely in terms of a and b! Even though the program is operating on lists, those operations don’t make it into the formula. They are evaluated away concretely along each path.

Bounded model checking avoids path explosion by merging states eagerly at every control-flow join. But immediately after the first iteration, we run into trouble: we can no longer execute list insertion concretely, so we have to encode the meaning of lists and insertions into the formula.

While encoding lists is possible (if tedious), our target language has all sorts of features that we don’t want to translate to constraints.

A new design: type-driven state merging

So, how do we get the simple formulas and the concrete evaluation of symbolic execution, while producing a polynomially-sized encoding, like BMC?

Rosette’s answer is to change the way we perform merging. In particular, we merge values based on their type: values of primitive (solvable) types, such as booleans and integers, are merged into a single symbolic expression; structured values of the same shape, such as lists of equal length, are merged component-wise; and all other values are kept apart in a symbolic union, with each value guarded by the condition under which it arises.
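You can see all three merging rules at work directly in Rosette; here is a small example of our own:

(define-symbolic c boolean?)

(if c 1 2)                 ; primitive type: a single term, (ite c 1 2)
(if c (list 1) (list 2))   ; same shape: a list whose element is (ite c 1 2)
(if c (list 1) '())        ; different shapes: a symbolic union,
                           ; printed as {[c (1)] [(! c) ()]}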

So, let’s see in detail how this new kind of merging works on our example.

The first two steps are the same as in BMC: we execute the then and else branches of the conditional, and we are now ready to merge them. Given that ps is (a) in the then branch and null in the else branch, how do we merge these two values? Since they have different shapes (lengths), they cannot be merged structurally. We therefore merge them into a symbolic union.

Next, we execute the then and else branches again, this time on the symbolic union. We do that by applying the list insert procedure to every element of the union individually. That is, we still execute insertion concretely.

Now, we get to a tricky part. How do we merge two unions? What would happen if we merge them naively by performing a set-union of their contents? We would get state explosion! The size of the unions would grow exponentially with program size (that is, with the number of control flow joins).

To avoid this, we maintain the following invariant when merging unions: each union contains at most one value of a given primitive type, and at most one list of each length $k$.

In our example, we merge the two lists of size 1 element-wise, and we add the lists of size 2 and 1 to the output union.

This merging process ensures that the size of the unions grows polynomially with the size of the program’s execution graph. See the PLDI 2014 paper for a proof sketch.

Finally, given the last state of ps, we can evaluate the assertion concretely on each element of the union, and we find that it holds only under the guard $g_2$. So the encoding that we send to the solver is very close to the encoding that the ideal compiler would produce.

To sum up, the SVM combines symbolic execution and BMC to get the best of both worlds: a polynomially-sized encoding and concrete evaluation of tricky language constructs.

(Sym)Pro tip: use symbolic profiling to find performance bottlenecks

Just as certain algorithms, data structures, and coding patterns perform better than others under concrete evaluation, the same is true for programs that run under Rosette’s symbolic evaluation engine.

To help optimize solver-aided code, Rosette provides a tool called SymPro that can identify performance bottlenecks in the symbolic evaluation of these programs.
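To profile a program, you run it under the symbolic profiler via the raco command that ships with Rosette:

$ raco symprofile program.rkt

The profiler then reports the parts of the program that perform the most symbolic work, which are the most likely bottlenecks.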

We have used SymPro to find and repair bottlenecks in many real applications, including some discussed below, improving their performance by orders of magnitude.

To learn more about SymPro and how it works, see our OOPSLA 2018 paper and the Performance chapter of the Rosette Guide.

Applications

In the rest of the tutorial, we will briefly talk about three recent applications of Rosette.

Verifying a radiotherapy system

Our first application is a verifier for safety-critical code: specifically, a controller for a radiotherapy machine at UW.

The Clinical Neutron Therapy System (CNTS) has provided 30 years of incident-free service to patients. It is controlled by custom software, built by the CNTS engineering staff; the third generation of the therapy control software was built a few years ago.

The Therapy Control Software takes as input a prescription, which specifies various machine settings, and the readings from various hardware sensors. Based on these inputs, it controls the machine movements and, most importantly, it must shut off the beam if any of the machine settings go outside the prescribed parameters during treatment.

The new Therapy Control Software is written in a dataflow language called EPICS, which is commonly used for controlling scientific instruments (telescopes, particle accelerators, etc.).

EPICS semantics can be tricky to reason about, so the CNTS engineers wanted additional assurance that their code was doing the right thing: in particular, that it would stop the beam if anything went wrong.

So we built a verifier for EPICS by implementing an interpreter for it in Rosette.

The initial prototype of this verifier was built by Calvin Loncaric (at the time a second-year grad student and first-time user of Rosette) in just a few days. The verifier scales to 90,000 lines of code, and it found two safety-critical bugs in a pre-release version of the therapy controller software. One of these bugs was in the EPICS runtime: the runtime didn’t implement the documented semantics, and the therapy code depended on that bug for correct operation.

This work is described in detail in our CAV 2016 and ICALEPCS 2017 papers.

Synthesizing strategies for games and education

Our second application is a system for synthesizing programs to be executed by humans rather than computers; in particular, these programs represent strategies for solving problems in various domains, from K-12 algebra to logic puzzles.

For example, here is an instance of a popular logic puzzle called Nonograms, where the goal is to fill out a grid with T/F values according to the hints provided next to each row and column. We use black squares to represent T values and red crosses to represent F values. A hint states how many contiguous blocks of T cells appear in a given row or column, and how long each block is. So the hint “2 1” for the middle column says that the column contains two blocks of T cells, the first 2 cells long and the second 1 cell long, and that the blocks are separated by at least one F cell.
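To make the mechanics precise at the level of a single line, here is a small Racket sketch (our own, not from the tool) of the hint check, where a line is a list of booleans and a hint lists the lengths of the maximal runs of T cells, in order:

(define (runs-of-true row)
  (let loop ([row row] [cur 0] [acc '()])
    (cond
      [(null? row)  (reverse (if (> cur 0) (cons cur acc) acc))]
      [(first row)  (loop (rest row) (+ cur 1) acc)]
      [else         (loop (rest row) 0 (if (> cur 0) (cons cur acc) acc))])))

(define (satisfies-hint? row hint)
  (equal? (runs-of-true row) hint))

> (satisfies-hint? '(#f #t #t #f #t) '(2 1))
#t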

These are the game mechanics of Nonograms, and a computer can easily solve such a puzzle by reducing its game mechanics to backtracking search (essentially SAT solving), since the general version of the game is NP-complete, just like Sudoku.

This fundamental hardness is what makes the game interesting, but human players wouldn’t want to play a game via search and backtracking! Instead, people solve such puzzles by using multiple strategies to make progress without guessing.

Finding these strategies is a key challenge in game design, because if they don’t exist for a game, or are not effective, people won’t enjoy playing it. And you also need these strategies to create puzzle instances of various difficulties.

In practice, finding strategies is done through human testing, which is expensive and is not guaranteed to find the best possible strategies.

To give you an idea of what these strategies look like, here is an illustration of a popular strategy for Nonograms called the Big Hint. This strategy is a rule of deduction that is sound with respect to the game mechanics: for a sufficiently large hint, some cells in the middle of the line must be T no matter where the contiguous block starts. There are also versions of this rule for multiple hints, as well as many other rules you will find in Nonograms tutorials.
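The arithmetic behind the single-hint version of this rule is simple: a block of length k in a line of length n can start anywhere in positions 0 through n - k, so the cells covered by every possible placement are those in positions n - k through k - 1, and this overlap is nonempty exactly when k > n/2. As a Racket sketch (our own):

; Cells forced to T in a line of length n with the single hint k.
(define (forced-cells n k)
  (if (> k (- n k))
      (range (- n k) k)   ; covered by every placement of the block
      '()))

> (forced-cells 10 7)     ; earliest placement covers 0-6, latest covers 3-9
'(3 4 5 6)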

What we wanted to do is find such rules automatically. To do so, we built a system that takes as input the game mechanics and a set of example game states (think of them as unfilled or partially filled rows or columns, along with their hints).

Given these inputs, the system produces an optimal set of concise, general, and sound strategies for playing the game, expressed in a high-level domain-specific language that is itself implemented in Rosette.

The initial prototype of this tool was developed by Eric Butler in just a few weeks.

Eric’s tool synthesized strategies that outperform documented strategies for Nonograms in terms of both coverage and quality.

We have also applied this framework to synthesizing custom strategies for solving K-12 algebra problems and for constructing proofs in propositional logic, recovering and outperforming textbook strategies in these domains.

You can learn more about this work from our FDG 2017 and VMCAI 2018 papers, and even more from Eric’s thesis.

Verifying systems software

Our last application is a new framework called Serval for verifying low-level software that makes up an operating system (OS).

As a quick reminder, an OS is a set of software components that mediate access to hardware and provide services to user applications. Bugs in these components have serious implications for reliability, security, and performance of computer systems, from causing significant performance degradation to allowing malicious applications to compromise the entire system (see, e.g., this OSDI 2014 paper).

Many parts of an OS have been verified using interactive theorem provers (such as Coq and Isabelle) as well as auto-active verifiers (such as Dafny). But this involved significant manual effort: for example, the seL4 kernel took 11 person-years to verify, with a proof-to-implementation ratio of 20:1.

To address this problem, we developed Serval, a framework that uses Rosette to verify low-level systems code fully automatically and with a low proof burden (e.g., no code-level annotations such as loop invariants). Of course, full automation is possible because we give up some generality, by requiring the components to have finite interfaces. It turns out that many OS components already satisfy this restriction, or can be retrofitted to satisfy it, including security monitors and JIT compilers for in-kernel DSLs.

Serval implements verifiers for binaries written in RISC-V, LLVM, BPF, and x86. To extend Serval with a verifier for a new target language, the developer simply writes a new Rosette interpreter for that target. Serval provides libraries for modeling memory and for dealing with unstructured control flow in a way that is amenable to efficient symbolic evaluation. Luke Nelson, the lead developer of Serval, added these verifiers to the framework in just a few weeks.

We used Serval to verify three existing security monitors: CertiKOS, Komodo, and Keystone. The first two had previously been verified using Coq (CertiKOS) and Dafny (Komodo); replicating these efforts shows that Serval can prove key properties (e.g., non-interference) of system components with full automation and low developer burden. The third monitor, Keystone, is a previously unverified open-source system for RISC-V; while verifying it with Serval, we found both design-level bugs (violations of non-interference) and implementation-level bugs (buffer overflows).

We have also used Serval to verify two Just-in-Time (JIT) compilers for the Berkeley Packet Filter (BPF) language that are part of Linux: one compiles BPF to 64-bit RISC-V, and the other to 32-bit x86. The verification process revealed 15 bugs; we patched them, and our patches were accepted by the Linux developers.

This work is currently under submission; stay tuned for details.