## Part 1 of the Tutorial on Solver-Aided Programming

The following notes are the first part of the tutorial on solver-aided programming presented at CAV 2019. They are intended to be read together with the accompanying slides and code.

You might also want to try the exercises for this part of the tutorial and then take a look at the reference solutions.

The tutorial continues in Part 2, which covers more advanced topics.

### Getting started with solver-aided programming

What is solver-aided programming?

Briefly, it is a new programming model that integrates SMT solvers into a programming language. Essentially, a solver-aided programming langauge provides dedicated language constructs for verifying and synthesizing code with the help of an SMT solver. The key is that these constructs allow you to interact with the solver at a very high-level: you’ll never need to write any SMT constraints yourself; instead, the language mediates the interaction with the solver for you.

This tutorial introduces solver-aided programming in two parts.

First, we’ll cover the basics of solver-aided programming, and we’ll learn how to use an existing solver-aided language to verify and synthesize small pieces of code.

But for synthesis and verification to scale, you have to provide a lot of domain
or application-specific knowledge to the underlying solver, and we do that by
building our own *solver-aided domain-specific languages*. So in the second
part of the tutorial, we’ll see how to build such a language via direct
symbolic evaluation or, more easily, via language embedding.

The ideas that we’ll cover in this tutorial have a long history, and pieces of them have been realized in many different langauges and tools. We will focus on one specific solver-aided programming language, Rosette, which has been used in many different applications, both in industry and academia, from verifying safety-critical software to synthesizing efficient code for low-power hardware.

### Classic programming

#### From spec to code

At the most basic level, programming is the task of translating some specification you have in mind into a programming language understood by a computer.

#### Check the code against the spec

In the classic programming workflow, we check that a program satisfies its
specification by testing it: we express the specification as some function
(see, e.g., `safe`

), that returns true if the program produces the right output,
and we check the program’s output on a few concrete inputs.

```
P(x) {
...
...
}
assert safe(2, P(2))
```

### Solver-aided programming concepts and workflow

#### Symbolic values

Solver-aided programming generalizes the classic programming workflow by
allowing you to check the program’s behavior on *symbolic* inputs.

You can think of a symbolic value as a placeholder for an arbitrary concrete
value of a given type. In other words, the type of the symbolic value is known
at runtime—so, `x`

is an integer—but its concrete interpretation is not. We
don’t know if `x`

is 0, 1, 2, etc.

```
P(x) {
...
...
}
assert safe(x, P(x)) // x is a symbolic integer
```

#### Queries

The solver-aided language determines the meaning of symbolic values such as `x`

by using an SMT to answer *queries* about program behavior. There are four kinds
of basic solver-aided queries.

`verify`

the code against the spec

First, we can use symbolic values to express the classic verification query. To check if our example program $P$ is correct on all inputs, we ask the solver-aided runtime to find an input on which $P$ violates the specification. The runtime does this by encoding the semantics of $P$ as an SMT formula and asking the solver if the formula $\exists x. \neg \mathit{safe}(x, \mathbf{P}(x))$ is satisfiable.

Here, bold $\mathbf{P}$ denotes the encoding of the semantics of $P$; we’ll see in Part 2 how this encoding is produced.

If the verification formula is unsatisfiable, we know there is no input on which $P$ violates its specification, and we are done.

But if the verification formula is satisfiable, the solver will produce a
concrete *counterexample*, an input on which the program fails to meet the
specification.

Let’s suppose that $P$ fails on the input 42 in our example.

`debug`

the code against the spec

Given a failing input, we now want to debug the program.

In the solver-aided programming model, the runtime can tell us why $P$ violates the specification on the input 42 by applying the solver to the formula $x=42 \wedge \mathit{safe}(x, \mathbf{P}(x))$.

Is this formula satisfiable? No! We know that because we ran $P$ on 42 and it failed.

So, given this unsatisfiable formula, we can ask the solver to produce a
*minimal unsatisfiable core*, which is a minimal subformula that is itself
unsatisfiable but becomes satisfiable if any parts are removed. This core, when
mapped back to program text, tells us which expressions need to be changed to
fix the bug. In particular, at least one of the core expressions needs changing
to fix the program.

Let’s suppose that in our case, the core consists of a single expression:

```
P(x) {
v = x + 2 // 'debug' marks x + 2 as the core for x = 42
...
}
assert safe(x, P(x))
```

`solve`

for values from the spec

Given the above core, we can repair the behavior of $P$ on the input 42 by
replacing `x + 2`

with a *non-deterministic choice* expression, and
issuing a `solve`

query. This query causes the runtime to pick a value that
causes $P$ to satisfy its specification on the input 42. The runtime does so by
applying the solver to the formula $\exists v. \mathit{safe}(42,
\mathbf{P}_v(42))$.

```
P(x) {
v = choice() // 'solve' chooses v = 40 for x = 42
...
}
assert safe(x, P(x))
```

The `solve`

query is sometimes referred to *angelic execution*, and it can be
used as an admittedly expensive fallback mechanism for recovering from runtime
failures.

`synthesize`

code from the spec

Finally, if we want to repair the behavior of $P$ on all inputs, we can replace
the buggy expression with `??`

and have the runtime synthesize a fix that works
for all inputs.

```
P(x) {
v = ?? // 'synthesize' replaces ?? with x - 2
...
}
assert safe(x, P(x))
```

These double question marks stand for a *hole* in the program that is to be
filled with some expression $e$ drawn from some finite set of expressions, for
example, all arithmetic expressions with at most two arguments. To find this
expression, the runtime applies the solver the formula $\exists e. \forall x.
\mathit{safe}(x, \mathbf{P}_e(x))$, where $\mathbf{P}_e$ stands for the meaning
of the program $P$ when `??`

is replaced with the expression $e$.

#### The solver-aided programming workflow

To recap, in a solver-aided programming language, we use *assertions* and
*symbolic values* to express the specification. And we then pose *queries* about
program behavior on arbitrary inputs with respect to the specification.

### Solver-aided programming constructs

Now that we’ve seen the basic concepts behind solver-aided programming, let’s see how they are realized in the Rosette language. These concepts and many others are explained in detail in the Rosette Guide.

#### Rosette extends Racket with solver-aided constructs

Rosette is based on an existing programming language called Racket, and it extends Racket with constructs for expressing symbolic values, assertions, and queries.

Racket is “a programming language for creating new programming languages”. It is a modern descendent of Scheme and Lisp that is designed for rapid creation of new languages, and includes dedicated facilities that make this possible (e.g., macros).

You tell the Racket system what dialect or language your program is written in
by providing the `#lang`

line at the top of the file. So, Racket programs start
with `#lang racket`

and Rosette programs start with `#lang rosette`

.

You can freely mix programs and libraries written in different languages, but
it’s best not to mix Racket and Rosette modules until you’ve mastered the
basics. Another good practice for beginners is to write all solver-aided
programs in `#lang rosette/safe`

.

### Rosette symbolic values, assertions, and queries

So let’s see how we create symbolic values, assertions, and queries in Rosette.

#### Creating primitive symbolic values (demo)

Rosette provides two constructs for creating symbolic constants:
`define-symbolic`

and `define-symbolic*`

.

The construct `(define-symbolic id type)`

creates a fresh symbolic constant of
the given `type`

and binds it to the variable `id`

. The `type`

can be boolean,
integer, real, bitvector, or an uninterpreted function over these.

For example:

```
> (define-symbolic x integer?)
> (+ 1 x 2 3)
(+ 6 x)
```

The above code creates a value of type `integer?`

and binds it to the variable `x`

.

Symbolic values of a given type can be used just like concrete values of that
type; here, we’ve applied the plus operator to `x`

and some concrete integers.

One important feature of `define-symbolic`

is that `id`

is bound to the *same*
symbolic constant every time `define-symbolic`

is evaluated, as shown below:

```
> (define (same-x)
(define-symbolic x integer?)
x)
> (same-x)
x
> (same-x)
x
> (eq? (same-x) (same-x))
#t
```

The construct `define-symbolic*`

also creates symbolic values and binds them to variables, but it binds the identifier `id`

to a *different* symbolic constant every time the form is evaluated, as shown below.

```
> (define (new-x)
(define-symbolic* x integer?)
x)
> (new-x)
x$0
> (new-x)
x$1
> (eq? (new-x) (new-x))
(= x$2 x$3)
```

#### Creating complex symbolic values (demo)

Both `define-symbolic`

and `define-symbolic*`

create symbolic values of primitive types, which Rosette calls *solvable types*.

But we can also create values of complex types, such as lists, by using primitive symbolic constants to determine the contents and shape of complex values. For example:

```
> (define-symbolic* xs integer? [4])
> xs
(list xs$0 xs$1 xs$2 xs$3)
> (define-symbolic* len integer?)
> (take xs len)
{[(= 0 len$0) ()]
[(= 1 len$0) (xs$0)]
[(= 2 len$0) (xs$0 xs$1)]
[(= 3 len$0) (xs$0 xs$1 xs$2)]}
```

The first line creates a list of 4 symbolic integers; this is just a short-hand for evaluating `define-symbolic*`

4 times and collecting the results into a list.

The fourth and fifth line are more interesting: they create a symbolic list of length up to 4, consisting of symbolic integers.

So, by using symbolic constants as the list contents and to determine the list length, we have created a symbolic value that represents the infinite set of *all* lists of 0-4 integers.

`assert`

(demo)

The form `(assert expr)`

checks that `expr`

evaluates to a true value, e.g.:

```
> (assert (>= 2 1)) ; passes
> (assert (< 2 1)) ; fails
assert: failed
```

Note that in Racket, any value other that `#f`

is considered true, similar to C.

When `expr`

is symbolic though, instead of being immediately evaluated, the resulting value gets added to the assertion store. Its meaning (true or false) is eventually determined by the solver in response to queries.

```
> (define-symbolic* x integer?)
> (assert (>= x 1))
> (asserts)
(list (<= 1 x$0) ...)
```

Now that we know how to create symbolic values and assertions, let’s see how to issue solver-aided queries.

`verify`

(demo)

Consider the following program:

```
#lang rosette
(define (poly x)
(+ (* x x x x) (* 6 x x x) (* 11 x x) (* 6 x)))
(define (fact x)
(* x (+ x 1) (+ x 2) (+ x 2)))
(define (same p f x)
(assert (= (p x) (f x))))
(same poly fact 0) ; pass
(same poly fact -1) ; pass
(same poly fact -2) ; pass
```

The function `poly`

implements a polynomial and `fact`

is a factorization of
`poly`

. We can run a few test cases, on the zeros of the polynomial to see that
it does the right thing.

But do `poly`

and `fact`

produce the same output on all infinitely many inputs?

To determine this, we use the `(verify expr)`

query, which searches for a
binding of symbolic constants to concrete values that causes at least one
assertion in `expr`

to fail. (Note that this means `expr`

should contain some
assertions or verification will trivally succeed!)

Here is how we issue the query for our example:

```
> (define-symbolic i integer?)
> (verify (same poly fact i))
(model [i -6])
```

As you can see, the two procedures aren’t equivalent on all inputs, and the solver returns an input, -6, on which they disagree.

We can store bindings, or *models*, in variables and evaluate arbitrary
expressions against them, e.g.:

```
> (define cex (verify (same poly fact i)))
> (evaluate i cex)
-6
> (poly -6)
360
> (fact -6)
480
```

Another important thing to note is that the assertions encountered while
evaluating `expr`

are removed from the asserts store once a query (such as
`verify`

) completes.

```
> (asserts)
(list)
```

`debug`

(demo)

Why do `poly`

and `fact`

output different values on the input -6?

To find out, we issue the `(debug [type ...+] expr)`

query, which searches for a
minimal set of expressions of the given `type`

s that cause the evaluation of
`expr`

to fail.

To use `debug`

, require the debugging libraries, mark `fact`

as the candidate
for debugging, *save* the module to a file, and issue a `debug`

query.

```
(require rosette/query/debug rosette/lib/render)
(define (poly x)
(+ (* x x x x) (* 6 x x x) (* 11 x x) (* 6 x)))
(define/debug (fact x)
(* x (+ x 1) (+ x 2) (+ x 2)))
(define (same p f x)
(assert (= (p x) (f x))))
> (render ; visualize the result
(debug [integer?]
(same poly fact -6)))
```

```
(define/debug (fact x)
(*
```

x `(+ x 1)`

(+ x 2) (+ x 2) `))`

Rosette will print all the `define/debug`

forms, using red to highlight the buggy expressions. Everything that’s grayed out cannot be used to repair the program on the input -6. That is, whatever expressions we wrote instead of the non-highlighted expressions, the result would still be incorrect. To repair the program, we therefore need to change at least one of the highlighted expressions.

`solve`

(demo)

Let’s see if we can repair `fact`

on the input -6 as suggested by `debug`

.

To do so, we issue the `(solve expr)`

query, which searches for a binding of
symbolic constants to concrete values that causes all assertions in `expr`

to
pass. (Note that this dual of verification.)

```
(define (poly x)
(+ (* x x x x) (* 6 x x x) (* 11 x x) (* 6 x)))
(define (fact x)
(define-symbolic* c1 c2 c3 integer?)
(* (+ x c1) (+ x 1) (+ x c2) (+ x c3)))
(define (same p f x)
(assert (= (p x) (f x))))
> (solve (same poly fact -6))
(model [c1$0 -66] [c2$0 7] [c3$0 7])
```

The solver finds concrete values for `c1`

, `c2`

, and `c3`

that work for the
input -6.

Can we repair `fact`

on multiple inputs individually? Yes, we just solve `same`

for multiple inputs:

```
> (solve
(begin
(same poly fact -6)
(same poly fact 12)))
(model [c1$1 -66] [c2$1 7] [c3$1 7]
[c1$2 2508] [c2$2 -11] [c3$2 -11])
```

Note the behavior of `define-symbolic*`

here: we are getting separate `c1`

,
`c2`

, and `c3`

values for each of the two inputs.

But can we repair `fact`

on multiple inputs simultaneously? Yes! Just replace
`define-symbolic*`

with `define-symbolic`

:

```
(define (fact x)
(define-symbolic c1 c2 c3 integer?)
(* (+ x c1) (+ x 1) (+ x c2) (+ x c3)))
> (solve
(begin
(same poly fact -6)
(same poly fact 12)))
(model [c1 2] [c2 3] [c3 0])
```

So, now we have some constants that work for both of these inputs. Can we find some constants that work on all inputs?

`synthesize`

(demo)

To find out, we issue the query `(synthesize `

`#:forall expr`

`#:guarantee expr)`

,
which searches for a binding that causes all assertions in `#:guarantee expr`

to
pass for all bindings of the symbolic constants in the `#:forall expr`

.

The solver finds concrete values for `c1`

, `c2`

, and `c3`

that work for every input `i`

.

```
(define (poly x)
(+ (* x x x x) (* 6 x x x) (* 11 x x) (* 6 x)))
(define (fact x)
(define-symbolic c1 c2 c3 integer?)
(* (+ x c1) (+ x 1) (+ x c2) (+ x c3)))
(define (same p f x)
(assert (= (p x) (f x))))
> (define-symbolic i integer?)
> (synthesize
#:forall i
#:guarantee (same poly fact i)))
(model [c1 3] [c2 0] [c3 2])
```

To generate code instead of a model (demo), require the sketching library, *save* the module to a file, and issue a synthesize query.

```
(require rosette/lib/synthax)
(define (poly x)
(+ (* x x x x) (* 6 x x x) (* 11 x x) (* 6 x)))
(define (fact x)
(* (+ x (??)) (+ x 1) (+ x (??)) (+ x (??))))
(define (same p f x)
(assert (= (p x) (f x))))
> (define-symbolic i integer?)
> (print-forms
(synthesize
#:forall i
#:guarantee (same poly fact i)))
'(define (fact x) (* (+ x 3) (+ x 1) (+ x 0) (+ x 2)))
```

### Rosette gotchas

So, these are the basics. We’ll now briefly review three most common problems people encounter when programming in Rosette.

#### Reasoning precision (demo)

Reasoning precision determines if integers and reals are approximated using $k$-bit words or treated as infinite-precision values.

It’s controlled by setting `current-bitwidth`

to an integer $k > 0$ or `#f`

for
approximate or precise reasoning, respectively.

By default, `current-bitwidth`

is set to `#f`

, giving the expected semantics.

```
> (current-bitwidth)
#f
> (define-symbolic x integer?)
> (solve (assert (= x 64)))
(model [x 64])
> (verify (assert (not (= x 64))))
(model [x 64])
```

What happens when we set `current-bitwidth`

to 5?

```
> (current-bitwidth 5)
> (solve (assert (= x 64)))
(model [x 0])
> (verify (assert (not (= x 64))))
(model [x 0])
```

The result is 0 because 64 is equal to 0 in 5-bit precision!

#### Unbounded loops

Another common source of confusion is the behavior of loops and recursion in the presence of symbolic values.

Specifically, loops and recursion must be bounded (aka *self-finitizing*) by
concrete termination conditions, or upper bounds on the size of iterated
(symbolic) data structures.

Unbounded loops and recursion run forever.

##### Self-finitizing code (demo)

Here is an example of self-finitizing code:

```
(define (search x xs)
(cond
[(null? xs) #f]
[(equal? x (car xs)) #t]
[else (search x (cdr xs))]))
> (define-symbolic xs integer? [5])
> (define-symbolic xl i integer?)
> (define ys (take xs xl))
> (verify
(when (<= 0 i (- xl 1))
(assert (search (list-ref ys i) ys))))
(unsat)
```

The procedure `search`

takes as input a value and a list, and returns true if the list contains the value; otherwise returns false.

The code terminates because `search`

iterates over a bounded structure.

##### Unbounded recursion (demo)

Now consider the following example:

```
(define (factorial n)
(cond [(= n 0) 1]
[else (* n (factorial (- n 1)))]))
> (factorial 3)
6
> (factorial 4)
24
> (define-symbolic k integer?)
> (solve (assert (> (factorial k) 10)))
```

The above query runs forever because the termination of `factorial`

termination
depends on `k`

, which is symbolic.

To fix this (demo), we will artifically bound the maximum number of iterations of
`factorial`

using a concrete guard:

```
(define (factorial n g)
(assert (>= g 0))
(cond
[(= n 0) 1]
[else (* n (factorial (- n 1) (- g 1))]))
> (define-symbolic k integer?)
> (solve (assert (> (factorial k 3) 10)))
(unsat)
```

Now we get `unsat`

because there is no way to satisfy the assertion with the
bound of 3 (or less).

Repeating the same query with a larger bound produces a model:

```
> (solve (assert (> (factorial k 4) 10)))
(model [k 4])
```

#### Unsafe features (demo)

The final pitfall is the use of so-called *unsafe features*.

Rosette *lifts* only a core subset of Racket to operate on symbolic values.
This includes all constructs in `#lang rosette/safe`

Unlifted constructs can be used in `#lang rosette`

but require care: the
programmer must determine when it is okay for symbolic values to flow to
unlifted code.

For example, vectors are lifted, and they behave correctly in the presence of symbolic values:

```
> (define v (vector 1 2))
> (define-symbolic k integer?)
> (vector-ref v k)
(ite* (⊢ (= 0 k) 1) (⊢ (= 1 k) 2)))
```

But hash tables aren’t lifted, and they behave incorrectly in the presence of symbolic values.

```
> (define h (make-hash '((0 . 1)(1 . 2))))
> (hash-ref h k)
hash-ref: no value found for key
key: k
> (hash-set! h k 3)
> (hash-ref h k)
3
```

These gotchas are easy to avoid by staying in the safe fragment of Rosette,
`#lang rosette/safe`

, when you’re starting out. But as you get familiar with the
programming model and idioms, you can also avoid them in `#lang rosette`

programs, while benefiting from the full power of Racket constructs and
libraries.