Quala: Custom Type Systems for Clang

August 17, 2014

Quala is an open-source research tool I’ve been working on to implement pluggable type systems for C and C++. The idea is to make it easy for researchers to add new type qualifiers and typing rules to Clang as plugins, without getting their hands dirty by modifying Clang itself. In the same way that loadable LLVM passes can extend the compiler with optimizations and analyses, pluggable Clang type systems should make it possible to prototype ideas that need compiler–programmer collaboration.

Catching Null-Pointer Dereferences

Quala is inspired by UW’s Checker Framework for Java, so it seemed appropriate to try re-implementing a simplistic port of its flagship nullness checker. The nullness type system for Quala helps avoid confusion between pointers that always point to something and pointers that can “meaningfully” be null—and which therefore have to be checked before they can be dereferenced. As the CF docs put it, a program that emits no warnings is guaranteed never to dereference a null pointer.

By default, all pointers are considered non-null and the type system emits a warning whenever you try to nullify one:

int *p = 0;  // WARNING
int *q = nullptr;  // EXTRA-SPECIAL C++11 WARNING

To allow a pointer to be null, you have to mark it as nullable:

int * NULLABLE r = 0;  // ok
p = r;  // WARNING

That NULLABLE qualifier is a little macro that expands to a type-annotation attribute recognized by my small patch to Clang:

#define NULLABLE __attribute__((type_annotate("nullable")))

That type_annotate syntax extension, along with Quala’s fledgling type-checker support library, makes it easy to implement the typing rules in a plugin that the programmer can enable with a few compiler flags.

The nullness type system highlights the contrast between type-qualifier annotations and simpler declaration annotations. The Clang analyzer also has a null-pointer dereference analysis—Quala’s nullness system can be seen as a sound alternative to the Clang analyzer’s heuristic. Both Clang and GCC have a nonnull attribute, but it’s awkward and limited: it can only appear on declarations, not on types. For example, there’s no way to distinguish which level of pointerdom in an int** value is non-null. In Quala, you can even use annotations in typedefs:

typedef int * NULLABLE int_ptr_or_null;

Quala’s “real” type annotations can improve the annotation systems that Clang already provides in simpler forms. This presentation from 2011, for example, wished for them in the design of Clang’s thread safety analysis annotations.

Coming Soon

This is only the first milestone for Quala. The example type systems—nullness and another example demonstrating information flow tracking—are really just toys at this point. There’s a lot more work to come.

Qualifiers in the IR

The reason I’m writing Quala—rather than just using the Checker Framework or even Cqual—is to connect type annotations to LLVM IR. For many research projects I want to do, I need serious compiler analysis and instrumentation driven by custom types: tasks for which AST-level tools are nightmarish debacles. Representing types in a friendly, regular compiler IR like LLVM’s will make this kind of work sane.

The nullness system, for example, could benefit from type-directed automatic insertion of null checks. With an LLVM compiler pass that adds a run-time check wherever a nullness warning is emitted, a programmer could conservatively eliminate all undefined behavior due to null dereferences. This kind of instrumenation would be invasive and fragile at the source or AST level but is trivial at the LLVM IR level—provided type information is available.

Data-Flow Type Rules

One important Checker Framework feature worth replicating is flow-sensitive type inference. It’s crucial for the nullness checker to avoid obvious cases where warnings should not be emitted. Namely, when you use an explicit null check:

int * NULLABLE p;
...
if (p) {
    *p = 3;  // fine!
}

the type system should not complain. It should be able to refine the type of p to non-nullable inside the true branch of the if.

Since rules like this depend on data-flow properties, they seem well-suited to implementation at the LLVM IR level. This presents an interesting challenge: how should an AST-level type checker employ facts from an IR-level analysis?

C and C++ Misfortunes

Even the tiny prototype systems I’ve implemented so far have needed to grapple with unfortunate, essential complexities in C and C++. For example, arbitrarily deep chains of pointers make subtyping relationships more complicated than they are in a pointer-free language like Java. C++ references further muddy the waters. Clearly, binding a reference to a nullable pointer to a non-null pointer is unsound:

int *q;
int * NULLABLE &p = q;
p = NULL;  // mutation ruins everything

but binding a const reference is probably safe. C++ template parameters pose another practical challenge (as others have also noticed). So does overloading. This long tail of language complexity will only be addressed by an equally long journey of iteration.