The Problem. It is well known that compile-time analysis of interesting program properties, such as whether two pointers may point to the same object, is in general undecidable, and that existing analyzers
are either scalable or (relatively) accurate, but not both. The long-standing problem of how to accurately analyze large
programs is more important now than ever, mainly due to the needs of software-verification tools that---if supported with good
analyzers---could help us conquer the growing debugging complexity of large software projects.
The quest for a satisfactory analyzer is even more challenging in the run-time setting, where some modern languages like
Java are compiled. Because run-time analysis is subject to stringent time constraints, whole-program analysis is typically avoided, which is one
reason why current Java VMs lack an accurate pointer analysis and other tools indispensable for effective compilation. As a result, these poorly compiled yet safe modern languages remain less attractive than the error-prone C/C++.
Our Approach. A well-established approach for scaling an accurate analyzer has been to come up with smart
decompositions of the problem. Since this strategy has been only partially successful in
reducing the analysis time, our project follows the opposite approach: rather than scaling an accurate analyzer, we improve the accuracy of an already scalable method by selectively refining its results with another, more expensive analysis. The key requirement is to invoke the expensive analysis only when the scalable one fails to provide an accurate answer (i.e., fails to guarantee either a yes or a no answer), and to apply it only to the program statements that made it fail. Unfortunately, finding these
failure statements is an open problem, as is the problem of how to paraphrase the
original query in terms of these statements.
We seek to solve both problems with refinement assertions, which are synthesized by the scalable analysis when it fails, to be
answered by the expensive analysis. A refinement assertion is an expression (over the failure statements or their variables) whose value
answers the original query. For example, when asked ``can p ever be null?'', the scalable analysis may answer: I don't know, but if the statement 'r:=t' is never executed (the refinement assertion), then p is indeed never null. Depending on the setting, a refinement assertion can be exploited in one of two ways: it can be discharged statically, by handing it to the expensive analysis, or it can be verified dynamically, by turning it into an inexpensive run-time check, as our ABCD analyzer does.
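To make this concrete, the hypothetical Java fragment below recreates the query above. The choice of a flow-insensitive, unification-based pointer analysis as the scalable analysis, and all names in the code, are assumptions made purely for illustration.

    // Hypothetical illustration of a refinement assertion; the scalable
    // analysis is assumed to be flow-insensitive and unification-based.
    class RefinementAssertionExample {
        static void client(boolean rare, Object t) {   // t may be null
            Object p = new Object();                   // p is clearly non-null here
            Object r = p;                              // the copy merges the facts for p and r
            if (rare) {
                r = t;                                 // failure statement 'r := t': it injects a
                                                       // possibly-null value into the merged facts
            }
            use(p);                                    // query: can p ever be null here?
            // The scalable analysis must answer "I don't know"; the refinement assertion
            // "the statement 'r := t' is never executed" can then be discharged by the
            // expensive analysis, or, in the run-time setting, by a run-time check.
        }

        static void use(Object o) { if (o != null) o.hashCode(); }
    }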
ABCD: Eliminating Array-Bounds Checks on Demand. ABCD, presented
in our PLDI'00
paper, is our first assertion analyzer, working in the run-time setting. Its goal is to help efficiently
implement type-safe languages, by removing run-time tests of array-bounds violations---one of the largest overheads mandated by
type safety. Determining if a bounds test is unnecessary is expensive because a large fraction of the program may need to be examined. However,
because it synthesizes a run-time check whenever the analysis would have to ``crawl'' too far, ABCD is largely insensitive to the program size. For example, to remove a test i<length that is frequently executed in a loop, ABCD may need to know whether n<length holds at the procedure entry; rather than proving this assertion via analysis, ABCD, guided by a program profile, turns the assertion into a run-time check that has a small execution cost and saves a lot of analysis time.
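The sketch below shows the effect of this transformation at the source level; ABCD itself operates inside the JIT compiler on its intermediate representation, so the method name and the exact code shape are illustrative assumptions. The synthesized check n<a.length guards a fast path in which every access is provably in bounds:

    // Illustrative sketch of turning the assertion "n < a.length" into a
    // single run-time check (shown at the source level for readability).
    static void fill(int[] a, int n) {
        if (n < a.length) {
            // Fast path: the check gives n < a.length and the loop gives i <= n,
            // so every a[i] is in bounds and the per-iteration test is redundant.
            for (int i = 0; i <= n; i++) {
                a[i] = 0;
            }
        } else {
            // Slow path: the original, fully checked loop; it throws
            // ArrayIndexOutOfBoundsException exactly where the original would.
            for (int i = 0; i <= n; i++) {
                a[i] = 0;
            }
        }
    }

The inserted check costs one comparison per call, whereas the test it removes would otherwise execute on every iteration of the hot loop.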
With its combination of efficiency, effectiveness, and simple implementation, ABCD is likely the best array-test optimizer for
virtual machines. In contrast to the conventional wisdom that a powerful bounds-test optimizer requires a powerful theorem prover, ABCD relies
on a simple traversal of a sparse program representation; to analyze one test, it visits only about 10 nodes. Despite its
simplicity, ABCD is surprisingly powerful: it can remove about 85% of all executed bounds tests, which is nearly all of the tests that a compiler could remove.
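The traversal can be pictured as a demand-driven search over a weighted inequality graph derived from the sparse representation: an edge from u to v with weight c records the constraint v <= u + c, and a test i<length is redundant if some derivation in the graph yields i <= length-1. The Java sketch below is a deliberately simplified rendition of this idea; it omits ABCD's handling of phi-nodes and cycles, and all class and variable names are illustrative.

    import java.util.*;

    // Simplified, illustrative sketch of a demand-driven traversal over an
    // inequality graph. An edge u --c--> v encodes the fact v <= u + c; the
    // bounds test i < a.length is redundant if there is a path from the node
    // for a.length to the node for i whose total weight is at most -1
    // (i.e., i <= a.length - 1). Phi-nodes and cycles are not handled here.
    class InequalityGraph {
        // adjacency list: for node u, a map from each successor v to the edge weight c
        private final Map<String, Map<String, Integer>> edges = new HashMap<>();

        void addEdge(String u, String v, int c) {
            edges.computeIfAbsent(u, k -> new HashMap<>()).put(v, c);
        }

        // Is there a path from 'from' to 'to' whose total weight is at most 'bound'?
        boolean provesAtMost(String from, String to, int bound) {
            return dfs(from, to, 0, bound, new HashSet<>());
        }

        private boolean dfs(String u, String target, int weight, int bound, Set<String> onPath) {
            if (u.equals(target)) return weight <= bound;
            if (!onPath.add(u)) return false;                  // do not revisit a node on this path
            for (Map.Entry<String, Integer> e : edges.getOrDefault(u, Map.of()).entrySet()) {
                if (dfs(e.getKey(), target, weight + e.getValue(), bound, onPath)) return true;
            }
            onPath.remove(u);                                  // backtrack
            return false;
        }

        public static void main(String[] args) {
            // for (i = 0; i <= n; i++) a[i] = 0;  guarded by the run-time check n < a.length
            InequalityGraph g = new InequalityGraph();
            g.addEdge("a.length", "n", -1);   // n <= a.length - 1   (the run-time check n < a.length)
            g.addEdge("n", "i", 0);           // i <= n              (the loop bound i <= n)
            // the test on a[i] is redundant iff i <= a.length - 1:
            System.out.println(g.provesAtMost("a.length", "i", -1));   // prints: true
        }
    }

On such a graph the search touches only the few nodes that the queried test actually depends on, which is what keeps the per-test cost down to a handful of node visits.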