The Problem. It is well known that compile-time analysis of interesting program properties, such as whether two pointers may point to the same object, is in general undecidable, and that existing analyzers
are either scalable or (relatively) accurate, but not both. The long-standing problem of how to accurately analyze large
programs is more important now than ever, mainly due to the needs of software-verification tools that---if supported with good
analyzers---could help us conquer the growing debugging complexity of large software projects.
The quest for a satisfactory analyzer is even more challenging in the run-time setting, where some modern languages like
Java are compiled. Because run-time analysis is subject to stringent time constraints, whole-program analysis is typically avoided, which is one
reason why current Java VMs lack an accurate pointer analysis and other tools indispensable for effective compilation. As a result, these poorly compiled yet safe modern languages remain less attractive than the error-prone C/C++.
Our Approach. A well-established approach for scaling an accurate analyzer has been to come up with smart
decompositions of the problem. Since this strategy has been only partially successful in
reducing the analysis time, our project follows the opposite approach: rather than scaling an accurate analyzer, we improve the accuracy of an already scalable method by selectively refining its results with another, more expensive analysis. The key requirement is to invoke the expensive analysis only when the scalable one fails to provide an accurate answer (i.e., fails to guarantee either a yes or a no answer), and to apply it only to the program statements that made it fail. Unfortunately, finding these
failure statements is an open problem, as is the problem of how to paraphrase the
original query in terms of these statements.
We seek to solve both problems with refinement assertions, which are synthesized by the scalable analysis when it fails, to be
answered by the expensive analysis. A refinement assertion is an expression (over the failure statements or their variables) whose value
answers the original query. For example, when asked ``can p ever be null?'', the scalable analysis may answer: I don't know, but if the statement 'r:=t' is never executed (the refinement assertion), then p is indeed never null. Depending on the setting, a refinement assertion can be exploited in one of two ways: it can be discharged statically, by handing it to the expensive analysis, or it can be verified dynamically, by turning it into an inexpensive run-time check, as our ABCD analyzer does.
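To make this concrete, the hypothetical Java fragment below recreates the query above. The choice of a flow-insensitive, unification-based pointer analysis as the scalable analysis, and all names in the code, are assumptions made purely for illustration.

    // Hypothetical illustration of a refinement assertion; the scalable
    // analysis is assumed to be flow-insensitive and unification-based.
    class RefinementAssertionExample {
        static void client(boolean rare, Object t) {   // t may be null
            Object p = new Object();                   // p is clearly non-null here
            Object r = p;                              // the copy merges the facts for p and r
            if (rare) {
                r = t;                                 // failure statement 'r := t': it injects a
                                                       // possibly-null value into the merged facts
            }
            use(p);                                    // query: can p ever be null here?
            // The scalable analysis must answer "I don't know"; the refinement assertion
            // "the statement 'r := t' is never executed" can then be discharged by the
            // expensive analysis, or, in the run-time setting, by a run-time check.
        }

        static void use(Object o) { if (o != null) o.hashCode(); }
    }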
ABCD: Eliminating Array-Bounds Checks on Demand. ABCD, presented
in our PLDI'00
paper, is our first assertion analyzer, working in the run-time setting. Its goal is to help efficiently
implement type-safe languages, by removing run-time tests of array-bounds violations---one of the largest overheads mandated by
type safety. Determining if a bounds test is unnecessary is expensive because a large fraction of the program may need to be examined. However,
because it synthesizes a run-time check whenever the analysis would have to ``crawl'' too far, ABCD is largely insensitive to the program size. For example, to remove a test i<length that is frequently executed in a loop, ABCD may need to know whether n<length holds at the procedure entry; rather than proving this assertion via analysis, ABCD, guided by a program profile, turns the assertion into a run-time check that has a small execution cost and saves a lot of analysis time.
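The sketch below shows the effect of this transformation at the source level; ABCD itself operates inside the JIT compiler on its intermediate representation, so the method name and the exact code shape are illustrative assumptions. The synthesized check n<a.length guards a fast path in which every access is provably in bounds:

    // Illustrative sketch of turning the assertion "n < a.length" into a
    // single run-time check (shown at the source level for readability).
    static void fill(int[] a, int n) {
        if (n < a.length) {
            // Fast path: the check gives n < a.length and the loop gives i <= n,
            // so every a[i] is in bounds and the per-iteration test is redundant.
            for (int i = 0; i <= n; i++) {
                a[i] = 0;
            }
        } else {
            // Slow path: the original, fully checked loop; it throws
            // ArrayIndexOutOfBoundsException exactly where the original would.
            for (int i = 0; i <= n; i++) {
                a[i] = 0;
            }
        }
    }

The inserted check costs one comparison per call, whereas the test it removes would otherwise execute on every iteration of the hot loop.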
With its combination of efficiency, effectiveness, and simple implementation, ABCD is likely the best array-test optimizer for
virtual machines. In contrast to the conventional wisdom that a powerful bounds-test optimizer requires a powerful theorem prover, ABCD relies
on a simple traversal of a sparse program representation; to analyze one test, it visits only about 10 nodes. Despite its
simplicity, ABCD is surprisingly powerful: it can remove about 85% of all executed bounds tests, which is nearly all of the tests that a compiler could remove.
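The traversal can be pictured as a demand-driven search over a weighted inequality graph derived from the sparse representation: an edge from u to v with weight c records the constraint v <= u + c, and a test i<length is redundant if some derivation in the graph yields i <= length-1. The Java sketch below is a deliberately simplified rendition of this idea; it omits ABCD's handling of phi-nodes and cycles, and all class and variable names are illustrative.

    import java.util.*;

    // Simplified, illustrative sketch of a demand-driven traversal over an
    // inequality graph. An edge u --c--> v encodes the fact v <= u + c; the
    // bounds test i < a.length is redundant if there is a path from the node
    // for a.length to the node for i whose total weight is at most -1
    // (i.e., i <= a.length - 1). Phi-nodes and cycles are not handled here.
    class InequalityGraph {
        // adjacency list: for node u, a map from each successor v to the edge weight c
        private final Map<String, Map<String, Integer>> edges = new HashMap<>();

        void addEdge(String u, String v, int c) {
            edges.computeIfAbsent(u, k -> new HashMap<>()).put(v, c);
        }

        // Is there a path from 'from' to 'to' whose total weight is at most 'bound'?
        boolean provesAtMost(String from, String to, int bound) {
            return dfs(from, to, 0, bound, new HashSet<>());
        }

        private boolean dfs(String u, String target, int weight, int bound, Set<String> onPath) {
            if (u.equals(target)) return weight <= bound;
            if (!onPath.add(u)) return false;                  // do not revisit a node on this path
            for (Map.Entry<String, Integer> e : edges.getOrDefault(u, Map.of()).entrySet()) {
                if (dfs(e.getKey(), target, weight + e.getValue(), bound, onPath)) return true;
            }
            onPath.remove(u);                                  // backtrack
            return false;
        }

        public static void main(String[] args) {
            // for (i = 0; i <= n; i++) a[i] = 0;  guarded by the run-time check n < a.length
            InequalityGraph g = new InequalityGraph();
            g.addEdge("a.length", "n", -1);   // n <= a.length - 1   (the run-time check n < a.length)
            g.addEdge("n", "i", 0);           // i <= n              (the loop bound i <= n)
            // the test on a[i] is redundant iff i <= a.length - 1:
            System.out.println(g.provesAtMost("a.length", "i", -1));   // prints: true
        }
    }

On such a graph the search touches only the few nodes that the queried test actually depends on, which is what keeps the per-test cost down to a handful of node visits.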