The problem. Formal verification is a promising alternative to software testing, the current method of choice for software quality
assurance. Unlike testing---which does not scale because new test cases must be created as the code grows---verification can check the
same correctness property (e.g., whether each acquired lock is eventually released) against the entire codebase. Even more importantly,
verification examines all execution paths of the program, providing a guarantee that the program is entirely free of the
specified type of error.
Recent research results promise that practical verifiers will soon be available. These tools, however, will be of little use without enough
correctness specifications to be verified. Where these specifications will come from is an open question: While it may be economical to
formulate broadly applicable language-specific properties (e.g., a dereferenced pointer must not be null), complex properties that are
highly specific to a single program (e.g., how to rotate a red-black tree) may never be formalized, partly due to programmers'
reluctance to embrace formal methods, and partly because there are too many properties to be specified. Consequently, we expect that
software verification will be widely adopted only if we develop techniques that will reduce the cost of formulating specifications.
Our approach. To supply future verifiers with useful non-trivial properties, we are developing a (semi-)automatic machine-learning
approach to discovering formal specifications of the protocols that code must obey when interacting with an application programming interface
(API) or abstract data type (ADT). Ours is the first system capable of learning rather general protocols such as ``A socket x may be read only after it was accept-ed from a socket y that had first been bind-ed and then listen-ed.'' In general, our specifications encode legal partial orders of API calls, together
with data-flow constraints on their arguments, encoded as (potentially cyclic) state machines.
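To make the encoding concrete, the socket protocol above can be sketched as a small state machine. This is a toy illustration only (the state names and the close transition are our assumptions, not part of the mined specification), and the data-flow constraints on the socket arguments are indicated only in comments:

```python
# Hypothetical sketch of the mined socket protocol as a state machine.
# States track protocol stages; transitions are API calls. The cycle on
# "read" makes the machine (potentially) cyclic, as noted in the text.

TRANSITIONS = {
    # (state,       call)        -> next state
    ("start",       "bind"):        "bound",      # bind(y)
    ("bound",       "listen"):      "listening",  # listen(y)
    ("listening",   "accept"):      "accepted",   # x = accept(y)
    ("accepted",    "read"):        "accepted",   # read(x); may repeat
    ("accepted",    "close"):       "closed",     # close(x) (assumed)
}

def check_trace(calls):
    """Return True iff a sequence of API calls obeys the protocol."""
    state = "start"
    for call in calls:
        key = (state, call)
        if key not in TRANSITIONS:
            return False  # violation, e.g. read before accept
        state = TRANSITIONS[key]
    return True

# check_trace(["bind", "listen", "accept", "read", "read"])  -> True
# check_trace(["bind", "read"])                              -> False
```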
Our miner follows a (probabilistic) maxim that ``Common behavior is correct behavior.'' Starting from the assumption that a working
program is debugged well enough to reveal strong hints of correct protocols, it infers a specification by observing dynamic program
execution and concisely summarizing the frequent API interaction patterns as state machines. After these state machines are examined
and refined by an expert, they can be used by automatic verification tools to find bugs.
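The summarization step can be illustrated with a minimal sketch (our own toy code, not the actual miner): a set of observed API traces is collapsed into a state machine whose edges record which call was seen to follow which.

```python
from collections import defaultdict

def summarize(traces):
    """Collapse API traces into a state machine: an edge a -> b records
    that call b was observed to follow call a in some trace."""
    edges = defaultdict(set)
    for trace in traces:
        prev = "<start>"
        for call in trace:
            edges[prev].add(call)
            prev = call
    return dict(edges)

traces = [
    ["bind", "listen", "accept", "read"],
    ["bind", "listen", "accept", "read", "read"],  # repeated read: a cycle
]
machine = summarize(traces)
# machine["accept"] == {"read"}; machine["read"] == {"read"} (the cycle)
```

A real miner works on dynamic executions and tracks argument data flow; this sketch keeps only the call-ordering skeleton.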
Specification mining offers several unique benefits.
Specification mining by reduction to learning regular languages. Specification mining is not
data mining: the two learn different properties and require different algorithms. Our
POPL'02 paper formulated the specification mining problem,
and developed an algorithm based on an interesting reduction to the
simpler problem of learning regular languages. To be robust in the presence of bugs in the training set, the algorithm follows two design
decisions: First, to reduce the risk that bugs are learned into the resulting specification, it learns not from the source code
itself but from execution traces, which contain fewer bugs.
Second, to take a ``vote'' on what the common program behavior is, the miner learns a
probabilistic regular language, whose high-probability core encodes the frequently followed protocol. The
approach promises to be robust: even a poorly tuned miner prototype was able to learn specifications that not only captured the correct
protocol, but also discovered serious bugs.
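The ``vote'' can be sketched as follows. This toy code (our assumed names and cutoff, not the published algorithm) estimates transition probabilities from the traces and keeps only the high-probability core, so a rare buggy trace does not pollute the learned specification:

```python
from collections import Counter, defaultdict

def mine_core(traces, cutoff=0.2):
    """Estimate per-state transition probabilities from traces and
    keep only transitions above the cutoff (the high-probability core)."""
    counts = defaultdict(Counter)
    for trace in traces:
        prev = "<start>"
        for call in trace:
            counts[prev][call] += 1
            prev = call
    core = {}
    for state, outs in counts.items():
        total = sum(outs.values())
        core[state] = {c for c, n in outs.items() if n / total >= cutoff}
    return core

# Nine correct traces and one buggy one (read before accept):
traces = [["bind", "listen", "accept", "read"]] * 9 + [["bind", "read"]]
core = mine_core(traces)
# The buggy transition bind -> read has probability 1/10, falls below
# the cutoff, and is pruned: core["bind"] == {"listen"}
```

A trace that disagrees with the specification's core is then a candidate bug, which is how the miner can both capture the correct protocol and expose errors.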