This is a synopsis of Daniel's presentation of his C/C++ analysis tools.
You have an analysis and you want to apply it to real-world software, much of which is written in C or C++.
You download the project you want to analyze from the internet. Your first problem is that you need the pre-processed .i files, not the .c or .cc files; the solution to this is to use our tool "build interceptor".
Second, you need to parse and typecheck C or C++; the solution to this is to use "Elsa". If you want a dataflow analysis, a client project of Elsa is "Oink". Oink also provides a linker-imitator for whole-program analysis.
Oink is organized as a framework in which many analyses collaborate. If you write your analysis in two parts, 1) one that annotates the AST and types, and 2) one that draws conclusions from those annotations, then your analysis can be combined with others in the same framework: each is run in turn, and conclusions can then be drawn from the annotations of both. You can even have your analysis shipped with the Oink distribution.
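The two-part organization can be pictured with a toy example. The sketch below is not Oink's API: the Node class, the annotation keys, and both passes are hypothetical names invented for illustration. It only shows the shape of the idea: two independent passes annotate the same tree, and a separate step draws a conclusion that needs the annotations of both.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy AST node (hypothetical); `notes` holds annotations by key."""
    kind: str
    name: str = ""
    children: list = field(default_factory=list)
    notes: dict = field(default_factory=dict)


def walk(node):
    """Yield a node and then all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


# Part 1 of two independent analyses: each one only annotates the tree.
def annotate_calls(root):
    for n in walk(root):
        if n.kind == "call":
            n.notes["callee"] = n.name


def annotate_derefs(root):
    for n in walk(root):
        if n.kind == "deref":
            n.notes["deref"] = True


# Part 2: a conclusion that reads the annotations left by both passes.
def conclude(root):
    """Return the names of statements that both call free and dereference."""
    findings = []
    for stmt in walk(root):
        if stmt.kind != "stmt":
            continue
        subtree = list(walk(stmt))
        frees = any(n.notes.get("callee") == "free" for n in subtree)
        derefs = any(n.notes.get("deref") for n in subtree)
        if frees and derefs:
            findings.append(stmt.name)
    return findings


def example():
    program = Node("block", children=[
        Node("stmt", "s1", [Node("call", "free"), Node("deref", "p")]),
        Node("stmt", "s2", [Node("call", "malloc")]),
        Node("stmt", "s3", [Node("deref", "q")]),
    ])
    annotate_calls(program)   # first analysis runs
    annotate_derefs(program)  # second analysis runs in turn
    return conclude(program)
```

Because neither pass reads the other's annotations, they can be run in either order, and a third pass could be added without touching the first two.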
Third, you need to debug your analysis on inputs that are often large. This usually involves minimizing a large input that exhibits a bug in your analysis. Our tool "delta" helps greatly with this.
See http://www.cs.berkeley.edu/~dsw/ for build interceptor, oink and delta. See http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elsa/ for Elsa.
Build Interceptor captures the .i files of any project while it is
built from source using the gcc tool-chain. Anyone who has tried this
on a large scale knows that it is non-trivial to build a project from
source and obtain the .i files generated during the build process. I
give step-by-step instructions on how to use the provided scripts to
do this without *any* modification to the build process of the
project you are trying to capture.
These scripts were used to capture the build process of 92.5% of the projects in the Red Hat Linux 7.3 distribution, resulting in the RH7.3.i dataset. This dataset is handy for getting started on debugging your analysis without having to first use build interceptor to capture data.
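To make concrete what "obtaining the .i file" means for a single compile step, here is a small sketch. It is not how Build Interceptor itself works; it only rewrites one gcc command line so that gcc stops after preprocessing, using the standard -E flag. The function name preprocess_command is made up for this example, and the sketch assumes the simple "-c file -o file.o" command shape (it does not handle forms such as "-ofile.o" or several sources in one command).

```python
import os


def preprocess_command(argv):
    """Rewrite a gcc compile command so it emits preprocessed output.

    Hypothetical helper: drops -c and the original -o target, then asks
    gcc to stop after preprocessing (-E) and write a .i file (or .ii,
    the suffix gcc conventionally uses for preprocessed C++).
    """
    out = []
    source = None
    skip_next = False
    for arg in argv:
        if skip_next:          # this is the argument of the original -o
            skip_next = False
            continue
        if arg == "-c":        # "compile only" is replaced by -E below
            continue
        if arg == "-o":
            skip_next = True
            continue
        if arg.endswith((".c", ".cc", ".cpp")):
            source = arg
        out.append(arg)
    if source is None:
        raise ValueError("no C or C++ source file in command")
    stem, ext = os.path.splitext(source)
    target = stem + (".i" if ext == ".c" else ".ii")
    return out + ["-E", "-o", target]


def example():
    return preprocess_command(
        ["gcc", "-O2", "-Iinclude", "-c", "foo.c", "-o", "foo.o"])
```

Doing this by hand for one file is easy; the difficulty described above is doing it for every compile step of a large build without changing the build itself, which is the part the provided scripts take care of.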
Elsa is a C and C++ parser. It is based on the Elkhound parser
generator. It lexes and parses input C/C++ code into an abstract
syntax tree. It does some type checking, in the interest of
elaborating the meaning of constructs, but it does not (yet?) reject
all invalid programs. The only major C++ features still not
implemented are namespaces and template partial specializations.
Oink is a collection of composable backends for the Elsa C and C++ front-end. Oink computes both 1) expression-level and type-level dataflow, and 2) statement-level intra-procedural control-flow [by delegating to Elsa]. Oink also comes with a client of the dataflow analysis that does type qualifier inference: Cqual++, a C/C++ front-end for Cqual. Whole-program analyses may be attempted using the linker imitator.
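One way to picture qualifier inference on top of a dataflow graph is as reachability: if a value marked with one qualifier can flow to a place that requires the opposite qualifier, there is a conflict. The sketch below is a toy model of that idea only, not the Oink or Cqual++ implementation; the function name, the edge-list encoding of flows, and the variable names in the example are all assumptions made for illustration.

```python
from collections import deque


def reachable_sinks(flows, sources, sinks):
    """Return the sinks that some source can flow to.

    flows   -- list of (src, dst) pairs: a value flows from src to dst
    sources -- places whose values carry the "tainted" qualifier
    sinks   -- places that require the "untainted" qualifier
    """
    graph = {}
    for src, dst in flows:
        graph.setdefault(src, []).append(dst)

    tainted = set(sources)
    work = deque(sources)
    while work:                      # breadth-first propagation
        v = work.popleft()
        for nxt in graph.get(v, []):
            if nxt not in tainted:
                tainted.add(nxt)
                work.append(nxt)
    return sorted(tainted & set(sinks))


def example():
    flows = [
        ("getenv_ret", "s"),         # s = getenv(...)
        ("s", "fmt"),                # fmt = s
        ("fmt", "printf_arg0"),      # printf(fmt)
        ("lit", "msg"),              # msg = "hello"
        ("msg", "puts_arg0"),        # puts(msg)
    ]
    return reachable_sinks(flows, ["getenv_ret"],
                           ["printf_arg0", "puts_arg0"])
```

In this toy run only the format-string argument of printf is reported, since the string passed to puts comes from a literal.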
Delta assists you in minimizing "interesting" files subject to a test
of their interestingness. A common case is isolating a small
failure-inducing substring of a large input that causes your program
to exhibit a bug. Our implementation is based on the algorithm of the
Delta Debugging project at Saarland University.
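The minimization loop can be sketched in a few lines. What follows is a simplified variant in the spirit of delta debugging, not the code of the delta tool: it works on an in-memory list rather than a file, the interestingness test is a Python function rather than an external script, and the names minimize and interesting are chosen for this example. It repeatedly tries to delete one chunk of the input, keeps the deletion whenever the result is still interesting, and moves to smaller chunks when no deletion succeeds.

```python
def minimize(items, interesting):
    """Shrink `items` while `interesting(items)` stays true.

    Returns a list from which no single element can be removed
    without making the result uninteresting.
    """
    if not interesting(items):
        raise ValueError("the starting input must be interesting")
    n = 2                                  # number of chunks to try
    while len(items) >= 2:
        chunk = max(1, len(items) // n)    # current chunk size
        reduced = False
        for start in range(0, len(items), chunk):
            # Candidate: the input with one chunk deleted.
            candidate = items[:start] + items[start + chunk:]
            if candidate and interesting(candidate):
                items = candidate          # keep the smaller input
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if chunk == 1:                 # nothing removable is left
                break
            n = min(n * 2, len(items))     # retry with finer chunks
    return items


def example():
    lines = list("xxaxxxbxx")
    # Stand-in for "the analysis still crashes on this input".
    return minimize(lines, lambda c: "a" in c and "b" in c)
```

In practice the interestingness test would run the analysis on the candidate input and check that the same failure still occurs.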