• One thing that I was wondering about was how the algorithms here scale with the size of the CF grammar. I guess the running time figures
    assume a fixed grammar. This might be fine for most problems encountered in practice, but from my understanding of the balanced-parentheses grammar, the number of non-terminals (in the normalized form with rule right-hand sides of length at most 2) seems to grow in proportion to the number of call sites, so scaling information could be important.
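To make the scaling concern concrete, here is a small sketch (the normalization choices and names are mine, not the paper's) that builds the matched-parentheses grammar for k call sites with right-hand sides of length at most 2 and counts nonterminals; the count grows linearly, roughly 3k + 1:

```python
def normalized_paren_grammar(num_call_sites):
    """Matched-parentheses grammar for `num_call_sites` call sites,
    with every right-hand side normalized to length <= 2.
    (Hypothetical construction for illustration; the paper may
    normalize differently.)"""
    rules = [("M", ()),                 # M -> epsilon
             ("M", ("M", "M"))]         # M -> M M
    for i in range(num_call_sites):
        # M -> (_i M )_i  splits into three extra nonterminals:
        #   M -> Pi Ti,  Pi -> "(_i",  Ti -> M Ci,  Ci -> ")_i"
        rules += [("M", (f"P{i}", f"T{i}")),
                  (f"P{i}", (f"open{i}",)),
                  (f"T{i}", ("M", f"C{i}")),
                  (f"C{i}", (f"close{i}",))]
    nonterminals = {lhs for lhs, _ in rules}
    return rules, nonterminals

for k in (1, 10, 100):
    _, nts = normalized_paren_grammar(k)
    print(k, len(nts))   # 3*k + 1: linear in the number of call sites
```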
  • The paper does not mention that the "standard" algorithm for CFL-reachability is not scalable, as it requires O(N^2) space. While CFL-reachability is a nice theoretical framework, I'm not sure there has been much work on efficient algorithms for solving the general CFL-reachability problem. Also, the presentation of the exploded supergraph sort of obscures how their efficient algorithm for interprocedural dataflow analysis actually works, and makes it seem more complicated than it actually is.
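For reference, a minimal sketch of the "standard" worklist algorithm for CFL-reachability (in the style usually attributed to Melski and Reps; the grammar is assumed normalized to epsilon, single-terminal, and binary rules, and all names here are mine). Every derived edge is stored, which is where the O(N^2) space cost comes from:

```python
from collections import defaultdict

def cfl_reachability(nodes, edges, eps_rules, unit_rules, pair_rules):
    """Worklist algorithm for CFL-reachability over a labeled graph.
    Grammar in normal form:
      eps_rules:  nonterminals A with A -> epsilon
      unit_rules: terminal t  -> list of A with A -> t
      pair_rules: (B, C)      -> list of A with A -> B C
    Every derived edge (u, A, v) is stored, hence O(N^2) space
    per grammar symbol."""
    derived = set()
    worklist = []

    def add(edge):
        if edge not in derived:
            derived.add(edge)
            worklist.append(edge)

    for a in eps_rules:                 # A -> epsilon: self-loops everywhere
        for n in nodes:
            add((n, a, n))
    for (u, t, v) in edges:             # A -> t: relabel terminal edges
        add((u, t, v))
        for a in unit_rules.get(t, ()):
            add((u, a, v))

    out_by = defaultdict(set)           # u -> {(label, v)} already processed
    in_by = defaultdict(set)            # v -> {(u, label)} already processed
    while worklist:
        (u, b, v) = worklist.pop()
        out_by[u].add((b, v))
        in_by[v].add((u, b))
        for (c, w) in list(out_by[v]):  # A -> B C with this edge as B
            for a in pair_rules.get((b, c), ()):
                add((u, a, w))
        for (w, c) in list(in_by[u]):   # A -> C B with this edge as B
            for a in pair_rules.get((c, b), ()):
                add((w, a, v))
    return derived

# matched-parentheses query on the path a --(1--> b --)1--> c
res = cfl_reachability(
    {"a", "b", "c"},
    {("a", "(1", "b"), ("b", ")1", "c")},
    eps_rules=["M"],
    unit_rules={"(1": ["O1"], ")1": ["C1"]},
    pair_rules={("M", "C1"): ["T1"], ("O1", "T1"): ["M"], ("M", "M"): ["M"]})
print(("a", "M", "c") in res)  # True
```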
  • The paper explains that the interprocedural data-flow analysis algorithm described above takes time N^3 * D^3, where N is the number of nodes in the standard CFG and D is the size of the set of data-flow facts. However, it also says that the analysis can be done in time E * D^3, where E is the number of edges in the supergraph. Not knowing that algorithm, it's not clear to me whether it's faster because (a) it solves CFL-reachability faster using optimizations valid for balanced-parentheses grammars, (b) it constructs a different graph, or (c) it's just completely different and doesn't particularly use CFL-reachability.
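One piece of the E * D^3 story can be stated without knowing the faster algorithm: a distributive flow function over D facts can be represented as a bipartite graph on D + 1 nodes (including the usual Lambda fact), and composing two such representations is a relational join, O(D^3) in the worst case. A sketch, using my own simplified encoding rather than the paper's exact representation:

```python
def compose(f_edges, g_edges):
    """Compose two distributive flow functions, each given as a set of
    (src, dst) pairs over facts {0, 1, ..., D}, where fact 0 plays the
    Lambda role: (0, 0) is always present, (0, d) means "gen d",
    (d, d) means "d survives", and a missing (d, _) kills d.
    Composition (apply f, then g) is a relational join, O(D^3) worst
    case. (My own simplified encoding, for illustration only.)"""
    return {(a, c)
            for (a, b) in f_edges
            for (b2, c) in g_edges
            if b == b2}

ident = {(i, i) for i in range(3)}        # identity on facts {0, 1, 2}
gen2_kill1 = {(0, 0), (0, 2), (2, 2)}     # gen fact 2, kill fact 1
print(compose(ident, gen2_kill1) == gen2_kill1)  # True
```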
  • Interprocedural dataflow analysis could be used to enforce a system of simple user-provided annotations specifying the temporal safety properties of library calls, for example locking protocols or get_range()/set_range() à la lightweight recoverable virtual memory. The existence of a general framework for computing interprocedural dataflow analyses would ease the implementation of such a system.
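As a toy illustration of the kind of check such annotations could drive, here is a tiny intraprocedural may-analysis that flags a double-lock; the CFG encoding and the idea of deriving transfer functions from annotations are invented for the example:

```python
# Facts: "the lock may be held" at entry to each CFG node.
# CFG encoding and transfer functions are invented for the example;
# in a real tool they would come from the user's annotations.
cfg = {
    "entry": ["lock"],
    "lock": ["body"],
    "body": ["unlock", "lock"],    # back edge: re-lock without an unlock
    "unlock": ["exit"],
    "exit": [],
}
transfer = {"lock": lambda held: True, "unlock": lambda held: False}

held_at = {n: False for n in cfg}  # may-analysis, initially "not held"
changed = True
while changed:                     # iterate to a fixed point
    changed = False
    for node, succs in cfg.items():
        out = transfer.get(node, lambda h: h)(held_at[node])
        for s in succs:
            if out and not held_at[s]:   # join = union over predecessors
                held_at[s] = True
                changed = True

# double-lock: can control reach `lock` with the lock already held?
print(held_at["lock"])  # True, because of the body -> lock back edge
```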
  • There is a large class of dataflow analysis problems that are not well suited to this framework. Analyses that model how a program computes values generally do not have
    distributive dataflow functions. Also, the D^3 term in the running time can be large in practice.
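The distributivity restriction is easy to see with constant propagation, the textbook non-distributive example: on each path z = x + y evaluates to 3, but merging x and y before applying the transfer function loses that fact. A small sketch with a hand-rolled lattice (TOP means "not a constant"):

```python
# Why constant propagation is not distributive: applying the transfer
# function for `z = x + y` after merging loses the fact that z == 3
# on both incoming paths.
TOP = object()  # "not a constant"

def meet(a, b):
    return a if a == b else TOP

def transfer_z(env):
    """Transfer function for the statement `z = x + y`."""
    x, y = env["x"], env["y"]
    z = x + y if TOP not in (x, y) else TOP
    return {**env, "z": z}

path1 = {"x": 1, "y": 2}   # then-branch: x=1; y=2
path2 = {"x": 2, "y": 1}   # else-branch: x=2; y=1

# meet-over-paths: apply the transfer function per path, then meet
mop = {k: meet(transfer_z(path1)[k], transfer_z(path2)[k])
       for k in ("x", "y", "z")}
# dataflow solution: meet first, then apply the transfer function
merged = {k: meet(path1[k], path2[k]) for k in ("x", "y")}
mfp = transfer_z(merged)

print(mop["z"], mfp["z"] is TOP)  # 3 True -- precision lost by merging
```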
  • The techniques don't appear to be useful for languages with non-trivial control flow or dynamic binding. It may not be possible to construct a graph G for such languages, and even for imperative languages it's not obvious how function pointers can be accommodated. This is not, however, a limitation unique to this particular approach.
  • Although CFL-reachability has O(N^3) complexity in the general case, certain analysis problems can be solved faster due to the way the graph is constructed. This is the real contribution of the paper.
  • The cubic running time of the general case CFL-reachability problem is a bit problematic. The paper mentions that some problems are subsets of this problem, and therefore asymptotically easier. A discussion of the language features that are at the root of the algorithms' complexity would have been helpful. For instance, if a program does not make use of recursion, or can be translated into a program that does not make use of recursion, then it would be possible to 'inline' each function invocation. In this case, linear algorithms for graph reachability could replace the cubic algorithms mentioned in the paper with no loss of precision or soundness. However, such a technique would increase the size of the program being analyzed significantly. Even if this simple optimization is not worthwhile, are there other program properties that could be exploited?
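For what it's worth, once a recursion-free program has been fully inlined, interprocedural reachability does collapse to plain graph reachability, which BFS solves in O(V + E); the catch is the potentially exponential blow-up in program size from nested inlining. A sketch of the cheap half (the inlining step itself is not shown):

```python
from collections import deque

def reachable(graph, source):
    """Plain BFS reachability over a dict-of-adjacency-lists graph,
    O(V + E). Standing in for what interprocedural reachability
    becomes after full inlining of a recursion-free program."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

# a fully inlined call structure: main calls f, f calls g (no recursion)
calls = {"main": ["f"], "f": ["g"], "g": []}
print(sorted(reachable(calls, "main")))  # ['f', 'g', 'main']
```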