• The technique appears to work only for sparse matrices, and it is not clear it can be extended to more complicated data structures like those used in adaptive mesh refinement. ... It would be useful to extend the system to allow specification and generation of arbitrary parallel data structures. For example, a system could take as input a parallel program over a naively distributed data structure and a specification for how to distribute it intelligently, and automatically convert it into a parallel program over an intelligently distributed data structure. Amir
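Amir's point about separating the program from its distribution can be made concrete with a small sketch. Nothing below is from the paper; the Distribution interface and the two example distributions are purely illustrative assumptions about what such a specification might contain.

```java
// A hedged sketch (not the paper's mechanism) of a separate "distribution
// specification": the program is written against a global index space, and
// the mapping of indices to processors can be swapped without touching it.
interface Distribution {
    int owner(int globalIndex);          // which processor owns this element
    int localIndex(int globalIndex);     // where it lives on that processor
}

// Naive block distribution: contiguous chunks of the index space per processor.
class BlockDistribution implements Distribution {
    final int blockSize;
    BlockDistribution(int n, int numProcs) { this.blockSize = (n + numProcs - 1) / numProcs; }
    public int owner(int i)      { return i / blockSize; }
    public int localIndex(int i) { return i % blockSize; }
}

// A "smarter" cyclic distribution, e.g. for better load balance on irregular work.
class CyclicDistribution implements Distribution {
    final int numProcs;
    CyclicDistribution(int numProcs) { this.numProcs = numProcs; }
    public int owner(int i)      { return i % numProcs; }
    public int localIndex(int i) { return i / numProcs; }
}
```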
  • Extension of this work to domains that have received less attention than matrix multiplication is promising. I don't think it makes sense to confine the fragmentation relation, \delta, to parallel implementations. If you ignore locality on a modern uniprocessor, it is difficult to implement a matrix multiplication that achieves better than 1-10% of peak processor performance on large (in-memory) matrices. In contrast, locality-aware algorithms often run at >90% of peak. It might be interesting to consider this work from the perspective that sequential and parallel implementations should be treated identically until code is translated into C. Rusty
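Rusty's locality numbers reflect the usual contrast between a locality-oblivious triple loop and a cache-blocked one. The sketch below is illustrative only; it is not from the paper, and the tile size of 64 is an arbitrary assumption.

```java
// Minimal sketch: locality-oblivious vs. cache-blocked dense matrix multiply.
public class MatMul {
    // Naive i-j-k loop: walks B column-wise, so it gets little cache reuse
    // on large matrices and typically runs at a small fraction of peak.
    static void naive(double[][] a, double[][] b, double[][] c, int n) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    c[i][j] += a[i][k] * b[k][j];
    }

    // Blocked (tiled) variant: works on BSxBS tiles that fit in cache, so
    // each loaded element of A and B is reused many times.
    static final int BS = 64; // tile size; a tuning parameter, chosen arbitrarily here
    static void blocked(double[][] a, double[][] b, double[][] c, int n) {
        for (int ii = 0; ii < n; ii += BS)
            for (int kk = 0; kk < n; kk += BS)
                for (int jj = 0; jj < n; jj += BS)
                    for (int i = ii; i < Math.min(ii + BS, n); i++)
                        for (int k = kk; k < Math.min(kk + BS, n); k++) {
                            double aik = a[i][k];
                            for (int j = jj; j < Math.min(jj + BS, n); j++)
                                c[i][j] += aik * b[k][j];
                        }
    }
}
```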
  • An obvious extension is to carry this forward to a more mainstream language (e.g. C). The same techniques could also be used to generate parallel code from sequential code in a broader scope (not necessarily sparse matrices). Liviu
  • The good part of this paper seems to resemble the good parts of the other papers we've read in this class: the realization that the specifics of a particular domain are limiting its optimization/maintenance opportunities, and the proposal of a more abstract, usable, high-level representation from which these properties can be more easily derived. In this case (or, it seems, in the case of a previous paper by the authors, [5]), that abstraction is a relation notation for matrix operations. ... It wasn't clear to me how much of this system was automated. Is the relational abstraction mostly useful so that the user can identify smarter optimizations himself? The example (2.6) was particularly confusing; a lot of "we cans" just makes me wonder how much of this optimization technique needs to be done by a smart person who may have known the optimization before abstracting the code in the first place. AJ
  • The major problem is perhaps one that could be raised for any synthesis system that has not actually been widely adopted: does it actually do anything that can't be done just as easily with a good library? From the experiments in the thesis, it appears that the performance is comparable to that obtained by programming against the BlockSolve library. The question is whether the techniques of this paper can be applied to new code to make it run faster than an equivalent written with BlockSolve, or to code that would take a lot of manual effort to write with BlockSolve. If not, then the paper is not an improvement over BlockSolve in practice. This problem isn't specific to this paper; rather, it is the general question of when a synthesizer is really needed over a library, especially because the effort to produce the synthesizer is generally greater. Dave
  • While this paper provides a clean abstraction of the matrix storage formats as access methods, it neglects to mention the performance consequences that may result from such an abstraction. For example, since the matrix data layout is hidden under the search/enum/deref interfaces, programmers would no longer be able to apply cache optimizations. While the paper provides examples of how the compiler could perform gather and scatter optimizations, it is unclear whether such automatic optimizations would apply to different algorithms. Wei
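To make Wei's concern concrete, here is a rough sketch of what hiding a compressed-row-storage layout behind enumerate/dereference style access methods might look like. The interface name and signatures are my own assumptions, not the paper's API; the point is that client code written against the interface cannot see, and therefore cannot tile or otherwise cache-optimize, the underlying arrays.

```java
import java.util.Iterator;

// Hypothetical access-method view over a sparse row; not the paper's interface.
interface RowAccess {
    Iterator<Integer> enumRow(int i);   // enumerate column indices of row i
    double deref(int i, int j);         // dereference the stored value at (i, j)
}

// Compressed-row-storage (CRS) matrix hidden behind that interface.
class CrsMatrix implements RowAccess {
    final int[] rowPtr, colInd;   // rowPtr has length rows + 1
    final double[] val;
    CrsMatrix(int[] rowPtr, int[] colInd, double[] val) {
        this.rowPtr = rowPtr; this.colInd = colInd; this.val = val;
    }
    public Iterator<Integer> enumRow(int i) {
        return new Iterator<Integer>() {
            int p = rowPtr[i];
            public boolean hasNext() { return p < rowPtr[i + 1]; }
            public Integer next() { return colInd[p++]; }
        };
    }
    public double deref(int i, int j) {
        for (int p = rowPtr[i]; p < rowPtr[i + 1]; p++)
            if (colInd[p] == j) return val[p];  // linear search within the row
        return 0.0;                             // structural zero
    }
}
```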
  • I would argue that this paper is a nice complement to the previous survey of transformational synthesis: while the latter is concerned with the high-level specification of semantics and functionality (but is oblivious to complexity considerations and cost models), the former
    addresses specification of complicated data structures and their implementation characteristics (but its expressive power seems to be restricted when it comes to describing, for instance, a recursive
    computation). Gilad
  • I also don't have a feeling for how close they are to exhausting the space of possible optimizations for this type of code. The future work section hints that much remains to be done, but I wish they had at least formally characterized the types of programs for which they are confident that they can generate very efficient code. I'm not sure how simple the programming model of relational algebra will be for programmers in these domains. I would have liked to see more of the syntax used to specify the relations. Again, most of these criticisms might have been addressed in a longer paper. Manu
  • The method provides a way to generate sparse matrix code for different storage formats and for parallel execution. A relational query can be used to specify execution loops, which could be straightforward for many users. It appears to be scalable to large computational programs that have regular execution loops. The approach could be used generally for sparse-matrix-based computation programs that contain do-all loop constructs. Mark
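Mark's observation that a relational query can stand in for a do-all loop can be illustrated with sparse matrix-vector product: the computation is simply an iteration over the nonzero relation A(i, j, v), in no particular order. The sketch below uses an explicit tuple list purely for illustration; it is not the paper's notation or its generated code.

```java
import java.util.List;

// Illustrative only: y = A*x expressed as "for every tuple (i, j, v) in the
// nonzero relation of A, do y[i] += v * x[j]" -- a do-all loop over a relation.
public class SpMV {
    record Nonzero(int i, int j, double v) {}   // one tuple of the relation A(i, j, v)

    static double[] multiply(List<Nonzero> a, double[] x, int rows) {
        double[] y = new double[rows];
        for (Nonzero t : a)                 // enumerate the relation; iteration order
            y[t.i()] += t.v() * x[t.j()];   // is irrelevant, which makes it "do all"
        return y;
    }
}
```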
  • The relational approach to solving problems can be applied in only a few problem domains. Cindy
  • Another idea is whether some crazy variation on this system could be
    used to mitigate the 'object-relational impedance mismatch'? Imagine
    that we have a well-defined Java object schema. Now, let us define a
    'language' for operating on these objects in an object-relational style.
    This 'language' could be a subset of Java that is easier to analyze, SQL
    with extra Java-like junk, or anything else. (Hopefully something
    familiar, to ease adoption.) The desired characteristics of the language
    are (a) that OO-type operations (jungloids, method calls, pointer
    traversals) are easy to write, (b) relational-type operations can be
    written in a declarative-ish style instead of yucky nested loops (maybe
    this is not so important), and (c) the code written will be independent
    of the storage location of the objects. The system would then synthesize
    a program that is a mixture of OO code and database queries to implement
    the functionality efficiently. It would work by translating the OO parts
    into relational operators, resulting in a relational algebra tree, and
    would generate an execution plan that uses both RDBMS and Java
    operations. One benefit would be that it could push functionality back
    and forth between Java and RDBMS as necessary. For example, say that an
    object has a field that is stored directly in the DB. It also has a
    getter that returns a computed value based on that field. If the user
    writes queries that search for things based on that computed value, the
    system has the choice of (a) reading the whole table and filtering in
    Java using the accessor, (b) computing the inverse function of the
    accessor and filtering in SQL, (c) materializing the result of the
    accessor function in the DB and indexing on it, and (d) generating a
    function in the RDBMS extension language and filtering 'manually' on the
    server. It is not even important for the system to determine the best
    choice fully automatically; if the user can choose which plan to use via a
    simple declaration (a la sketching), it would be of great practical
    benefit. Dave
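Dave's options (a) and (b) can be sketched with plain JDBC. The table and column names below ("accounts", "cents") are hypothetical, and the "inverse function" is the trivial dollars-to-cents conversion; the point is only that the same query can be answered either by filtering in Java through the accessor or by pushing an equivalent predicate into SQL.

```java
import java.sql.*;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of two of the plans Dave describes; schema is hypothetical.
public class AccountQuery {
    static class Account {
        final long id; final long cents;              // field stored directly in the DB
        Account(long id, long cents) { this.id = id; this.cents = cents; }
        double getDollars() { return cents / 100.0; } // computed getter over that field
    }

    // Option (a): read the whole table and filter in Java via the accessor.
    static List<Account> richAccountsInJava(Connection db, double minDollars) throws SQLException {
        List<Account> out = new ArrayList<>();
        try (Statement s = db.createStatement();
             ResultSet rs = s.executeQuery("SELECT id, cents FROM accounts")) {
            while (rs.next()) {
                Account a = new Account(rs.getLong("id"), rs.getLong("cents"));
                if (a.getDollars() >= minDollars) out.add(a);
            }
        }
        return out;
    }

    // Option (b): invert the accessor (dollars -> cents) and push the filter into
    // SQL, so the RDBMS can use an index on cents and ship far fewer rows.
    static List<Account> richAccountsInSql(Connection db, double minDollars) throws SQLException {
        List<Account> out = new ArrayList<>();
        try (PreparedStatement ps =
                 db.prepareStatement("SELECT id, cents FROM accounts WHERE cents >= ?")) {
            ps.setLong(1, Math.round(minDollars * 100));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) out.add(new Account(rs.getLong("id"), rs.getLong("cents")));
            }
        }
        return out;
    }
}
```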