- The technique appears to work only for sparse matrices, and it is not clear whether it can be extended to more
complicated data structures like those used in adaptive mesh refinement.
...
It would be useful to extend the system to allow specification and
generation of arbitrary parallel data structures. For example, a system
could take as input a parallel program over a naively distributed data
structure and a specification for how to distribute it intelligently,
and automatically convert it into a parallel program over an
intelligently-distributed data structure. Amir
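As a rough illustration of the kind of distribution specification Amir has in mind, the C sketch below maps matrix rows to owning processors in two ways; the function names and the partition-vector interface are invented for this note and are not part of the paper's system. A synthesizer along these lines would take a program written against the naive map plus the smarter map, and regenerate the indexing and communication code for the smarter layout.

    /* Hypothetical "distribution specification": map each matrix row to an
     * owning processor. */
    typedef int (*row_owner_fn)(int row, int nrows, int nprocs, const int *part);

    /* Naive: equal-sized contiguous blocks of rows, ignoring matrix structure. */
    static int naive_owner(int row, int nrows, int nprocs, const int *part)
    {
        (void)part;
        int block = (nrows + nprocs - 1) / nprocs;
        return row / block;
    }

    /* "Intelligent": follow a partition vector computed offline (e.g. by a
     * graph partitioner) so that tightly coupled rows share a processor. */
    static int partitioned_owner(int row, int nrows, int nprocs, const int *part)
    {
        (void)nrows; (void)nprocs;
        return part[row];
    }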
- Extension of this work to domains that have received less attention than matrix multiplication is promising. I don't think it makes sense to confine the fragmentation relation, \delta, to parallel implementations. If you ignore locality on a modern uniprocessor, it is difficult to write a matrix multiplication that achieves better than 1-10% of peak processor performance on large (in-memory) matrices. In contrast, locality-aware algorithms often run at >90% of peak. It might
be interesting to consider this work from the perspective that
sequential and parallel implementations should be treated identically
until code is translated into C. Rusty
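To make the locality point Rusty raises concrete, the C sketch below contrasts a naive triple loop with a blocked (tiled) version; both compute the same product (assuming C is zeroed on entry), and the tile size is a placeholder that would have to be tuned per machine.

    #define N  1024
    #define BS 64    /* tile size; machine-dependent, chosen only for illustration */

    /* Naive triple loop: the inner loop walks down a column of B with
     * stride-N accesses, so large matrices thrash the cache and typically
     * reach only a few percent of peak. */
    void matmul_naive(const double A[N][N], const double B[N][N], double C[N][N])
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
    }

    /* Blocked version: identical arithmetic, but it works on BS x BS tiles
     * that stay cache-resident, which is where locality-aware codes recover
     * most of the lost performance. */
    void matmul_blocked(const double A[N][N], const double B[N][N], double C[N][N])
    {
        for (int ii = 0; ii < N; ii += BS)
            for (int kk = 0; kk < N; kk += BS)
                for (int jj = 0; jj < N; jj += BS)
                    for (int i = ii; i < ii + BS; i++)
                        for (int k = kk; k < kk + BS; k++)
                            for (int j = jj; j < jj + BS; j++)
                                C[i][j] += A[i][k] * B[k][j];
    }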
- An obvious extension is to carry this forward to a more mainstream
language (e.g. C). The same techniques could also be used to generate
parallel code from sequential code in a broader scope (not necessarily
sparse matrices). Liviu
- The good part of this paper seems to resemble the good parts of the
other papers we've read in this class: the realization that the specifics
of a particular domain limit its optimization and maintenance opportunities, and the proposal of a more abstract, usable, high-level representation from which these properties can be more easily derived. In this case (or, it seems, in the case of a previous paper by the authors, [5]), that abstraction is a relational notation for matrix operations. ... It wasn't clear to me how much of this system was automated. Is the relational abstraction mostly useful so that the user can identify smarter
relational abstraction mostly useful so that the user can identify smarter
optimizations himself? The example (2.6) was particularly confusing; a lot
of "we cans" just makes me wonder how much of this optimization technique
needs to be done by a smart person who may have known the optimization
before abstracting the code in the first place. AJ
- The major problem is perhaps one that could be raised for any
synthesis system that has not actually been widely adopted; that is,
does it actually do anything that can't be done just as easily with a
good library. From the experiments in the thesis, it appears that the
performance is comparable to that obtained by programming against the
BlockSolve library. The question is, can the techniques of this paper be
applied to new code to make it run faster than an equivalent written with BlockSolve, or to produce code that would require a lot of manual effort to write with BlockSolve? If not, it would seem the paper is not an improvement over BlockSolve in practice. This problem isn't specific to this paper; rather, it is the general question of when a synthesizer is really needed over a library, especially because the effort to produce the synthesizer is
generally greater. Dave
- While this paper provides a clean abstraction of the matrix storage
formats as access methods, it neglects to mention the performance
consequences that may result from such an abstraction. For example,
since the matrix data layout is hidden under the search/enum/deref
interfaces, programmers would no longer be able to apply cache
optimizations. While the paper provides examples of how the compiler
could perform gather and scatter optimizations, it is unclear whether
such automatic optimizations would apply to different algorithms. Wei
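To make Wei's concern concrete, here is a rough C sketch of a compressed sparse row (CSR) matrix hidden behind enumerate/dereference-style access methods; the struct layout and function names are invented for this note rather than taken from the paper, but they show how the physical layout, and hence the knobs a programmer would normally use for cache tuning, disappears behind the interface.

    /* Compressed sparse row storage, hidden behind access methods. */
    typedef struct {
        int     nrows;
        int    *rowptr;   /* nrows+1 offsets into colind/val */
        int    *colind;   /* column index of each nonzero */
        double *val;      /* value of each nonzero */
    } csr_matrix;

    typedef struct { int row; int pos; } csr_cursor;

    /* enum: position a cursor at the first nonzero of a row. */
    static csr_cursor csr_enum_row(const csr_matrix *A, int row)
    {
        csr_cursor c = { row, A->rowptr[row] };
        return c;
    }

    static int csr_has_next(const csr_matrix *A, csr_cursor c)
    {
        return c.pos < A->rowptr[c.row + 1];
    }

    static csr_cursor csr_next(csr_cursor c) { c.pos++; return c; }

    /* deref: read the (column, value) pair under the cursor. */
    static void csr_deref(const csr_matrix *A, csr_cursor c, int *col, double *v)
    {
        *col = A->colind[c.pos];
        *v   = A->val[c.pos];
    }

    /* Client code never touches rowptr/colind/val directly; e.g. summing
     * the nonzeros of one row: */
    static double row_sum(const csr_matrix *A, int row)
    {
        double s = 0.0;
        for (csr_cursor c = csr_enum_row(A, row); csr_has_next(A, c); c = csr_next(c)) {
            int col; double v;
            csr_deref(A, c, &col, &v);
            (void)col;
            s += v;
        }
        return s;
    }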
- I would argue that this paper is a nice complement to the previous
survey of transformational synthesis: while the latter is concerned with
the high-level specification of semantics and functionality (but is
oblivious to complexity considerations and cost models), the former
addresses specification of complicated data structures and their
implementation characteristics (but its expressive power seems to be
restricted when it comes to describing, for instance, a recursive
computation). Gilad
- I also don't
have a feeling for how close they are to exhausting the space of
possible optimizations for this type of code. The future work section
hints that much remains to be done, but I wish they had at least
formally characterized the types of programs for which they are
confident that they can generate very efficient code. I'm not sure how
simple the programming model of relational algebra will be to
programmers in these domains. I would have liked to see more of the
syntax used to specify the relations. Again, most of these criticisms
might have been addressed in a longer paper. Manu
- The method provides a way to generate sparse matrix code for different storage formats and for parallel execution. A relational query can be used to specify execution loops, which could be straightforward for many users. It appears to scale to large computational programs that have regular execution loops. The approach could be applied generally to sparse-matrix-based computation programs that contain "do all" loop constructs. Mark
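A purely illustrative reading of Mark's point, in C rather than the paper's relational notation: the first routine is the "do all" loop a user conceptually writes over the whole relation A(i, j, v), and the second is the sort of CSR-specialized loop a compiler could generate from it, enumerating only the tuples that are actually present.

    /* What the user conceptually specifies: a "do all" loop over every
     * (i, j) pair, written here as a dense loop nest (most terms are zero). */
    void spmv_spec(int n, const double *A /* n*n, row-major */,
                   const double *x, double *y)
    {
        for (int i = 0; i < n; i++) {
            y[i] = 0.0;
            for (int j = 0; j < n; j++)
                y[i] += A[i * n + j] * x[j];
        }
    }

    /* What the generated code might look like for a CSR layout: the loop
     * now enumerates only the stored nonzeros of each row. */
    void spmv_csr(int n, const int *rowptr, const int *colind,
                  const double *val, const double *x, double *y)
    {
        for (int i = 0; i < n; i++) {
            y[i] = 0.0;
            for (int p = rowptr[i]; p < rowptr[i + 1]; p++)
                y[i] += val[p] * x[colind[p]];
        }
    }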
- The relational approach to solving problems can be applied in only very few problem domains. Cindy
- Another idea is whether some crazy variation on this system could be
used to mitigate the 'object-relational impedance mismatch'? Imagine
that we have a well-defined Java object schema. Now, let us define a
'language' for operating on these objects in an object-relational style.
This 'language' could be a subset of Java that is easier to analyze, SQL
with extra Java-like junk, or anything else. (Hopefully something
familiar, to ease adoption.) The desired characteristics of the language
are (a) that OO-type operations (jungloids, method calls, pointer
traversals) are easy to write, (b) relational-type operations can be
written in a declarative-ish style instead of yucky nested loops (maybe
this is not so important), and (c) the code written will be independent
of the storage location of the objects. The system will then synthesize
a program that is a mixture of OO code and database queries to implement the functionality efficiently. It would work by translating the OO parts
into relational operators, resulting in a relational algebra tree, and
would generate an execution plan that uses both RDBMS and Java
operations. One benefit would be that it could push functionality back
and forth between Java and RDBMS as necessary. For example, say that an
object has a field that is stored directly in the DB. It also has a
getter that gets a computed value based on that field. If the user
writes queries that search for things based on the computed value, the
system has the choice of (a) reading the whole table and filtering in
Java using the accessor, (b) computing the inverse function of the
accessor and filtering in SQL, (c) materializing the result of the
accessor function in the DB and indexing on it, and (d) generating a
function in the RDBMS extension language and filtering 'manually' on the
server. It is not even important for the system to determine the best
choice fully automatically; if the user could choose among the plans with a simple declaration (a la sketching), that would be of great practical benefit. Dave