
CS294-2: Software Synthesis

Spring 2006

Time and Place: Tue and Thu 9:30-11:00, 310 Soda
Units: 3
Recommended background: CS 164 or equivalent
Instructor: Rastislav Bodik, 773 Soda
Office hours: Tue 11-12, Thu 3-4

Course overview: Programmers would love to have their code synthesized from a concise specification, and computer science has been trying to satisfy their wishes for three decades. While exciting theoretical results exist and successful tools for specific domains have emerged, synthesis has not yet entered the mainstream. The purpose of the course is to understand why, and to synthesize a research direction for software synthesis that suits today's programmers and processes. We'll base our discussions on lessons from old classics, on the successes and failures of past synthesis tools, and on the promise of recent technologies, some developed here at Berkeley. Our view of code synthesis will be broad, spanning from deductive synthesis to genetic programming (see the topics below). Similarly, we'll cover a range of applications, from high-performance computing, object-oriented programming, API programming, and assembly-level programming to agent programming.

Intended audience: Graduate students in EECS. Seniors with interest in programming systems are encouraged to enroll. The course will cover diverse code synthesis technology and applications, and hence may be of interest not only to students in programming languages, but also in scientific computing, graphics, CAD, and other areas.

Student workload: Reading assigned papers and participating in class discussions. Presenting one paper in class (undergraduates may choose to present a demo of a tool). Project (literature review, novel algorithm design, or implementation).

Lecture format: Each lecture will discuss a paper, with active participation from students. Project presentations. Some lectures will be presented by guest speakers.

Initial list of topics and papers (under construction). To be refined according to student interests:

Deductive software synthesis: the proof is the program
- Toward automatic program synthesis (another candidate paper)
- KIDS: A Semi-Automatic Program Development System
- Amphion (more amphion papers)
Transformational synthesis
- A Transformation System for Developing Recursive Programs
- Program improvement by internal specialization
- Synthesis of concurrent garbage collectors (likely guest speaker)
Program differentiation
- Finite Differencing of Computable Expressions or Formal differentiation: A program synthesis technique (more)
- Incrementalization across object abstraction
Superoptimizers: a search for the best assembly code sequence
- main paper: Denali: a goal-directed superoptimizer
- supplementary reading:
Programming by demonstration, scenarios, examples:
Synthesis with partial programs:
- Program synthesis as machine learning: ALisp
- Programming by Sketching
Scientific computing
- Synthesis of irregular codes
- FLAME: Formal Linear Algebra Methods Environment (more)
- sparse code synthesis (Yelick et al) TBD
Schema-based synthesis:
- AutoBayes
- supplemental reading: Using Domain Models in Extensible Schema-based Software Synthesis
Object-oriented programming, components:
Genetic algorithms:
- TBD

Papers suggested by students:

Genetic algorithms for solving heuristic compiler optimization problems: paper on using machine learning to find a good inlining heuristic (more, more) --Manu
Automated tuning. Self Adapting Linear Algebra Algorithms and Software: This paper describes how to build linear algebra software that examines properties of the machine during installation and, based on what it observes, chooses which algorithms to use. It is a long paper, so it might be best to focus on just one or two of its major sections. These sections cover 1) creating dense numerical linear algebra libraries by examining machine properties at install time and selecting the algorithms to install based on the observed properties; 2) the more complicated process of doing something similar for sparse numerical linear algebra libraries; and 3) using statistical learning techniques to pick the algorithms for linear algebra libraries. This fits under the synthesis umbrella because, while the space of available algorithms might be known (and thus no new code is "synthesized"), the algorithms that perform best vary from platform to platform. On each machine, the final library is in effect synthesized from a "grab-bag" of available algorithms during the installation process. --Hormozd
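To make the install-time selection idea concrete, here is a minimal sketch in Python. It is not taken from the paper: the three dot-product variants, the `CANDIDATES` table, and the `autotune` function are all hypothetical names chosen for illustration. The sketch simply times each interchangeable implementation on the current machine and keeps the fastest, which is the "grab-bag" selection step described above in its simplest form.

```python
import timeit


def dot_indexed(a, b):
    """Dot product using an explicit index loop."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_zip(a, b):
    """Dot product using zip and a generator expression."""
    return sum(x * y for x, y in zip(a, b))


def dot_map(a, b):
    """Dot product using map over element-wise multiplication."""
    return sum(map(lambda x, y: x * y, a, b))


# Hypothetical "grab-bag" of interchangeable implementations.
CANDIDATES = {
    "indexed": dot_indexed,
    "zip": dot_zip,
    "map": dot_map,
}


def autotune(candidates, sample_args, number=20, repeat=3):
    """Time every candidate on this machine and pick the fastest.

    Returns (best_name, best_function, timings), where timings maps each
    candidate name to its best observed time in seconds.
    """
    timings = {}
    for name, fn in candidates.items():
        runs = timeit.repeat(lambda fn=fn: fn(*sample_args),
                             number=number, repeat=repeat)
        timings[name] = min(runs)
    best_name = min(timings, key=timings.get)
    return best_name, candidates[best_name], timings


if __name__ == "__main__":
    n = 10_000
    sample = ([float(i) for i in range(n)], [1.0] * n)
    name, dot, timings = autotune(CANDIDATES, sample)
    print("selected:", name)
```

Which variant wins depends on the machine and the Python build, which is the point: the "installed" routine is chosen by measurement rather than fixed in advance. A real tuning system would also search over parameters such as block sizes, which this sketch leaves out.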
Inductive logic programming: given a set of example sentences in some logical language such as Prolog, find a logic program that 'explains' those sentences. Relevant, e.g., to programming by demonstration. paper --Bhaskara
Web service synthesis: Automated Synthesis of Executable Web Service Compositions from BPEL4WS Processes. From the Service Oriented Architecture point of view, any modular web service can be [potentially produced with] automated synthesis. This paper discusses one example of such a process. Semantic Web Service Composition via Logic-based Program Synthesis: This paper goes into more depth on what defines a web service, what challenges come with the increasing number of web services, and one approach to web service composition via logic-based program synthesis. --Cindy
Synthesis for numerical kernels and DSP algorithms. The SPIRAL project at CMU, overview paper --Amir
Automatic parallelization: parallelization is a form of software synthesis: the programmer supplies a sequential program with annotations on data layout and synchronization requirements, and the compiler produces a parallel SPMD program with the appropriate data partitioning and communication calls for the compilation target. Compared to other parallel programming methodologies, data parallel languages have a significant advantage in productivity, since programmers are freed from the tedious and often error-prone task of writing communication code. Unfortunately, since the overhead of a remote access is typically orders of magnitude higher than that of a local access on today’s clusters, a naive parallelization strategy would result in a large number of small messages and unacceptable performance. Thus, compiler writers for data parallel languages have invested significant effort in communication optimizations such as vectorization and pipelining. With the advent of multi-core architectures, there has also been renewed interest in data parallel paradigms such as HPF and OpenMP: the hardware is already available, so you might as well let compilers try to take advantage of it automatically, and the cost of communication is generally much lower on such architectures, which simplifies the task of parallelization. So I think a paper on data parallel languages and their parallelization techniques may be interesting for the class, even if it’s a bit dated. Compiling Fortran D for MIMD Distributed-Memory Machines --Wei
Meta programming: I have chosen the following paper that I believe is relevant to software synthesis: Sheard, T. and Jones, S. P. 2002. Template meta-programming for Haskell. In Proceedings of the 2002 ACM SIGPLAN Workshop on Haskell (Pittsburgh, Pennsylvania). 1-16. (more) Meta-programming refers to techniques that allow programmers to manipulate program code as data, so you can, for example, do some computation at compile time to produce the program to be executed. This technique might be used to specialize code for efficiency (as in partial evaluation) or to overcome expressibility issues in statically typed languages. I view this technique as providing the ability to define high-level DSLs (embedded in a general-purpose "host" language) and to describe how to (efficiently) synthesize/compile that DSL. I chose this paper over others in "Meta-programming" because it is fairly recent and has a reasonably detailed related work section comparing it with the long-standing quotation system in Lisp/Scheme, along with some discussion of C++ templates. --Evan
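As a rough illustration of treating code as data to specialize a program, here is a small Python sketch. It is not Template Haskell and does not come from the paper: `make_power` is a hypothetical helper, and the generation happens at run time through Python's built-in `exec` rather than at compile time. It generates the source of a power function with the exponent fixed and the multiplication loop unrolled, then compiles that source into a callable.

```python
def make_power(n):
    """Generate and compile a function computing x**n with the loop unrolled.

    Returns (function, source) so the generated code can be inspected.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Build the expression "x * x * ... * x" with n factors; x**0 is 1.
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"def power_{n}(x):\n    return {body}\n"
    namespace = {}
    exec(source, namespace)  # run the generated definition
    return namespace[f"power_{n}"], source


if __name__ == "__main__":
    cube, src = make_power(3)
    print(src)       # the generated, specialized source text
    print(cube(2))   # calls the specialized function
```

The generated function contains no loop and no exponent parameter, which is the kind of specialization partial evaluation aims for. A statically typed staging system would additionally check the generated code before it runs, something this string-based sketch cannot do.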
Synthesis from temporal logic specifications. paper 1: This seems like a fundamental piece on successful synthesis of reactive systems based on temporal logic specifications; it also seems to discuss algorithmic bounds on the synthesis problem for such scenarios. (I found it pretty interesting following Amir Pnueli's talk at VMCAI, in which he hinted at the question of synthesis -- using TL specs, of course -- versus design (UML, etc.)...) This is a more recent work, still concerning synthesis from TL specs, yet incorporating more interesting properties such as faultiness and tolerance. --Gilad
Synthesis for the development of code for embedded systems: 1. R.K. Gupta and G. De Micheli, "Hardware-software Co-synthesis for Digital Systems": Introduces the use of "co-synthesis" for systems in which one wishes to synthesize code for integrated hardware-software platforms. 2. S. Parameswaran, "HW-SW Co-Synthesis: The Present and The Future": Gives an overview of co-synthesis for large hardware/software systems. 3. S. Srinivasan and N. K. Jha, "Hardware-Software Co-Synthesis of Fault-Tolerant Real-Time Distributed Embedded Systems": Addresses the problem of automatic hardware-software co-synthesis of fault-tolerant, distributed systems. 4. A. Ledeczi, et al., "The Generic Modeling Environment": A tool that supports creation of domain-specific modeling and program synthesis environments based on "metamodeling" concepts. Tool --Mark
Embedded software. Here is another paper: "Synthesis of software programs for embedded control applications". This work was part of the POLIS project (done by the CAD group here). Basically, they used S-graphs to model the specification of control applications and used BDDs (binary decision diagrams) to optimize the S-graphs. Software code was then generated from the optimized S-graphs. So I think it is synthesis of embedded software that takes advantage of restricted specifications (a particular type of control application). Another interesting work from the CAD group here was done by Prof. Lee's group. They did software synthesis for applications that can be modeled as dataflow graphs. Lots of multimedia applications fit into this category. --Qi
Synthesis and simulation of digital systems containing interacting hardware and software components. Paper. This paper broadly covers synthesis for both hardware and software systems. PL students will gain knowledge of hardware systems, and EE students will learn about software code synthesis. It also talks about simulation, which usually means executing a prototype and is concerned more with correctness than with performance. --Thomas
Tool adoption. DSP software synthesis. An important question is why a real user would or would not choose to adopt a new tool. It seems that a tool that complements a user's existing workflow has a much better chance of becoming mainstream than one that drastically changes it. One example of this is DSP development. DSP developers already have the habit of using Matlab to simulate their designs before manually coding the implementation in C. Synthesis tools that take Matlab or Simulink models and generate C code automatically have started to gain traction in the DSP community. --Jimmy
DSLs: JTS: Tools for Implementing Domain-Specific Languages (http://citeseer.ist.psu.edu/171171.html). It describes an extensible superset of Java called Jak and a compiler generator named Bali. Together they can process DSLs specified by the user (i.e., they can be used to compile Java code that has been extended with arbitrary augmentations). --Liviu

Links to further candidate papers:

NASA Synthesis work