Software released by Michael Ernst

This is a partial list of software packages released by Michael Ernst. I am always happy to receive comments, bug reports and fixes, and improvements to the code. I will attempt to assist you with problems or incorporate your changes into the main source. Unless otherwise noted, all software should be considered to be covered by the GNU Public License. In one or two cases, the software is available by request rather than by direct download. Also, much more software is available than is listed here; if you are interested in something specific from my research, feel free to ask me. In general, I try to release all my software, though intellectual property constraints imposed by my research partners occasionally prevents this for a specific package.

Contents:

Program analysis

Multi-language

Daikon: dynamically detect program invariants

Daikon is a system for dynamically detect program invariants of the sort that might be written in an assert statement or a formal specification. Such program properties are useful for a wide variety of software construction, understanding, reuse, and modification tasks, but they are usually absent from code. Front ends exist for C, C++, Java, Perl, and other data sources.

Daikon homepage
Download Daikon

DynComp: abstract type inference for Java, C, and C++

Programmers often use primitive types such as int (integer) to stand for a variety of concepts: day of month, file descriptor, seconds since 1970, etc. An analysis that uses declared types will perform poorly when analyzing a program that re-uses types in this way. (For instance, it is true, but useless, to report that a file descriptor's integer value is less than the number of seconds since 1970.) DynComp partition the uses into finer-grained abstract types, one per concept. It is a dynamic analysis that tracks value flows through a program to determine which values interact with one another (and so can be assumed to be of related abstract types). DynComp is implemented both for C/C++ binaries (using the Fjalar toolkit) and for Java programs (class files).

Both versions of DynComp are distributed as part of the Daikon distribution.

Java only

The Checker Framework: Pluggable type checkers for Java

Java's type system detects and prevents many errors at run time. But it doesn't prevent enough errors. We have written checkers that operate as plug-ins to javac and that check more expressive type systems. Four examples of supported checkers are:

We also distribute the Checker Framework, which makes it easy for you to write new checkers like the ones that we distribute.

These tools are distributed from the JSR 308 webpage:

JSR 308 webpage

Javarifier: inference of reference immutability

Javarifier infers the immutability (according to the definition of the Javari language) of every reference in a Java program. In other words, it converts Java programs and libraries to Javari. It annotates source or .class format with backward-compatible comments or attributes.

Annotating programs aids developers in reasoning about the code and modifying it without introducing subtle mutation errors. Annotating libraries is important because Javari programs use libraries without Javari annotations often do not typecheck. Manually determining the mutability for each (public) parameter and return type is tedious and error-prone. Javarifier automatically performs this analysis.

Javarifier webpage.

Generics-related refactorings for Eclipse

Java 5 introduces parametric polymorphism, or generic types, to the Java language. In order to reap the error-detection and documentation benefits of generic types, existing Java code must be converted to Java 5. Our research group has created refactoring tools to address two parts of this task. Instantiation replaces non-generic uses of generic types by generic uses, such as converting List strings; to List<String> strings;. Parameterization adds type parameters to class declarations, such as converting class List {...} to class List<T> {...}. A sequence of three papers (1, 2, 3) discusses these transformations.

Eclipse downloads

Concurrencer: Refactoring for concurrency

Concurrencer is a set of Eclipse refactorings that aid programmers in making their programs run faster on multi-core machines. Some of Concurrencer's refactorings convert code that uses locks into lock-free versions. Others of Concurrencer's refactorings convert sequential code into code that spawns multiple threads.

Concurrencer website

C/C++ only

Fjalar: toolkit for binary instrumentation

Fjalar is a toolkit for instrumentation of compiled executables (currently for Linux/x86); insertion of instrumentation code enables creation of dynamic analyses. Dynamic analyses that observe data values (as opposed to merely control flow) are challenging to write because of memory-unsafety, the need to record information in terms of source code constructs throughout (not just at the end of) the analysis, and other factors. Fjalar takes a “mixed-level” approach that combines the benefits of source-based instrumentation and binary instrumentation.

Fjalar is the basis for the Kvasir tool and one of the DynComp tools.

Fjalar homepage
Fjalar manual

Kvasir: data tracing for C/C++ executables

Kvasir can be described as a value profiler, or a data tracing tool. It runs a program and reports the values that variables took on during run time. Using the capabilities of the Fjalar framework, it traverses complex data structures without causing crashes or outputting nonsensical values, even in the presence of partially-initialized data and memory errors.

Kvasir is distributed as part of the Daikon distribution.

Flowcheck: Measuring quantitative information flow

A program that processes secret data inevitably reveals some information about that data (hopefully the program only reveals a small amount of information). Flowcheck is a security testing tool that measures how much information about a program's secret inputs is revealed in its public outputs.

Flowcheck website.

cline.pm: linewise C scanning and parsing

cline is a lightweight scanner and parser for C code. It reads C programs and returns physical lines, logical lines (that may span line continuation directives), and full-token lines (that may span C tokens). It works in many cases even when the input does not strictly follow the C syntax, making it useful for programs containing syntax errors.

cline.pm, 13 September 2019.

Testing

ReCrash: Creating unit tests for system crashes

ReCrash is a lightweight technique that monitors a program for failures such as crashes. When such an error occurs, ReCrash creates multiple unit tests that reproduce it. In a mode that requires two failures before ReCrash creates unit tests, its overhead is under 2%, permitting routine use. A vendor can instrument a distributed program with ReCrash, reducing the burden of bug reporting for users and the burden of bug reproduction for developers.

ReCrash homepage

Continuous testing plug-in for Eclipse

Continuous testing uses excess cycles on a developer's workstation to continuously run regression tests in the background, providing rapid feedback about test failures as source code is edited. It is intended to reduce the time and energy required to keep code well-tested and prevent regression errors from persisting uncaught for long periods of time.

Note: The Continuous Testing plugin has not been kept up to date with recent versions of Eclipse as Eclipse's interfaces change. A maintainer interested in updating it would be welcome.

Continuous testing homepage
Installation instructions for Eclipse plug-in

Eclat: random generation and classification of test inputs

Eclat is a tool that automatically generates test inputs for a Java program. Eclat differs from other tools for generating tests in that it uses a novel selection mechanism to filter out most of the candidate tests, only presenting to the user the ones that are most likely to actually reveal an error in the program or a deficiency in an existing test suite.

Eclat homepage
Requires Daikon.

Randoop: feedback-directed random test generation

Randoop (previously named “Joe”) creates test inputs for Java programs. The tests are created randomly, but Randoop incorporates a mechanism called feedback-directed test generation to direct the test generation toward interesting parts of the state space. (Naive random generation would create many illegal or uninteresting tests.) Essentially, they only build on tests that appear to indicate correct behavior; tests that indicate problems (either errors or illegal inputs) need not be extended by the random search. Randoop, and a C# implementation called Randoop, found hundreds of real, previously-unknown errors in the Sun JDK and in the Microsoft Common Language Runtime, which are widely-used and heavily-tested components.

Randoop homepage

Palulu: model-directed random test generation

Palulu extends the ideas of Randoop in two ways. First, it generates a model of legal method sequences, then uses that model to direct the random test generation, resulting in the creation of more complex states than can be produced by ordinary random generation. Second, it turns observed behavior into a regression test, by requiring similar behavior (not identical — that would be too brittle) on future runs. Palulu has discovered errors in multiple versions of the Sun JDK, and in the IBM JDK.

JUnit 4

Version 4 of the popular JUnit regression testing framework was built in part in the Program Analysis Group at MIT, by David Saff (in conjunction with Kent Beck and Erich Gamma).

JUnit 4

Eclipse downloads

JUnit plug-in for Eclipse

The JUnit plug-in distributed with Eclipse (version 3.2 and later) was built in the Program Analysis Group at MIT, by David Saff and others.

Eclipse downloads

Java extensions

JSR 308: Java with extended annotations

JSR 308, “Annotations on Java Types”, enriches the Java annotation system. It permits annotations to appear in more places than Java 6 permits; one example is generic type arguments (List<@NonNull Object>). These enhancements to the annotation system require minor, backward-compatible changes to the Java language and classfile format. These changes are planned to be part of the Java 7 language.

JSR 308 homepage

Javari: Java with reference immutability

Javari is an extension of the Java language that permits the specification and compile-time verification of immutability constraints. Specifically, the transitive state of the object to which an immutable reference refers cannot be modified using that reference.

The Javari implementation has been superseded by a new implementation based on the Checker Framework.

Also see the Javarifier tool for inference of reference immutability, which works with the new Javarifier implementation.

Javari homepage Javari checker documentation

IGJ: Immutability Generic Java

IGJ is an extension of the Java language that permits the specification and compile-time verification of both reference immutability and also object immutability. This makes it more powerful than the Javari language.

The IGJ implementation has been superseded by a new implementation based on the Checker Framework.

Checker Framework download page
IGJ checker documentation

Other research tools

Medic: SAT-compilation of planning problems

The Medic planner encodes a planning problem as a propositional logic (CNF) formula, finds a satisfying truth assignment for that formula, and then translates the assignment into a plan which solves the original planning problem. The Medic system is described in the IJCAI '97 paper Tradeoffs in Automatic SAT-Compilation of Planning Problems.

Medic was the first tool for compiling a problem into SAT, or logical satisfiability. (The idea was due to Kautz and Selman, but this is the first implementation.) Such a transformation is valuable because much research has addressed making SAT compilation very fast. Today, solving problems by reducing them to SAT is a very common technique used in all fields of computer science.

medic.tar.gz, 13 Sep 2019.

Gamesman's Toolkit: game-theoretic analysis

As of July, 2003, you are better off not using David Wolfe's Gamesman's Toolkit (nor my extension, which is linked to below). Instead, use Aaron Siegel's reimplementation, with new features, portability, extensibility, and ease of use. It includes an implementation of Konane. You can get it from https://cgsuite.sourceforge.net/.

The March 4, 1998 release of David Wolfe's Gamesman's Toolkit for combinatorial game-theoretic analysis. Includes code by Michael Ernst for evaluating positions in the ancient Hawaiian game of Konane. Game-theoretic analysis of Konane is described in the 1995 UMAP Journal paper Playing Konane Mathematically: A Combinatorial Game-Theoretic Analysis.

David Wolfe's games.tar.gz, 13 September 2019.
Superseded by Combinatorial Game Suite

Gud: unified dispatching model

Dubious With Predicate Dispatching, also known as Güd, is a simple interpreter demonstrating the features of predicate dispatching. Predicate dispatching is a unified model of dispatch that generalizes previous dispatching mechanisms; details can be found in the ECOOP '98 paper Predicate Dispatching: A Unified Theory of Dispatch. A manual is included in the implementation. The primary author of the implementation is Craig Kaplan.

gud.tar.gz, version 0.1.1, 22 July 1998.

gray-codes: search for Gray codes

gray-codes searches for Gray codes (sequences of n-bit numbers, each differing in only one bit position) that satisfy certain constraints. Some results from this program are reported in the paper “Graphs induced by Gray codes”.

gray-codes.tar.gz, 13 Sep 2019.

Teaching

The Groupthink specification exercise

The Groupthink specification exercise is a fun, gameshow-like activity that teaches students about teamwork and specifications. Also see the UpopVote software.

groupthink-specification-exercise.zip, 13 Sep 2019.

UpopVote: vote tallying for the Groupthink specification exercise

The Groupthink specification exercise can be run without software support, but the UpopVote software permits it to be run more smoothly. The UpopVote software runs in a room equipped with one InterWrite PRS remote control per student.

UpopVote.zip, 13 Sep 2019.

Re-turnin: programming assignment grading

In MIT class 6.170: Laboratory in Software Engineering, after programming assignments are graded by the course staff, students revise and re-submit the assignments. This “re-turnin” gives students a chance to correct their mistakes. Students get a small taste of working on (their own) legacy code, helping them to appreciate the benefits of good design and documentation. Students are given a few re-turnin tries, but credit for the re-turnin is all-or-nothing, so students are motivated to apply the intellectual tools of the course: testing, reasoning, etc.

This infrastructure is available on request.

BibTeX

bibtex2web: create web pages from BibTeX files

Given a collection of BibTeX files, bibtex2web can create any of the following:

This is convenient because you only need to keep one set of (BibTeX) sources up to date, thus avoiding skew between your webpages and bibliographies.

The output of bibtex2web is customizable. You can see examples of the output of bibtex2web at PAG pubs, by topic, by author.
Every file under https://groups.csail.mit.edu/pag/pubs/ is automatically generated by bibtex2web.

The bibtex2web manual is included in the distribution (but is linked here for reference).

You can obtain bibtex2web by cloning the source code from GitHub.

bib-abbreviate: shorten LaTeX bibliographies

bib-abbreviate maintains two sets of bibliography abbreviation (bibstring) files. You can choose between them to easily select either verbose or terse bibliography entries.

bib-abbreviate.pl, 9 Nov 2022.
bibstring-master.bib, 9 November 2022.

WWW and HTML

bibtex2web: create web pages from BibTeX files

See elsewhere on this page.

html-update: update HTML tables of contents and link dates

html-update consists of two scripts.

These scripts (and others) are available in the plume-lib/html-tools repository.

html2texi: convert HTML to Texinfo

html2texi converts HTML documentation trees into Texinfo format. Texinfo format can be easily converted to Info format (for browsing in Emacs or the standalone Info browser), to a printed manual, or to HTML. Thus, html2texi.pl permits conversion of HTML files to Info format, and secondarily enables producing printed versions of Web page hierarchies.

Unlike HTML, Info format is searchable. Since Info is integrated into Emacs, one can read documentation without starting a separate Web browser. Additionally, Info browsers (including Emacs) contain convenient features missing from Web browsers, such as easy index lookup and mouse-free browsing.

html2texi was used to create the Python Info files.

html2texi, 24 August 2022.
Requires checkargs.pm.

WWW-index: book-like indices from HTML documents

WWW-index is a system for creating and maintaining an index (like a book's index, not a database for programmatic searching) to a set of HTML documents. The index is an HTML document which lists terms and documents referring to the term. In the index, document titles are links to the document, and in documents, indexed items are links to the appropriate portion of the index.

www-index.tar.gz, 13 September 2019.

anonymize.pl: remove names and addresses from email

This filter removes email addresses and last names from its input (typically a mail archive); full names and email addresses are replaced by unique nicknames.

anonymize, 13 September 2019.
Requires util_mde.pm, 13 September 2019. Requires checkargs.pm.

Programming

checkargs.pm: Perl argument checking

checkargs checks the number of arguments passed to a Perl function at run time, catching some common errors that could otherwise go undetected until later in the program.

It is available in the plume-lib/html-tools repository.

cppp: CPP partial evaluator

cppp simplifies the conditional compilation structure (#if, etc.) of a C file by eliminating all those that it can, based on the values of preprocessor macros specified by the user. If those specified macros are insufficient to fully evaluate a conditional, it is left as is.

cppp, 13 September 2019.
Requires checkargs.pm, cline.pm, and paren.pm.

paren.pm: find matching delimiters

Given an argument string, the functions in paren count the number of unbalanced parentheses, braces, and other delimiters, or find the close delimiter that matches a specified open delimiter.

paren.pm, 13 September 2019.

Python Info files

Note: This program is not of interest any more. Elsewhere, you can find an Info version of the Python manuals and Emacs Lisp code for looking up Python documentation from within Emacs.

The six Python manuals are available at https://www.python.org/doc/ in a variety of formats, but not Info format. Unlike HTML, Info format is searchable. Since Info is integrated into Emacs, one can read documentation without starting a separate Web browser. Additionally, Info browsers (including Emacs) contain convenient features missing from Web browsers, such as easy index lookup and mouse-free browsing.

python-info.tar.gz, 13 September 2019, for Python 2.1.
python-info-Makefile, 13 September 2019: Makefile for creating the Info files with html2texi.

Email

sieve-delay: delay email delivery until specific times

I find it distracting to frequently receive new email. sieve-delay delays mail delivery: instead of a steady stream throughout the day, you will periodically receive a batch of messages. (If you have the willpower to simply not check your email, and you don't need to have your mailbox open for other reasons, then you don't need the sieve-delay tool!)

sieve-delay supersedes the procmail-mde tool.

sieve-delay-readme, 13 Sep 2019.

procmail-mde: mail separation and delay

procmail-mde is superseded by the sieve-delay tool.

procmail-mde is a collection of scripts and data files that support splitting your email into multiple mail drops. For instance, I filter my mail into four mail drops:

I find this convenient because my work is not interrupted by mail to mailing lists to which I am subscribed; those messages are deferred until the next morning or I decide I want to read them.

procmail-mde.tar, 13 September 2019.

Emacs

Emacs extensions

My hacks for Emacs appear separately.

EDB: the Emacs database

I no longer support EDB, as of 1997. However, Thien-Thi Nguyen has taken EDB over, and newer versions are available at https://gnuvola.org/software/edb/.

EDB is a database program for GNU Emacs. It permits you to manipulate structured (or not-so-structured) data within Emacs and provides many of the usual database features, including: customizable file layouts; typed fields (e.g., integer, date, string); arbitrary data display formats; selective display records; standard GNU Emacs editing commands; database summaries; sorting; merging and reconciliation of databases; reports generated from database information; highly customizable; documented by a 100-page manual.

(Also see my other Emacs extensions.)

Please see the main EDB webpage at https://gnuvola.org/software/edb/. Some historical information is also available, such as version 1.21 of EDB.

Miscellaneous

plume-lib: a library of useful abstractions for programming

Plume-lib is a library of useful abstractions for programming. It includes:

Please see its documentation.

countdown: large-display countdown timer

countdown.ppt is a PowerPoint presentation that indicates, in very large numbers, the amount of time remaining in a presentation. It is useful for session chairs at conferences to run this on a laptop that is facing the speaker, so that the speaker always knows exactly how much time is left and the session chair need not distract the speaker to indicate the time remaining.

countdown.ppt was inspired by the Timekeeper program written by Charles Leiserson in the late 1980s, in Future Basic for the Macintosh (see Charles E. Leiserson, “Timekeeper,” SIGACT News, Vol. 23, No. 4, Fall 1992, pp. 81-82).

A tip: when you have many presentations of the same length to time, you can edit countdown.ppt to remove all but the amount of time that you need. Then, you will be able to just hit the home key to go to the beginning of the presentation, rather than having to mouse around to get to the amount of time you want.

countdown.ppt, 13 Sep 2019.

Checkdates: birthday reminders

Checkdates reads a file containing date information and lists all dates within a specified number of days before or after today. It's crude, but it works for me.

checkdates.tar.gz, 13 September 2019.

help: paragraph grep

The “help” program essentially performs a paragraph-wise grep/search, though with a number of additional features such as file inclusion. Given a series of words, “help” prints all the paragraphs that match all the words. One use for this is to locate matching records in a plaintext file, such as an address list. Another use is to find short notes you have written to yourself that explain various important pieces of information.

The “help” program is superseded by the more general and portable “lookup” program, which is part of plume-lib. Lookup searches a set of files, much like grep does. However, Lookup searches by entry (by default, paragraphs) rather than by line, respects comments (ignores matches within them), respects \include directives (searches the named file), and has other options. See its documentation.

help.tar, 13 September 2019.


Back to Michael Ernst's home page.