Aldrich, J., Chambers, C., & Notkin, D. (2002). Architectural reasoning in ArchJava. 16th European Conference on Object-Oriented Programming (ECOOP), 334-367.

 

Aldrich, J., Chambers, C., & Notkin, D. (2002). ArchJava: connecting software architecture to implementation. 24th International Conference on Software Engineering (ICSE), 187-197.

 

Aldrich, J., Sazawal, V., Chambers, C., & Notkin, D. (2002). Architecture-centric programming for adaptive systems. Proceedings of the first workshop on Self-healing systems, 93-95.

 

Ubiquitous computing services are a fast-growing and challenging class of self-healing systems that must adapt to constant failures and environmental changes. Writing robust ubiquitous computing code is difficult in current programming systems. The architecture, interfaces, and logic of the program are often obscured by infrastructure details, making the development and evolution of these systems difficult and error-prone. We are exploring whether implementation language support for software architecture can aid in the development and evolution of ubiquitous computing systems. One such approach, embodied in the ArchJava language, allows programmers to express the software architecture of an application within Java source code. In this paper, we propose an extension to ArchJava allowing programmers to define custom connectors. Custom connectors are useful in many different contexts; we show how they can be used to implement part of the PlantCare ubiquitous computing application in ArchJava.

 

Aldrich, J., Sazawal, V., Chambers, C., & Notkin, D. (2003). Language support for connector abstractions. 17th European Conference on Object-Oriented Programming (ECOOP), 74-102.

 

Alverson, G. A., Griswold, W. G., Lin, C., Notkin, D., & Snyder, L. (1998). Abstractions for Portable, Scalable Parallel Programming. IEEE Transactions on Parallel and Distributed Systems, 9(1), 71-86.

 

In parallel programming, the need to manage communication, load imbalance, and irregularities in the computation puts substantial demands on the programmer. Key properties of the architecture, such as the number of processors and the cost of communication, must be exploited to achieve good performance, but coding these properties directly into a program compromises the portability and flexibility of the code because significant changes are then needed to port or enhance the program. We describe a parallel programming model that supports the concise, independent description of key aspects of a parallel program including data distribution, communication, and boundary conditions without reference to machine idiosyncrasies. The independence of such components improves portability by allowing the components of a program to be tuned independently, and encourages reuse by supporting the composition of existing components. The isolation of architecture-sensitive aspects of a computation simplifies the task of porting programs to new platforms. Moreover, the model is effective in exploiting both data parallelism and functional parallelism. This paper provides programming examples, compares this work to related languages, and presents performance results.

 

Alverson, G. A., Griswold, W. G., Notkin, D., & Snyder, L. (1990). A flexible communication abstraction for nonshared memory parallel computing. 1990 ACM/IEEE Conference on Supercomputing, 584-593.

 

Alverson, G. A., & Notkin, D. (1993). Program Structuring for Effective Parallel Portability. IEEE Transactions on Parallel and Distributed Systems, 4(9), 1041-1059.

 

The tension between software development costs and efficiency is especially high when considering parallel programs intended to run on a variety of architectures. In the domain of shared memory architectures and explicitly parallel programs, we have addressed this problem by defining a programming structure that eases the development of effectively portable programs. On each target multiprocessor, an effectively portable program runs almost as efficiently as a program fine-tuned for that machine. Additionally, its software development cost is close to that of a single program that is portable across the targets. Using our model, programs are defined in terms of data structure and partitioning-scheduling abstractions. These are activities identified as substantially affecting the performance of parallel programs; activities whose most efficient implementations can differ on the basis of the algorithm, multiprocessor, and even run-time data values and parameters. Low software development cost is attained by writing source programs in terms of abstract interfaces and thereby requiring minimal modification to port; high performance is attained by matching (often dynamically) the interfaces to implementations that are most appropriate to the execution environment. We include results of a prototype used to evaluate the benefits and costs of this approach.

 

Amador, F. G., Berman, D., Borning, A., Derose, T., Finkelstein, A., Neville, D., Notkin, D., Salesin, D., Salisbury, M., Sherman, J., Sun, Y., Weld, D. S., & Winkenbach, G. (1993). Electronic How Things Work Articles - 2 Early Prototypes. IEEE Transactions on Knowledge and Data Engineering, 5(4), 611-618.

 

The Electronic Encyclopedia/Exploratorium (E3) is a vision of a future computer system: a kind of electronic "How Things Work" book. Typical articles in E3 will describe such mechanisms as compression refrigerators, engines, telescopes, and mechanical linkages. Each article will provide simulations, three-dimensional animated graphics that the user can manipulate, laboratory areas that allow a user to modify the device or experiment with related artifacts, and a facility for asking questions and receiving customized, computer-generated English-language explanations. In this paper, we discuss some of the foundational technology (especially topics in artificial intelligence, graphics, and user interfaces) needed to achieve this long-term vision. We describe our initial prototype system and the technical lessons we have learned from it, as well as our second prototype currently under construction.

 

Ambriola, V., & Notkin, D. (1988). Reasoning About Interactive Systems. IEEE Transactions on Software Engineering, 14(2), 272-276.

 

Interactive systems have goals and characteristics that differ from those of batch systems. These differences lead to a need for new techniques, methods, and tools for manipulating and constructing interactive systems. The difference in structure between batch and interactive systems is considered, focusing on the distinction between command decomposition and component decomposition. The possible ways of solving a problem with an interactive system are described using action paths, which account for the relatively unconstrained actions of interactive users. It is shown that interactivity is not an inherent characteristic of a system but rather a characteristic that depends on the error profile of its users. The requirements that interaction places on the underlying implementation, specifically the need for incrementality and integration, are considered. The results are applied to several existing classes of systems.

 

Anderson, R. J., Beame, P., Burns, S., Chan, W., Modugno, F., Notkin, D., & Reese, J. D. (1996). Model checking large software specifications. 4th ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE), 156-166.

 

Anderson, R. J., Beame, P., Chan, W., & Notkin, D. (2000). Experiences with the application of symbolic model checking to the analysis of software specifications. Perspectives of System Informatics, 1755, 460-469.

 

Symbolic model checking is a powerful formal-verification technique which has been used to analyze many hardware systems. In this paper we present our experiences in applying symbolic model checking to software specifications of reactive systems. We have conducted two in-depth case studies: one using the specification of TCAS II (Traffic Alert and Collision Avoidance System II), and the other using a model of an aircraft electrical system. Based on these case studies, we have gained significant experience in how model checking can be used to analyze software specifications, and have also overcome a number of performance bottlenecks to make the analysis tractable. The emphasis of this paper is on the uses of model checking in the analysis of specifications. We discuss the types of properties we were able to evaluate in our case studies, including specific errors we were able to identify as well as general properties we were able to establish for the systems. We also discuss, in more general terms, the potential uses of symbolic model checking in the development process of software specifications.

 

Badros, G. J., & Notkin, D. (2000). A framework for preprocessor-aware C source code analyses. Software—Practice & Experience, 30(8), 907-924.

 

Analyses of C source code usually ignore the C preprocessor because of its complexity. Instead, these analyses either define their own approximate parser or scanner, or else they require that their input already be preprocessed. Neither approach is entirely satisfactory: the first gives up accuracy (or incurs large implementation costs), while the second loses the preprocessor-based abstractions. We describe a framework that permits analyses to be expressed in terms of both preprocessing and parsing actions, allowing the implementer to focus on the analysis. We discuss an implementation of such a framework that embeds a C preprocessor, a parser, and a Perl interpreter for the action hooks. Many common software engineering analyses can be written surprisingly easily using our implementation, replacing numerous ad-hoc tools. The framework's integration of the preprocessor and the parser further enables some analyses that otherwise would be especially difficult to perform.
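
To make the hook-based style concrete, the following is a minimal Python sketch; it is a hypothetical analogue, not the paper's implementation (which embeds a real C preprocessor and parser and runs Perl action hooks). The event names, class name, and toy line scanner are invented for illustration.

    import re

    class HookedAnalysisFramework:
        """Analyses register callbacks for preprocessing/parsing events."""
        def __init__(self):
            self.hooks = {"macro_def": [], "include": [], "func_def": []}

        def on(self, event, callback):
            self.hooks[event].append(callback)

        def run(self, source):
            # Toy scanner standing in for the embedded preprocessor/parser.
            for line in source.splitlines():
                if m := re.match(r'\s*#\s*define\s+(\w+)', line):
                    for cb in self.hooks["macro_def"]:
                        cb(m.group(1))
                elif m := re.match(r'\s*#\s*include\s+[<"]([^>"]+)[>"]', line):
                    for cb in self.hooks["include"]:
                        cb(m.group(1))
                elif m := re.match(r'\s*[\w\s\*]*\b(\w+)\s*\([^;)]*\)\s*\{', line):
                    for cb in self.hooks["func_def"]:
                        cb(m.group(1))

    fw = HookedAnalysisFramework()
    fw.on("macro_def", lambda name: print("macro defined:", name))
    fw.on("include", lambda hdr: print("header included:", hdr))
    fw.on("func_def", lambda fn: print("function defined:", fn))
    fw.run('#include <stdio.h>\n#define MAX 10\nint main(void) {\n')

Because the analysis is expressed purely as callbacks on preprocessing and parsing events, the implementer's code stays focused on the analysis itself, which is the point the paper makes.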

 

Chan, W., Anderson, R., Beame, P., & Notkin, D. (1997). Combining constraint solving and symbolic model checking for a class of systems with non-linear constraints. Computer Aided Verification (CAV), 1254, 316-327.

 

We extend the conventional BDD-based model checking algorithms to verify systems with non-linear arithmetic constraints. We represent each constraint as a BDD variable, using the information from a constraint solver to prune the BDDs by removing paths that correspond to infeasible constraints. We illustrate our technique with a simple example, which has been analyzed with our prototype implementation.
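
A toy rendering of this pruning idea, assuming nothing about the authors' prototype: each non-linear constraint becomes one boolean (BDD) variable, every truth assignment stands in for a BDD path, and a naive grid-sampling routine plays the role of the constraint solver.

    from itertools import product

    constraints = {            # boolean variable -> predicate over (x, y)
        "c1": lambda x, y: x * x + y * y < 25,   # inside a circle
        "c2": lambda x, y: x * y > 9,            # non-linear product bound
        "c3": lambda x, y: x + y < -8,           # linear, for contrast
    }

    def feasible(assignment, samples=range(-10, 11)):
        # Solver stand-in: look for a concrete (x, y) whose truth values
        # match the assignment. A real solver decides this symbolically.
        return any(
            all(pred(x, y) == assignment[name]
                for name, pred in constraints.items())
            for x, y in product(samples, repeat=2)
        )

    names = list(constraints)
    paths = [dict(zip(names, bits))
             for bits in product([False, True], repeat=len(names))]
    kept = [a for a in paths if feasible(a)]
    print(f"kept {len(kept)} of {len(paths)} paths")   # kept 5 of 8 paths
    # Every path with c1 and c3 both true is pruned: x*x + y*y < 25 bounds
    # x + y below by roughly -7.07, contradicting x + y < -8. (The integer
    # grid is approximate: a real solver would also keep c1 & c2 & not c3,
    # which is satisfiable at, e.g., x = y = 3.2.)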

 

Chan, W., Anderson, R., Beame, P., Notkin, D., Jones, D. H., & Warner, W. E. (2001). Optimizing Symbolic Model Checking for Statecharts. IEEE Transactions on Software Engineering, 27(2), 170-190.

 

Symbolic model checking based on binary decision diagrams is a powerful formal verification technique for reactive systems. In this paper, we present various optimizations for improving the time and space efficiency of symbolic model checking for systems specified as statecharts. We used these techniques in our analyses of the models of a collision avoidance system and a fault-tolerant electrical power distribution (EPD) system, both used on commercial aircraft. The techniques together reduce the time and space requirements by orders of magnitude, making feasible some analysis that was previously intractable. We also elaborate on the results of verifying the EPD model. The analysis disclosed subtle modeling and logical flaws not found by simulation.

 

Chan, W., Anderson, R. J., Beame, P., Burns, S., Modugno, F., Notkin, D., & Reese, J. D. (1998). Model Checking Large Software Specifications. IEEE Transactions on Software Engineering, 24(7), 498-520.

 

In this paper, we present our experiences in using symbolic model checking to analyze a specification of a software system for aircraft collision avoidance. Symbolic model checking has been highly successful when applied to hardware systems. We are interested in whether model checking can be effectively applied to large software specifications. To investigate this, we translated a portion of the state-based system requirements specification of Traffic Alert and Collision Avoidance System II (TCAS II) into input to a symbolic model checker (SMV). We successfully used the symbolic model checker to analyze a number of properties of the system. We report on our experiences, describing our approach to translating the specification to the SMV language, explaining our methods for achieving acceptable performance, and giving a summary of the properties analyzed. Based on our experiences, we discuss the possibility of using model checking to aid specification development by iteratively applying the technique early in the development cycle. We consider the paper to be a data point for optimism about the potential for more widespread application of model checking to software systems.

 

Chan, W., Anderson, R. J., Beame, P., & Notkin, D. (1998). Improving efficiency of symbolic model checking for state-based system requirements. ACM SIGSOFT international Symposium on Software Testing and Analysis (ISSTA), 102-112.

 

We present various techniques for improving the time and space efficiency of symbolic model checking for system requirements specified as synchronous finite state machines. We used these techniques in our analysis of the system requirements specification of TCAS II, a complex aircraft collision avoidance system. Together they reduce the time and space complexities by orders of magnitude, making feasible some analysis that was previously intractable. The TCAS II requirements were written in RSML, a dialect of statecharts.

 

Chan, W. C., Anderson, R. J., Beame, P., Jones, D. H., Notkin, D., & Warner, W. E. (1999). Decoupling synchronization from local control for efficient symbolic model checking of statecharts. 21st international Conference on Software Engineering (ICSE), 142-151.

 

Chow, K., & Notkin, D. (1996). Semi-automatic update of applications in response to library changes. 1996 International Conference on Software Maintenance (ICSM), 359.

 

Ernst, M. D., Badros, G. J., & Notkin, D. (2002). An empirical analysis of C preprocessor use. IEEE Transactions on Software Engineering, 28(12), 1146-1170.

 

This is the first empirical study of the use of the C macro preprocessor, Cpp. To determine how the preprocessor is used in practice, this paper analyzes 26 packages comprising 1.4 million lines of publicly available C code. We determine the incidence of C preprocessor usage (whether in macro definitions, macro uses, or dependences upon macros) that is complex, potentially problematic, or inexpressible in terms of other C or C++ language features. We taxonomize these various aspects of preprocessor use and particularly note data that are material to the development of tools for C or C++, including translating from C to C++ to reduce preprocessor usage. Our results show that, while most Cpp usage follows fairly simple patterns, an effective program analysis tool must address the preprocessor. The intimate connection between the C programming language and Cpp, and Cpp's unstructured transformations of token streams, often hinder both programmer understanding of C programs and tools built to engineer C programs, such as compilers, debuggers, call graph extractors, and translators. Most tools make no attempt to analyze macro usage, but simply preprocess their input, which results in a number of negative consequences; an analysis that takes Cpp into account is preferable, but building such tools requires an understanding of actual usage. Differences between the semantics of Cpp and those of C can lead to subtle bugs stemming from the use of the preprocessor, but there are no previous reports of the prevalence of such errors. Use of C++ can reduce some preprocessor usage, but such usage has not been previously measured. Our data and analyses shed light on these issues and others related to practical understanding or manipulation of real C programs. The results are of interest to language designers, tool writers, programmers, and software engineers.

 

Ernst, M. D., Cockrell, J., Griswold, W. G., & Notkin, D. (1999). Dynamically discovering likely program invariants to support program evolution. Proceedings of the 21st international conference on Software engineering, 213-224.

 

Ernst, M. D., Cockrell, J., Griswold, W. G., & Notkin, D. (2001). Dynamically Discovering Likely Program Invariants to Support Program Evolution. IEEE Transactions on Software Engineering, 27(2), 99-123.

 

Explicitly stated program invariants can help programmers by identifying program properties that must be preserved when modifying code. In practice, however, these invariants are usually implicit. An alternative to expecting programmers to fully annotate code with invariants is to automatically infer likely invariants from the program itself. This research focuses on dynamic techniques for discovering invariants from execution traces. This article reports three results. First, it describes techniques for dynamically discovering invariants, along with an implementation, named Daikon, that embodies these techniques. Second, it reports on the application of Daikon to two sets of target programs. In programs from Gries's work on program derivation, the system rediscovered predefined invariants. In a C program lacking explicit invariants, the system discovered invariants that assisted a software evolution task. These experiments demonstrate that, at least for small programs, invariant inference is both accurate and useful. Third, it analyzes scalability issues, such as invariant detection runtime and accuracy, as functions of test suites and program points instrumented.
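
To make the detection step concrete, here is a minimal Python sketch of the idea; it is illustrative only, as Daikon's invariant grammar, instrumentation, and statistical checks are far richer. Candidate invariants over the variables at a program point are tested against every recorded sample, and the survivors are reported as likely invariants.

    # Candidate invariants over variables (x, y) at one program point.
    candidates = {
        "x >= 0":     lambda s: s["x"] >= 0,
        "y >= 0":     lambda s: s["y"] >= 0,
        "x <= y":     lambda s: s["x"] <= s["y"],
        "x == y":     lambda s: s["x"] == s["y"],
        "y == x + 1": lambda s: s["y"] == s["x"] + 1,
    }

    # Variable values recorded at that point across test executions.
    trace = [{"x": 0, "y": 1}, {"x": 3, "y": 4}, {"x": 7, "y": 8}]

    # Keep only candidates never falsified by an observed sample.
    likely = sorted(desc for desc, holds in candidates.items()
                    if all(holds(sample) for sample in trace))
    print(likely)   # ['x <= y', 'x >= 0', 'y == x + 1', 'y >= 0']

The output is only as trustworthy as the test suite: "x == y" is correctly falsified, while the surviving candidates are likely, not proven, which is exactly the trade-off the article analyzes.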

 

Ernst, M. D., Czeisler, A., Griswold, W. G., & Notkin, D. (2000). Quickly detecting relevant program invariants. Proceedings of the 22nd international conference on Software engineering, 449-458.

 

Dingel, J., Garlan, D., Jha, S., & Notkin, D. (1998). Reasoning about implicit invocation. Proceedings of the 6th ACM SIGSOFT international symposium on Foundations of software engineering, 209-221.

 

Garlan, D., Kaiser, G. E., & Notkin, D. (1992). Using Tool Abstraction to Compose Systems. Computer, 25(6), 30-38.

 

The tool abstraction paradigm, which supports the evolution of large-scale software systems by easing design changes in the system functions, is discussed. Systems that support tool abstraction are structured as a pool of abstract data structures shared by a collection of cooperating 'toolies', where each toolie provides a piece of the overall system function. When one toolie updates the shared data, other toolies must be notified: otherwise, cooperating-but-independent toolies may not execute, and the overall system function may be compromised. The KWIC (key word in context) index production system is used to illustrate the idea of tool abstraction. The relationship of tool abstraction to other concepts is examined.

 

Garlan, D., & Notkin, D. (1991). Formalizing Design Spaces: Implicit Invocation Mechanisms. Proceedings of the 4th International Symposium of VDM Europe on Formal Software Development, Volume I: Conference Contributions, 31-44.

 

Griswold, W. G., & Notkin, D. (1992). Computer-aided vs. manual program restructuring. ACM SIGSOFT Software Engineering Notes, 33-41.

 

Griswold, W. G., & Notkin, D. (1995). Architectural Tradeoffs for a Meaning-Preserving Program Restructuring Tool. IEEE Transactions on Software Engineering, 21(4), 275-287.

 

Maintaining the consistency of multiple program representations, such as abstract syntax trees and program dependence graphs, in a program manipulation tool is difficult. This paper describes a hybrid software architecture for a meaning-preserving program restructuring tool. Layering is the primary architectural paradigm, which successively provides increasingly integrated and unified abstract machines to implement the tool. However, layering does not provide adequate control over extensibility or the independence of components. Consequently, we also adopted the paradigm of keeping the key program abstractions separate throughout the layering, providing independent columns of abstract data types. A pair of columns is integrated by a mapping column that translates elements in one column's data type into related elements in the other column's data type. Thus integration of function and separation of representation can be achieved simultaneously in this complex domain. This hybrid architecture was crucial in overcoming severe performance problems, classic in traditional layered systems, that became apparent once the basic tool was completed. By taking advantage of the independence of the columns and the special characteristics of meaning-preserving restructuring, it was possible to extend one representation column of the architecture to the uppermost layer to provide the required access for efficient update without compromising independence. The cost of the extended architecture is that the upper layers are no longer as simple because they expose operations that only guarantee consistency under careful usage. However, the structural constraints of the hybrid architecture, together with the models it provides for building the more complicated layers, minimize the negative impact of this tradeoff.

 

Gunter, C., Mitchell, J., & Notkin, D. (1996). Strategic directions in software engineering and programming languages. ACM Computing Surveys (CSUR), 28(4), 727-737.

 

Habermann, A. N., & Notkin, D. (1986). Gandalf: software development environments. IEEE Transactions on Software Engineering, 12(12), 1117-1127.

 

Habermann, A. N., Notkin, D. S., & Perry, D. E. (1979). Report on the use of Ada for design and implementation of part of Gandalf (Technical Report). Pittsburgh, PA: Department of Computer Science, Carnegie-Mellon University.

 

Henderson, P. B., & Notkin, D. (1987). Integrated Design and Programming Environments. Computer, 20(11), 12-16.

 

Kim, M., Bergman, L., Lau, T. A., & Notkin, D. (2004). An Ethnographic Study of Copy and Paste Programming Practices in OOPL. Proceedings of the 2004 International Symposium on Empirical Software Engineering, 83-92.

 

Kim, M., & Notkin, D. (2005). Using a clone genealogy extractor for understanding and supporting evolution of code clones. Proceedings of the 2005 international workshop on Mining software repositories, 1-5.

 

Kim, M., & Notkin, D. (2006). Program element matching for multi-version program analyses. Proceedings of the 2006 international workshop on Mining software repositories, 58-64.

 

Kim, M., Notkin, D., & Grossman, D. (2007). Automatic Inference of Structural Changes for Matching across Program Versions. Proceedings of the 29th international conference on Software Engineering, 333-343.

 

Kim, M., Sazawal, V., Murphy, G. C., & Notkin, D. (2005). An empirical study of code clone genealogies. Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering, 187-196.

 

Michail, A., & Notkin, D. (1998). Illustrating Object-Oriented Library Reuse by Example: A Tool-Based Approach. Proceedings of the 13th IEEE international conference on Automated software engineering, 200.

 

Michail, A., & Notkin, D. (1999). Assessing software libraries by browsing similar classes, functions and relationships. Proceedings of the 21st international conference on Software engineering, 463-472.

 

Murphy, G. C., & Notkin, D. (1995). Lightweight source model extraction. Proceedings of the 3rd ACM SIGSOFT symposium on Foundations of software engineering, 116-127.

 

Murphy, G. C., & Notkin, D. (1996). Lightweight lexical source model extraction. ACM Transactions on Software Engineering and Methodology (TOSEM), 5(3), 262-292.

 

Software engineers maintaining an existing software system often depend on the mechanized extraction of information from system artifacts. Some useful kinds of information—source models—are well known: call graphs, file dependences, etc. Predicting every kind of source model that a software engineer may need is impossible. We have developed a lightweight approach for generating flexible and tolerant source model extractors from lexical specifications. The approach is lightweight in that the specifications are relatively small and easy to write. It is flexible in that there are few constraints on the kinds of artifacts from which source models are extracted (e.g., we can extract from source code, structured data files, documentation, etc.). It is tolerant in that there are few constraints on the condition of the artifacts. For example, we can extract from source that cannot necessarily be compiled. Our approach extends the kinds of source models that can be easily produced from lexical information while avoiding the constraints and brittleness of most parser-based approaches. We have developed tools to support this approach and applied the tools to the extraction of a number of different source models (file dependences, event interactions, call graphs) from a variety of system artifacts (C, C++, CLOS, Eiffel, TCL, structured data). We discuss our approach and describe its application to extract source models not available using existing systems; for example, we compute the implicitly-invokes relation over Field tools. We compare and contrast our approach to the conventional lexical and syntactic approaches of generating source models.
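
The flavor of such a lexical extractor can be sketched in a few lines of Python; this is a stand-in only, since the paper's tool compiles small lexical specifications rather than hard-coding one, and its patterns are richer. Note that the fragment below extracts from C code that does not compile.

    import re

    CALL = re.compile(r'\b([A-Za-z_]\w*)\s*\(')
    KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}

    def call_model(source):
        """Lexically extract (line, callee) pairs from C-like text."""
        model = []
        for lineno, line in enumerate(source.splitlines(), start=1):
            for name in CALL.findall(line):
                if name not in KEYWORDS:
                    model.append((lineno, name))
        return model

    broken_c = 'void f() {\n  if (ready)\n    g(h(x), 42;\n'  # does not compile
    print(call_model(broken_c))   # [(1, 'f'), (3, 'g'), (3, 'h')]
    # The definition of f is reported as if it were a call: the lexical
    # approach trades such approximation for tolerance of broken input.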

 

Murphy, G. C., & Notkin, D. (1996). On the use of static typing to support operations on frameworks. Object Oriented Systems, 3(4), 197-213.

 

Frameworks capture the commonalities in design and implementation between a family of related applications and are typically expressed in a general-purpose object-oriented language. Software engineers use frameworks to reduce the cost of building complex applications. Although the task of using a framework to build an application is error-prone, few techniques and tools exist to aid the engineer. This paper characterizes the operations of instantiation, extension and refinement used to build applications from frameworks and explores how these operations are supported by one common language-level tool, namely the static typing policies (and associated tools) of common object-oriented languages. It was found that both conservative contravariant and covariant static typing policies were effective in supporting the operations of framework instantiation and framework extension. However, both policies were ineffective at supporting the operation of framework refinement. Although it does not support the refinement operation itself, covariance is sufficiently expressive to support the instantiation of a properly refined framework. This result provides a basis for defining and building tools to support the effective use and evolution of frameworks in software engineering environments.

 

Murphy, G. C., & Notkin, D. (1997). Reengineering with Reflexion Models: A Case Study. Computer, 30(8), 29-36.

 

Reengineering large and complex software systems is often very costly. Reflexion models let software engineers begin with a structural high-level model that they can selectively refine to rapidly gain task-specific knowledge about the source code. In this article we give an overview of our technique and then relate how a Microsoft engineer used it to aid an experimental reengineering of Excel, a product that comprises about 1.2 million lines of C code. Our technique begins with a high-level model, which users define on the basis of a desired software engineering task. Thus, the next steps in our technique are to extract a model of the source, define a map, and, through a set of computation tools, compare the two models. This lets software engineers effectively validate their high-level reasoning with information from the source code. To satisfy the need for a quick and inexpensive method, we made the technique "lightweight" and iterative. The user can easily and rapidly access the structural information of interest and can balance the cost of refinement with the potential benefits of a more complete and accurate model. The engineer in our case study, a developer with 10-plus years at Microsoft, specified and computed an initial reflexion model of Excel in a day and then spent four weeks iteratively refining it. He estimated that gaining the same degree of familiarity with the Excel source code might have taken up to two years with other available approaches. The experimental reengineering case study shows that our technique has practical application. First, the engineer chose to use the technique even when facing extreme pressure. Second, he continued to use the technique beyond the original time (one month) to refine additional parts of the reflexion model for Excel and to compute reflexion models for successive Excel versions. Finally, the engineer stated that the slowdowns he did encounter while performing the experimental reengineering were often due to a lack of up-front understanding. Had the reflexion model technique been used more during planning, he felt that he might have understood the product better to begin with. We believe our technique was successful in large part because it uses approximation. This ensures smooth feedback from the time invested in applying the technique to the results. The more time the Microsoft engineer spent refining the map, the more information he derived. Although this curve is not completely smooth, the engineer was able to gauge the accuracy of the results and use that information to manage the time and effort invested in using the technique.

 

Murphy, G. C., Notkin, D., Griswold, W. G., & Lan, E. S.-C. (1998). An empirical study of static call graph extractors. ACM Transactions on Software Engineering and Methodology (TOSEM), 7(2), 158-191.

 

Informally, a call graph represents calls between entities in a given program. The call graphs that compilers compute to determine the applicability of an optimization must typically be conservative: a call may be omitted only if it can never occur in any execution of the program. Numerous software engineering tools also extract call graphs with the expectation that they will help software engineers increase their understanding of a program. The requirements placed on software engineering tools that compute call graphs are typically more relaxed than for compilers. For example, some false negatives—calls that can in fact take place in some execution of the program, but which are omitted from the call graph—may be acceptable, depending on the understanding task at hand. In this article, we empirically show a consequence of this spectrum of requirements by comparing the C call graphs extracted from three software systems (mapmaker, mosaic, and gcc) by nine tools (cflow, cawk, CIA, Field, GCT, Imagix, LSME, Mawk, and Rigiparse). A quantitative analysis of the call graphs extracted for each system shows considerable variation, a result that is counterintuitive to many experienced software engineers. A qualitative analysis of these results reveals a number of reasons for this variation: differing treatments of macros, function pointers, input formats, etc. The fundamental problem is not that variances among the graphs extracted by different tools exist, but that software engineers have little sense of the dimensions of approximation in any particular call graph. In this article, we describe and discuss the study, sketch a design space for static call graph extractors, and discuss the impact of our study on practitioners, tool developers, and researchers. Although this article considers only one kind of information, call graphs, many of the observations also apply to static extractors of other kinds of information, such as inheritance structures, file dependences, and references to global variables.

 

Murphy, G. C., Notkin, D., & Lan, E. S.-C. (1996). An empirical study of static call graph extractors. Proceedings of the 18th international conference on Software engineering, 90-99.

 

Murphy, G. C., Notkin, D., & Sullivan, K. (1995). Software reflexion models: bridging the gap between source and high-level models. Proceedings of the 3rd ACM SIGSOFT symposium on Foundations of software engineering, 18-28.

 

Murphy, G. C., Notkin, D., & Sullivan, K. (1997). Extending and Managing Software Reflexion Models. Technical Report: TR-97-15

 

Murphy, G. C., Notkin, D., & Sullivan, K. (2001). Software Reflexion Models: Bridging the Gap between Design and Implementation. IEEE Transactions on Software Engineering, 27(4), 364-380.

 

The artifacts constituting a software system often drift apart over time. We have developed the software reflexion model technique to help engineers perform various software engineering tasks by exploiting, rather than removing, the drift between design and implementation. More specifically, the technique helps an engineer compare artifacts by summarizing where one artifact (such as a design) is consistent with and inconsistent with another artifact (such as source). The technique can be applied to help a software engineer evolve a structural mental model of a system to the point that it is good enough to be used for reasoning about a task at hand. The software reflexion model technique has been applied to support a variety of tasks, including design conformance, change assessment, and an experimental reengineering of the million-lines-of-code Microsoft Excel product. In this paper, we provide a formal characterization of the reflexion model technique, discuss practical aspects of the approach, relate experiences of applying the approach and tools, and place the technique into the context of related work.
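
The core comparison is small enough to sketch directly. In this hypothetical Python fragment (file names, map, and design edges are invented), a source model is lifted through the map and compared against the high-level model, yielding the convergences, divergences, and absences the paper formalizes.

    # Designed (high-level) edges between architectural entities.
    high_level = {("UI", "Core"), ("Core", "Storage")}

    # Extracted source-model edges, e.g. a call graph over files.
    source_model = {("menu.c", "eval.c"), ("eval.c", "file.c"),
                    ("file.c", "menu.c")}

    # Map from source entities to high-level entities.
    mapping = {"menu.c": "UI", "eval.c": "Core", "file.c": "Storage"}

    lifted = {(mapping[a], mapping[b]) for a, b in source_model}
    print("convergent:", high_level & lifted)   # designed and present
    print("divergent: ", lifted - high_level)   # present but not designed
    print("absent:    ", high_level - lifted)   # designed but never realized

Here the divergent edge ('Storage', 'UI') is exactly the kind of drift the technique surfaces: the engineer can either refine the map, repair the code, or revise the design.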

 

Notkin, D. (1985). Annotated Bibliography of Gandalf Literature. Journal of Systems and Software, 5(2), 173-176.

 

Notkin, D. (1985). The Gandalf Project. Journal of Systems and Software, 5(2), 91-105.

 

Notkin, D. (1988). Applying software process models to the full lifecycle is premature. Proceedings of the 4th international software process workshop on Representing and enacting the software process, 116-117.

 

Notkin, D. (1989). The relationship between software development environments and the software process. Proceedings of the third ACM SIGSOFT/SIGPLAN software engineering symposium on Practical software development environments, 107-109.

 

Notkin, D. (1990). Proxies: A Software Structure for Accommodating Heterogeneity. Software-Practice & Experience, 20(4), 357-364.

 

Notkin, D. (1998). The lifeblood of our field. Computer, 31(10), 123-123.

 

Notkin, D. (2003). Longitudinal program analysis. Proceedings of the 2002 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, 1.

 

Notkin, D. (2006). Unconventional Views on Conventional Wisdom about Software Engineering Research. Proceedings of the 22nd IEEE International Conference on Software Maintenance, 201.

 

Notkin, D. (2007). Desert Island. Automated Software Engineering, 14(1), 123-125.

 

Notkin, D., Black, A. P., Lazowska, E. D., Levy, H. M., Sanislo, J., & Zahorjan, J. (1988). Interconnecting Heterogeneous Computer Systems. Communications of the ACM, 31(3), 258-273.

 

A software structure created by the Heterogeneous Computer Systems (HCS) Project at the University of Washington was designed to address the problems of heterogeneity that typically arise in research computing environments.

 

Notkin, D., Donner, M., Ernst, M. D., Gorlick, M., & Whitehead, E. J. J. (2001). Panel: perspectives on software engineering. Proceedings of the 23rd International Conference on Software Engineering, 699-704.

 

Notkin, D., & Griswold, W. G. (1987). Enhancement through extension: the extension interpreter. Proceedings of the ACM SIGPLAN/SIGSOFT Symposium on Interpreters and interpretive techniques, 45-55.

 

Notkin, D., & Griswold, W. G. (1988). Extension and software development. Proceedings of the 10th international conference on Software engineering, 274-283.

 

Notkin, D., Hutchinson, N., Sanislo, J., & Schwartz, M. (1987). Heterogeneous Computing Environments: Report on the ACM SIGOPS Workshop on Accommodating Heterogeneity. Communications of the ACM, 30(2), 132-140.

 

The ACM SIGOPS Workshop on Accommodating Heterogeneity was conducted in December 1985 in Eastsound, Wash., as a forum for an international group of fifty researchers to discuss the technical issues surrounding heterogeneous computing environments.

 

Notkin, D., & Jeffery, D. R. (1996). Best papers of the 17th international conference on software engineering (ICSE-17). IEEE Transactions on Software Engineering, 22(6), 361-362.

 

Notkin, D., Kirsch, G., & Skulikaris, Y. (1999). Intellectual property issues in software (panel). Proceedings of the 21st international conference on Software engineering, 594-595.

 

Notkin, D., & Pezzè, M. (2008). Introduction to the special section from the ACM International Symposium on Software Testing and Analysis (ISSTA 2006). ACM Transactions on Software Engineering and Methodology, 17(2).

 

Notkin, D., & Schlichting, R. D. (1993). Computer Science in Japanese Universities. Computer, 26(5), 62-70.

 

Despite Japan's association with high technology, most Western scientists know little about computer science in Japan. Many factors contribute to this phenomenon, including language and cultural differences, a shortage of readily available information, and a degree of technical chauvinism. On the basis of observations made during sabbaticals in Japan, the authors provide an informal portrait of computer science in Japanese universities in the hope that enhanced awareness and increased interaction will result. They look specifically at departmental structure, faculty career paths, the student population, and research activity. Probably the most important lesson drawn from their experiences is that it is difficult to overestimate the influence of culture in distinguishing the structure of American and Japanese approaches to teaching and research in computer science. The more noticeable differences observed include (1) the tendency of Japanese faculty to stay in a department from undergraduate days through retirement, unlike the more mobile approach in the US; (2) the narrow focus of most Japanese computer science research programs compared with generally broader research programs in the US; (3) the structural leveling of resources in general compared with the wide variation across departments in the US; (4) the marked absence of women and foreigners in the faculty and graduate student ranks compared with the more heterogeneous nature of departments in the US; and (5) the small number of faculty and PhD students in top-tier Japanese programs as opposed to the much larger programs found in comparable US universities. The influence of Japan on computer science, as well as in political, social, and other scientific realms, is growing. Scientists and researchers in other parts of the world must educate themselves about Japanese computer science in order to benefit from the work that goes on in that country and understand how to improve their own efforts.

 

Notkin, D., Snyder, L., Socha, D., Bailey, M. L., Forstall, B., Gates, K., Greenlaw, R., Griswold, W. G., Holman, T. J., Korry, R., Lasswell, G., Mitchell, R., & Nelson, P. A. (1988). Experiences with poker. Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems, 10-20.

 

Notkin, D. S. (1980). An experience with parallelism in Ada. Proceedings of the ACM-SIGPLAN symposium on Ada programming language, 9-15.

 

Notkin, D. S. (1984). Interactive structure-oriented computing. Ph.D. thesis, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA.

 

Oberg, B., & Notkin, D. (1992). Error Reporting with Graduated Color. IEEE Software, 9(6), 33-38.

 

A technique for nonintrusive error notification during programming that uses graduated color and elision, the temporary hiding of information, is described. Users can see errors and their age by the color and can look at the associated error explanation when they are not busy. Interruption is kept to a minimum during notification, and the explanation is close at hand and complete when it is wanted. The application of this technique to the Cornell Synthesizer Generator is discussed, and sample generated displays are presented.

 

Sazawal, V., Kim, M., & Notkin, D. (2004). A Study of Evolution in the Presence of Source-Derived Partial Design Representations. Proceedings of the Principles of Software Evolution, 7th International Workshop, 21-30.

 

Sazawal, V., & Notkin, D. (2004). Design snippets: partial design representations extracted from source code. Companion to the 19th annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications, 13-14.

 

Schwartz, M., Zahorjan, J., & Notkin, D. (1987). A name service for evolving heterogeneous systems. Proceedings of the eleventh ACM Symposium on Operating systems principles, 52-62.

 

Shaw, M., Jones, A., Knueven, P., McDermott, J., Miller, P., & Notkin, D. (1980). Cheating Policy in a Computer Science Department. ACM SIGCSE Bulletin, 72-76.

 

Socha, D., Bailey, M. L., & Notkin, D. (1989). Voyeur: graphical views of parallel programs. Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging, 206-215.

 

Soffa, M. L., & Notkin, D. (1998). SIGPLAN and SIGSOFT joint efforts. ACM SIGPLAN Notices, 1-1.

 

Squillante, M. S., & Notkin, D. (1989). Integrating Heterogeneous Local Mail Systems. IEEE Software, 6(6), 59-67.

 

A description is given of the Heterogeneous Mail System. This environment is inexpensive to implement. It is also inexpensive to integrate a new mail system into the environment and to use its services effectively. Diverse classes of mail systems can be integrated into the environment. The system model is examined, with particular attention to the mail semantic managers, the system server, mail submission and delivery, accommodating new systems, security and authentication, and scalability. Experiences with an experimental prototype are reported.

 

Sullivan, K., & Notkin, D. (1992). Reconciling environment integration and software evolution. ACM Transactions on Software Engineering and Methodology (TOSEM), 1(3), 229-268.

 

Common software design approaches complicate both tool integration and software evolution when applied in the development of integrated environments. We illustrate this by tracing the evolution of three different designs for a simple integrated environment as representative changes are made to the requirements. We present an approach that eases integration and evolution by preserving tool independence in the face of integration. We design tool integration relationships as separate components called mediators, and we design tools to implicitly invoke mediators that integrate them. Mediators separate tools from each other, while implicit invocation allows tools to remain independent of mediators. To enable the use of our approach on a range of platforms, we provide a formalized model and requirements for implicit invocation mechanisms. We apply this model both to analyze existing mechanisms and in the design of a mechanism for C++.
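
A compact Python sketch of the mediator-plus-implicit-invocation design move (class and event names are invented; the paper formalizes the invocation mechanism rather than prescribing one): the editor announces events without knowing who listens, the cross-referencer knows nothing about the editor, and the mediator alone encodes the integration relationship.

    class Announcer:
        """Minimal implicit-invocation mechanism: register/announce."""
        def __init__(self):
            self._callbacks = []
        def register(self, callback):
            self._callbacks.append(callback)
        def announce(self, event, payload):
            for cb in self._callbacks:
                cb(event, payload)

    class Editor(Announcer):
        """A tool: announces changes, knows nothing about other tools."""
        def change_text(self, text):
            self.text = text
            self.announce("text-changed", text)

    class CrossReferencer:
        """Another tool: equally unaware of who feeds it."""
        def rebuild(self, text):
            print("re-indexing:", text)

    # The mediator alone holds the integration relationship.
    editor, xref = Editor(), CrossReferencer()
    editor.register(lambda ev, text: xref.rebuild(text)
                    if ev == "text-changed" else None)
    editor.change_text("int main(void) { return 0; }")

Evolving the system then means adding or replacing mediators; neither tool needs to change, which is the independence-under-integration property the paper argues for.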

 

Sullivan, K. J., Kalet, I. J., & Notkin, D. (1995). Mediators in a Radiation Treatment Planning Environment. Technical Report: CS-95-29

 

Sullivan, K. J., Kalet, I. J., & Notkin, D. (1996). Evaluating The Mediator Method: Prism as a Case Study. IEEE Transactions on Software Engineering, 22(8), 563-579.

 

A software engineer's confidence in the profitability of a novel design technique depends to a significant degree on previous demonstrations of its profitability in practice. Trials of proposed techniques are thus of considerable value in providing factual bases for evaluation. In this paper we present our experience with a previously presented design approach as a basis for evaluating its promise and problems. Specifically, we report on our use of the mediator method to reconcile tight behavioral integration with ease of development and evolution of Prism, a system for planning radiation treatments for cancer patients. Prism is now in routine clinical use in several major research hospitals. Our work supports two claims. In comparison to more common design techniques, the mediator approach eases the development and evolution of integrated systems; and the method can be learned and used profitably by practicing software engineers.

 

Sullivan, K. J., & Notkin, D. (1990). Reconciling environment integration and component independence. Proceedings of the fourth ACM SIGSOFT symposium on Software development environments, 22-33.

 

Sullivan, K. J., Notkin, D., Fuggetta, A., & Favaro, J. (1999). First workshop on economics-driven software engineering research (EDSER-1). Proceedings of the 21st international conference on Software engineering, 699-700.

 

Sullivan, K. J., Shaw, M., Boehm, B. W., Notkin, D., & Harrison, W. (2001). Third international workshop on economics-driven software engineering research. Proceedings of the 23rd International Conference on Software Engineering, 770.

 

van Dam, A., Dill, J., Dixon, D. F., & Notkin, D. (1976). Structured programming in assembly language. ACM SIGCSE Bulletin, 53-67.

 

VanHilst, M., & Notkin, D. (1996). Decoupling change from design. Proceedings of the 4th ACM SIGSOFT symposium on Foundations of software engineering, 58-69.

 

VanHilst, M., & Notkin, D. (1996). Using C++ Templates to Implement Role-Based Designs. Proceedings of the Second JSSST International Symposium on Object Technologies for Advanced Software, 22-37.

 

VanHilst, M., & Notkin, D. (1996). Using role components to implement collaboration-based designs. Proceedings of the 11th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, 359-369.

 

Walker, R. J., Briand, L. C., Notkin, D., Seaman, C. B., & Tichy, W. F. (2003). Panel: empirical validation: what, why, when, and how. Proceedings of the 25th International Conference on Software Engineering, 721-722.

 

Xie, T., Marinov, D., & Notkin, D. (2004). Rostra: A Framework for Detecting Redundant Object-Oriented Unit Tests. Proceedings of the 19th IEEE international conference on Automated software engineering, 196-205.

 

Xie, T., Marinov, D., Schulte, W., & Notkin, D. (2005). Symstra: A framework for generating object-oriented unit tests using symbolic execution. Tools and Algorithms for the Construction and Analysis of Systems, 365-381.

 

Xie, T., & Notkin, D. (2004). Automatic extraction of object-oriented observer abstractions from unit-test executions. Formal Methods and Software Engineering, 290-305.

 

Xie, T., & Notkin, D. (2004). Checking Inside the Black Box: Regression Testing Based on Value Spectra Differences. Proceedings of the 20th IEEE International Conference on Software Maintenance, 28-37.

 

Xie, T., & Notkin, D. (2004). Mutually enhancing test generation and specification inference. Formal Approaches to Software Testing, 2931, 60-69.

 

Generating effective tests and inferring likely program specifications are both difficult and costly problems. We propose an approach in which we can mutually enhance the tests and specifications that are generated by iteratively applying each in a feedback loop. In particular, we infer likely specifications from the executions of existing tests and use these specifications to guide automatic test generation. Then the existing tests, as well as the new tests, are used to infer new specifications in the subsequent iteration. The iterative process continues until there is no new test that violates specifications inferred in the previous iteration. Inferred specifications can guide test generation to focus on particular program behavior, reducing the scope of analysis; and newly generated tests can improve the inferred specifications. During each iteration, the generated tests that violate inferred specifications are collected to be inspected. These violating tests have a high probability of exposing faults or exercising new program behavior. Our hypothesis is that such a feedback loop can mutually enhance test generation and specification inference.
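
The loop itself is simple enough to sketch. In this hypothetical Python fragment, the unit under test, the range-based "specification", and the neighborhood test generator are all toy stand-ins for the real inference and generation tools; only the feedback structure mirrors the paper.

    def unit_under_test(x):
        return min(abs(x), 4)        # the unit whose behavior we explore

    def infer_spec(tests):
        # Toy "specification": the observed output range.
        outputs = [unit_under_test(t) for t in tests]
        return min(outputs), max(outputs)

    def generate_tests(tests):
        # Toy generator: probe around existing inputs.
        return {t + d for t in tests for d in (-1, 1)}

    tests = {2, 3}
    while True:
        lo, hi = infer_spec(tests)
        violating = {t for t in generate_tests(tests)
                     if not lo <= unit_under_test(t) <= hi}
        if not violating:            # fixed point: no spec violations left
            break
        tests |= violating           # violating tests are kept for inspection
    print(sorted(tests), "final spec:", infer_spec(tests))
    # [0, 1, 2, 3, 4] final spec: (0, 4)

Each iteration either exposes new behavior (a violating test worth inspecting) or terminates at a fixed point where the inferred specification and the test suite agree.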

 

Xie, T., & Notkin, D. (2005). Automatically Identifying Special and Common Unit Tests for Object-Oriented Programs. Proceedings of the 16th IEEE International Symposium on Software Reliability Engineering, 277-287.

 

Xie, T., & Notkin, D. (2005). Checking inside the black box: Regression testing by comparing value spectra. IEEE Transactions on Software Engineering, 31(10), 869-883.

 

Comparing behaviors of program versions has become an important task in software maintenance and regression testing. Black-box program outputs have been used to characterize program behaviors and they are compared over program versions in traditional regression testing. Program spectra have recently been proposed to characterize a program's behavior inside the black box. Comparing program spectra of program versions offers insights into the internal behavioral differences between versions. In this paper, we present a new class of program spectra, value spectra, that enriches the existing program spectra family. We compare the value spectra of a program's old version and new version to detect internal behavioral deviations in the new version. We use a deviation-propagation call tree to present the deviation details. Based on the deviation-propagation call tree, we propose two heuristics to locate deviation roots, which are program locations that trigger the behavioral deviations. We also use path spectra (previously proposed program spectra) to approximate the program states in value spectra. We then similarly compare path spectra to detect behavioral deviations and locate deviation roots in the new version. We have conducted an experiment on eight C programs to evaluate our spectra-comparison approach. The results show that both value-spectra-comparison and path-spectra-comparison approaches can effectively expose program behavioral differences between program versions even when their program outputs are the same, and our value-spectra-comparison approach reports deviation roots with high accuracy for most programs.
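
A toy Python illustration of the comparison (the programs, the trace format, and the notion of a deviating frame are invented for this sketch; the paper's spectra capture much fuller internal states): two versions produce identical black-box output on the same input, yet comparing their value spectra exposes an internal deviation at the helper.

    def old_version(x):
        frames = []
        def helper(v):
            frames.append(("helper", v, v * 2))   # (function, entry, exit)
            return v * 2
        y = helper(x) + 1
        frames.append(("main", x, y))
        return y, frames

    def new_version(x):
        frames = []
        def helper(v):                            # internals changed
            frames.append(("helper", v + 1, (v + 1) * 2 - 2))
            return (v + 1) * 2 - 2
        y = helper(x) + 1
        frames.append(("main", x, y))
        return y, frames

    out_old, spectrum_old = old_version(5)
    out_new, spectrum_new = new_version(5)
    print("black-box outputs equal:", out_old == out_new)   # True
    deviations = [(a, b)
                  for a, b in zip(spectrum_old, spectrum_new) if a != b]
    print("deviating frames:", deviations)
    # [(('helper', 5, 10), ('helper', 6, 10))] -- an internal deviation
    # invisible to output-only regression testing.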

 

Xie, T., & Notkin, D. (2006). Tool-assisted unit-test generation and selection based on operational abstractions. Automated Software Engineering, 13(3), 345-371.

 

Unit testing, a common step in software development, presents a challenge. When produced manually, unit test suites are often insufficient to identify defects. The main alternative is to use one of a variety of automatic unit-test generation tools: these are able to produce and execute a large number of test inputs that extensively exercise the unit under test. However, without a priori specifications, programmers need to manually verify the outputs of these test executions, which is generally impractical. To reduce this cost, unit-test selection techniques may be used to help select a subset of automatically generated test inputs. Then programmers can verify their outputs, equip them with test oracles, and put them into the existing test suite. In this paper, we present the operational violation approach for unit-test generation and selection, a black-box approach without requiring a priori specifications. The approach dynamically generates operational abstractions from executions of the existing unit test suite. These operational abstractions guide test generation tools to generate tests to violate them. The approach selects those generated tests violating operational abstractions for inspection. These selected tests exercise some new behavior that has not been exercised by the existing tests. We implemented this approach by integrating the use of Daikon (a dynamic invariant detection tool) and Parasoft Jtest (a commercial Java unit testing tool), and conducted several experiments to assess the approach.

 

Xie, T., Zhao, J., Marinov, D., & Notkin, D. (2006). Detecting Redundant Unit Tests for AspectJ Programs. Proceedings of the 17th International Symposium on Software Reliability Engineering, 179-190.

 

Zahorjan, J., Lazowska, E. D., Levy, H. M., Notkin, D., & Sanislo, J. (1987). Accommodating Heterogeneity. Proceedings of the International Workshop on Experiences with Distributed Systems, 89-103.