Operating System Support for High-Performance Shared-Memory Multiprocessing

A primary motivation for building multiprocessors is to improve system performance cost-effectively, but in practice it has proven difficult to obtain good performance from parallel applications. The goal of this project has been to devise and prototype solutions to several problems that have limited multiprocessor system performance. A common theme in these solutions is that a multiprocessor operating system should be different from (although it need not be more complicated than) a uniprocessor operating system that happens to run on a multiprocessor. Specifically, the research has concerned:

Efficient threads.
Threads are the vehicle for concurrency in many approaches to parallel programming. I have shown that thread management overhead can be reduced to within an order of magnitude of the cost of a procedure call. The goal is to make the expression and control of parallelism sufficiently cheap that the ``natural parallel decomposition'' of an application can be exploited by the programmer or compiler with acceptable overhead, even when that decomposition is relatively fine-grained. With high-performance threads, however, small differences in thread management can have a significant performance impact on applications with fine-grained parallelism, often posing a tradeoff between throughput and latency. Per-processor data structures can be used to improve throughput and, in some circumstances, to avoid locking, improving latency as well.
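To make the per-processor idea concrete, the following C sketch keeps a free list of thread control blocks on each processor, so that the common allocation and deallocation paths touch no shared lock. It is a minimal illustration under stated assumptions; the names (NPROC, my_processor, tcb) and the layout are invented for exposition, not the project's actual data structures.

/* Sketch only: per-processor free lists of thread control blocks.
 * The fast paths use only the local processor's list and need no lock;
 * refilling from a shared pool (not shown) would still require one.
 * NPROC and my_processor() are assumed, illustrative names. */

#include <stddef.h>

#define NPROC 16                       /* assumed number of processors */

struct tcb {
    struct tcb *next;                  /* link on the free list */
    /* saved registers, stack base, and so on would follow */
};

struct per_proc {
    struct tcb *free_tcbs;             /* private to one processor */
};

static struct per_proc proc[NPROC];

extern int my_processor(void);         /* id of the caller's processor */

struct tcb *tcb_alloc(void)
{
    struct per_proc *p = &proc[my_processor()];
    struct tcb *t = p->free_tcbs;
    if (t != NULL)
        p->free_tcbs = t->next;        /* fast path: no locking */
    /* else: refill from a shared, locked pool (omitted) */
    return t;
}

void tcb_free(struct tcb *t)
{
    struct per_proc *p = &proc[my_processor()];
    t->next = p->free_tcbs;
    p->free_tcbs = t;
}

A real thread package would also balance blocks across processors when a local list runs dry or grows too long; the point here is simply that the frequent operations stay local and lock-free.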

Efficient synchronization.
Efficient thread management is impossible without efficient low-level synchronization primitives. In the work on thread management we discovered that, contrary to intuition, the obvious methods of spin-waiting for a busy critical section perform poorly. Slightly more sophisticated methods, such as Ethernet-style backoff or explicit software queueing, are needed to keep busy processors from being slowed by the activity of spin-waiting processors.
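As an illustration of the kind of primitive involved, the following C sketch combines spinning on a cached read with Ethernet-style exponential backoff. The constants and the delay loop are illustrative assumptions; this is a sketch of the general technique, not the exact algorithm evaluated in the work listed below.

/* Sketch of a spin lock that spins on an ordinary read and retries the
 * atomic operation with exponential backoff.  Constants and the delay
 * loop are illustrative, not a published algorithm's exact form. */

#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool held;
} spinlock_t;

static void delay(unsigned iterations)
{
    volatile unsigned i = iterations;  /* volatile keeps the loop from being optimized away */
    while (i-- > 0)
        ;
}

void spin_lock(spinlock_t *l)
{
    unsigned backoff = 1;
    for (;;) {
        /* Spin on an ordinary read first: it hits in the local cache and
         * generates no interconnect traffic while the lock is held. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;
        /* The lock looks free; now attempt the expensive atomic operation. */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return;                    /* acquired */
        /* Lost the race: back off before retrying, doubling the wait
         * (up to a cap), as in Ethernet collision handling. */
        delay(backoff);
        if (backoff < 1024)
            backoff *= 2;
    }
}

void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}

Spinning on the read keeps waiting processors in their own caches while the lock is held, and the backoff spreads out their retries when it is released, so the lock holder and other busy processors are not slowed by a burst of simultaneous atomic operations.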

Multiprocessor operating system structure.
Threads can be supported either by the operating system kernel or by user-level library code in each application's address space, but neither approach has been fully satisfactory. The performance of kernel-level threads, although at least an order of magnitude better than that of traditional UNIX-like processes, has been at least an order of magnitude worse than that of user-level threads. User-level threads, on the other hand, have suffered from a lack of system integration, leading to poor performance or even incorrect behavior in the presence of ``real world'' factors such as multiprogramming, I/O, and page faults. Thus the parallel programmer has faced a difficult dilemma: employ kernel-level threads, which ``work right'' but perform poorly, or employ user-level threads, which typically perform well but are sometimes seriously deficient. We have designed and prototyped a new kernel interface, called Scheduler Activations, and a user-level thread package that together provide the same functionality as kernel-level threads without compromising the performance and flexibility advantages of managing parallelism at the user level within each application's address space.
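The flavor of the interface can be suggested by the following C declarations, which paraphrase the events described in the Scheduler Activations paper: the kernel delivers upcalls to the user-level scheduler on fresh activations, and the user-level scheduler makes downcalls telling the kernel how many processors it can use. The names and signatures here are illustrative assumptions, not any implementation's actual API.

/* Illustrative paraphrase of the Scheduler Activations interface between
 * the kernel and a user-level thread package.  Names and signatures are
 * assumptions for exposition only. */

typedef int activation_id;            /* identifies a scheduler activation */

/* Upcalls: the kernel notifies the user-level scheduler of these events,
 * delivering each notification on a fresh activation (virtual processor). */
void upcall_add_this_processor(activation_id new_activation);
void upcall_activation_has_blocked(activation_id blocked);
void upcall_activation_has_unblocked(activation_id unblocked,
                                     activation_id preempted);
void upcall_processor_has_been_preempted(activation_id preempted);

/* Downcalls: the user-level scheduler informs the kernel how many
 * processors the application can currently use. */
void sys_add_more_processors(int how_many);
void sys_this_processor_is_idle(void);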

Selected Publications

T. Anderson, B. Bershad, E. Lazowska and H. Levy. ``Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism.'' ACM Transactions on Computer Systems 10, 1 (February 1992), pp. 53--79. Selected as Award Paper in Proc. Thirteenth ACM Symposium on Operating Systems Principles (October 1991). Also appeared as University of Washington Technical Report 90-04-02 (April 1990, revised August 1991).

T. Anderson. ``Operating System Support for High Performance Multiprocessing.'' Ph.D. Thesis, University of Washington. University of Washington Technical Report 91-08-10 (August 1991).

B. Bershad, T. Anderson, E. Lazowska and H. Levy. ``User-Level Interprocess Communication for Shared-Memory Multiprocessors.'' ACM Transactions on Computer Systems 9, 2 (May 1991), pp. 175--198. Also appeared as University of Washington Technical Report 90-05-07 (May 1990).

T. Anderson and E. Lazowska. ``Quartz: A Tool for Tuning Parallel Program Performance.'' Proc. 1990 ACM SIGMETRICS Conference on the Measurement and Modeling of Computer Systems (May 1990), pp. 115--125. Also appeared as University of Washington Technical Report 89-09-05 (September 1989).

T. Anderson. ``The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors.'' IEEE Transactions on Parallel and Distributed Systems 1, 1 (January 1990), pp. 6--16. An earlier version appeared in Proc. 1989 International Conference on Parallel Processing (August 1989), and as University of Washington Technical Report 89-04-03 (April 1989).

T. Anderson, E. Lazowska, and H. Levy. ``The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors.'' IEEE Transactions on Computers 38, 12 (December 1989), pp. 1631--1644. Selected as Award Paper in Proc. 1989 ACM SIGMETRICS and Performance '89 Conference on the Measurement and Modeling of Computer Systems (May 1989). Also appeared as University of Washington Technical Report 88-09-04 (September 1988).