The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors

Thomas Anderson.  The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors.  IEEE Transactions on Parallel and Distributed Systems, vol. 1, num. 1, January 1990, pages 6 - 16.  An earlier version appeared in Proc. 1989 International Conference on Parallel Processing (ICPP), August 1989.


Most shared-memory architectures provide hardware support for making mutually exclusive accesses to shared data structures.  This support usually consists of instructions that atomically read and then write a single memory location.  These atomic instructions are used to manipulate locks; when a processor is accessing a shared data structure, its lock is busy, and other processors needing access must wait.  For small critical sections, spinning (or "busy waiting") for a lock to be released is more efficient than relinquishing the processor to do other work.  Unfortunately, spin-waiting can slow other processors by consuming communication bandwidth.  This paper examines the question: are there efficient algorithms for software spin-waiting given hardware support for atomic instructions, or are more complex kinds of hardware support needed for performance?  

We consider the performance of a number of software spin-waiting algorithms. Arbitration for control of a lock is in many ways similar to arbitration for control of a network connecting a distributed system. We apply several of the static and dynamic arbitration methods originally developed for networks to spin locks. We also propose a novel method for explicitly queueing spinning processors in software by assigning each a unique number when it arrives at the lock. Control of the lock can then be passed to the next processor in line with minimal effect on other processors.  Finally, we examine the performance of several hardware solutions that reduce the cost of spin-waiting.