MemSpy: Analyzing Memory System Bottlenecks in Programs
Margaret Martonosi, Anoop Gupta, and Thomas Anderson.
MemSpy: Analyzing Memory System Bottlenecks in Programs.
Proc. 1992 ACM SIGMETRICS Conference, May 1992, pages 1-12.
Abstract:
To cope with the increasing difference between processor and main memory
speeds, modern computer systems use deep memory hierarchies. In the presence of
such hierarchies, the performance attained by an application is largely
determined by its memory reference behavior: if most references hit in the
cache, performance is significantly higher than if most references must go to
main memory. Often the programmer can restructure the data or code to achieve
better memory reference behavior. Unfortunately,
most existing performance debugging tools do not assist the programmer in this
component of the overall performance tuning task.
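
As a rough illustration of why reference order matters, the sketch below simulates a small direct-mapped cache and compares two traversals of the same array. It is not taken from the paper: the cache geometry (64 lines of 32 bytes), the 8-byte element size, the 64 x 64 array, and the function names are all assumptions chosen to keep the example small.

```python
def miss_rate(addresses, num_lines=64, line_bytes=32):
    """Simulate a direct-mapped cache; return the fraction of misses."""
    tags = [None] * num_lines          # block currently held by each line
    misses = 0
    total = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing this byte
        index = block % num_lines      # cache line the block maps to
        if tags[index] != block:       # cold or conflict miss
            tags[index] = block
            misses += 1
        total += 1
    return misses / total


N = 64      # assumed array dimension (N x N, stored row by row)
ELEM = 8    # assumed element size in bytes


def row_major():
    """Byte addresses touched when the inner loop walks along a row."""
    for i in range(N):
        for j in range(N):
            yield (i * N + j) * ELEM


def col_major():
    """Byte addresses touched when the inner loop walks down a column."""
    for j in range(N):
        for i in range(N):
            yield (i * N + j) * ELEM


row_rate = miss_rate(row_major())
col_rate = miss_rate(col_major())
print(f"row-major miss rate:    {row_rate:.2f}")
print(f"column-major miss rate: {col_rate:.2f}")
```

Under these assumptions the row-major loop misses once per cache line (one access in four), while the column-major loop misses on every access because consecutive references fall in different blocks that compete for the same few lines. Interchanging the loops is one example of the kind of restructuring the abstract refers to.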
This paper describes MemSpy, a prototype tool that helps programmers identify
and fix memory bottlenecks in both sequential and parallel programs. A key
aspect of MemSpy is that it introduces the notion of data-oriented, in addition
to code-oriented, performance tuning. Thus, for both source-level code objects
and data objects, MemSpy provides information such as cache miss rates, causes
of cache misses, and, in multiprocessors, information on cache invalidations and
local versus remote memory misses. MemSpy also introduces a concise matrix
presentation to allow programmers to view both code- and data-oriented statistics
at the same time. This paper presents design and implementation issues for
MemSpy, and gives a detailed case study using MemSpy to tune a parallel sparse
matrix application. It shows how MemSpy helps pinpoint memory system
bottlenecks, such as poor spatial locality and interference among data
structures, and suggests paths for improvement.
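
The idea of viewing code-oriented and data-oriented statistics together can be sketched as a table with one row per procedure and one column per data object. The code below is only an illustration of that idea, not MemSpy's actual output format or implementation: the trace records, the procedure and object names, and the helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical reference trace: (procedure, data object, was the access a miss?)
trace = [
    ("factor", "matrix", True),
    ("factor", "matrix", False),
    ("factor", "index", True),
    ("solve", "matrix", True),
    ("solve", "vector", False),
    ("solve", "vector", False),
]


def miss_matrix(records):
    """Return {procedure: {data_object: (misses, references)}}."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for proc, obj, was_miss in records:
        cell = counts[proc][obj]
        cell[0] += int(was_miss)   # misses charged to this (code, data) pair
        cell[1] += 1               # total references for this pair
    return {p: {o: tuple(c) for o, c in row.items()}
            for p, row in counts.items()}


def format_matrix(matrix):
    """Render rows as procedures and columns as data objects."""
    objs = sorted({o for row in matrix.values() for o in row})
    lines = ["procedure".ljust(10) + "".join(o.rjust(8) for o in objs)]
    for proc in sorted(matrix):
        cells = []
        for o in objs:
            if o in matrix[proc]:
                m, r = matrix[proc][o]
                cells.append(f"{m}/{r}".rjust(8))
            else:
                cells.append("-".rjust(8))   # procedure never touched object
        lines.append(proc.ljust(10) + "".join(cells))
    return "\n".join(lines)


matrix = miss_matrix(trace)
print(format_matrix(matrix))
```

Reading along a row shows which data objects account for a procedure's misses; reading down a column shows which procedures miss on a given object. Per-cell miss causes or invalidation counts could be added in the same way, given a trace that records them.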