Debugging distributed systems: Challenges and options for validation and debugging

Download: PDF, online deployment (try it!), ShiVector and ShiViz source code, video demo (YouTube).

“Debugging distributed systems: Challenges and options for validation and debugging” by Ivan Beschastnikh, Patty Wang, Yuriy Brun, and Michael D. Ernst. Communications of the ACM, vol. 59, no. 8, Aug. 2016, pp. 32-37.
A previous version appeared as ACM Queue, vol. 14, no. 2, March/April 2016, pp. 91-110.

Abstract

Distributed systems pose unique challenges for software developers. Reasoning about concurrent activities of system nodes and even understanding the system's communication topology can be difficult. A standard approach to gaining insight into system activity is to analyze system logs. Unfortunately, this can be a tedious and complex process. This article looks at several key features and debugging challenges that differentiate distributed systems from other kinds of software. The article presents several promising tools and ongoing research to help resolve these challenges.

BibTeX entry:

@article{BeschastnikhWBE2016,
   author = {Ivan Beschastnikh and Patty Wang and Yuriy Brun and Michael D. Ernst},
   title = {Debugging distributed systems: Challenges and options for validation and debugging},
   journal = {Communications of the ACM},
   volume = {59},
   number = {8},
   pages = {32--37},
   month = aug,
   year = {2016}
}
