Course Project

Purpose: The course project serves several purposes:
  • introduction to the technical literature in the area,
  • application of ideas and techniques presented in class in a more extensive and open-ended environment, and
  • research in some field of distributed computing that advances the state of the art in that field.

    Your topic should concern distributed algorithms. The project could fall anywhere on the spectrum between theory and practice: it could be a theoretical study of distributed algorithms, an implementation-related topic, or a balanced combination of both.

    The mechanics of the project will be as follows:

  • Develop a project idea and submit a project proposal by 10/31.
  • Identify two intermediate checkpoints and submit short progress reports on 11/10 and 11/20.
  • Final report and results will be due on 12/13.

    Project Proposal: The project proposal should include the following items:

  • Problem description: What is the goal of the project? Why is it important?
  • Approach: How does it relate to previous approaches? Why is the proposed approach new or better?
  • Methodology, milestones, deliverables: specific intermediate steps, resources needed (access to machines, software, etc.), and milestones for the checkpoints.

    Project Report: The end-of-semester report should be structured as a self-contained paper that describes the problem, provides motivation, surveys related work, describes the approach and what was accomplished, and presents the results.

    Project Ideas: Here are some starting points to help you formulate projects.

    1) Distributed Hash Tables: There is a nice paper by Manku on the use of randomization in DHTs. Start with this paper and address any of the open questions listed at the end.

    2) Prefix search in peer-to-peer systems: This paper addresses the problem of supporting prefix search in p2p systems. It is "plug-and-play," which usually means that the underlying DHT might not be used as efficiently as it could be if the abstraction barriers were broken. Try to improve on their approach or suggest a different way of building DHTs that support prefix searches (or other complicated searches).

    3) Evaluation of structured p2p networks: Most papers that we have covered in this course use latency as the metric for evaluating p2p systems. If you instead used bandwidth as the performance metric, would the results be different? In particular, would structured networks (like Tapestry and CAN) match the performance obtained on unstructured networks?

    4) Overlay multicast: One of the advantages of overlay networks is that intelligence can be pushed to the edge of the network, so that complex protocols such as multicast can be implemented on the edge nodes (instead of having to be supported on the routers). A number of overlay multicast schemes have been proposed. Recently, there have been proposals for multicasting a single stream through multiple overlay trees, each of which conveys only a fraction of the data stream. (The purpose is to increase the multicast bandwidth.) However, many of these schemes are heuristic-based and ignore the large body of theoretical results on packing spanning trees in graphs. Can these theoretical results be employed in a practical multicast mechanism to improve performance?

    5) Distributed algorithmic mechanism design (DAMD): This recent field is at the intersection of economics, algorithms, and distributed systems. A nice paper on applying DAMD to inter-domain routing is available here. Consider applying DAMD to other distributed systems protocols.

    6) Scaling properties of the Internet: A recent paper claims that the Internet will scale poorly due to congestion. However, these results are based on the assumption that the Internet graph satisfies power-law distributions. Consider other ways of modeling the Internet (some of which are described here) and see whether the claims are still valid.

    7) Distributed garbage collection: There are some interesting algorithms for performing garbage collection in distributed systems. Take a look at the paper by Shapiro and the paper by Fessant. One could also view the problem of distributed GC as finding a path from a node to the root object. Can one employ ad-hoc routing algorithms (such as TORA) to perform distributed GC?

    8) Ad-hoc routing algorithms: We considered a number of ad-hoc routing algorithms in class. One of the issues with ad-hoc routing is ensuring that paths are loop-free. A recent paper describes some of the previous approaches and claims to provide a new approach that minimizes the amount of protocol overhead. Compare the new protocol, both empirically and qualitatively, against previous algorithms such as TORA.

    9) New metrics for routing in wireless networks: A recent paper criticizes the use of hop count as the distance metric for routing in wireless networks. Instead, the paper proposes a metric that incorporates loss rate and interference to identify good paths in the DSR and DSDV ad-hoc routing protocols. Extend their work by studying the use of such metrics for other ad-hoc routing protocols.

    10) Correlation between bandwidth and loss rates: There are several existing tools for measuring the bandwidth of transmission links. (See the following page for a comprehensive listing of such tools.) One of the drawbacks of such tools is that they use a large number of probing packets to determine the bandwidth. Instead, consider the following approach to estimating bandwidth in a less intrusive manner: develop a tool for estimating the loss rate of a transmission link in a reasonably non-intrusive manner, perform an extensive set of loss-rate and bandwidth measurements, and then develop a model for correlating bandwidth with loss rate. One could then use the loss-rate measurement tool to predict the bandwidth of a link.
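
    The modeling step in idea 10 could start from something as simple as a least-squares line fit. The sketch below is illustrative only: the function names fit_line and predict_bandwidth, the linear form of the model, and the sample measurements are all assumptions made for the example, not part of the project description. Real measurements may well call for a different model.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept). Hypothetical helper for illustration.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_bandwidth(loss_rate: float, slope: float, intercept: float) -> float:
    """Predict bandwidth from a measured loss rate using the fitted line."""
    return intercept + slope * loss_rate


if __name__ == "__main__":
    # Made-up sample data: loss rate (fraction of packets lost) and the
    # bandwidth (Mbit/s) measured on the same link at the same time.
    loss_rates = [0.0, 0.1, 0.2, 0.3]
    bandwidths = [10.0, 8.0, 6.0, 4.0]
    slope, intercept = fit_line(loss_rates, bandwidths)
    print(predict_bandwidth(0.05, slope, intercept))
```

    With a fitted model in hand, a single non-intrusive loss-rate measurement is enough to produce a bandwidth estimate. How good that estimate is depends entirely on how well the chosen model matches the measured data, which is the question the project would need to answer.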

    Watch this space for more topics.

    Note on finding papers:

    Most of the papers should also be available at CiteSeer. You can trace back to earlier papers using the citation links on the website. You can also look at later papers that cite a particular paper to determine what has been done recently in that area of research. The ACM Digital Library and IEEE Xplore are two other useful sites for finding papers.