1
CS 703: Advanced Topics in Programming Languages and Compilers

Programming Languages and Compilers
in Systems and Architectures

Lecture 12: Presentations and Project Ideas I
  • Ras Bodik
  • University of Wisconsin
  • http://www.cs.wisc.edu/~bodik
2
Announcements
  • Today:
    • paper presentations,
    • (some) course project ideas.
  • Wednesday, 2/27:
    • course project ideas
    • finish discussion of TSM and SLE.
  • Friday, 3/1:
    • no class, but you’ll be forming project teams
    • and thinking hard about the course-project topic.
  • Monday, 3/4:
    • no class, but you’ll email me your preliminary project idea(s).
    • one page (no larger than the mini-review). One per team.
    • Due: Monday, 3/4, 7pm.
    • I’ll meet with each team on Thursday, 3/7, in the afternoon.
      Email me the times when you are available for 20-30 minutes.
3
Presentations
  • Classroom presentation
    • Goal: learn to distill a research paper into a “teaching” talk.
    • presentation = giving a lecture + guiding a discussion.
    • in groups of two students (maybe three, for difficult topics)


  • Papers to read:
    • The class reads one paper (and writes the mini-review),
    • the presenters read at least three or four papers
      (you’ll read as many as necessary to feel comfortable).


  • Presentation format:
    • You’ll need about 20-30 slides; practice!!!
    • Prepare questions for audience; expect their questions.


  • To book a topic,
    • send email to class mailing list
    • I’ll post the schedule by Wednesday.
4
Presentations (cont.)
  • Preparing the talk: I will meet with each group at least three times:


    • Agree on goals:
      • What are the papers about?
      • What do you want to teach the students?
      • Any other relevant papers?
      • Which papers should the class read and review?


    • Draft of slides:
      • you should know the message of each slide,
      • but don’t spend time (yet) drawing complicated figures, etc.
    • Final slides:
      • look over the final slides,
      • go over questions for the audience.
  • The Web contains plenty of good advice on how to give talks:
    • E.g., http://www.cs.wisc.edu/~markhill/conference-talk.html
5
Projects
  • Course project:
    • Goal: gain exposure to a (systematic) research experience:
      • turn an idea into a research plan,
      • give a talk, write a paper.
    • What you’ll do:
      • propose and test an original idea, or verify a published result.
  • Tailored to your interests and skills:
    • compilers, architectures, OS
    • prior knowledge of tools: a compiler, a simulator, etc.
  • Work in pairs
    • larger projects: 3 students
    • Ideal team: compiler “expert” + arch/OS “expert”
      (this course tries to foster cross-pollination!)
  • Grading
    • Ideas + implementation + paper; results + effort
    • negative result is a good result: OK to take risks,
    • think big!
6
Projects
  • Timeline
    • find idea, create a research plan,
      select your infrastructure (default: Jikes RVM)


    • written project proposal: before Spring break
      • will include review of related work


    • midway interviews, plus Five-Minute Madness
      (5-minute progress talks): + 3 weeks


    • submit paper, write reviews: + 3 weeks


    • final paper, class presentation: + 2 weeks
7
Topic 1: making C safe
  • Problem:
    • many hard-to-find bugs are due to null pointers, out-of-bounds accesses, pointer arithmetic.
  • Solution:
    • make C type-safe (like Java): insert run-time checks.
    • but you pay for safety with huge run-time overhead:
      • in time: run-time checks are costly,
      • in space: must enrich pointers with extra run-time info
    • to reduce overhead in
      • … time, remove run-time checks that never fail.
      • … space, avoid extra information for pointers that are used safely.
  • Papers:
    • Safe C, Austin et al
    • CCured, Necula et al
    • ABCD, Bodik et al
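A minimal Python sketch of both halves of the trade-off, assuming a toy model: the "space" cost is a pointer that carries its allocation along with its offset, and the "time" cost is a check on every load. `FatPointer`, `sum_checked` and `sum_hoisted` are hypothetical names, not taken from Safe C, CCured or ABCD. The second loop removes checks that provably never fail by doing one range check before the loop; this is a deliberately simple stand-in, since ABCD's own analysis proves individual checks redundant rather than hoisting them.

```python
class FatPointer:
    """A C-style pointer enriched with run-time bounds information."""

    def __init__(self, buffer, offset=0):
        self.buffer = buffer  # the allocation this pointer may legally touch
        self.offset = offset  # current position; arithmetic may move it anywhere

    def __add__(self, n):
        # Pointer arithmetic is unchecked; only dereferences are checked.
        return FatPointer(self.buffer, self.offset + n)

    def load(self):
        # The run-time check: trap instead of silently reading out of bounds.
        if not 0 <= self.offset < len(self.buffer):
            raise IndexError(f"out-of-bounds load at offset {self.offset}")
        return self.buffer[self.offset]


def sum_checked(p, n):
    """Sum n elements starting at p, checking every single load."""
    return sum((p + i).load() for i in range(n))


def sum_hoisted(p, n):
    """Same loop after check elimination.

    If 0 <= p.offset and p.offset + n <= len(p.buffer), every index
    p.offset + i with 0 <= i < n is in bounds, so the per-load checks
    can never fail: one check before the loop replaces n checks inside it.
    """
    if n > 0 and not (0 <= p.offset and p.offset + n <= len(p.buffer)):
        raise IndexError("loop would access out of bounds")
    buf, base = p.buffer, p.offset
    return sum(buf[base + i] for i in range(n))
```

Both functions compute the same sum on safe inputs; the hoisted version simply pays for one comparison instead of one per element.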

8
Topic 2: run-ahead disk prefetching
  • Problem:
    • some applications stall often on disk reads (e.g., linker).
    • common cure: prefetch disk blocks (or files) needed soon.
    • problem: how to determine what to prefetch, and when?
  • Solution:
    • when the application stalls on a (synchronous) disk read,
      don’t give up.
    • instead of stalling, let the application continue, turning future reads into prefetch hints (using binary instrumentation).
    • note that the application runs with potentially wrong data, because the (stalled) read has not finished.
    • so, how do you ensure that the run-ahead does not corrupt state?
  • Papers:
    • Automatic I/O Hint Generation through Speculative Execution, Chang and Gibson (an “OS” paper)
    • Dynamically Allocating Processor Resources Between Nearby and Distant ILP, Balasubramonian, et al (hw paper)
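A minimal Python sketch of the run-ahead idea under strong simplifying assumptions: the code after the stalled read is modeled as a list of hypothetical operations, reads become prefetch hints instead of blocking, and every write goes to a private shadow copy so the real state cannot be corrupted. The operation encoding and `run_ahead_hints` are illustrative inventions, not the mechanism of the Chang and Gibson system, which works on real binaries.

```python
def run_ahead_hints(ops, state, max_ops=100):
    """Speculatively execute ops while the real process waits on a disk read.

    ops: hypothetical encoding of the code after the stalled read:
      ("read", block)      disk read of a known block
      ("read_at", var)     disk read of the block whose number is in var
      ("set", var, value)  assignment
    A var missing from the state stands for data the stalled read has not
    delivered yet. Returns the blocks to hint to the prefetcher.
    """
    shadow = dict(state)  # stand-in for copy-on-write: the real state is never touched
    hints = []
    for op in ops[:max_ops]:
        kind = op[0]
        if kind == "read":
            hints.append(op[1])  # hint instead of blocking
        elif kind == "read_at":
            block = shadow.get(op[1])
            if block is None:
                # The address depends on data we do not have yet; further
                # hints would be guesses, so stop speculating here.
                break
            hints.append(block)
        elif kind == "set":
            shadow[op[1]] = op[2]
    return hints
```

For `ops = [("read", 7), ("set", "x", 42), ("read_at", "x"), ("read_at", "y"), ("read", 9)]` and `state = {"x": 1}`, the sketch hints blocks 7 and 42, stops at the read that depends on the unknown `y`, and leaves `state` unchanged.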
9
Topic 3: reverse-execution debugging
  • Context:
    • When debugging, you would really like to step the program backwards, from the fault point to the point of the bug.
  • Problem:
    • Debuggers do not support back-stepping, mainly because they cannot undo an instruction, i.e., recover the memory and register state that existed before the statement executed.
  • Solution:
    • Take regular checkpoints of program state.
    • Now we can roll back to any checkpoint.
    • But checkpoints cost time and space, so we cannot take one after every instruction.
    • Key problem: how to make checkpoints sparse, yet provide the illusion that we can undo statements at any granularity (i.e., back-step across any single instruction)?
  • Papers:
    • Algorithms for bidirectional debugging, Boothe.
    • Software Instruction Counter, Mellor-Crummey and LeBlanc.
    • Optimal incremental reexecution, Netzer and Weaver.
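A minimal Python sketch of the sparse-checkpoint idea: checkpoint only every `interval` steps, and implement a single back-step by restoring the nearest earlier checkpoint and re-executing forward to the step just before the current one. It assumes every program step is deterministic (re-execution must reproduce the same state), and models the program as a hypothetical list of functions that mutate a state dictionary; `ReverseDebugger` is an illustrative name, not an API from the papers above.

```python
import copy


class ReverseDebugger:
    """Back-stepping via sparse checkpoints plus deterministic re-execution."""

    def __init__(self, program, state, interval):
        self.program = program    # list of steps; each mutates the state dict
        self.state = state
        self.step_count = 0       # number of steps executed so far
        self.interval = interval  # take a checkpoint every `interval` steps
        self.checkpoints = {0: copy.deepcopy(state)}

    def step(self):
        self.program[self.step_count](self.state)
        self.step_count += 1
        if self.step_count % self.interval == 0:
            self.checkpoints[self.step_count] = copy.deepcopy(self.state)

    def back_step(self):
        target = self.step_count - 1
        if target < 0:
            raise ValueError("already at the start of the program")
        # Restore the nearest checkpoint at or before the target step,
        base = (target // self.interval) * self.interval
        self.state = copy.deepcopy(self.checkpoints[base])
        # then re-execute forward; correct only if steps are deterministic.
        for i in range(base, target):
            self.program[i](self.state)
        self.step_count = target
```

The `interval` parameter is the knob the slide's key problem is about: a larger interval means fewer, cheaper checkpoints but more re-execution per back-step.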
10
Topic 4: watermarking, obfuscation
  • Context:
    • Software is often distributed in a form isomorphic to its source code (e.g., for platform independence).
  • Problem:
    • How can we prevent reverse engineering, which exposes the structure of the source code and enables malicious attacks?
  • Solution:
    • A defense against reverse engineering is obfuscation, a process that renders software unintelligible but still functional.
    • A defense against software piracy is watermarking, a process that makes it possible to determine the origin of software.
    • A defense against tampering is tamper-proofing, so that unauthorized modifications to software (for example to remove a watermark) will result in non-functional code.
  • Papers:
    • Watermarking, tamper-proofing, and obfuscation, Collberg and Thomborson.
    • Manufacturing, Collberg and Thomborson.
    • Links (obfuscators and deobfuscators, …), more links
    • Note: Christian Collberg may be in town April 22.
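One building block used in obfuscation is the opaque predicate: a condition whose value the obfuscator knows, but which an attacker must work to determine. A minimal Python sketch, assuming a toy function to protect; `add_plain` and `add_obfuscated` are hypothetical names. The predicate relies on a simple fact: `x * (x + 1)` is a product of two consecutive integers and therefore always even, so the branch is always taken and the other return is dead code.

```python
def add_plain(a, b):
    return a + b


def add_obfuscated(a, b):
    # Opaque predicate: x * (x + 1) is always even, so this branch is
    # always taken. The obfuscator knows this; a decompiler or a human
    # reader has to prove it before discarding the other path.
    x = a ^ b
    if (x * x + x) % 2 == 0:
        return a + b
    # Plausible-looking dead code that is never executed.
    return a - b
```

The obfuscated version still computes `a + b` for every input, which is the "unintelligible but still functional" requirement from the slide; real obfuscators use predicates that are much harder to see through than this one.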
11
Topic 5: data race detection
  • Context:
    • Multithreaded programming is error-prone:
      synchronization bugs are easy to make but hard to find (even in Java).
  • Problem:
    • How can we build tools that help find race conditions?
    • Can we do it without exhaustively searching for an input and a thread interleaving that manifest the race?
  • Solution:
    • Determine if each access to a shared variable/object is guarded by the same locks.
    • static detection: many false alarms.
    • dynamic detection: high-overhead.
    • a hybrid may be the right solution.
  • Papers:
    • Eraser (dynamic detection), Savage et al.
    • Efficient and Precise Datarace Detection (hybrid) [coming soon] …, Choi et al.
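The "guarded by the same locks" check can be sketched as Eraser's basic lockset algorithm: each shared variable v keeps a candidate set C(v) of locks, each access intersects C(v) with the locks the accessing thread holds, and an empty C(v) means no single lock consistently protects v. A minimal Python sketch of that basic version; `LocksetChecker` is a hypothetical name, and the full Eraser adds a per-variable state machine (for initialization and read-only sharing) that this sketch omits, so it reports more false alarms.

```python
class LocksetChecker:
    """Basic lockset race detection over a stream of lock/access events."""

    def __init__(self):
        self.held = {}        # thread -> set of locks it currently holds
        self.candidates = {}  # variable -> candidate lockset C(v)
        self.races = []       # variables whose lockset became empty

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread, var):
        held = self.held.get(thread, set())
        if var not in self.candidates:
            # C(v) starts as "all locks"; intersecting with the held
            # set on the first access leaves exactly the held set.
            self.candidates[var] = set(held)
        else:
            self.candidates[var] &= held
        if not self.candidates[var] and var not in self.races:
            self.races.append(var)  # no lock protects every access to var
```

Because the check depends only on which locks are held, not on the timing of the interleaving, it can flag a race from a single execution in which the race did not actually manifest.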
12
Topic 6: checking properties of sw
  • Context:
    • Modern programming languages protect against some frequent bugs using type checking.
    • For example, Tree* cannot be assigned a float* value.
  • Problem:
    • Can we extend compilers (or other static tools) to check other useful properties, specified by the user, for example
      • “After a lock is acquired, it is eventually released.”
      • “Each lock that is released was previously acquired.”
      • “Before you call listen() on a socket, the socket must
        be opened and bound.”
  • Solution:
    • User expresses the “useful” property as a state machine,
    • the checker “plays” the state machine along all possible execution paths of the tested program.
    • if the state machine gets to an illegal state, a “bug” is reported, together with the path that caused the “bug.”
  • Papers:
    • SLAM, Ball and Rajamani
    • Metal, Engler et al
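A minimal Python sketch of "playing" a state machine along all paths of a program, assuming the program is given as a hypothetical control-flow graph whose nodes are labeled with lock events. The checker explores (node, state) pairs, so it terminates even on loops, and reports each violation with a witness path. It is a bare-bones stand-in: it ignores branch conditions and can therefore report infeasible paths, which tools such as SLAM work hard to rule out (SLAM refines a predicate abstraction of the program).

```python
from collections import deque

# Hypothetical lock property as a state machine: acquire moves
# unlocked to locked, release moves locked to unlocked; any other
# (state, event) pair is a violation.
TRANSITIONS = {
    ("unlocked", "acquire"): "locked",
    ("locked", "release"): "unlocked",
}


def check(cfg, events, entry, exit_node):
    """Play the state machine along every path of a control-flow graph.

    cfg: node -> list of successor nodes; events: node -> "acquire"/"release".
    Returns one witness path per violation found.
    """
    worklist = deque([(entry, "unlocked", [entry])])
    seen = {(entry, "unlocked")}  # (node, state) pairs already explored
    bugs = []
    while worklist:
        node, state, path = worklist.popleft()
        event = events.get(node)
        if event is not None:
            state = TRANSITIONS.get((state, event), "error")
        # Illegal transition, or the lock is still held at function exit.
        if state == "error" or (node == exit_node and state == "locked"):
            bugs.append(path)  # report the bug together with its path
            continue
        for succ in cfg.get(node, []):
            if (succ, state) not in seen:
                seen.add((succ, state))
                worklist.append((succ, state, path + [succ]))
    return bugs
```

For a graph where `n0` acquires the lock and branches to `n1` (which releases it) or `n2` (which does not), both joining at the exit `n3`, the sketch reports the single path `["n0", "n2", "n3"]`, covering the first two properties on the slide.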
13
Topic 7: inferring sw properties
  • Context:
    • Very soon, software verifiers will mature enough to be able to check powerful properties on large programs (see previous topic).
  • Problem:
    • But where will the correctness properties come from?
    • These powerful tools will be of little use without enough properties to be checked!
  • Solution:
    • The huge software base contains invaluable information (that’s available nowhere else):
      • When programmers resolve a problem with an API (e.g., “what arguments to pass to a library procedure?”), they often don’t document the solution. Instead, the solution is encoded only in the code.
      • We can observe how multiple programmers use an API, and “take a vote.”

    • So, use “data mining” to infer the properties from code
      • using the maxim that “common usage is correct usage.”


  • Papers:
    • Mining Specifications, Ammons et al
    • Bugs as deviant behavior, Engler et al
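A minimal Python sketch of "taking a vote", assuming each function is reduced to a hypothetical ordered list of the calls it makes. It infers rules of the form "a call to A is later followed by a call to B" when enough functions support the pair and few violate it, then flags the violators as likely bugs. The support and confidence thresholds are arbitrary illustrative values; the papers above use more careful statistics and real program paths rather than this crude call list.

```python
from collections import Counter


def mine_rules(functions, min_support=3, min_confidence=0.75):
    """Infer "A is later followed by B" rules by voting across functions.

    functions: name -> ordered list of calls made (a hypothetical, crude
    stand-in for real program paths).
    Returns (rules, deviants); deviants are the likely bugs.
    """
    calls_a = Counter()   # how many functions call A
    a_then_b = Counter()  # how many functions call B somewhere after A
    for calls in functions.values():
        for a in set(calls):
            calls_a[a] += 1
            for b in set(calls[calls.index(a) + 1:]) - {a}:
                a_then_b[(a, b)] += 1

    rules, deviants = [], []
    for (a, b), n in sorted(a_then_b.items()):
        # "Common usage is correct usage": keep well-supported, rarely violated pairs.
        if n >= min_support and n / calls_a[a] >= min_confidence:
            rules.append((a, b))
            for name, calls in sorted(functions.items()):
                if a in calls and b not in calls[calls.index(a) + 1:]:
                    deviants.append((name, a, b))
    return rules, deviants
```

With three functions that call `lock` and later `unlock`, and a fourth that calls `lock` but never `unlock`, the sketch infers the single rule `("lock", "unlock")` and reports the fourth function as the deviant.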
14
Topic 8: safely extending OSs
  • Context:
    • Some applications may need new or specialized functionality from the OS kernel, for performance or security.
    • For example, a new file buffering scheme for Web server apps.
  • Problem:
    • How can we allow applications to extend the OS while ensuring safety (i.e., the kernel will not crash or be hijacked by the extension)?
  • Solution:
    • Type safety: if types match (checked by the kernel or the compiler), then no crash will happen.
  • Papers:
    • SPIN, Bershad et al
    • Exokernel, Kaashoek et al
    • Safe Kernel Extensions Without Run-Time Checking,  Necula and Lee
    • Disco, Bugnion et al
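The common thread is that safety is established before the extension runs, so the kernel need not check it at run time. A minimal Python sketch of that idea as a load-time verifier for a hypothetical packet-filter language: it accepts a filter only if every load stays within a packet prefix the kernel is assumed to guarantee (`MIN_PACKET_LEN` is an invented constant) and every jump goes forward, so an accepted filter cannot read out of bounds and always terminates. This is a plain static checker, not proof-carrying code (where the extension ships a proof that the kernel checks) and not SPIN's type-safe Modula-3 extensions.

```python
MIN_PACKET_LEN = 64  # hypothetical: the kernel only passes packets at least this long


def verify(prog):
    """Load-time check of a hypothetical packet-filter extension.

    Instructions: ("load", offset), ("jeq", value, target), ("ret", verdict).
    Accept only programs whose loads stay inside the guaranteed packet prefix
    and whose jumps go strictly forward, ending in "ret".
    """
    if not prog or prog[-1][0] != "ret":
        return False
    for pc, instr in enumerate(prog):
        op = instr[0]
        if op not in ("load", "jeq", "ret"):
            return False
        if op == "load" and not 0 <= instr[1] < MIN_PACKET_LEN:
            return False
        if op == "jeq" and not pc < instr[2] < len(prog):
            return False
    return True


def run(prog, packet):
    """Run a verified filter: no bounds checks or loop guards are needed."""
    acc, pc = 0, 0
    while True:
        instr = prog[pc]
        if instr[0] == "load":
            acc, pc = packet[instr[1]], pc + 1
        elif instr[0] == "jeq":
            pc = instr[2] if acc == instr[1] else pc + 1
        else:
            return instr[1]
```

For example, `[("load", 12), ("jeq", 8, 3), ("ret", False), ("ret", True)]` passes the verifier and accepts exactly the packets whose byte 12 equals 8, while a filter that loads offset 100 or jumps backward is rejected before it ever runs.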
15
Topic 9: memory parallelism and prefetching
  • Context:
    • Memory access time, relative to processor speed, is growing exponentially over time.
  • Problem:
    • How can we design data structures or prefetchers that hide the memory latency?
    • What is a good way to evaluate a data structure or a prefetcher?
  • Solution:
    • Understanding: a notion of memory level parallelism:
      • a prefetcher is good only if it can generate “many” useful outstanding prefetches.
    • Prefetching:
      • jump pointers, streams
  • Papers:
    • Prefetching recursive data structures, Luk et al
    • MLP yes! ILP no!, Glew
    • Streaming based prefetching, [coming soon] Chilimbi et al
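A minimal Python timing model of why memory-level parallelism matters for a pointer-chasing traversal. It assumes every node misses in the cache, any number of misses can be outstanding at once, and jump pointers from each node to the node `distance` links ahead are already installed; `traverse_time` and its parameters are hypothetical. Without prefetching, each miss waits for the previous one because the next address is only known after the load; with jump pointers, several prefetches are in flight concurrently.

```python
def traverse_time(n, latency, work, distance=0):
    """Cycles to walk an n-node linked list in which every node misses in cache.

    latency: cycles per memory access; work: compute cycles per node;
    distance: jump-pointer distance (0 means no prefetching).
    """
    arrival = {}  # node index -> cycle at which its prefetch completes
    t = 0
    for i in range(n):
        if i in arrival:
            t = max(t, arrival[i])  # wait only for what is left of the prefetch
        else:
            t += latency  # demand miss: the next address is known only now
        if distance and i + distance < n:
            # Follow node i's jump pointer and prefetch node i + distance.
            arrival[i + distance] = t + latency
        t += work
    return t
```

With `n=100`, `latency=100` and `work=10`, the model gives 11000 cycles without prefetching and 2000 cycles with `distance=10`: in steady state about latency / work = 10 prefetches are outstanding at once, and each node costs only its 10 cycles of work.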