1	cs 703: Advanced Topics in Programming Languages and Compilers
Programming Languages and Compilers in Systems and Architectures
Lecture 12: Presentations and Project Ideas I
Ras Bodik, University of Wisconsin
http://www.cs.wisc.edu/~bodik
2	Announcements
Today: paper presentations, (some) course project ideas.
Wednesday, 2/27: course project ideas; finish discussion of TSM and SLE.
Friday, 3/1: no class, but you’ll be forming project teams and thinking hard about the course-project topic.
Monday, 3/4: no class, but you’ll email me your preliminary project idea(s).
  One page (no larger than the mini-review). One per team. Due: Monday, 3/4, 7pm.
I’ll meet with each team on Thursday, 3/7, afternoon. Send me email saying when you are available for 20-30 minutes.
3	Presentations
Classroom presentation
Goal: learn to distill a research paper into a “teaching” talk.
  presentation = giving a lecture + guiding a discussion.
  In groups of two students (maybe three, for difficult topics).
Papers to read: the class reads one paper (and writes the mini-review); the presenters read at least three or four papers (you’ll read as many as necessary to feel comfortable).
Presentation format: you’ll need about 20-30 slides; practice!!! Prepare questions for the audience; expect their questions.
To book a topic, send email to the class mailing list. I’ll post the schedule by Wednesday.
4	Presentations (cont.)
Preparing the talk: I’ll meet with each group at least three times:
  Agree on goals: What are the papers about? What do you want to teach the students? Any other relevant papers? Which papers should the class read and review?
  Draft of slides: you should know what the message of each slide should be, but don’t spend time (yet) drawing complicated figures, etc.
  Final slides: look over the final slides, go over questions for the audience.
The Web contains plenty of good advice on how to give talks, e.g., http://www.cs.wisc.edu/~markhill/conference-talk.html
5	Projects
Course project
Goal: be exposed to a (systematic) research experience: turn an idea into a research plan, give a talk, write a paper.
What you’ll do: propose and test an original idea, or verify a published result.
Tailored to your interests and skills:
  compilers, architectures, OS
  prior knowledge of tools: a compiler, simulator, etc.
Work in pairs (larger projects: 3 students).
  Ideal team: compiler “expert” + arch/OS “expert” (this course tries to foster cross-pollination!)
Grading: ideas + implementation + paper; results + effort.
  A negative result is a good result: OK to take risks, think big!
6	Projects
Timeline
  find idea, create a research plan, select your infrastructure (default: Jikes RVM)
  written project proposal (will include review of related work): before Spring break
  midway interviews, plus Five-minute madness (5-minute progress talks): + 3 weeks
  submit paper, write reviews: + 3 weeks
  final paper, class presentation: + 2 weeks
7	Topic 1: making C safe
Problem: many hard-to-find bugs are due to null pointers, out-of-bounds accesses, pointer arithmetic.
Solution: make C type-safe (like Java): insert run-time checks.
  But you pay for safety with huge run-time overhead:
    in time: run-time checks are costly
    in space: must enrich pointers with extra run-time info
  To reduce overhead in …
    … time, remove run-time checks that never fail.
    … space, avoid extra information for pointers that are used safely.
Papers:
  Safe C, Austin et al.
  CCured, Necula et al.
  ABCD, Bodik et al.
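To make the time and space costs concrete, here is a minimal Python model of a pointer "enriched" with run-time info. It is only a sketch: the names `FatPtr` and `BoundsError` are hypothetical, a Python list stands in for the C heap, and none of this is taken from Safe C or CCured. Each pointer carries the base and size of its object (the space cost), and every load or store is checked against them (the time cost).

```python
class BoundsError(Exception):
    """Raised when a checked pointer is dereferenced outside its object."""


class FatPtr:
    """Hypothetical fat pointer: a position plus the bounds of its object."""

    def __init__(self, memory, base, size, offset=0):
        self.memory = memory    # flat list standing in for the C heap
        self.base = base        # index of the object's first element
        self.size = size        # number of elements in the object
        self.offset = offset    # current position, relative to base

    def add(self, n):
        # Pointer arithmetic itself is unchecked; only dereferences are.
        return FatPtr(self.memory, self.base, self.size, self.offset + n)

    def _check(self):
        # The inserted run-time check: is the access inside the object?
        if not 0 <= self.offset < self.size:
            raise BoundsError(f"offset {self.offset} outside [0, {self.size})")

    def load(self):
        self._check()
        return self.memory[self.base + self.offset]

    def store(self, value):
        self._check()
        self.memory[self.base + self.offset] = value


# Two adjacent 4-element arrays, a and b, in one 8-cell heap.
heap = [0] * 8
a = FatPtr(heap, base=0, size=4)

a.add(3).store(7)          # in bounds: writes a[3]
try:
    a.add(4).store(9)      # out of bounds: would have overwritten b[0]
except BoundsError as err:
    print("caught:", err)
```

A check that provably never fails (for instance, a loop index already bounded by the array size) is the kind of check the slide proposes to remove, and a pointer that is never used in arithmetic would not need the `base` and `size` fields at all.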
8	Topic 2: run-ahead disk prefetching
Problem: some applications stall often on disk reads (e.g., linker).
  common cure: prefetch disk blocks (or files) needed soon.
  problem: how to determine what to prefetch, and when?
Solution: when the application stalls on a (synchronous) disk read, don’t give up.
  Instead of stalling, let the application continue, turning future reads into prefetch hints (using binary instrumentation).
  Note that the application runs with potentially wrong data, because the (stalled) read has not finished.
  So, how do you ensure that the run-ahead does not corrupt state?
Papers:
  Automatic I/O Hint Generation through Speculative Execution, Chang and Gibson (an “OS” paper)
  Dynamically Allocating Processor Resources Between Nearby and Distant ILP, Balasubramonian et al. (hw paper)
9	Topic 3: reverse-execution debugging
Context: When debugging, you would really like to step the program backwards, from the fault point to the point of the bug.
Problem: Debuggers do not support back-stepping, mainly because they cannot undo an instruction, i.e., recover the memory & register state prior to the statement.
Solution: Take regular checkpoints of program state. Now we can roll back to any checkpoint. But checkpoints cost overhead.
Key problem: how to make checkpoints sparse, yet provide the illusion that we can undo statements at any granularity (i.e., back-step across any single instruction)?
Papers:
  Algorithms for bidirectional debugging, Boothe.
  Software Instruction Counter, Mellor-Crummey and LeBlanc.
  Optimal incremental reexecution, Netzer and Weaver.
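One common way to get single-instruction back-stepping from sparse checkpoints is to restore the nearest earlier checkpoint and re-execute forward, stopping one instruction short. The Python sketch below illustrates that idea on a toy machine; `ReplayDebugger`, `inc`, and `dbl` are hypothetical names, the "program" is just a list of functions that mutate a dict, and the scheme assumes execution is deterministic. It is not the algorithm of any of the listed papers.

```python
import copy


class ReplayDebugger:
    """Toy back-stepping debugger: sparse checkpoints + re-execution."""

    def __init__(self, program, state, interval=4):
        self.program = program      # list of functions, each mutates state
        self.state = state          # dict standing in for memory + registers
        self.interval = interval    # take a checkpoint every `interval` steps
        self.count = 0              # instructions executed so far
        self.checkpoints = {0: copy.deepcopy(state)}

    def step(self):
        self.program[self.count](self.state)
        self.count += 1
        if self.count % self.interval == 0 and self.count not in self.checkpoints:
            self.checkpoints[self.count] = copy.deepcopy(self.state)

    def back_step(self):
        target = self.count - 1
        if target < 0:
            raise ValueError("already at the start of the program")
        # Roll back to the latest checkpoint at or before the target ...
        start = max(c for c in self.checkpoints if c <= target)
        self.state = copy.deepcopy(self.checkpoints[start])
        self.count = start
        # ... then re-execute forward, stopping one instruction early.
        while self.count < target:
            self.step()


def inc(s):
    s["x"] += 1


def dbl(s):
    s["x"] *= 2


dbg = ReplayDebugger([inc, dbl, inc, dbl, inc, dbl], {"x": 0}, interval=4)
for _ in range(5):
    dbg.step()
print(dbg.count, dbg.state)   # 5 {'x': 7}
dbg.back_step()
print(dbg.count, dbg.state)   # 4 {'x': 6}
dbg.back_step()
print(dbg.count, dbg.state)   # 3 {'x': 3}
```

The trade-off is visible in `interval`: a larger interval means fewer checkpoints to store but more instructions to replay on each back-step.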
10	Topic 4: watermarking, obfuscation
Context: Software is often distributed in a form isomorphic to its source code (e.g., for platform independence).
Problem: How to prevent reverse engineering, which exposes the source-level structure of the code, allowing malicious attacks?
Solution:
  A defense against reverse engineering is obfuscation, a process that renders software unintelligible but still functional.
  A defense against software piracy is watermarking, a process that makes it possible to determine the origin of software.
  A defense against tampering is tamper-proofing, so that unauthorized modifications to software (for example, to remove a watermark) will result in non-functional code.
Papers:
  Watermarking, tamper-proofing, and obfuscation, Collberg and Thomborson.
  Manufacturing, Collberg and Thomborson.
  Links (obfuscators and deobfuscators, …), more links
Note: Christian Collberg may be in town April 22.
11	Topic 5: data race detection
Context: Multithreaded programming is error-prone: synchronization bugs are easy to make but hard to find (even in Java).
Problem: How to build debuggers that will help find the race condition? Can we do it without exhaustively searching for an input and thread interleaving that will manifest the race?
Solution: Determine if each access to a shared variable/object is guarded by the same locks.
  static detection: many false alarms.
  dynamic detection: high overhead.
  a hybrid may be the right solution.
Papers:
  Eraser (dynamic detection), Savage et al.
  Efficient and Precise Datarace Detection …, Choi et al. (hybrid) [coming soon]
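The "guarded by the same locks" test can be sketched as a dynamic lockset check: for each shared variable, keep the set of locks that were held on every access so far, and report the variable once that set becomes empty. The Python below is a deliberately simplified illustration. The class `LocksetChecker` and its method names are hypothetical, events are fed in by hand rather than by instrumentation, and it leaves out the refinements a real tool needs to keep false alarms down (for example, special handling of initialization and of read-only sharing).

```python
class LocksetChecker:
    """Simplified dynamic lockset check over a hand-fed event stream."""

    def __init__(self):
        self.held = {}         # thread id -> set of locks currently held
        self.candidates = {}   # variable -> locks held on every access so far
        self.races = []        # variables with no consistently held lock

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread, var):
        locks = self.held.get(thread, set())
        if var not in self.candidates:
            self.candidates[var] = set(locks)
        else:
            # Keep only the locks held on this access too.
            self.candidates[var] &= locks
        if not self.candidates[var] and var not in self.races:
            self.races.append(var)


checker = LocksetChecker()

# x is always accessed under lock A: no report.
checker.acquire("t1", "A"); checker.access("t1", "x"); checker.release("t1", "A")
checker.acquire("t2", "A"); checker.access("t2", "x"); checker.release("t2", "A")

# y is accessed under A by t1 but under B by t2: no common lock.
checker.acquire("t1", "A"); checker.access("t1", "y"); checker.release("t1", "A")
checker.acquire("t2", "B"); checker.access("t2", "y"); checker.release("t2", "B")

print(checker.races)   # ['y']
```

Note that `y` is reported even though the two accesses never actually overlapped in this run: the check looks at locking discipline, not at the particular interleaving, which is what avoids the exhaustive search mentioned in the problem statement.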
12	Topic 6: checking properties of sw
Context: Modern programming languages protect against some frequent bugs using type checking. For example, a Tree* cannot be assigned a float* value.
Problem: Can we extend compilers (or other static tools) to check other useful properties, specified by the user? For example:
  “After a lock is acquired, it is eventually released.”
  “Each lock that is released was previously acquired.”
  “Before you call listen() on a socket, the socket must be open-ed and bind-ed.”
Solution: The user expresses the “useful” property as a state machine; the checker “plays” the state machine along all possible execution paths of the tested program.
  If the state machine gets to an illegal state, a “bug” is reported, together with the path that caused the “bug.”
Papers:
  SLAM, Ball and Rajamani
  Metal, Engler et al.
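A toy version of "playing" a state machine along execution paths might look like the Python sketch below. Everything in it is an assumption made for illustration: the program is a hand-written control-flow graph whose nodes carry at most one event, the property is the lock discipline from the slide encoded in the hypothetical table `LOCK_FSM`, and `check` is not the algorithm used by SLAM or Metal. Each (node, state) pair is explored once, so loops in the graph terminate.

```python
ERROR = "error"

# Property: acquire and release must alternate, starting unlocked.
LOCK_FSM = {
    ("unlocked", "acquire"): "locked",
    ("locked", "release"): "unlocked",
    ("unlocked", "release"): ERROR,   # released without being acquired
    ("locked", "acquire"): ERROR,     # acquired twice
}


def check(cfg, entry, fsm, start, bad_at_exit):
    """Return one offending path per illegal (node, state) reached."""
    bugs = []
    seen = set()
    stack = [(entry, start, [entry])]
    while stack:
        node, state, path = stack.pop()
        event, successors = cfg[node]
        state = fsm.get((state, event), state)   # irrelevant events: no change
        if state == ERROR:
            bugs.append(path)
            continue
        if not successors and state in bad_at_exit:
            bugs.append(path)                    # e.g. lock never released
            continue
        for nxt in successors:
            if (nxt, state) not in seen:
                seen.add((nxt, state))
                stack.append((nxt, state, path + [nxt]))
    return bugs


# node -> (event at that node, successor nodes); the else branch forgets
# to release the lock.
cfg = {
    "entry": ("acquire", ["test"]),
    "test": (None, ["then", "else"]),
    "then": ("release", ["exit"]),
    "else": (None, ["exit"]),
    "exit": (None, []),
}

print(check(cfg, "entry", LOCK_FSM, "unlocked", bad_at_exit={"locked"}))
# [['entry', 'test', 'else', 'exit']]
```

The reported path is the useful part of the output: it tells the programmer which branch leaves the lock held, not just that some path does.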
13	Topic 7: inferring sw properties
Context: Very soon, software verifiers will mature enough to be able to check powerful properties on large programs (see previous topic).
Problem: But where will the properties of correctness come from? These powerful tools will be of little use without enough properties to be checked!
Solution: The huge software base contains invaluable information (that’s available nowhere else):
  When programmers resolve a problem with an API (e.g., “what arguments to pass to a library procedure?”), they often don’t document the solution. Instead, the solution is encoded only in the code.
  We can observe how multiple programmers use an API, and “take a vote.”
  So, use “data mining” to infer the properties from code, using the maxim that “common usage is correct usage.”
Papers:
  Mining Specifications, Ammons et al.
  Bugs as deviant behavior, Engler et al.
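As a rough illustration of "taking a vote," the Python sketch below infers rules of the form "a call to a is later followed by a call to b" from call sequences, keeps those that hold in most sequences, and then lists the sequences that deviate. The function names, the thresholds, and the `traces` data are all made up for the example; the listed papers use considerably more sophisticated inference and ranking.

```python
from collections import Counter


def infer_rules(traces, threshold=0.75, min_support=3):
    """Infer (a, b) rules: in most traces containing a, b follows it."""
    total = Counter()      # a -> number of traces that call a
    followed = Counter()   # (a, b) -> traces where b occurs after a
    for calls in traces.values():
        for a in set(calls):
            total[a] += 1
            after_first_a = calls[calls.index(a) + 1:]
            for b in set(after_first_a):
                if b != a:
                    followed[(a, b)] += 1
    return {
        (a, b)
        for (a, b), n in followed.items()
        if total[a] >= min_support and n / total[a] >= threshold
    }


def deviants(traces, rule):
    """Names of traces that call a but never call b afterwards."""
    a, b = rule
    return sorted(
        name
        for name, calls in traces.items()
        if a in calls and b not in calls[calls.index(a) + 1:]
    )


# Hypothetical call sequences, one per function in some code base.
traces = {
    "f1": ["lock", "read", "unlock"],
    "f2": ["lock", "write", "unlock"],
    "f3": ["lock", "read", "unlock"],
    "f4": ["lock", "write", "unlock"],
    "f5": ["lock", "read"],            # the odd one out
}

rules = infer_rules(traces)
print(rules)                                  # {('lock', 'unlock')}
print(deviants(traces, ("lock", "unlock")))   # ['f5']
```

Nobody told the tool that `lock` must be paired with `unlock`; four of the five functions voted for it, and `f5` is flagged as a likely bug because it disagrees with the majority.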
14	Topic 8: safely extending OSs
Context: Some applications may require new or specialized functionality from the OS kernel, for performance or security. For example, a new file buffering scheme for Web server apps.
Problem: How to allow applications to extend the OS while ensuring safety (i.e., the kernel will not crash or be hijacked by the extension)?
Solution: Type safety: if types match (checked by the kernel or the compiler), then no crash will happen.
Papers:
  SPIN, Bershad et al.
  Exokernel, Kaashoek et al.
  Safe Kernel Extensions Without Run-Time Checking, Necula and Lee
  Disco, Bugnion et al.
15	Topic 9: memory parallelism and prefetching
Context: The relative access time to memory is growing exponentially over time.
Problem: How to design data structures or prefetchers that will hide the memory latency? What is a good way to evaluate a data structure or a prefetcher?
Solution:
  Understanding: a notion of memory-level parallelism: a prefetcher is good only if it can generate “many” useful outstanding prefetches.
  Prefetching: jump pointers, streams
Papers:
  Prefetching recursive data structures, Luk et al.
  MLP yes! ILP no!, Glew
  Streaming based prefetching, Chilimbi et al. [coming soon]