Scalable and Data-Intensive Computing in the Cloud

Instructor: Bill Howe, PhD

Email: billhowe@cs.washington.edu

Phone: (206) 616-5828

Office: Allen 450

Period: Mar 26 - Jun 4, 2012

Class Meeting Time: Mondays, 6:00-9:00 p.m.

Class Meeting Place: ST 359

Web: http://www.cs.washington.edu/homes/billhowe/bigdatacloud

Office Hours: by appointment

Moodle Site: http://moodle.extn.washington.edu/course/view.php?id=1731

Course Overview: We will explore the technology landscape at the intersection of “big data” and cloud computing. Articles in the New York Times, Science, Nature, and The Economist, a variety of reports from the federal government, and innumerable blog posts have made the case that technology supporting the deep analysis of massive datasets – big data – is now a critical enabling technology for business and research in all fields. Cloud computing has been a catalyst for this trend, democratizing access to the infrastructure required to build big data applications. Scalable data platforms are increasingly deployed in the cloud or offered directly by cloud providers.

The course will be technology driven. We will consider relational databases (specifically in the context of cloud computing), the Hadoop ecosystem and its variants, other NoSQL platforms emphasizing low-latency access, and more. We will work directly with a selected set of these platforms, compare and contrast their relative strengths and weaknesses, and characterize the problems they are designed to solve.

Learning Objectives: By the end of this course, students will be able to:

Develop a basic test application in each of several selected technologies
Evaluate application requirements and recommend an appropriate data-intensive computing technology platform or platforms to satisfy them
Evaluate an existing data-intensive computing application and identify opportunities for improvement
Explain the relative strengths and weaknesses of selected major data-intensive computing and data management platforms in use today
Explain the major technology trends that are influencing the industry

Course Structure: Each class will consist of a 1-hour lecture, a 1-hour case study and demonstration of a specific system, and 1 hour of discussion and hands-on work. Each week, we will consider a category of scalable data platform through a lecture and examine a representative example from this category in detail through a demonstration. Students will be asked to complete a hands-on homework assignment based on the material presented in class and, in some cases, come prepared to discuss assigned reading. The reading assignments will generally be research papers from relevant computer science conferences.

Student Assessment: Assignments: 80%, Participation: 20%. Each assignment will be due one week after it is assigned, by the start of class. Participation will be a combination of attendance and discussion involvement; in-class and online involvement will both contribute. Assignments will typically not be graded in terms of correct/incorrect answers, but students will be expected to demonstrate effort and insight. In this course, discussion between students is not discouraged; the goal is to learn as much as possible in a short time, and discussion is a very efficient way to do this. Some assignments may be completed in groups, depending on students’ experience level. In these cases, a portion of the grade will be based on peer review by one’s group members.

Textbook: None. All materials will be on the web. We will use a combination of slides, documentation for the selected systems covered in class, some custom material, and some relevant research papers.

Prerequisite: The assignments will involve example-oriented programming in various languages, possibly including Java, C#, and Python. You will NOT be expected to be proficient in any of these languages, but you will be expected to “think computationally.” Specifically, you will work through examples, answer questions about the code, and generally be “brave” with respect to learning new technology – read the documentation, ask questions, try experiments, make some educated guesses, etc.