Building Data Analysis Pipelines (CSEP 590/Fall 2024)


Logistics

Lecture

Mon 6:30pm--9:20pm

Room

CSE2 G10

Instructor

René Just

Teaching Assistants

Hannah Potter, Suh Young Choi

Communication


Syllabus

Validity, soundness, and reproducibility of data analyses and empirical results are crucial for any field that seeks empirical insights or relies on data-driven decision making.

Course description

This course focuses on the theory and practice of building sound and actionable data analysis pipelines. Specifically, it covers the following three aspects: (1) Validity of analysis designs. (2) Principles and methods for quantitative data analysis. (3) Building small- and large-scale data analysis pipelines. Topics include properly designing data analyses, choosing appropriate statistical methods and models, contextualizing analysis results, and reasoning about validity (in terms of internal, external, and construct validity). In addition to lectures and paper discussions, this course provides hands-on experience in building data analysis pipelines -- from data collection, through data wrangling, to data analysis and visualization.

Course format

The class meets once a week in person. Most class sessions will be divided into two parts. The first part begins with a presentation and discussion of theoretical concepts, during or after which the floor will be open to questions and discussion. We expect active participation in the discussions. The second part provides a hands-on learning experience in the form of small-group exercises. Materials are made available online, and lectures are complemented by assigned readings, homeworks, small-group activities, and in-class exercises.

Grading

Grades will be based on homeworks, two-part in-class exercises, and participation.

Late policy

Assignments must be submitted on Canvas by the due date and time. Unless otherwise noted, all times are given in Pacific Time. The submission site remains open for 48 hours after the deadline. Assignments submitted within 24 hours after the deadline will incur a 10% penalty; assignments submitted between 24 and 48 hours after the deadline will incur a 20% penalty. Assignments will not be accepted after the submission site is closed.
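As an illustration, the late policy can be sketched as a small Python function. This is only a sketch, not an official grading tool: the function name late_penalty and the example deadline are hypothetical, both timestamps are assumed to be in Pacific Time, and a submission exactly 24 hours late is assumed to fall in the 10% bracket, which the policy text does not spell out.

```python
from datetime import datetime, timedelta
from typing import Optional


def late_penalty(deadline: datetime, submitted: datetime) -> Optional[float]:
    """Return the penalty as a fraction of the grade, or None if not accepted.

    Assumes both timestamps are in the same time zone (Pacific Time).
    """
    delay = submitted - deadline
    if delay <= timedelta(0):
        return 0.0   # on time: no penalty
    if delay <= timedelta(hours=24):
        return 0.10  # up to 24 hours late (boundary handling is an assumption)
    if delay <= timedelta(hours=48):
        return 0.20  # between 24 and 48 hours late
    return None      # submission site is closed


# Hypothetical deadline, for illustration only.
deadline = datetime(2024, 10, 7, 23, 59)
print(late_penalty(deadline, deadline + timedelta(hours=30)))  # 0.2
```

Returning None rather than a 100% penalty keeps "not accepted" distinct from "accepted with a deduction".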

You can find the general course policies here.


Course materials