CSE 599I - Spring 2017

Accelerated Computing - Programming GPUs

EEB 037, Mon/Wed 3:00 - 4:20
Instructor: Tanner Schmidt
Office Hours: Mon 4:30 - 6:00, CSE 674

Course Description

This course is an introduction to accelerated computing using graphics processing units (GPUs). We will focus on CUDA programming, but the concepts taught will apply to other GPU frameworks as well. The course will start by covering CUDA syntax extensions and the CUDA runtime API, then move on to more advanced topics such as bandwidth optimization, memory access performance, and floating point considerations. We will learn about common parallel computing patterns such as scans and reductions, and study use cases for GPU acceleration such as matrix multiplication and convolution.


As CUDA is an extension of the C language, students taking this course should be familiar with C programming.

Prior knowledge of computer architecture concepts such as locality of reference will be useful but not required.


Grades for this course will be based on a series of 3-5 programming assignments that let students apply the GPU programming skills taught in the lectures.

Textbook (Optional)

Programming Massively Parallel Processors: A Hands-on Approach, Third Edition
David B. Kirk and Wen-mei W. Hwu

I can provide students with a code for a 30% discount on the textbook from Elsevier.

Computing Resources

For the programming assignments, students will need access to a computer with a CUDA-capable GPU. I can help arrange access to a remote CUDA-capable machine for students who do not have one locally.

Schedule and Slides (subject to change)

Date   Topic                                         Assignments
3/27   Course Introduction
3/29   Intro to CUDA C                               axpy
4/3    CUDA parallelism model
4/5    Memory and data locality
       Thread execution / computational efficiency
4/10   Memory performance
       Stencil pattern
4/12   Prefix sum pattern
4/17   Histogram pattern
4/19   Sparse matrix pattern                         TiledMatrixMultiplication due
4/24   Merge sort pattern                            Assignment 2
4/26   Graph search pattern
5/1    Advanced host / device interface
       Streams, events, and concurrency
5/3    Dynamic parallelism / recursion
5/8    Floating point considerations                 Assignment 2 due
       Intrinsic Functions                           Final project
5/10   In-warp shuffles
5/15   Multi-GPU programming                         Final project proposal due
5/17   No class; go see the CSE 599g guest lecture
       on cuDNN instead (5/18, 1:00 - 2:00 pm, CSE 305)
5/22   OpenCL / OpenACC
5/24   Beyond CUDA
5/29   Memorial Day (no class)
5/31   No class; work on projects
6/1                                                  Final project due