SIAM PP10 Tutorial on GPU Programming

This is a half-day tutorial on "An Introduction to GPU Programming" to be held at the SIAM Conference on Parallel Processing for Scientific Computing in Seattle on Saturday, February 27th, 2010.

The only prerequisite is some proficiency with C programming. No prior experience with parallel computing is assumed, although a background in MPI distributed-memory computing would be helpful.

The tutorial will consist of 2 hours of lectures and 2 hours of practicals. The aim is that by the end of the tutorial you will understand the basics of CUDA programming, and feel confident to continue learning on your own.

This tutorial is based on my 5-day CUDA course which provides additional lecture material and practicals for self-study.


I will be available afterwards for discussions over lunch.


The practicals will be run on a Venom T4000 workstation with 4 NVIDIA Tesla GPUs, kindly provided by Boston Ltd.

As a fall-back in the unlikely case of any difficulties, we will use the Oxford University skynet cluster, for which we have these user notes.

The main CUDA reference for these practicals is the NVIDIA CUDA Programming Guide version 2.3.

Practical 1

Application: a trivial "hello world" example

CUDA aspects: launching a kernel, copying data to/from the graphics card, using the emulator and debug modes, error checking using CUTIL routines from the SDK

Note: files will be provided in user accounts, so there is no need to download them from here.
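To give a flavour of what a first kernel launch looks like, here is a minimal sketch. It is not the file used in the practical: the kernel name hello_kernel and the CHECK macro are hypothetical, and the error checking uses the plain CUDA runtime calls cudaGetLastError and cudaGetErrorString as a stand-in for the SDK's CUTIL routines.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from CUTIL): abort with a message if a
// CUDA runtime call returns an error.
#define CHECK(call)                                                     \
  do {                                                                  \
    cudaError_t err = (call);                                           \
    if (err != cudaSuccess) {                                           \
      fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__, __LINE__,  \
              cudaGetErrorString(err));                                 \
      exit(EXIT_FAILURE);                                               \
    }                                                                   \
  } while (0)

// Each thread writes its index within the block into its own array element.
__global__ void hello_kernel(float *x) {
  int tid = threadIdx.x + blockDim.x * blockIdx.x;  // global thread index
  x[tid] = (float)threadIdx.x;
}

int main() {
  const int nblocks = 2, nthreads = 8, nsize = nblocks * nthreads;

  // Allocate host and device (graphics card) memory.
  float *h_x = (float *)malloc(nsize * sizeof(float));
  float *d_x = NULL;
  CHECK(cudaMalloc((void **)&d_x, nsize * sizeof(float)));

  // Launch the kernel: nblocks blocks of nthreads threads each.
  hello_kernel<<<nblocks, nthreads>>>(d_x);
  CHECK(cudaGetLastError());  // catches launch errors

  // Copy the results back to the host; this waits for the kernel to finish.
  CHECK(cudaMemcpy(h_x, d_x, nsize * sizeof(float), cudaMemcpyDeviceToHost));

  for (int n = 0; n < nsize; n++) printf("n = %d, x = %f\n", n, h_x[n]);

  CHECK(cudaFree(d_x));
  free(h_x);
  return 0;
}
```

Assuming the CUDA toolkit is installed, a file like this (say hello.cu) can be compiled with nvcc hello.cu -o hello.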

Practical 2

Application: Monte Carlo simulation using NAG's RNG library

CUDA aspects: constant memory, random number generation, kernel timing, memory access coalescence
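Three of these aspects can be sketched in a few lines. The example below is only an illustration under my own assumptions, not the Monte Carlo code from the practical, and it deliberately leaves out random number generation so as not to guess at the NAG library's interface. The kernel affine and the array coef are hypothetical names. Coefficients are held in constant memory, thread i reads and writes element i so that neighbouring threads touch neighbouring addresses (the coalesced access pattern), and the kernel is timed with CUDA events. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Coefficients in constant memory: read-only inside kernels, and efficient
// when all threads read the same address at the same time.
__constant__ float coef[2];

// y[i] = coef[0]*x[i] + coef[1]; consecutive threads access consecutive
// elements, so the global memory accesses can be coalesced.
__global__ void affine(const float *x, float *y, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = coef[0] * x[i] + coef[1];
}

int main() {
  const int n = 1 << 20, nthreads = 256;
  const int nblocks = (n + nthreads - 1) / nthreads;
  const float h_coef[2] = {2.0f, 1.0f};

  float *h_x = (float *)malloc(n * sizeof(float));
  float *h_y = (float *)malloc(n * sizeof(float));
  for (int i = 0; i < n; i++) h_x[i] = (float)i;

  float *d_x = NULL, *d_y = NULL;
  cudaMalloc((void **)&d_x, n * sizeof(float));
  cudaMalloc((void **)&d_y, n * sizeof(float));
  cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

  // Copy the coefficients from the host into constant memory.
  cudaMemcpyToSymbol(coef, h_coef, sizeof(h_coef));

  // Time the kernel with events recorded before and after the launch.
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  cudaEventRecord(start, 0);
  affine<<<nblocks, nthreads>>>(d_x, d_y, n);
  cudaEventRecord(stop, 0);
  cudaEventSynchronize(stop);  // wait until the kernel has finished

  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  printf("kernel time: %f ms\n", ms);

  cudaMemcpy(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
  printf("y[3] = %f\n", h_y[3]);

  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  cudaFree(d_x);
  cudaFree(d_y);
  free(h_x);
  free(h_y);
  return 0;
}
```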

Practical 3

Application: 3D Laplace solver

CUDA aspects: use of shared memory and thread synchronization
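The same two ideas can be seen in a much smaller setting. The sketch below is a 1D simplification of my own, not the 3D solver from the practical: each block copies its part of the array, plus one halo point on each side, into shared memory, calls __syncthreads() so that every load has completed, and then performs one Jacobi update from the shared copy. The names jacobi_1d and BLOCK are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 128  // threads per block; the kernel assumes blockDim.x == BLOCK

// One Jacobi sweep for the 1D Laplace equation:
// interior points become the average of their two neighbours,
// boundary points are copied unchanged.
__global__ void jacobi_1d(const float *u_old, float *u_new, int n) {
  __shared__ float tile[BLOCK + 2];  // block's points plus two halo points

  int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index
  int t = threadIdx.x + 1;                        // index within the tile

  // Each thread loads its own point; the first and last threads of the
  // block also load the halo points belonging to neighbouring blocks.
  if (i < n) tile[t] = u_old[i];
  if (threadIdx.x == 0 && i > 0) tile[0] = u_old[i - 1];
  if (threadIdx.x == blockDim.x - 1 && i + 1 < n) tile[t + 1] = u_old[i + 1];

  // Wait until the whole tile is loaded before any thread reads from it.
  __syncthreads();

  if (i > 0 && i < n - 1)
    u_new[i] = 0.5f * (tile[t - 1] + tile[t + 1]);
  else if (i < n)
    u_new[i] = tile[t];  // boundary value is kept fixed
}

int main() {
  const int n = 1000, iters = 100;
  const int nblocks = (n + BLOCK - 1) / BLOCK;

  // Boundary conditions u[0] = 0 and u[n-1] = 1, zero initial guess inside.
  float h_u[n];
  for (int i = 0; i < n; i++) h_u[i] = 0.0f;
  h_u[n - 1] = 1.0f;

  float *d_a = NULL, *d_b = NULL;
  cudaMalloc((void **)&d_a, n * sizeof(float));
  cudaMalloc((void **)&d_b, n * sizeof(float));
  cudaMemcpy(d_a, h_u, n * sizeof(float), cudaMemcpyHostToDevice);

  for (int k = 0; k < iters; k++) {
    jacobi_1d<<<nblocks, BLOCK>>>(d_a, d_b, n);
    float *tmp = d_a; d_a = d_b; d_b = tmp;  // swap old and new arrays
  }

  cudaMemcpy(h_u, d_a, n * sizeof(float), cudaMemcpyDeviceToHost);
  printf("u[n-2] = %f\n", h_u[n - 2]);

  cudaFree(d_a);
  cudaFree(d_b);
  return 0;
}
```

Note that __syncthreads() is placed outside the conditional loads, so every thread in the block reaches it; calling it inside a branch that only some threads take would be an error.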

Additional reading