SIAM PP10 Tutorial on GPU Programming
This is a half-day tutorial, "An Introduction to GPU Programming",
to be held at the SIAM Conference on Parallel Processing for
Scientific Computing in Seattle on Saturday, February 27th, 2010.
The only prerequisite is some proficiency with C programming.
No prior experience of parallel computing is assumed, although
a background in MPI distributed-memory computing would be helpful.
The tutorial will consist of 2 hours of lectures and 2 hours
of practicals. The aim is that by the end of the tutorial you will
understand the basics of CUDA programming, and feel confident
to continue learning on your own.
This tutorial is based on my 5-day CUDA course, which provides
additional lecture material and practicals for self-study.
Timetable
I will be available afterwards for discussions over lunch.
Practicals
The practicals will be run on a Venom T4000 workstation with
4 NVIDIA Tesla GPUs, kindly provided by Boston Ltd.
As a fallback in the unlikely event of any difficulties, we will use
the Oxford University skynet cluster, for which we have these user notes.
The main CUDA reference for these practicals is the
NVIDIA CUDA Programming Guide version 2.3.
Practical 1
Application: a trivial "hello world" example
CUDA aspects: launching a kernel, copying data to/from the graphics card,
using the emulator and debug modes, error checking using CUTIL routines
from the SDK
Note: the files will be provided in user accounts, so there is no need to download them from here
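To give a flavour of the CUDA aspects listed for this practical, here is a minimal sketch. It is not the practical's actual code. The kernel name fill_index and the CHECK macro are hypothetical: CHECK is a hand-written stand-in for the CUTIL error-checking routines from the SDK, and cudaDeviceSynchronize assumes a more recent toolkit than version 2.3.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper standing in for the SDK's CUTIL checks:
// abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",            \
                    __FILE__, __LINE__, cudaGetErrorString(err_));  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Each thread writes its own global index into the output array.
__global__ void fill_index(int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) out[tid] = tid;
}

int main(void)
{
    const int nblocks = 2, nthreads = 8, n = nblocks * nthreads;
    int h_out[n];          // host copy of the results
    int *d_out = NULL;     // device array

    CHECK(cudaMalloc((void **)&d_out, n * sizeof(int)));

    // Launch nblocks blocks of nthreads threads each.
    fill_index<<<nblocks, nthreads>>>(d_out, n);
    CHECK(cudaGetLastError());        // reports launch failures
    CHECK(cudaDeviceSynchronize());   // reports failures during execution

    // Copy the results back from the graphics card.
    CHECK(cudaMemcpy(h_out, d_out, n * sizeof(int),
                     cudaMemcpyDeviceToHost));

    for (int i = 0; i < n; i++)
        printf("hello from thread %d\n", h_out[i]);

    CHECK(cudaFree(d_out));
    return 0;
}
```

Assuming the file is saved as hello.cu, it can be built with nvcc hello.cu -o hello. Checking the launch and the synchronization separately helps to tell a bad launch configuration apart from a fault inside the kernel.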
Practical 2
Application: Monte Carlo simulation using NAG's RNG library
CUDA aspects: constant memory, random number generation, kernel timing,
memory access coalescence
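The sketch below illustrates the same CUDA aspects on a toy problem: estimating the mean of X = mu + sigma*Z, with Z a standard Normal. It is not the practical's code. NAG's library is not used because its interface is not described here, so the sketch swaps in NVIDIA's cuRAND device API instead. The names mc_mean, c_mu and c_sigma are made up for illustration, and error checking is omitted for brevity.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <curand_kernel.h>   // cuRAND device API, swapped in for NAG's library

// Model parameters in constant memory: read-only, shared by all threads.
__constant__ float c_mu, c_sigma;

// Each thread averages npath samples of X = mu + sigma*Z, Z ~ N(0,1).
__global__ void mc_mean(float *d_mean, int npath, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    curandState state;
    curand_init(seed, tid, 0, &state);   // separate subsequence per thread

    float sum = 0.0f;
    for (int i = 0; i < npath; i++)
        sum += c_mu + c_sigma * curand_normal(&state);

    // Consecutive threads write consecutive addresses, so the
    // stores of a warp can be coalesced.
    d_mean[tid] = sum / npath;
}

int main(void)
{
    const int nblocks = 64, nthreads = 128, n = nblocks * nthreads;
    const int npath = 1000;
    const float mu = 0.5f, sigma = 2.0f;

    // Copy the parameters into constant memory.
    cudaMemcpyToSymbol(c_mu, &mu, sizeof(float));
    cudaMemcpyToSymbol(c_sigma, &sigma, sizeof(float));

    float *h_mean = (float *)malloc(n * sizeof(float));
    float *d_mean = NULL;
    cudaMalloc((void **)&d_mean, n * sizeof(float));

    // Time the kernel with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    mc_mean<<<nblocks, nthreads>>>(d_mean, npath, 1234ULL);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);          // wait for the kernel to finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaMemcpy(h_mean, d_mean, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Combine the per-thread averages on the host.
    double total = 0.0;
    for (int i = 0; i < n; i++) total += h_mean[i];
    printf("estimate of E[X] = %f (exact value %f)\n", total / n, mu);
    printf("kernel time      = %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_mean);
    free(h_mean);
    return 0;
}
```

Events are recorded on the GPU's own timeline, so they measure the kernel itself rather than the asynchronous launch call seen by the host.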
Practical 3
Application: 3D Laplace solver
CUDA aspects: use of shared memory and thread synchronization
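One way shared memory and thread synchronization can fit together in a 3D Laplace solver is sketched below, using Jacobi iteration with a 7-point stencil. This is an illustrative sketch under stated assumptions, not the practical's code: the kernel name jacobi3d, the 8x8x8 block size, the 32^3 grid and the boundary values are all made up, error checking is omitted, and the 3D grid of blocks assumes hardware and a toolkit more recent than CUDA 2.3.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define BX 8
#define BY 8
#define BZ 8

// One Jacobi sweep: each interior point becomes the average of its six
// neighbours; boundary points are copied unchanged (Dirichlet data).
__global__ void jacobi3d(const float *u_old, float *u_new,
                         int nx, int ny, int nz)
{
    // Tile of the block's points plus a one-point halo on each face.
    __shared__ float s[BZ + 2][BY + 2][BX + 2];

    int i = blockIdx.x * BX + threadIdx.x;
    int j = blockIdx.y * BY + threadIdx.y;
    int k = blockIdx.z * BZ + threadIdx.z;
    int ti = threadIdx.x + 1, tj = threadIdx.y + 1, tk = threadIdx.z + 1;
    bool inside = (i < nx) && (j < ny) && (k < nz);
    int idx = (k * ny + j) * nx + i;

    if (inside) {
        s[tk][tj][ti] = u_old[idx];
        // Threads on a face of the block also load the halo value.
        if (threadIdx.x == 0      && i > 0)      s[tk][tj][0]      = u_old[idx - 1];
        if (threadIdx.x == BX - 1 && i < nx - 1) s[tk][tj][BX + 1] = u_old[idx + 1];
        if (threadIdx.y == 0      && j > 0)      s[tk][0][ti]      = u_old[idx - nx];
        if (threadIdx.y == BY - 1 && j < ny - 1) s[tk][BY + 1][ti] = u_old[idx + nx];
        if (threadIdx.z == 0      && k > 0)      s[0][tj][ti]      = u_old[idx - nx * ny];
        if (threadIdx.z == BZ - 1 && k < nz - 1) s[BZ + 1][tj][ti] = u_old[idx + nx * ny];
    }
    // All loads must finish before any thread reads a neighbour's value.
    // The barrier sits outside the branch so every thread reaches it.
    __syncthreads();

    if (!inside) return;
    bool boundary = (i == 0 || i == nx - 1 || j == 0 || j == ny - 1 ||
                     k == 0 || k == nz - 1);
    if (boundary)
        u_new[idx] = s[tk][tj][ti];
    else
        u_new[idx] = (s[tk][tj][ti - 1] + s[tk][tj][ti + 1] +
                      s[tk][tj - 1][ti] + s[tk][tj + 1][ti] +
                      s[tk - 1][tj][ti] + s[tk + 1][tj][ti]) / 6.0f;
}

int main(void)
{
    const int n = 32, niter = 500;
    const size_t bytes = (size_t)n * n * n * sizeof(float);

    // Boundary value 1, interior initial guess 0.
    float *h_u = (float *)malloc(bytes);
    for (int k = 0; k < n; k++)
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++) {
                bool b = (i == 0 || i == n - 1 || j == 0 || j == n - 1 ||
                          k == 0 || k == n - 1);
                h_u[(k * n + j) * n + i] = b ? 1.0f : 0.0f;
            }

    float *d_a = NULL, *d_b = NULL;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMemcpy(d_a, h_u, bytes, cudaMemcpyHostToDevice);

    dim3 block(BX, BY, BZ);
    dim3 grid((n + BX - 1) / BX, (n + BY - 1) / BY, (n + BZ - 1) / BZ);

    for (int it = 0; it < niter; it++) {
        jacobi3d<<<grid, block>>>(d_a, d_b, n, n, n);
        float *tmp = d_a; d_a = d_b; d_b = tmp;   // newest result is in d_a
    }

    cudaMemcpy(h_u, d_a, bytes, cudaMemcpyDeviceToHost);
    int c = n / 2;
    printf("centre value after %d iterations: %f\n",
           niter, h_u[(c * n + c) * n + c]);

    cudaFree(d_a);
    cudaFree(d_b);
    free(h_u);
    return 0;
}
```

Each value loaded into the tile is reused by up to seven threads, which is the point of staging it in shared memory. With a uniform boundary value of 1 the exact solution is 1 everywhere, so the printed centre value should approach 1 as the iteration count grows.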
Additional reading