Course on CUDA Programming on NVIDIA GPUs, July 22-26, 2024
The course will be taught by Prof. Mike Giles and Prof. Wes Armour.
They have both used CUDA in their research for many years, and set up
and manage JADE, the first national GPU supercomputer for Machine Learning.
This is a one-week hands-on course on how to develop applications to
run on NVIDIA GPUs using the CUDA programming environment. All that
will be assumed is some proficiency with C and basic C++ programming.
No prior experience with parallel computing will be assumed.
Timetable
For the first three days we will follow this timetable:
- 10:00 - 11:30 lecture
- 11:30 - 12:00 break
- 12:00 - 13:30 practical
- 13:30 - 14:30 lunch break
- 14:30 - 16:00 lecture
- 16:00 - 16:30 break
- 16:30 - 18:00 practical
On the last two days we will switch to having both lectures in the morning,
and then have practicals all afternoon.
Preliminary Reading
Please read chapters 1 and 2 of the NVIDIA CUDA C Programming Guide,
which is available both as PDF and as online HTML.
Additional References
Lectures
Practicals
These will be carried out on Google Colab, with your modified notebooks
automatically stored on your Google Drive.
Practical 1 is mandatory but is not assessed. Practicals 2-4 are to
be completed for assessment. Practicals 7-8 are optional, and are intended
particularly for those who may want to give a presentation on one of these topics.
Practical 1
Application: a trivial "hello world" example
CUDA aspects: launching a kernel, copying data to/from the graphics card,
error checking and printing from kernel code
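As a rough illustration of the CUDA aspects listed for Practical 1, here is a minimal sketch (not the practical's own code) that launches a kernel, copies results back from the GPU, checks for errors and prints from device code. The macro `CUDA_CHECK`, the kernel `hello_kernel` and the wrapper `run_hello` are hypothetical names chosen for this example; it assumes a CUDA-capable GPU and compilation with `nvcc`.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line info if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
  do {                                                                     \
    cudaError_t e_ = (call);                                               \
    if (e_ != cudaSuccess) {                                               \
      fprintf(stderr, "CUDA error: %s (%s:%d)\n", cudaGetErrorString(e_),  \
              __FILE__, __LINE__);                                         \
      exit(EXIT_FAILURE);                                                  \
    }                                                                      \
  } while (0)

// Each thread prints its block/thread index and stores its global index in x.
__global__ void hello_kernel(float *x) {
  int tid = threadIdx.x + blockIdx.x * blockDim.x;
  printf("Hello from block %d, thread %d\n", (int)blockIdx.x, (int)threadIdx.x);
  x[tid] = (float)tid;
}

// Launches the kernel and copies nblocks*nthreads results back into h_x.
void run_hello(float *h_x, int nblocks, int nthreads) {
  int n = nblocks * nthreads;
  float *d_x;
  CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(float)));
  hello_kernel<<<nblocks, nthreads>>>(d_x);
  CUDA_CHECK(cudaGetLastError());  // reports invalid launch configurations
  // The blocking cudaMemcpy waits for the kernel, so errors raised while it
  // ran are reported here, and the device printf buffer is flushed.
  CUDA_CHECK(cudaMemcpy(h_x, d_x, n * sizeof(float), cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(d_x));
}

int main() {
  const int nblocks = 2, nthreads = 4;
  float h_x[nblocks * nthreads];
  run_hello(h_x, nblocks, nthreads);
  for (int n = 0; n < nblocks * nthreads; n++) {
    printf("x[%d] = %g\n", n, h_x[n]);
    assert(h_x[n] == (float)n);
  }
  return 0;
}
```

The order of the device `printf` lines is not guaranteed, since blocks and warps may run in any order; the copied-back array, by contrast, is deterministic.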
Practical 2
Application: Monte Carlo simulation using NVIDIA's CURAND library
for random number generation
CUDA aspects: constant memory, random number generation, kernel timing,
minimising device memory bandwidth requirements
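The sketch below combines several of the Practical 2 ingredients in a deliberately simple setting, which is our own assumption rather than the practical's application: it estimates E[a Z^2 + b Z + c] for Z ~ N(0,1), whose exact value is a + c. Normal samples come from the cuRAND host API, the coefficients `a`, `b`, `c` live in constant memory, and the kernel is timed with CUDA events. The names `eval_quadratic`, `mc_mean`, `CUDA_CHECK` and `CURAND_CHECK` are hypothetical; build with something like `nvcc mc.cu -lcurand`.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <curand.h>

// Hypothetical helpers: abort with file/line info if a CUDA or cuRAND call fails.
#define CUDA_CHECK(call)                                                   \
  do { cudaError_t e_ = (call); if (e_ != cudaSuccess) {                   \
    fprintf(stderr, "CUDA error: %s (%s:%d)\n", cudaGetErrorString(e_),    \
            __FILE__, __LINE__); exit(EXIT_FAILURE); } } while (0)
#define CURAND_CHECK(call)                                                 \
  do { curandStatus_t s_ = (call); if (s_ != CURAND_STATUS_SUCCESS) {      \
    fprintf(stderr, "cuRAND error %d (%s:%d)\n", (int)s_, __FILE__,        \
            __LINE__); exit(EXIT_FAILURE); } } while (0)

// Coefficients of f(z) = a z^2 + b z + c. Every thread reads the same address,
// which the constant cache serves as a broadcast.
__constant__ float a, b, c;

__global__ void eval_quadratic(const float *z, float *v, int n) {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  if (i < n) {
    float zi = z[i];                  // one global read per sample
    v[i] = a * zi * zi + b * zi + c;  // one global write per sample
  }
}

// Monte Carlo estimate of E[f(Z)]; n must be even for curandGenerateNormal
// with a pseudo-random generator. The kernel time in milliseconds goes to *ms.
double mc_mean(float ha, float hb, float hc, int n, float *ms) {
  CUDA_CHECK(cudaMemcpyToSymbol(a, &ha, sizeof(float)));
  CUDA_CHECK(cudaMemcpyToSymbol(b, &hb, sizeof(float)));
  CUDA_CHECK(cudaMemcpyToSymbol(c, &hc, sizeof(float)));

  float *d_z, *d_v;
  CUDA_CHECK(cudaMalloc(&d_z, n * sizeof(float)));
  CUDA_CHECK(cudaMalloc(&d_v, n * sizeof(float)));

  curandGenerator_t gen;
  CURAND_CHECK(curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT));
  CURAND_CHECK(curandSetPseudoRandomGeneratorSeed(gen, 1234ULL));
  CURAND_CHECK(curandGenerateNormal(gen, d_z, n, 0.0f, 1.0f));

  cudaEvent_t start, stop;
  CUDA_CHECK(cudaEventCreate(&start));
  CUDA_CHECK(cudaEventCreate(&stop));
  CUDA_CHECK(cudaEventRecord(start));
  const int nthreads = 256, nblocks = (n + nthreads - 1) / nthreads;
  eval_quadratic<<<nblocks, nthreads>>>(d_z, d_v, n);
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaEventRecord(stop));
  CUDA_CHECK(cudaEventSynchronize(stop));
  CUDA_CHECK(cudaEventElapsedTime(ms, start, stop));

  // Final averaging on the host keeps the sketch short.
  std::vector<float> h_v(n);
  CUDA_CHECK(cudaMemcpy(h_v.data(), d_v, n * sizeof(float), cudaMemcpyDeviceToHost));
  double sum = 0.0;
  for (float v : h_v) sum += v;

  CURAND_CHECK(curandDestroyGenerator(gen));
  CUDA_CHECK(cudaEventDestroy(start));
  CUDA_CHECK(cudaEventDestroy(stop));
  CUDA_CHECK(cudaFree(d_z));
  CUDA_CHECK(cudaFree(d_v));
  return sum / n;
}

int main() {
  float ms = 0.0f;
  double m = mc_mean(1.0f, 2.0f, 3.0f, 1 << 20, &ms);  // exact value: 1 + 3 = 4
  printf("estimate = %f (exact 4), kernel time = %.3f ms\n", m, ms);
  assert(std::fabs(m - 4.0) < 0.05);
  return 0;
}
```

Copying every value back to the host is the opposite of minimising memory traffic; a fuller version would also average on the device, for instance with a reduction of the kind covered in Practical 4.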
Practical 3
Application: 3D Laplace finite difference solver
CUDA aspects: thread block size optimisation, multi-dimensional memory layout,
performance profiling
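One common way to organise a 3D Laplace solver on a GPU is a Jacobi iteration for the 7-point stencil, with one thread per grid point and the grid stored so that the x index varies fastest in memory. The sketch below takes that approach under our own assumptions, not the practical's code: the names `jacobi3d`, `laplace3d` and `laplace_demo_error`, the boundary value of 1 and the block shape `dim3(16, 4, 4)` are hypothetical choices. The `block` argument is the parameter to vary when experimenting with thread block size.

```cuda
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                   \
  do { cudaError_t e_ = (call); if (e_ != cudaSuccess) {                   \
    fprintf(stderr, "CUDA error: %s (%s:%d)\n", cudaGetErrorString(e_),    \
            __FILE__, __LINE__); exit(EXIT_FAILURE); } } while (0)

// One Jacobi sweep on an nx*ny*nz grid stored with i fastest:
// idx = i + j*nx + k*nx*ny, so threads adjacent in x touch adjacent addresses.
__global__ void jacobi3d(int nx, int ny, int nz, const float *u1, float *u2) {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  int j = threadIdx.y + blockIdx.y * blockDim.y;
  int k = threadIdx.z + blockIdx.z * blockDim.z;
  if (i >= nx || j >= ny || k >= nz) return;

  long sy = nx, sz = (long)nx * ny;
  long idx = i + j * sy + k * sz;
  if (i == 0 || i == nx - 1 || j == 0 || j == ny - 1 || k == 0 || k == nz - 1) {
    u2[idx] = u1[idx];  // boundary values stay fixed (Dirichlet conditions)
  } else {
    u2[idx] = (u1[idx - 1] + u1[idx + 1] + u1[idx - sy] + u1[idx + sy] +
               u1[idx - sz] + u1[idx + sz]) * (1.0f / 6.0f);
  }
}

// Runs iters sweeps on h_u in place, using thread blocks of shape `block`.
void laplace3d(int nx, int ny, int nz, int iters, dim3 block, float *h_u) {
  size_t bytes = (size_t)nx * ny * nz * sizeof(float);
  float *d_u1, *d_u2;
  CUDA_CHECK(cudaMalloc(&d_u1, bytes));
  CUDA_CHECK(cudaMalloc(&d_u2, bytes));
  CUDA_CHECK(cudaMemcpy(d_u1, h_u, bytes, cudaMemcpyHostToDevice));

  dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y,
            (nz + block.z - 1) / block.z);
  for (int it = 0; it < iters; it++) {
    jacobi3d<<<grid, block>>>(nx, ny, nz, d_u1, d_u2);
    CUDA_CHECK(cudaGetLastError());
    float *tmp = d_u1; d_u1 = d_u2; d_u2 = tmp;  // new iterate becomes old
  }
  CUDA_CHECK(cudaMemcpy(h_u, d_u1, bytes, cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(d_u1));
  CUDA_CHECK(cudaFree(d_u2));
}

// Boundary = 1, interior = 0 initially; the exact discrete solution is 1
// everywhere, so the return value is the max deviation from it.
float laplace_demo_error(int n, int iters, dim3 block) {
  std::vector<float> u((size_t)n * n * n, 0.0f);
  for (int k = 0; k < n; k++)
    for (int j = 0; j < n; j++)
      for (int i = 0; i < n; i++)
        if (i == 0 || i == n - 1 || j == 0 || j == n - 1 || k == 0 || k == n - 1)
          u[i + (size_t)j * n + (size_t)k * n * n] = 1.0f;
  laplace3d(n, n, n, iters, block, u.data());
  float err = 0.0f;
  for (float v : u) err = std::max(err, std::fabs(v - 1.0f));
  return err;
}

int main() {
  float err = laplace_demo_error(16, 1000, dim3(16, 4, 4));
  printf("max |u - 1| = %g\n", err);
  assert(err < 1e-4f);
  return 0;
}
```

Timing `laplace3d` for several `block` shapes, or running it under a profiler such as Nsight Systems or Nsight Compute, is one way to explore the block-size optimisation and profiling aspects listed above.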
Practical 4
Application: reduction
CUDA aspects: dynamic shared memory, thread synchronisation
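A reduction combining these two CUDA aspects might look like the following minimal sketch. It is not necessarily the practical's solution: each block sums its chunk with a tree reduction in dynamically sized shared memory, synchronising between levels, and the host adds the per-block partial sums. The names `block_sum` and `gpu_sum` are hypothetical, and the block size is assumed to be a power of two.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                   \
  do { cudaError_t e_ = (call); if (e_ != cudaSuccess) {                   \
    fprintf(stderr, "CUDA error: %s (%s:%d)\n", cudaGetErrorString(e_),    \
            __FILE__, __LINE__); exit(EXIT_FAILURE); } } while (0)

// Tree reduction within one block; blockDim.x must be a power of two.
// The size of tmp[] is not fixed at compile time: it is supplied as the
// third <<< >>> launch parameter (dynamic shared memory).
__global__ void block_sum(const float *x, float *partial, int n) {
  extern __shared__ float tmp[];
  int tid = threadIdx.x;
  int i = tid + blockIdx.x * blockDim.x;
  tmp[tid] = (i < n) ? x[i] : 0.0f;  // pad with zeros past the end of x
  __syncthreads();                   // all loads done before anyone reads tmp

  for (int d = blockDim.x / 2; d > 0; d /= 2) {
    if (tid < d) tmp[tid] += tmp[tid + d];
    __syncthreads();                 // finish this level before the next
  }
  if (tid == 0) partial[blockIdx.x] = tmp[0];
}

// Sums h_x[0..n-1]: per-block partial sums on the GPU, final sum on the host.
double gpu_sum(const float *h_x, int n) {
  const int nthreads = 256;  // power of two
  const int nblocks = (n + nthreads - 1) / nthreads;
  float *d_x, *d_partial;
  CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(float)));
  CUDA_CHECK(cudaMalloc(&d_partial, nblocks * sizeof(float)));
  CUDA_CHECK(cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice));

  block_sum<<<nblocks, nthreads, nthreads * sizeof(float)>>>(d_x, d_partial, n);
  CUDA_CHECK(cudaGetLastError());

  std::vector<float> partial(nblocks);
  CUDA_CHECK(cudaMemcpy(partial.data(), d_partial, nblocks * sizeof(float),
                        cudaMemcpyDeviceToHost));
  double sum = 0.0;
  for (float p : partial) sum += p;
  CUDA_CHECK(cudaFree(d_x));
  CUDA_CHECK(cudaFree(d_partial));
  return sum;
}

int main() {
  const int n = 1000;
  std::vector<float> x(n);
  for (int i = 0; i < n; i++) x[i] = (float)i;
  double s = gpu_sum(x.data(), n);
  printf("sum = %.1f (expected %d)\n", s, n * (n - 1) / 2);
  assert(s == 499500.0);
  return 0;
}
```

Every thread executes the same number of loop iterations, so each `__syncthreads()` is reached by the whole block; putting the barrier inside the `if (tid < d)` branch instead would be a bug.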
Practical 7
Application: tri-diagonal equations -- see Lecture 7, slide 8, and also
this research talk
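When there are many independent tri-diagonal systems, a simple way to use the GPU is to give each thread its own system and run the sequential Thomas algorithm. The sketch below shows only that batched approach; it is not necessarily the method of Lecture 7, the research talk or Practical 7. The interleaved storage (entry i of system s at `i*nsys + s`) is our assumption, chosen so that neighbouring threads access neighbouring addresses. The names `thomas_batched` and `thomas_demo_error` are hypothetical.

```cuda
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                   \
  do { cudaError_t e_ = (call); if (e_ != cudaSuccess) {                   \
    fprintf(stderr, "CUDA error: %s (%s:%d)\n", cudaGetErrorString(e_),    \
            __FILE__, __LINE__); exit(EXIT_FAILURE); } } while (0)

// Solves a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i (i = 0..n-1) for nsys
// independent systems, one per thread. c and d are overwritten; d ends up
// holding the solution x. a_0 and c_{n-1} are never used.
__global__ void thomas_batched(int n, int nsys, const float *a, const float *b,
                               float *c, float *d) {
  int s = threadIdx.x + blockIdx.x * blockDim.x;
  if (s >= nsys) return;
  c[s] /= b[s];
  d[s] /= b[s];
  for (int i = 1; i < n; i++) {        // forward elimination
    int k = i * nsys + s, km = k - nsys;
    float m = 1.0f / (b[k] - a[k] * c[km]);
    c[k] *= m;
    d[k] = (d[k] - a[k] * d[km]) * m;
  }
  for (int i = n - 2; i >= 0; i--) {   // back substitution
    int k = i * nsys + s;
    d[k] -= c[k] * d[k + nsys];
  }
}

// Builds diagonally dominant systems with a known solution x, solves them on
// the GPU and returns the maximum absolute error.
float thomas_demo_error(int n, int nsys) {
  size_t len = (size_t)n * nsys, bytes = len * sizeof(float);
  std::vector<float> a(len, -1.0f), b(len, 4.0f), c(len, -1.0f), x(len), d(len);
  for (int s = 0; s < nsys; s++) {
    a[s] = 0.0f;                            // a_0 = 0
    c[(size_t)(n - 1) * nsys + s] = 0.0f;   // c_{n-1} = 0
    for (int i = 0; i < n; i++) x[(size_t)i * nsys + s] = 1.0f + (i % 5) + (s % 3);
  }
  for (int s = 0; s < nsys; s++)
    for (int i = 0; i < n; i++) {           // d = A x
      size_t k = (size_t)i * nsys + s;
      float v = b[k] * x[k];
      if (i > 0) v += a[k] * x[k - nsys];
      if (i < n - 1) v += c[k] * x[k + nsys];
      d[k] = v;
    }

  float *dev[4];
  const float *host[4] = {a.data(), b.data(), c.data(), d.data()};
  for (int q = 0; q < 4; q++) {
    CUDA_CHECK(cudaMalloc(&dev[q], bytes));
    CUDA_CHECK(cudaMemcpy(dev[q], host[q], bytes, cudaMemcpyHostToDevice));
  }
  const int nthreads = 128, nblocks = (nsys + nthreads - 1) / nthreads;
  thomas_batched<<<nblocks, nthreads>>>(n, nsys, dev[0], dev[1], dev[2], dev[3]);
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaMemcpy(d.data(), dev[3], bytes, cudaMemcpyDeviceToHost));
  for (int q = 0; q < 4; q++) CUDA_CHECK(cudaFree(dev[q]));

  float err = 0.0f;
  for (size_t k = 0; k < len; k++) err = std::max(err, std::fabs(d[k] - x[k]));
  return err;
}

int main() {
  float err = thomas_demo_error(64, 1000);
  printf("max error = %g\n", err);
  assert(err < 1e-4f);
  return 0;
}
```

This only pays off when there are thousands of systems; a single large system needs an algorithm that parallelises within the system, which is where the material referenced above comes in.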
Practical 8
Application: scan operation and recurrence equations -- see Lecture 4
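One way to connect scans to recurrence equations: the linear recurrence y_i = a_i y_{i-1} + b_i is a scan over affine maps y -> A y + B, because composing "earlier, then later" gives (A_l A_e, A_l B_e + B_l), an associative operation. The sketch below applies a Hillis-Steele inclusive scan to these pairs within a single thread block, assuming y_{-1} = 0 and n <= 1024. It is an illustration only; Lecture 4 may present a different scan algorithm, and the names `recurrence_scan` and `solve_recurrence` are hypothetical.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                   \
  do { cudaError_t e_ = (call); if (e_ != cudaSuccess) {                   \
    fprintf(stderr, "CUDA error: %s (%s:%d)\n", cudaGetErrorString(e_),    \
            __FILE__, __LINE__); exit(EXIT_FAILURE); } } while (0)

// Inclusive scan of affine maps (A, B) within one block; afterwards thread t
// holds the composition of maps 0..t, so y_t = A_t * y_{-1} + B_t = B_t.
__global__ void recurrence_scan(const float *a, const float *b, float *y, int n) {
  extern __shared__ float sh[];          // 2 * blockDim.x floats
  float *sA = sh, *sB = sh + blockDim.x;
  int t = threadIdx.x;
  sA[t] = (t < n) ? a[t] : 1.0f;         // identity map (1, 0) as padding
  sB[t] = (t < n) ? b[t] : 0.0f;
  __syncthreads();

  for (int d = 1; d < (int)blockDim.x; d *= 2) {
    float pA = 1.0f, pB = 0.0f;          // identity if there is no earlier map
    if (t >= d) { pA = sA[t - d]; pB = sB[t - d]; }
    __syncthreads();                     // all reads finish before any write
    float nB = sA[t] * pB + sB[t];       // later map applied after earlier one
    sA[t] *= pA;
    sB[t] = nB;
    __syncthreads();
  }
  if (t < n) y[t] = sB[t];
}

// Host wrapper: one block of n threads, so n must be at most 1024.
void solve_recurrence(const float *h_a, const float *h_b, float *h_y, int n) {
  assert(n > 0 && n <= 1024);
  size_t bytes = n * sizeof(float);
  float *d_a, *d_b, *d_y;
  CUDA_CHECK(cudaMalloc(&d_a, bytes));
  CUDA_CHECK(cudaMalloc(&d_b, bytes));
  CUDA_CHECK(cudaMalloc(&d_y, bytes));
  CUDA_CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice));
  recurrence_scan<<<1, n, 2 * bytes>>>(d_a, d_b, d_y, n);
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(d_a));
  CUDA_CHECK(cudaFree(d_b));
  CUDA_CHECK(cudaFree(d_y));
}

int main() {
  const int n = 256;
  std::vector<float> a(n, 0.5f), b(n, 1.0f), y(n);
  solve_recurrence(a.data(), b.data(), y.data(), n);
  float ref = 0.0f;                      // sequential recurrence for comparison
  for (int i = 0; i < n; i++) {
    ref = a[i] * ref + b[i];
    assert(std::fabs(y[i] - ref) < 1e-5f);
  }
  printf("y[0] = %g, y[1] = %g, y[%d] = %g\n", y[0], y[1], n - 1, y[n - 1]);
  return 0;
}
```

With a_i = 0.5 and b_i = 1 the exact solution is y_i = 2 - 2^{-i}, which makes the output easy to check by hand. Longer sequences would need a multi-block scan, for example scanning the per-block composite maps and then applying them to each block.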
Ideas for presentation topics
- Parallel scan for radix sort of integers
- Parallel scan for recurrence equations
- Solution of tri-diagonal equations
- Use of tensor cores for matrix-matrix multiplication
Acknowledgements
Many thanks to:
- the Mathematical Institute for hosting the lectures
- the Maths Events team for the livestreaming of the lectures
- Emmanuel Ahenkan and Tlotlo Oepeng for helping with the practicals
- Google for the Google Colab system