Course on CUDA Programming on NVIDIA GPUs, Feb 3-21, 2025
This is a 3-week course on developing parallel applications to
run on NVIDIA GPUs. The only prerequisites are some proficiency with C and
basic C++ programming. No prior experience with parallel computing is
assumed.
The aim is that by the end of the course you will be able to write
relatively simple parallel programs, and will feel confident continuing
to learn CUDA by studying the code samples provided by
NVIDIA on GitHub.
CUDA Programming references
As preliminary reading, please read chapters 1 and 2 of the
NVIDIA CUDA C Programming Guide, which is available both as
PDF and online HTML.
CUDA is an extension of C/C++, so if you are a little rusty with C/C++
you should refresh your memory of it. Here are links to
a couple of introductory lectures on C
and an online resource.
There is lots of other information available online. You might find some of
this useful, but you definitely don't need to read most of it.
Lectures
Practicals
These will be carried out on Google Colab, with your modified notebooks
automatically stored on your Google Drive.
Practical 1 is mandatory but is not assessed. Practicals 2-4 are to
be completed for assessment. Practicals 7-8 are optional and are intended
particularly for those who may want to give a presentation on one of these topics.
Practical 1
Application: a trivial "hello world" example
CUDA aspects: launching a kernel, copying data to/from the graphics card,
error checking and printing from kernel code
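As a rough illustration of the CUDA aspects listed for Practical 1, the sketch below launches a kernel, prints from device code, checks for errors, and copies results back to the host. It is not the practical's actual code: the names `CUDA_CHECK`, `hello_kernel` and `hello_on_gpu` are hypothetical, and only a device-to-host copy is shown.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of CUDA): abort if a runtime call fails.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",                \
              cudaGetErrorString(err_), __FILE__, __LINE__);      \
      exit(EXIT_FAILURE);                                         \
    }                                                             \
  } while (0)

// Each thread prints a greeting and stores its global index.
__global__ void hello_kernel(float *x)
{
  int tid = threadIdx.x + blockDim.x * blockIdx.x;
  printf("Hello from thread %d in block %d\n", tid, blockIdx.x);
  x[tid] = (float) tid;
}

// Launch nblocks x nthreads threads and return what they wrote.
std::vector<float> hello_on_gpu(int nblocks, int nthreads)
{
  int n = nblocks * nthreads;
  std::vector<float> h_x(n);
  float *d_x = nullptr;

  CUDA_CHECK(cudaMalloc((void **)&d_x, n * sizeof(float)));

  hello_kernel<<<nblocks, nthreads>>>(d_x);
  CUDA_CHECK(cudaGetLastError());       // reports launch errors
  CUDA_CHECK(cudaDeviceSynchronize());  // reports execution errors, flushes printf

  CUDA_CHECK(cudaMemcpy(h_x.data(), d_x, n * sizeof(float),
                        cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(d_x));
  return h_x;
}
```

Calling `hello_on_gpu(2, 4)` from `main` would launch 8 threads and return a vector in which element `i` holds the value `i`. The order of the printed lines is not guaranteed.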
Practical 2
Application: Monte Carlo simulation using NVIDIA's CURAND library
for random number generation
CUDA aspects: constant memory, random number generation, kernel timing,
minimising device memory bandwidth requirements
Practical 3
Application: 3D Laplace finite difference solver
CUDA aspects: thread block size optimisation, multi-dimensional memory layout,
performance profiling
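To give a feel for the multi-dimensional layout and the block-size choice mentioned for Practical 3, here is a minimal sketch of one sweep of a 3D Laplace finite difference update. It assumes a Jacobi-style iteration with fixed boundary values, which may differ from the scheme used in the practical. The names `laplace3d_kernel` and `laplace3d_step` and the block shape 32x4x2 are illustrative assumptions, and error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// One Jacobi-style sweep: each interior point becomes the average of its
// six neighbours; boundary points are copied unchanged (assumed scheme).
__global__ void laplace3d_kernel(int NX, int NY, int NZ,
                                 const float *u1, float *u2)
{
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  int j = threadIdx.y + blockIdx.y * blockDim.y;
  int k = threadIdx.z + blockIdx.z * blockDim.z;
  if (i >= NX || j >= NY || k >= NZ) return;

  // i is the fastest-varying index, so neighbouring threads in x
  // access neighbouring memory locations.
  int ind = i + NX * (j + NY * k);

  if (i == 0 || i == NX - 1 || j == 0 || j == NY - 1 ||
      k == 0 || k == NZ - 1) {
    u2[ind] = u1[ind];
  } else {
    u2[ind] = (u1[ind - 1]       + u1[ind + 1] +
               u1[ind - NX]      + u1[ind + NX] +
               u1[ind - NX * NY] + u1[ind + NX * NY]) / 6.0f;
  }
}

// Host wrapper: perform a single sweep on a grid stored in a flat vector.
std::vector<float> laplace3d_step(int NX, int NY, int NZ,
                                  const std::vector<float> &u)
{
  size_t bytes = u.size() * sizeof(float);
  std::vector<float> result(u.size());
  float *d_u1 = nullptr, *d_u2 = nullptr;

  cudaMalloc((void **)&d_u1, bytes);
  cudaMalloc((void **)&d_u2, bytes);
  cudaMemcpy(d_u1, u.data(), bytes, cudaMemcpyHostToDevice);

  dim3 block(32, 4, 2);   // tunable: 256 threads per block in this sketch
  dim3 grid((NX + block.x - 1) / block.x,
            (NY + block.y - 1) / block.y,
            (NZ + block.z - 1) / block.z);
  laplace3d_kernel<<<grid, block>>>(NX, NY, NZ, d_u1, d_u2);

  cudaMemcpy(result.data(), d_u2, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_u1);
  cudaFree(d_u2);
  return result;
}
```

The block shape is the kind of parameter one would vary and then profile. The best choice depends on the GPU and the grid size, so the value here is only a starting point.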
Practical 4
Application: reduction
CUDA aspects: dynamic shared memory, thread synchronisation, shuffles, atomics
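The four CUDA aspects listed for Practical 4 can be combined in a single sum reduction, sketched below under some assumptions: the block size is a power of two and at least 32, and the input is an array of floats. The names `sum_kernel` and `sum_on_gpu` are hypothetical, error checking is omitted for brevity, and this is one possible arrangement rather than the practical's own solution.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Sum n floats. Assumes blockDim.x is a power of two and >= 32.
__global__ void sum_kernel(const float *in, float *out, int n)
{
  extern __shared__ float temp[];  // dynamic shared memory, sized at launch

  int tid = threadIdx.x;
  int i   = threadIdx.x + blockDim.x * blockIdx.x;

  temp[tid] = (i < n) ? in[i] : 0.0f;
  __syncthreads();

  // Tree reduction in shared memory until 32 partial sums remain.
  for (int d = blockDim.x / 2; d >= 32; d >>= 1) {
    if (tid < d) temp[tid] += temp[tid + d];
    __syncthreads();
  }

  // The first warp finishes with shuffles, which need no shared memory.
  if (tid < 32) {
    float v = temp[tid];
    for (int offset = 16; offset > 0; offset >>= 1)
      v += __shfl_down_sync(0xffffffff, v, offset);

    // One atomic per block combines the block sums in global memory.
    if (tid == 0) atomicAdd(out, v);
  }
}

// Host wrapper: copy data over, launch, and return the total.
float sum_on_gpu(const std::vector<float> &h_in)
{
  int n = (int) h_in.size();
  if (n == 0) return 0.0f;

  const int nthreads = 256;
  int nblocks = (n + nthreads - 1) / nthreads;
  float *d_in = nullptr, *d_out = nullptr;
  float h_out = 0.0f;

  cudaMalloc((void **)&d_in, n * sizeof(float));
  cudaMalloc((void **)&d_out, sizeof(float));
  cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemset(d_out, 0, sizeof(float));

  // Third launch parameter: bytes of dynamic shared memory per block.
  sum_kernel<<<nblocks, nthreads, nthreads * sizeof(float)>>>(d_in, d_out, n);

  cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return h_out;
}
```

Because floating-point addition is not associative and the order of the atomic updates varies between runs, the result may differ slightly from a sequential sum for general inputs.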
Practical 7
Application: tri-diagonal equations -- see Lecture 7, slide 8, and also
this research talk
Practical 8
Application: scan operation and recurrence equations -- see Lecture 4
Ideas for presentation topics
- Parallel scan for radix sort of integers
- Parallel scan for recurrence equations
- Solution of tri-diagonal equations
- Use of tensor cores for matrix-matrix multiplication
Acknowledgements
Many thanks to:
- Google for the Google Colab system