Course on CUDA Programming, November 30 -- December 15, 2023, at AIMS South Africa
This 2.5-week course teaches how to develop parallel applications that
run on NVIDIA GPUs. The only prerequisite is some proficiency with C and
basic C++ programming. No prior experience with parallel computing is
assumed.
The aim is that by the end of the course you will be able to write
relatively simple parallel programs, and will feel confident continuing
to learn CUDA by studying the code samples provided by
NVIDIA on GitHub.
CUDA Programming references
As preliminary reading, please read chapters 1 and 2 of the
NVIDIA CUDA C Programming Guide, which is available both as
PDF and online HTML.
There is a lot of other information available online. You might find some of
it useful, but you definitely don't need to read most of it.
Lectures
Week 1:
Week 2:
Week 3:
Practicals
We will be working on Linux servers within the Google Cloud.
Before starting the practicals, please read these
notes on using Google Cloud, and have a look at the online
user documentation.
The practicals all use these header files (helper_cuda.h, helper_string.h),
which came originally from the CUDA SDK. They provide routines for
error-checking and initialisation.
Practical 1
Application: a trivial "hello world" example
CUDA aspects: launching a kernel, copying data to/from the graphics card,
error checking and printing from kernel code
Note: the instructions explain how files can be copied from my user account,
so there's no need to download them from here.
Practical 2
Application: Monte Carlo simulation using NVIDIA's CURAND library
for random number generation
CUDA aspects: constant memory, random number generation, kernel timing,
minimising device memory bandwidth requirements
Practical 3
Application: 3D Laplace finite difference solver
CUDA aspects: thread block size optimisation, multi-dimensional memory layout
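For orientation, the serial calculation behind a 3D Laplace finite difference solver can be sketched as a single Jacobi sweep in C++. This is only a minimal CPU sketch under stated assumptions: the names idx and jacobi_sweep, the layout in which i varies fastest in memory, and the fixed (Dirichlet) boundary treatment are illustrative choices, not taken from the practical's own files.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flattened index of grid point (i, j, k) in an NX x NY x NZ array.
// Assumed layout: i varies fastest in memory, then j, then k.
inline std::size_t idx(int i, int j, int k, int NX, int NY) {
    return static_cast<std::size_t>(i)
         + static_cast<std::size_t>(NX)
           * (static_cast<std::size_t>(j)
              + static_cast<std::size_t>(NY) * static_cast<std::size_t>(k));
}

// One Jacobi sweep: every interior point becomes the average of its six
// neighbours; boundary points are copied unchanged.
void jacobi_sweep(int NX, int NY, int NZ,
                  const std::vector<float>& u_old,
                  std::vector<float>& u_new) {
    const std::size_t sy = static_cast<std::size_t>(NX);   // stride in j
    const std::size_t sz = sy * static_cast<std::size_t>(NY); // stride in k
    assert(u_old.size() == sz * static_cast<std::size_t>(NZ));
    assert(u_new.size() == u_old.size());

    for (int k = 0; k < NZ; ++k) {
        for (int j = 0; j < NY; ++j) {
            for (int i = 0; i < NX; ++i) {
                const std::size_t c = idx(i, j, k, NX, NY);
                const bool boundary = (i == 0 || i == NX - 1 ||
                                       j == 0 || j == NY - 1 ||
                                       k == 0 || k == NZ - 1);
                if (boundary) {
                    u_new[c] = u_old[c];
                } else {
                    u_new[c] = (u_old[c - 1]  + u_old[c + 1] +
                                u_old[c - sy] + u_old[c + sy] +
                                u_old[c - sz] + u_old[c + sz]) / 6.0f;
                }
            }
        }
    }
}
```

With this layout, neighbours in the i direction are adjacent in memory, which is why the choice of which index a thread steps through matters for memory access on a GPU.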
Practical 4
Application: reduction
CUDA aspects: dynamic shared memory, thread synchronisation
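The pairwise pattern commonly used for a reduction within one thread block can be sketched serially in C++. This is a CPU illustration only, assuming a power-of-two number of elements; the function name tree_reduce is hypothetical, and the working copy of the data merely stands in for shared memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum the values with a pairwise tree. At each step, "thread" t adds
// element t + stride into element t, and the stride is then halved.
// The argument is taken by value so the copy can be used as scratch
// space, playing the role that shared memory would play on the GPU.
float tree_reduce(std::vector<float> s) {
    const std::size_t n = s.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // assumption: n is a power of two

    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        for (std::size_t t = 0; t < stride; ++t) {
            s[t] += s[t + stride];
        }
        // In a CUDA kernel the inner loop would be executed by the threads
        // of a block in parallel, so a block-wide barrier (__syncthreads())
        // would be needed here before the next step reads the results.
    }
    return s[0];
}
```

For example, calling tree_reduce on the eight values 1 to 8 takes three steps (strides 4, 2, 1) and returns their sum.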
Practical 5
Application: using the CUBLAS and CUFFT libraries
Practical 6
Application: revisiting the simple "hello world" example
CUDA aspects: using g++ for the main code, building libraries,
using templates
Practical 7
Application: tri-diagonal equations
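As a serial point of reference, a tri-diagonal system can be solved with the standard Thomas algorithm (forward elimination followed by back substitution). The sketch below is not the method used in the practical, which is not described here; the name thomas_solve and the array conventions are assumptions, and no pivoting is done, so it is only suitable for well-conditioned systems such as diagonally dominant ones.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1.
// a[0] and c[n-1] are ignored. d is taken by value and overwritten
// with the solution, which is returned.
std::vector<double> thomas_solve(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 const std::vector<double>& c,
                                 std::vector<double> d) {
    const std::size_t n = b.size();
    assert(n > 0 && a.size() == n && c.size() == n && d.size() == n);

    std::vector<double> cp(n);  // modified super-diagonal

    // Forward elimination: remove the sub-diagonal entries.
    cp[0] = c[0] / b[0];
    d[0]  = d[0] / b[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double denom = b[i] - a[i] * cp[i - 1];
        cp[i] = c[i] / denom;
        d[i]  = (d[i] - a[i] * d[i - 1]) / denom;
    }

    // Back substitution: d[n-1] already holds x[n-1].
    for (std::size_t i = n - 1; i-- > 0; ) {
        d[i] -= cp[i] * d[i + 1];
    }
    return d;
}
```

Each step of both loops depends on the previous one, which is what makes this problem a natural motivation for parallel alternatives.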
Practical 8
Application: scan operation and recurrence equations
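One way to see the link between a scan and a recurrence equation: the linear recurrence x[n] = a[n]*x[n-1] + b[n] is a chain of affine maps, and composing affine maps is associative, so an inclusive scan over the maps gives every x[n] directly from the starting value. The C++ sketch below uses doubling-stride passes (the Hillis-Steele pattern) run serially on the CPU. The names Affine, combine, inclusive_scan_affine and solve_recurrence are hypothetical, and this is an illustration of the idea rather than the approach taken in the practical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The affine map x -> a*x + b.
struct Affine { double a, b; };

// Apply f first, then g: g(f(x)) = g.a*f.a*x + (g.a*f.b + g.b).
inline Affine combine(Affine f, Affine g) {
    return { g.a * f.a, g.a * f.b + g.b };
}

// Inclusive scan with doubling strides. After the pass with stride s,
// element i holds the composition of up to 2*s consecutive maps ending at i.
std::vector<Affine> inclusive_scan_affine(std::vector<Affine> v) {
    const std::size_t n = v.size();
    std::vector<Affine> next(n);
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        // On a GPU, each i in this loop would be handled by its own thread.
        for (std::size_t i = 0; i < n; ++i) {
            next[i] = (i >= stride) ? combine(v[i - stride], v[i]) : v[i];
        }
        v.swap(next);
    }
    return v;
}

// Solve x[n] = a[n]*x[n-1] + b[n], where x_init is the value before x[0].
std::vector<double> solve_recurrence(const std::vector<double>& a,
                                     const std::vector<double>& b,
                                     double x_init) {
    assert(a.size() == b.size());
    std::vector<Affine> maps(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        maps[i] = { a[i], b[i] };
    }
    const std::vector<Affine> prefix = inclusive_scan_affine(maps);

    std::vector<double> x(a.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] = prefix[i].a * x_init + prefix[i].b;
    }
    return x;
}
```

Setting every a[n] to 1 and x_init to 0 reduces this to an ordinary prefix sum of b, which is the simplest case of the scan operation.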
Ideas for presentation topics
- Parallel scan for radix sort of integers
- Parallel scan for recurrence equations
- Use of tensor cores for matrix-matrix multiplication
Acknowledgements
Many thanks to:
- Google DeepMind for provision of time on the Google Cloud
- Claire David and Jan Groenewald for their help with using the systems