CUDA Programming on NVIDIA GPUs, March 23-25, 2026, at UT Austin
This will be a 3-day hands-on course for students, postdocs, academics and
others who want to learn how to develop applications to run on NVIDIA
GPUs using the CUDA programming environment. All that will be assumed
is some proficiency with C and basic C++ programming. No prior experience
with parallel computing will be assumed.
The course consists of approximately 3 hours of lectures and 3 hours
of practicals for each of the first two days, plus 6 hours of lectures
on the third day. Additional advanced practicals can be completed
afterwards.
The aim is that by the end of the course you will be able to write
relatively simple CUDA programs and will have the confidence to continue
learning by studying the examples provided by NVIDIA on GitHub.
There will be time during March 26-27 for follow-on discussions on
the use of CUDA for specific research projects.
Venue
The lectures and practicals will all take place in POB Seminar Room 6.304
in the Oden Institute. Attendees should bring fully charged laptops
for carrying out the practicals on TACC.
Timetable
For the first two days we will follow this approximate timetable:
- 08:00 - 09:30 lecture
- 09:30 - 10:00 break
- 10:00 - 11:30 practical
- 11:30 - 12:30 lunch break
- 12:30 - 14:00 lecture
- 14:00 - 14:30 break
- 14:30 - 16:00 practical
On the third day we will switch to having two lectures in the morning,
and two in the afternoon.
Preliminary Reading
Please read sections 1.1 and 1.2 of the new NVIDIA CUDA Programming Guide,
which is available both as PDF and online HTML.
CUDA is an extension of C/C++, so if you are a little rusty with C/C++
you should refresh your memory of it. Here are links to
a couple of introductory lectures on C
and an online resource.
Additional References
Lectures
Practicals
We will be working under Linux on GPU nodes which are part of TACC's
Frontera system.
Before starting the practicals, please read these notes on using the
Frontera system, and have a look at the online Frontera User Guide.
Datasheet for the Quadro RTX 5000 GPU, which we will be using in our
practicals.
The practicals all use these header files (helper_cuda.h, helper_string.h),
which came originally from the CUDA SDK. They provide routines for
error-checking and initialisation.
Tar files for all practicals
Practical 1
Application: a trivial "hello world" example
CUDA aspects: launching a kernel, copying data to/from the graphics card,
error checking and printing from kernel code
Note: the Frontera notes explain how the files for all of the practicals
can be obtained from my master tar file, so there is no need to download
individual files from here.
Practical 2
Application: Monte Carlo simulation using NVIDIA's cuRAND library
for random number generation
CUDA aspects: constant memory, random number generation, kernel timing,
minimising device memory bandwidth requirements
Practical 3
Application: 3D Laplace finite difference solver
CUDA aspects: thread block size optimisation, multi-dimensional memory layout,
performance profiling
Practical 4
Application: reduction
CUDA aspects: dynamic shared memory, thread synchronisation, shuffles, atomics
The following practicals provide scope for additional practice after
the course is over.
Practical 5
Application: using Tensor Cores and cuBLAS and other libraries
Practical 6
Application: revisiting the simple "hello world" example
CUDA aspects: using g++ for the main code, building libraries,
using templates
Practical 7
Application: tri-diagonal equations
Practical 8
Application: scan operation and recurrence equations
Practical 9
Application: pattern matching
Practical 10
Application: auto-tuning
Practical 11
Application: streams and OpenMP multithreading
Practical 12
Application: more on streams and overlapping computation and communication
Acknowledgements
Many thanks to:
- TACC for the GPU resources