Course on CUDA Programming on NVIDIA GPUs, July 24-28, 2023
The course will be taught by
Prof. Mike Giles
and
Prof. Wes Armour.
They have both used CUDA in their research for many years, and
set up and manage
JADE,
the first national GPU supercomputer for Machine Learning.
Online registration has now closed for 2023.
This is a one-week hands-on course for students, postdocs, academics
and others who want to learn how to develop applications to run on
NVIDIA GPUs using the CUDA programming environment. All that will
be assumed is some proficiency with C and basic C++ programming.
No prior experience with parallel computing will be assumed.
The course consists of approximately 3 hours of lectures and 4 hours
of practicals each day. The aim is that by the end of the course you
will be able to write relatively simple programs and will be confident
and able to continue learning through studying the examples provided
by NVIDIA on GitHub.
All attendees should bring a laptop to access the GPU servers
which will be used for the practicals.
The costs for the course are:
- free for everyone in Oxford (due to central funding)
- £250 for those from other UK universities
- £500 for those from UK government labs,
UK not-for-profit organisations,
and foreign universities
- £2500 for those from industry
Anyone with a status which does not fit into one of the categories above,
including those outside the UK who are not from a university or company,
should contact me
(mike.giles@maths.ox.ac.uk)
to discuss the appropriate fee category.
The intention is that these costs should not deter anyone from attending
the course. The higher costs for certain participants reflect the
fact that they will be paying more for their travel and accommodation,
and/or their organisations will be paying more for their time spent
attending the course. They also reflect the UK funding for the facilities
being used.
Venue
The lectures and practicals will all take place in the
Mathematical Institute.
Attendees should bring laptops for accessing the remote servers to carry
out the practicals. Please bring them fully charged if possible; we
will try to provide adequate charging points.
Travel to Oxford
For those coming to Oxford, especially from abroad, there is travel advice
here.
Accommodation and food
Those attending the course must arrange their own accommodation.
The following options are within a few minutes' walk (or bus ride), and are
listed roughly in order of increasing cost:
Alternatively, you might consider using
Airbnb.
For coffee, breakfast and lunch, there is a good cafe in the
basement of the Mathematical Institute. Little Clarendon Street,
which is nearby, has several restaurants for dinner
(and an excellent ice cream shop), and there are two sandwich
shops for lunch on either side of its junction with Woodstock
Road (A4144 on Google Maps). The Lamb & Flag and
The Eagle & Child are two popular pubs in St. Giles
(the continuation of the A4144).
Timetable
For the first three days we will follow this timetable:
- 09:15 - 10:45 lecture
- 10:45 - 11:15 break
- 11:15 - 12:45 practical
- 12:45 - 14:00 lunch break
- 14:00 - 15:30 lecture
- 15:30 - 16:00 break
- 16:00 - 17:30 practical
On the last two days we will switch to having both lectures in the morning,
and then have practicals all afternoon. This provides more time for longer
practicals, and will also allow those coming to Oxford from far away to
leave when they wish on Friday afternoon.
Preliminary Reading
Please read chapters 1 and 2 of the NVIDIA CUDA C Programming Guide
which is available both as
PDF
and
online HTML.
CUDA is an extension of C/C++, so if you are a little rusty with C/C++
you should refresh your memory of it.
Additional References
Lectures
Full set of lecture slides: 4 slides per page,
2 slides per page
Practicals
Most attendees will be provided with accounts on the
ARC/HTC
system which has a number of NVIDIA GPU nodes.
Before starting the practicals, please read these
ARC notes.
Some details on the Slurm batch queueing system are available
here.
Those with accounts on JADE
may prefer to use it for their practicals.
The practicals all use these header files
(helper_cuda.h,
helper_string.h)
which came originally from the CUDA SDK. They provide routines for
error-checking and initialisation.
Tar files for all practicals
Practical 1
Application: a trivial "hello world" example
CUDA aspects: launching a kernel, copying data to/from the graphics card,
error checking and printing from kernel code
Note: the instructions explain how files can be copied from my user account
so there's no need to download from here
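As a rough illustration of the CUDA aspects listed for Practical 1, here is a minimal sketch of a kernel launch with device-to-host copying, error checking and printing from kernel code. It is not the course's practical code: the names CHECK, hello_kernel and run_hello are hypothetical, and the CHECK macro is a stand-in for the error-checking routines that helper_cuda.h provides.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro (not the one in helper_cuda.h).
#define CHECK(call)                                              \
  do {                                                           \
    cudaError_t err_ = (call);                                   \
    if (err_ != cudaSuccess) {                                   \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",               \
              cudaGetErrorString(err_), __FILE__, __LINE__);     \
      exit(EXIT_FAILURE);                                        \
    }                                                            \
  } while (0)

// Each thread stores its global index in x and prints a greeting.
__global__ void hello_kernel(int *x, int n) {
  int tid = threadIdx.x + blockDim.x * blockIdx.x;
  if (tid < n) {
    x[tid] = tid;
    printf("hello from thread %d\n", tid);
  }
}

// Launches the kernel on n elements and copies the results into h_x.
void run_hello(int *h_x, int n) {
  const int threads = 4;
  const int blocks = (n + threads - 1) / threads;
  int *d_x = nullptr;

  CHECK(cudaMalloc(&d_x, n * sizeof(int)));
  hello_kernel<<<blocks, threads>>>(d_x, n);
  CHECK(cudaGetLastError());       // reports launch-time errors
  CHECK(cudaDeviceSynchronize());  // reports run-time errors, flushes printf
  CHECK(cudaMemcpy(h_x, d_x, n * sizeof(int), cudaMemcpyDeviceToHost));
  CHECK(cudaFree(d_x));
}

int main() {
  const int n = 8;
  int h_x[n];
  run_hello(h_x, n);
  for (int i = 0; i < n; i++) assert(h_x[i] == i);  // host-side self-check
  printf("copied %d values back from the device\n", n);
  return 0;
}
```

Something like this should build with nvcc, assuming the source file has a .cu extension. The order in which the kernel's greetings appear is not guaranteed.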
Practical 2
Application: Monte Carlo simulation using NVIDIA's CURAND library
for random number generation
CUDA aspects: constant memory, random number generation, kernel timing,
minimising device memory bandwidth requirements
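The following sketch illustrates three of the CUDA aspects listed for Practical 2: constant memory, random number generation with the CURAND host API, and kernel timing with CUDA events. The test problem (estimating the mean of a*z*z + b*z + c for standard normal z, whose exact value is a + c) and the names evaluate and mc_estimate are assumptions made for illustration, not taken from the practical itself. It makes no attempt to minimise device memory bandwidth, and error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <curand.h>

// Model parameters held in constant memory (illustrative names).
__constant__ float c_a, c_b, c_c;

// Evaluates a*z^2 + b*z + c for each pre-generated random number z.
__global__ void evaluate(const float *z, float *v, int n) {
  int tid = threadIdx.x + blockDim.x * blockIdx.x;
  if (tid < n) {
    float x = z[tid];
    v[tid] = c_a * x * x + c_b * x + c_c;
  }
}

// Returns the sample mean over n normal samples (n must be even for
// curandGenerateNormal); the kernel time in ms is written to *kernel_ms.
double mc_estimate(float a, float b, float c, int n, float *kernel_ms) {
  cudaMemcpyToSymbol(c_a, &a, sizeof(float));
  cudaMemcpyToSymbol(c_b, &b, sizeof(float));
  cudaMemcpyToSymbol(c_c, &c, sizeof(float));

  float *d_z, *d_v;
  cudaMalloc(&d_z, n * sizeof(float));
  cudaMalloc(&d_v, n * sizeof(float));

  curandGenerator_t gen;
  curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
  curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);
  curandGenerateNormal(gen, d_z, n, 0.0f, 1.0f);  // mean 0, std dev 1

  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  cudaEventRecord(start);
  evaluate<<<(n + 255) / 256, 256>>>(d_z, d_v, n);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);
  cudaEventElapsedTime(kernel_ms, start, stop);

  std::vector<float> v(n);
  cudaMemcpy(v.data(), d_v, n * sizeof(float), cudaMemcpyDeviceToHost);
  double sum = 0.0;
  for (float x : v) sum += x;  // final average on the host

  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  curandDestroyGenerator(gen);
  cudaFree(d_z);
  cudaFree(d_v);
  return sum / n;
}

int main() {
  float ms = 0.0f;
  double est = mc_estimate(1.0f, 2.0f, 3.0f, 1 << 20, &ms);
  assert(std::fabs(est - 4.0) < 0.05);  // exact mean is a + c = 4
  printf("estimate = %f, kernel time = %f ms\n", est, ms);
  return 0;
}
```

Building this requires linking against the CURAND library (with nvcc, the -lcurand option).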
Practical 3
Application: 3D Laplace finite difference solver
CUDA aspects: thread block size optimisation, multi-dimensional memory layout,
performance profiling
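To make the ideas of multi-dimensional memory layout and thread block size concrete, here is a minimal sketch of one Jacobi sweep for the 3D Laplace equation. It is an assumption about how such a solver might look, not the practical's code: the names laplace3d and jacobi_sweep, the default block shape of 16x4x4 threads, and the choice to copy boundary values unchanged are all illustrative. Error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One Jacobi sweep: each interior point becomes the average of its six
// neighbours; boundary points are copied unchanged. The grid is stored
// with i varying fastest: index = i + NX*(j + NY*k).
__global__ void laplace3d(int NX, int NY, int NZ,
                          const float *u1, float *u2) {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  int j = threadIdx.y + blockIdx.y * blockDim.y;
  int k = threadIdx.z + blockIdx.z * blockDim.z;
  if (i >= NX || j >= NY || k >= NZ) return;

  int ind = i + NX * (j + NY * k);
  bool boundary = (i == 0 || i == NX - 1 || j == 0 || j == NY - 1 ||
                   k == 0 || k == NZ - 1);
  if (boundary) {
    u2[ind] = u1[ind];
  } else {
    u2[ind] = (u1[ind - 1] + u1[ind + 1] +
               u1[ind - NX] + u1[ind + NX] +
               u1[ind - NX * NY] + u1[ind + NX * NY]) / 6.0f;
  }
}

// Runs one sweep on the GPU; the block shape is the tunable parameter.
std::vector<float> jacobi_sweep(const std::vector<float> &u,
                                int NX, int NY, int NZ,
                                dim3 block = dim3(16, 4, 4)) {
  size_t bytes = u.size() * sizeof(float);
  float *d_u1, *d_u2;
  cudaMalloc(&d_u1, bytes);
  cudaMalloc(&d_u2, bytes);
  cudaMemcpy(d_u1, u.data(), bytes, cudaMemcpyHostToDevice);

  dim3 grid((NX + block.x - 1) / block.x,
            (NY + block.y - 1) / block.y,
            (NZ + block.z - 1) / block.z);
  laplace3d<<<grid, block>>>(NX, NY, NZ, d_u1, d_u2);

  std::vector<float> out(u.size());
  cudaMemcpy(out.data(), d_u2, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_u1);
  cudaFree(d_u2);
  return out;
}

int main() {
  const int NX = 32, NY = 16, NZ = 8;
  std::vector<float> u(NX * NY * NZ);
  // A linear field satisfies the discrete Laplace equation, so one
  // sweep should leave it unchanged.
  for (int k = 0; k < NZ; k++)
    for (int j = 0; j < NY; j++)
      for (int i = 0; i < NX; i++)
        u[i + NX * (j + NY * k)] = i + 2.0f * j + 3.0f * k;

  std::vector<float> v = jacobi_sweep(u, NX, NY, NZ);
  for (size_t n = 0; n < u.size(); n++)
    assert(std::fabs(v[n] - u[n]) < 1e-4f);
  printf("linear field preserved by one sweep\n");
  return 0;
}
```

Because threads with consecutive threadIdx.x touch consecutive addresses, the x dimension of the block is the one that most affects memory access patterns; trying different block shapes and comparing timings is one way to explore this.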
Practical 4
Application: reduction
CUDA aspects: dynamic shared memory, thread synchronisation
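The sketch below shows one common way to combine dynamic shared memory with thread synchronisation in a sum reduction. It is an illustrative version under stated assumptions, not the practical's code: the names block_sum and gpu_sum are hypothetical, the block size is assumed to be a power of two, the input is assumed to be non-empty, and the per-block partial sums are added up on the host for simplicity. Error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each block sums up to blockDim.x input values into out[blockIdx.x].
// Assumes blockDim.x is a power of two.
__global__ void block_sum(const float *in, float *out, int n) {
  // Dynamic shared memory: its size is the third launch parameter.
  extern __shared__ float temp[];

  int tid = threadIdx.x;
  int gid = threadIdx.x + blockDim.x * blockIdx.x;
  temp[tid] = (gid < n) ? in[gid] : 0.0f;

  // Binary-tree reduction: halve the number of active threads each step.
  for (int d = blockDim.x / 2; d > 0; d >>= 1) {
    __syncthreads();  // make the previous step's writes visible
    if (tid < d) temp[tid] += temp[tid + d];
  }
  if (tid == 0) out[blockIdx.x] = temp[0];
}

// Sums a non-empty host vector using the kernel above.
float gpu_sum(const std::vector<float> &h_in) {
  const int n = static_cast<int>(h_in.size());
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;

  float *d_in, *d_out;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_out, blocks * sizeof(float));
  cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

  block_sum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n);

  std::vector<float> partial(blocks);
  cudaMemcpy(partial.data(), d_out, blocks * sizeof(float),
             cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);

  float sum = 0.0f;
  for (float p : partial) sum += p;  // combine per-block results on host
  return sum;
}

int main() {
  std::vector<float> ones(1000, 1.0f);
  float s = gpu_sum(ones);
  assert(s == 1000.0f);  // small integer sums are exact in float
  printf("sum = %f\n", s);
  return 0;
}
```

The call to __syncthreads() sits outside the if statement so that every thread in the block reaches it on every iteration, which the barrier requires.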
Practical 5
Application: using the CUBLAS and CUFFT libraries
Practical 6
Application: revisiting the simple "hello world" example
CUDA aspects: using g++ for the main code, building libraries,
using templates
Practical 7
Application: tri-diagonal equations
Practical 8
Application: scan operation and recurrence equations
Practical 9
Application: pattern matching
Practical 10
Application: auto-tuning
Practical 11
Application: streams and OpenMP multithreading
Practical 12
Application: more on streams and overlapping computation and communication
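As an illustration of overlapping computation and communication with streams, the sketch below splits an array into chunks and issues the copy-in, kernel and copy-out for each chunk in its own stream. This is an assumed example rather than the practical's code: the names scale and scale_in_chunks are hypothetical, the host buffer is assumed to be pinned (allocated with cudaMallocHost) so that the copies can be asynchronous, and error checking is omitted for brevity.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Multiplies each of the n elements of x by a.
__global__ void scale(float *x, int n, float a) {
  int tid = threadIdx.x + blockDim.x * blockIdx.x;
  if (tid < n) x[tid] *= a;
}

// Scales h_x in place on the GPU, one chunk per stream. h_x is assumed
// to point to pinned host memory.
void scale_in_chunks(float *h_x, int n, float a, int nstreams) {
  float *d_x;
  cudaMalloc(&d_x, n * sizeof(float));

  std::vector<cudaStream_t> streams(nstreams);
  for (auto &s : streams) cudaStreamCreate(&s);

  const int chunk = (n + nstreams - 1) / nstreams;
  for (int i = 0; i < nstreams; i++) {
    const int off = i * chunk;
    const int len = std::min(chunk, n - off);
    if (len <= 0) break;
    // Operations within one stream run in order; operations in
    // different streams may overlap with each other.
    cudaMemcpyAsync(d_x + off, h_x + off, len * sizeof(float),
                    cudaMemcpyHostToDevice, streams[i]);
    scale<<<(len + 255) / 256, 256, 0, streams[i]>>>(d_x + off, len, a);
    cudaMemcpyAsync(h_x + off, d_x + off, len * sizeof(float),
                    cudaMemcpyDeviceToHost, streams[i]);
  }

  cudaDeviceSynchronize();  // wait for all streams to finish
  for (auto &s : streams) cudaStreamDestroy(s);
  cudaFree(d_x);
}

int main() {
  const int n = 1000;
  float *h_x;
  cudaMallocHost(&h_x, n * sizeof(float));  // pinned host memory
  for (int i = 0; i < n; i++) h_x[i] = static_cast<float>(i);

  scale_in_chunks(h_x, n, 2.0f, 4);
  for (int i = 0; i < n; i++) assert(h_x[i] == 2.0f * i);
  printf("all %d values scaled\n", n);

  cudaFreeHost(h_x);
  return 0;
}
```

Whether the copies and kernels actually overlap depends on the hardware and on the chunk sizes; a profiler timeline is the usual way to check.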
Acknowledgements
Many thanks to:
- the Mathematical Institute for hosting the lectures and practicals
- Oxford's Advanced Research Computing for the GPU servers used
in the practicals
- Karel Adamek for his help with the practicals