Course on CUDA Programming on NVIDIA GPUs, July 23-27, 2018
The 2018 course is now finished. The course will be held again next year,
probably on July 22-26, 2019. Registration will probably begin in April.
This is a 5-day hands-on course for students, postdocs, academics and
others who want to learn how to develop applications to run on NVIDIA
GPUs using the CUDA programming environment. All that will be assumed
is some proficiency with C and basic C++ programming. No prior experience
with parallel computing will be assumed.
The course consists of approximately 3 hours of lectures and 4 hours
of practicals each day. The aim is that by the end of the course you
will be able to write relatively simple programs and will be confident
and able to continue learning through studying the examples provided
by NVIDIA as part of their SDK (software development kit).
Attendees do not need to bring a laptop; you will be provided with a
desktop PC to access the servers with the GPUs.
Costs for the course are:
- free for Oxford undergraduates -- they should contact me
- OxWaSP and AIMS CDT students are automatically enrolled
-- they don't need to do anything
- £100 for others from Oxford University
- £200 for those from other UK universities
- £500 for those from UK government labs,
UK not-for-profit organisations,
and foreign universities
- £2000 for those from industry
Anyone with a status which does not fit into one of the categories above,
including those outside the UK who are not from a university or company,
should contact me to discuss the appropriate fee category.
My intention is that these costs should not deter anyone from attending
the course. The higher costs for certain participants reflect the
fact that they will be paying more for their travel and accommodation,
and/or their organisations will be paying more for their time spent
attending the course. They also reflect the UK funding provided for the
facilities being used.
The course is being organised by the
but is being held in the Engineering Science department
-- please come to the main entrance on the ground floor on Monday morning
Travel to Oxford
For those coming to Oxford, especially from abroad, I have travel advice
Accommodation and food
Those attending the course must arrange their own accommodation. These three are
within a few minutes' walk, and are arranged in order of increasing cost:
Alternatively, you might consider using
The location for the lectures and practicals is marked on this
Little Clarendon Street, which is towards the left side of the map,
has several restaurants for dinner, and there are two sandwich
shops for lunch on either side of its junction with the road marked
as the A4144. Towards the bottom of the map, the Lamb & Flag
and The Eagle & Child are two popular pubs.
For coffee, breakfast and lunch, there is a very good cafe in the basement
of the Mathematical Institute,
which is just 2-3 minutes away on Woodstock Road.
In addition, there is an Engineering Science common room nearby (turn
left as you leave the Thom building on the first floor, and go straight
ahead into the first building) which is open 9-2 and serves sandwiches,
pizza and other snacks.
For the first three days we will follow this timetable:
- 09:15 - 10:45 lecture
- 10:45 - 11:15 break
- 11:15 - 12:45 practical
- 12:45 - 14:00 lunch break
- 14:00 - 15:30 lecture
- 15:30 - 16:00 break
- 16:00 - 17:30 practical
On the last two days we will switch to having both lectures in the morning,
and then have practicals all afternoon. This provides more time for longer
practicals, and will also allow those coming to Oxford from far away to
leave when they wish on Friday afternoon.
Please read chapters 1 and 2 of the
NVIDIA CUDA C Programming Guide.
CUDA is an extension of C/C++, so if you are a little rusty with C/C++
you should refresh your memory of it.
Some additional material on a case study concerning explicit and
implicit discretisations of a parabolic PDE:
The practicals will be held in the computer teaching laboratory on the 6th floor
of the Thom building.
We will be working under Linux on the
GPU cluster. Before starting the practicals, please read these
and have a look at this
ARCUS GPU webpage.
Further details on the Slurm batch queueing system are available on this
The practicals all use these header files
which come from the CUDA SDK. They provide routines for error-checking
Tar files for all practicals
Application: a trivial "hello world" example
CUDA aspects: launching a kernel, copying data to/from the graphics card,
error checking and printing from kernel code
Note: the instructions explain how files can be copied from my user account
so there's no need to download from here
Application: Monte Carlo simulation using NVIDIA's CURAND library
for random number generation
CUDA aspects: constant memory, random number generation, kernel timing,
minimising device memory bandwidth requirements
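As a point of reference for the Monte Carlo application, here is a minimal
CPU-only sketch in plain C++. It is not the practical's code and it does not
use CURAND: it assumes the standard <random> generators instead, and the
function name mc_mean_positive_part and the quantity being estimated are
hypothetical choices made purely for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical helper (not from the course material): estimates
// E[max(Z, 0)] for Z ~ N(0,1) by averaging over n_paths independent samples.
// The exact value is 1/sqrt(2*pi), which gives a simple accuracy check.
double mc_mean_positive_part(std::size_t n_paths, unsigned seed) {
    std::mt19937_64 rng(seed);                        // CPU generator, standing in for a GPU one
    std::normal_distribution<double> normal(0.0, 1.0);

    double sum = 0.0;
    for (std::size_t i = 0; i < n_paths; ++i) {
        const double z = normal(rng);                 // one random sample per path
        sum += (z > 0.0) ? z : 0.0;                   // payoff-like function of the sample
    }
    return sum / static_cast<double>(n_paths);        // sample mean
}
```

Each path is independent of the others, which is the property that makes this
kind of calculation a natural fit for one-thread-per-path execution on a GPU.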
Application: 3D Laplace finite difference solver
CUDA aspects: thread block size optimisation, multi-dimensional memory layout
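To make the finite difference application concrete, here is a CPU-only
sketch of one sweep, assuming a Jacobi iteration with the standard 7-point
stencil and a grid stored with the x index varying fastest. These are
assumptions for illustration, as is the function name jacobi_sweep; the
practical's own code may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: one Jacobi sweep for the 3D Laplace equation.
// Grid point (i, j, k) is stored at index i + nx*(j + ny*k).
// Boundary values are copied unchanged; each interior value becomes the
// average of its six nearest neighbours.
void jacobi_sweep(int nx, int ny, int nz,
                  const std::vector<float>& u_old,
                  std::vector<float>& u_new) {
    const std::size_t sy = static_cast<std::size_t>(nx);       // stride in y
    const std::size_t sz = sy * static_cast<std::size_t>(ny);  // stride in z

    for (int k = 0; k < nz; ++k) {
        for (int j = 0; j < ny; ++j) {
            for (int i = 0; i < nx; ++i) {
                const std::size_t ind = static_cast<std::size_t>(i)
                                      + sy * static_cast<std::size_t>(j)
                                      + sz * static_cast<std::size_t>(k);
                const bool boundary = (i == 0 || i == nx - 1 ||
                                       j == 0 || j == ny - 1 ||
                                       k == 0 || k == nz - 1);
                if (boundary) {
                    u_new[ind] = u_old[ind];
                } else {
                    u_new[ind] = (u_old[ind - 1]  + u_old[ind + 1] +
                                  u_old[ind - sy] + u_old[ind + sy] +
                                  u_old[ind - sz] + u_old[ind + sz]) / 6.0f;
                }
            }
        }
    }
}
```

In a GPU version each grid point would typically be handled by its own
thread, so the choice of index ordering above determines which neighbouring
threads read adjacent memory locations.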
CUDA aspects: dynamic shared memory, thread synchronisation
Application: using the CUBLAS and CUFFT libraries
Application: revisiting the simple "hello world" example
CUDA aspects: using g++ for the main code, building libraries,
Application: tri-diagonal equations
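For readers who have not met tri-diagonal systems before, the sketch below
shows the standard serial approach (the Thomas algorithm) in plain C++. It
is only a CPU reference under stated assumptions: no pivoting is done, so the
matrix is assumed to be diagonally dominant, and the function name
solve_tridiagonal is hypothetical. A parallel GPU solver would normally use a
different algorithm.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: solves a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]
// for i = 0..n-1 using the Thomas algorithm. a[0] and c[n-1] are ignored.
// All four vectors must have the same length n.
std::vector<double> solve_tridiagonal(const std::vector<double>& a,
                                      const std::vector<double>& b,
                                      const std::vector<double>& c,
                                      const std::vector<double>& d) {
    const std::size_t n = d.size();
    std::vector<double> c_star(n), x(n);
    if (n == 0) return x;

    // Forward elimination: remove the sub-diagonal.
    c_star[0] = c[0] / b[0];
    x[0] = d[0] / b[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double denom = b[i] - a[i] * c_star[i - 1];
        c_star[i] = c[i] / denom;
        x[i] = (d[i] - a[i] * x[i - 1]) / denom;
    }

    // Back substitution, from the second-to-last row up to the first.
    for (std::size_t i = n - 1; i-- > 0;) {
        x[i] -= c_star[i] * x[i + 1];
    }
    return x;
}
```

Both loops carry a dependence from one row to the next, which is why this
serial form does not map directly onto many GPU threads.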
Application: scan operation and recurrence equations
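For those unfamiliar with the term, a scan computes all the running totals of
an array. The sketch below is a CPU emulation in plain C++ of one well-known
data-parallel formulation (the step-doubling, or Hillis-Steele, scheme); it
is offered as an illustration only, the function name
inclusive_scan_doubling is hypothetical, and the practical may use a
different scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU emulation of a step-doubling inclusive scan.
// In each pass, element i adds the element d places to its left, and d
// doubles from pass to pass. Within a pass every element can be updated
// independently, which is what makes the scheme suitable for parallel threads.
std::vector<int> inclusive_scan_doubling(std::vector<int> data) {
    std::vector<int> next(data.size());
    for (std::size_t d = 1; d < data.size(); d *= 2) {
        for (std::size_t i = 0; i < data.size(); ++i) {
            next[i] = (i >= d) ? data[i] + data[i - d] : data[i];
        }
        data.swap(next);  // double buffering: read the old values, write the new ones
    }
    return data;
}
```

For example, the input {1, 2, 3, 4, 5} gives {1, 3, 6, 10, 15}. The two
buffers play the role that synchronisation between passes would play on a GPU.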
Application: pattern matching
Application: streams and OpenMP multithreading
Application: more on streams and overlapping computation and communication
Many thanks to:
- ARC for the GPU resources
- the Engineering Science and Computer Science departments for the
lecture room and computer labs
- Tim Lanfear from NVIDIA for his guest lecture
- Istvan Reguly and Wes Armour for their guest lectures
and help with the practicals
- Fred Dulwich and Ben Mort for their help with the practicals