Course on CUDA Programming on NVIDIA GPUs, July 22-26, 2019

This year the course will be led by Prof. Wes Armour, who has given guest lectures in the past and has also taken over from me as PI on JADE, the first national GPU supercomputer for Machine Learning.

We are now ready for online registration here.

Note that Oxford undergraduates and OxWaSP and AIMS CDT students do not need to register through this site. Undergraduates should contact Wes directly, and the CDT students are automatically enrolled.

This is a 5-day hands-on course for students, postdocs, academics and others who want to learn how to develop applications to run on NVIDIA GPUs using the CUDA programming environment. All that will be assumed is some proficiency with C and basic C++ programming. No prior experience with parallel computing will be assumed.

The course consists of approximately 3 hours of lectures and 4 hours of practicals each day. The aim is that by the end of the course you will be able to write relatively simple programs and will be confident and able to continue learning through studying the examples provided by NVIDIA as part of their SDK (software development kit).

Attendees do not need to bring a laptop; you will be provided with a desktop PC to access the servers with the GPUs.

Costs for the course are:

Anyone with a status which does not fit into one of the categories above, including those outside the UK who are not from a university or company, should contact Wes to discuss the appropriate fee category.

The intention is that these costs should not deter anyone from attending the course. The higher costs for certain participants reflect the fact that they will be paying more for their travel and accommodation, and/or that their organisations will be paying more for their time spent attending the course. They also reflect the UK funding for the facilities being used.

To encourage early registration, the costs will increase by 50% after July 4th.


The course is being organised by the Department of Engineering Science and is being held in the Thom building -- please come to the main entrance on the ground floor on Monday morning at 9:00.

Travel to Oxford

For those coming to Oxford, especially from abroad, there is travel advice here.

Accommodation and food

Those attending the course must arrange their own accommodation. These three are within a few minutes' walk, and are arranged in order of increasing cost:

Alternatively, you might consider using Airbnb.

The location for the lectures and practicals is marked on this Google map. Little Clarendon Street, which is towards the left side of the map, has several restaurants for dinner, and there are two sandwich shops for lunch on either side of its junction with the road marked as the A4144. Towards the bottom of the map, the Lamb & Flag and The Eagle & Child are two popular pubs.

For coffee, breakfast and lunch, there is a very good cafe in the basement of the Mathematical Institute, which is just 2-3 minutes away on Woodstock Road.

In addition, there is an Engineering Science common room nearby (turn left as you leave the Thom building on the first floor, and go straight ahead into the first building) which is open 9-2 and serves sandwiches, pizza and other snacks.


For the first three days we will follow this timetable:

On the last two days we will switch to having both lectures in the morning, and then have practicals all afternoon. This provides more time for longer practicals, and will also allow those coming to Oxford from far away to leave when they wish on Friday afternoon.

Preliminary Reading

Please read chapters 1 and 2 of the NVIDIA CUDA C Programming Guide.

CUDA is an extension of C/C++, so if you are a little rusty with C/C++ you should refresh your memory of it.



The practicals will be held in the computer teaching laboratory on the 6th floor of the Thom building.

We will be working under Linux on the ARCUS GPU cluster. Before starting the practicals, please read these ARCUS notes and have a look at this ARCUS GPU webpage. Further details on the Slurm batch queueing system are available on this webpage.

The practicals all use these header files (helper_cuda.h, helper_string.h) which come from the CUDA SDK. They provide routines for error-checking and initialisation.

Tar files for all practicals

Practical 1

Application: a trivial "hello world" example

CUDA aspects: launching a kernel, copying data to/from the graphics card, error checking and printing from kernel code

Note: the instructions explain how files can be copied from my user account so there's no need to download from here

Practical 2

Application: Monte Carlo simulation using NVIDIA's CURAND library for random number generation

CUDA aspects: constant memory, random number generation, kernel timing, minimising device memory bandwidth requirements

Practical 3

Application: 3D Laplace finite difference solver

CUDA aspects: thread block size optimisation, multi-dimensional memory layout
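
As a rough illustration of what a 3D Laplace finite difference solver computes, here is a minimal host-side C++ sketch of one Jacobi sweep on a flattened 3D grid. It is not the code used in the practical: the function names (idx, jacobi_sweep), the choice of Jacobi iteration, and the layout with the x index varying fastest are all assumptions made for this example. In a GPU version, each interior point would typically be handled by its own thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flattened index into an nx*ny*nz grid with i (the x index) varying fastest.
// (Hypothetical helper; the layout is an assumption for this sketch.)
inline std::size_t idx(int i, int j, int k, int nx, int ny) {
    return static_cast<std::size_t>(i)
         + static_cast<std::size_t>(nx) * (static_cast<std::size_t>(j)
         + static_cast<std::size_t>(ny) * static_cast<std::size_t>(k));
}

// One Jacobi sweep for the Laplace equation: every interior point becomes
// the average of its six neighbours; boundary points are copied unchanged.
void jacobi_sweep(const std::vector<float>& u, std::vector<float>& u_new,
                  int nx, int ny, int nz) {
    assert(nx > 0 && ny > 0 && nz > 0);
    assert(u.size() == static_cast<std::size_t>(nx) * ny * nz);
    assert(u_new.size() == u.size());

    const std::size_t sx  = static_cast<std::size_t>(nx);  // stride in y
    const std::size_t sxy = sx * static_cast<std::size_t>(ny);  // stride in z

    for (int k = 0; k < nz; ++k) {
        for (int j = 0; j < ny; ++j) {
            for (int i = 0; i < nx; ++i) {
                const std::size_t c = idx(i, j, k, nx, ny);
                const bool boundary = i == 0 || i == nx - 1 ||
                                      j == 0 || j == ny - 1 ||
                                      k == 0 || k == nz - 1;
                if (boundary) {
                    u_new[c] = u[c];
                } else {
                    const float sum = u[c - 1]   + u[c + 1]
                                    + u[c - sx]  + u[c + sx]
                                    + u[c - sxy] + u[c + sxy];
                    u_new[c] = sum / 6.0f;
                }
            }
        }
    }
}
```

Calling jacobi_sweep repeatedly, swapping u and u_new between calls, drives the interior values towards the solution determined by the boundary values.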

Practical 4

Application: reduction

CUDA aspects: dynamic shared memory, thread synchronisation
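
The access pattern behind a shared-memory reduction can be sketched on the host in plain C++. The example below is only an illustration under stated assumptions: the function name tree_reduce is hypothetical, the input length must be a power of two, and the inner loop stands in for the threads of one block, each of which would do one addition per step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tree-based sum of a power-of-two-length array (hypothetical helper).
// The vector s plays the role of the shared-memory array in a GPU kernel.
float tree_reduce(std::vector<float> s) {
    const std::size_t n = s.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // length must be a power of two

    // Halve the number of active "threads" at each step.
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        for (std::size_t t = 0; t < stride; ++t) {
            s[t] += s[t + stride];
        }
        // In a CUDA kernel a barrier such as __syncthreads() would be
        // needed at this point, so that all partial sums from this step
        // are written before any thread reads them in the next step.
    }
    return s[0];
}
```

For example, tree_reduce({1, 2, 3, 4, 5, 6, 7, 8}) sums the eight values in three steps rather than seven sequential additions.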

Practical 5

Application: using the CUBLAS and CUFFT libraries

Practical 6

Application: revisiting the simple "hello world" example

CUDA aspects: using g++ for the main code, building libraries, using templates

Practical 7

Application: tri-diagonal equations
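
For reference, a tri-diagonal system can be solved serially with the Thomas algorithm, sketched below in host-side C++. This is an assumption-laden illustration rather than the practical's method: the page does not say which algorithm the GPU code uses, the function name thomas_solve and the array conventions are hypothetical, and no pivoting is done, so the sketch is only suitable for well-conditioned (for example diagonally dominant) systems.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solve a tri-diagonal system with the serial Thomas algorithm.
//   a: sub-diagonal   (a[0] is unused)
//   b: main diagonal
//   c: super-diagonal (c[n-1] is unused)
//   d: right-hand side
// Hypothetical helper; assumes no zero pivots arise (no pivoting is done).
std::vector<double> thomas_solve(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 const std::vector<double>& c,
                                 const std::vector<double>& d) {
    const std::size_t n = b.size();
    assert(n > 0);
    assert(a.size() == n && c.size() == n && d.size() == n);

    std::vector<double> cp(n), dp(n), x(n);

    // Forward elimination.
    cp[0] = c[0] / b[0];
    dp[0] = d[0] / b[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double m = b[i] - a[i] * cp[i - 1];
        cp[i] = c[i] / m;
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
    }

    // Back substitution.
    x[n - 1] = dp[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) {
        x[i] = dp[i] - cp[i] * x[i + 1];
    }
    return x;
}
```

Each step of both loops depends on the previous one, which is why a GPU implementation generally needs either many independent systems or a different, more parallel formulation.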

Practical 8

Application: scan operation and recurrence equations
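
A scan (prefix sum) can be organised so that every element is updated independently at each step, which is what makes it suitable for a GPU. The host-side C++ sketch below uses the step-doubling (Hillis-Steele style) scheme as one possible illustration; the function name inclusive_scan_steps is hypothetical, and the practical itself may use a different formulation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Inclusive prefix sum using the step-doubling scheme (hypothetical helper).
// At each step, element i adds the value offset places to its left.
// The inner loop has no dependencies between iterations, so on a GPU
// each i could be handled by a separate thread.
std::vector<int> inclusive_scan_steps(std::vector<int> in) {
    const std::size_t n = in.size();
    std::vector<int> out(n);

    for (std::size_t offset = 1; offset < n; offset *= 2) {
        for (std::size_t i = 0; i < n; ++i) {
            out[i] = (i >= offset) ? in[i] + in[i - offset] : in[i];
        }
        // Double buffering: reads and writes use separate arrays, which
        // plays the role that synchronisation plays in a GPU kernel.
        std::swap(in, out);
    }
    return in;
}
```

For the input {1, 2, 3, 4, 5} this returns {1, 3, 6, 10, 15} after three steps (offsets 1, 2 and 4).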

Practical 9

Application: pattern matching

Practical 10

Application: auto-tuning

Practical 11

Application: streams and OpenMP multithreading

Practical 12

Application: more on streams and overlapping computation and communication

