Course on CUDA Programming on NVIDIA GPUs, July 23--27, 2012
We are now ready for online bookings here.
This is a 5-day hands-on course for students, postdocs, academics and
others who want to learn how to develop applications to run on NVIDIA
GPUs using the CUDA programming environment. All that will be assumed
is some proficiency with C and basic C++ programming. No prior experience
with parallel computing will be assumed.
The course consists of approximately 3 hours of lectures and 4 hours
of practicals each day. The aim is that by the end of the course you
will be able to write relatively simple programs and will be confident
and able to continue learning through studying the examples provided
by NVIDIA as part of their SDK (software development kit).
This course is open to all -- to register please go to the registration webpage.
Costs for the course will be as follows:
- £100 for those from Oxford University and other members of the
e-Infrastructure South consortium (Bristol, Southampton, STFC and UCL)
- £200 for those from other UK universities
- £500 for those from other government labs, not-for-profit organisations,
and foreign universities
- £2000 for those from industry (this will include lunch each day)
Online registration is now open.
EPSRC-supported PhD students may be able to get assistance with travel
and accommodation costs from the
EPSRC Coordinated HPC Training Centre.
Venue
The course is being organised by OeRC but is being held in the Thom building
of the Engineering Science department. Please come to the main entrance on the
first floor (go up the external stairs to reach an elevated walkway linking
several buildings) on Monday morning at 9:00.
Accommodation and food
Those attending the course must arrange their own accommodation. These three are
within a few minutes' walk, and are arranged in order of increasing cost:
The location for the lectures and practicals is marked as A on this
Google map.
Little Clarendon Street, which is towards the left side of the map,
has several restaurants for dinner, and there are two sandwich
shops for lunch on either side of its junction with the road marked
as the A4144. Towards the bottom of the map, the Lamb & Flag
and The Eagle & Child are two popular pubs.
In addition, there is an Engineering Science common room nearby (turn
left as you leave the Thom building and go straight ahead into the first
building) which is open 9-2 and serves sandwiches, pizza and other snacks.
Timetable
For the first three days we will follow this timetable:
- 09:15 - 10:45 lecture
- 10:45 - 11:15 coffee
- 11:15 - 12:45 practical
- 12:45 - 14:00 lunch break
- 14:00 - 15:30 lecture
- 15:30 - 16:00 coffee
- 16:00 - 17:30 practical
On the last two days we will switch to having both lectures in the morning,
and then have practicals all afternoon. This will also allow those coming
to Oxford from far away to leave when they wish on Friday afternoon.
Preliminary Reading
Please bring a printed copy of the NVIDIA CUDA C Programming Guide version 4.1,
and read chapters 1 and 2 beforehand.
CUDA is an extension of C/C++, so if you are a little rusty with C/C++
you should refresh your memory of it.
Please also look at lecture 0 (4 slides per page for printing),
which gives an overview of trends in computing and explains why I believe
GPU computing is an important direction for scientific computing
for the next 5-10 years.
Additional References
The lectures and practicals below were for the course in 2011 -- they will be
revised for 2012.
Lectures
Additional NVIDIA presentations:
Code generator for case study 2:
Practicals
The practicals will be held in the computer teaching laboratory on the 6th floor
of the Thom building.
We will be working under Linux, and because of the large number of people
taking the course we will be using two different GPU clusters:
- OSC skynet cluster: please read these notes, and there is some further
information here.
- HECToR GPU cluster, with many thanks to EPSRC for access to this: please
read these notes.
The practicals all use this cutil_inline.h header file, which is based on one
in the CUDA SDK.
Practical 1
Application: a trivial "hello world" example
CUDA aspects: launching a kernel, copying data to/from the graphics card,
error checking and printing from kernel code
Note: the instructions explain how files can be copied from my user account,
so there's no need to download from here.
Practical 2
Application: Monte Carlo simulation using NAG's RNG library
CUDA aspects: constant memory, random number generation, kernel timing,
minimising device memory bandwidth requirements
Practical 3
Application: 3D Laplace and ADI finite difference solvers
CUDA aspects: array padding and thread block size optimisation
Practical 4
Application: reduction
CUDA aspects: dynamic shared memory, thread synchronisation
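As a rough illustration of the reduction pattern, here is a minimal CPU sketch in plain C++ of the halving-stride sum that shared-memory reductions are commonly built on. It is not the practical's code: the function name tree_reduce and the power-of-two size restriction are assumptions made for this example. On a GPU, each iteration of the inner loop would be carried out by a separate thread, with a barrier such as __syncthreads() between passes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: sums `data` using the halving-stride pattern.
// Assumption for this sketch: data.size() is a non-zero power of two.
// The vector is taken by value because it is overwritten during the passes.
float tree_reduce(std::vector<float> data) {
    const std::size_t n = data.size();
    if (n == 0 || (n & (n - 1)) != 0) {
        throw std::invalid_argument("size must be a non-zero power of two");
    }
    // Each pass halves the number of active elements. In a CUDA kernel the
    // inner loop body would be one thread's work, and a barrier would
    // separate one pass from the next.
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        for (std::size_t i = 0; i < stride; ++i) {
            data[i] += data[i + stride];
        }
    }
    return data[0];
}
```

A sequential version like this can also serve as a reference against which the output of a GPU kernel is checked.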
Practical 5
Application: using the CUBLAS and CUFFT libraries
Practical 6
Application: revisiting the simple "hello world" example
CUDA aspects: using g++ for the main code, building libraries,
using templates
Practical 7
Application: tri-diagonal equations
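For reference, a tri-diagonal system can be solved sequentially with the Thomas algorithm, sketched below in plain C++. This is a CPU baseline under stated assumptions, not the method used in the practical, which may rely on a different, more parallel algorithm. The function name solve_tridiagonal and the array layout are choices made for this example, and the sketch does no pivoting, so it assumes a well-conditioned (for instance diagonally dominant) system.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: solves
//   a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],  i = 0..n-1,
// where a[0] and c[n-1] are ignored. All four vectors have length n.
// Assumption: no pivoting is needed (e.g. diagonally dominant matrix).
std::vector<double> solve_tridiagonal(const std::vector<double>& a,
                                      const std::vector<double>& b,
                                      std::vector<double> c,
                                      std::vector<double> d) {
    const std::size_t n = b.size();
    if (n == 0) {
        return {};
    }
    // Forward sweep: eliminate the sub-diagonal and normalise each row.
    c[0] /= b[0];
    d[0] /= b[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double m = b[i] - a[i] * c[i - 1];
        c[i] /= m;
        d[i] = (d[i] - a[i] * d[i - 1]) / m;
    }
    // Back substitution: d is overwritten with the solution x.
    for (std::size_t i = n - 1; i-- > 0;) {
        d[i] -= c[i] * d[i + 1];
    }
    return d;
}
```

Both sweeps carry a dependency from one row to the next, which is why a single system does not map directly onto many threads; solving many independent systems at once is one common way to expose parallelism.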
Practical 8
Application: scan operation and recurrence equations
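One way to see the link between scans and recurrences is sketched below in plain C++. A first-order linear recurrence x[i] = a[i]*x[i-1] + b[i] can be written as an inclusive scan over the pairs (a[i], b[i]) under an associative "compose" operator. This is an illustrative CPU sketch, not the practical's code; the names Affine, compose, inclusive_scan_affine and solve_recurrence are assumptions introduced for this example. The scan here is written sequentially, but because the operator is associative, a parallel scan could be substituted for it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type: represents the map x -> a*x + b.
struct Affine {
    double a;
    double b;
};

// Apply `first`, then `second`. This operator is associative, which is
// the property a parallel scan relies on.
Affine compose(const Affine& first, const Affine& second) {
    return {second.a * first.a, second.a * first.b + second.b};
}

// Sequential inclusive scan: out[i] = f[0] composed with ... with f[i].
std::vector<Affine> inclusive_scan_affine(const std::vector<Affine>& f) {
    std::vector<Affine> out(f.size());
    for (std::size_t i = 0; i < f.size(); ++i) {
        out[i] = (i == 0) ? f[0] : compose(out[i - 1], f[i]);
    }
    return out;
}

// Evaluate x[i] = f[i].a * x[i-1] + f[i].b, with x0 as the value before x[0].
std::vector<double> solve_recurrence(const std::vector<Affine>& f, double x0) {
    const std::vector<Affine> prefix = inclusive_scan_affine(f);
    std::vector<double> x(f.size());
    for (std::size_t i = 0; i < f.size(); ++i) {
        x[i] = prefix[i].a * x0 + prefix[i].b;
    }
    return x;
}
```

Setting every a[i] to 1 recovers the ordinary prefix sum of the b[i] values.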
Practical 9
Application: pattern matching
Practical 10
Application: auto-tuning
Acknowledgements
Many thanks to
- the OSC and HECToR staff for providing the GPU cluster resources
- the Engineering Science and Computer Science departments for the
lecture room and computer labs
- Clementine Harris and Kay Sutton for administrative support
- Wes Armour, Guido Klingbeil and Stuart Golodetz for help with the practicals
- Gernot Ziegler for his guest lecture