OpenMP programming for parallel/vector computing

The latest generation of Intel Xeon SP (Scalable Processors) CPUs is very powerful: each CPU has up to 64 cores, and each core has 1 or 2 AVX-512 vector units. The purpose of this 1-day course is to introduce students, postdocs and others to OpenMP programming, which is often the best way to obtain high performance on such systems.

The course consists of approximately 3.5 hours of lectures and 2.5 hours of practicals. The only prerequisite is some basic proficiency with C; no prior experience with parallel computing is assumed.

If you do not have much experience with C, here are links to a couple of introductory lectures on C, a larger online resource and an even larger online resource. This Reddit critique particularly recommends the last of these, and also mentions various others.

Timetable

This is a provisional timetable; the timings are approximate and may change.

10:00 - 11:15 lecture 1
11:15 - 11:30 break
11:30 - 12:45 lecture 2
12:45 - 13:30 lunch
13:30 - 15:00 practical 1
15:00 - 16:00 lecture 3
16:00 - 17:00 practical 2

References

Wikipedia info on Emerald Rapids Xeon processors
Wikipedia info on Raptor Cove cores in Emerald Rapids processors
Wikipedia page on AVX-512 instructions, including FP16 half-precision

Intel webpage on Xeon Gold 6538Y+ processor in NA group's "mimic" system

Intel's oneAPI C/C++ Compiler
OpenMP support in Intel oneAPI C/C++ Compiler
oneAPI Math Kernel Library (MKL) documentation
MKL Statistical Functions documentation for random number generation
Old (2021) Intel documentation on random number generation performance

Wikipedia page on OpenMP
OpenMP 6.0 reference guide
LLNL OpenMP info

Lectures

Lecture 1: Introduction to Intel CPUs
Lecture 2: Introduction to OpenMP
Lecture 3: More on OpenMP

Practical 1

Application: a simple 2D finite difference method

OpenMP aspects: #pragma omp parallel for, shared and private variables, reduction clause, run-time library, environment variables, avoiding false sharing
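
As a rough illustration of how several of these aspects fit together, here is a minimal sketch of one Jacobi sweep for the 2D Laplace equation. It is not the practical's actual code: the function name jacobi_sweep, the row-major n x n layout and the choice of a Jacobi update are all assumptions made for this example. The grid arrays are shared between threads, the variables declared inside the loops are private to each thread, and a max reduction collects the largest change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example (not the course code): one Jacobi sweep for the
   2D Laplace equation on an n x n grid stored row-major.
   Reads u, writes the interior points of unew, and returns the
   largest absolute change over the interior points. */
double jacobi_sweep(int n, const double *u, double *unew)
{
    assert(n >= 3);           /* need at least one interior point */
    double maxdiff = 0.0;

    /* u and unew are shared by all threads; i, j, k, v and d are
       declared inside the loops, so each thread has private copies.
       Each thread keeps its own maxdiff and the reduction clause
       combines them with max at the end of the loop. */
    #pragma omp parallel for shared(u, unew) reduction(max:maxdiff)
    for (int i = 1; i < n - 1; i++) {
        for (int j = 1; j < n - 1; j++) {
            size_t k = (size_t)i * n + j;
            double v = 0.25 * (u[k - n] + u[k + n] + u[k - 1] + u[k + 1]);
            double d = v - u[k];
            if (d < 0.0) d = -d;
            if (d > maxdiff) maxdiff = d;
            unew[k] = v;
        }
    }
    return maxdiff;
}
```

A caller would typically swap u and unew after each sweep and stop once the returned value falls below a tolerance. The code needs OpenMP enabled at compile time (for example -qopenmp with Intel's compiler or -fopenmp with GCC); otherwise the pragma is ignored and the loop runs serially. The number of threads can be set at run time through the OMP_NUM_THREADS environment variable. Parallelising over the outer loop means each thread writes whole rows of unew, which keeps writes by different threads mostly on different cache lines.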

Practical 2

Application: Monte Carlo simulation using Intel's MKL library

OpenMP aspects: #pragma omp parallel, #pragma omp threadprivate, array reduction, MKL random number generation
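
The pattern of a parallel region with per-thread generator state can be sketched as follows. This is only an illustration under stated assumptions: it estimates pi rather than running the practical's simulation, and it swaps MKL's random number generation for a small hand-written 64-bit linear congruential generator so that the example is self-contained. The names rng_state, next_uniform and estimate_pi are hypothetical, the seeding scheme is ad hoc and not statistically rigorous, and array reduction is not shown.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for an MKL random stream: generator state at
   file scope, made threadprivate so each thread has its own copy. */
static uint64_t rng_state;
#pragma omp threadprivate(rng_state)

/* 64-bit linear congruential generator (Knuth's MMIX constants);
   the top 53 bits are scaled to a uniform double in [0,1). */
static double next_uniform(void)
{
    rng_state = rng_state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(rng_state >> 11) * (1.0 / 9007199254740992.0);
}

/* Estimate pi from `samples` random points in the unit square. */
double estimate_pi(long samples, uint64_t seed)
{
    assert(samples > 0);
    long hits = 0;

    #pragma omp parallel reduction(+:hits)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();   /* run-time library call */
#endif
        /* Assumption: a simple per-thread seed offset, for illustration only. */
        rng_state = seed + 0x9E3779B97F4A7C15ULL * (uint64_t)(tid + 1);

        /* Share the samples among the threads of the enclosing region. */
        #pragma omp for
        for (long s = 0; s < samples; s++) {
            double x = next_uniform();
            double y = next_uniform();
            if (x * x + y * y < 1.0) hits++;
        }
    }
    return 4.0 * (double)hits / (double)samples;
}
```

For example, estimate_pi(1000000, 12345) should return a value close to 3.14, although the exact result depends on the number of threads because each thread draws from a differently seeded sequence. Separating #pragma omp parallel from the work-sharing #pragma omp for gives each thread a place to initialise its own generator once before the loop starts.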

Acknowledgements

Many thanks to:

Wes Armour and Ian Bush for feedback on my original lectures and practicals
Ian Bush, EPCC and NAG for their OpenMP training materials, which were very helpful in preparing my original materials