AVX-512 vector intrinsics

Intel Xeon SP CPUs offer considerable floating-point throughput: each CPU has up to 64 cores, each with 1 or 2 AVX-512 vector units. In simple cases the compiler can vectorise code automatically, producing an executable which uses vector instructions. In other cases the programmer has to use vector intrinsic functions explicitly.

C++ code has scalar floating-point variables of type double (64-bit), float (32-bit), and _Float16 (16-bit, available as a compiler extension). When using Intel's 512-bit vectors, the corresponding variable types are __m512d (8 doubles), __m512 (16 floats), and __m512h (32 half-precision values). Intel now provides operator overloading for these types, so one can write code like c = a + b; to add vectors a and b to form c.
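As a minimal sketch of the c = a + b; form, the code below adds two arrays of doubles eight elements at a time. The function names (add_avx512, add_scalar, add_arrays) are hypothetical, and the runtime check with a scalar fallback is an assumption added here so that the sketch also runs on machines without AVX-512. It relies on GCC or Clang extensions (the target attribute and __builtin_cpu_supports).

```cpp
#include <immintrin.h>  // __m512d and the _mm512_* intrinsics
#include <cassert>
#include <cstddef>

// AVX-512 path: c[i] = a[i] + b[i], eight doubles per iteration.
// The target attribute enables AVX-512F for this function only, so the
// file also compiles without -mavx512f. Requires n to be a multiple of 8.
__attribute__((target("avx512f")))
void add_avx512(const double* a, const double* b, double* c, std::size_t n) {
    assert(n % 8 == 0);
    for (std::size_t i = 0; i < n; i += 8) {
        __m512d va = _mm512_loadu_pd(a + i);  // unaligned load of 8 doubles
        __m512d vb = _mm512_loadu_pd(b + i);
        __m512d vc = va + vb;                 // operator form of _mm512_add_pd(va, vb)
        _mm512_storeu_pd(c + i, vc);          // unaligned store of 8 doubles
    }
}

// Plain scalar loop, used for the tail and on CPUs without AVX-512.
void add_scalar(const double* a, const double* b, double* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// Picks the AVX-512 path only when the CPU reports support for it.
// Returns true if the vector path was used.
bool add_arrays(const double* a, const double* b, double* c, std::size_t n) {
    const bool use_avx512 = __builtin_cpu_supports("avx512f") != 0;
    const std::size_t n_vec = use_avx512 ? n - n % 8 : 0;  // multiple-of-8 prefix
    if (use_avx512) add_avx512(a, b, c, n_vec);
    add_scalar(a + n_vec, b + n_vec, c + n_vec, n - n_vec);
    return use_avx512;
}
```

Calling add_arrays(a, b, c, n) gives the same result on either path; only the speed differs.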

Some functions (such as exp, log, sin, cos) are not supported natively, so I have created a header file my_mm512.h which provides many of these as well as a few additional functions for debugging purposes and the generation of Normal random variables from 32-bit or 16-bit random integers.

With this it is possible to write code which looks very much like the original scalar code, but using a vector datatype.
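To illustrate the idea, here is a small sketch in which one templated kernel is used with both double and __m512d. The kernel quadratic, its coefficients, and the helper names are hypothetical, and the sketch deliberately uses only arithmetic operators: it does not use my_mm512.h, whose interface is not reproduced here. It assumes GCC or Clang, and it should be compiled with AVX-512 enabled (for example -mavx512f) if the vector version is to use 512-bit instructions.

```cpp
#include <immintrin.h>  // defines the __m512d type
#include <cassert>
#include <cstring>      // std::memcpy

// Hypothetical kernel, written once for any arithmetic type T:
// y = c0 + x*(c1 + x*c2), evaluated by Horner's rule.
// Arguments are passed by reference so that no 512-bit value is
// passed or returned by value.
template <typename T>
void quadratic(const T& x, const T& c0, const T& c1, const T& c2, T& y) {
    y = c0 + x * (c1 + x * c2);
}

// Scalar use: one input at a time.
double quadratic_scalar(double x) {
    double y;
    quadratic<double>(x, 1.0, 2.0, 3.0, y);
    return y;
}

// Vector use: the same kernel applied to 8 inputs at once.
void quadratic_vector(const double in[8], double out[8]) {
    const __m512d c0 = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
    const __m512d c1 = {2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0};
    const __m512d c2 = {3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0};
    __m512d x, y;
    std::memcpy(&x, in, sizeof x);   // copy 8 doubles into the vector
    quadratic(x, c0, c1, c2, y);     // identical source line to the scalar case
    std::memcpy(out, &y, sizeof y);  // copy the 8 results back out
}

// Check that every vector lane agrees with the scalar evaluation.
void check_vector_against_scalar() {
    const double in[8] = {0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0};
    double out[8];
    quadratic_vector(in, out);
    for (int i = 0; i < 8; ++i) assert(out[i] == quadratic_scalar(in[i]));
}
```

The body of quadratic is the only place where arithmetic appears, so switching between scalar and vector execution is a matter of changing the type it is instantiated with.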

Note that these codes must be executed on a system such as our Maths server "mimic", which has the required AVX-512_FP16 hardware support.

References