AVX-512 vector intrinsics

Intel Xeon SP CPUs have up to 64 cores, each with 1 or 2 AVX-512 vector units. In simple cases the compiler can vectorise code automatically, producing an executable which uses vector instructions. In other cases, however, the programmer has to use vector intrinsic functions explicitly.

C++ code typically uses scalar floating point variables of type double (64-bit), float (32-bit) and, where the compiler supports it, _Float16 (16-bit). With Intel's 512-bit vectors, the corresponding types are __m512d, __m512 and __m512h, which hold 8, 16 and 32 values respectively. Intel now provides operator overloading for these types, so one can write code like c = a + b; to add vectors a and b to form c.
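As a minimal sketch of the operator style described above, the following computes y = a*x + y first in scalar form and then 16 floats at a time. The function names axpy_scalar, axpy_avx512 and axpy are hypothetical and are not taken from my_mm512.h. It assumes GCC or a Clang-based compiler (such as Intel's icx), where arithmetic operators are accepted on __m512 values; the target attribute and the run-time dispatch are there only so that the file can still be compiled and run on a machine without AVX-512.

```cpp
#include <immintrin.h>
#include <cstddef>

// Scalar reference: y[i] = a * x[i] + y[i].
void axpy_scalar(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// Vector version, 16 floats per __m512. The target attribute (a GCC/Clang
// extension) enables AVX-512F for this function only, so the rest of the
// file does not need to be compiled with -mavx512f.
__attribute__((target("avx512f")))
void axpy_avx512(float a, const float* x, float* y, std::size_t n) {
    const __m512 va = _mm512_set1_ps(a);        // broadcast a to all 16 lanes
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 vx = _mm512_loadu_ps(x + i);     // unaligned loads
        __m512 vy = _mm512_loadu_ps(y + i);
        vy = va * vx + vy;                      // same expression as the scalar code
        _mm512_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i) y[i] = a * x[i] + y[i];  // scalar remainder
}

// Hypothetical dispatcher: use the vector version only if the CPU reports
// AVX-512F support, otherwise fall back to the scalar loop.
void axpy(float a, const float* x, float* y, std::size_t n) {
    if (__builtin_cpu_supports("avx512f")) axpy_avx512(a, x, y, n);
    else axpy_scalar(a, x, y, n);
}
```

The line vy = va * vx + vy; is the only place where the vector code differs in spirit from the intrinsic style, in which the same step would be written with _mm512_fmadd_ps or with _mm512_mul_ps and _mm512_add_ps.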

Some functions (such as exp, log, sin, cos) are not supported natively, so I have created a header file my_mm512.h which provides many of these, as well as a few additional functions for debugging and for generating Normal random variables from 32-bit or 16-bit random integers.

With this header it is possible to write code which looks very much like the original scalar code, but which uses a vector datatype.

dyadic.m -- MATLAB code used to generate dyadic and super-dyadic piecewise linear approximations to the inverse Normal CDF, producing the tabulated data used in both my_mm512.h and normals_test.cpp

my_mm512.h -- header file defining various fp16 and fp32 functions
normals_test.cpp -- code to assess the accuracy in generating 16-bit Normals
speed_test.cpp -- code to assess the relative computational performance
Makefile -- Makefile to compile both test programs
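To illustrate the kind of approximation that dyadic.m tabulates, here is a scalar sketch of a dyadic piecewise linear approximation to the inverse Normal CDF. It rests on several assumptions that are mine and not taken from the files above: "dyadic" is taken to mean that (0, 1/2] is split into intervals [2^-(k+2), 2^-(k+1)], the number of intervals is set to 16, the coefficients come from simple interpolation at the interval end-points instead of the fit performed in dyadic.m, and all function names are hypothetical. The super-dyadic variant is not covered.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

constexpr int kIntervals = 16;  // assumed number of dyadic intervals on (0, 1/2]

// Normal CDF via the complementary error function.
double normal_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Reference inverse CDF by bisection; slow, used only to build the table.
double normal_inv_reference(double u) {
    double lo = -40.0, hi = 40.0;
    for (int it = 0; it < 200; ++it) {
        double mid = 0.5 * (lo + hi);
        if (normal_cdf(mid) < u) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

struct Segment { double intercept, slope; };

// Segment k covers [2^-(k+2), 2^-(k+1)] and interpolates the reference
// inverse CDF at the two end-points (an assumed, simplified fit).
std::array<Segment, kIntervals> build_table() {
    std::array<Segment, kIntervals> t{};
    for (int k = 0; k < kIntervals; ++k) {
        double ul = std::ldexp(1.0, -(k + 2));
        double ur = std::ldexp(1.0, -(k + 1));
        double fl = normal_inv_reference(ul);
        double fr = normal_inv_reference(ur);
        t[k].slope = (fr - fl) / (ur - ul);
        t[k].intercept = fl - t[k].slope * ul;
    }
    return t;
}

// Approximate inverse Normal CDF for 0 < u < 1.
double normal_inv_dyadic(double u) {
    static const std::array<Segment, kIntervals> table = build_table();
    bool upper = u > 0.5;
    double v = upper ? 1.0 - u : u;          // use symmetry about u = 1/2
    // The interval index comes from the binary exponent of v; values below
    // the smallest tabulated interval reuse the last segment.
    int k = std::min(std::max(-std::ilogb(v) - 2, 0), kIntervals - 1);
    double z = table[k].intercept + table[k].slope * v;
    return upper ? -z : z;
}
```

The interval index is obtained from the floating point exponent alone, with no search or comparison chain, so the lookup reduces to a few bit operations and a small table read.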

Note that the compiled programs must be executed on a system which has the required AVX-512_FP16 hardware support, such as our Maths server "mimic".
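One way to check for this support before running the programs is to query CPUID directly. The sketch below is not part of the files listed above, and the names Avx512Support and query_avx512 are hypothetical. It assumes GCC or Clang on x86, which provide cpuid.h, and it only reports what the CPU advertises; a thorough check would also confirm that the operating system saves the 512-bit register state.

```cpp
#include <cpuid.h>

// Feature bits reported by CPUID leaf 7, sub-leaf 0:
//   EBX bit 16 = AVX512F, EDX bit 23 = AVX512_FP16.
struct Avx512Support {
    bool f;     // AVX-512 Foundation
    bool fp16;  // AVX-512 half-precision arithmetic
};

Avx512Support query_avx512() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    Avx512Support s{false, false};
    // __get_cpuid_count returns 0 if the CPU does not implement leaf 7.
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        s.f    = ((ebx >> 16) & 1u) != 0;
        s.fp16 = ((edx >> 23) & 1u) != 0;
    }
    return s;
}
```

A program could call query_avx512() at start-up and exit with a clear message when fp16 is false, instead of failing later with an illegal instruction.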

References

Wikipedia page on AVX-512 instructions, including FP16 half-precision

Intel's AVX-512 Intrinsics Guide
Intel's AVX-512_FP16 Instruction Set
Intel's Permuting Data Within and Between AVX Registers

Oliver Sheridan-Methven's piecewise_polynomial_approximation.c with code at the bottom for piecewise linear approximation using AVX intrinsics, and his DPhil (PhD) dissertation.
Jason Tay's GitHub repository with additional software, and his MSc dissertation.