Mike Giles - Parallel Computing Using NVIDIA GPUs

Random number generation

Efficient parallel random number generation is a basic requirement of Monte Carlo simulation.

So far, I have implemented two generators:

L'Ecuyer's mrg32k3a pseudo-random generator, with uniform, exponential, Normal and gamma output distributions
Sobol's quasi-random generator (starting from a sequential implementation by Joe and Kuo), with uniform, exponential and Normal output distributions. An early version of this is the basis for the Sobol example in NVIDIA's CUDA SDK.
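
As a rough illustration of the first generator, the mrg32k3a recurrence can be sketched sequentially in C++ as below. The moduli and multipliers are those of L'Ecuyer's published generator; the struct and function names are hypothetical, the exponential transform by inversion is an assumption about one possible approach, and none of this is the GPU implementation itself. A parallel version would also need to give each thread its own part of the sequence, which is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Sequential sketch of the mrg32k3a recurrence. All names are hypothetical.
struct Mrg32k3a {
    std::int64_t s1[3];  // first component, oldest value first, each in [0, M1)
    std::int64_t s2[3];  // second component, oldest value first, each in [0, M2)
};

constexpr std::int64_t M1 = 4294967087LL;  // 2^32 - 209
constexpr std::int64_t M2 = 4294944443LL;  // 2^32 - 22853
constexpr std::int64_t A12 = 1403580LL, A13N = 810728LL;
constexpr std::int64_t A21 = 527612LL, A23N = 1370589LL;

// Fill both components with the same nonzero seed (a simplifying assumption).
inline Mrg32k3a mrg32k3a_init(std::int64_t seed) {
    assert(seed > 0 && seed < M2);
    return Mrg32k3a{{seed, seed, seed}, {seed, seed, seed}};
}

// Advance the state by one step and return a uniform value in (0, 1).
inline double mrg32k3a_uniform(Mrg32k3a &g) {
    // x_n = (A12 * x_{n-2} - A13N * x_{n-3}) mod M1
    std::int64_t p1 = (A12 * g.s1[1] - A13N * g.s1[0]) % M1;
    if (p1 < 0) p1 += M1;
    g.s1[0] = g.s1[1]; g.s1[1] = g.s1[2]; g.s1[2] = p1;

    // y_n = (A21 * y_{n-1} - A23N * y_{n-3}) mod M2
    std::int64_t p2 = (A21 * g.s2[2] - A23N * g.s2[0]) % M2;
    if (p2 < 0) p2 += M2;
    g.s2[0] = g.s2[1]; g.s2[1] = g.s2[2]; g.s2[2] = p2;

    // Combine the two components; z lies in [1, M1], so the result is in (0, 1).
    std::int64_t z = p1 - p2;
    if (z <= 0) z += M1;
    return static_cast<double>(z) * (1.0 / static_cast<double>(M1 + 1));
}

// Unit-rate exponential variate by inversion of the uniform output.
inline double mrg32k3a_exponential(Mrg32k3a &g) {
    return -std::log(mrg32k3a_uniform(g));
}
```

The products in the recurrence stay well inside the 64-bit integer range, so no special handling of overflow is needed in this form.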

I worked with NAG to incorporate these into a new set of numerical routines for GPUs, which are available free to academics who sign a collaborative agreement.

LIBOR Monte Carlo application

In my initial work with GPUs, a visiting student and I implemented a LIBOR market model. An updated version of this code uses the mrg32k3a random number generator described above, and includes a comparison between the output of the CPU and GPU code to show that the results are identical to machine precision.

libor.cu, main code
libor_kernels.cu, CUDA kernel code
libor_gold.cpp, CPU code for comparison
params.h, header file
mrg32k3a_gold.cpp, random number generator
Makefile following CUDA SDK style
more information on the LIBOR testcase

Single precision issues

Current NVIDIA GPUs have double precision support, but it is 2-4 times slower than single precision. Similarly, when using SSE vectorisation on Intel CPUs, double precision is twice as slow as single precision.

Many in the finance sector consider single precision to be inadequate, but my own view is that it is perfectly adequate for Monte Carlo applications except when computing sensitivities ("Greeks") by finite difference perturbation ("bumping").
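
One way to see why bumping is the exception is a toy central-difference calculation, sketched below in C++. Here exp(x) stands in for a payoff and the function name is hypothetical; this is an illustration of the cancellation effect under those assumptions, not a pricing code. The difference of two nearly equal values loses roughly a factor 1/h in relative accuracy, so with a bump of h = 1e-4 the single precision estimate is typically accurate to only three or four digits, while the double precision estimate is limited mainly by the truncation error of the difference formula.

```cpp
#include <cassert>
#include <cmath>

// Central-difference ("bumping") estimate of d/dx exp(x), templated on the
// floating-point type so the same code runs in single and double precision.
// The function name is hypothetical.
template <typename Real>
Real bumped_derivative(Real x, Real h) {
    assert(h > Real(0));
    // Subtracting two nearly equal values amplifies their roundoff by ~1/h.
    return (std::exp(x + h) - std::exp(x - h)) / (Real(2) * h);
}
```

Calling `bumped_derivative<float>(1.0f, 1e-4f)` and `bumped_derivative<double>(1.0, 1e-4)` and comparing both against `std::exp(1.0)` shows the difference in accuracy.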

It is also important to use either a double precision accumulator or some form of binary tree summation to minimise the accumulation of roundoff error when averaging the payoffs from a very large number of paths. The links below give a test implementation of a binary tree summation, and a reference on the error analysis of related methods:
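
Separately from the linked test implementation, the two remedies can be sketched in a few lines of C++. The function names are hypothetical and the recursive form is chosen for clarity rather than speed. A plain single precision running sum has a worst-case roundoff error that grows in proportion to the number of terms N, whereas the binary tree sum has a worst-case error that grows only like log N, because each input passes through about log2(N) additions.

```cpp
#include <cassert>
#include <cstddef>

// Plain single precision running sum: worst-case roundoff grows like N.
inline float naive_sum(const float *x, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

// Binary tree (pairwise) sum in single precision: split the range in two,
// sum each half recursively, then add the two partial sums.
inline float tree_sum(const float *x, std::size_t n) {
    assert(x != nullptr || n == 0);
    if (n == 0) return 0.0f;
    if (n == 1) return x[0];
    const std::size_t half = n / 2;
    return tree_sum(x, half) + tree_sum(x + half, n - half);
}

// Single precision inputs with a double precision accumulator.
inline double double_accumulator_sum(const float *x, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}
```

On a GPU the same tree structure arises naturally from a parallel reduction, so the extra accuracy comes at little additional cost.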

Acknowledgements

This research has been supported by

Oxford-Man Institute of Quantitative Finance
NVIDIA through a CUDA Fellowship Award
EPSRC, through funding for the Many-core and Reconfigurable Supercomputing Network
Microsoft
CRL, now part of TCS (Tata Consultancy Services)