Efficient parallel random number generation is a basic requirement of Monte Carlo simulation.
So far, I have implemented two generators:
I worked with NAG to incorporate these into a new set of numerical routines for GPUs, which are available free of charge to academics who sign a collaborative agreement.
In my initial work with GPUs, a visiting student and I implemented a LIBOR market model. An updated version of this code uses the mrg32k3a random number generator described above, and includes a comparison between the output of the CPU and GPU code to show that the results are identical to machine precision.
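For readers unfamiliar with mrg32k3a, the following is a minimal serial sketch of L'Ecuyer's combined multiple recursive generator, written in Python purely for illustration. It is not the GPU or NAG implementation: the class name `MRG32k3a`, the method `next_uniform` and the default seed are assumptions made for this example, and no skip-ahead for parallel streams is shown. Because the state update uses only integer arithmetic, the integer state in this sketch is reproduced exactly on any platform.

```python
# Serial sketch of L'Ecuyer's MRG32k3a recurrence (illustrative only;
# class and method names are hypothetical, not taken from any library).
M1 = 4294967087            # 2**32 - 209
M2 = 4294944443            # 2**32 - 22853
A12, A13N = 1403580, 810728
A21, A23N = 527612, 1370589
NORM = 1.0 / (M1 + 1)      # maps integers in 1..M1 into (0, 1)


class MRG32k3a:
    def __init__(self, seed=(12345, 12345, 12345, 12345, 12345, 12345)):
        if len(seed) != 6:
            raise ValueError("seed must contain six integers")
        s1, s2 = list(seed[:3]), list(seed[3:])
        # Each component needs a non-zero state with entries below its modulus.
        if not any(s1) or not any(s2):
            raise ValueError("each component needs a non-zero state")
        if any(not 0 <= v < M1 for v in s1) or any(not 0 <= v < M2 for v in s2):
            raise ValueError("seed values out of range")
        self.s1, self.s2 = s1, s2   # each list is ordered oldest to newest

    def next_uniform(self):
        # Component 1: x_n = (A12 * x_{n-2} - A13N * x_{n-3}) mod M1
        p1 = (A12 * self.s1[1] - A13N * self.s1[0]) % M1
        self.s1 = [self.s1[1], self.s1[2], p1]
        # Component 2: y_n = (A21 * y_{n-1} - A23N * y_{n-3}) mod M2
        p2 = (A21 * self.s2[2] - A23N * self.s2[0]) % M2
        self.s2 = [self.s2[1], self.s2[2], p2]
        # Combine the two components; a zero difference is mapped to M1.
        z = (p1 - p2) % M1
        return (z if z > 0 else M1) * NORM


gen = MRG32k3a()
sample = [gen.next_uniform() for _ in range(5)]
```

Python integers are arbitrary precision, so the products above cannot overflow; a C or CUDA version has to take care that the intermediate products fit in the integer or floating-point type used.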
Current NVIDIA GPUs have double precision support, but it is 2-4 times slower than single precision. Similarly, when using SSE vectorisation on Intel CPUs, double precision is 2 times slower than single precision.
Many in the finance sector consider single precision to be inadequate, but my own view is that it is perfectly adequate for Monte Carlo applications except when computing sensitivities ("Greeks") by finite difference perturbation ("bumping").
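The difficulty with bumping comes from cancellation: the two bumped values agree in most of their leading digits, so their rounding errors are amplified by a factor of roughly 1/h when the difference is divided by the bump size h. The sketch below illustrates this on a toy function (exp, standing in for a smooth price function, which is an assumption made for this example). Single precision is emulated by rounding the function values through the `struct` module; the helper names `f32` and `bumped_derivative` are hypothetical.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def bumped_derivative(f, x, h, round_value=lambda v: v):
    """Central finite difference (f(x+h) - f(x-h)) / (2h).

    round_value is applied to each function value, so passing f32
    emulates a price function that returns single precision results.
    """
    up = round_value(f(x + h))
    down = round_value(f(x - h))
    return (up - down) / (2.0 * h)


x, h = 1.0, 1.0e-4
exact = math.exp(x)   # the derivative of exp is exp itself

err_double = abs(bumped_derivative(math.exp, x, h) - exact)
err_single = abs(bumped_derivative(math.exp, x, h, f32) - exact)

print(f"double precision values: error = {err_double:.2e}")
print(f"single precision values: error = {err_single:.2e}")
```

In this toy setting the error with single precision values is several orders of magnitude larger than with double precision values, and reducing h makes it worse rather than better.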
It is also important to use either a double precision accumulator or some form of binary tree summation to minimise the accumulation of roundoff error when averaging the payoffs from a very large number of paths. The links below give a test implementation of a binary tree summation, and a reference on the error analysis of related methods:
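To illustrate the two remedies, here is a small sketch, separate from the test implementation mentioned above, that compares a naive single precision running sum, a binary tree (pairwise) sum in single precision, and a double precision accumulator. Single precision is emulated by rounding after every addition via the `struct` module; the function names and the test data (65536 copies of 0.1 rounded to single precision) are assumptions chosen for this example.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Running sum with the accumulator rounded to single precision each step."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total


def tree_sum_f32(values, lo=0, hi=None):
    """Binary tree summation: sum each half recursively, then add the halves."""
    if hi is None:
        hi = len(values)
    n = hi - lo
    if n == 0:
        return 0.0
    if n == 1:
        return values[lo]
    mid = lo + n // 2
    return f32(tree_sum_f32(values, lo, mid) + tree_sum_f32(values, mid, hi))


def double_accumulator_sum(values):
    """Single precision inputs added into a double precision accumulator."""
    total = 0.0
    for v in values:
        total += v
    return total


n = 2 ** 16
x = f32(0.1)
payoffs = [x] * n
exact = n * x   # exact in double, because n is a power of two

err_naive = abs(naive_sum_f32(payoffs) - exact)
err_tree = abs(tree_sum_f32(payoffs) - exact)
err_double = abs(double_accumulator_sum(payoffs) - exact)
```

With identical inputs and a power-of-two count, every addition in the tree combines two equal partial sums and is therefore exact, so this case flatters the tree; for general data the tree sum still has rounding error, but the error grows much more slowly with the number of terms than for the naive running sum.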