In the summer of 2007, a visiting student, Su Xiaoke, worked with me to investigate the performance of NVIDIA GPUs on a LIBOR market model Monte Carlo application. Using an NVIDIA 8800 GTX graphics card with 128 cores, we achieved a speedup of over 100 relative to a single Xeon core.

- libor_as.c, original C code
- libor.cu, CUDA code
- Makefile
- report
- more information on the LIBOR testcase

As a first experiment with a finite difference application, in early 2008 I wrote a 3D Laplace solver using simple Jacobi iteration. Using a low-cost 8800GT card, Gerd Heber and I achieved a speedup of a factor of 50 relative to a single thread on a Xeon, and a factor of 10 relative to 8 threads running on two quad-core Xeons.
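For readers unfamiliar with the method, a single Jacobi sweep for the 3D Laplace equation can be sketched in plain serial C as below. This is only an illustration, not the code used for the timings above: the names `jacobi_sweep` and `IDX`, the flattened array layout and the fixed-boundary treatment are all assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical helper: index into a flattened nx x ny x nz array,
   with i varying fastest. */
#define IDX(i, j, k, nx, ny) ((i) + (nx) * ((j) + (ny) * (k)))

/* One Jacobi sweep (illustrative sketch). Each interior point of u_new
   becomes the average of its six neighbours in u_old; boundary points
   are copied unchanged, i.e. fixed (Dirichlet) boundary values. */
void jacobi_sweep(int nx, int ny, int nz, const float *u_old, float *u_new)
{
    /* Jacobi reads only old values, so the two arrays must be distinct. */
    assert(u_old != u_new);

    for (int k = 0; k < nz; k++) {
        for (int j = 0; j < ny; j++) {
            for (int i = 0; i < nx; i++) {
                int id = IDX(i, j, k, nx, ny);
                if (i == 0 || i == nx - 1 ||
                    j == 0 || j == ny - 1 ||
                    k == 0 || k == nz - 1) {
                    u_new[id] = u_old[id];
                } else {
                    u_new[id] = (u_old[id - 1]       + u_old[id + 1] +
                                 u_old[id - nx]      + u_old[id + nx] +
                                 u_old[id - nx * ny] + u_old[id + nx * ny])
                                / 6.0f;
                }
            }
        }
    }
}
```

Every grid point is updated independently from the previous iterate, which is why the method maps naturally onto one GPU thread per grid point. A driver would call `jacobi_sweep` repeatedly, swapping the `u_old` and `u_new` pointers after each sweep.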

A newer code that uses texture mapping achieves slightly poorer performance, but is much simpler. This might be an excellent approach for applications in which there is more computation per grid point, so that the performance penalty is minimal.

- laplace3d_new.cu, main code
- laplace3d_new_kernels.cu, CUDA kernel code

Following on from this, I developed a generic 3D ADI solver for three-factor finance applications. As well as demonstrating the parallel solution of the sets of tridiagonal equations which arise from the ADI time-marching, this work also demonstrates my interest in developing high-level packages which can be used without a detailed understanding of CUDA programming. The user supplies a C routine specifying the drift, volatility, correlation and source functions which define the 3D PDE, and the package then carries out the parallel execution.
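To show the kind of tridiagonal system each ADI sweep produces, here is a minimal serial sketch of the standard Thomas algorithm in C. It is an illustration only: the text above does not say which tridiagonal algorithm the package uses, and the name `thomas_solve` and its argument layout are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Solve the tridiagonal system (illustrative sketch)
       a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],   i = 0..n-1,
   where a[0] and c[n-1] do not affect the result and may be set to 0.
   No pivoting is done, so the matrix is assumed to be diagonally
   dominant. Returns 0 on success, -1 if memory allocation fails. */
int thomas_solve(int n, const double *a, const double *b,
                 const double *c, const double *d, double *x)
{
    assert(n >= 1);

    double *cp = malloc((size_t)n * sizeof *cp);  /* modified upper diagonal */
    double *dp = malloc((size_t)n * sizeof *dp);  /* modified right-hand side */
    if (cp == NULL || dp == NULL) {
        free(cp);
        free(dp);
        return -1;
    }

    /* Forward elimination. */
    cp[0] = c[0] / b[0];
    dp[0] = d[0] / b[0];
    for (int i = 1; i < n; i++) {
        double m = b[i] - a[i] * cp[i - 1];
        cp[i] = c[i] / m;
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
    }

    /* Back substitution. */
    x[n - 1] = dp[n - 1];
    for (int i = n - 2; i >= 0; i--)
        x[i] = dp[i] - cp[i] * x[i + 1];

    free(cp);
    free(dp);
    return 0;
}
```

In an ADI step, each grid line along the implicit direction gives one such system, and the systems for different lines are independent of one another. That independence is what allows the whole set to be solved in parallel.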

With help from visiting students Abinash Pati and Vignesh Sunderam in the summer of 2008, I achieved a speedup of a factor of 30 on an 8800GT relative to a single thread on a Xeon.

- Oxford-Man Institute of Quantitative Finance
- NVIDIA through a CUDA Fellowship Award
- EPSRC, through funding for the Many-core and Reconfigurable Supercomputing Network
- Microsoft
- CRL, now part of TCS (Tata Consultancy Services)