Parallel computing is an important component of large-scale CFD computations. Twenty years ago, the advent of massively parallel machines such as the Connection Machine suggested that entirely new algorithms would be required to obtain good parallel speedup. Instead, the parallel computers used for most CFD computations have remained modest in size, with 4-16 processors being typical and 64 processors being the largest number commonly used in industry. In particular, distributed-memory PC clusters have come into widespread use in industry as well as in academia over the past 10 years, and they offer the best price/performance ratio. Accordingly, the most efficient CFD algorithms for sequential computing are usually also the best choice for parallel computing. There are exceptions, but in general I think the statement is accurate. It is also usually clear where the inherent parallelism lies within an algorithm, and how the data should be partitioned for distributed-memory parallel execution. In practice, the headache lies in the nuts and bolts of writing the parallel code. Therefore, I believe the main research focus in parallel computing needs to be on ways of easing the burden on application developers.
In the long term, the answer may be parallelising compilers, perhaps with the assistance of user-inserted compiler directives. High Performance Fortran (HPF) was an interesting initiative in this direction, but it was relatively unsuccessful. Writing a more general parallelising compiler that recognises data structures and chooses an appropriate data partitioning for distributed-memory execution is an even more difficult task.
In the shorter term, I think the answer is parallel libraries, created to simplify the task of parallelising certain specific classes of algorithms. This is the approach Paul Crumpton, David Burgess and I took in 1993 in developing OPlus, the Oxford Parallel Library for Unstructured grid Solvers. As its name suggests, it addresses the needs of algorithms on unstructured grids. It is based on the idea of a number of sets (such as nodes, edges, cells) connected by pointers (such as cell-node pointers). The key restriction which enables parallelisation is that operations are performed over all members of a set, and the result is independent of the order in which the members are processed. This restriction is satisfied by all explicit algorithms, and by algorithms such as multigrid that use explicit smoothers.
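The idea of sets connected by pointers, with order-independent operations over a set, can be illustrated with a minimal sketch. This is not the OPlus interface: the tiny mesh, the names `edge_to_node`, `node_value` and `edge_loop`, and the "flux" formula are all hypothetical, chosen only to show why the processing order of the edges does not matter.

```python
import random

# Hypothetical unstructured mesh: a set of 4 nodes and a set of 5 edges.
# The edge-node "pointers" give the two end nodes of each edge.
edge_to_node = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
node_value = [1.0, 2.0, 4.0, 8.0]  # illustrative data stored on the nodes


def edge_loop(edge_order):
    """Visit every edge once and accumulate its contribution at both end nodes."""
    residual = [0.0] * len(node_value)
    for e in edge_order:
        a, b = edge_to_node[e]
        # Illustrative edge "flux": it reads node data but never the residual,
        # so each edge's contribution does not depend on the other edges.
        flux = node_value[b] - node_value[a]
        residual[a] += flux
        residual[b] -= flux
    return residual


# Process the edges in their natural order, then in a shuffled order.
natural = edge_loop(range(len(edge_to_node)))
shuffled_order = list(range(len(edge_to_node)))
random.Random(0).shuffle(shuffled_order)
shuffled = edge_loop(shuffled_order)

print(natural)
print(shuffled)
```

Both loops give the same residual, because each edge only adds an increment that does not depend on what other edges have done. The values here are small integers, so the two results agree exactly; with general floating-point data they would agree only up to rounding. It is this property that lets the edge set be split across processors, with each processor handling its own edges in whatever order is convenient.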
The OPlus library is the parallel framework on which the HYDRA CFD code has been developed. More recently, as part of the HYDRA development, which now encompasses a number of university groups within the UK, Nick Hills at the University of Sussex has ported OPlus from PVM to MPI.
The OPlus research was jointly funded by Rolls-Royce plc, EPSRC and DTI.