Dynamic execution of task DAGs

Numerical algorithms are sometimes defined as a task DAG (Directed Acyclic Graph), that is, as a set of tasks with specified interdependencies. This project supports the dynamic execution of such algorithms on a GPU.

The emphasis in the code is on ease of use for the end user, so all of the DAG information is held within a single CUDA array allocated by the host function. The trade-off is that the implementation is harder to understand, because different parts of the array correspond to different aspects of the DAG.
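
To make the idea of "different parts of one array" concrete, here is a minimal host-side sketch of how a DAG could be packed into a single flat integer buffer. The layout, the names pack_dag and DagView, and the choice of a CSR-style successor list are all assumptions made for illustration; they are not taken from this project's actual array format.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical layout (an assumption, not the project's real format):
//   [0]                   number of tasks N
//   [1]                   number of edges E
//   [2, 2+N]              CSR offsets into the successor list (N+1 entries)
//   [3+N, 3+N+E)          successor task indices
//   [3+N+E, 3+2N+E)       count of unmet dependencies per task
struct DagView {
    const int* data;
    int num_tasks() const { return data[0]; }
    int num_edges() const { return data[1]; }
    const int* offsets() const { return data + 2; }
    const int* successors() const { return data + 3 + num_tasks(); }
    const int* pending() const { return data + 3 + num_tasks() + num_edges(); }
};

// Pack a list of edges (from -> to) over n tasks into one flat buffer.
std::vector<int> pack_dag(int n, const std::vector<std::pair<int, int> >& edges) {
    const int e = static_cast<int>(edges.size());
    std::vector<int> buf(static_cast<std::size_t>(3 + 2 * n + e), 0);
    buf[0] = n;
    buf[1] = e;
    int* offsets = buf.data() + 2;
    int* succ = buf.data() + 3 + n;
    int* pending = buf.data() + 3 + n + e;

    // Count out-degrees (shifted by one slot) and in-degrees.
    for (const auto& edge : edges) {
        assert(edge.first >= 0 && edge.first < n);
        assert(edge.second >= 0 && edge.second < n);
        ++offsets[edge.first + 1];
        ++pending[edge.second];
    }
    // Prefix sum turns the counts into CSR offsets.
    for (int i = 0; i < n; ++i) {
        offsets[i + 1] += offsets[i];
    }
    // Scatter each edge's target into its source's successor range.
    std::vector<int> cursor(offsets, offsets + n);
    for (const auto& edge : edges) {
        succ[cursor[edge.first]++] = edge.second;
    }
    return buf;
}
```

For a diamond-shaped DAG with edges 0->1, 0->2, 1->3 and 2->3, pack_dag(4, ...) yields a 15-element buffer in which task 0 has no unmet dependencies and task 3 has two. In a GPU setting, a buffer like this would typically be copied to device memory in one transfer (for example with cudaMalloc and cudaMemcpy), which is the convenience a single-array design buys; the cost is that every accessor has to compute its own offset into the buffer, as DagView does here.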