Task-based parallel programming is a common approach to using modern multicore architectures efficiently. Here, a programmer describes the computation as a set of possibly nested tasks and their dependencies. The dependencies can be dynamic, meaning that they can only be discovered at runtime. Dynamic dependencies can be expressed with the future construct, which comes in several variants. The C++ standard, for instance, defines (shared) futures that may be stored in data structures, accessed by multiple tasks, and filled through an associated promise that can be transferred between tasks. These futures cannot be instantiated with incomplete types, however. Recent algorithmic research has suggested that both the features of C++ futures and support for incomplete types are necessary to enable nested futures for the synchronization of nested tasks. This paper describes the first implementation of such futures, called flex-futures, in the Taskflow programming system. It describes the corresponding extensions of the Taskflow programming model, user interface, and runtime system. The extended system is evaluated with a benchmark that mimics the LU decomposition of hierarchical matrices. We found that flex-futures incur a higher overhead than static dependencies, but still achieve comparable performance while offering greater flexibility.
Nather et al. studied this question.
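To make the properties of standard C++ futures mentioned above concrete, the following minimal sketch uses only `std::promise` and `std::shared_future` from the standard library. It does not show flex-futures or any Taskflow interface. The names `produce`, `consume`, and `run_demo` are hypothetical helpers introduced for illustration, and plain threads stand in for tasks.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical producer task: fills the promise that was transferred to it.
void produce(std::promise<int> p, int value) {
    p.set_value(value);
}

// Hypothetical consumer task: blocks until the shared future is filled.
int consume(std::shared_future<int> f, int offset) {
    return f.get() + offset;
}

int run_demo() {
    std::promise<int> promise;
    std::shared_future<int> result = promise.get_future().share();

    // Shared futures are copyable, so they can be stored in data structures.
    std::vector<std::shared_future<int>> stored(3, result);

    // Several consumers access the same shared state concurrently.
    std::vector<std::future<int>> consumers;
    for (std::size_t i = 0; i < stored.size(); ++i) {
        consumers.push_back(std::async(std::launch::async, consume,
                                       stored[i], static_cast<int>(i)));
    }

    // The promise is move-only; ownership is transferred to the producer.
    std::thread producer(produce, std::move(promise), 40);

    int sum = 0;
    for (auto& c : consumers) {
        sum += c.get();  // each consumer yields 40 + its offset
    }
    producer.join();
    return sum;
}
```

Calling `run_demo()` from a `main` function (and linking with thread support, for example `-pthread` on GCC or Clang) exercises all three properties: storage in a container, access by multiple consumers, and transfer of the promise. The element type here is the complete type `int`; the standard does not guarantee that these templates can be instantiated with an incomplete type, which is the limitation that nested futures run into.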