Shared memory
Distributed memory
Data parallel
Task-based
Regular
Irregular
To study complex parallel programming patterns, abstract their components, and develop solutions that ease the implementation of programs that use the patterns.
Object oriented
Template metaprogramming
Interoperable with C
Variadic templates
Lambda functions
Divide-and-conquer
Amorphous data-parallelism
Dependent tasks
Conclusions and future work
subdivision
base case
reduction
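The three components above can be sketched as a minimal sequential divide-and-conquer driver. This is an illustrative sketch, not the thesis library: the `dac` helper and the component names are hypothetical, but they map one-to-one onto subdivision, base case, and reduction.

```cpp
#include <vector>

// Hypothetical generic divide-and-conquer driver: the user supplies the
// three components named above (illustrative sketch, not the thesis API).
template <typename T, typename Result>
Result dac(const T& problem,
           bool (*is_base)(const T&),               // base case test
           Result (*base)(const T&),                // base case solution
           std::vector<T> (*subdivide)(const T&),   // subdivision
           Result (*reduce)(Result, Result)) {      // reduction
  if (is_base(problem)) return base(problem);
  Result acc{};
  bool first = true;
  for (const T& sub : subdivide(problem)) {
    Result r = dac(sub, is_base, base, subdivide, reduce);
    acc = first ? r : reduce(acc, r);
    first = false;
  }
  return acc;
}

// Example: Fibonacci expressed with the three components.
bool fib_is_base(const int& n) { return n < 2; }
int  fib_base(const int& n) { return n; }
std::vector<int> fib_subdivide(const int& n) { return {n - 1, n - 2}; }
int  fib_reduce(int a, int b) { return a + b; }

int fib(int n) {
  return dac<int, int>(n, fib_is_base, fib_base, fib_subdivide, fib_reduce);
}
```

A parallel skeleton would run the recursive calls as tasks instead of a loop, keeping the same three user-provided components.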
High-level programming models for parallel and distributed computing. Algorithmic skeletons exploit common programming patterns to hide the complexity of parallel and distributed applications. Starting from a basic set of patterns (skeletons), more complex patterns can be built by combining the basic ones.
Communication
Synchronization
Parallel execution flow
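A toy map skeleton shows what gets hidden: the caller supplies only the per-element function, while thread creation, work partitioning, and the joining synchronization live inside the skeleton. The names here are illustrative, not from the thesis libraries.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Illustrative map skeleton: the user provides f; the skeleton hides the
// parallel execution flow (thread creation, chunking) and the
// synchronization (joining the workers).
template <typename T, typename F>
void parallel_map(std::vector<T>& data, F f, unsigned nthreads = 4) {
  std::vector<std::thread> workers;
  std::size_t chunk = (data.size() + nthreads - 1) / nthreads;
  for (unsigned t = 0; t < nthreads; ++t) {
    std::size_t begin = t * chunk;
    std::size_t end = std::min(data.size(), begin + chunk);
    if (begin >= end) break;
    workers.emplace_back([&data, f, begin, end] {
      for (std::size_t i = begin; i < end; ++i) data[i] = f(data[i]);
    });
  }
  for (auto& w : workers) w.join();  // synchronization hidden from the user
}
```

Composing such skeletons (e.g. a map feeding a reduction) yields the more complex patterns mentioned above.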
Name | Description | Arity | Assoc |
---|---|---|---|
fib | recursive computation of 43rd Fibonacci number | 2 | Yes |
merge | merge two sorted sequences of 100 million integers each | 2 | - |
qsort | quicksort of 10 million integers | 2 | - |
nqueens | N Queens solution count in 14 × 14 board | var | Yes |
treeadd | add values in binary tree with 24 levels | 2 | Yes |
bisort | sort balanced binary tree with 22 levels | 2 | No |
health | 2000 simulation steps in 4-ary tree with 6 levels | 4 | No |
tsp | traveling salesman problem on binary tree with 23 levels | 2 | No |
Given a set of active nodes and an ordering on active nodes, amorphous data-parallelism is the parallelism that arises from simultaneously processing active nodes, subject to neighborhood and ordering constraints.
The properties of a workitem are defined on a domain, which can be used to classify each workitem.
Element classification
Work distribution
Conflict detection
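A sequential sketch of worklist-driven processing of active nodes, the execution model behind amorphous data-parallelism. The operator here is graph labelling (one of the benchmarks below): processing an active node activates its unlabelled neighbours. In a parallel version, conflict detection would keep two threads from touching overlapping neighborhoods simultaneously; this sketch only shows the worklist structure.

```cpp
#include <queue>
#include <vector>

// Worklist-based graph labelling (connected components). Each popped node is
// an active node; its adjacency list is its neighborhood; newly labelled
// neighbours become new active nodes.
std::vector<int> label_components(const std::vector<std::vector<int>>& adj) {
  std::vector<int> label(adj.size(), -1);
  int next = 0;
  for (std::size_t s = 0; s < adj.size(); ++s) {
    if (label[s] != -1) continue;          // already in some cluster
    std::queue<int> worklist;              // the active nodes
    worklist.push(static_cast<int>(s));
    label[s] = next;
    while (!worklist.empty()) {
      int n = worklist.front(); worklist.pop();
      for (int m : adj[n])                 // neighborhood of the active node
        if (label[m] == -1) { label[m] = next; worklist.push(m); }
    }
    ++next;                                // next cluster id
  }
  return label;
}
```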
Name | Description |
---|---|
Boruvka | Minimum spanning tree through successive edge-contraction |
Delaunay Mesh Refinement | Triangulation of a set of points that fulfills quality constraints |
Spanning Tree | Simple spanning tree of an unweighted graph |
Graph Labelling | Identify separated clusters in a graph |
Redirect
Over-decomposition
Worklist types
Domains
Clustered Domain
Clustered Tree Domain
DomainND
Each thread processes its assigned subdomain from the start
Threads with active workitems start processing. New workitems are sent to idle threads
Create more subdomains than threads to make better use of the scheduler. The Threading Building Blocks scheduler implements work stealing, so it can exploit the excess parallelism.
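The effect can be sketched with plain `std::async` instead of TBB: the input is split into many more chunks than hardware threads, so the runtime can rebalance when chunks take unequal time. Function and parameter names are illustrative.

```cpp
#include <algorithm>
#include <future>
#include <numeric>
#include <vector>

// Over-decomposition sketch: far more chunks than threads, so a scheduler
// with work stealing (or a thread pool) can keep all cores busy even when
// chunk costs are uneven.
long long sum_overdecomposed(const std::vector<int>& v, int chunks = 64) {
  std::vector<std::future<long long>> parts;
  std::size_t chunk = (v.size() + chunks - 1) / chunks;
  for (std::size_t begin = 0; begin < v.size(); begin += chunk) {
    std::size_t end = std::min(v.size(), begin + chunk);
    parts.push_back(std::async(std::launch::async, [&v, begin, end] {
      return std::accumulate(v.begin() + begin, v.begin() + end, 0LL);
    }));
  }
  long long total = 0;
  for (auto& p : parts) total += p.get();  // combine the partial results
  return total;
}
```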
Analysis of the signature of the function
Detection of risks
Dependencies based on the arguments
T is requested when spawn is called.
T is an ancestor of U if T requests U or an ancestor of U.
T is a descendant of U if U is an ancestor of T.
T logically precedes U if it is invoked before U in the sequential execution.
T dynamically precedes U if T is requested before U.
T has a dependency with U if they both access a memory position that one or both of them modify.
U dynamically precedes T, U is not an ancestor of T, and T has a dependency with U or
U is a descendant of a task that fulfills condition 1.
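Condition 1 above can be encoded directly as a predicate over per-task metadata. This is a toy encoding for illustration, not DepSpawn code: each task records its dynamic (spawn) order, its ancestor set, and the memory positions it reads and writes.

```cpp
#include <set>

// Toy task record for checking the waiting rule (illustrative names).
struct Task {
  int order;                   // dynamic (spawn) order
  std::set<int> ancestors;     // ids of ancestor tasks
  std::set<long> reads;        // addresses read
  std::set<long> writes;       // addresses written
  int id;
};

// T has a dependency with U if they both access a memory position that at
// least one of them modifies.
bool depends(const Task& t, const Task& u) {
  for (long a : t.writes)
    if (u.reads.count(a) || u.writes.count(a)) return true;
  for (long a : t.reads)
    if (u.writes.count(a)) return true;
  return false;
}

// Condition 1: U dynamically precedes T, U is not an ancestor of T,
// and T has a dependency with U.
bool must_wait(const Task& t, const Task& u) {
  return u.order < t.order && !t.ancestors.count(u.id) && depends(t, u);
}
```

Condition 2 then extends the rule transitively to the descendants of any task U satisfying condition 1.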
When statically declared, arrays are similar to structures, as their size is known at compile time.
Dynamic arrays lack size information.
Algorithms may operate on regions that do not overlap.
C++ library for numeric computing
Provides classes for Arrays
Allows declaring sub-Arrays that reference other Arrays
A Range class stores the positions of these sub-Arrays
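A minimal 1-D sketch of the idea: a Range stores the positions a sub-Array covers, so the runtime can test whether two sub-Arrays reference overlapping regions of the same Array. The names are illustrative, not the actual library classes.

```cpp
// Positions covered by a sub-Array inside its parent Array (1-D sketch).
struct Range {
  int lo, hi;  // inclusive bounds
};

// Two tasks operating on non-overlapping ranges of the same Array carry no
// dependency on it and may run in parallel.
bool overlaps(const Range& a, const Range& b) {
  return a.lo <= b.hi && b.lo <= a.hi;
}
```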
Barrier for all threads.
Wait while the given variables are in use.
Wakes up the tasks waiting for the given variables before the current task ends.
Name | Description |
---|---|
Barnes-Hut N-Body simulation | Simulates a dynamical system of bodies under the influence of physical forces |
LU decomposition | Factorizes a matrix as a lower triangular matrix and an upper triangular matrix |
Cholesky decomposition | Factorizes a matrix as a lower triangular matrix and its conjugate transpose |
Sylvester equation resolution | Solves the matrix equation AX + XB = C for X |
Analysis of three kinds of problems with complex patterns of parallelism.
Divide-and-conquer.
Amorphous data parallelism.
Dependent tasks.
Design and implementation of libraries that abstract their characteristics, including one of the first parallel skeletons for amorphous data parallelism.
Evaluation of those libraries in terms of programmability and performance.
Expansion of the set of libraries.
Support for distributed memory systems.
Increase of the configuration capabilities.
Support for futures in DepSpawn.
Carlos H. González, Basilio B. Fraguela, Enhancing and Evaluating the Configuration Capability of a Skeleton for Irregular Computations, 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing, pp. 119–127, 2015
Carlos H. González, Basilio B. Fraguela, An Algorithm Template for Domain-Based Parallel Irregular Algorithms, International Journal of Parallel Programming, 42 (6), pp. 948–967, 2014
Carlos H. González, Basilio B. Fraguela, A framework for argument-based task synchronization with automatic detection of dependencies, Parallel Computing, 39 (9), pp. 445–489, 2013
Carlos H. González, Basilio B. Fraguela, An Algorithm Template for Domain-Based Parallel Irregular Algorithms, Proceedings of the International Symposium on High-level Parallel Programming and Applications (HLPP2013), 2013
Carlos H. González, Basilio B. Fraguela, A framework for argument-based task synchronization, Proceedings of the 16th Workshop on Compilers for Parallel Computing (CPC’12), 2012
Carlos H. González, Basilio B. Fraguela, An algorithm template for parallel irregular algorithms, Proceedings of the ACACES 2011, pp. 163–166., 2011
Carlos H. González, Basilio B. Fraguela, A generic algorithm template for divide-and-conquer in multicore systems, Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications 2010 (HPCC’10), pp. 79–88, 2010
Carlos H. González, Basilio B. Fraguela, Una plantilla genérica para el patrón de paralelismo divide-y-vencerás en sistemas multinúcleo, Actas das XXI Jornadas de Paralelismo (JP’10), pp. 19–26, 2010