Next: Mixed Shared/Distributed Memory Parallelization Up: CPMD on Parallel Computers Previous: Distributed Memory Parallelization using   Contents   Index

Shared Memory Parallelization

This strategy uses OpenMP directives embedded in the code as comments, so it needs a compiler that recognizes them when the corresponding flag(s) are set. With OpenMP, the main parallelization strategy is to distribute loops across threads, i.e. different threads handle different values of a loop index. This is safe as long as no two threads write to the same memory location at the same time. To learn more about OpenMP, see e.g. http://www.openmp.org/.
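As an illustration of loop distribution, here is a minimal sketch in C. It is not taken from CPMD: the function names axpy and dot are hypothetical, and C expresses the directives as #pragma lines rather than as comments. The first loop is safe to distribute because every iteration writes a different element of y. The second loop accumulates into a single variable, which all threads would otherwise write to at the same time, so it uses a reduction clause to give each thread a private partial sum.

```c
#include <assert.h>

/* y[i] = a * x[i] + y[i].
 * Each iteration writes a distinct element y[i], so the iterations
 * can be handed to different threads without any conflict. */
void axpy(long n, double a, const double *x, double *y)
{
    long i;
    assert(n >= 0);
#pragma omp parallel for
    for (i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Dot product of x and y.
 * Every iteration updates the same variable, sum. Without the
 * reduction clause two threads could write to it simultaneously;
 * with it, each thread keeps a private partial sum and the partial
 * sums are combined when the loop ends. */
double dot(long n, const double *x, const double *y)
{
    double sum = 0.0;
    long i;
    assert(n >= 0);
#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

A compiler that is not given its OpenMP flag ignores the pragmas and produces an ordinary serial loop with the same result, which mirrors the behaviour of comment-style directives described above.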

The code is compiled without the -DPARALLEL preprocessor flag, and both compilation and linking need the corresponding OpenMP flags (which depend on the compiler). Since a significant part of the CPU time of a CPMD run is spent performing FFTs and BLAS/LAPACK calls, it is imperative that both libraries are parallelized with OpenMP as well to achieve maximum performance. OpenMP can incur a significant overhead from spawning and collecting threads, and not all time-consuming parts of CPMD are suitable for OpenMP parallelization. As a consequence, the MPI parallelization scheme is in general more efficient and scales much better. Depending on the overhead of the OpenMP implementation on a given system, good speedups can be achieved for small numbers of OpenMP threads (typically 4 to 8), at around 60-80% of the efficiency of the MPI parallelization. The advantages of this version of the code are that it needs little additional memory and that it can be used in non-dedicated CPU environments.
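Whether the OpenMP flags actually took effect can be checked with a small test program before building a large code. The sketch below is a generic C example, not part of CPMD, and the function names are hypothetical. It relies on the _OPENMP macro, which the OpenMP specification requires a compiler to define only when OpenMP support is enabled, and on the standard runtime call omp_get_max_threads().

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* only available/needed when OpenMP is enabled */
#endif

/* Returns 1 if this file was compiled with OpenMP enabled, else 0. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Upper bound on the number of threads a parallel region would use;
 * 1 when the code was built without OpenMP. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Prints a one-line summary of the OpenMP build status. */
void report_openmp(void)
{
    printf("OpenMP enabled: %s, max threads: %d\n",
           openmp_enabled() ? "yes" : "no", max_threads());
}
```

With GCC, for example, the flag is -fopenmp for both compiling and linking; other compilers use different flags. The number of threads is usually set at run time through the standard OMP_NUM_THREADS environment variable.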


Costas Bekas 2008-09-04