The code is compiled without the -DPARALLEL preprocessor flag, and both compilation and linking need the corresponding OpenMP flags (which depend on the compiler). Since a significant part of the CPU time of a CPMD run is spent in FFTs and BLAS/LAPACK calls, it is imperative that both of these libraries are parallelized with OpenMP as well in order to achieve maximum performance. OpenMP can incur a significant overhead from spawning and collecting threads, and not all time-consuming parts of CPMD are suitable for OpenMP parallelization. As a consequence, the MPI parallelization scheme is in general more efficient and scales much better. Depending on the overhead of the OpenMP implementation on the system, good speedups can be achieved for small numbers of OpenMP threads (typically 4 to 8), at around 60-80% of the efficiency of the MPI parallelization. The advantages of this version of the code are that it needs only a small amount of additional memory and that it can be used in non-dedicated CPU environments.
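
The remark that not all time-consuming parts of CPMD are suitable for OpenMP parallelization can be made concrete with Amdahl's law. The following is only a rough sketch and not a measurement of CPMD: the threaded fraction of 0.95 is an assumed, purely illustrative value, the function names are hypothetical, and the model ignores the overhead of spawning and collecting threads, so real efficiencies will be lower.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Ideal speedup when only `parallel_fraction` of the runtime is threaded.

    Amdahl's law: the remaining serial part is not accelerated at all.
    Thread start-up and synchronization overhead are NOT modelled here.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def parallel_efficiency(parallel_fraction, threads):
    """Speedup divided by the number of threads (1.0 would be perfect scaling)."""
    return amdahl_speedup(parallel_fraction, threads) / threads


if __name__ == "__main__":
    # Assumed, illustrative value: 95% of the runtime is inside threaded regions.
    threaded_fraction = 0.95
    for n_threads in (1, 2, 4, 8, 16):
        s = amdahl_speedup(threaded_fraction, n_threads)
        e = parallel_efficiency(threaded_fraction, n_threads)
        print(f"{n_threads:2d} threads: speedup {s:5.2f}, efficiency {e:6.1%}")
```

Even in this idealized model the efficiency falls steadily as threads are added, which is consistent with restricting OpenMP runs to a small number of threads.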