Next: Preprocessor Flags
Up: Compiling CPMD
Previous: Compiling CPMD
Contents
Index
There are four main areas where one can improve CPMD performance:
- by using an optimized BLAS/LAPACK library. In particular, optimized versions
of the DGEMM and DGEMV subroutines can improve CPMD performance significantly.
When alternatives are available, it is worth doing some test runs: the library
that is faster in theory is not always the fastest in practice.
- by using an optimized FFT library. A number of vendor-optimized and
open-source FFT libraries are available, several of which are supported by the
current CPMD code, but the integrated Fortran FFT code (-DFFT_DEFAULT) can also
be quite competitive. The parameter NCACHE in the file mltfft.F can be tuned
for your platform.
- by using a suitable set of compiler flags. Although the Configure script
provides a set of optimizing compiler flags, you should check your compiler
manual/manpage to see whether they are appropriate for your platform. Most
importantly, set the appropriate flags for your CPU. As with many
floating-point intensive codes, turning on features like loop unrolling may
also improve CPMD performance. On the other hand, very high optimization levels
seem to slow down execution and carry a higher risk of miscompilation (the
compiler changes the semantics of the code so that the computations give wrong
results). On modern CPUs, memory bandwidth and good CPU-cache utilization can be
very important, and lower optimization levels usually generate more compact and
cache-friendly code.
- by using alternate algorithms at runtime. Keywords like REAL SPACE WFN KEEP,
MEMORY, TASKGROUPS, or DISTRIBUTED LINALG let you trade off performance against
memory usage, or select alternate algorithms that parallelize differently and
have more or less overhead.
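
Because the advice above comes down to measuring rather than guessing, a small timing harness can help compare builds (different BLAS/FFT libraries, compiler flags, or runtime keywords). The following is only a minimal sketch and not part of CPMD: the function names `time_command` and `rank_builds` and the labels `build-a`/`build-b` are made up for illustration, and the commands are stand-ins that would need to be replaced with your own CPMD invocations.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times; return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is discarded here; redirect it to a file if you need to
        # check that the different builds also give the same results.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def rank_builds(builds, repeats=3):
    """Time every candidate; return (label, seconds) pairs, fastest first."""
    timings = {label: time_command(cmd, repeats)
               for label, cmd in builds.items()}
    return sorted(timings.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere. Replace each list with
    # the command line of one of your own builds (hypothetical labels).
    builds = {
        "build-a": [sys.executable, "-c", "sum(range(200000))"],
        "build-b": [sys.executable, "-c", "sum(range(2000000))"],
    }
    for label, seconds in rank_builds(builds):
        print(f"{label}: {seconds:.3f} s")
```

The median over a few repeats is used to damp the effect of a single disturbed run. For a meaningful comparison the test input should resemble the production runs, and the numerical results of the builds should be compared as well, since aggressive optimization can lead to miscompilation.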
Costas Bekas
2008-09-04