[CPMD-list] A missing library?

Axel Kohlmeyer axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Sun May 29 09:30:38 CEST 2005


On Sat, 28 May 2005, Carl Krauthauser wrote:

dear carl,

CK> > difference should be quite small, since a p3 and an athlon are
CK> > quite similar architectures as far as ATLAS is concerned).
CK> 
CK> No, I compiled the ATLAS libraries as well as the LAPACK libraries using 
CK> the new IFC (see attached make.inc files).  I am very puzzled now.

there are a few subtleties involved here. ATLAS does ship with a
partial LAPACK library containing some functions that already take
direct advantage of the ATLAS tuning infrastructure, but to get a
full LAPACK you have to _merge_ it with a standard LAPACK. this is
not as bad in terms of performance as it may initially sound, since
most of the speed of LAPACK comes from the fact that it uses BLAS
extensively, so with a well optimized BLAS (as ATLAS provides)
you'll get good performance. if you just have two separate lapack
libraries, then you'll either get suboptimal performance when you
give the standard lapack library first, since you skip the tuned
parts from ATLAS, or you're missing some parts when you link only
the ATLAS one.
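to illustrate the three cases above (ATLAS lapack only, standard lapack
first, merged library), here is a small python sketch. it is a
simplified model, assuming that for each undefined symbol the first
archive on the link line that defines it wins. the routine lists and
the names resolve() and merge() are made up for the illustration and
do not reflect the real contents of any ATLAS or LAPACK release.

```python
# simplified model of static-archive symbol resolution: for each needed
# symbol, the first library on the link line that defines it wins.
# the routine sets are illustrative assumptions, not real library contents.

ATLAS_LAPACK = {"dgetrf", "dgetrs", "dpotrf"}                    # tuned subset (assumed)
FULL_LAPACK = {"dgetrf", "dgetrs", "dpotrf", "dsyev", "dgesvd"}  # standard lapack (assumed)


def resolve(needed, link_order):
    """Map each needed symbol to the first library defining it, or None."""
    resolved = {}
    for sym in sorted(needed):
        resolved[sym] = next(
            (name for name, symbols in link_order if sym in symbols), None
        )
    return resolved


def merge(tuned, reference):
    """Model the merge: tuned routines replace their reference versions."""
    return {sym: ("atlas" if sym in tuned else "reference") for sym in reference}


needed = {"dgetrf", "dsyev"}

# case 1: only the partial lapack from ATLAS -> dsyev stays unresolved
atlas_only = resolve(needed, [("atlas", ATLAS_LAPACK)])

# case 2: standard lapack first -> the tuned dgetrf is never picked up
reference_first = resolve(
    needed, [("reference", FULL_LAPACK), ("atlas", ATLAS_LAPACK)]
)

# case 3: merged library -> complete, and tuned where ATLAS has a routine
merged = merge(ATLAS_LAPACK, FULL_LAPACK)

print(atlas_only)
print(reference_first)
print({sym: merged[sym] for sym in sorted(needed)})
```

only the merged case gives both a complete library and the tuned
routines, which is why the merge step matters.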

your linker sequence and the error message, however, suggest that
you are just using the minimal LAPACK bundled with ATLAS. in that
case, there seems to be a problem with the compilation, as the
c-compiled part seems to be using the g77 conventions while the
fortran part is using different conventions. my suggestion is to
compile ATLAS with g77. the fortran compiler has _no_ impact on the
speed of ATLAS, as the fortran parts are just wrappers, and since
the BLAS and LAPACK routine names do not contain any underscores,
the resulting binary should be compatible with ifort as well.

but please note that to get the best performing ATLAS for CPMD on a
(dual-)athlon machine you have to be extremely careful. in particular,
you should not(!) compile a multithreaded library. the performance
gain is small compared to running MPI locally. if at all, it only
makes sense in combination with an OpenMP compile, but then you
should use MKL: ATLAS cannot detect whether it was called from an
OpenMP block and thus will always use two threads, whereas MKL will
only use one thread and thus be more efficient. performance of the
p3 MKL is roughly the same as with a well tuned ATLAS, and then
there is the issue of the (sometimes) inconsistent
3dnow!/fp-register handling on dual athlon machines...
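to make the remark about underscores in symbol names more concrete,
here is a minimal python sketch of the naming conventions as i
understand them: both compilers lowercase the routine name and append
one underscore, and g77 in its default mode (-fsecond-underscore)
appends a second one if the name itself contains an underscore. the
function mangle() and the name "my_wrapper" are hypothetical and only
serve the illustration. check your compiler docs for the flags that
are actually in effect.

```python
def mangle(name, compiler):
    """Return the external symbol for a fortran routine name.

    assumed rules: lowercase name plus one trailing underscore for both
    compilers; g77 (default -fsecond-underscore) adds a second trailing
    underscore when the name already contains an underscore.
    """
    if compiler not in ("g77", "ifort"):
        raise ValueError("unknown compiler: " + compiler)
    symbol = name.lower() + "_"
    if compiler == "g77" and "_" in name:
        symbol += "_"
    return symbol


# real BLAS/LAPACK routine names plus one hypothetical name with an underscore
for routine in ("DGEMM", "DSYEV", "my_wrapper"):
    g77_symbol = mangle(routine, "g77")
    ifort_symbol = mangle(routine, "ifort")
    status = "same" if g77_symbol == ifort_symbol else "MISMATCH"
    print(routine, g77_symbol, ifort_symbol, status)
```

under these assumptions the BLAS and LAPACK names come out identical
for both compilers, and only names that contain an underscore differ.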

in case you really did the merge as described in the ATLAS docs,
then you may have mixed up some g77 compiled library with an ifort
compiled library, or reused flags from a previous compile that were
directing the c/f77 interface to use g77 conventions. the problem
is evidently in the c-compiled part of ATLAS, not the fortran part.

to cut a long story short: the speed differences between the different
libraries (ATLAS, MKL, ACML, GOTO-dgemm) for any large program package
are usually not so large. especially on 32-bit x86 platforms with
their extreme lack of registers, the main speed gain comes from
optimizing memory bandwidth and cache use, and this needs about the
same strategy on most x86 platforms (with the exception of the
pentium-4/xeon).

best regards,
	axel.

[...]

-- 
=======================================================================
Dr. Axel Kohlmeyer   e-mail: axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Lehrstuhl fuer Theoretische Chemie          Phone: ++49 (0)234/32-26673
Ruhr-Universitaet Bochum - NC 03/53         Fax:   ++49 (0)234/32-14045
D-44780 Bochum  http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.



