[CPMD-list] Intel compiler 10 seg fault resolution on quadcore Xeon.
Joseph Hargitai
joseph.hargitai at nyu.edu
Tue Oct 16 12:53:58 CET 2007
> you should also add the -DMYRINET define here. most infiniband
> implementations do not support calling SYSTEM() except for very,
> very new kernel/driver combos. -DMYRINET will use alternate
> methods to achieve what otherwise is done with composing a
> small shell command in a string and then doing a CALL SYSTEM(string).
With or without -DMYRINET, the speed is almost identical; the run times differ by only a few seconds.
> hmmmm... since you are on an intel quad-core, it might be worth
> trying out the intel MKL instead. that could be a bit faster...
Do you have the flags for using MKL? We also have the cluster edition of MKL, but have not tried it yet.
I use:
IRAT=2
CFLAGS='-Wall'
CPP='/lib/cpp -P -C -traditional'
CPPFLAGS='-D__Linux -D__PGI -DLINUX_IFC -DPOINTER8 -DFFT_DEFAULT \
-DINTEL_MKL'
FFLAGS='-pc64 -tpp6 -unroll'
FFLAGS_GROMOS='-Dgood_luck $(FFLAGS)'
LFLAGS='-L/usr/local/intel/mkl/10.023/lib/em64t -lmkl -lmkl_lapack -lmkl \
-lm'
> not very surprising. it would actually be very interesting
> to compare 8 nodes with 4PE vs 4 nodes with 8PE. since in
> the case of 4PE/node each of the MPI threads has twice the
> cache and CPMD will benefit from it. with 5PE/node two of
> them have to share the cache, and they should slow down all
> of it. with 8PE/node you have a lot of memory contention
> and collisions at the communication.
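The cache argument quoted above can be written down as a small counting exercise. This is only a sketch: it assumes each node has 4 L2 caches, each shared by a pair of cores (8 cores per node), and that the MPI processes are spread over the caches as evenly as possible. Neither the function name nor the cache layout comes from CPMD; both are assumptions made for illustration.

```python
CORES_PER_CACHE = 2  # assumption: each L2 cache is shared by a pair of cores


def cache_sharing(pes_per_node, caches_per_node=4):
    """Return (caches holding two processes, processes with a private cache).

    Assumes the best-case placement, i.e. processes fill empty caches
    first and only then start doubling up.
    """
    capacity = caches_per_node * CORES_PER_CACHE
    if not 0 <= pes_per_node <= capacity:
        raise ValueError(f"pes_per_node must be between 0 and {capacity}")
    doubled = max(0, pes_per_node - caches_per_node)
    private = pes_per_node - 2 * doubled
    return doubled, private


for pes in (4, 5, 6, 8):
    doubled, private = cache_sharing(pes)
    print(f"{pes} PE/node: {private} with a private cache, "
          f"{doubled} cache(s) shared by two processes")
```

Under these assumptions 4 PE/node leaves every process with a cache of its own, 5 PE/node forces one pair to share, and 8 PE/node makes every process share. It says nothing about memory bandwidth or communication, which the quoted text names as the other cost at 8 PE/node.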
8 nodes, 4 CPUs per node, 32 CPUs in total.
without Myrinet option
CPU TIME : 0 HOURS 16 MINUTES 22.95 SECONDS
with Myrinet option
CPU TIME : 0 HOURS 16 MINUTES 24.12 SECONDS
This is considerably slower than the run on 8 nodes with 6 CPUs (6 minutes slower).
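For comparing runs like these, a small script that turns the CPU TIME lines into seconds may be handy. This is a minimal sketch: the regular expression is written against the two lines shown above, and the assumption that other CPMD output files format the line identically is mine.

```python
import re

# Pattern for lines such as "CPU TIME : 0 HOURS 16 MINUTES 22.95 SECONDS"
CPU_TIME = re.compile(
    r"CPU TIME\s*:\s*(\d+)\s+HOURS\s+(\d+)\s+MINUTES\s+([\d.]+)\s+SECONDS"
)


def cpu_seconds(line):
    """Convert one CPU TIME line into a number of seconds."""
    match = CPU_TIME.search(line)
    if match is None:
        raise ValueError(f"not a CPU TIME line: {line!r}")
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


without_myrinet = cpu_seconds("CPU TIME : 0 HOURS 16 MINUTES 22.95 SECONDS")
with_myrinet = cpu_seconds("CPU TIME : 0 HOURS 16 MINUTES 24.12 SECONDS")

delta = with_myrinet - without_myrinet
print(f"difference: {delta:.2f} s "
      f"({100 * delta / without_myrinet:.2f} % of the run without -DMYRINET)")
```

For the two runs above the difference comes out at a little over one second on a run of about 16 minutes, which is well within what one would expect from run-to-run noise.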