[CPMD-list] Intel compiler 10 seg fault resolution on quadcore Xeon.

Joseph Hargitai joseph.hargitai at nyu.edu
Tue Oct 16 12:53:58 CET 2007



> you should also add the -DMYRINET define here. most infiniband
> implementations do not support calling SYSTEM() except for very,
> very new kernel/driver combos. -DMYRINET will use alternate
> methods to achieve what otherwise is done with composing a
> small shell command in a string and then doing a CALL SYSTEM(string).

With or without -DMYRINET the speed is almost identical; the difference is a few seconds at most.

> hmmmm... since you are on an intel quad-core, it might be worth
> trying out the intel MKL instead. that could be a bit faster...

Do you have the flags to use MKL? We also have the cluster edition of MKL, but I have not tried it yet.
At the moment I use:

     IRAT=2
     CFLAGS='-Wall'
     CPP='/lib/cpp -P -C -traditional'
     CPPFLAGS='-D__Linux -D__PGI -DLINUX_IFC -DPOINTER8 -DFFT_DEFAULT \
         -DINTEL_MKL'
     FFLAGS='-pc64  -tpp6 -unroll'
     FFLAGS_GROMOS='-Dgood_luck $(FFLAGS)'
     LFLAGS='-L/usr/local/intel/mkl/10.023/lib/em64t -lmkl -lmkl_lapack -lmkl \
         -lm'


> not very surprising. it would actually be very interesting
> to compare 8 nodes with 4PE vs 4 nodes with 8PE. since in 
> the case of 4PE/node each of the MPI threads has twice the
> cache and CPMD will benefit from it. with 5PE/node two of
> them have to share the cache, and they should slow down all
> of it. with 8PE/node you have a lot of memory contention
> and collisions at the communication.
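
The cache argument quoted above can be sketched as a small counting exercise. This is only an illustration under assumptions that are not stated in this thread: each node is taken to have 4 L2 caches, each shared by 2 cores (which is what "4PE/node each has twice the cache" implies), and ranks are assumed to be placed as evenly as possible across caches. The function name and its parameters are made up for the example; real placement depends on the scheduler and on process affinity.

```python
def ranks_per_cache(ranks_per_node, n_caches=4, cores_per_cache=2):
    """Return how many MPI ranks land on each cache of one node.

    ASSUMPTION: n_caches=4 and cores_per_cache=2 are inferred from the
    quoted remark, not measured; ranks are spread round-robin, which is
    the best case for cache sharing.
    """
    if ranks_per_node > n_caches * cores_per_cache:
        raise ValueError("more ranks than cores on the node")
    occupancy = [0] * n_caches
    for rank in range(ranks_per_node):
        occupancy[rank % n_caches] += 1
    return occupancy


def ranks_sharing_cache(occupancy):
    """Count the ranks that have to share their cache with another rank."""
    return sum(n for n in occupancy if n > 1)


if __name__ == "__main__":
    for pe in (4, 5, 6, 8):
        occ = ranks_per_cache(pe)
        print(f"{pe} PE/node: ranks per cache {occ}, "
              f"{ranks_sharing_cache(occ)} ranks share a cache")
```

Under these assumptions 4 PE/node leaves every rank with a cache to itself, 5 PE/node already forces two ranks onto one cache, and 8 PE/node has every rank sharing.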

8 nodes, 4 cpus/node, 32 cpus total.

without Myrinet option
CPU TIME :    0 HOURS 16 MINUTES 22.95 SECONDS

with Myrinet option
CPU TIME :    0 HOURS 16 MINUTES 24.12 SECONDS 

The 8 nodes x 4 cpus run is considerably slower than 8 nodes with 6 cpus per node (6 minutes slower).
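
For anyone who wants to compare such runs without doing the arithmetic by hand, here is a minimal sketch that converts the two "CPU TIME" lines reported above into seconds and takes the difference. The helper name is invented for this example, and the regular expression only assumes the line layout shown in this message.

```python
import re


def cpu_time_seconds(line):
    """Convert a 'CPU TIME : H HOURS M MINUTES S SECONDS' line to seconds."""
    match = re.search(
        r"(\d+)\s+HOURS\s+(\d+)\s+MINUTES\s+([\d.]+)\s+SECONDS", line
    )
    if match is None:
        raise ValueError(f"not a CPU TIME line: {line!r}")
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# The two timings quoted in this message (8 nodes, 4 cpus/node).
without_myrinet = cpu_time_seconds("CPU TIME :    0 HOURS 16 MINUTES 22.95 SECONDS")
with_myrinet = cpu_time_seconds("CPU TIME :    0 HOURS 16 MINUTES 24.12 SECONDS")

diff = with_myrinet - without_myrinet
percent = 100.0 * diff / without_myrinet

if __name__ == "__main__":
    print(f"difference: {diff:.2f} s ({percent:.2f} % of the run)")
```

The difference between the two builds comes out at a little over one second on a run of roughly 16 minutes, so it is well within what one would expect from run-to-run noise.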

