[CPMD-list] CPMD parallel scalability

Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu
Mon Apr 14 15:06:20 CET 2008


On Mon, 14 Apr 2008, Maurice de Koning wrote:

maurice,

MdK> Hi all,
MdK> 
MdK> I´m running CPMD on an Altix 4700 system with 44 CPU´s and 88 Gb of RAM 
MdK> memory.
MdK> At the moment I´m running a CP MD run of a cell containing 96 water 
MdK> molecules using the
MdK> BLYP functional at 300 K. I noticed that the scalability is not very 
MdK> good. If I run on more than

please check carefully how your job is distributed across the
machine, and which settings and tools you use to compile.

i have access to a very new altix 4700 and noticed some oddities.

- when using intel MKL you have to set OMP_NUM_THREADS to 1 or else
  MKL will try to multi-thread across the whole machine or at least
  across one blade (two dual-core cpus). if that overlaps with your
  MPI parallelization you are screwed.

  BTW: regardless of what your sysadmins tell you, don't compile in OpenMP,
  and better link MKL without threading support. i tried a hybrid 
  compile and it does work, but its performance is inferior to pure MPI.

- make sure that you use SGI's MPI. i tried compiling my own MPI
  because of a bug in SGI's MPI that affects path-integrals in CPMD,
  but those jobs would not go across more than one blade (= 4 cpus).

- check that you have enough memory (i.e. that nobody else is using
  excessive amounts of memory). using more cpus will increase the
  total memory usage, and on top of that the SGI MPI will create
  large RDMA buffers across the whole address space for each MPI task
  unless instructed via environment variable not to do so.
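
a minimal sketch of the first point, for anyone who launches jobs from
a script: build the environment with threading pinned to one thread per
MPI task before starting the run. OMP_NUM_THREADS is the setting named
above; MKL_NUM_THREADS is MKL's own thread-count variable and is set
here only as an extra precaution. the launcher syntax, the executable
name and the input file name are assumptions for illustration, so check
what your site actually uses.

```python
import os


def single_thread_env(base=None):
    """Return a copy of the environment with one thread per MPI task."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"  # the setting recommended above
    env["MKL_NUM_THREADS"] = "1"  # extra precaution, not from the original advice
    return env


def launch_command(ntasks, exe="./cpmd.x", inp="water.inp"):
    """Build the launch command line.

    'mpirun -np', './cpmd.x' and 'water.inp' are placeholders (assumptions);
    replace them with your site's launcher, binary and input file.
    """
    return ["mpirun", "-np", str(ntasks), exe, inp]
```

to actually start a run one would pass both to the standard library,
e.g. subprocess.run(launch_command(16), env=single_thread_env(), check=True).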
  
MdK> about 16 CPU´s, the time per MD step even starts to increase, such that 
MdK> the total time starts growing with the

on most linux machines the TCPU number is pretty much useless, 
particularly with multi-threading (as it includes the combined 
cpu time of all threads but not the time spent, e.g. swapping). 
always check the ELAPSED TIME at the end.
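
to see why cpu time and elapsed time can disagree, here is a small
generic illustration (it is not CPMD's own timer). the sleep stands in
for any time a process spends waiting rather than computing, e.g.
swapping or waiting for communication; the helper name is made up for
this example.

```python
import time


def cpu_and_wall(fn):
    """Run fn() and return (cpu_seconds, wall_seconds) for this process."""
    wall_start = time.perf_counter()  # wall-clock (elapsed) time
    cpu_start = time.process_time()   # cpu time used by this process
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu, wall


# waiting costs elapsed time but almost no cpu time
cpu, wall = cpu_and_wall(lambda: time.sleep(0.2))
print(f"cpu: {cpu:.3f} s   elapsed: {wall:.3f} s")
```

a job that looks cheap by its cpu time can therefore still be slow in
real time, which is why only the elapsed time is a fair basis for
comparing runs.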

MdK> number of CPU´s. Is there anything I can do about this?

as alessandro already mentioned, your system should scale 
well. experience tells us that your scaling problems are
therefore a problem of the machine setup, of the way you
run your job, or of how you compiled the executable. unless
you provide more details, nobody will be able to give 
specific advice. there is just too much guesswork needed.
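
one useful detail to include is the elapsed time per MD step for a few
cpu counts. a minimal sketch for turning such timings into speedup and
parallel efficiency follows; the function name is made up, and the
numbers in the usage lines are placeholders, not measurements from any
machine.

```python
def scaling_table(elapsed):
    """Compute speedup and efficiency relative to the smallest run.

    elapsed maps cpu count -> elapsed (wall-clock) seconds per MD step.
    Returns a list of (ncpus, speedup, efficiency) sorted by cpu count.
    """
    base_n = min(elapsed)
    base_t = elapsed[base_n]
    rows = []
    for n in sorted(elapsed):
        speedup = base_t / elapsed[n]
        efficiency = speedup * base_n / n  # 1.0 means ideal scaling
        rows.append((n, speedup, efficiency))
    return rows


# placeholder timings only, to show the call
for n, s, e in scaling_table({4: 8.0, 8: 4.0, 16: 4.0}):
    print(f"{n:4d} cpus  speedup {s:5.2f}  efficiency {e:5.2f}")
```

an efficiency that drops sharply at a particular cpu count (for example
just past one blade) points at the setup rather than at the input.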
 
MdK> Below is a part of the input script

this is useless, and quoting incomplete inputs is, IMNSHO, turning 
into a really bad habit on this list. _if_ you made an error 
in the input it is most likely in the part that you didn't quote.

so either post the whole file, or make it available via some webserver, 
or don't post anything, or even better use one of the test examples from 
the CPMD-test archive. we know they work, everybody can download them if 
needed and many of us have already done tests with them.

thanks,
   axel.


MdK> 
MdK> Cheers,
MdK> 
MdK> Maurice
MdK> 

-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.

