[CPMD-list] how to speed up on amd64 machines

Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu
Tue Apr 17 15:29:49 CEST 2007


On Tue, 17 Apr 2007, Bin Pan wrote:

dear bin,

BP> 2. For every MD cycle:
BP> On Dual Core AMD Opteron(tm) Processor 870 connected by myrinet,
BP>         710  0.07737   290.2    -408.64335    -408.54915    -408.47177   0.153E+01   91.77
BP>         711  0.07742   289.8    -408.64330    -408.54920    -408.47177   0.154E+01   76.60
BP>         712  0.07748   289.4    -408.64325    -408.54925    -408.47177   0.154E+01  132.64
BP>         713  0.07753   289.0    -408.64320    -408.54930    -408.47178   0.154E+01   74.70
BP> The TCPU varies a lot.

so what BLAS/LAPACK did you use? how about memory usage?
is somebody else using those machines? or are there 'runaway'
processes? those fluctuations in time are a sign that something
is fishy on that machine.
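
to put a number on those fluctuations, here is a minimal python sketch that takes the four TCPU values quoted above and reports their mean, relative spread and max/min ratio. the helper name `spread` is made up for illustration; only the four timings come from the output.

```python
from statistics import mean, pstdev

# TCPU values in seconds for MD steps 710-713, copied from the quoted output
tcpu = [91.77, 76.60, 132.64, 74.70]


def spread(times):
    """Return (mean, relative standard deviation, max/min ratio) of step times."""
    avg = mean(times)
    return avg, pstdev(times) / avg, max(times) / min(times)


avg, rel_sd, ratio = spread(tcpu)
print(f"mean {avg:.2f} s, rel. std. dev. {rel_sd:.0%}, max/min {ratio:.2f}")
```

on a dedicated machine the per-step time of a plain MD run should stay close to constant, so a large max/min ratio points at the machine rather than at the calculation.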

BP> On Dual Core AMD Opteron(tm) Processor 870 connected by myrinet for 
BP> 980 MD steps,
BP>   TOTAL TIME                          80248.07            89842.31
BP>   ****************************************************************
BP> 
BP>         CPU TIME :   22 HOURS 38 MINUTES 41.85 SECONDS
BP>     ELAPSED TIME :   25 HOURS 20 MINUTES 49.45 SECONDS

this is worrying. you should not lose almost three hours
of wall time with a fast network. please check those machines
carefully. are you _really_ using the myrinet? please check
with some MPI benchmarks.
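
the size of that gap can be checked directly from the two timing lines. a small python sketch, assuming only the CPU TIME and ELAPSED TIME values quoted above (the helper `to_seconds` is hypothetical):

```python
def to_seconds(hours, minutes, seconds):
    """Convert an hours/minutes/seconds triple to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# values copied from the CPU TIME and ELAPSED TIME lines of the quoted output
cpu = to_seconds(22, 38, 41.85)
elapsed = to_seconds(25, 20, 49.45)

gap = elapsed - cpu  # wall-clock seconds not accounted for by cpu time
print(f"gap: {gap / 3600:.2f} h, cpu/elapsed: {cpu / elapsed:.1%}")
```

the ratio of cpu to elapsed time is a rough indicator only; it does not say whether the lost time went into communication, i/o or competing processes.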

[...]
BP> I am going to try to recompile as what Dr. Kohlmeyer suggested for 
BP> the flags. Also do some additional benchmarking.
BP> BTW, one thing I guess that I did not mention is that on intel 
BP> cluster, I used IFC to compile, while on AMD I used PGI.
BP> I am wondering whether IFC can be used on AMD machines so that I have 
BP> one more option to try out.

you can also use the em64t version of intel fortran on AMD processors,
and AMD now ships a version of ACML compiled with it (checked with
version 3.6). i would not expect miracles from switching the compiler.
the 74-76 seconds TCPU look quite reasonable: due to memory bandwidth
limitations, you will get only about 1.5 times the single cpu
performance out of a dual core cpu (i.e. ~0.75 per core). so i would
try to investigate what is causing the fluctuations.
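
the per-core figure is just the speedup divided by the number of cores. as a sketch in python (the function name is made up, and the 1.5 is the rough estimate from above, not a measurement):

```python
def per_core_efficiency(speedup, cores):
    """Fraction of single-cpu performance delivered by each core."""
    return speedup / cores


# ~1.5x the single cpu performance from the two cores of a dual core cpu
eff = per_core_efficiency(1.5, 2)
print(f"per-core efficiency: {eff:.2f}")
```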

cheers,
   axel.

BP> 
BP> Thanks a lot and I do appreciate!
BP> Best regards,
BP> Bin.

-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.


