[CPMD-list] how to speed up on amd64 machines

Bin Pan binpan at MIT.EDU
Tue Apr 17 16:10:18 CEST 2007


Hi Prof. Hutter and Dr. Kohlmeyer:

Thank you very much for replying.
I made some further observations (which may not fully cover what 
you suggested), listed below.
1. On the Dual Core AMD Opteron(tm) Processor 870 connected by Myrinet, 
the output looks like:
   NCPU     NGW     NHG  PLANES  GXRAYS  HXRAYS ORBITALS Z-PLANES
      0    4375   48825      24     214    1062      33       1
      1    4369   48782      24     213    1061      33       1
      2    4363   48827      24     214    1062      33       1
      3    4365   48823      24     214    1062      33       1
      4    4367   48819      24     214    1062      33       1
      5    4368   48822      24     212    1060      33       1
                 G=0 COMPONENT ON PROCESSOR :     1
while on 4 CPUs of an Intel(R) Xeon(TM) CPU 3.00GHz connected by Ethernet:
   NCPU     NGW     NHG  PLANES  GXRAYS  HXRAYS ORBITALS Z-PLANES
      0    6553   73223      36     322    1594      49       1
      1    6550   73185      36     319    1591      50       1
      2    6550   73246      36     320    1592      49       1
      3    6554   73244      36     320    1592      50       1
                 G=0 COMPONENT ON PROCESSOR :     1
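
A rough sanity check on the two tables above: the short Python sketch below compares how evenly the NGW column is spread over the processes in each run. The numbers are copied from the tables; the dictionary keys and the helper name `imbalance` are made up for this illustration and are not part of CPMD.

```python
# NGW values copied from the two tables above (one entry per process).
ngw = {
    "opteron_6cpu": [4375, 4369, 4363, 4365, 4367, 4368],
    "xeon_4cpu": [6553, 6550, 6550, 6554],
}


def imbalance(counts):
    """Ratio of the most loaded process to the average load (1.0 = perfectly even)."""
    return max(counts) / (sum(counts) / len(counts))


# Total NGW per run and the max/mean load ratio per run.
totals = {name: sum(counts) for name, counts in ngw.items()}
ratios = {name: imbalance(counts) for name, counts in ngw.items()}

for name in ngw:
    print(f"{name}: total NGW = {totals[name]}, max/mean = {ratios[name]:.4f}")
```

If the totals agree and both ratios are close to 1, the two runs are distributing the same problem evenly, which would suggest that the slowdown is not caused by the data distribution itself.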

2. For every MD cycle:
On Dual Core AMD Opteron(tm) Processor 870 connected by myrinet,
        710  0.07737   290.2    -408.64335    -408.54915    -408.47177   0.153E+01   91.77
        711  0.07742   289.8    -408.64330    -408.54920    -408.47177   0.154E+01   76.60
        712  0.07748   289.4    -408.64325    -408.54925    -408.47177   0.154E+01  132.64
        713  0.07753   289.0    -408.64320    -408.54930    -408.47178   0.154E+01   74.70
The TCPU varies a lot.
while on 4 CPUs of an Intel(R) Xeon(TM) CPU 3.00GHz connected by Ethernet,
        122  0.03874   335.5    -408.65297    -408.56072    -408.52198   0.144E+00   64.90
        123  0.03842   333.9    -408.65206    -408.56040    -408.52198   0.146E+00   65.10
        124  0.03808   332.2    -408.65113    -408.56006    -408.52198   0.148E+00   65.03
        125  0.03774   330.5    -408.65020    -408.55972    -408.52198   0.151E+00   64.90
The TCPU stays almost constant.
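
To put a number on "varies a lot" versus "almost constant", here is a minimal Python sketch that summarizes the TCPU column (the last value of each MD step) for the four steps quoted from each machine. Four samples are far too few for real statistics, so this is only illustrative; the dictionary keys are labels invented for this example.

```python
from statistics import mean, pstdev

# TCPU values (seconds per MD step) copied from the step listings above.
tcpu = {
    "opteron_6cpu": [91.77, 76.60, 132.64, 74.70],
    "xeon_4cpu": [64.90, 65.10, 65.03, 64.90],
}

# Mean, max-min spread, and relative standard deviation for each run.
stats = {
    name: {
        "mean": mean(values),
        "spread": max(values) - min(values),
        "rel_std": pstdev(values) / mean(values),
    }
    for name, values in tcpu.items()
}

for name, s in stats.items():
    print(
        f"{name}: mean = {s['mean']:.2f} s, "
        f"spread = {s['spread']:.2f} s, "
        f"relative std = {s['rel_std']:.1%}"
    )
```

A large step-to-step spread on otherwise identical work usually points away from the compiler flags and toward something external to the computation, such as communication or other load on the nodes, though that is an assumption to be checked rather than a conclusion from these numbers.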

3. I do not currently have timing information for exactly the same 
number of MD steps. However,
on the Dual Core AMD Opteron(tm) Processor 870 connected by Myrinet, for 
980 MD steps:
  TOTAL TIME                          80248.07            89842.31
  ****************************************************************

        CPU TIME :   22 HOURS 38 MINUTES 41.85 SECONDS
    ELAPSED TIME :   25 HOURS 20 MINUTES 49.45 SECONDS
  ***      CPMD| SIZE OF THE PROGRAM IS   65772/ 218376 kBYTES ***

while on 4 CPUs of an Intel(R) Xeon(TM) CPU 3.00GHz connected by 
Ethernet, for 1000 MD steps:

  TOTAL TIME                          68513.29            68799.50
  ****************************************************************

        CPU TIME :   19 HOURS 39 MINUTES  1.36 SECONDS
    ELAPSED TIME :   20 HOURS  7 MINUTES 12.96 SECONDS
  ***      CPMD| SIZE OF THE PROGRAM IS   78444/ 250344 kBYTES ***
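
Since the two runs cover different numbers of MD steps, the totals are easier to compare once normalized. The Python sketch below converts the CPU TIME and ELAPSED TIME lines above to seconds and derives the elapsed time per step and the CPU/elapsed ratio. The per-step figure includes startup and I/O, so it is only approximate; the dictionary keys and the helper `to_seconds` are names chosen for this example.

```python
def to_seconds(hours, minutes, seconds):
    """Convert an H/M/S triple, as printed by CPMD, to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the CPU TIME / ELAPSED TIME lines above.
runs = {
    "opteron_6cpu": {
        "steps": 980,
        "cpu": to_seconds(22, 38, 41.85),
        "elapsed": to_seconds(25, 20, 49.45),
    },
    "xeon_4cpu": {
        "steps": 1000,
        "cpu": to_seconds(19, 39, 1.36),
        "elapsed": to_seconds(20, 7, 12.96),
    },
}

# Normalize each run: wall-clock seconds per MD step and CPU/elapsed ratio.
summary = {
    name: {
        "elapsed_per_step": r["elapsed"] / r["steps"],
        "cpu_over_elapsed": r["cpu"] / r["elapsed"],
    }
    for name, r in runs.items()
}

for name, s in summary.items():
    print(
        f"{name}: {s['elapsed_per_step']:.2f} s/step elapsed, "
        f"CPU/elapsed = {s['cpu_over_elapsed']:.3f}"
    )
```

A CPU/elapsed ratio noticeably below 1 means the processes spent part of the wall-clock time not computing, for example waiting on communication or I/O; which of these applies here cannot be read off these totals alone.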

I am going to recompile with the flags Dr. Kohlmeyer suggested and 
also do some additional benchmarking.
BTW, one thing I did not mention is that on the Intel cluster I used 
IFC to compile, while on the AMD cluster I used PGI.
I am wondering whether IFC can be used on AMD machines, so that I have 
one more option to try out.

Thanks a lot; I do appreciate it!
Best regards,
Bin.


At 12:27 4/16/2007, Juerg Hutter wrote:
>Hi
>
>your additional information indicates that there is a
>problem. However, we still need more information to
>locate the problem.
>In addition to Axel's advice and request,
>please send the timing information from an output with
>a full run.
>In addition, it would be helpful if you could run your
>example with the serial code and with the parallel code
>on 1, 2, 4, and 8 CPUs.
>
>regards
>
>Juerg Hutter
>
>
>----------------------------------------------------------
>Juerg Hutter                   Phone : ++41 44 635 4491
>Physical Chemistry Institute   FAX   : ++41 44 635 6838
>University of Zurich           E-mail: hutter at pci.unizh.ch
>Winterthurerstrasse 190
>CH-8057 Zurich, Switzerland
>----------------------------------------------------------
>
>
>On Mon, 16 Apr 2007, Bin Pan wrote:
>
>>Hi Prof. Hutter,
>>
>>I said that it is very slow because I used 6 CPUs on a Dual Core AMD
>>Opteron(tm) Processor 870 connected by Myrinet and compared the
>>time required to finish one MD cycle with that needed by 4 CPUs on an
>>Intel(R) Xeon(TM) CPU 3.00GHz connected by Ethernet for the same system.
>>The first needs ~90 seconds, whereas the second needs only ~60.
>>I expected the first one to have the faster CPUs and the better
>>network connection.
>>
>>Is this enough information for you to give me some advice on how to
>>improve the speed?
>>Thanks a lot!
>>
>>Best regards,
>>Bin.
>>
>>At 02:57 4/16/2007, Juerg Hutter wrote:
>>>Hi
>>>
>>>you have to give us more precise information.
>>>What exactly do you mean by 'it is very slow'?
>>>Do you mean slow in general, e.g. compared to
>>>another plane wave code or slow because
>>>of bad speedups in parallel compared to the
>>>serial runs?
>>>You should also give us some information on your
>>>computer. What is the exact version of your CPUs
>>>and what is the type of your network?
>>>At the end of each run CPMD prints timing information.
>>>Please attach at least one such timing information
>>>block for a run that you consider too slow.
>>>
>>>regards
>>>
>>>Juerg Hutter
>>>
>>>
>>>On Sun, 15 Apr 2007, Bin Pan wrote:
>>>
>>>>Hi CPMD users,
>>>>
>>>>I am wondering how to speed up CPMD runs on an amd64 cluster using MPI
>>>>parallelization.
>>>>Previously I compiled CPMD using the PGI compiler with ACML. However, I
>>>>found it to be very slow.
>>>>The flags I used are:
>>>>
>>>>CPPFLAGS = -P -C -traditional -D__Linux -D__PGI -DFFT_DEFAULT
>>>>-DPOINTER8 -D__pgf90 -DPARALLEL -DMYRINET
>>>>CC = gcc -O2 -Wall -m64
>>>>FC = pgf90 -c -fastsse -tp k8-64
>>>>LD = pgf90 -fastsse -tp k8-64
>>>>
>>>>Can you please let me know how to improve the performance?
>>>>Thanks a lot!
>>>>
>>>>Best regards,
>>>>Bin.
>>>>
>>>>_______________________________________________
>>>>CPMD-list mailing list
>>>>CPMD-list at cpmd.org
>>>>http://cpmd.org/mailman/listinfo/cpmd-list
>>
>>
>>
>>---------------------------------------------------------------------
>>Bin Pan
>>Ph.D. Candidate
>>Department of Chemical Engineering
>>Massachusetts Institute of Technology
>>Tel: 617-253-6675
>>E-mail: binpan at mit.edu
>>Room E19-528, 77 Mass. Ave. Cambridge, MA
>>----------------------------------------------------------------------
>>


