[CPMD-list] TCPU

Daniel Sebastiani sebastia at mpip-mainz.mpg.de
Tue Oct 1 10:48:35 CEST 2002


Hello,

There are many aspects that determine the overall performance
of a parallel computer, but besides the speed of the
individual computing nodes, network latency and throughput
are very important. With PC clusters, it is usually a good idea
to invest in an (expensive) high-performance interconnect
rather than plain Ethernet. I do not know whether anyone
already has concrete experience with the new Gigabit Ethernet
standard, but up to now, one of the few reliable fast
communication interfaces for parallel computing is "MyriNet"
(another one is called "SCI", but I am not very familiar with
it). MyriNet is rated at 2 Gigabit/s.

To give you an impression of the overall performance in a
large calculation, I am quoting below the timings for a
system that needs about 4 GB of total memory. As you can see,
even with a 2 GBit/s network interface, a PC will typically
spend about 20% of its time in communication, whereas the IBM
POWER4 architecture will only wait about 7%. Note that the two
runs are different jobs, so the total times cannot be compared
directly.
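
The percentages can be checked against the timing tables quoted
further down in this message. The following is only a rough
sketch in Python: it assumes that the TOTAL TIME entries of the
communication table are wall-clock seconds that do not overlap
with each other or with computation, and it divides their sum by
the total elapsed time. The dictionary layout and the name
comm_fraction are made up for this illustration.

```python
# Timings (seconds) copied from the two CPMD outputs in this message.
# "comm" lists the TOTAL TIME column of the communication table:
# send/receive, broadcast, global summation, global multiplication,
# all-to-all, synchronisation.
runs = {
    "IBM Regatta, 16 nodes": {
        "elapsed": 39872.68,
        "comm": [3.230, 0.020, 847.320, 0.001, 1995.650, 2.580],
    },
    "12-processor Pentium + MyriNet": {
        "elapsed": 20790.89,
        "comm": [52.549, 12.680, 475.514, 0.001, 3887.839, 1.369],
    },
}


def comm_fraction(run):
    """Share of elapsed time spent in communication.

    Assumption: the communication phases do not overlap, so their
    times can simply be added up.
    """
    return sum(run["comm"]) / run["elapsed"]


for name, run in runs.items():
    share = 100 * comm_fraction(run)
    print(f"{name}: {share:.1f}% of elapsed time in communication")
```

On both machines the all-to-all communication (used by the
parallel FFT) is by far the largest contribution to this sum.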


Daniel.



IBM Regatta, 16 nodes:
 ****************************************************************
 SUBROUTINE            CALLS         CPU TIME        ELAPSED TIME
    FFT-G/S          1899032          5914.30             5942.00
 ...
 ----------------------------------------------------------------
 TOTAL TIME                          39748.15            39872.68
 ****************************************************************
 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
 = SEND/RECEIVE               56157. BYTES              48000.  =
 = BROADCAST                   4536. BYTES                231.  =
 = GLOBAL SUMMATION           25152. BYTES             191769.  =
 = GLOBAL MULTIPLICATION          0. BYTES                  1.  =
 = ALL TO ALL COMM           550549. BYTES             633362.  =
 =                             PERFORMANCE          TOTAL TIME  =
 = SEND/RECEIVE              834.526  MB/S           3.230 SEC  =
 = BROADCAST                  52.385  MB/S           0.020 SEC  =
 = GLOBAL SUMMATION           22.770  MB/S         847.320 SEC  =
 = GLOBAL MULTIPLICATION       0.000  MB/S           0.001 SEC  =
 = ALL TO ALL COMM           174.729  MB/S        1995.650 SEC  =
 = SYNCHRONISATION                                   2.580 SEC  =
 ================================================================




12-processor Pentium + MyriNet:
 ****************************************************************
 SUBROUTINE            CALLS         CPU TIME        ELAPSED TIME
     FFTCOM           124845          3888.45             3889.89
   S_INVFFT            82000          3312.97             3314.76
 ...
 ----------------------------------------------------------------
 TOTAL TIME                          20782.73            20790.89
 ****************************************************************
 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
 = SEND/RECEIVE               74845. BYTES              52822.  =
 = BROADCAST                 319343. BYTES                810.  =
 = GLOBAL SUMMATION           74440. BYTES              17657.  =
 = GLOBAL MULTIPLICATION          0. BYTES                  1.  =
 = ALL TO ALL COMM           797078. BYTES             124845.  =
 =                             PERFORMANCE          TOTAL TIME  =
 = SEND/RECEIVE               75.235  MB/S          52.549 SEC  =
 = BROADCAST                  20.400  MB/S          12.680 SEC  =
 = GLOBAL SUMMATION            9.909  MB/S         475.514 SEC  =
 = GLOBAL MULTIPLICATION       0.000  MB/S           0.001 SEC  =
 = ALL TO ALL COMM            25.596  MB/S        3887.839 SEC  =
 = SYNCHRONISATION                                   1.369 SEC  =
 ================================================================
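
For readers who want to interpret the PERFORMANCE column: for
the send/receive, broadcast and all-to-all rows, the reported
MB/S values are reproduced by (average message length) x (number
of calls) / (total time), with 1 MB taken as 10**6 bytes. The
short Python sketch below checks this against the numbers quoted
above. This is an observation from the data in this message, not
a statement about how CPMD computes the column internally, and
the function name throughput_mb_s is made up for the example. The
GLOBAL SUMMATION row does not follow this simple formula and is
left out.

```python
def throughput_mb_s(avg_bytes, calls, seconds):
    """Effective throughput in MB/s, assuming 1 MB = 10**6 bytes."""
    return avg_bytes * calls / seconds / 1e6


# (machine, task): (avg message length in bytes, number of calls,
#                   total time in seconds, reported MB/S)
rows = {
    ("IBM", "SEND/RECEIVE"): (56157, 48000, 3.230, 834.526),
    ("IBM", "BROADCAST"): (4536, 231, 0.020, 52.385),
    ("IBM", "ALL TO ALL COMM"): (550549, 633362, 1995.650, 174.729),
    ("PC", "SEND/RECEIVE"): (74845, 52822, 52.549, 75.235),
    ("PC", "BROADCAST"): (319343, 810, 12.680, 20.400),
    ("PC", "ALL TO ALL COMM"): (797078, 124845, 3887.839, 25.596),
}

for (machine, task), (avg_bytes, calls, seconds, reported) in rows.items():
    computed = throughput_mb_s(avg_bytes, calls, seconds)
    print(f"{machine:3s} {task:16s} computed {computed:8.3f}"
          f"  reported {reported:8.3f} MB/S")
```

Read this way, the all-to-all throughput is about 175 MB/S on the
IBM machine and about 26 MB/S on the MyriNet cluster.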







On Mon, 30 Sep 2002, HW Sheng wrote:

> Greetings, all.
>
> I am running an ab initio job on a Linux/Alpha cluster (10
> dual EV6 Alpha processors + 3Com network switch + LAM). I
> noticed that the TCPU in the output is around 20 seconds
> (per step), but the real time is almost double that.
> Does that mean that half of the running time is wasted on
> data transfer? Did this happen to you before? I used
> NetPIPE to check the network performance, and nothing unusual
> was detected. Thanks for your input.
>
> Howard Sheng
>

--------------------------------------------------------------
Daniel Sebastiani
Max-Planck-Institut            Dept. Prof. Spiess
für Polymerforschung           Phone  +49 6131 379 126
Ackermannweg 10                Fax    +49 6131 379 100
D-55128 Mainz, Germany         sebastia at mpip-mainz.mpg.de
--------------------------------------------------------------



