[CPMD-list] TCPU
Daniel Sebastiani
sebastia at mpip-mainz.mpg.de
Tue Oct 1 10:48:35 CEST 2002
Hello,
there are many aspects that determine the overall performance
of a parallel computer, but besides the speed of the
individual computing nodes, the network latency and throughput
are very important. With PC networks, it is usually a good idea
to invest in (expensive) high-performance interconnects
other than Ethernet. I do not know whether anyone already has
concrete experience with the new Gigabit Ethernet standard
(1 Gbit/s), but up to now, one of the few reliable fast
communication interfaces for parallel computing is "MyriNet"
(another one is called "SCI", but I am not very familiar with
it). MyriNet is rated at 2 Gbit/s.
To give you an impression of the overall performance in a
large calculation, I am quoting below the timings for a
system that needs about 4 GB of memory in total. As you can
see, even with a 2 Gbit/s network interface, a PC will
typically spend about 20% of its elapsed time in
communication, whereas the IBM Power4 architecture will only
wait about 7-8%. BTW, the two jobs are different, so the
total times cannot be compared directly.
Daniel.
IBM Regatta, 16 nodes:
****************************************************************
SUBROUTINE               CALLS        CPU TIME      ELAPSED TIME
FFT-G/S                1899032         5914.30           5942.00
...
----------------------------------------------------------------
TOTAL TIME                            39748.15          39872.68
****************************************************************
================================================================
= COMMUNICATION TASK    AVERAGE MESSAGE LENGTH NUMBER OF CALLS =
= SEND/RECEIVE                    56157. BYTES          48000. =
= BROADCAST                        4536. BYTES            231. =
= GLOBAL SUMMATION                25152. BYTES         191769. =
= GLOBAL MULTIPLICATION               0. BYTES              1. =
= ALL TO ALL COMM                550549. BYTES         633362. =
=                                  PERFORMANCE      TOTAL TIME =
= SEND/RECEIVE                    834.526 MB/S       3.230 SEC =
= BROADCAST                        52.385 MB/S       0.020 SEC =
= GLOBAL SUMMATION                 22.770 MB/S     847.320 SEC =
= GLOBAL MULTIPLICATION             0.000 MB/S       0.001 SEC =
= ALL TO ALL COMM                 174.729 MB/S    1995.650 SEC =
= SYNCHRONISATION                                    2.580 SEC =
================================================================
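As a rough consistency check on the summary above, the reported
bandwidth for the SEND/RECEIVE and ALL TO ALL rows appears to be
simply (average message length x number of calls) / total time.
The short Python sketch below reproduces those two figures under
the assumption that 1 MB means 10**6 bytes; the function name is
made up for illustration and is not part of CPMD. The GLOBAL
SUMMATION row does not follow this simple product, so it is left
out here.

```python
def bandwidth_mb_per_s(avg_bytes, calls, seconds):
    """Total bytes moved divided by the time spent, in MB/s.

    Assumption: 1 MB = 10**6 bytes (this matches the printed figures).
    """
    return avg_bytes * calls / seconds / 1e6


# Values copied from the IBM Regatta communication summary.
send_recv = bandwidth_mb_per_s(56157, 48000, 3.230)
all_to_all = bandwidth_mb_per_s(550549, 633362, 1995.650)

print(f"SEND/RECEIVE    {send_recv:8.3f} MB/S (reported 834.526)")
print(f"ALL TO ALL COMM {all_to_all:8.3f} MB/S (reported 174.729)")
```

Small deviations in the last digit are expected, because the
printed message lengths and times are themselves rounded.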
12-processor Pentium + MyriNet:
****************************************************************
SUBROUTINE               CALLS        CPU TIME      ELAPSED TIME
FFTCOM                  124845         3888.45           3889.89
S_INVFFT                 82000         3312.97           3314.76
...
----------------------------------------------------------------
TOTAL TIME                            20782.73          20790.89
****************************************************************
================================================================
= COMMUNICATION TASK    AVERAGE MESSAGE LENGTH NUMBER OF CALLS =
= SEND/RECEIVE                    74845. BYTES          52822. =
= BROADCAST                      319343. BYTES            810. =
= GLOBAL SUMMATION                74440. BYTES          17657. =
= GLOBAL MULTIPLICATION               0. BYTES              1. =
= ALL TO ALL COMM                797078. BYTES         124845. =
=                                  PERFORMANCE      TOTAL TIME =
= SEND/RECEIVE                     75.235 MB/S      52.549 SEC =
= BROADCAST                        20.400 MB/S      12.680 SEC =
= GLOBAL SUMMATION                  9.909 MB/S     475.514 SEC =
= GLOBAL MULTIPLICATION             0.000 MB/S       0.001 SEC =
= ALL TO ALL COMM                  25.596 MB/S    3887.839 SEC =
= SYNCHRONISATION                                    1.369 SEC =
================================================================
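The communication percentages mentioned in the text can be
checked from the two summaries by adding up the "TOTAL TIME"
column of each communication block and dividing by the elapsed
TOTAL TIME of the run. The Python sketch below does this; it
assumes that the listed communication times do not overlap and
that they are directly comparable to the elapsed total, and the
function name is only illustrative.

```python
def communication_fraction(comm_seconds, total_elapsed):
    """Share of the elapsed run time spent in the listed communication tasks."""
    return sum(comm_seconds) / total_elapsed


# Seconds per task, in the order: send/receive, broadcast, global
# summation, global multiplication, all-to-all, synchronisation.
regatta_comm = [3.230, 0.020, 847.320, 0.001, 1995.650, 2.580]
myrinet_comm = [52.549, 12.680, 475.514, 0.001, 3887.839, 1.369]

regatta_frac = communication_fraction(regatta_comm, 39872.68)
myrinet_frac = communication_fraction(myrinet_comm, 20790.89)

print(f"IBM Regatta:       {regatta_frac:.1%} of elapsed time")
print(f"Pentium + MyriNet: {myrinet_frac:.1%} of elapsed time")
```

With the numbers above this gives roughly 7% for the Regatta
and roughly 21% for the MyriNet cluster. In both cases the
all-to-all communication accounts for most of that time.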
On Mon, 30 Sep 2002, HW Sheng wrote:
> Greetings, all.
>
> I am running an ab initio job on a Linux/Alpha cluster (10
> dual EV6 Alpha processors + 3Com network switch + LAM). I
> noticed that the TCPU in the output is around 20 seconds
> (per step), but the real time is almost double that number.
> Does that mean that half of the running time was wasted on
> data transfer? Has this happened to you before? I used
> NetPIPE to check the network performance, and nothing unusual
> was detected. Thanks for your input.
>
> Howard Sheng
>
--------------------------------------------------------------
Daniel Sebastiani
Max-Planck-Institut Dept. Prof. Spiess
für Polymerforschung Phone +49 6131 379 126
Ackermannweg 10 Fax +49 6131 379 100
D-55128 Mainz, Germany sebastia at mpip-mainz.mpg.de
--------------------------------------------------------------