[CPMD-list] help for low cpu efficiency
马尚义 (Ma Shangyi)
shyma at imr.ac.cn
Mon Jun 13 11:16:58 CEST 2005
Dear Axel:
Thanks for your reply to my last question about the low CPU efficiency of my computation. Below I describe my small cluster in detail, which I hope will help you locate the problem.
My cluster uses NFS and NIS and consists of five single-processor Pentium 4 PCs. tf0 is the master node; it shares its home directory with the slave nodes tf1, tf2, tf3 and tf4. I linked /usr/local into the shared /home directory, so users can install programs into /usr/local from any node. The network cards are Realtek RTL8319 Family PCI Fast Ethernet NICs (100 Mbps), connected through a 3C16980A switch (10/100 Mbps, 24 ports); there is no hub. I installed PGI 5.2 under /usr/local on the master node and compiled lam-7.1.1 with pgf90 under /usr/local as well, so both can be used on any node. I then built the cpmd.x executable with the PC-PGI-MPI configuration. Finally, I copied cpmd.x into /bin on every node.

Do I really need to copy it to every node? It did not run when I copied it only to the shared directory /usr/local/bin on the master node and added that directory to PATH. But then why don't I need to copy PGI and LAM to every node?
That is my setup. The problem is that the CPU efficiency is very low in the LAM parallel environment. When I run cpmd.x on a single machine (with the cpmd-test input file al001geo.inp), the CPU time is almost equal to the elapsed time. But when I use two or three machines, the elapsed time is about two or three times the CPU time. The TIMING sections of the output are:
Single machine:
****************************************************************
*                                                              *
*                            TIMING                            *
*                                                              *
****************************************************************
SUBROUTINE             CALLS        CPU TIME        ELAPSED TIME
S_INVFFT              359172         1431.84             1442.89
S_FWFFT               322518         1352.48             1362.99
FFT-G/S              2049258         1241.91             1242.14
EHPSI_C                27459          303.83              303.93
EVPSI                 248918          220.83              218.67
OVLAP2_C               28289          124.48              181.66
VBETA                   2930          122.21              121.11
FRIESNER_C              2930           97.21               96.76
FFTCOM                683784           64.48               66.20
RHOOFR_C                 294           59.46               59.77
OVLAP_H                 6060           19.88               19.96
RGS_C                   2958           18.92               23.18
----------------------------------------------------------------
TOTAL TIME                           5057.54             5139.27
****************************************************************
Two nodes (tf0 and tf1):
****************************************************************
*                                                              *
*                            TIMING                            *
*                                                              *
****************************************************************
SUBROUTINE             CALLS        CPU TIME        ELAPSED TIME
S_INVFFT              359172          713.14              731.76
S_FWFFT               322518          682.59              700.11
FFT-G/S              2049258          348.21              358.42
EHPSI_C                27459          133.98              134.44
EVPSI                 248918           90.77               92.18
FFTCOM                683784           63.16             1772.80
OVLAP2_C               28289           55.99               84.66
VBETA                   2930           49.19               49.31
FRIESNER_C              2930           48.24               51.46
RHOOFR_C                 294           21.82               22.30
RGS_C                   2958            9.75               12.25
OVLAP_H                 6060            8.02                8.57
----------------------------------------------------------------
TOTAL TIME                           2224.86             4018.26
****************************************************************
Three nodes (tf0, tf1 and tf2):
****************************************************************
*                                                              *
*                            TIMING                            *
*                                                              *
****************************************************************
SUBROUTINE             CALLS        CPU TIME        ELAPSED TIME
S_INVFFT              359175          377.80              396.70
S_FWFFT               322521          362.50              369.35
FFT-G/S              2049276          179.85              186.67
EHPSI_C                27459           90.71               94.63
FFTCOM                683790           67.74             2216.95
OVLAP2_C               28289           33.68               51.80
FRIESNER_C              2930           33.47               40.22
EVPSI                 248921           33.09               34.26
VBETA                   2930           25.33               25.35
RHOOFR_C                 294           13.19               12.74
GLOSUM                402851            9.11              113.39
RGS_C                   2958            6.46                8.29
OVLAP_H                 6060            5.52                5.57
----------------------------------------------------------------
TOTAL TIME                           1238.45             3555.94
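To put numbers on the "two or three times" observation, here is a small Python sketch (not part of CPMD; the function name `scaling_report` is hypothetical) that derives, from the TOTAL TIME rows alone, the fraction of wall-clock time the CPU was busy, the speedup over one node, and the parallel efficiency. It assumes the times are in seconds and that the one-node run is the baseline.

```python
# TOTAL TIME rows copied from the three TIMING blocks: nodes -> (CPU s, elapsed s)
totals = {
    1: (5057.54, 5139.27),
    2: (2224.86, 4018.26),
    3: (1238.45, 3555.94),
}

def scaling_report(totals):
    """Return {nodes: (cpu/elapsed, speedup, parallel efficiency)}.

    speedup    = elapsed(1 node) / elapsed(n nodes)
    efficiency = speedup / n
    cpu/elapsed is the fraction of wall-clock time spent computing.
    """
    t1 = totals[1][1]  # single-node elapsed time is the baseline
    report = {}
    for n, (cpu, wall) in sorted(totals.items()):
        speedup = t1 / wall
        report[n] = (cpu / wall, speedup, speedup / n)
    return report

for n, (busy, s, e) in scaling_report(totals).items():
    print(f"{n} node(s): CPU busy {busy:5.1%}  speedup {s:4.2f}  efficiency {e:5.1%}")
```

Measuring speedup against elapsed rather than CPU time matters here: CPU time keeps dropping with more nodes, while elapsed time, which is what a user waits for, improves far less.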
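As you can see, FFTCOM and GLOSUM take far more elapsed time than CPU time: on two nodes FFTCOM uses 63.16 s of CPU time but 1772.80 s elapsed, on three nodes 67.74 s versus 2216.95 s, and on three nodes GLOSUM uses 9.11 s versus 113.39 s. What do these routines do? And what should I do to improve the CPU efficiency and reduce the elapsed time? Any ideas would be appreciated.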
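A similar sketch isolates the two routines named above. It treats elapsed minus CPU time as waiting time and reports it as a share of the whole run and as an average per call. Reading FFTCOM as the communication step of the parallel FFT and GLOSUM as a global sum is an interpretation of the routine names, not something the timing output states. The last column is only an upper bound: the bytes a 100 Mbps link could carry in one average wait at nominal line rate, ignoring latency and protocol overhead (both assumptions).

```python
# (calls, CPU s, elapsed s) copied from the TIMING blocks above.
ROWS = {
    ("FFTCOM", 1): (683784, 64.48, 66.20),
    ("FFTCOM", 2): (683784, 63.16, 1772.80),
    ("FFTCOM", 3): (683790, 67.74, 2216.95),
    ("GLOSUM", 3): (402851, 9.11, 113.39),
}
# TOTAL TIME elapsed seconds for each run, keyed by node count.
TOTAL_ELAPSED = {1: 5139.27, 2: 4018.26, 3: 3555.94}
# Nominal Fast Ethernet rate in bytes/s (assumption: no protocol overhead).
LINK_BYTES_PER_S = 100e6 / 8

def wait_stats(calls, cpu, wall, total_wall):
    """Return (wait s, share of the run's elapsed time, wait per call in microseconds)."""
    wait = wall - cpu  # wall-clock time not spent computing
    return wait, wait / total_wall, wait / calls * 1e6

for (name, nodes), (calls, cpu, wall) in ROWS.items():
    wait, share, per_call_us = wait_stats(calls, cpu, wall, TOTAL_ELAPSED[nodes])
    # Upper bound on bytes the link could move during one average wait.
    max_bytes = per_call_us * 1e-6 * LINK_BYTES_PER_S
    print(f"{name} on {nodes} node(s): wait {wait:8.2f} s ({share:5.1%} of run), "
          f"{per_call_us:7.1f} us/call, <= {max_bytes:8.0f} bytes/call at line rate")
```

With these inputs the average FFTCOM wait comes to roughly 2.5 ms per call on two nodes and 3.1 ms on three, and GLOSUM to roughly 0.26 ms per call. These figures only show where the time goes; they do not by themselves say whether latency or bandwidth is the limit.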
Thanks in advance!
Best wishes!
shyma
More information about the CPMD-list mailing list