[CPMD-list] Help,some question about cpmd-3.9.2
Axel Kohlmeyer
axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Thu Jun 9 10:51:48 CEST 2005
On Thu, 9 Jun 2005, 马尚义 wrote:
> Dear Axel and other CPMD users:
> Recently I completed my small NFS cluster, which consists of 4 P4
> 2.4G PC machines, and compiled the new version CPMD 3.9.2. This is my
are those single or dual processor machines?
what kind of interconnect (network cards/switch/hub) do you have?
> compilation steps: first I installed the pgi5.2 compiler, then
> compiled lam-7.1.1 with pgf90, and lastly I used the configure file PC-PGI-MPI
> to get the cpmd.x executable. Some questions came up after
> testing some examples with the new cpmd.x.
> The first question: I find the cpu efficiency is very low if I use
> a lamboot machinefile like this:
> "tf0
> tf1
> tf2
> tf3"
> it's just about 25%. So I assumed 4 cpus per computer and wrote
> the lamboot machinefile like this:
how do you determine the 25%? top is a bad indicator.
if your mpi library calls sched_yield(), the %CPU value
in top is misleading (lam does just that, so that other
processes can use the remaining cpu power of the machine;
MPICH does busy looping, i.e. wastes the remaining cpu time).
please compare ELAPSED TIME and CPU TIME at the end of a cpmd
job to a serial job.
you probably have a very slow network, so CPMD has
to wait a lot.
please do a 'cat /proc/cpuinfo' to see how many processors per
node you really have.
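As a rough sketch of the same check in script form, the function below counts the "processor" entries in the text of /proc/cpuinfo. The function name is hypothetical, and the "processor : N" layout is an assumption that holds for x86 Linux. The count is of logical processors, so a hyper-threaded cpu may be reported more than once.

```python
def count_processors(cpuinfo_text):
    """Count the 'processor : N' entries in the text of /proc/cpuinfo.

    Assumes the x86 Linux layout, where every logical cpu has its own
    line whose key (the part before the colon) is exactly 'processor'.
    """
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )
```

On each node this could be called as count_processors(open("/proc/cpuinfo").read()); the result is the largest number of copies per node that makes sense in the lamboot machinefile.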
> "tf0
> tf0
> tf0
> tf0
> tf1
> .......
> tf3"
> in this case, the total cpu efficiency is about 80%. Could anyone tell
> me what I should do to improve the cpu efficiency? Is there
> anything wrong with my cluster or my lamboot machinefile?
to be able to make any recommendation, we need to know
more details about your hardware.
> The second question: I used the cpmd-test input file under the
> directory /CPMD-test/metadynamics/ANALYSIS. The total energy I get
> is -25.80212091 A.U. and the result in the original
> test output is -25.80212047 A.U. Does that mean that my compilation is
> successful even though there is a small difference?
when you run in parallel (or use different LAPACK/FFT libraries or
compilers) there are small differences. you always have to compare
the difference to the wavefunction convergence threshold. especially
for isolated systems, you have the largest chance of getting differences
in the total energies due to the large amount of noise in the vacuum area.
when running with a different number of nodes (or serially), the
electron density is distributed differently across the mpi-nodes, and the
energies are summed up per mpi-node and then globally. this will always give small
differences due to the limited numerical accuracy in floating point
calculations.
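The effect of the summation order can be illustrated with a small sketch. This is not CPMD's code: the helper names are made up, and the ten values of 0.1 are only a toy stand-in for per-grid-point energy contributions. The partial sums play the role of the per-node sums, and adding the partials plays the role of the global sum. The last lines compute the difference between the two total energies quoted above, which then has to be compared against the convergence threshold set in the input file.

```python
def naive_sum(values):
    """Plain left-to-right floating point summation."""
    total = 0.0
    for value in values:
        total += value
    return total


def chunked_sum(values, n_chunks):
    """Sum contiguous chunks separately, then add up the partial sums,
    mimicking a per-node sum followed by a global reduction."""
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [
        naive_sum(values[start:start + size])
        for start in range(0, len(values), size)
    ]
    return naive_sum(partials)


values = [0.1] * 10                # toy data, not real energies
serial = chunked_sum(values, 1)    # one "node"
split = chunked_sum(values, 2)     # two "nodes"
print(serial == split, abs(serial - split))

# difference between the two total energies quoted in this thread
energy_diff = abs(-25.80212091 - (-25.80212047))
print(f"{energy_diff:.1e} a.u.")
```

The two sums differ only in the last bits, which is the kind of difference to expect between runs on different numbers of nodes.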
> The third question: it took about 7 hours to complete the above test
> example with 4 nodes, but just 1 hour in the cpmd-test reference. My
> output file shows
>
> " CPU TIME : 0 HOURS 31 MINUTES 43.19 SECONDS
> ELAPSED TIME : 7 HOURS 14 MINUTES 10.19 SECONDS
with four copies per node you spend about 31.75*4 minutes
= 127 minutes cpu time per node, yet the whole job
takes 434 minutes. therefore you still have only about 30% 'efficiency'.
without knowing any details about the hardware, i would suspect
that you have a bad or insufficient interconnect. please
try a very small example (e.g. one of the tutorial examples from
my home page) and test a series with 'mpirun -np 1 cpmd.x',
'mpirun -np 2 cpmd.x', 'mpirun -np 3 cpmd.x', and 'mpirun -np 4 cpmd.x'
to see when and where your network 'shuts down'.
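The arithmetic above can be redone for any run with a short script. This is a minimal sketch: the function names are hypothetical, and the regular expression assumes that the timing lines look like the two lines quoted in this message.

```python
import re

# Matches lines such as "CPU TIME : 0 HOURS 31 MINUTES 43.19 SECONDS".
TIME_RE = re.compile(
    r"(CPU|ELAPSED) TIME\s*:\s*(\d+)\s+HOURS\s+(\d+)\s+MINUTES\s+([\d.]+)\s+SECONDS"
)


def parse_times(text):
    """Return {'CPU': seconds, 'ELAPSED': seconds} from the timing lines."""
    times = {}
    for kind, hours, minutes, seconds in TIME_RE.findall(text):
        times[kind] = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return times


def cpu_fraction(times, procs_per_node=1):
    """Fraction of the wall-clock time that a node spent computing."""
    return procs_per_node * times["CPU"] / times["ELAPSED"]


report = """
 CPU TIME : 0 HOURS 31 MINUTES 43.19 SECONDS
 ELAPSED TIME : 7 HOURS 14 MINUTES 10.19 SECONDS
"""
times = parse_times(report)
fraction = cpu_fraction(times, procs_per_node=4)
print(f"{fraction:.0%}")
```

Applying the same two functions to the output of each run in the 'mpirun -np N' series shows at which process count the ratio starts to drop.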
> "
> and the original test output file shows
>
> " CPU TIME : 0 HOURS 42 MINUTES 22.85 SECONDS
> ELAPSED TIME : 1 HOURS 0 MINUTES 52.24 SECONDS
> "
> I don't know how many nodes were used in the original cpmd-test computation, but I
you can easily see that from the output file. there is a block bracketed
with PARAPARAPARAPARA that tells you how the job is distributed across
the MPI nodes.
> think I am wasting a lot of time on something else. So what should I
> do to improve my cpmd.x performance? Please help me if you
> have any advice!
> The fourth question: I compiled two different cpmd
> executables, cpmd1.x (7712K) and cpmd2.x (10148K). I built cpmd1.x with
> the default blas/lapack from the linux installation and cpmd2.x with
> the "Optimized LAPACK/BLAS/ATLAS Library Binaries - libatlas_p4.a " from
> Axel's web page. I find that cpmd1.x and cpmd2.x take about the same computation
> time in the above test example. Does that mean there is something wrong
> with my compilation? The following is my Makefile, please check and
no, your code is waiting so much for the network that the speed of
blas/lapack does not matter.
regards,
axel.
> correct it if anyone has good advice to improve its performance:
--
=======================================================================
Dr. Axel Kohlmeyer e-mail: axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Lehrstuhl fuer Theoretische Chemie Phone: ++49 (0)234/32-26673
Ruhr-Universitaet Bochum - NC 03/53 Fax: ++49 (0)234/32-14045
D-44780 Bochum http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.