[CPMD-list] SOFT EXIT REQUEST
Axel Kohlmeyer
axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Fri Aug 27 10:29:42 CEST 2004
>>> "GG" == Griselda Garcia <ggarcia at fis.puc.cl> writes:
GG> Hello! Axel
GG> Thank you for your help!
GG> The system has Red Hat 9.0 SMP (kernel 2.4.20-8smp) and the program was
GG> compiled using MPICH 2.5.5 and shared memory.
GG> The cluster consists of 10 dual-processor machines with Intel processors.
GG> Could it be that the CPMD code catches SIGXCPU, SIGPWR, SIGHUP,
GG> SIGUSR1, or SIGUSR2 signals because the machines use
GG> multithreading? How can I resolve this?
hmmm,
some of your statements make sense to me, some don't.
let's see:
a) multithreading only happens at the 'application level'.
an SMP machine and a multithreaded _application_ do not
technically require each other. however, if you have an
application that is multithreaded, you _can_ (but will not
always!) benefit from an SMP machine.
b) what do you mean by MPICH and shared memory?
i hope you mean that the code will use shared memory
for intra-node and tcp/ip for inter-node communication.
c) as i wrote before, you can disable the signal trapping
code in cpmd.F and then recompile. then signals won't be
trapped, but they will be delivered _nevertheless_. with the
default dispositions the ones you list (SIGXCPU, SIGPWR,
SIGHUP, SIGUSR1/2) will then cause 'instant suicide' instead
of a soft exit. more important is finding the reason.
the fact that you only receive them after the job has been
running for some time points to the queueing system (if it
is able to monitor parallel jobs, e.g. by using mpiexec) or
some kind of watchdog daemon that monitors the machines for
non-batch cpu-time eaters (if the parallel executables are
not launched via the batch system).
one final note: depending on the specific combination
of cpu/mainboard/memory you _may_ find that your cpmd jobs
will run faster if you only use _one_ cpu per node.
cf. http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/cpmd-bench.html#smpovr
the xeon test was done on a somewhat older machine and
the situation is supposed to be better on the newer
cpus/mainboards, but so far i have had no chance to test it.
remember to always compare wall times. TCPU is sometimes not
very useful, as it is only the cpu time of the master thread
on the master node.
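
a small python sketch of why cpu time can understate the
elapsed time. this is only an illustration and has nothing to
do with how cpmd computes TCPU; timed and mostly_waiting are
hypothetical names, and the sleep stands in for time a process
spends waiting (e.g. on communication).

```python
import time


def timed(func):
    """Return (wall_seconds, cpu_seconds) spent in func() by this process."""
    wall_start = time.perf_counter()   # elapsed (wall clock) time
    cpu_start = time.process_time()    # cpu time of this process only
    func()
    return (time.perf_counter() - wall_start,
            time.process_time() - cpu_start)


def mostly_waiting():
    # stand-in for waiting on other processes; uses almost no cpu time
    time.sleep(0.2)


wall, cpu = timed(mostly_waiting)
print(f"wall = {wall:.3f} s, cpu = {cpu:.3f} s")
```

the wall time includes the waiting, the cpu time does not, so
two runs with similar cpu times can differ a lot in how long
you actually wait for the result.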
under some circumstances, a combined OpenMP/MPI approach
may be slightly better, but on current linux machines openmp
is a bit tricky, since not all openmp directives in cpmd are
handled correctly by current linux fortran compilers.
the thread creation overhead and the memory bandwidth
limits are also a problem. most of the time it only makes
sense to try openmp when the mpi parallelization has scaled
out, i.e. when adding more mpi processes no longer helps.
regards,
axel.
GG> Your help is much appreciated!
GG> Griselda.
GG> _______________________________________________
GG> CPMD-list mailing list
GG> CPMD-list at cpmd.org
GG> http://cpmd.org/mailman/listinfo/cpmd-list
--
=======================================================================
Axel Kohlmeyer e-mail: axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Lehrstuhl fuer Theoretische Chemie Phone: ++49 (0)234/32-26673
Ruhr-Universitaet Bochum - NC 03/53 Fax: ++49 (0)234/32-14045
D-44780 Bochum http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.