[CPMD-list] error in BO-PIMD
Axel Kohlmeyer
akohlmey at cmm.chem.upenn.edu
Sun Jun 10 21:32:54 CEST 2007
On Sun, 10 Jun 2007, qfzhang wrote:
QZ> Hi,
QZ> Thanks for your advice! Can I solve the problem by rewriting some part of the
QZ> program? I have added some "CALL MY_SYNC(SUPERGROUP)" statements in pi_diag.F, but
if it were that simple to fix, i'd have done it already. :)
the problem seems to arise from different replicas needing to
do a different number of wfopt steps and some implicit synchronization
because of that. the code in pi_diag.F is just the starting point...
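
as a rough illustration of this failure mode (a python sketch with threads standing in for replicas; it is not CPMD or MPI code, and run_replicas, wfopt_steps and timeout are made-up names): every replica hits a shared barrier once per wfopt step. if one replica needs more steps than the others, it ends up waiting at a barrier nobody else will reach. the timeout exists only so the demo terminates; a real MPI collective would simply hang.

```python
import threading


def run_replicas(wfopt_steps, timeout=0.5):
    """Simulate replicas that synchronize once per wavefunction-optimization step.

    wfopt_steps: list with the (hypothetical) number of steps each replica needs.
    Returns a list with "done" or "stalled" for each replica.
    """
    barrier = threading.Barrier(len(wfopt_steps))
    status = ["running"] * len(wfopt_steps)

    def replica(idx, nsteps):
        try:
            for _ in range(nsteps):
                # Stand-in for a collective call placed inside the step loop.
                barrier.wait(timeout=timeout)
            status[idx] = "done"
        except threading.BrokenBarrierError:
            # The barrier timed out: the other replicas never arrived.
            status[idx] = "stalled"

    threads = [
        threading.Thread(target=replica, args=(i, n))
        for i, n in enumerate(wfopt_steps)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return status


if __name__ == "__main__":
    print(run_replicas([3, 3, 3]))  # same step count everywhere
    print(run_replicas([3, 5, 3]))  # replica 1 needs two extra steps
```

with equal step counts all replicas finish; with unequal counts the replica that needs extra steps is reported as stalled, which is the analogue of the hang described above.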
QZ> it seems not to work. And you also mentioned the compiler. Can I solve the
QZ> problem by some change during compiling?
no. please read my reply more carefully. the remark about the compilers
(actually, it is about the default behavior of the runtime library of the
compiler implementation) was only to explain why you saw files
with 0 byte length.
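
the buffering effect itself is easy to reproduce outside of CPMD. the following is only a python analogy (the fortran runtime library has its own buffering rules; the function name, file name and payload are made up for the demo): a small write to a buffered file leaves 0 bytes on disk until the buffer is flushed, so a job that stalls before the first flush shows empty files.

```python
import os
import tempfile


def sizes_before_and_after_flush(payload=b"dummy output line\n"):
    """Return the on-disk file size right after write() and again after flush()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "OUTPUT_demo")  # hypothetical file name
        # A large buffer guarantees the small payload stays in memory at first.
        f = open(path, "wb", buffering=64 * 1024)
        try:
            f.write(payload)
            before = os.path.getsize(path)  # data still sits in the buffer
            f.flush()
            after = os.path.getsize(path)   # data has been handed to the OS
        finally:
            f.close()
    return before, after


if __name__ == "__main__":
    print(sizes_before_and_after_flush())
```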
to fix this problem, you need to trace the MPI calls and
find the exact place where the code deadlocks. how to do
this cannot be explained in a few lines and is also very
MPI library specific, so if you want to fix it, you have to
dig out the corresponding information from various places
(MPI library documentation, the web, tutorial literature etc.).
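
the general idea of such tracing can be sketched without any MPI library. the following python toy (threads instead of MPI ranks; CallTracer, traced and find_stuck_calls are hypothetical names, and real tools work at the level of the MPI library itself) records which synchronization call each rank has entered but not yet left. after a hang, the leftover entries point at the place where the code is stuck.

```python
import threading


class CallTracer:
    """Remember, per rank, the label of the call that was entered but not left."""

    def __init__(self):
        self._lock = threading.Lock()
        self.pending = {}  # rank -> label of the call currently in progress

    def traced(self, rank, label, call, *args, **kwargs):
        with self._lock:
            self.pending[rank] = label
        # If the call raises (or never returns), the entry stays in `pending`.
        result = call(*args, **kwargs)
        with self._lock:
            self.pending.pop(rank, None)
        return result


def find_stuck_calls(steps_per_rank, timeout=0.3):
    """Run a toy step loop and return the calls that never completed."""
    tracer = CallTracer()
    barrier = threading.Barrier(len(steps_per_rank))

    def worker(rank, nsteps):
        try:
            for step in range(nsteps):
                label = f"sync in wfopt step {step}"
                tracer.traced(rank, label, barrier.wait, timeout)
        except threading.BrokenBarrierError:
            pass  # stands in for a rank that would hang forever under MPI

    threads = [
        threading.Thread(target=worker, args=(r, n))
        for r, n in enumerate(steps_per_rank)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(tracer.pending)


if __name__ == "__main__":
    print(find_stuck_calls([2, 4, 2]))
```

in this toy run only rank 1 is left with a pending call, namely the synchronization in its third step, which the other two ranks never reach.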
cheers,
axel.
QZ>
QZ> Best wishes
QZ> Qianfan Zhang
QZ>
QZ> Axel Kohlmeyer wrote:
QZ>
QZ> > On Sat, 9 Jun 2007, qfzhang wrote:
QZ> >
QZ> > QZ> Hi,
QZ> > QZ> So sorry for that. It is very strange that when running a BO-PIMD job,
QZ> > QZ> nothing is written to the output file after "force initialization", and
QZ> > QZ> nothing to the files TRAJECTORY and ENERGY. But the job will not stop
QZ> > QZ> until the walltime limit, and
QZ> >
QZ> > hi,
QZ> > this is not strange, you just discovered a 'deadlock' bug due to
QZ> > a so-called race condition. this can happen with PI-MD, when the
QZ> > individual replicas take significantly different time to do some
QZ> > work yet the code is written in a way that expects about the same
QZ> > time spent.
QZ> >
QZ> > you don't see any output to the files, since your compiler defaults
QZ> > to buffered output (indeed something is written, but the first MD
QZ> > step stalls, at least when trying to reproduce it on my machine).
QZ> >
QZ> > QZ> no error message. But when I specify PROCESSOR GROUP=1, there is no
QZ> > QZ> problem. When I use CP-PI
QZ> >
QZ> > with no processor groups there is no parallelization over replicas,
QZ> > and it seems that exactly this parallelization is causing the problems.
QZ> > with CP-MD all operations take about the same time per replica, but
QZ> > with BO-MD this is not always the case (different number of WF-opt
QZ> > steps for different replicas).
QZ> >
QZ> > QZ> MD for calculations, no such problem. So I really don't know what's wrong
QZ> > QZ> with it. The output file is as below.
QZ> >
QZ> > the cause can probably be found or narrowed down by tracing
QZ> > the parallelization in pi_diag.F.
QZ> >
QZ> > please note that even though your job appears to be working, all
QZ> > it does is wait for the other parts to communicate, which are in
QZ> > turn waiting for the first nodes (=> deadlock).
QZ> >
QZ> > cheers,
QZ> > axel.
QZ> >
QZ> > [...]
QZ> >
QZ> > QZ> > --
QZ> > QZ> > =======================================================================
QZ> > QZ> > Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu http://www.cmm.upenn.edu
QZ> > QZ> > Center for Molecular Modeling -- University of Pennsylvania
QZ> > QZ> > Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
QZ> > QZ> > tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
QZ> > QZ> > =======================================================================
QZ> > QZ> > If you make something idiot-proof, the universe creates a better idiot.
--
=======================================================================
Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu http://www.cmm.upenn.edu
Center for Molecular Modeling -- University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.