[CPMD-list] p4_error: : 14
Martijn Zwijnenburg
M.A.Zwijnenburg at tnw.tudelft.nl
Mon Jul 21 17:21:34 CEST 2003
Hi,
We are eagerly anticipating the new version; MD with only 205 steps is
just no fun...
Since we have meanwhile also compiled CPMD with LAM, here, probably only
for historical interest, is its error message:
204 0.14522 1500.0 -37.48879 -37.48166 -37.33644 0.151 0.14
205 0.13394 1500.0 -37.50746 -37.50033 -37.36639 0.153 0.15
MPI_Recv: message truncated (rank 1, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Recv()
Rank (1, MPI_COMM_WORLD): - MPI_Bcast()
Rank (1, MPI_COMM_WORLD): - MPI_Allreduce()
Rank (1, MPI_COMM_WORLD): - main()
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return 0"
or "exit(0)" in your C code before exiting the application.
PID 28006 failed on node n0 with exit status 1.
Cheers,
Martijn
On 21 Jul 2003 at 14:43, Alessandro Curioni wrote:
> The error you are experiencing probably has to do with a bug in the
> memory allocation of a timing routine.
> This has been corrected in the new version 3.7.2 that will be distributed
> in the next few days.
>
>
> best regards,
>
> Alessandro CURIONI, PhD
> Research Staff Member
> Computational Biochemistry and Material Science group
> IBM Research Division - Zurich Research Laboratory
> Saumerstrasse 4
> 8003 Rueschlikon - Switzerland
> e-mail: cur at zurich.ibm.com
> www: www.zurich.ibm.com
> Tel: +41-1-7248633
> Fax: +41-1-7248958
>
> "Martijn Zwijnenburg" <M.A.Zwijnenburg at tnw.tudelft.nl>
> Sent by: cpmd-list-admin at cpmd.org
> 07/21/2003 10:21 AM
>
> To: Juerg Hutter <hutter at pci.unizh.ch>
> cc: cpmd-list at cpmd.org
> Subject: Re: [CPMD-list] p4_error: : 14
>
> Hi Juerg,
>
> Thanks, I will recompile the code ASAP (or as soon as the system
> administrator lets me).
>
> In the meantime we've been doing some experiments regarding the error
> message. First, judging from the MPICH source code, the p4_error
> 14 message appears to be linked to signal 14, SIGALRM. Somewhere
> in MPICH an alarm() appears to be set, which expires and terminates
> the process in question. However, we have not yet been able to
> pinpoint where (alarm() is set in several places), although most of
> these calls come with an error string, which is absent in our
> particular case.
>
> Furthermore, I ran the nh3-md example from last year's CPMD tutorial
> in Lyon, which finished without problems. Then I performed MD runs on
> a small silica molecule (employing a 4 Å rather than a 15 Å box) with
> "normal" and Vanderbilt pseudopotentials (PPs), using the MD settings
> from the nh3-md example. While the run with normal PPs finished
> normally after 30000 steps, the Vanderbilt run crashed at precisely the
> same point (after step 205) as the previous MD runs on larger clusters.
> The problem thus appears to be linked in some way to using Vanderbilt PPs.
>
> Finally, the same error message also popped up on Friday in a geometry
> optimization job of a colleague of mine. The problem therefore
> need not be linked (exclusively) to MD.
>
> Hope this helps,
>
> Gr. Martijn
>
>
> On 19 Jul 2003 at 13:06, Juerg Hutter wrote:
>
> > Hi Martijn
> >
> > I don't know what really causes your problems.
> > However, I found a bug related to the type of job you
> > are running. The variable "SLIMIT" is not initialized.
> > An easy workaround would be to put the line
> >
> > SLIMIT = 0.D0
> >
> > somewhere in the file control_def.F
> >
> > Juerg
> >
> > ----------------------------------------------------------
> > Juerg Hutter Phone : ++41 1 635 4491
> > Physical Chemistry Institute FAX : ++41 1 635 6838
> > University of Zurich E-mail: hutter at pci.unizh.ch
> > Winterthurerstrasse 190
> > CH-8057 Zurich, Switzerland
> > ----------------------------------------------------------
> >
> >
> > On Tue, 15 Jul 2003, Martijn Zwijnenburg wrote:
> >
> > > Hi,
> > >
> > > When I run the input file given below on our Linux cluster (CPMD 3.7.1,
> > > compiled with the PGI compiler and MPICH), the MD job always crashes
> > > after the 205th step, independent of which nodes or how many nodes I
> > > run on. The error message seems to be precisely repeatable, so I
> > > checked whether any file was approaching a size limit (say 2 GB), but
> > > all files are much smaller. The precise error message (for 3 procs) is:
> > >
> > > 205 0.00852 1007.3 -143.81623 -143.76839 -143.75987 0.180 70.43
> > > p1_15605: p4_error: : 14
> > > p2_25632: p4_error: : 14
> > > bm_list_27416: (20873.130096) wakeup_slave: unable to interrupt slave 0 pid 27415
> > > Broken pipe
> > > Broken pipe
> > >
> > > and in the case of the 2-proc job:
> > >
> > > 205 0.00852 1007.3 -143.81623 -143.76839 -143.75987 0.180 108.08
> > > p1_23514: p4_error: : 14
> > > Broken pipe
> > >
> > > The funny thing is that we never had such problems with CPMD
> > > optimization jobs, which ran for weeks, nor with any other program on
> > > the cluster (GAMESS-UK, DL_POLY).
> > > It appears that MPI is the problem, but does anybody have a solution?
> > >
> > > Gr. Martijn
> > >
> > > Input file:
> > >
> > > ! Si4O8 cluster MD
> > > !
> > > &CPMD
> > > MOLECULAR DYNAMICS
> > > STRUCTURE BONDS ANGLES DIHEDRALS
> > > SPLINE RANGE
> > > 5.00
> > > SPLINE POINTS
> > > 2500
> > > CONVERGENCE
> > > 1E-6 5E-6 1.
> > > ISOLATED MOLECULE
> > > HESSIAN UNIT
> > > RESTART WAVEFUNCTION GEOMETRY LATEST
> > > TEMPERATURE
> > > 1000
> > > TEMPCONTROL IONS
> > > 1000 20
> > > TIMESTEP
> > > 5.0
> > > EMASS
> > > 700
> > > TRAJECTORY XYZ
> > > &END
> > > &SYSTEM
> > > SYMMETRY
> > > 0
> > > ANGSTROM
> > > CELL
> > > 15 1.0 1.0 0 0 0
> > > ANGSTROM
> > > CUTOFF
> > > 30.00
> > > STATES
> > > 36
> > > POINT GROUP
> > > AUTO
> > > &END
> > > &ATOMS
> > > *Si_ps.uspp BINARY new.f
> > > LMAX=P
> > > 4
> > > 0.000000000000 0.000000000000 0.000000000000
> > > 0.471809914327 0.000000000000 3.182682023275
> > > 3.653225710458 -0.034526749450 2.711038719844
> > > 3.181409275132 -0.034559479116 -0.471644591443
> > >
> > > *o_ps.uspp BINARY new.f
> > > LMAX=P
> > > 8
> > > 0.000000000000 0.000000000000 1.626292446278
> > > 2.098080331254 -0.016336484977 3.186832736955
> > > 3.653206618118 -0.034138170947 1.084746035991
> > > 1.555131813001 -0.018896584142 -0.475808582925
> > > 4.871737465360 -0.049224011432 3.612939458029
> > > -0.432765838343 0.012873808042 4.399223302846
> > > -1.218511827443 0.015066749685 -0.901895052361
> > > 4.086009030180 -0.046817197390 -1.688173543327
> > >
> > > &END
> > > &DFT
> > > FUNCTIONAL PW91
> > > GC-CUTOFF
> > > 5E-5
> > > &END
> > > -------------------------------------------------------------------------
> > > Martijn Zwijnenburg
> > > Lab. of Applied Organic Chemistry and Catalysis
> > > Delft University of Technology
> > > Julianalaan 136
> > > 2628 BL Delft
> > > The Netherlands
> > > Tel: 0031-(0)152782691
> > > Fax: 0031-(0)152784700
> > > e-mail: M.A.Zwijnenburg at tnw.tudelft.nl
> > > web page: http://come.to/tock
> > >
> > >
> > > _______________________________________________
> > > CPMD-list mailing list
> > > CPMD-list at cpmd.org
> > > http://www.cpmd.org/mailman/listinfo/cpmd-list
> > >