[CPMD-list] p4_error: : 14

Alessandro Curioni cur at zurich.ibm.com
Mon Jul 21 14:43:38 CEST 2003




The error you are experiencing is probably due to a bug in the
memory allocation of a timing routine.
This has been corrected in the new version 3.7.2, which will be distributed
in the next few days.


best regards,

Alessandro CURIONI, PhD
Research Staff Member
Computational Biochemistry and Material Science group
IBM Research Division - Zurich Research Laboratory
Saumerstrasse 4
8003 Rueschlikon - Switzerland
e-mail: cur at zurich.ibm.com
www:    www.zurich.ibm.com
Tel: +41-1-7248633
Fax: +41-1-7248958



From:    "Martijn Zwijnenburg" <M.A.Zwijnenburg@tnw.tudelft.nl>
Sent by: cpmd-list-admin at cpmd.org
To:      Juerg Hutter <hutter at pci.unizh.ch>
cc:      cpmd-list at cpmd.org
Date:    07/21/2003 10:21 AM
Subject: Re: [CPMD-list] p4_error: : 14




Hi Juerg,

Thanks, I will recompile the code ASAP (or as soon as the system
administrator lets me).

In the meantime we've been doing some experiments regarding the error
message. First, looking through the MPICH source code, the p4_error
14 message appears to be linked to signal 14, SIGALRM. Somewhere
in MPICH an alarm() appears to be set, which expires and terminates
the process in question. However, we have not yet been able to
pinpoint where: alarm() can be set in multiple places, although in
most of them it comes with an error string, which seems to be absent
in our particular case.
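
As a rough illustration of the mechanism described above (not of MPICH
itself), the following Python sketch arms an alarm in a child process
that installs no handler and shows that the child is then terminated by
SIGALRM. It assumes a POSIX system; the names CHILD and
run_child_with_alarm are made up for this example, and whether the p4
device reports the signal number in exactly this way is our reading of
the source, not something the sketch verifies.

```python
import signal
import subprocess
import sys

# Hypothetical child program: arm a 1-second alarm, install no handler,
# then block long enough for the alarm to expire.
CHILD = "import signal, time; signal.alarm(1); time.sleep(10)"


def run_child_with_alarm():
    """Run the child and return the signal number that killed it, or None."""
    proc = subprocess.run([sys.executable, "-c", CHILD])
    # On POSIX, a negative return code -N means "terminated by signal N".
    return -proc.returncode if proc.returncode < 0 else None


killed_by = run_child_with_alarm()
print("SIGALRM number on this platform:", int(signal.SIGALRM))
print("child terminated by signal:", killed_by)
```

On Linux, SIGALRM is signal number 14, which would be consistent with
the "14" in the p4_error message if that number is indeed a signal.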

Furthermore, I ran the nh3-md example from last year's CPMD tutorial
in Lyon, which finished without problems. Then I performed MD runs on
a small silica molecule (employing a 4 Å rather than a 15 Å box) with
"normal" and Vanderbilt pp's and the MD settings from the nh3-md
example. While the run with normal pp's finished normally after 30000
steps, the Vanderbilt run crashed at precisely the same point (after
cycle 205) as the previous MD runs on larger clusters. The problem
thus appears to be linked in some way to using Vanderbilt pp's.

Finally, the same error message also popped up on Friday in a geometry
optimization job of a colleague of mine. The problem therefore
need not be linked to MD (exclusively).

Hope this helps,

Gr. Martijn


On 19 Jul 2003 at 13:06, Juerg Hutter wrote:

> Hi Martijn
>
> I don't know what really causes your problems.
> However, I found a bug related to the type of job you
> are running. The variable "SLIMIT" is not initialized.
> An easy workaround would be to put the line
>
>       SLIMIT = 0.D0
>
> somewhere in the file control_def.F
>
> Juerg
>
> ----------------------------------------------------------
> Juerg Hutter                   Phone : ++41 1 635 4491
> Physical Chemistry Institute   FAX   : ++41 1 635 6838
> University of Zurich           E-mail: hutter at pci.unizh.ch
> Winterthurerstrasse 190
> CH-8057 Zurich, Switzerland
> ----------------------------------------------------------
>
>
> On Tue, 15 Jul 2003, Martijn Zwijnenburg wrote:
>
> > Hi,
> >
> > When I run the inputfile given below on our linux-cluster (cpmd 3.7.1
> > / compiled with PGI compiler/ MPICH) the MD job always crashes after
> > the 205th step (independent of the precise nodes I'm running / or the
> > number of nodes I'm running on). The error message seems to be
> > precisely repeatable, so I checked if a file nears a limit (say 2GB)
> > but everything is much much smaller. The precise error message (for 3
> > procs) is:
> >
> > 205  0.00852  1007.3    -143.81623    -143.76839    -143.75987    0.180 70.43
> > p1_15605:  p4_error: : 14
> > p2_25632:  p4_error: : 14
> > bm_list_27416: (20873.130096) wakeup_slave: unable to interrupt slave 0 pid 27415
> > Broken pipe
> > Broken pipe
> >
> > and in case of the 2 proc job:
> >
> > 205  0.00852  1007.3    -143.81623    -143.76839    -143.75987    0.180  108.08
> > p1_23514:  p4_error: : 14
> > Broken pipe
> >
> > The funny thing is that we never had such problems with CPMD
> > optimization jobs, which ran for weeks, nor with any other program on
> > the cluster (GAMESS-UK, DL_POLY).
> > It appears that MPI is the problem but does anybody have a solution?
> >
> > Gr. Martijn
> >
> > Inputfile:
> >
> > ! Si4O8 cluster MD
> > !
> > &CPMD
> >   MOLECULAR DYNAMICS
> >   STRUCTURE BONDS ANGLES DIHEDRALS
> >   SPLINE RANGE
> >   5.00
> >   SPLINE POINTS
> >   2500
> >   CONVERGENCE
> >   1E-6 5E-6 1.
> >   ISOLATED MOLECULE
> >   HESSIAN UNIT
> >   RESTART WAVEFUNCTION GEOMETRY LATEST
> >   TEMPERATURE
> >   1000
> >   TEMPCONTROL IONS
> >   1000 20
> >   TIMESTEP
> >   5.0
> >   EMASS
> >   700
> >   TRAJECTORY XYZ
> > &END
> > &SYSTEM
> >   SYMMETRY
> >   0
> >   ANGSTROM
> >   CELL
> >   15 1.0 1.0 0 0 0
> >   ANGSTROM
> >   CUTOFF
> >   30.00
> >   sTATES
> >   36
> >   POINT GROUP
> >   AUTO
> > &END
> > &ATOMS
> > *Si_ps.uspp BINARY new.f
> >   LMAX=P
> >  4
> >    0.000000000000      0.000000000000      0.000000000000
> >    0.471809914327      0.000000000000      3.182682023275
> >    3.653225710458     -0.034526749450      2.711038719844
> >    3.181409275132     -0.034559479116     -0.471644591443
> >
> > *o_ps.uspp BINARY new.f
> >   LMAX=P
> >  8
> >   0.000000000000      0.000000000000      1.626292446278
> >   2.098080331254     -0.016336484977      3.186832736955
> >   3.653206618118     -0.034138170947      1.084746035991
> >   1.555131813001     -0.018896584142     -0.475808582925
> >   4.871737465360     -0.049224011432      3.612939458029
> >  -0.432765838343      0.012873808042      4.399223302846
> >  -1.218511827443      0.015066749685     -0.901895052361
> >   4.086009030180     -0.046817197390     -1.688173543327
> >
> > &END
> > &DFT
> >  FUNCTIONAL PW91
> >  GC-CUTOFF
> >  5E-5
> > &END
> > -------------------------------------------------------------------------
> > Martijn Zwijnenburg
> > Lab. of Applied Organic Chemistry and Catalysis
> > Delft University of Technology
> > Julianalaan 136
> > 2628 BL Delft
> > The Netherlands
> > Tel: 0031-(0)152782691
> > Fax: 0031-(0)152784700
> > e-mail: M.A.Zwijnenburg at tnw.tudelft.nl
> > web page: http://come.to/tock
> >
> >
> > _______________________________________________
> > CPMD-list mailing list
> > CPMD-list at cpmd.org
> > http://www.cpmd.org/mailman/listinfo/cpmd-list
> >

-------------------------------------------------------------------------
Martijn Zwijnenburg
Lab. of Applied Organic Chemistry and Catalysis
Delft University of Technology
Julianalaan 136
2628 BL Delft
The Netherlands
Tel: 0031-(0)152782691
Fax: 0031-(0)152784700
e-mail: M.A.Zwijnenburg at tnw.tudelft.nl
web page: http://come.to/tock






