[CPMD-list] MPI-Problem with WF optimization

Alessandro Curioni cur at zurich.ibm.com
Thu Apr 14 13:06:44 CEST 2005


Bernd,

thank you for your input.

This is a known bug and has been corrected in the upcoming
minor release.

Regards


Alessandro CURIONI, PhD
Research Staff Member
Computational Biochemistry and Material Science group
IBM Research Division - Zurich Research Laboratory
Saumerstrasse 4
8003 Rueschlikon - Switzerland
e-mail: cur at zurich.ibm.com
www:    www.zurich.ibm.com
Tel: +41-1-7248633
Fax: +41-1-7248958




Bernd Kallies <kallies at zib.de>
Sent by: cpmd-list-bounces at cpmd.org
04/12/2005 08:41 PM

To: cpmd-list at cpmd.org
Subject: [CPMD-list] MPI-Problem with WF optimization

Dear all,
I ran into a curious problem when running the attached input with CPMD
v3.9.1 (downloaded 3 June 2004) on an IBM p690.
The run aborts during a wavefunction optimization with the following
error message from the MPI library:

ERROR: 0032-117 User pack or receive buffer is too small  (32768) in
MPI_Allreduce, task 0

The error occurs at different stages, depending on the number of tasks
or the machine state. The buffer size that MPI_Allreduce reports as
wrong also varies.

Debugging revealed a code problem that seems fundamental to me.
The error is generated because MPI_Allreduce is called by MPI tasks that
are out of sync (they call glosum in different code contexts): the master
task is in a different context than the others. This happens because
the variable TNOFOR (set in tol_chk_cnvener) evaluates differently on
task 0 than on the other tasks. That in turn happens because task 0 has
a different total energy than the others on entry to tol_chk_cnvener.
The root cause is that subroutine linesr (pcgrad.f) contains the lines

      IF(PARENT) THEN
        CALL EBACK(0)
      ENDIF
...
      IF(PARENT) THEN
        CALL EBACK(1)
      ENDIF

This yields different total energies on the MPI tasks when checking for
wavefunction convergence. When all tasks back up and restore the energy
values in linesr, the error described above disappears and the
calculation finishes properly.
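
The mechanism can be illustrated with a toy model. This is only a sketch
in Python, not CPMD code: the Task class, line_search, converged and all
numbers are hypothetical stand-ins, and the real logic of linesr and
tol_chk_cnvener is certainly more involved. It assumes only what is
described above: the energy is backed up and restored around the line
search, and the convergence check afterwards uses the energy each task holds.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for one MPI task and its copy of the energy."""
    rank: int
    energy: float
    saved: float = 0.0  # backup slot, plays the role of EBACK's storage

    @property
    def parent(self) -> bool:
        return self.rank == 0


def line_search(task: Task, trial_energy: float, guard_on_parent: bool) -> None:
    """Toy line search: back up the energy, take a trial step, restore."""
    do_backup = task.parent or not guard_on_parent
    if do_backup:
        task.saved = task.energy   # analogous to CALL EBACK(0)
    task.energy = trial_energy     # the trial step overwrites the energy everywhere
    if do_backup:
        task.energy = task.saved   # analogous to CALL EBACK(1)


def converged(task: Task, previous_energy: float, tol: float) -> bool:
    """Toy convergence test on the energy change (assumed form)."""
    return abs(task.energy - previous_energy) < tol


def run(guard_on_parent: bool, ntasks: int = 4) -> list:
    tasks = [Task(rank=r, energy=-10.0) for r in range(ntasks)]
    for t in tasks:
        line_search(t, trial_energy=-9.0, guard_on_parent=guard_on_parent)
    return [converged(t, previous_energy=-10.0, tol=0.05) for t in tasks]


buggy = run(guard_on_parent=True)    # only task 0 backs up and restores
fixed = run(guard_on_parent=False)   # every task backs up and restores
print("guarded by PARENT:", buggy)
print("all tasks:        ", fixed)
```

With the guard in place, task 0 reaches a different convergence decision
than the other tasks, so the tasks would take different branches before
their next collective call. Without the guard, all tasks agree.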

It is not really clear to me what impact this finding has, since
line searching during wavefunction optimization is done in many CPMD runs.

--the input--
&CPMD
  OPTIMIZE GEOMETRY
  LBFGS
  PCG MINIMIZE
  CONVERGENCE ORBITALS
    1.d-7
  CONVERGENCE ADAPT
    0.02
  CONVERGENCE ENERGY
    0.05
  STORE
    50
  TASKGROUPS
    1
&END
 
&DFT
  FUNCTIONAL PBE
  GC-CUTOFF
    1.d-6
&END
 
&SYSTEM
  ANGSTROM
  SYMMETRY
    8
  CELL ABSOLUTE
    8.34 8.34 14.255 0  0  0
  CUTOFF
    40.0
  TESR
    4
  DUAL
    6.0
&END
 
 
&ATOMS
*Mg_VDB_PBE.psp BINARY NEWF
  LMAX=D
   32
    0.000  0.000  0.000
    4.170  0.000  0.000
    2.085  2.085  0.000
    6.255  2.085  0.000
    0.000  4.170  0.000
    4.170  4.170  0.000
    2.085  6.255  0.000
    6.255  6.255  0.000
    2.085  0.000  2.085
    6.255  0.000  2.085
    0.000  2.085  2.085
    4.170  2.085  2.085
    2.085  4.170  2.085
    6.255  4.170  2.085
    0.000  6.255  2.085
    4.170  6.255  2.085
    0.000  0.000  4.170
    4.170  0.000  4.170
    2.085  2.085  4.170
    6.255  2.085  4.170
    0.000  4.170  4.170
    4.170  4.170  4.170
    2.085  6.255  4.170
    6.255  6.255  4.170
    2.085  0.000  6.255
    6.255  0.000  6.255
    0.000  2.085  6.255
    4.170  2.085  6.255
    2.085  4.170  6.255
    6.255  4.170  6.255
    0.000  6.255  6.255
    4.170  6.255  6.255
*O_VDB_PBE.psp BINARY NEWF
  LMAX=D
   32
    2.085  0.000  0.000
    6.255  0.000  0.000
    0.000  2.085  0.000
    4.170  2.085  0.000
    2.085  4.170  0.000
    6.255  4.170  0.000
    0.000  6.255  0.000
    4.170  6.255  0.000
    0.000  0.000  2.085
    4.170  0.000  2.085
    2.085  2.085  2.085
    6.255  2.085  2.085
    0.000  4.170  2.085
    4.170  4.170  2.085
    2.085  6.255  2.085
    6.255  6.255  2.085
    2.085  0.000  4.170
    6.255  0.000  4.170
    0.000  2.085  4.170
    4.170  2.085  4.170
    2.085  4.170  4.170
    6.255  4.170  4.170
    0.000  6.255  4.170
    4.170  6.255  4.170
    0.000  0.000  6.255
    4.170  0.000  6.255
    2.085  2.085  6.255
    6.255  2.085  6.255
    0.000  4.170  6.255
    4.170  4.170  6.255
    2.085  6.255  6.255
    6.255  6.255  6.255
 CONSTRAINTS
  FIX ATOMES
   32
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
 END CONSTRAINTS
&END

-- 
Dr. Bernd Kallies
Konrad-Zuse-Zentrum für Informationstechnik Berlin
Takustr. 7
14195 Berlin
Tel: +49-30-84185-270
Fax: +49-30-84185-311
e-mail: kallies at zib.de

