[CPMD-list] CPMD parallel run crash when trying to use kpoints

Axel Kohlmeyer axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Sun Aug 29 16:27:53 CEST 2004


On Sun, 29 Aug 2004, Axel Kohlmeyer wrote:

eduardo,

sorry to reply to my own mail. but i dug a little deeper,
and hope to have found the real solution (or at least a better
workaround) to your problem.

the problem seems to be a combination of two issues.
1) if you specify BLOCK=... you apparently also have to give
   one of the three keywords CALCULATED, ALL, or NOSWAP.
   with the patch, BLOCK=... now implicitly turns on CALCULATED
   (as the manual seems to imply).
2) the record length was not correct for the pgi (and many
   other) compilers. i have added the appropriate preprocessor
   defines and a corresponding warning if this is not set.
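
to illustrate point 2: the unit of RECL for fortran direct-access files
is compiler dependent (bytes for some compilers, 4-byte words for others),
so the same number can describe records of different sizes. below is a
minimal python sketch of the arithmetic, assuming a record holds complex*16
coefficients. the function name, the unit sizes and the use of NGW=924
(taken from the output in the report further down) are illustrative
assumptions, not taken from the cpmd sources or from the patch.

```python
# Illustrative only: how a direct-access record length depends on the RECL unit.
# The unit sizes are assumptions about compiler defaults, not values taken
# from the CPMD sources or from the patch.

BYTES_PER_COMPLEX16 = 16  # one double-precision complex coefficient


def recl_value(n_coeffs: int, recl_unit_bytes: int) -> int:
    """Return the RECL value needed to hold n_coeffs complex*16 numbers.

    recl_unit_bytes is the size of one RECL unit for the compiler in use
    (1 if RECL counts bytes, 4 if it counts 4-byte words).
    """
    if n_coeffs < 0 or recl_unit_bytes <= 0:
        raise ValueError("n_coeffs must be >= 0 and recl_unit_bytes > 0")
    total_bytes = n_coeffs * BYTES_PER_COMPLEX16
    # Ceiling division, so the record is never shorter than the data.
    return -(-total_bytes // recl_unit_bytes)


if __name__ == "__main__":
    ngw = 924  # plane waves on processor 0 in the reported output
    print("RECL counted in bytes       :", recl_value(ngw, 1))
    print("RECL counted in 4-byte words:", recl_value(ngw, 4))
```

a value computed for one unit and used with the other gives records that
are four times too long or too short, which is the kind of mismatch a
compiler-specific preprocessor define is meant to avoid.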

please apply the attached patch (gzip -cd kp-fix-patch.diff.gz | patch ) 
to your sources and recompile. the patch is relative to the
current cpmd-v3.9 cvs branch, so patch may warn about
offsets. in any case, the changes are small and almost self-explanatory,
so you could probably also apply them by hand.
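
until the patch is applied, point 1 can also be checked on the input side.
here is a small python sketch, assuming the swap keyword is given on the
same line as KPOINTS. the helper name block_without_mode is hypothetical
and not part of cpmd.

```python
import re

# Keywords that, per point 1 above, must accompany BLOCK=... on unpatched builds.
SWAP_MODES = ("CALCULATED", "ALL", "NOSWAP")


def block_without_mode(line: str) -> bool:
    """Return True if a KPOINTS line sets BLOCK=n but names no swap mode."""
    tokens = line.upper().split()
    # Only lines that start with the KPOINTS keyword are considered.
    if not tokens or tokens[0] != "KPOINTS":
        return False
    has_block = any(re.fullmatch(r"BLOCK=\d+", t) for t in tokens)
    has_mode = any(t in SWAP_MODES for t in tokens)
    return has_block and not has_mode


if __name__ == "__main__":
    for candidate in (
        "KPOINTS MONKHORST-PACK BLOCK=100",
        "KPOINTS MONKHORST-PACK BLOCK=100 CALCULATED",
    ):
        print(candidate, "->", block_without_mode(candidate))
```

for the input quoted below, the line "KPOINTS MONKHORST-PACK BLOCK=100"
would be flagged. adding one of the three keywords to that line (or
dropping BLOCK=100) should clear it.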

good luck,
	axel.

p.s.: juerg, this should also patch cleanly into the devel tree.

> 
> >>> "EL" == EDUARDO JORGE LAMAS <LAMAS> writes:
> 
> EL> Hi, sorry if this is sent twice but I sent the first one from an
> EL> address that is not subscribed to the list and maybe that's why it
> EL> didn't go through.
> 
> EL> I am trying to install CPMD in our Opteron cluster. The compilation
> EL> goes well and the executable seems to be working fine except when I try
> EL> to use the kpoints keyword in a parallel run (the serial version works
> EL> well).
> 
> eduardo,
> 
> please try running without a swapfile (i.e. without
> BLOCK=100). i tried a smaller but similar input and 
> it always choked on the swapfile handling. i then tried
> to run the same input with older executables, and the latest
> executable that worked was a version 3.3 binary.
> 
> regards,
>         axel.
> 
> EL> The cpmd version that I am trying to install is the latest one (3.9.1)
> EL> but I had the same problem with version 3.7.2. The atlas library I am
> EL> using is the one that is available at Axel Kohlmeyer's web page.
> 
> EL> The error I am getting is: 
> 
> EL> PARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARA
> EL>   NCPU     NGW     NHG  PLANES  GXRAYS  HXRAYS ORBITALS Z-PLANES
> EL>      0     924    6042       5      98     350      11       1
> EL>      1     924    6040       5      98     350      10       1
> EL>      2     922    6036       5      98     350      11       1
> EL>      3     920    6036       5      98     350      11       1
> EL>      4     922    6036       5      98     350      10       1
> EL>      5     923    6040       5     100     350      11       1
> EL>      6     922    6037       5     100     350      11       1
> EL>      7     925    6029       5     100     350      10       1
> EL>      8     925    6037       5     100     350      11       1
> EL>      9     926    6020       5      99     349      11       1
> EL>     10     925    6035       5     100     348      10       1
> EL>     11     924    6038       5     100     348      11       1
> EL>                 G=0 COMPONENT ON PROCESSOR :     9
> EL>  PARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARA
> 
> EL>  ***    LOADPA| THE NEW SIZE OF THE PROGRAM IS    3544 kBYTES ***
> EL>  ***     RGGEN| THE NEW SIZE OF THE PROGRAM IS    3688 kBYTES ***
> EL> p4_2993:  p4_error: interrupt SIGFPE: 8
> EL> p8_3601:  p4_error: interrupt SIGFPE: 8
> EL> p1_11826:  p4_error: interrupt SIGFPE: 8
> EL> p5_2995:  p4_error: interrupt SIGFPE: 8
> EL> p2_11827:  p4_error: interrupt SIGFPE: 8
> EL> p6_2996:  p4_error: interrupt SIGFPE: 8
> EL> p9_3602:  p4_error: interrupt SIGFPE: 8
> EL> p7_2997:  p4_error: interrupt SIGFPE: 8
> EL> p10_3603:  p4_error: interrupt SIGFPE: 8
> EL> p11_3604:  p4_error: interrupt SIGFPE: 8
> EL> bm_list_11825: (1.570312) net_send: could not write to fd=5, errno = 32
> EL> bm_list_11825:  p4_error: net_send write: -1
> 
> EL> The same system will work ok in parallel if the kpoint keyword is removed.
> 
> EL> My make file is:
> 
> EL> SRC  = .
> EL> DEST = .
> EL> BIN  = .
> EL> #QMMM_FLAGS = -D__QMECHCOUPL
> EL> #QMMM_LIBS  = -L. -lmm
> EL> FFLAGS = -r8 -pc=64 -Msignextend
> EL> LFLAGS = -Bstatic -L. -latlas $(QMMM_LIBS)
> EL> CFLAGS =
> EL> CPP = /lib/cpp -P -C -traditional
> EL> CPPFLAGS = -D__Linux -D__PGI -DLAPACK -DFFT_DEFAULT -DPOINTER8 -D__pgf90 \
> EL>                -DPARALLEL -DMP_LIBRARY=__MPI
> EL> NOOPT_FLAG =
> EL> CC = cc
> EL> FC = mpif90 -c -O0 -tp k8-64
> EL> LD = mpif90 -O0 -tp k8-64
> EL> AR =
> 
> EL> And my input file is:
> 
> EL> &INFO
> EL>   Wavefunction optimization bulk platinum
> EL> &END
> EL> &CPMD
> EL>     rESTART WAVEFUNCTIONS OCCUPATION KPOINTS LATEST
> EL>     OPTIMIZE WAVEFUNCTION
> EL>     LSD
> EL>     FREE ENERGY FUNCTIONAL
> EL>     ELECTRON TEMPERATURE
> EL>       1000.
> EL>     STORE
> EL>       5
> EL> &END
> EL> &DFT
> EL>    FUNCTIONAL BLYP
> EL> &END
> EL> &SYSTEM
> EL>    POINT GROUP
> EL>     AUTO
> EL>    SYMMETRY
> EL>     14
> EL>    CELL DEGREE
> EL>      5.54846   1   1.5   90   90  120
> EL>    CUTOFF
> EL>      80.000
> EL>     ANGSTROMS
> EL>     TESR
> EL>      3
> EL>     KPOINTS MONKHORST-PACK BLOCK=100
> EL>      5 5 1
> EL> &END
> EL> &ATOMS
> EL> *Pt_TM_BLYPspd5.psp GAUSS-HERMIT=10 NLCC
> EL>  LMAX=D LOC=S
> EL>  12
> EL>         0.00000 0.00000 0.00000
> EL> .....
> EL> .....
> EL>         1.60193 0.00000 4.53093
> EL> &END
> EL> &BASIS
> EL>  PSEUDO AO 2 OCUPPATION
> EL> 0   2
> EL> 1   9
> EL> &END
> 
> 
> EL> Any help will be appreciated.
> EL> Best Regards,
> 
> EL> Eduardo
> 
> EL> _______________________________________________
> EL> CPMD-list mailing list
> EL> CPMD-list at cpmd.org
> EL> http://cpmd.org/mailman/listinfo/cpmd-list
> 
> 
> 
> --
> 
> =======================================================================
> Axel Kohlmeyer       e-mail: axel.kohlmeyer at theochem.ruhr-uni-bochum.de
> Lehrstuhl fuer Theoretische Chemie          Phone: ++49 (0)234/32-26673
> Ruhr-Universitaet Bochum - NC 03/53         Fax:   ++49 (0)234/32-14045
> D-44780 Bochum  http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/
> =======================================================================
> If you make something idiot-proof, the universe creates a better idiot.

-- 


=======================================================================
Dr. Axel Kohlmeyer                        e-mail: axel.kohlmeyer at rub.de
Lehrstuhl fuer Theoretische Chemie          Phone: ++49 (0)234/32-26673
Ruhr-Universitaet Bochum - NC 03/53         Fax:   ++49 (0)234/32-14045
D-44780 Bochum  http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/
=======================================================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: kp-fix-patch.diff.gz
Type: application/x-gzip
Size: 753 bytes
Desc: k-point swapfile patch
Url : http://cpmd.org/pipermail/cpmd-list/attachments/20040829/ca65592f/attachment.gz 

