[CPMD-list] CPMD parallel run crash when trying to use kpoints
Axel Kohlmeyer
axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Sun Aug 29 16:27:53 CEST 2004
On Sun, 29 Aug 2004, Axel Kohlmeyer wrote:
eduardo,
sorry to reply to my own mail. but i dug a little deeper,
and hope to have found the real solution (or at least a better
workaround) to your problem.
the problem seems to be a combination of two issues.
1) if you specify BLOCK=... you apparently also have to give
one of the three keywords CALCULATED, ALL, or NOSWAP.
with the patch, BLOCK=... implicitly turns on CALCULATED (as
seems to be implied by the manual).
2) the record length was not correct for the pgi compiler (and
many others). i have added the appropriate preprocessor
defines and a corresponding warning if it is not set.
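As an illustration of point 1, here is a minimal sketch of a pre-flight check one could run over an input file before submitting a job to an unpatched build. It is not part of CPMD or of the patch: the function name check_kpoints_line is hypothetical, and it assumes the options are whitespace-separated tokens on the KPOINTS line, which is a simplification of how CPMD itself parses the input.

```python
import re

# Companion keywords that, per the message above, must accompany BLOCK=n
# on an unpatched build.
SWAP_MODES = ("CALCULATED", "ALL", "NOSWAP")


def check_kpoints_line(line):
    """Return a warning string if BLOCK=n is given without a swap mode, else None.

    Hypothetical helper for illustration only; not a CPMD routine.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "KPOINTS":
        return None  # not a KPOINTS line; nothing to check
    has_block = any(re.fullmatch(r"BLOCK=\d+", t) for t in tokens[1:])
    has_mode = any(t in SWAP_MODES for t in tokens[1:])
    if has_block and not has_mode:
        return "BLOCK= given without CALCULATED, ALL, or NOSWAP"
    return None


if __name__ == "__main__":
    # The line from the reported input, then a variant with an explicit mode.
    print(check_kpoints_line("KPOINTS MONKHORST-PACK BLOCK=100"))
    print(check_kpoints_line("KPOINTS MONKHORST-PACK BLOCK=100 CALCULATED"))
```

The check only looks at the KPOINTS line itself; whether CALCULATED, ALL, or NOSWAP is the right choice for a given run is a separate question that the manual should answer.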
please apply the attached patch (gzip -cd kp-fix-patch.diff.gz | patch )
to your sources and recompile. the patch is relative to the
current cpmd-v3.9 cvs branch, so there may be warnings about
offsets. in any case, the changes are small and almost self-explanatory,
so you could probably also apply them by hand.
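For anyone who prefers to look before patching, or to apply the changes by hand, a small sketch like the following lists which source files a gzipped patch touches. It assumes the attachment is a unified diff (file headers starting with "+++ "), which has not been verified here; the helper names files_in_patch and read_gzipped_patch are made up for this example.

```python
import gzip


def files_in_patch(diff_text):
    """Return the file names found on '+++ ' header lines of a unified diff.

    Heuristic: an added source line that itself begins with '++ ' would
    also match, so treat the result as a hint, not as ground truth.
    """
    names = []
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # Header looks like '+++ path<TAB>timestamp'; keep only the path.
            names.append(line[4:].split("\t")[0].strip())
    return names


def read_gzipped_patch(path):
    """Decompress a .diff.gz file and return its contents as text."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return fh.read()


if __name__ == "__main__":
    import sys

    # Usage: python list_patch_files.py kp-fix-patch.diff.gz
    if len(sys.argv) > 1:
        for name in files_in_patch(read_gzipped_patch(sys.argv[1])):
            print(name)
```

This only reads the patch and changes nothing on disk, so it is safe to run in the source directory before piping the same file into patch.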
good luck,
axel.
p.s.: juerg, this should also patch cleanly into the devel tree.
>
> >>> "EL" == EDUARDO JORGE LAMAS <LAMAS> writes:
>
> EL> Hi, sorry if this is sent twice but I sent the first one from an
> EL> address that is not subscribed to the list and maybe that's why it
> EL> didn't go thru.
>
> EL> I am trying to install CPMD in our Opteron cluster. The compilation
> EL> goes well and the executable seems to be working fine except when I try
> EL> to use the kpoints keyword in a parallel run (the serial version works
> EL> well).
>
> eduardo,
>
> please try running without a swapfile (i.e. without
> BLOCK=100). i tried a smaller but similar input and
> it always choked on the swapfile handling. i then tried
> to run the same input with older executables and the latest
> executable, that would work, was a version 3.3 binary.
>
> regards,
> axel.
>
> EL> The cpmd version that I am trying to install is the last one (3.9.1)
> EL> but I had the same problem with version 3.7.2. The atlas library I am
> EL> using is the one that is available at Axel Kohlmeyer's web page.
>
> EL> The error I am getting is:
>
> EL> PARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARA
> EL> NCPU NGW NHG PLANES GXRAYS HXRAYS ORBITALS Z-PLANES
> EL> 0 924 6042 5 98 350 11 1
> EL> 1 924 6040 5 98 350 10 1
> EL> 2 922 6036 5 98 350 11 1
> EL> 3 920 6036 5 98 350 11 1
> EL> 4 922 6036 5 98 350 10 1
> EL> 5 923 6040 5 100 350 11 1
> EL> 6 922 6037 5 100 350 11 1
> EL> 7 925 6029 5 100 350 10 1
> EL> 8 925 6037 5 100 350 11 1
> EL> 9 926 6020 5 99 349 11 1
> EL> 10 925 6035 5 100 348 10 1
> EL> 11 924 6038 5 100 348 11 1
> EL> G=0 COMPONENT ON PROCESSOR : 9
> EL> PARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARA
>
> EL> *** LOADPA| THE NEW SIZE OF THE PROGRAM IS 3544 kBYTES ***
> EL> *** RGGEN| THE NEW SIZE OF THE PROGRAM IS 3688 kBYTES ***
> EL> p4_2993: p4_error: interrupt SIGFPE: 8
> EL> p8_3601: p4_error: interrupt SIGFPE: 8
> EL> p1_11826: p4_error: interrupt SIGFPE: 8
> EL> p5_2995: p4_error: interrupt SIGFPE: 8
> EL> p2_11827: p4_error: interrupt SIGFPE: 8
> EL> p6_2996: p4_error: interrupt SIGFPE: 8
> EL> p9_3602: p4_error: interrupt SIGFPE: 8
> EL> p7_2997: p4_error: interrupt SIGFPE: 8
> EL> p10_3603: p4_error: interrupt SIGFPE: 8
> EL> p11_3604: p4_error: interrupt SIGFPE: 8
> EL> bm_list_11825: (1.570312) net_send: could not write to fd=5, errno = 32
> EL> bm_list_11825: p4_error: net_send write: -1
>
> EL> The same system will work ok in parallel if the kpoint keyword is removed.
>
> EL> My make file is:
>
> EL> SRC = .
> EL> DEST = .
> EL> BIN = .
> EL> #QMMM_FLAGS = -D__QMECHCOUPL
> EL> #QMMM_LIBS = -L. -lmm
> EL> FFLAGS = -r8 -pc=64 -Msignextend
> EL> LFLAGS = -Bstatic -L. -latlas $(QMMM_LIBS)
> EL> CFLAGS =
> EL> CPP = /lib/cpp -P -C -traditional
> EL> CPPFLAGS = -D__Linux -D__PGI -DLAPACK -DFFT_DEFAULT -DPOINTER8 -D__pgf90 \
> EL> -DPARALLEL -DMP_LIBRARY=__MPI
> EL> NOOPT_FLAG =
> EL> CC = cc
> EL> FC = mpif90 -c -O0 -tp k8-64
> EL> LD = mpif90 -O0 -tp k8-64
> EL> AR =
>
> EL> And my input file is:
>
> EL> &INFO
> EL> Wavefunction optimization bulk platinum
> EL> &END
> EL> &CPMD
> EL> rESTART WAVEFUNCTIONS OCCUPATION KPOINTS LATEST
> EL> OPTIMIZE WAVEFUNCTION
> EL> LSD
> EL> FREE ENERGY FUNCTIONAL
> EL> ELECTRON TEMPERATURE
> EL> 1000.
> EL> STORE
> EL> 5
> EL> &END
> EL> &DFT
> EL> FUNCTIONAL BLYP
> EL> &END
> EL> &SYSTEM
> EL> POINT GROUP
> EL> AUTO
> EL> SYMMETRY
> EL> 14
> EL> CELL DEGREE
> EL> 5.54846 1 1.5 90 90 120
> EL> CUTOFF
> EL> 80.000
> EL> ANGSTROMS
> EL> TESR
> EL> 3
> EL> KPOINTS MONKHORST-PACK BLOCK=100
> EL> 5 5 1
> EL> &END
> EL> &ATOMS
> EL> *Pt_TM_BLYPspd5.psp GAUSS-HERMIT=10 NLCC
> EL> LMAX=D LOC=S
> EL> 12
> EL> 0.00000 0.00000 0.00000
> EL> .....
> EL> .....
> EL> 1.60193 0.00000 4.53093
> EL> &END
> EL> &BASIS
> EL> PSEUDO AO 2 OCUPPATION
> EL> 0 2
> EL> 1 9
> EL> &END
>
>
> EL> Any help will be appreciated.
> EL> Best Regards,
>
> EL> Eduardo
>
> EL> _______________________________________________
> EL> CPMD-list mailing list
> EL> CPMD-list at cpmd.org
> EL> http://cpmd.org/mailman/listinfo/cpmd-list
>
>
>
> --
>
> =======================================================================
> Axel Kohlmeyer e-mail: axel.kohlmeyer at theochem.ruhr-uni-bochum.de
> Lehrstuhl fuer Theoretische Chemie Phone: ++49 (0)234/32-26673
> Ruhr-Universitaet Bochum - NC 03/53 Fax: ++49 (0)234/32-14045
> D-44780 Bochum http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/
> =======================================================================
> If you make something idiot-proof, the universe creates a better idiot.
>
>
--
=======================================================================
Dr. Axel Kohlmeyer e-mail: axel.kohlmeyer at rub.de
Lehrstuhl fuer Theoretische Chemie Phone: ++49 (0)234/32-26673
Ruhr-Universitaet Bochum - NC 03/53 Fax: ++49 (0)234/32-14045
D-44780 Bochum http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/
=======================================================================
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kp-fix-patch.diff.gz
Type: application/x-gzip
Size: 753 bytes
Desc: k-point swapfile patch
Url : http://cpmd.org/pipermail/cpmd-list/attachments/20040829/ca65592f/attachment.gz