[CPMD-list] parallel cpmd errors 'Null communicator' and 'semget failed'
Dan Chipman
chipman.1 at nd.edu
Thu Dec 7 17:42:34 CET 2006
Dear cpmd users,
I am having trouble building a working parallel (MPICH) version of CPMD
on our cluster of EM64T machines. The compiler is Intel 8.1 with Math
Kernel Library 7.2, and the OS is Fedora Core 3. The MPICH
implementation is the standard distribution obtained from Argonne.
When I create parallel cpmd with the following makefile:
SHELL = /bin/sh
SRC = .
DEST = .
BIN = .
FFLAGS = -pc64 -tpp7 -O2 -unroll
LFLAGS = -L/opt/p4-intel/lib \
        /opt/mpich/p4-intel/lib/libmpich.a \
        /opt/mpich/p4-intel/lib/libpmpich.a \
        /opt/mpich/p4-intel/lib/libfmpich.a \
        -Wl,-rpath,/opt/lib64 -L/opt/lib64 \
        /opt/gm/lib/libgm.a \
        /opt/intel/mkl72/lib/em64t/libmkl_lapack.a \
        /opt/intel/mkl72/lib/em64t/libmkl_em64t.a
CFLAGS = -O2 -Wall -m64
CPP = /lib/cpp -P -C -traditional
CPPFLAGS = -D__Linux -D__PGI -DFFT_DEFAULT -DPOINTER8 -DLINUX_IFC \
        -DPARALLEL -DMYRINET
NOOPT_FLAG =
CC = mpicc
FC = ifort -I/opt/p4-intel/include -c
LD = ifort -i-static -openmp
AR = /usr/bin/ar -r
it successfully produces the executable file cpmd.x. I then run this
with the command:
time /opt/mpich/p4-intel/bin/mpirun -nolocal -v -np $NCPUS \
    $EXEDIR/cpmd.x $JOB.inp
with appropriate substitutions for the variables $NCPUS, $EXEDIR, and
$JOB. However, executing the program always fails after just a few
seconds.
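For concreteness, the launch line with the substitutions spelled out can be sketched as below. The values of NCPUS, EXEDIR, and JOB are placeholders chosen for illustration, not the ones from the actual run, and the script only prints the command (a dry run) rather than starting mpirun.

```shell
#!/bin/sh
# Placeholder values (assumptions, not from the original run).
NCPUS=4                      # number of processes passed to -np
EXEDIR=/home/user/cpmd       # hypothetical directory containing cpmd.x
JOB=water                    # hypothetical job name; input file is $JOB.inp

MPIRUN=/opt/mpich/p4-intel/bin/mpirun

# Assemble the same command line as in the post, on a single line.
CMD="$MPIRUN -nolocal -v -np $NCPUS $EXEDIR/cpmd.x $JOB.inp"

# Dry run: show what would be executed.
echo "$CMD"
```

Printing the expanded command first makes it easy to confirm that no variable is empty before the job is actually launched.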
When I try to run this on a single CPU I get the error message:
0 - MPI_COMM_RANK : Null communicator
p0_25827: p4_error: : 197
[0] Aborting program !
Alternatively, when I try to run this in parallel on several CPUs I
get the error message:
rm_16079: p4_error: semget failed for setnum: 0
p0_25714: (0.523438) net_recv failed for fd = 5
p0_25714: p4_error: net_recv read, errno = : 104
Killed by signal 2.
Broken pipe
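Since the message names semget, the System V call that allocates semaphore sets, one thing that may be worth checking on each node is whether semaphore sets left over from earlier aborted runs are using up the kernel limits. The sketch below is only a diagnostic under that assumption and is not confirmed as the cause here; count_sem_sets is a hypothetical helper, not part of MPICH or CPMD.

```shell
#!/bin/sh
# Count System V semaphore sets in `ipcs -s` style output read from stdin.
# Data rows begin with a hexadecimal key such as 0x00000000; header and
# blank lines are ignored.
count_sem_sets() {
    awk '/^0x/ { n++ } END { print n + 0 }'
}

# Query the live system only if ipcs is available on this machine.
if command -v ipcs >/dev/null 2>&1; then
    echo "semaphore sets in use: $(ipcs -s | count_sem_sets)"
fi

# Linux kernel limits, in the order SEMMSL SEMMNS SEMOPM SEMMNI.
if [ -r /proc/sys/kernel/sem ]; then
    echo "kernel limits: $(cat /proc/sys/kernel/sem)"
fi
```

If the count is close to the last number in the limits line (SEMMNI), stale sets owned by the user can be listed with `ipcs -s` and removed individually with `ipcrm -s <semid>`. Because the run uses -nolocal, the check would need to be repeated on the remote nodes as well.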
Can anyone tell me what I am doing wrong? Thanks.
Dan Chipman