[CPMD-list] parallel cpmd errors 'Null communicator' and 'semget failed'

Dan Chipman chipman.1 at nd.edu
Thu Dec 7 17:42:34 CET 2006


Dear cpmd users,

I am having trouble building a parallel MPICH version of CPMD on our 
cluster of EM64T machines. The compiler is Intel 8.1 with Math Kernel 
Library 7.2, and the OS is Fedora Core 3. The MPICH implementation is 
a standard version obtained from Argonne.

When I build parallel CPMD with the following makefile:

SHELL = /bin/sh
SRC  = .
DEST = .
BIN  = .
FFLAGS = -pc64  -tpp7 -O2 -unroll
LFLAGS = -L/opt/p4-intel/lib \
          /opt/mpich/p4-intel/lib/libmpich.a  \
          /opt/mpich/p4-intel/lib/libpmpich.a  \
          /opt/mpich/p4-intel/lib/libfmpich.a  \
          -Wl,-rpath,/opt/lib64 -L/opt/lib64 \
          /opt/gm/lib/libgm.a \
          /opt/intel/mkl72/lib/em64t/libmkl_lapack.a \
          /opt/intel/mkl72/lib/em64t/libmkl_em64t.a
CFLAGS = -O2 -Wall -m64
CPP = /lib/cpp -P -C -traditional
CPPFLAGS = -D__Linux -D__PGI -DFFT_DEFAULT -DPOINTER8 -DLINUX_IFC \
        -DPARALLEL -DMYRINET
NOOPT_FLAG =
CC = mpicc
FC = ifort -I/opt/p4-intel/include -c
LD = ifort -i-static -openmp
AR = /usr/bin/ar -r

it successfully produces the executable file cpmd.x. I then run this 
with the command:

time /opt/mpich/p4-intel/bin/mpirun -nolocal -v -np $NCPUS \
    $EXEDIR/cpmd.x $JOB.inp

with appropriate substitutions for the variables $NCPUS, $EXEDIR, and 
$JOB. However, executing the program always fails after just a few 
seconds.
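
As a sketch of the substitution, the run command can be wrapped in a
small script. The values of NCPUS, EXEDIR, and JOB below are
placeholders for illustration, not the actual settings, and
build_mpirun_cmd is a hypothetical helper that only prints the command
line so it can be inspected before running:

```shell
#!/bin/sh
# Placeholder values (assumptions); replace with the real CPU count,
# executable directory, and job name.
NCPUS=4
EXEDIR=/home/user/cpmd
JOB=test

# Print the mpirun command line with the three values substituted.
# Arguments: number of CPUs, executable directory, job name.
build_mpirun_cmd() {
    echo "/opt/mpich/p4-intel/bin/mpirun -nolocal -v -np $1 $2/cpmd.x $3.inp"
}

build_mpirun_cmd "$NCPUS" "$EXEDIR" "$JOB"
```

Printing the expanded command first makes it easy to confirm that no
variable is empty before the job is launched.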


When I try to run this on a single CPU I get the error message:

0 - MPI_COMM_RANK : Null communicator
p0_25827:  p4_error: : 197
[0]  Aborting program !

Alternatively, when I try to run this in parallel on several CPUs I 
get the error message:

rm_16079:  p4_error: semget failed for setnum: 0
p0_25714: (0.523438) net_recv failed for fd = 5
p0_25714:  p4_error: net_recv read, errno = : 104
Killed by signal 2.
Broken pipe
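
Since semget is the System V semaphore call, one thing that may be
worth checking (not confirmed as the cause here) is whether semaphore
arrays are accumulating on the nodes. A minimal sketch, assuming the
Linux `ipcs -s` column layout (key, semid, owner, perms, nsems); the
function name count_user_semaphores is made up for illustration:

```shell
#!/bin/sh
# Count System V semaphore arrays owned by a given user, reading
# `ipcs -s` style output on stdin. Assumes the Linux column layout:
#   key  semid  owner  perms  nsems
# Only rows whose first field looks like a hex key (0x...) are counted,
# which skips the banner and header lines.
count_user_semaphores() {
    awk -v user="$1" '$1 ~ /^0x/ && $3 == user { n++ } END { print n + 0 }'
}

# Usage on a compute node (not executed here):
#   ipcs -s | count_user_semaphores "$USER"
```

A count that keeps growing between failed runs would point toward
leftover semaphores rather than the build itself.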

Can anyone tell me what I am doing wrong? Thanks.
Dan Chipman


