[CPMD-list] parallel cpmd errors 'Null communicator' and 'semget failed'

Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu
Sat Dec 9 02:15:03 CET 2006


On Thu, 7 Dec 2006, Dan Chipman wrote:

dan,

DC> When I try to run this on a single cpu I get the error message:
DC> 
DC> 0 - MPI_COMM_RANK : Null communicator
DC> p0_25827:  p4_error: : 197
DC> [0]  Aborting program !
DC> 
DC> Alternatively, when I try to run this in parallel on several cpus I 
DC> get the error message:
DC> 
DC> rm_16079:  p4_error: semget failed for setnum: 0
DC> p0_25714: (0.523438) net_recv failed for fd = 5
DC> p0_25714:  p4_error: net_recv read, errno = : 104
DC> Killed by signal 2.
DC> Broken pipe

can you run other MPI software on that machine?

it looks like either the (default) settings for SYSV shared
memory are incompatible with the requirements of MPICH 
(the redhat/fedora default setup is usually _very_ 
conservative), or you're running out of SYSV semaphores.
you can check whether this is the case with ipcs.
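
as a rough sketch of that check (not a definitive tool): on linux
the kernel-wide semaphore limits are the four integers in
/proc/sys/kernel/sem (SEMMSL, SEMMNS, SEMOPM, SEMMNI), and the
semaphore sets currently allocated are listed one per line, after a
header line, in /proc/sysvipc/sem. the helper names below are made up
for illustration, and the script assumes a linux /proc layout.

```python
from pathlib import Path

# Order of the four integers in /proc/sys/kernel/sem on Linux.
SEM_FIELDS = ("semmsl", "semmns", "semopm", "semmni")


def parse_sem_limits(text):
    """Parse the contents of /proc/sys/kernel/sem into a dict."""
    values = [int(v) for v in text.split()]
    if len(values) != len(SEM_FIELDS):
        raise ValueError(f"expected {len(SEM_FIELDS)} values, got {len(values)}")
    return dict(zip(SEM_FIELDS, values))


def count_semaphore_sets(text):
    """Count entries in /proc/sysvipc/sem (assumes the first line is a header)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return max(len(lines) - 1, 0)


def report():
    """Print the limits and current usage (hypothetical helper, Linux only)."""
    limits = parse_sem_limits(Path("/proc/sys/kernel/sem").read_text())
    in_use = count_semaphore_sets(Path("/proc/sysvipc/sem").read_text())
    print(limits)
    print(f"semaphore sets in use: {in_use} of {limits['semmni']} allowed")
```

calling report() on the affected node shows whether the number of
sets in use is at or near semmni. if it is, stale sets left over
from crashed jobs are a likely suspect.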

you may want to search the MPICH or CPMD mailing list 
archives for hints on how to work around this (this looks
familiar, but it might have been a while...).

generally i'd recommend using LAM/MPI, as it is
very clean, robust and performs more consistently 
than MPICH over ethernet. it also does not 'swallow' the
error messages leading up to crashes the way MPICH does,
which is what makes debugging CPMD input errors in
parallel jobs so difficult with MPICH.

openMPI is slated to be the successor to LAM and
combines many of the advantages of LAM/MPI with
features of other MPI packages, but it currently 
still tends to leave behind runaway communication
daemons after crashes, which then need to be cleaned up.

cheers,
 axel.

DC> Can anyone tell me what I am doing wrong? Thanks.


DC> Dan Chipman
DC> _______________________________________________
DC> CPMD-list mailing list
DC> CPMD-list at cpmd.org
DC> http://cpmd.org/mailman/listinfo/cpmd-list
DC> 

-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.




