[CPMD-list] script for PGI-LAMMPI
mjensen at fysik.dtu.dk
Fri Oct 11 20:26:48 CEST 2002
Hi
Sorry for interfering. Carme Rovira is teaching me CPMD,
and I'm involved in the problem of running CPMD with LAM MPI.
When running CPMD on two nodes (0 and 1)
mpirun N -x PP_LIBRARY_PATH cpmd-mpi-lam.x test.inp
we get this error:
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 525 failed on node n1 with exit status 1.
-----------------------------------------------------------------------------
Running on node 0 only, or on node 1 only, there is no problem.
A small MPI send-receive test program started in exactly the same way
runs fine on n0, on n1, and on both (n0,1, i.e. option N to mpirun).
The cluster has a single processor per node.
Since the Configure script has no LAM-MPI option for generating a
LAM Linux version of CPMD, we just used the LAM MPI wrappers around
the PGI compilers (mpif77 and mpicc), and in the .tcsh we have
setenv PGI /usr/local/lib/PGI
setenv LAMHOME /usr/local/lib/LAM/
setenv PATH ${PATH}:/usr/local/lib/LAM/bin
but we have had little luck when running. Any suggestions are highly appreciated.
Furthermore, does anyone have experience with compiling CPMD with
large file support on Linux?
Adding (in this case to an MPICH-PGI CPMD Makefile)
-Mlsf
as FFLAG
and
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE
as CFLAG
resulted in a perfectly clean compilation, but
the executable fails with this error:
-----------------------------------------------------------------------------
PARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARA
LOADPA| PROCESSOR 1 HAS NO G COMPONENT.
PROGRAM STOPS IN SUBROUTINE LOADPA| TOO MANY PROCESSORS [PROC= 0]
[0] MPI Abort by user Aborting program !
[0] Aborting program!
p0_1356: p4_error: : 999
Broken pipe
-----------------------------------------------------------------------------
Running the same job with a small (MPICH CPMD) executable works fine.
Is there any way out of this?
Thanks -
Morten Jensen
>
> >>> "CR" == Carme Rovira <crovira at pcb.ub.es> writes:
>
> CR> Dear Axel,
>
> dear carme,
>
> CR> You are absolutely right. Here are some details
> CR> of how we proceed:
>
> [setup detail deleted]
>
> that is perfect so far.
>
> >>
> >> if this works, you usually just have to create a file
> >> (e.g. hostlist) with all the machines you want to use
> >> in the lam-parallel-machine (name hosts where you want to use
> >> 2 cpus twice) and then initialize lam with
>
> CR> This is done using PBS batch system as (lines ripped from the
> CR> batch script):
>
> ok, pbs works well with lam. one more question, are you using
> dual cpu nodes? if yes, then 'pbsnodes -a' should give you something
> like this:
>
> dust
> state = free
> np = 2
> properties = dust,dualamd
> ntype = cluster
>
> important are 'np=2' and 'ntype=cluster'
> for single cpu accordingly:
>
> vivaldi
> state = job-exclusive
> np = 1
> properties = athlon,vivaldi,server,medium
> ntype = cluster
> jobs = 0/7945.monteverdi.theochem.ruhr-uni-bochum.de
>
> again, important are 'np=1' and 'ntype=cluster'.
> but this only determines how you use the cpus and how many
> and should not affect the running of a parallel job.
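
To check the 'np' and 'ntype' values for every node at once, the listing can be reduced to one line per node. The following is only a sketch: the function name summarize_pbsnodes is made up for illustration, and it assumes that a node name is a line holding a single word and that every attribute is a "key = value" line, as in the output quoted above.

```shell
#!/bin/sh
# Summarize `pbsnodes -a`-style output as "<node> <np> <ntype>" lines.
# Assumption: a node name is a line holding a single word, and each
# attribute is a "key = value" line, as in the listing quoted above.
summarize_pbsnodes() {
    awk '
        NF == 1 {
            if (node != "") print node, np, ntype
            node = $1; np = "?"; ntype = "?"
            next
        }
        $2 == "=" && $1 == "np"    { np = $3 }
        $2 == "=" && $1 == "ntype" { ntype = $3 }
        END { if (node != "") print node, np, ntype }
    '
}

# Query the batch system only where pbsnodes is installed.
if command -v pbsnodes >/dev/null 2>&1; then
    pbsnodes -a | summarize_pbsnodes
fi
```

For the two nodes quoted above this would print "dust 2 cluster" and "vivaldi 1 cluster". A "?" in the output means the attribute was not found for that node.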
>
>
>
> CR> #create nodelist
> CR> set nodelist = `cat $PBS_NODEFILE`
>
> CR> # calc number of nodes
> CR> set N = `wc $PBS_NODEFILE | awk '{print $1}'`
>
> CR> # create lamhost file
>
> CR> cat $PBS_NODEFILE > lamhosts
>
> >> lamboot -v hostlist
>
>
> ok, your script assumes a csh/tcsh syntax. have you verified, that
> this is actually the case? pbs usually passes the batch script to
> /bin/sh, if i remember correctly.
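
If the batch script is indeed interpreted by /bin/sh, the csh lines quoted above (set, backquotes assigned to a list) will not work as written. A Bourne-shell counterpart might look like the sketch below. The function names and the fallback file name "hostlist" are assumptions for illustration; PBS_NODEFILE is the variable used in the quoted script.

```shell
#!/bin/sh
# Bourne-shell counterpart of the csh lines quoted above.
# PBS_NODEFILE is set by PBS inside a job; the fallback "hostlist"
# is an assumption for running this by hand.
count_nodes() {
    # Count non-empty lines, i.e. one allocated slot per line.
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

write_lamhosts() {
    # $1 = node file from the batch system, $2 = LAM host file to create
    cat "$1" > "$2"
}

nodefile=${PBS_NODEFILE:-hostlist}
if [ -r "$nodefile" ]; then
    N=$(count_nodes "$nodefile")
    write_lamhosts "$nodefile" lamhosts
    echo "using $N slot(s) from $nodefile"
fi
```

Counting with awk instead of `wc | awk` also skips blank lines, so a trailing empty line in the node file does not inflate the count.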
>
>
> CR> Works on the nodes (i.e. lamboot -v lamhosts)
>
> >>
> >> then you can start parallel cpmd by
> >>
> >> mpirun C cpmd.x inputfile > outputfile
>
> CR> What is "C" doing (is it equivalent to "c")
>
> no. C is like N but starts multiple copies if you
> have hosts with multiple cpus.
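
The difference between N and C only shows up when LAM knows that a host has more than one cpu. Listing a host twice, as suggested earlier in the thread, is one way to tell it. As far as I know the LAM boot schema also accepts a "cpu=<n>" entry after the host name, but please check the lamboot documentation of your installation before relying on that. Under that assumption, a PBS-style node file (one line per allocated cpu) could be collapsed as sketched here; the function name collapse_nodefile is made up.

```shell
#!/bin/sh
# Collapse a node file that repeats a host once per cpu into
# "<host> cpu=<count>" lines, keeping the first-seen host order.
# Assumption: the LAM boot schema understands the "cpu=<n>" key.
collapse_nodefile() {
    awk '
        NF { if (!($1 in count)) order[++k] = $1; count[$1]++ }
        END { for (i = 1; i <= k; i++) printf "%s cpu=%d\n", order[i], count[order[i]] }
    ' "$1"
}

# Inside a PBS job, write the collapsed list to "lamhosts".
if [ -r "${PBS_NODEFILE:-}" ]; then
    collapse_nodefile "$PBS_NODEFILE" > lamhosts
fi
```

With a node file containing "dust", "dust", "vivaldi", this yields "dust cpu=2" and "vivaldi cpu=1". After booting LAM from such a file, "mpirun N" would start one process per host and "mpirun C" one per cpu.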
>
>
> >>
>
> CR> We tried this as well as
>
> CR> mpirun -O -s n0 N cpmd-mpi-lam-large.x test.inp > test.out
>
> CR> and
>
> CR> mpirun N cpmd-mpi-lam-large.x test.inp > test.out
>
> CR> but no success...
>
>
> all in all you can simplify that (and make it shell syntax independent)
> by just using the following script.
>
> cd $PBS_O_WORKDIR
>
> lamboot -v $PBS_NODEFILE
>
> mpirun -O C cpmd-mpi-lam-large.x test.inp > test.out
>
> lamhalt -v
>
>
> CR> Should we copy cpmd.x and input to the remote nodes,
> CR> i.e., tried adding the following to the pbs script:
>
> CR> #
> CR> shift nodelist
> CR> foreach node ($nodelist)
> CR> rcp /scratch/{test.inp,cpmd-mpi-lam-large.x} ${node}:/scratch
> CR> end
>
> CR> Note that the calculation is performed on /scratch
> CR> First everything (cpmd.x, input) is copied to here (/scratch)
> CR> one cd's to /scratch and then possibly remote copies
> CR> cpmd.x, input to the nodes
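
For reference, the csh copy loop quoted above can also be written without csh list handling. This is a sketch under assumptions: the function names are made up, the local host is excluded by name rather than by "shift", and the copy command can be swapped through a COPY variable (default rcp, as in the quoted script) so the loop can be dry-run with echo.

```shell
#!/bin/sh
# Copy input and executable to /scratch on every remote node.
remote_nodes() {
    # Unique hosts from node file $1, leaving out the local host $2.
    awk -v me="$2" 'NF && $1 != me && !seen[$1]++ { print $1 }' "$1"
}

stage_files() {
    # $1 = node file, $2 = local host name, remaining args = files to copy
    nodefile=$1
    me=$2
    shift 2
    for node in $(remote_nodes "$nodefile" "$me"); do
        # COPY is a hypothetical override; set COPY=echo for a dry run.
        ${COPY:-rcp} "$@" "$node:/scratch" || return 1
    done
}

# Inside a PBS job; assumes the node file uses the names `hostname` prints.
if [ -r "${PBS_NODEFILE:-}" ]; then
    stage_files "$PBS_NODEFILE" "$(hostname)" \
        /scratch/test.inp /scratch/cpmd-mpi-lam-large.x
fi
```

Filtering duplicates matters on nodes with more than one cpu, where the node file repeats the host name and the quoted foreach loop would copy the same files twice.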
>
> ok, but if you have a shared, nfs mounted home directory, you
> could put the pseudopotentials and the cpmd executables say in
> $HOME/cpmd and run it with:
>
> mpirun -O C $HOME/cpmd/cpmd-mpi-lam-large.x test.inp $HOME/cpmd > test.out
>
>
> >> or however you would run a serial cpmd job.
> >> after your job is finished you can stop the
> >> lam infrastructure with
> >>
> >> lamhalt -v
> >>
> >> or
> >>
> >> wipe -v hostlist
> >>
> CR> Also works fine
>
> >> if you have to submit your script to a batch system,
> >> then you have to determine how you get the list of
> >> allocated hosts from the batch system. with e.g.
> >> openpbs you have to use $PBS_NODEFILE instead of the
> >> file 'hostlist'.
>
> CR> Hope this is clear from the lines above
>
>
> yes, that was very helpful. if you still can not get it to work, you
> should also look into the stdout/stderr logs of the batch system.
> those are usually files with the name of the job script and an
> .e<jobid> or .o<jobid> appended.
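
A small helper can collect those log files for inspection. This is a sketch that assumes exactly the naming described above ("<job script name>.e<jobid>" and ".o<jobid>"); the function name list_pbs_logs and the script name cpmd.job are made up for illustration.

```shell
#!/bin/sh
# List the PBS stderr/stdout files that belong to a job script.
list_pbs_logs() {
    # $1 = directory the job was submitted from, $2 = job script name
    for f in "$1/$2".[eo][0-9]*; do
        # An unmatched glob stays literal, so test that the file exists.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Example call with a hypothetical script name; prints nothing if no logs exist.
list_pbs_logs . cpmd.job
```

The error file is the first place to look for messages from lamboot or mpirun, for example with: tail $(list_pbs_logs . cpmd.job)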
>
>
> good luck,
> axel.
>
> >>
> >> i hope this helps.
> >>
> >> cheers,
> >> axel.
> >>
> >> >
> >> > Saludos,
> >> >
> >> > Carme
> >> >
> >> > -------------------------------------------------------------
> >> > Carme Rovira i Virgili Tel: +34 93 4037112
> >> > Centre de Recerca en Química Teòrica Fax: +34 93 4037225
> >> > Parc Científic de Barcelona (http://www.pcb.ub.es)
> >> > Josep Samitier 1-5 Annex A E-mail: crovira at pcb.ub.es
> >> > 08028 Barcelona, Spain URL:http://www.qf.ub.es/personal/crovira
> >> > --------------------------------------------------------------
> >> > _______________________________________________
> >> > CPMD-list mailing list
> >> > CPMD-list at cpmd.org
> >> > http://www.cpmd.org/mailman/listinfo/cpmd-list
> >> >
> >>
> >> --
> >>
> >> =======================================================================
> >> Axel Kohlmeyer e-mail: axel.kohlmeyer at theochem.ruhr-uni-bochum.de
> >> Lehrstuhl fuer Theoretische Chemie Phone: ++49 (0)234/32-26673
> >> Ruhr-Universitaet Bochum - NC 03/53 Fax: ++49 (0)234/32-14045
> >> D-44780 Bochum http://www.theochem.ruhr-uni-bochum.de
> >> =======================================================================
> >> If you make something idiot-proof, the universe creates a better idiot.
>