[CPMD-list] MPI jobs hang...for ever.
HW Sheng
hwsheng at jhu.edu
Fri Oct 4 20:30:47 CEST 2002
Hi, Axel.
> the reason for this question is, that some ethernet cards (and their
> linux drivers) are not well suited for the extreme load parallel cpmd
> jobs will create. we have made some bad experiences with 3com 3c905
> cards (in pc's though) and especially with the intel chipset based
> ethernet cards that originally came with our linux/alphas (i replaced them
> with then over 3 year old dec tulip chipset cards and they are very
> reliable).
>
Indeed, we are using 3Com Ethernet cards (precisely, 3Com 3c905-TX Fast
EtherLink) that came with the cluster. Did you experience what we are
experiencing, the constant crashes? Thanks for pointing that out to us. We
have clearly benefited a lot from your "bad experiences".
Axel, can you think of other remedies without replacing hardware components?
Would a compiler other than Compaq Fortran work wonders?
> also, if you only have a 100MBit connection, you should better try
> to run 10 jobs, each on only a single smp node, or you will waste
> most of the available cpu power.
>
Ditto. We are currently testing CPMD in SMP mode. It's slightly
faster.
> if you cannot do this, you should seriously consider hooking up those
> machines with a small SCI or myrinet network, and you will probably more
> than double the 'usable' cpu power for large jobs. compared to the cost
> of the machines itself the high-speed interconnect will come rather cheap.
Sad to say, this revamp has to be deferred because of the budget cap,
but we are looking forward to it.
I know it's tedious to maintain a cluster. Your input is invaluable to us,
and I am making progress. Kudos to you, Axel.
Howard