[CPMD-list] NCACHE size
Axel Kohlmeyer
axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Thu Aug 26 18:51:56 CEST 2004
On Thu, 26 Aug 2004, Philippe Blaise wrote:
PB> Hello,
hi!
PB>
PB> just to say that a value of ncache = 4 * 1024 is optimal on my opteron
PB> machine,
PB> but the difference between the default value 2 * 1024 and 4 * 1024 is
PB> below 10% in time.
well, even a 5% saving means a lot if you need to run your job for
weeks or months.
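
to put a rough number on that, here is a minimal python sketch. it takes the wat32 timings igor reported for the Itanium2 (quoted further down in this thread), picks the fastest NCACHE multiplier, and extrapolates the saving to a long run. the 30-day run length and the function names are assumptions made for illustration only; they are not part of cpmd.

```python
# Wall-clock timings (seconds) for the wat32 benchmark on Itanium2, as
# reported later in this thread. A key N stands for NCACHE = 1024*N.
timings = {8: 1372.0, 16: 1363.0, 32: 1388.0}


def best_setting(timings):
    """Return the N with the smallest measured run time."""
    return min(timings, key=timings.get)


def relative_saving(t_ref, t_new):
    """Fraction of the run time saved when going from t_ref to t_new."""
    return (t_ref - t_new) / t_ref


def hours_saved(run_days, saving):
    """Wall-clock hours saved over run_days at the given fractional saving."""
    return run_days * 24.0 * saving


if __name__ == "__main__":
    default_n = 8  # Itanium2 default quoted in the thread (NCACHE=1024*8)
    best_n = best_setting(timings)
    s = relative_saving(timings[default_n], timings[best_n])
    print(f"best N = {best_n}, saving vs N={default_n}: {100 * s:.2f}%")
    # Hypothetical 30-day production run (assumed length, not from the thread).
    print(f"hours saved in 30 days at that rate: {hours_saved(30, s):.1f}")
    print(f"hours saved in 30 days at 5%: {hours_saved(30, 0.05):.1f}")
```

a single timing per setting says nothing about run-to-run noise, so for a real scan it would be sensible to repeat each measurement and to include in-between values as well.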
PB> More generally, it seems to me that when you play with the ncache value,
PB> you don't always obtain
PB> huge differences; it's probably due to compiler optimizations and the large
PB> sizes of today's caches -
PB> so, was the effect of the ncache value larger some years ago?
PB> Anyway, the fft that comes with cpmd is very good, and I would be
PB> surprised if you
PB> obtained better performance with another one - for example on a NEC
PB> vector machine and
PB> on an alpha superscalar machine, the vendor implementations are not better.
that depends on the architecture. if your machine has little memory
bandwidth and few registers (as is the case for pentium-IV/Xeon/athlon),
clever use of prefetch instructions and exploiting the SIMD units
can give a large gain. e.g. on an athlon XP machine i found a 2.5x cpmd
speedup using ATLAS over a (fully optimized) BLAS. compared to a
generic ATLAS (not using any cpu-specific instructions) there was still
a 10% speedup (so this is mainly the effect of prefetching, since the
SIMD unit on an athlon XP does not support double precision).
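
the two figures above can be combined to estimate how the generic ATLAS build compares to the plain BLAS. this is only a back-of-the-envelope sketch: it assumes that "10% speedup" means a factor of 1.10 in run time and that the two factors simply multiply. the variable and function names are made up for this example.

```python
# Figures from the message above (Athlon XP, overall CPMD run time).
tuned_vs_blas = 2.5      # tuned ATLAS relative to the optimized BLAS
tuned_vs_generic = 1.10  # assumed reading of "10% speedup" over generic ATLAS


def implied_speedup(total, last_step):
    """Speedup of the first step when total = first_step * last_step."""
    return total / last_step


# Generic ATLAS relative to the optimized BLAS, under the assumptions above.
generic_vs_blas = implied_speedup(tuned_vs_blas, tuned_vs_generic)
print(f"generic ATLAS vs BLAS: about {generic_vs_blas:.2f}x")
```

under these assumptions most of the gain is already there with the generic ATLAS, and the cpu-specific instructions add the last 10% on top.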
salut,
axel kohlmeyer.
PB> Good luck,
PB>
PB> Philippe Blaise
PB>
PB>
PB> Axel Kohlmeyer wrote:
PB>
PB> >>>>"IK" == I Kozin <Kozin> writes:
PB> >>>>
PB> >>>>
PB> >
PB> >hello,
PB> >
PB> >IK> Hello,
PB> >IK> I'd appreciate hearing any comments on varying the NCACHE size
PB> >IK> (as given in mltfft.F) on various platforms.
PB> >IK> Particularly Xeon, Itanium2, Opteron, Power4,5, NEC sx6.
PB> >
PB> >IK> In the file above NCACHE is given as 1024*N where N varies.
PB> >IK> So I'd guess it has to do with L1 cache.
PB> >IK> For any i386 NCACHE=1024*10.
PB> >IK> Xeon's L1 cache is 8 KB. So should N be 8?
PB> >
PB> >well. i don't know whether there was historically an explicit
PB> >relation to the L1 cache. but due to the way modern cpus and compilers
PB> >work (e.g. automatic prefetching), there is none anymore.
PB> >for instance, the i386 value came about by running a series of
PB> >calculations with different values for NCACHE and picking the
PB> >value that gave the best performance on average. since the various
PB> >x86 platforms have quite different characteristics internally,
PB> >this always has to be a compromise. if you want to squeeze out
PB> >the last bit of performance, you may want to tune it for your
PB> >specific machine and your specific example.
PB> >
PB> >btw: the best way to optimize the fft is to use it as little as
PB> >possible (e.g. by using the REAL SPACE WFN KEEP keyword, provided
PB> >there is enough memory on your machine).
PB> >
PB> >IK> Itanium2: NCACHE=1024*8 but L1 cache size is 16 KB.
PB> >IK> This is the only machine I've experimented with so far.
PB> >IK> Taking N = 8, 16, 32 I found that N = 16 is marginally quicker
PB> >IK> (wat32 benchmark, run 1: 1372 s, 1363 s, 1388 s).
PB> >IK> So basically no need to bother.
PB> >
PB> >you should also try 'in-between' values.
PB> >
PB> >IK> More interesting cases:
PB> >IK> Itanium2 + HPUX: NCACHE=1024*64
PB> >IK> (obviously the L1 cache is the same as above)
PB> >
PB> >IK> Opteron: NCACHE=1024*2 (default) but L1 cache is 64 KB.
PB> >
PB> >this has not been tuned yet. if you find an optimal value,
PB> >please let us know.
PB> >
PB> >IK> BTW, are there any efforts in attempting to use ACML or MKL
PB> >IK> on AMD and Intel respectively for FFT instead of default FFT?
PB> >
PB> >not that i know of. feel free to try. the default fft is quite
PB> >competitive, considering that it is portable fortran code.
PB> >IMO it would be more interesting to see how well a port
PB> >to fftw3 performs.
PB> >
PB> >regards,
PB> > axel kohlmeyer.
PB> >
PB> >IK> Thanks,
PB> >
PB> >IK> Igor Kozin
PB> >IK> Computational Science & Engineering Dept.
PB> >IK> CCLRC Daresbury Laboratory
PB> >IK> Keckwick Lane
PB> >IK> Warrington
PB> >IK> WA4 4AD
PB> >IK> UK
PB> >
PB> >IK> i. kozin at dl.ac.uk
PB> >IK> +44 (0) 1925 603308
PB> >IK> http://www.cse.clrc.ac.uk/disco
PB> >IK> _______________________________________________
PB> >IK> CPMD-list mailing list
PB> >IK> CPMD-list at cpmd.org
PB> >IK> http://cpmd.org/mailman/listinfo/cpmd-list
PB> >
PB> >
PB> >
PB> >--
PB> >
PB> >=======================================================================
PB> >Axel Kohlmeyer e-mail: axel.kohlmeyer at theochem.ruhr-uni-bochum.de
PB> >Lehrstuhl fuer Theoretische Chemie Phone: ++49 (0)234/32-26673
PB> >Ruhr-Universitaet Bochum - NC 03/53 Fax: ++49 (0)234/32-14045
PB> >D-44780 Bochum http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/
PB> >=======================================================================
PB> >If you make something idiot-proof, the universe creates a better idiot.
PB>
--
=======================================================================
Dr. Axel Kohlmeyer e-mail: axel.kohlmeyer at rub.de
Lehrstuhl fuer Theoretische Chemie Phone: ++49 (0)234/32-26673
Ruhr-Universitaet Bochum - NC 03/53 Fax: ++49 (0)234/32-14045
D-44780 Bochum http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/
=======================================================================