[CPMD-list] NCACHE size
Axel Kohlmeyer
axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Thu Aug 26 12:56:08 CEST 2004
>>> "IK" == I Kozin <Kozin> writes:
hello,
IK> Hello,
IK> I'd appreciate hearing any comments on varying the NCACHE size
IK> (as given in mltfft.F) on various platforms.
IK> Particularly Xeon, Itanium2, Opteron, Power4/Power5, NEC SX-6.
IK> In the file above NCACHE is given as 1024*N where N varies.
IK> So I'd guess it has to do with L1 cache.
IK> For any i386 NCACHE=1024*10.
IK> Xeon's L1 cache is 8 KB. So should N be 8?
well, i don't know whether there was historically an explicit
relation to the L1 cache, but due to the way modern cpus and compilers
work (e.g. automatic prefetching), there is none anymore.
for instance the i386 value came about by running a series of
calculations with different values for NCACHE and picking the
value that gave the best performance on average. since the various
x86 platforms have quite different characteristics internally,
this always has to be a compromise. if you want to squeeze out
the last bit of performance, you may want to tune it for your
specific machine and your specific example.
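
A minimal sketch of the selection step described above, assuming wall-clock timings have already been collected for each NCACHE value on a few test inputs. The function name, the dictionary layout and the timings are hypothetical and not taken from CPMD. Each timing is divided by the fastest time for the same input, so that long-running inputs do not dominate the average; that normalization is one possible choice, not necessarily how the i386 value was picked.

```python
from statistics import mean


def best_ncache_on_average(timings):
    """Return the NCACHE value with the lowest mean slowdown.

    timings: {ncache: {input_name: seconds}} (hypothetical layout).
    """
    if not timings:
        raise ValueError("no timings given")
    # Only compare inputs that were timed for every NCACHE value.
    inputs = set.intersection(*(set(t) for t in timings.values()))
    if not inputs:
        raise ValueError("no input was timed for every NCACHE value")
    # Fastest time seen for each input, used as the reference.
    fastest = {name: min(t[name] for t in timings.values()) for name in inputs}

    def mean_slowdown(ncache):
        return mean(timings[ncache][name] / fastest[name] for name in inputs)

    return min(timings, key=mean_slowdown)


# Made-up timings in seconds, for illustration only.
example = {
    1024 * 4: {"small": 10.0, "large": 200.0},
    1024 * 8: {"small": 9.0, "large": 210.0},
    1024 * 16: {"small": 12.0, "large": 190.0},
}
best = best_ncache_on_average(example)
```

With these made-up numbers no single value wins on both inputs, which is the kind of compromise the paragraph above refers to.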
btw: the best way to optimize the fft is to use it as little as
possible (e.g. by using the REAL SPACE WFN KEEP keyword, provided
there is enough memory on your machine).
IK> Itanium2: NCACHE=1024*8 but L1 cache size is 16 KB.
IK> This is the only machine I've experimented with so far.
IK> Taking N = 8, 16, 32 I found that N = 16 is marginally quicker
IK> (wat32 benchmark, run 1: 1372 s, 1363 s, 1388 s).
IK> So basically no need to bother.
you should also try 'in-between' values.
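
As a rough illustration of this suggestion, the sketch below takes the three timings quoted above, assuming they are listed in the same order as N = 8, 16, 32, and computes how far each is from the fastest one. It then lists multipliers between the tested ones that have not been tried yet. The step of 4 and the helper names are arbitrary assumptions, not anything prescribed by CPMD.

```python
# Timings from the quoted message (seconds), assumed to be in the order N = 8, 16, 32.
reported = {8: 1372.0, 16: 1363.0, 32: 1388.0}


def relative_slowdown(times):
    """Fractional slowdown of each N relative to the fastest run."""
    best = min(times.values())
    return {n: (t - best) / best for n, t in times.items()}


def in_between(tested, step=4):
    """Untested multipliers N between the smallest and largest tested value."""
    lo, hi = min(tested), max(tested)
    return [n for n in range(lo, hi + 1, step) if n not in tested]


slowdown = relative_slowdown(reported)
candidates = in_between(reported)            # multipliers N still to be tried
ncache_values = [1024 * n for n in candidates]
```

The whole spread in the reported timings is below 2 percent, so repeated runs would be needed to tell a real optimum from run-to-run variation, which the post does not report.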
IK> More interesting cases:
IK> Itanium2 + HPUX: NCACHE=1024*64
IK> (obviously the L1 cache is the same as above)
IK> Opteron: NCACHE=1024*2 (default) but L1 cache is 64 KB.
this has not been tuned yet. if you find an optimal value,
please let us know.
IK> BTW, are there any efforts in attempting to use ACML or MKL
IK> on AMD and Intel respectively for FFT instead of default FFT?
not that i know of. feel free to try. the default fft is quite
competitive, considering that it is portable fortran code.
IMO it would be more interesting to see how well a port
to fftw3 performs.
regards,
axel kohlmeyer.
IK> Thanks,
IK> Igor Kozin
IK> Computational Science & Engineering Dept.
IK> CCLRC Daresbury Laboratory
IK> Keckwick Lane
IK> Warrington
IK> WA4 4AD
IK> UK
IK> i. kozin at dl.ac.uk
IK> +44 (0) 1925 603308
IK> http://www.cse.clrc.ac.uk/disco
--
=======================================================================
Axel Kohlmeyer e-mail: axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Lehrstuhl fuer Theoretische Chemie Phone: ++49 (0)234/32-26673
Ruhr-Universitaet Bochum - NC 03/53 Fax: ++49 (0)234/32-14045
D-44780 Bochum http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.