[CPMD-list] Platform-dependent wave function optimization results
Axel Kohlmeyer
akohlmey at cmm.chem.upenn.edu
Tue May 1 21:06:52 CEST 2007
On Tue, 1 May 2007, Vladimir Stegailov wrote:
VS> Dear Axel,
dear vladimir,
VS> thank you for the prompt reply.
VS>
VS> > hard to tell. i don't know the code so well. it might be worth comparing
VS> > to a completely different platform. it looks a bit suspicious.
VS>
VS> I have made an attempt to perform the same WFO on the IBM cluster using
VS> xlf-compiled code according to the IBM-JS20-ESSL-MPI-SMP predefined Makefile.
VS> The -O3 option is set there by default.
VS> I did not expect it, but the results are _identical_ to the case of IFC
VS> v. 9.0 and MKL 8.0.1 with -O3 optimization.
that means that either intel fortran and IBM's xlf apply the same
optimizations at this point, or that the calculation is numerically
unstable. since you are using floating point numbers with limited
accuracy, not all algorithms can be mapped 1:1 onto the number space.
occasionally, the order in which you sum up numbers can make
a difference.
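
as a small illustration of the summation-order effect (a python sketch
using plain IEEE 754 doubles; the values are made up for the example and
have nothing to do with cpmd itself):

```python
import math

# three values whose sum depends on the order of addition in
# IEEE 754 double precision (illustrative numbers only)
a, b, c = 1.0e16, -1.0e16, 1.0

left_to_right = (a + b) + c   # the large terms cancel first, then 1.0 is added
reordered = a + (b + c)       # 1.0 is rounded away when added to -1.0e16

# accumulate ten copies of 0.1 one at a time, as a simple loop would
values = [0.1] * 10
naive = 0.0
for v in values:
    naive += v

# math.fsum tracks the rounding error and returns the correctly rounded sum
accurate = math.fsum(values)

print(left_to_right, reordered)
print(naive, accurate)
```

an optimizing compiler or a tuned BLAS is free to pick either order
(e.g. when unrolling or vectorizing a loop), so differences at this
level between two binaries are not by themselves a sign of an error.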
VS> However, when I changed the xlf option '-O3 -qstrict' to '-O0' the results
VS> changed and do not coincide with the -O0 case for IFC+MKL. I am
VS> going to make some more tests.
VS>
VS> I have another related question. One of the platforms I am working on
VS> corresponds to the LINUX_IA64_INTEL-MPI case (this is the case of IFC v. 9.0
VS> and MKL 8.0.1). By default there are no optimization options at all. I
VS> had included -O3 deliberately.
with intel fortran, -O (which is equivalent to -O2) is the default when
no optimization flag is given (check the documentation).
VS> But you wrote that for the IFC the '-O2 -unroll -pc64' options seem to be
VS> the best choice. Is it so for this case as well?
the itanium cpu is a significantly different design, but
the optimizations with -unroll should help here as well.
-pc64 is needed to set the floating point precision control to
64-bit (double) precision. otherwise intel defaults to 80-bit
extended precision, which can have very bad side effects (e.g.
real space and g-space density not being equal): for as long as
a part of a calculation can stay in the floating point registers,
it will be done in 80-bit, but as soon as it has to be stored in
memory, it will be rounded to 64-bit. this can lead to all
kinds of inconsistencies. with -pc64 you are always in 64-bit
mode and there is no difference whether you compute in registers
or not. also sine, cosine, exp, sqrt and related operations
are a tiny bit faster in 64-bit mode (a few fewer iterations
within the FPU are needed until convergence...). this should
be less of an issue on itanium, but you are welcome to try.
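
the register-vs-memory effect can be mimicked with a small python
sketch. python has no portable 80-bit type, so this is only an analogy:
the double stands in for the wider register format and single precision
stands in for the narrower memory format. the helper name
store_to_memory is made up for this example.

```python
import struct

def store_to_memory(x: float) -> float:
    """Round a double to single precision, standing in for a register spill."""
    return struct.unpack("f", struct.pack("f", x))[0]

x = 0.1
y = 3.0

kept_in_register = x * y           # stays at the full working precision
spilled = store_to_memory(x * y)   # rounded when it is written to "memory"

# the same expression now gives two different values depending on
# whether the intermediate result was stored or not
same = kept_in_register == spilled
difference = abs(kept_in_register - spilled)
print(same, difference)
```

whether a given intermediate is stored or kept in a register is decided
by the compiler and changes with the optimization level, which is why
forcing one consistent precision removes this source of differences.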
my rule of thumb is: small programs -> highest optimization,
large programs -> lower optimization.
cheers,
axel.
VS>
VS> Kind regards,
VS> Vladimir
VS>
VS> ----- Original Message -----
VS> From: "Axel Kohlmeyer" <akohlmey at cmm.chem.upenn.edu>
VS> To: "Vladimir Stegailov" <stegailov at ihed.ras.ru>
VS> Cc: <cpmd-list at cpmd.org>
VS> Sent: Tuesday, May 01, 2007 6:44 PM
VS> Subject: Re: [CPMD-list] Platform-dependent wave function optimization
VS> results
VS>
VS>
VS> > On Tue, 1 May 2007, Vladimir Stegailov wrote:
VS> >
VS> > VS> Dear Axel,
VS> >
VS> > dear vladimir,
VS> >
VS> > [...]
VS> > VS> However that figures differ from the other case
VS> > VS> 3. cpmd compiled with IFC v. 9.0 and MKL 8.0.1, -O3 optimization
VS> >
VS> > -O3 is generally best avoided with the intel compiler on any
VS> > quantum chemistry package. as of version 9.0 a few more aggressive
VS> > optimizations are enabled by default for -O3. regardless, i usually
VS> > found '-O2 -unroll -pc64' to be the best combination of flags
VS> > for many quantum chemical software packages to get a fast and
VS> > sufficiently accurate/reliable compilation with intel compilers.
VS> > in fact on some machines (older pentium 4) it turned out to be
VS> > up to 20% faster than using -O3 + SSE vectorization (each contributed
VS> > about half of the slowdown).
VS> >
VS> >
VS> > VS> For example, the output from the very first lanczos diagonalization
VS> > VS> step in the 1st and 2nd case is
VS> > VS> <<1:4<<<<<<<<<<<<<< LANCZOS DIAGONALIZATION <<<<<<<<<<<<<<<<<<<<
VS> > VS> >> TIME FOR INITIAL SUBSPACE DIAGONALIZATION: 21.98
VS> > VS> >> CYCLE  NCONV      B2MAX      B2MIN  #HPSI   TIME
VS> > VS>        1      0  3.367E-07  2.547E-11   6.00  77.39
VS> > VS>
VS> > VS> in the 3rd case is
VS> > VS> <<1:4<<<<<<<<<<<<<< LANCZOS DIAGONALIZATION <<<<<<<<<<<<<<<<<<<<
VS> > VS> >> TIME FOR INITIAL SUBSPACE DIAGONALIZATION: 0.61
VS> > VS> >> CYCLE  NCONV      B2MAX      B2MIN  #HPSI   TIME
VS> > VS>        1      0  3.360E-07  2.475E-11   6.00   2.18
VS> > VS>
VS> > VS> In the 3rd case WFO goes without warnings as well.
VS> > VS>
VS> > VS> Should I consider this difference in the output from the -O0 and -O3
VS> > VS> binaries as an error?
VS> >
VS> > hard to tell. i don't know the code so well. it might be worth comparing
VS> > to a completely different platform. it looks a bit suspicious.
VS> >
VS> > VS> Is it acceptable that the output changes slightly (on the round-off
VS> > VS> errors level) when the optimization level is increased?
VS> >
VS> > yes. some rounding errors are to be expected, but they are difficult
VS> > to distinguish from memory corruption or not properly initialized
VS> > arrays...
VS> > the intel compilers have the -zero flag, which might be worth trying out.
VS> > if you see a difference, then you have some uninitialized arrays.
VS> >
VS> > VS> > not a good sign. please note that the FEMD code is not thoroughly
VS> > VS> > tested on linux compilers.
VS> >
VS> > VS> Please could you specify on which compilers the FEMD code was
VS> > VS> tested: xlf, PGI fortran ... ?
VS> >
VS> > the last test of FEMD that i am aware of was in 2003 on an IBM regatta
VS> > using cpmd 3.7.2 and the IBM xlf compilers. most of the older
VS> > parts of the code were actually developed on IBM workstations!
VS> >
VS> > cheers,
VS> > axel.
VS> >
VS> > VS>
VS> > VS> Thank you.
VS> > VS>
VS> > VS> Kind regards,
VS> > VS> Vladimir
VS> > VS>
VS> > VS>
VS> > VS> ----- Original Message -----
VS> > VS> From: "Axel Kohlmeyer" <akohlmey at cmm.chem.upenn.edu>
VS> > VS> To: "Vladimir Stegailov" <stegailov at ihed.ras.ru>
VS> > VS> Cc: <cpmd-list at cpmd.org>
VS> > VS> Sent: Thursday, April 26, 2007 9:47 PM
VS> > VS> Subject: Re: [CPMD-list] Platform-dependent wave function optimization
VS> > VS> results
VS> > VS>
VS> > VS>
VS> > VS> > On Thu, 26 Apr 2007, Vladimir Stegailov wrote:
VS> > VS> >
VS> > VS> > vladimir,
VS> > VS> >
VS> > VS> >
VS> > VS> > VS> Dear colleagues,
VS> > VS> > VS>
VS> > VS> >
VS> > VS> > VS> is it normal to get essentially different WFO processes on
VS> > VS> > VS> different platforms using the same input script?
VS> > VS> >
VS> > VS> > how different is different? there are some small differences
VS> > VS> > possible, but within the accuracy of the method and parameters
VS> > VS> > you should get the same results.
VS> > VS> >
VS> > VS> > it is most likely, that you have a miscompiled binary
VS> > VS> > or a library with errors or numerical instabilities.
VS> > VS> >
VS> > VS> > VS> I use the 3.11 version and compare two platforms:
VS> > VS> > VS> 1. cpmd compiled with IFC v. 8.0 and libatlas_p4.a
VS> > VS> >
VS> > VS> > intel fortran 8.0 had a lot of problems, particularly
VS> > VS> > the original release. you have to upgrade to the latest
VS> > VS> > patchlevel, best to the latest patchlevel of 8.1, which
VS> > VS> > works quite reliably. if this is the bochum atlas binary,
VS> > VS> > please keep in mind that it is now very old and has not
VS> > VS> > been thoroughly tested against less frequently
VS> > VS> > used parts of cpmd. it should be easy to cross-check
VS> > VS> > against a different BLAS/LAPACK, e.g. mkl, to see if
VS> > VS> > the problem is in the compiler or the BLAS support.
VS> > VS> >
VS> > VS> > VS> 2. cpmd compiled with IFC v. 9.0 and MKL 8.0.1
VS> > VS> >
VS> > VS> > intel 9.0 in the original release was also very problematic,
VS> > VS> > though not as bad as 8.0. but please make sure you upgraded
VS> > VS> > to the latest patchlevel.
VS> > VS> >
VS> > VS> > VS> The input script is given below.
VS> > VS> >
VS> > VS> > VS> In the 1st case the WFO process initially gives several
VS> > VS> > VS> "FRIESNER_C| EIGENVECTOR 4 IS VERY BAD!" warnings, but eventually
VS> > VS> > VS> goes well and stops at NFI=179.
VS> > VS> >
VS> > VS> > not a good sign. please note that the FEMD code is not thoroughly
VS> > VS> > tested on linux compilers. i would recommend a cross-check compiling
VS> > VS> > everything with -zero to make sure everything is initialized. it
VS> > VS> > appears that some parts of the code (still) rely on the fact that
VS> > VS> > the compiler does this for you.
VS> > VS> >
VS> > VS> > VS> In the 2nd case there are no warnings and WFO stops much quicker
VS> > VS> > VS> at NFI=44.
VS> > VS> >
VS> > VS> > VS> Is the reason just the difference in FFT libraries? Could it be
VS> > VS> > VS> specific to the FEMD simulations?
VS> > VS> >
VS> > VS> > there is no indication that you have a different FFT. but also
VS> > VS> > different BLAS implementations can use (slightly) different
VS> > VS> > algorithms and thus produce differences that may direct the
VS> > VS> > wavefunction optimization into a different direction. since the
VS> > VS> > FEMD code is used infrequently, there is a much higher chance that
VS> > VS> > some of it gets miscompiled. it can also be more sensitive to small
VS> > VS> > differences in the libraries or compiler optimizations.
VS> > VS> >
VS> > VS> > the simplest tests are swapping libraries, turning off compiler
VS> > VS> > optimization (-O0) and swapping compilers. the risk of
VS> > VS> > miscompilation increases with higher optimization level.
VS> > VS> >
VS> > VS> > cheers,
VS> > VS> > axel.
VS> > VS> >
VS> > VS> > VS>
VS> > VS> > VS> I would appreciate any comments!
VS> > VS> > VS>
VS> > VS> > VS> Vladimir
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS> &CPMD
VS> > VS> > VS>
VS> > VS> > VS> FILEPATH
VS> > VS> > VS>
VS> > VS> > VS> /home/stegailov/testing/cpmd/md3/
VS> > VS> > VS>
VS> > VS> > VS> OPTIMIZE WAVEFUNCTION
VS> > VS> > VS>
VS> > VS> > VS> UNIT HESSIAN
VS> > VS> > VS>
VS> > VS> > VS> BFGS
VS> > VS> > VS>
VS> > VS> > VS> FREE ENERGY FUNCTIONAL
VS> > VS> > VS>
VS> > VS> > VS> LANCZOS DIAGONALISATION
VS> > VS> > VS>
VS> > VS> > VS> LANCZOS PARAMETERS
VS> > VS> > VS>
VS> > VS> > VS> 1 6 10 1.D-18
VS> > VS> > VS>
VS> > VS> > VS> TROTTER FACTOR
VS> > VS> > VS>
VS> > VS> > VS> 0.001
VS> > VS> > VS>
VS> > VS> > VS> BOGOLIUBOV CORRECTION OFF
VS> > VS> > VS>
VS> > VS> > VS> GRAM-SCHMIDT ORTHOGONALISATION
VS> > VS> > VS>
VS> > VS> > VS> CONVERGENCE
VS> > VS> > VS>
VS> > VS> > VS> 1.D-6 5.D-6
VS> > VS> > VS>
VS> > VS> > VS> MAXSTEP
VS> > VS> > VS>
VS> > VS> > VS> 5000
VS> > VS> > VS>
VS> > VS> > VS> BROYDEN MIXING
VS> > VS> > VS>
VS> > VS> > VS> 0.15 200 0.01 0 8
VS> > VS> > VS>
VS> > VS> > VS> ALEXANDER MIXING
VS> > VS> > VS>
VS> > VS> > VS> 1.1
VS> > VS> > VS>
VS> > VS> > VS> TEMPERATURE
VS> > VS> > VS>
VS> > VS> > VS> 400.
VS> > VS> > VS>
VS> > VS> > VS> ELECTRON TEMPERATURE
VS> > VS> > VS>
VS> > VS> > VS> 10000.
VS> > VS> > VS>
VS> > VS> > VS> COMPRESS WRITE32
VS> > VS> > VS>
VS> > VS> > VS> STRUCTURE BONDS
VS> > VS> > VS>
VS> > VS> > VS> ENERGYBANDS
VS> > VS> > VS>
VS> > VS> > VS> ELECTROSTATIC POTENTIAL
VS> > VS> > VS>
VS> > VS> > VS> RHOOUT
VS> > VS> > VS>
VS> > VS> > VS> &END
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS> &SYSTEM
VS> > VS> > VS>
VS> > VS> > VS> POINT GROUP
VS> > VS> > VS>
VS> > VS> > VS> AUTO
VS> > VS> > VS>
VS> > VS> > VS> SYMMETRY
VS> > VS> > VS>
VS> > VS> > VS> 1
VS> > VS> > VS>
VS> > VS> > VS> CELL
VS> > VS> > VS>
VS> > VS> > VS> 8.064 1.0 1.0 0.0 0.0 0.0 (8.064=2*4.032, 4.032A is the eq lattice const of Al)
VS> > VS> > VS>
VS> > VS> > VS> CUTOFF
VS> > VS> > VS>
VS> > VS> > VS> 15.000
VS> > VS> > VS>
VS> > VS> > VS> ANGSTROMS
VS> > VS> > VS>
VS> > VS> > VS> STATES
VS> > VS> > VS>
VS> > VS> > VS> 250
VS> > VS> > VS>
VS> > VS> > VS> SCALE
VS> > VS> > VS>
VS> > VS> > VS> TESR
VS> > VS> > VS>
VS> > VS> > VS> 1
VS> > VS> > VS>
VS> > VS> > VS> KPOINTS MONKHORST-PACK FULL
VS> > VS> > VS>
VS> > VS> > VS> 2 2 2
VS> > VS> > VS>
VS> > VS> > VS> &END
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS> &ATOMS
VS> > VS> > VS>
VS> > VS> > VS> *AL_SGS KLEINMAN-BYLANDER
VS> > VS> > VS>
VS> > VS> > VS> LMAX=D
VS> > VS> > VS>
VS> > VS> > VS> 32
VS> > VS> > VS>
VS> > VS> > VS> 0.0178998 -0.0197471 0.0145112
VS> > VS> > VS>
VS> > VS> > VS> ...
VS> > VS> > VS>
VS> > VS> > VS> 0.500011 0.750491 0.756745
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS> VELOCITIES
VS> > VS> > VS>
VS> > VS> > VS> 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
VS> > VS> > VS>
VS> > VS> > VS> -5.10982 0.101812 -1.43574
VS> > VS> > VS>
VS> > VS> > VS> ...
VS> > VS> > VS>
VS> > VS> > VS> 7.04835 2.96887 2.84766
VS> > VS> > VS>
VS> > VS> > VS> END VELOCITIES
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS> &END
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS> &BASIS
VS> > VS> > VS>
VS> > VS> > VS> PSEUDO AO 2
VS> > VS> > VS>
VS> > VS> > VS> 0 1
VS> > VS> > VS>
VS> > VS> > VS> &END
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS> &DFT
VS> > VS> > VS>
VS> > VS> > VS> NEWCODE
VS> > VS> > VS>
VS> > VS> > VS> &END
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> > VS>
VS> > VS> >
VS> > VS> > --
VS> > VS> >
VS> > VS> > =======================================================================
VS> > VS> > Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu http://www.cmm.upenn.edu
VS> > VS> > Center for Molecular Modeling -- University of Pennsylvania
VS> > VS> > Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
VS> > VS> > tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
VS> > VS> > =======================================================================
VS> > VS> > If you make something idiot-proof, the universe creates a better idiot.
VS> > VS> >
VS> >
VS> > --
VS> > =======================================================================
VS> > Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu http://www.cmm.upenn.edu
VS> > Center for Molecular Modeling -- University of Pennsylvania
VS> > Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
VS> > tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
VS> > =======================================================================
VS> > If you make something idiot-proof, the universe creates a better idiot.
VS> >
VS>
--
=======================================================================
Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu http://www.cmm.upenn.edu
Center for Molecular Modeling -- University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.