[CPMD-list] Platform-dependent wave function optimization results

Vladimir Stegailov stegailov at ihed.ras.ru
Tue May 1 21:00:31 CEST 2007


Dear Axel,

thank you for the prompt reply.

> hard to tell. i don't know the code so well. it might be worth comparing
> to a completely different platform. it looks a bit suspicious.

I have tried to perform the same WFO on the IBM cluster using xlf-compiled 
code built with the predefined IBM-JS20-ESSL-MPI-SMP Makefile. 
The -O3 option is set there by default.
I did not expect it, but the results are _identical_ to the case of IFC 
v. 9.0 and MKL 8.0.1 with -O3 optimization.

However, when I changed the xlf option from '-O3 -qstrict' to '-O0', the 
results became different, and they do not coincide with the -O0 case for 
IFC+MKL. I am going to run some more tests.
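
To put a number on "different", here is a minimal Python sketch (not part 
of CPMD; the helper names parse_cycle_line and relative_difference are made 
up for illustration). It takes the first Lanczos CYCLE row from two outputs 
and reports the relative difference of B2MAX and B2MIN. The two rows are the 
ones quoted further down in this thread, and the sketch assumes the 
six-column layout CYCLE NCONV B2MAX B2MIN #HPSI TIME.

```python
def parse_cycle_line(line):
    """Parse one data row printed under the CYCLE/NCONV/B2MAX/... header."""
    fields = line.split()
    return {
        "cycle": int(fields[0]),
        "nconv": int(fields[1]),
        "b2max": float(fields[2]),
        "b2min": float(fields[3]),
        "hpsi": float(fields[4]),
        "time": float(fields[5]),
    }


def relative_difference(a, b):
    """Symmetric relative difference; 0.0 when both values are zero."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale


# Rows copied from the outputs quoted in this thread.
row_a = parse_cycle_line("1         0    3.367E-07    2.547E-11      6.00     77.39")
row_b = parse_cycle_line("1         0    3.360E-07    2.475E-11      6.00      2.18")

for key in ("b2max", "b2min"):
    print(key, relative_difference(row_a[key], row_b[key]))
```

For these two rows the relative differences come out at roughly 2e-3 for 
B2MAX and 3e-2 for B2MIN. The output prints only four significant digits, 
so a comparison of this kind cannot resolve anything much smaller than that.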

I have another related question. One of the platforms I am working on 
corresponds to the LINUX_IA64_INTEL-MPI case (this is the case with IFC 
v. 9.0 and MKL 8.0.1). By default there are no optimization options at all 
in that Makefile; I added -O3 deliberately.
But you wrote that for IFC the '-O2 -unroll -pc64' options seem to be 
the best choice. Is that so for this case as well?

Kind regards,
Vladimir

----- Original Message ----- 
From: "Axel Kohlmeyer" <akohlmey at cmm.chem.upenn.edu>
To: "Vladimir Stegailov" <stegailov at ihed.ras.ru>
Cc: <cpmd-list at cpmd.org>
Sent: Tuesday, May 01, 2007 6:44 PM
Subject: Re: [CPMD-list] Platform-dependent wave function optimization 
results


> On Tue, 1 May 2007, Vladimir Stegailov wrote:
>
> VS> Dear Axel,
>
> dear vladimir,
>
> [...]
> VS> However that figures differ from the other case
> VS>     3. cpmd compiled with IFC v. 9.0 and MKL 8.0.1, -O3 optimization
>
> -O3 is generally best avoided with the intel compiler on any
> quantum chemistry package. as of version 9.0 a few more aggressive
> optimizations are enabled by default for -O3. regardless, i usually
> found '-O2 -unroll -pc64' to be the best combination of flags
> for many quantum chemical software packages to get a fast and
> sufficiently accurate/reliable compilation with intel compilers.
> in fact on some machines (older pentium 4) it turned out to be
> up to 20% faster than using -O3 + SSE vectorization (each contributed
> about half of the slowdown).
>
>
> VS> For example, the output from the very first lanczos diagonalization
> VS> step in the 1st and 2nd case is
> VS>  <<1:4<<<<<<<<<<<<<< LANCZOS DIAGONALIZATION <<<<<<<<<<<<<<<<<<<<
> VS>  >> TIME FOR INITIAL SUBSPACE DIAGONALIZATION:              21.98
> VS>  >> CYCLE     NCONV        B2MAX        B2MIN     #HPSI      TIME
> VS>         1         0    3.367E-07    2.547E-11      6.00     77.39
> VS>
> VS> in the 3rd case is
> VS> <<1:4<<<<<<<<<<<<<< LANCZOS DIAGONALIZATION <<<<<<<<<<<<<<<<<<<<
> VS>  >> TIME FOR INITIAL SUBSPACE DIAGONALIZATION:               0.61
> VS>  >> CYCLE     NCONV        B2MAX        B2MIN     #HPSI      TIME
> VS>         1         0    3.360E-07    2.475E-11      6.00      2.18
> VS>
> VS> In the 3rd case WFO goes without warnings as well.
> VS>
> VS> Should I consider this difference in the output from the -O0 and -O3
> VS> binaries as an error?
>
> hard to tell. i don't know the code so well. it might be worth comparing
> to a completely different platform. it looks a bit suspicious.
>
> VS> Is it acceptable that the output changes slightly (on the round-off
> VS> errors level) when the optimization level is increased?
>
> yes. some rounding errors are to be expected, but they are difficult
> to distinguish from memory corruption or not properly initialized
> arrays...
> the intel compilers have the -zero flag, might be worth trying out.
> if you see a difference, then you have some uninitialized arrays.
>
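
(Illustration only, not CPMD code.) A minimal Python sketch of why some 
change at the round-off level is expected when the optimization level or 
the BLAS library changes: floating-point addition is not associative, so 
any reordering of a sum can change the result. The values are chosen 
artificially to make the effect large; in real code it normally shows up 
only in the trailing digits.

```python
import math

# Two large terms that cancel, plus two small terms.
values = [1.0e16, 1.0, -1.0e16, 1.0]

# Plain left-to-right accumulation: the first 1.0 is lost, because
# 1.0e16 + 1.0 rounds back to 1.0e16 in double precision.
left_to_right = 0.0
for v in values:
    left_to_right += v

# Same terms, different grouping: cancel the large terms first.
reordered = (values[0] + values[2]) + (values[1] + values[3])

# math.fsum tracks the lost low-order bits and returns the exact sum.
exact = math.fsum(values)

print(left_to_right, reordered, exact)  # 1.0 2.0 2.0
```

Differences of this kind are harmless by themselves, which is exactly why 
they are hard to tell apart from the effects of uninitialized memory.
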
> VS> > not a good sign. please note that the FEMD code is not thoroughly
> VS> > tested on linux compilers.
>
> VS> Please could you specify on which compilers the FEMD code was
> VS> tested: xlf, PGI fortran ... ?
>
> the last test of FEMD that i am aware of was in 2003 on an IBM regatta
> using cpmd 3.7.2 and the IBM xlf compilers. most of the older
> parts of the code were actually developed on IBM workstations!
>
> cheers,
>   axel.
>
> VS>
> VS> Thank you.
> VS>
> VS> Kind regards,
> VS> Vladimir
> VS>
> VS>
> VS> ----- Original Message ----- 
> VS> From: "Axel Kohlmeyer" <akohlmey at cmm.chem.upenn.edu>
> VS> To: "Vladimir Stegailov" <stegailov at ihed.ras.ru>
> VS> Cc: <cpmd-list at cpmd.org>
> VS> Sent: Thursday, April 26, 2007 9:47 PM
> VS> Subject: Re: [CPMD-list] Platform-dependent wave function optimization
> VS> results
> VS>
> VS>
> VS> > On Thu, 26 Apr 2007, Vladimir Stegailov wrote:
> VS> >
> VS> > vladimir,
> VS> >
> VS> >
> VS> > VS> Dear colleagues,
> VS> > VS>
> VS> >
> VS> > VS> is it normal to get essentially different WFO processes on
> VS> > VS> different platforms using the same input script?
> VS> >
> VS> > how different is different. there are some small differences
> VS> > possible, but within the accuracy of the method and parameters
> VS> > you should get the same results.
> VS> >
> VS> > it is most likely, that you have a miscompiled binary
> VS> > or a library with errors or numerical instabilities.
> VS> >
> VS> > VS> I use the 3.11 version and compare two platforms:
> VS> > VS> 1. cpmd compiled with IFC v. 8.0 and libatlas_p4.a
> VS> >
> VS> > intel fortran 8.0 had a lot of problems, particularly
> VS> > the original release. you have to upgrade to the latest
> VS> > patchlevel, best to the latest patchlevel of 8.1, which
> VS> > works quite reliably. if this is the bochum atlas binary,
> VS> > please keep in mind, that it is now very old and has not
> VS> > been thoroughly tested against less frequently
> VS> > used parts of cpmd. it should be easy to cross check
> VS> > against a different BLAS/LAPACK, e.g. mkl to see if
> VS> > the problem is in the compiler or the BLAS support.
> VS> >
> VS> > VS> 2. cpmd compiled with IFC v. 9.0 and MKL 8.0.1
> VS> >
> VS> > intel 9.0 in the original release was also very problematic,
> VS> > though not as bad as 8.0. but please make sure you upgraded
> VS> > to the latest patchlevel.
> VS> >
> VS> > VS> The input script is given below.
> VS> >
> VS> > VS> In the 1st case the WFO process initially gives several
> VS> > VS> "FRIESNER_C| EIGENVECTOR 4 IS VERY BAD!" warnings, but eventually
> VS> > VS> goes well and stops at NFI=179.
> VS> >
> VS> > not a good sign. please note that the FEMD code is not thoroughly
> VS> > tested on linux compilers. i would recommend a cross-check compiling
> VS> > everything with -zero to make sure everything is initialized. it
> VS> > appears that some parts of the code (still) rely on the fact that
> VS> > the compiler does this for you.
> VS> >
> VS> > VS> In the 2nd case there are no warnings and WFO stops much quicker
> VS> > VS> at NFI=44.
> VS> >
> VS> > VS> Is the reason just the difference in FFT libraries? Could it be
> VS> > VS> specific to the FEMD simulations?
> VS> >
> VS> > there is no indication that you have a different FFT. but also
> VS> > different BLAS implementations can use (slightly) different
> VS> > algorithms and thus produce differences that may direct the
> VS> > wavefunction optimization into a different direction. since the
> VS> > FEMD code is used infrequently, there is a much higher chance that
> VS> > some of it gets miscompiled. it can also be more sensitive to small
> VS> > differences in the libraries or compiler optimizations.
> VS> >
> VS> > the simplest tests are swapping libraries, turning off compiler
> VS> > optimization (-O0) and swapping compilers. the risk of
> VS> > miscompilation increases with higher optimization level.
> VS> >
> VS> > cheers,
> VS> >    axel.
> VS> >
> VS> > VS>
> VS> > VS> I would appreciate any comments!
> VS> > VS>
> VS> > VS> Vladimir
> VS> > VS>
> VS> > VS>
> VS> > VS> &CPMD
> VS> > VS>
> VS> > VS>     FILEPATH
> VS> > VS>
> VS> > VS>       /home/stegailov/testing/cpmd/md3/
> VS> > VS>
> VS> > VS>     OPTIMIZE WAVEFUNCTION
> VS> > VS>
> VS> > VS>     UNIT HESSIAN
> VS> > VS>
> VS> > VS>     BFGS
> VS> > VS>
> VS> > VS>     FREE ENERGY FUNCTIONAL
> VS> > VS>
> VS> > VS>     LANCZOS DIAGONALISATION
> VS> > VS>
> VS> > VS>     LANCZOS PARAMETERS
> VS> > VS>
> VS> > VS>       1   6 10   1.D-18
> VS> > VS>
> VS> > VS>     TROTTER FACTOR
> VS> > VS>
> VS> > VS>       0.001
> VS> > VS>
> VS> > VS>     BOGOLIUBOV CORRECTION OFF
> VS> > VS>
> VS> > VS>     GRAM-SCHMIDT ORTHOGONALISATION
> VS> > VS>
> VS> > VS>     CONVERGENCE
> VS> > VS>
> VS> > VS>       1.D-6  5.D-6
> VS> > VS>
> VS> > VS>     MAXSTEP
> VS> > VS>
> VS> > VS>       5000
> VS> > VS>
> VS> > VS>     BROYDEN MIXING
> VS> > VS>
> VS> > VS>       0.15 200   0.01  0   8
> VS> > VS>
> VS> > VS>     ALEXANDER MIXING
> VS> > VS>
> VS> > VS>       1.1
> VS> > VS>
> VS> > VS>     TEMPERATURE
> VS> > VS>
> VS> > VS>       400.
> VS> > VS>
> VS> > VS>     ELECTRON TEMPERATURE
> VS> > VS>
> VS> > VS>       10000.
> VS> > VS>
> VS> > VS>     COMPRESS WRITE32
> VS> > VS>
> VS> > VS>     STRUCTURE BONDS
> VS> > VS>
> VS> > VS>     ENERGYBANDS
> VS> > VS>
> VS> > VS>     ELECTROSTATIC POTENTIAL
> VS> > VS>
> VS> > VS>     RHOOUT
> VS> > VS>
> VS> > VS>  &END
> VS> > VS>
> VS> > VS>
> VS> > VS>
> VS> > VS>  &SYSTEM
> VS> > VS>
> VS> > VS>    POINT GROUP
> VS> > VS>
> VS> > VS>     AUTO
> VS> > VS>
> VS> > VS>    SYMMETRY
> VS> > VS>
> VS> > VS>     1
> VS> > VS>
> VS> > VS>    CELL
> VS> > VS>
> VS> > VS>     8.064    1.0    1.0   0.0 0.0 0.0   (8.064=2*4.032, 4.032A is the eq lattice const of Al)
> VS> > VS>
> VS> > VS>    CUTOFF
> VS> > VS>
> VS> > VS>     15.000
> VS> > VS>
> VS> > VS>    ANGSTROMS
> VS> > VS>
> VS> > VS>    STATES
> VS> > VS>
> VS> > VS>     250
> VS> > VS>
> VS> > VS>    SCALE
> VS> > VS>
> VS> > VS>    TESR
> VS> > VS>
> VS> > VS>      1
> VS> > VS>
> VS> > VS>    KPOINTS MONKHORST-PACK FULL
> VS> > VS>
> VS> > VS>    2  2  2
> VS> > VS>
> VS> > VS>  &END
> VS> > VS>
> VS> > VS>
> VS> > VS>
> VS> > VS>  &ATOMS
> VS> > VS>
> VS> > VS> *AL_SGS KLEINMAN-BYLANDER
> VS> > VS>
> VS> > VS>    LMAX=D
> VS> > VS>
> VS> > VS>    32
> VS> > VS>
> VS> > VS> 0.0178998 -0.0197471 0.0145112
> VS> > VS>
> VS> > VS> ...
> VS> > VS>
> VS> > VS> 0.500011 0.750491 0.756745
> VS> > VS>
> VS> > VS>
> VS> > VS>
> VS> > VS>
> VS> > VS>
> VS> > VS> VELOCITIES
> VS> > VS>
> VS> > VS> 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
> VS> > VS>
> VS> > VS> -5.10982 0.101812 -1.43574
> VS> > VS>
> VS> > VS> ...
> VS> > VS>
> VS> > VS> 7.04835 2.96887 2.84766
> VS> > VS>
> VS> > VS> END VELOCITIES
> VS> > VS>
> VS> > VS>
> VS> > VS>
> VS> > VS>   &END
> VS> > VS>
> VS> > VS>
> VS> > VS>
> VS> > VS>  &BASIS
> VS> > VS>
> VS> > VS>      PSEUDO AO 2
> VS> > VS>
> VS> > VS>       0  1
> VS> > VS>
> VS> > VS>  &END
> VS> > VS>
> VS> > VS>
> VS> > VS>
> VS> > VS>  &DFT
> VS> > VS>
> VS> > VS>    NEWCODE
> VS> > VS>
> VS> > VS>  &END
> VS> > VS>
> VS> > VS>
> VS> > VS>
> VS> > VS>
> VS> > VS>
> VS> >
> VS> > -- 
> VS> > =======================================================================
> VS> > Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
> VS> >   Center for Molecular Modeling   --   University of Pennsylvania
> VS> > Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
> VS> > tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
> VS> > =======================================================================
> VS> > If you make something idiot-proof, the universe creates a better idiot.
> VS> >
>
> -- 
> =======================================================================
> Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
>   Center for Molecular Modeling   --   University of Pennsylvania
> Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
> tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
> =======================================================================
> If you make something idiot-proof, the universe creates a better idiot.
> 


