[CPMD-list] elapsed v. cpu time issue on Itanium
Axel Kohlmeyer
axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Thu Sep 9 20:01:57 CEST 2004
>>> "JN" == Jeff Nucciarone <nucci at psu.edu> writes:
JN> I recently compiled CPMD 3.9.1 on an Itanium2 Linux system using the
JN> version 8 Intel compiler and mpi-gm (Myrinet mpigm-1.2.5..10). I used
JN> Intel MKL version 7.
JN> I noticed a large disparity between the elapsed running time and cpu
JN> time when running the femd test case (al001geo.inp in the CPMD-test femd
JN> subdirectory).
JN> I ran the test case using 4 cpus. The following is a summary of the
JN> output:
JN> SUBROUTINE       CALLS    CPU TIME   ELAPSED TIME
JN> FFTCOM          661040       62.81          63.20
JN> S_INVFFT        347217       57.20          59.47
JN> S_FWFFT         311799       52.98          53.68
JN> FFT-G/S        1981096       40.16          42.60
JN> RGS_C             2858       31.78         116.29
JN> EHPSI_C          26642       23.07          24.24
JN> EVPSI           240599       11.26          11.19
JN> FRIESNER_C        2830        9.52          10.23
JN> GLOSUM          389289        9.33          10.03
JN> VBETA             2830        8.12           8.10
JN> OVLAP2_C         27451        6.77           6.72
JN> RHOOFR_C           284        6.13           6.03
JN> JACOBI          154455        3.94           4.14
JN> CALC_BILN        26642        1.31           1.29
JN> OVLAP_H           5860        1.02           1.03
JN> W_WFNIO             20        0.74           0.93
JN> ----------------------------------------------------------------
JN> TOTAL TIME                  326.13         419.17
JN> ****************************************************************
JN> CPU TIME : 0 HOURS 5 MINUTES 31.33 SECONDS
JN> ELAPSED TIME : 0 HOURS 7 MINUTES 4.64 SECONDS
JN> PROGRAM CPMD ENDED AT: Wed Aug 4 16:29:33 2004
JN> The big difference is in subroutine RGS_C:
JN> RGS_C             2858       31.78         116.29
JN> The difference between elapsed and cpu time accounts for over 90% of the
JN> difference in run times overall.
JN> I made several runs and this observation is consistent across all runs.
JN> Something in RGS_C seems to be triggering a lot of system activity.
JN> I also built versions of CPMD for Xeon (also using Myrinet MPI) and
JN> Opteron (using Infiniband and MVICH). Neither of these platforms shows
JN> this behaviour; elapsed and cpu time on those machines are consistent.
JN> Has anyone else made this observation and if so is there any way to get
JN> around this performance issue?
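The "over 90%" figure in the quoted message can be checked directly from the timing table. The short Python sketch below only redoes that arithmetic; the variable names are made up for illustration, and the numbers are copied from the RGS_C and TOTAL TIME rows of the quoted output.

```python
# Timings in seconds, copied from the CPMD summary quoted above.
rgs_c_cpu, rgs_c_elapsed = 31.78, 116.29
total_cpu, total_elapsed = 326.13, 419.17

# Gap between elapsed and cpu time, for RGS_C alone and for the whole run.
rgs_c_gap = rgs_c_elapsed - rgs_c_cpu
total_gap = total_elapsed - total_cpu

# Fraction of the overall gap that is attributable to RGS_C.
share = rgs_c_gap / total_gap

print(f"RGS_C gap: {rgs_c_gap:.2f} s")           # 84.51 s
print(f"total gap: {total_gap:.2f} s")           # 93.04 s
print(f"RGS_C share of the gap: {share:.1%}")    # 90.8%
```

So RGS_C alone accounts for roughly 84.5 of the 93 seconds by which elapsed time exceeds cpu time.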
jeff,
this is due to (automatic) multithreading in the MKL. if you
run on an smp machine without explicitly setting OMP_NUM_THREADS to 1,
some of the LAPACK functions will be executed by multiple threads.
since the time used by those extra threads is not counted in the reported
cpu time, you get the discrepancy.
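As a rough illustration of the suggestion above, the following Python sketch starts a child process with OMP_NUM_THREADS forced to 1. The helper name single_threaded_env is hypothetical, and the child here is only a stand-in that prints the value it inherited; the actual CPMD/MPI launch command is site-specific and is not shown. Whether the variable reaches every MPI rank depends on the MPI launcher, so that part is an assumption to verify locally.

```python
import os
import subprocess
import sys


def single_threaded_env(base=None):
    """Return a copy of the environment with OMP_NUM_THREADS set to "1".

    Hypothetical helper for illustration; `base` defaults to os.environ.
    """
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"
    return env


# Stand-in child process: it only reports the value it inherited.
# In a real run this would be the site-specific launch command for CPMD.
child = [sys.executable, "-c",
         "import os; print(os.environ['OMP_NUM_THREADS'])"]

result = subprocess.run(child, env=single_threaded_env(),
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

Passing an explicit environment keeps the setting local to that one launch rather than changing it for the whole login session.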
regards,
axel.
JN> Thanks,
JN> --Jeff
JN> --
JN> Jeff Nucciarone nucci at psu.edu http://www.personal.psu.edu/nucci
JN> Senior Research Programmer, High Performance Computing Group, ITS/ASET
JN> The Pennsylvania State University
JN> "Don't just do it........ do it right."
--
=======================================================================
Axel Kohlmeyer e-mail: axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Lehrstuhl fuer Theoretische Chemie Phone: ++49 (0)234/32-26673
Ruhr-Universitaet Bochum - NC 03/53 Fax: ++49 (0)234/32-14045
D-44780 Bochum http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.