[CPMD-list] MEMORY ALLOCATION FAILED at Cray XT3

Axel Kohlmeyer akohlmey at vitae.cmm.upenn.edu
Thu Aug 24 06:50:43 CEST 2006


On Wed, 23 Aug 2006, Alexandr Isayev wrote:

dear alex,

the behavior you are describing is consistent with other machines.
your job is actually trying to allocate 1.1GB of memory (each word is
8 bytes) when it has less memory left on the heap.
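
as a quick check of that arithmetic, here is a small python sketch (not part of cpmd). it takes the largest failed request and one of the free-heap values from the quoted log below; the 8 bytes per word is stated above, while treating 1 kByte as 1024 bytes is an assumption (with 1000 bytes the gap only gets larger).

```python
WORD_BYTES = 8  # "each word is 8 bytes", as stated above


def words_to_bytes(words: int) -> int:
    """Convert a CPMD allocation size given in words to bytes."""
    return words * WORD_BYTES


# Largest failed request in the quoted log: 144020240 words.
requested = words_to_bytes(144_020_240)

# Free heap from a "MEMORY| CURRENT HEAP USED/FREE" line: 1103191 kBytes.
# Assumption: 1 kByte = 1024 bytes (the more generous reading).
free_heap = 1_103_191 * 1024

shortfall = requested - free_heap
print(f"requested: {requested} bytes")
print(f"free heap: {free_heap} bytes")
print(f"shortfall: {shortfall} bytes")
```

a positive shortfall means the single request is larger than everything left on the heap at that point.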

this is a 'feature' of the current implementation of the
atomic guess. after that is created, the memory is freed
again. so as a workaround please try setting INITIALIZE
WAVEFUNCTION RANDOM (especially when you are already
restarting, i.e. not using the atomic guess at all).

for most machines, this 'waste' of memory does not show,
since it only happens during the initialization, but
on the XT3 you have no swap (same as on a BG/L btw),
so your job crashes.
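
to see whether some other log shows the same pattern, one could compare each failed request with the free heap reported at the time of the failure. the sketch below is a hypothetical helper, not a cpmd tool: the function name, the regular expressions (written against the output format quoted below) and the 1 kByte = 1024 bytes convention are all assumptions.

```python
import re

# Patterns written against the log format quoted in this thread (assumption:
# other CPMD versions may format these lines differently).
HEAP = re.compile(r"MEMORY\|\s+CURRENT HEAP USED/FREE\s+(\d+)/\s*(\d+) kBytes")
FAILED = re.compile(
    r"PROCESSOR\s+(\d+) ALLOCATION OF\s+(\d+) WORDS OF MEMORY FAILED"
)


def oversized_requests(log, word_bytes=8, kbyte=1024):
    """Return (processor, requested_bytes, free_bytes) for every failed
    request that is larger than the biggest free-heap value reported on
    any MEMORY| line. Raises ValueError if the log has no MEMORY| line."""
    free_bytes = max(int(free) for _used, free in HEAP.findall(log)) * kbyte
    result = []
    for proc, words in FAILED.findall(log):
        requested = int(words) * word_bytes
        if requested > free_bytes:
            result.append((int(proc), requested, free_bytes))
    return result


# Four lines taken from the quoted output.
sample = """
 PROCESSOR     3 ALLOCATION OF   144020240 WORDS OF MEMORY FAILED
 PROCESSOR    15 ALLOCATION OF   143907920 WORDS OF MEMORY FAILED
 ***    MEMORY|  CURRENT HEAP USED/FREE   807592/ 1103191 kBytes ***
 ***    MEMORY|  CURRENT HEAP USED/FREE   807064/ 1103719 kBytes ***
"""

for proc, requested, free in oversized_requests(sample):
    print(f"proc {proc}: wants {requested} bytes, at most {free} bytes free")
```

comparing against the largest reported free value is deliberately generous: if a request does not fit even there, it cannot fit on any task.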

cheers,
   axel.

AI> Dear CPMD community:
AI> 
AI> I experienced strange CPMD (v3.11.1) behavior on a Cray XT3.
AI> *ANY* job that requires more than a few hundred MB of memory fails with
AI> 
AI> PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=   0]
AI> 
AI> However, this particular XT3 has 2 GB per CPU. At first, I thought that yod
AI> could not allocate the required amount of memory by default. I played with
AI> different settings, etc.; it did not change anything. I also worked with
AI> local admins. They tested and found no problems with yod, catamount
AI> nodes, etc. It can allocate more than 1.8G with default settings.
AI> I also recompiled the code with default XT3 settings, but it did not
AI> help either.
AI> 
AI> The example below is just a standard box with 216 waters,
AI> PBE, 80 Ry cutoff. It needs about 1 GB per CPU or less.
AI> 
AI> Can someone confirm my observations on other XT3s?
AI> 
AI> Thank you in advance,
AI> Alexandr
AI> 
AI> Relevant output part is attached below:
AI> ==================================================================
AI> 
AI>  PARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARA
AI>   NCPU     NGW     NHG  PLANES  GXRAYS  HXRAYS ORBITALS Z-PLANES
AI>      0   16665  133357      13     498    1986      54       1
AI>      1   16669  133355      14     498    1986      54       1
AI>      2   16669  133337      13     498    1986      54       1
AI>      3   16669  133333      14     498    1986      54       1
AI>      4   16669  133349      13     498    1986      54       1
AI>      5   16667  133340      14     498    1988      54       1
AI>      6   16665  133340      13     498    1988      54       1
AI>      7   16669  133336      14     498    1988      54       1
AI>      8   16669  133244      13     497    1987      54       1
AI>      9   16669  133318      14     498    1988      54       1
AI>     10   16669  133332      13     498    1988      54       1
AI>     11   16663  133332      14     498    1988      54       1
AI>     12   16661  133338      13     498    1988      54       1
AI>     13   16661  133324      14     498    1988      54       1
AI>     14   16659  133318      13     498    1988      54       1
AI>     15   16656  133322      14     496    1988      54       1
AI>                 G=0 COMPONENT ON PROCESSOR :     8
AI>  PARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARAPARA
AI> 
AI>  ***    LOADPA|  CURRENT HEAP USED/FREE    86007/ 1824776 kBytes ***
AI> 
AI>  OPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPEN
AI>  NUMBER OF CPUS PER TASK                                        1
AI>  OPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPEN
AI> 
AI>  ***     RGGEN|  CURRENT HEAP USED/FREE    90791/ 1819992 kBytes ***
AI> 
AI>  ************************** SUPERCELL ***************************
AI>  SYMMETRY:                                           SIMPLE CUBIC
AI>  LATTICE CONSTANT(a.u.):                                 35.34045
AI>  CELL DIMENSION:  35.3404  1.0000  1.0000  0.0000  0.0000  0.0000
AI>  VOLUME(OMEGA IN BOHR^3):                             44138.34843
AI>  LATTICE VECTOR A1(BOHR):           35.3404     0.0000     0.0000
AI>  LATTICE VECTOR A2(BOHR):            0.0000    35.3404     0.0000
AI>  LATTICE VECTOR A3(BOHR):            0.0000     0.0000    35.3404
AI>  RECIP. LAT. VEC. B1(2Pi/BOHR):      0.0283     0.0000     0.0000
AI>  RECIP. LAT. VEC. B2(2Pi/BOHR):      0.0000     0.0283     0.0000
AI>  RECIP. LAT. VEC. B3(2Pi/BOHR):      0.0000     0.0000     0.0283
AI>  REAL SPACE MESH:                   216          216          216
AI>  WAVEFUNCTION CUTOFF(RYDBERG):                           80.00000
AI>  DENSITY CUTOFF(RYDBERG):          (DUAL= 4.00)         320.00000
AI>  NUMBER OF PLANE WAVES FOR WAVEFUNCTION CUTOFF:            266649
AI>  NUMBER OF PLANE WAVES FOR DENSITY CUTOFF:                2133275
AI>  ****************************************************************
AI> 
AI>  ***  RINFORCE|  CURRENT HEAP USED/FREE    97593/ 1813190 kBytes ***
AI>  ***    FFTPRP|  CURRENT HEAP USED/FREE   116984/ 1793799 kBytes ***
AI> 
AI>  GENERATE ATOMIC BASIS SET
AI>       O        SLATER ORBITALS
AI>         2S        ALPHA=   2.2458      OCCUPATION= 2.00
AI>         2P        ALPHA=   2.2266      OCCUPATION= 4.00
AI>       H        SLATER ORBITALS
AI>         1S        ALPHA=   1.0000      OCCUPATION= 1.00
AI> 
AI> 
AI>  INITIALIZATION TIME:                               40.33 SECONDS
AI>  ****************************************************************
AI>  PROCESSOR     3 ALLOCATION OF   144020240 WORDS OF MEMORY FAILED
AI>  PROCESSOR     8 ALLOCATION OF   144020240 WORDS OF MEMORY FAILED
AI>  PROCESSOR    12 ALLOCATION OF   143951120 WORDS OF MEMORY FAILED
AI>  PROCESSOR     4 ALLOCATION OF   144020240 WORDS OF MEMORY FAILED
AI>  PROCESSOR    11 ALLOCATION OF   143968400 WORDS OF MEMORY FAILED
AI>  PROCESSOR     2 ALLOCATION OF   144020240 WORDS OF MEMORY FAILED
AI>  PROCESSOR     7 ALLOCATION OF   144020240 WORDS OF MEMORY FAILED
AI>  PROCESSOR     1 ALLOCATION OF   144020240 WORDS OF MEMORY FAILED
AI>  PROCESSOR    13 ALLOCATION OF   143951120 WORDS OF MEMORY FAILED
AI>  PROCESSOR    10 ALLOCATION OF   144020240 WORDS OF MEMORY FAILED
AI>  PROCESSOR     5 ALLOCATION OF   144002960 WORDS OF MEMORY FAILED
AI>  PROCESSOR     9 ALLOCATION OF   144020240 WORDS OF MEMORY FAILED
AI>  PROCESSOR     6 ALLOCATION OF   143985680 WORDS OF MEMORY FAILED
AI>  PROCESSOR    15 ALLOCATION OF   143907920 WORDS OF MEMORY FAILED
AI>  PROCESSOR    14 ALLOCATION OF   143933840 WORDS OF MEMORY FAILED
AI>  ****************************************************************
AI>  PROCESSOR     0 ALLOCATION OF   143985680 WORDS OF MEMORY FAILED
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807592/ 1103191 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807573/ 1103210 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807270/ 1103513 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807595/ 1103188 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807348/ 1103435 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807593/ 1103190 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807589/ 1103194 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807596/ 1103187 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807261/ 1103522 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807592/ 1103191 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807508/ 1103275 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807585/ 1103198 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807431/ 1103351 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807064/ 1103719 kBytes ***
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   807184/ 1103599 kBytes ***
AI>  ****************************************************************
AI>  ================================================================
AI>  ***    MEMORY|  CURRENT HEAP USED/FREE   809542/ 1101241 kBytes ***
AI>                       BIG MEMORY ALLOCATIONS
AI>  ================================================================
AI>  XF               1412670                GK                399999
AI>  XF               1412670                GK                399732
AI>  XF               1412670                GK                400014
AI>  XF               1412670                GK                400047
AI>  XF               1412670                GK                399996
AI>  XF               1412670                GK                400011
AI>  XF               1412670                GK                400008
AI>  XF               1412670                GK                400065
AI>  XF               1412670                GK                399972
AI>  XF               1412670                GK                399996
AI>  XF               1412670                GK                400020
AI>  XF               1412670                GK                399954
AI>  XF               1412670                GK                400020
AI>  XF               1412670                GK                399966
AI>  XF               1412670                GK                399954
AI>                       BIG MEMORY ALLOCATIONS
AI>  INZHP             599999                C2              28804040
AI>  INZHP             599599                C2              28804040
AI>  INZHP             600022                C2              28790216
AI>  INZHP             600071                C2              28804040
AI>  INZHP             599995                C2              28793672
AI>  INZHP             600017                C2              28804040
AI>  INZHP             600013                C2              28804040
AI>  INZHP             600098                C2              28804040
AI>  INZHP             599959                C2              28790216
AI>  INZHP             599995                C2              28804040
AI>  INZHP             600031                C2              28800584
AI>  INZHP             599932                C2              28804040
AI>  INZHP             600031                C2              28797128
AI>  INZHP             599950                C2              28781576
AI>  INZHP             599932                C2              28786760
AI>  XF               1412670                GK                400071
AI>  SCG               266666                C0              28804040
AI>  SCG               266488                C0              28804040
AI>  SCG               266676                C0              28790216
AI>  SCG               266698                C0              28804040
AI>  SCG               266664                C0              28793672
AI>  SCG               266674                C0              28804040
AI>  SCG               266672                C0              28804040
AI>  SCG               266710                C0              28804040
AI>  SCG               266648                C0              28790216
AI>  SCG               266664                C0              28804040
AI>  SCG               266680                C0              28800584
AI>  SCG               266636                C0              28804040
AI>  SCG               266680                C0              28797128
AI>  SCG               266644                C0              28781576
AI>  SCG               266636                C0              28786760
AI>  C0              28797128                SCG               266714
AI>  VPS               266666                SC0             28804032
AI>  VPS               266488                SC0             28804032
AI>  VPS               266676                SC0             28790208
AI>  VPS               266698                SC0             28804032
AI>  VPS               266664                SC0             28793664
AI>  VPS               266674                SC0             28804032
AI>  VPS               266672                SC0             28804032
AI>  VPS               266710                SC0             28804032
AI>  VPS               266648                SC0             28790208
AI>  VPS               266664                SC0             28804032
AI>  VPS               266680                SC0             28800576
AI>  VPS               266636                SC0             28804032
AI>  VPS               266680                SC0             28797120
AI>  VPS               266644                SC0             28781568
AI>  VPS               266636                SC0             28786752
AI>  INZHP             600107                VPS               266714
AI>  RHOPS             266666                YF               1412670
AI>  RHOPS             266488                YF               1412670
AI>  RHOPS             266676                YF               1412670
AI>  RHOPS             266698                YF               1412670
AI>  RHOPS             266664                YF               1412670
AI>  RHOPS             266674                YF               1412670
AI>  RHOPS             266672                YF               1412670
AI>  RHOPS             266710                YF               1412670
AI>  RHOPS             266648                YF               1412670
AI>  RHOPS             266664                YF               1412670
AI>  RHOPS             266680                YF               1412670
AI>  RHOPS             266636                YF               1412670
AI>  RHOPS             266680                YF               1412670
AI>  RHOPS             266644                YF               1412670
AI>  RHOPS             266636                YF               1412670
AI>  YF               1412670                SC0             28797120
AI>  ----------------------------------------------------------------
AI>  RHOPS             266714                C2              28797128
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93409911 =  747.3 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93407972 =  747.3 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93368568 =  746.9 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93410341 =  747.3 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93378775 =  747.0 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93409999 =  747.3 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93409850 =  747.3 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93410345 =  747.3 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93368090 =  746.9 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93409879 =  747.3 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93399677 =  747.2 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93409576 =  747.3 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93389298 =  747.1 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93342181 =  746.7 MBytes
AI>  [PEAK NUMBER   68]      PEAK MEMORY     93357703 =  746.9 MBytes
AI>  ----------------------------------------------------------------
AI>  ================================================================
AI>  [PEAK NUMBER   66]      PEAK MEMORY     93390512 =  747.1 MBytes
AI>  ================================================================
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=   3]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=   8]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=  12]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=   4]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=  11]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=   2]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=   7]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=   1]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=  13]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=  10]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=   5]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=   9]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=   6]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=  15]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=  14]
AI>  PROGRAM STOPS IN SUBROUTINE MEMORY| ALLOCATION FAILED (PME) [PROC=   0]
AI> 
AI> -------------------------------------------------------
AI> Alexandr Isayev,
AI> Graduate Research Assistant, and System Administrator
AI> @ Computational Center for Molecular Structure
AI> and Interactions (CCMSI),
AI> Jackson State University,
AI> Jackson, MS USA
AI>    Tel:  +(601) 979-1134
AI> e-mail:  alex(at)ccmsi.us
AI>    Web:  http://www.ccmsi.us
AI> --------------------------------------------------------
AI> 
AI> _______________________________________________
AI> CPMD-list mailing list
AI> CPMD-list at cpmd.org
AI> http://cpmd.org/mailman/listinfo/cpmd-list
AI> 

-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.



