b40.rcp6_0.1deg.002


Run Specifications


===================
General Information
===================

   Purpose of Run: ensemble member 1 of rcp6.0 runs

   Scientific Lead: Jim Hurrell

   Software Engineering Lead: Mariana Vertenstein

   Assigned to: mai
 
   Date: 2010-11-09

   Run Length:  96 years
 
=========================
Case Creation Information (all fields are required)
=========================

   CCSM tag:   cesm1_0_beta10

   Case Name:  b40.rcp6_0.1deg.002

   Machine:    bluefire

   Compset:    B_RCP6.0_CN
 
   Resolution: f09_g16


=============================
Pre-Configuration Information
=============================

   Runtype: hybrid
   RUN_STARTDATE = 2005-01-01
   RUN_REFCASE   = b40.20th.track1.1deg.009 
   RUN_REFDATE   = 2005-01-01
           
   env_conf.xml mods 
   -----------------
   xmlchange -file env_conf.xml -id RUN_TYPE -val 'hybrid'
   xmlchange -file env_conf.xml -id RUN_STARTDATE -val '2005-01-01'
   xmlchange -file env_conf.xml -id RUN_REFCASE -val 'b40.20th.track1.1deg.009'

   env_conf.xml mods 
   _________________

   * none


   env_mach_pes.xml mods
   _____________________
      
   component       comp_pes    root_pe   tasks  x threads (stride)
   ---------        ------     -------   ------   ------   ------
   cpl = cpl        320         0        320    x 1       (1     )
   glc = sglc       1           0        1      x 1       (1     )
   lnd = clm        128         320      128    x 1       (1     )
   ice = cice       320         0        320    x 1       (1     )
   atm = cam        448         0        448    x 1       (1     )
   ocn = pop2       64          448      64     x 1       (1     )
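
   As a cross-check of the layout above, here is a minimal Python sketch (not
   part of the CESM scripts) that derives the PE range of each component and
   the total job size. It assumes root_pe is the first PE a component uses and
   that the component occupies consecutive PEs, which holds here because every
   stride and thread count is 1. The names LAYOUT, pe_range and total_pes are
   hypothetical.

```python
# Layout copied from the env_mach_pes.xml table: component -> (root_pe, tasks).
# Threads and stride are 1 for every component, so they are omitted here.
LAYOUT = {
    "cpl": (0, 320),
    "glc": (0, 1),
    "lnd": (320, 128),
    "ice": (0, 320),
    "atm": (0, 448),
    "ocn": (448, 64),
}


def pe_range(component):
    """Return the inclusive (first, last) PE occupied by a component."""
    root, tasks = LAYOUT[component]
    return root, root + tasks - 1


def total_pes():
    """Highest PE in use plus one, i.e. the size of the whole job."""
    return max(root + tasks for root, tasks in LAYOUT.values())


if __name__ == "__main__":
    for name in LAYOUT:
        print(name, pe_range(name))
    print("total PEs:", total_pes())
```

   Under these assumptions ice (0-319) and lnd (320-447) together cover the
   same PEs as atm (0-447), ocn runs on its own PEs (448-511), and the job
   uses 512 PEs in total.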


==============================
Post-Configuration Information
==============================

   Buildconf
   _________

  * none


======================
SourceMods Information
======================

  * none


==========================
Performance/Cost Estimates
==========================

  * 13.1 years/day across 8 nodes
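
  The throughput above can be turned into a rough wall-clock estimate for the
  96-year run. This is a sketch only: it ignores queue wait time, restarts and
  resubmissions, and the function name wallclock_days is hypothetical.

```python
RUN_LENGTH_YEARS = 96             # from "Run Length"
THROUGHPUT_YEARS_PER_DAY = 13.1   # from "Performance/Cost Estimates"


def wallclock_days(run_years, years_per_day):
    """Wall-clock days of pure model execution, excluding queue wait."""
    return run_years / years_per_day


if __name__ == "__main__":
    days = wallclock_days(RUN_LENGTH_YEARS, THROUGHPUT_YEARS_PER_DAY)
    print(f"{days:.1f} wall-clock days")  # prints "7.3 wall-clock days"
```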
 
====================
Special Instructions
====================

  * none
 

====================
Pre-Run Instructions
====================

  * Run create_production_test
 
  * Run debug smoke test

  * Add NCAR Software Levels info to checklist 

================
Run Instructions
================

  Run Length: 96 years

  Account key:  93300473

  Priority/Targeted queue:  economy

  Other:

================
Diagnostics Plan
================

  * vs b40.20th.track1.1deg.009 (1986-2005) at 2041-2060 and 2081-2100


======================
Additional Information
======================

  * none




Run Checklist


Complete the following checklist prior to beginning the production run:



1.  Update status file: /web/web-data/cseg/ccsm4_0_runs/b40.rcp6_0.1deg.002/status.html:
    assigned
    pending
    running
    completed
    stopped


2.  Document NCAR software levels at beginning of run (use the spinfo command on bluefire)
***************************************************
NCAR SOFTWARE LEVELS: Wed Nov 24 12:43:56 MST 2010.
***************************************************
AIX:                  bos.mp              5.3.10.1
CSM:                  csm.core            1.7.1.4
LoadLeveler:          LoadL.full          3.5.1.3
GPFS:                 gpfs.base           3.2.1.14
VSD:                  rsct.vsd.vsdd       4.1.0.23
POE:                  ppe.poe             5.1.1.3
PESSL:                pessl.rte.smp       3.3.0.2
ESSL:                 essl.rte.smp        4.4.0.1
FORTRAN:              xlfrte              12.1.0.8
PERL:                 perl.rte            5.8.2.100
C:                    xlC.rte             10.1.0.3


3.  Complete the following table, as necessary, showing
    the component liaison's name and the date the setup
    was approved.
 

   Component         Liaison/Reviewer                 Date Approved
   ================+================================+==================

      atm            hannay                           2010-11-09

      cpl            [kauff,mvertens,tcraig,other]    ----

      ice            dbailey                          ----

      lnd            [erik,slevis]                    ----

      ocn            [njn01,bates,gokhan]             ----

      env_ file      [mvertens,other]                 ----
      settings

      data           [strand,other]                   ----
 

4.  Create_production_test completed   [who,when]
 

5.  Debug smoke test completed         [who,when]


6.  Performance review completed [who,when]
 




Comments

On 12 Dec 2010 the run died during model date Oct 2100 (less than three
months from the finish of the run) with the following error from the
lnd.log.101211-204945 file:

 clm2: completed timestep  1678961
(shr_tInterp_getFactors)  ERROR illegal linear times:             -NaNQ       0.00000000            -NaNQ
(shr_sys_abort) ERROR: (shr_tInterp_getFactors)  illegal itimes
(shr_sys_abort) WARNING: calling shr_mpi_abort() and stopping
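
The timestep number in the log can be checked against the reported model date.
The sketch below is a plausibility check only. It ASSUMES a 30-minute (1800 s)
CLM timestep, a 365-day no-leap calendar, and a step counter that starts at zero
on RUN_STARTDATE (2005-01-01); none of these is stated in this document. The
function name step_to_date is hypothetical.

```python
DTIME = 1800        # ASSUMPTION: CLM timestep of 30 minutes, in seconds
START_YEAR = 2005   # from RUN_STARTDATE = 2005-01-01
# ASSUMPTION: no-leap calendar, 365 days every year
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def step_to_date(nstep, dtime=DTIME, start_year=START_YEAR):
    """Convert a completed-timestep count to (year, month, day, seconds)."""
    days, seconds = divmod(nstep * dtime, 86400)
    years, day_of_year = divmod(days, 365)
    month = 1
    for length in DAYS_PER_MONTH:
        if day_of_year < length:
            break
        day_of_year -= length
        month += 1
    return start_year + years, month, day_of_year + 1, seconds


if __name__ == "__main__":
    print(step_to_date(1678961))  # prints (2100, 10, 31, 30600)
```

Under those assumptions timestep 1678961 falls on 31 Oct 2100, which is
consistent with the run dying during model date Oct 2100.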


This is exactly the same problem, at exactly the same time step, as occurred in
the b40.rcp6_0.1deg.001 case run. The cause is faulty integer arithmetic in the
source file shr_tInterp_mod.F90. From an email sent by Mariana to Andy Mai:

   I have looked at the csm_share ChangeLog - and I think the only thing
   that would be worth trying is to ONLY change shr_tInterp_mod.F90 with the
   following diffs (< are the changes you want to put in)

   115c115
   <    integer(SHR_KIND_I8)   :: spd         ! seconds per day
   ---
   >    integer(SHR_KIND_IN)   :: spd         ! seconds per day
   143c143
   <    itimein = int(edayin-edayin,SHR_KIND_I8)*spd + int(sin-sin,SHR_KIND_I8)
   ---
   >    itimein = (edayin-edayin)*spd + sin-sin
   147c147
   <    itime1 = int(eday-edayin,SHR_KIND_I8)*spd + int(s1-sin,SHR_KIND_I8)
   ---
   >    itime1 = (eday-edayin)*spd + s1-sin
   151c151
   <    itime2 = int(eday-edayin,SHR_KIND_I8)*spd + int(s2-sin,SHR_KIND_I8)
   ---
   >    itime2 = (eday-edayin)*spd + s2-sin


This change allowed the run to finish after restarting from 2099-01-01-00000.
Because the computation is all integer arithmetic, the change is bit-for-bit
(bfb) except in cases where the intermediate results on the right-hand side
overflowed in the original code. The NaNQs in the printed error message are a
separate, confusing problem: the print statement used an "F" format
specification to print three integers. There are apparently no plans to correct
this formatting problem.
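
The overflow can be illustrated with a short Python sketch that mimics the two
forms of the expression (days * spd + seconds). It is not the csm_share code.
It ASSUMES that the default integer kind is 32 bits and that overflow wraps
around in two's complement; the Fortran standard does not guarantee that
behavior. The names wrap_int32, itime_32bit and itime_64bit are hypothetical.

```python
SPD = 86400  # seconds per day


def wrap_int32(value):
    """Emulate two's-complement wraparound of a 32-bit signed integer.

    ASSUMPTION: the original code wrapped silently on overflow.
    """
    return (value + 2**31) % 2**32 - 2**31


def itime_32bit(day_diff, sec_diff):
    """Original form: every intermediate result is held in 32 bits."""
    return wrap_int32(wrap_int32(day_diff * SPD) + sec_diff)


def itime_64bit(day_diff, sec_diff):
    """Patched form: operands are widened before the multiply.

    Python integers do not overflow, so plain arithmetic stands in for I8.
    """
    return day_diff * SPD + sec_diff


# Largest whole-day difference whose seconds still fit in 32 bits.
MAX_SAFE_DAYS = (2**31 - 1) // SPD

if __name__ == "__main__":
    for days in (MAX_SAFE_DAYS, MAX_SAFE_DAYS + 1):
        print(days, itime_32bit(days, 0), itime_64bit(days, 0))
```

Under these assumptions the two forms agree up to a difference of 24855 days
(roughly 68 years of 365 days). One day later the 32-bit form wraps to a
negative value while the widened form stays correct, which matches the
statement that the change is bfb except where the original code overflowed.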

