- Definitions
- Basics
- Creating a Case
- Building the executable for a case (5.2)
- Running a Case
- Output Data(5.5)
- Adding Non-default behavior to scripts
- Other
Definitions
- Computer input is written in a fixed point font
- xml and csh environment variables are in italics
- > is a unix prompt
- $CCSMROOT the full pathname of the root directory of your ccsm source code
- $CASE both the case name and case directory name containing build and run scripts
- $CASEROOT the full pathname of the root directory for case scripts (e.g. /user/$CASE)
- $EXEROOT case executable directory root
- $MACH supported machine name
Basics
What is the ccsm directory tree?
Climate Validation
Although CCSM4.0 can be run "out of the box" for a variety of
resolutions and component sets, it must be stressed that not all
combinations of component sets, resolution and machines have been
tested or have undergone full climate validations.
Long control runs have been carried out on the IBM systems at NCAR
with the fully active CCSM components (component set B below) at three
different resolutions: T42_gx1v3, T31_gx3v5. As a
result, NCAR will only guarantee the scientific validity of runs using
the above component set and resolutions on the IBM. No other
combination of resolutions, component sets or machines is considered
scientifically validated.
Model output from these long control runs will accompany the release.
Users should be able to duplicate the climate of the released control
runs using the CCSM4.0 source code on the NCAR IBM systems.
Bit-for-bit duplication cannot be ensured due to post-release compiler
changes that may occur. Users should carry out their own validations
on any platform prior to doing scientific runs or scientific analysis
and documentation.
CCSM Model components
The CCSM consists of four dynamical geophysical models linked by a
central coupler. Each model contains ``active'', ``data'', or ``dead''
component versions allowing for a variety of ``plug and play''
combinations. The active dynamical models consume substantial amounts
of memory and CPU time and produce large volumes of output data. The
data-cycling models (data models), on the other hand, are small,
simple models which simply read existing datasets that were previously
written by the dynamical models and pass the resulting data to the
coupler. These data-cycling components are very inexpensive to run and
produce no output data. For these reasons they are used for both test
runs and certain types of model simulation runs. Currently, the data
models run only in serial mode on a single processor. The dead models
are simple codes that facilitate system testing. They generate
unrealistic forcing data internally, require no input data and can be
run on multiple processors to simulate the software behavior of the
fully active system.
The CCSM components can be summarized as follows:
| Model | Model Name | Component name | Component Version | Type |
| atmosphere | atm | cam | cam3 | active |
| atmosphere | atm | datm | datm6 | data |
| atmosphere | atm | latm | latm6 | data |
| atmosphere | atm | xatm | dead | dead |
| land | lnd | clm | clm3 | active |
| land | lnd | dlnd | dlnd6 | data |
| land | lnd | xlnd | dead | dead |
| ocean | ocn | pop | ccsm_pop_1_4 | active |
| ocean | ocn | docn | docn6 | data |
| ocean | ocn | xocn | dead | dead |
| sea-ice | ice | csim | csim5 | active |
| sea-ice | ice | dice | dice6 | data |
| sea-ice | ice | xice | dead | dead |
| coupler | cpl | cpl | cpl6 | active |
During the course of a CCSM run, the four non-coupler
components simultaneously integrate forward in time, periodically
stopping to exchange information with the coupler. The coupler
meanwhile receives fields from the component models, computes, maps
and merges this information and sends the fields back to the component
models. By brokering this sequence of communication interchanges, the
coupler manages the overall time progression of the coupled system.
Each model has a full dynamical component, a data-cycling component
(atm actually has 2 data cycling components), and a dead-version
component. A CCSM component set is comprised of five model components
- one component from each model. All model components are written
primarily in FORTRAN 90.
-
- Active atmospheric component
The dynamical atmosphere model is the Community Atmosphere Model (CAM), a
global atmospheric general circulation model developed from the NCAR
CCM3. The primary horizontal resolution is 128 longitudinal by 64
latitudinal points (T42) with 26 vertical levels. The hybrid vertical
coordinate merges a terrain-following sigma coordinate at the bottom
surface with a pressure-level coordinate at the top of the model.
-
- Active land component
The Community Land Model, CLM, is the result of a collaborative
project between scientists in the Terrestrial Sciences Section of the
Climate and Global Dynamics Division (CGD) at NCAR and the CCSM Land
Model Working Group. Other principal working groups that also
contribute to the CLM are Biogeochemistry, Paleoclimate, and Climate
Change and Assessment. The land model grid is identical to the atmosphere
model grid.
-
- Active ocean component
The ocean model is an extension of the Parallel Ocean Program (POP)
Version 1.4.3 from Los Alamos National Laboratory (LANL). POP grids
in CCSM are displaced-pole grids (centered at Greenland) at
approximately 1-degree (gx1v3) and 3.6-degree (gx3v5) horizontal
resolutions with 40 and 25 vertical levels, respectively. POP does
not support a slab ocean model (i.e. SOM) as is supported by the
stand-alone atmosphere model (CAM).
-
- Active sea-ice component
The sea-ice component of CCSM is the Community Sea-Ice Model
(CSIM). The sea-ice component includes the elastic-viscous-plastic
(EVP) dynamics scheme, an ice thickness distribution,
energy-conserving thermodynamics, a slab ocean mixed layer model, and
the ability to run using prescribed ice concentrations. It is
supported on high- and low-resolution Greenland Pole grids, identical
to those used by the POP ocean model.
Creating a Case
- How do I obtain the options for creating a case?
-
> $CCSMROOT/scripts/create_newcase -help
will provide a list of options to create_newcase
> $CCSMROOT/scripts/create_newcase -list
will list the available grids, component sets, and machines
- How do I create a case?
-
-
> cd /user
-
> $CCSMROOT/scripts/create_newcase -case Test1 -mach bluefire -res f19_g15 -compset B
will create a case root directory, /user/Test1/, for machine bluefire at a resolution of 1.9x2.5_gx1v5 for component set "B".
The contents of /user/Test1/ will be as follows:
Furthermore, $CASEROOT and $CASE will now be set to /user/Test1 and Test1, respectively.
-
> cd /user/Test1
-
modify env_conf.xml and/or env_mach_pes.xml if appropriate
-
> configure -case
will create two new directories in /user/Test1
- Buildnml_prestage/
- Buildexe/
and will also produce the following three files
- $CASE.$MACH.build, will build the ccsm model executables, create the component namelists
- $CASE.$MACH.run, will run the CCSM system.
- $CASE.$MACH.l_archive will perform long-term archiving on output model data if appropriate for the machine
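The naming convention for the three generated scripts can be sketched in a few lines of Python. This is only an illustration of the $CASE.$MACH.* pattern described above; the function name generated_scripts is hypothetical and is not part of the CCSM scripts.

```python
def generated_scripts(case, mach):
    """Return the script names that follow the $CASE.$MACH.* pattern."""
    prefix = f"{case}.{mach}"
    return [f"{prefix}.build", f"{prefix}.run", f"{prefix}.l_archive"]


# Example using the case and machine from the create_newcase command above.
for name in generated_scripts("Test1", "bluefire"):
    print(name)
```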
- What are the template scripts?
-
The Components/ directory contains a unique script for every CCSM4
component. The template files set up the component scripts in the
$CASEROOT/Build*/ directories. These include namelist generation and
data prestaging scripts for the CCSM4 components as well as scripts
for building internal libraries and CCSM4 component executables. The
filename convention is component.template (e.g. cam.template).
Template scripts for each component selected in the env_conf
file are executed when configure is called. There are also
templates for each CCSM4 internal library that needs to be built.
- How are the xml files translated into environment variables?
-
- What are the possible lists of xml variables?
- How can I change the env_conf.xml file or env_mach_pes file once configure has been invoked?
The script configure generates "resolved"
CCSM4 scripts in the $CASEROOT/ directory. These will
function to create component namelists, prestage the necessary
component input data and build and run the CCSM model on a target
machine. configure must be run
from the $CASE directory.
The environment variables listed in env_conf determine what
is "resolved" when configure is invoked and
consequently what may not be modified once the resolved scripts
exist. In particular, the contents of the scripts generated by
configure (in Buildnml_prestage/ and
Buildexe/) depend on the values of the environment variables
in env_conf. In general, env_conf resolves model
resolution, model component sets and model initialization type
(e.g. startup, branch or hybrid). Consequently, env_conf
must be modified before configure is run. Once
configure is invoked env_conf may be
modified only by running configure -cleanall
(see below) first.
In addition, running configure for a given machine
also generates batch queue commands for the given machine that depend
on the task/thread settings in env_mach.$MACH. Consequently,
if values other than the default values listed are desired, these should be
changed before running configure. In this case, the following lines
in env_mach.$MACH must be modified before running
configure:
setenv NTASKS_ATM $ntasks_atm
setenv NTHRDS_ATM $nthrds_atm
setenv NTASKS_LND $ntasks_lnd
setenv NTHRDS_LND $nthrds_lnd
setenv NTASKS_ICE $ntasks_ice
setenv NTHRDS_ICE $nthrds_ice
setenv NTASKS_OCN $ntasks_ocn
setenv NTHRDS_OCN $nthrds_ocn
setenv NTASKS_CPL $ntasks_cpl
setenv NTHRDS_CPL $nthrds_cpl
The usage for configure is as follows.
To obtain a concise summary of the usage type:
To obtain extensive documentation of the usage type:
To invoke the various configure options type:
> configure [-case] [-cleanmach] [-cleanall] [-cleannamelist]
A full summary of each option is provided below.
configure -case
Running configure -case creates the following
sub-directories and files in the
$CASEROOT/ directory:
-
.cache/
Contains files that enable the scripts to perform
validation tests when building or running the model.
-
Buildconf/
Contains "resolved" scripts to generate
model executables for each model component in the requested CCSM4
component set.
Contains scripts to generate required built-in CCSM4 libraries (e.g. ESMF, MCT, etc.).
Contains "resolved" scripts to generate component namelists and prestage
component input data.
-
$CASE.$MACH.build
Creates the executables necessary to run CCSM4 (see section \ref{subsec_ccsm_executables})
-
$CASE.$MACH.run
Runs the CCSM4 model and performs short term archiving
(see section \ref{subsec_short_term_arch}).
-
$CASE.$MACH.l_archive
Performs long-term archiving (see section \ref{subsec_long_term_arch}).
-
$CASE.$MACH.clean_build
Recreates all the executables necessary to run CCSM4 (see section \ref{subsec_ccsm_executables})
-
LockedFiles
-
env_derived
This file should NOT be modified by users
configure -cleanmach, configure -cleanall, and configure -cleannamelist
The configure options -cleanmach
and -cleanall provide the user with the ability to
make changes to env_conf or env_mach.$MACH after
configure -case has been invoked.
The configure script will stop with an error message
if a user attempts to recreate scripts that already exist. The build
and run scripts will also recognize changes in env_conf or
env_mach.$MACH that require configure
to be rerun. Note that the options -cleanall and
-cleanmach are fundamentally different.
Running configure -cleanmach $MACH
renames the build, run, and l_archive scripts for the particular
machine and allows the user to reset the machine tasks and threads and
rerun configure. It is important to note that the
Build*/ directories will NOT be updated in this process. As
a result, local changes to namelist, input data, or the environment
variable files will be preserved.
Running configure -cleanall
removes all files associated with all previous invocations of the
configure script. The $CASEROOT/ directory
will now appear as if create_newcase had just been
run with the exception that local modifications to the env_*
files are preserved. All Build*/ directories will be
removed, however. As a result, any changes to the namelist generation
scripts in Buildnml_prestage/ will NOT be preserved. Before
invoking this command, users should make backup copies of their
"resolved" component namelists in the Buildnml_prestage/
directory if modifications to the generated scripts were made.
Running configure -cleannamelist
Building the executable for a case
- How do I create the executable for a case?
-
To build the CCSM executable, you need to issue the following commands interactively
> cd $CASEROOT
> $CASE.$MACH.build
We recommend building the executable interactively and verifying that it is built correctly before a run is submitted to the machine batch system.
- What is the machine specific Macro file in the case directory?
The Macros.* files contain machine-specific makefile directives. In
the current release, the Macros have been divided into different
machine-dependent files each containing site/machine specific
options.
- How does gmake work?
- What if I want to use user-modified source code - where do I place it?
Each component *.buildexe.csh script has a
directory, $CASEROOT/SourceMods/src.xxx/ (where xxx is the
component name, e.g. cam) as the first Filepath directory. This
allows user modified code to be easily
introduced into the model by placing the modified code into the appropriate
$CASEROOT/SourceMods/src.xxx/ directory.
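The effect of listing SourceMods/src.xxx/ first can be illustrated with a small Python sketch. It assumes that the build uses the first copy of a file found along the Filepath list, which is what the text above implies but does not spell out. The function resolve_source, the stock source directory layout, and the file names are all hypothetical.

```python
import os
import tempfile


def resolve_source(filename, filepath_dirs):
    """Return the path of filename in the first directory that contains it."""
    for directory in filepath_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


# Demonstration with throwaway directories standing in for a case tree.
with tempfile.TemporaryDirectory() as root:
    mods = os.path.join(root, "SourceMods", "src.cam")
    stock = os.path.join(root, "models", "atm", "cam", "src")  # hypothetical layout
    os.makedirs(mods)
    os.makedirs(stock)
    for directory in (mods, stock):
        open(os.path.join(directory, "physics.F90"), "w").close()
    open(os.path.join(stock, "dynamics.F90"), "w").close()

    filepath = [mods, stock]  # SourceMods directory listed first
    modified_wins = resolve_source("physics.F90", filepath).startswith(mods)
    stock_used = resolve_source("dynamics.F90", filepath).startswith(stock)

print(modified_wins, stock_used)
```

A file present in both places resolves to the SourceMods copy, while a file that exists only in the stock tree is still found there.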
- What is the structure of the executable directory ($EXEROOT)?
Invoking $CASE.$MACH.build results in the creation of $EXEROOT, the case executable directory.
The contents of this directory are as follows:
- atm
- ccsm
- cpl
- csm_share
- ice
- lib
- lnd
- mct
- ocn
- pio
- How do I modify the default compiler settings for a case?
Default compiler settings can be modified by editing the Macros.* file in the case directory.
- When do I need to rebuild the executable for a case?
-
Running a case
- How do I run a case?
-
- Edit env_run.xml to set run control variables (such as run length) appropriately
- Edit env_run.xml if you want to perform short term/long term archiving (DOUT_S and DOUT_L_MS)
By default short term archiving is turned on and long term archiving is turned off.
- Perform a batch submission of the script $CASE.$MACH.run
e.g. on jaguar
> qsub $CASE.$MACH.run
- What happens when I submit the run script to batch queueing system?
Upon submission of the script, $CASE.$MACH.run, the following will occur as part of a CCSM4 run:
-
The $CASE.$MACH.run script is submitted and this
script in turn executes the $CASE.$MACH.build script.
-
Restart files and restart pointer files located in
$DOUT_S_ROOT/restart/
are copied to the appropriate locations in the executable
directory structure. If this is an initial run, these restart
files will not exist, a warning message will be printed and
$CASE.$MACH.build will continue.
-
A check is made to verify that env_conf has not been
changed and that the tasks/threads settings in
env_mach.$MACH have also not been modified since
configure was invoked.
-
Necessary model input data is prestaged to each component executable
directory.
-
The CCSM model is run.
-
Component model history, log, diagnostic, and restart files are copied
to the short-term archive directory, $DOUT_S_ROOT/ (by the
Tools/ccsm_s_archive script).
-
The $DOUT_S_ROOT/restart/ directory is cleaned and populated
with the latest restart data files and restart pointer files
(by the Tools/ccsm_s_archive script).
-
The restart files in $DOUT_S_ROOT/restart/
are tarred up and placed in $DOUT_S_ROOT/restart.tars/
with a unique filename. This is done by the Tools/ccsm_s_archive script.
-
The long-term archiver, $CASE.$MACH.l_archive,
is submitted to the batch queue if appropriate.
-
If the $RESUBMIT environment variable in env_run.xml is
greater than 0, then three things happen: (1) the $CONTINUE_RUN
environment variable is set to TRUE, (2) $CASE.$MACH.run is
resubmitted and (3) the value of $RESUBMIT is decremented by 1.
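The resubmission rule in the last step can be sketched as follows. This is a minimal Python model of the three effects listed above, not the run script itself; the dictionary env stands in for the settings in env_run.xml and the function name after_run is hypothetical.

```python
def after_run(env):
    """Apply the resubmission rule to a dict of env_run-style settings.

    Returns True if the run script should be resubmitted.
    """
    if int(env.get("RESUBMIT", 0)) > 0:
        env["CONTINUE_RUN"] = "TRUE"                # (1) continue from restart files
        env["RESUBMIT"] = int(env["RESUBMIT"]) - 1  # (3) one fewer resubmission left
        return True                                 # (2) caller resubmits the run script
    return False


env = {"RESUBMIT": 2, "CONTINUE_RUN": "FALSE"}
submissions = 1  # the initial submission by the user
while after_run(env):
    submissions += 1

print(submissions, env)
```

With RESUBMIT set to 2, the script is submitted three times in total: the original submission plus two resubmissions.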
In particular, the script, $CASE.$MACH.run, does the following:
-
sources env_conf, env_run and env_mach.$MACH
-
executes $CASE.$MACH.build
-
sets up the local run environment
-
runs the CCSM model
-
copies log files back to $LOGDIR if $LOGDIR is not set to " "
-
executes the short-term archive script, Tools/ccsm_s_archive.csh
(if $DOUT_S is TRUE)
-
submits the long-term archive script, $CASE.$MACH.l_archive, if:
-
$CASE.$MACH.l_archive exists for $MACH
-
$DOUT_S is TRUE
-
$DOUT_L_MS is TRUE
-
resubmits $CASE.$MACH.run if $RESUBMIT (in env_run) is greater than 0
-
decrements $RESUBMIT by 1 if $RESUBMIT is greater than 0
- What are the different ways to initialize a run?
The environment variable, $RUN_TYPE in env_conf
determines the way in which a new CCSM run will be
initialized.
$RUN_TYPE can have values of 'startup', 'hybrid' or 'branch'.
In a startup run, each component's initialization occurs from some
arbitrary baseline state. In a branch run, each component is
initialized from restart files. In a hybrid run initialization occurs
via a combination of existing CCSM restart files for some components
(e.g. POP and CSIM) and initial files for other components (e.g. for
CAM and CLM).
The value of $START_DATE in env_conf is ignored for a branch run,
since each model component will obtain the $START_DATE from its own
restart dataset. The coupler will then validate at run time that all
the models are coordinated in time and therefore have the same
$START_DATE. This is the same mechanism that is used for performing a
restart run (where $CONTINUE_RUN is set to TRUE). In a hybrid
or startup run, $START_DATE is obtained from
env_conf and not from component restart or initial
files. Therefore, inconsistent restart and/or initial files may be
used for hybrid runs, whereas they may not be used for branch runs.
All CCSM components produce "restart" files containing data necessary
to describe the exact state of the CCSM run when it was halted.
Restart files allow the CCSM to be continued or branched to produce
exactly the same answer (bit-for-bit) as if it had never stopped. A
restart run is not associated with a new $RUN_TYPE setting (as
was the case in CCSM2), but rather is determined by the setting of the
environment variable $CONTINUE_RUN in env_run.
In addition to the periodic generation of restart files, some CCSM
components (e.g. CAM and CLM) also periodically produce netCDF initial
files. These files are smaller and more flexible than the component's
binary restart files and are used in cases where it is not crucial
for the new run to be bit-for-bit the same as the run which produced the
initial files.
The following provides a summary of the different initialization
options for running CCSM.
- startup - arbitrary initialization determined by components
(default)
-
hybrid - initialization occurs from the restart/initial files of a
previous reference case,
the start date can be changed with respect to reference case
-
branch - initialization occurs from restart files of a
previous reference case,
cannot change start date with respect to reference case
Types of Files Used Under Various Runtype parameters:
| run type | atm | lnd | ocn | ice | cpl |
| startup | nc | internal | internal+file | binary | internal/delay |
| hybrid | nc | nc | binary | binary | internal/delay |
| branch | binary | binary | binary | binary | binary |
Delay mode is when the ocean model starts running on the
second day of the run, not the first. In delay mode, the
coupler also starts without a restart file and uses whatever
fields the other components give it for startup. It's generally
climate continuous but produces initial changes that are much
bigger than roundoff.
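For quick reference, the file-type table above can be written as a lookup structure. The Python sketch below simply transcribes the table; the names INIT_FILES and init_file_type are hypothetical and do not correspond to anything in the CCSM scripts.

```python
# Initialization source per run type and component, transcribed from the table.
INIT_FILES = {
    "startup": {"atm": "nc", "lnd": "internal", "ocn": "internal+file",
                "ice": "binary", "cpl": "internal/delay"},
    "hybrid": {"atm": "nc", "lnd": "nc", "ocn": "binary",
               "ice": "binary", "cpl": "internal/delay"},
    "branch": {"atm": "binary", "lnd": "binary", "ocn": "binary",
               "ice": "binary", "cpl": "binary"},
}


def init_file_type(run_type, component):
    """Look up how a component is initialized for a given run type."""
    return INIT_FILES[run_type][component]


def uses_delay_mode(run_type):
    """True if the coupler starts without a restart file for this run type."""
    return init_file_type(run_type, "cpl") == "internal/delay"


print(init_file_type("hybrid", "lnd"), uses_delay_mode("branch"))
```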
A detailed summary of each $RUN_TYPE setting is provided in the
following sections.
-
- What is a Startup run?
-
When the environment variable $RUN_TYPE is set to 'startup',
a new CCSM run will be initialized using arbitrary baseline states for
each component. These baseline states are set independently by each
component and will include the use of restart files, initial files,
external observed data files or internal initialization (i.e. a ``cold
start''). By default, the CCSM4.0 scripts will produce a startup run.
Under a startup run, the coupler will start-up using "delay"
capabilities in which the ocean model starts running on the
second day of the run, not the first. In this mode, the coupler also
starts without a restart file and uses whatever fields the other
components give it for startup.
The following environment variables in env_conf define
a startup run:
- $CASE : new case name
- $RUN_TYPE : startup
- $START_DATE: YYYYMMDD (date for starting the run)
The following holds for a startup run:
- All models startup from some arbitrary initial conditions set in an
external file or internally.
- The coupler sends the start date to the components at
initialization.
- All models set the case name internally from namelist input.
- All models set the start date through namelist input or
at initialization from the coupler.
- The coupler starts up in "delay" mode.
- What is a hybrid run?
-
A hybrid run indicates that the CCSM is to be initialized using
datasets from a previous CCSM run. A hybrid run allows the user to
bring together combinations of initial/restart files from a previous
CCSM run (specified by $RUN_REFCASE) at a given model output date (specified by
$RUN_REFDATE) and change the start date ($RUN_STARTDATE) of
the hybrid run relative to that used for the reference run. In a
branch run the start date for the run cannot be changed relative to
that used for the reference case since the start date is obtained from
each component's restart file. Therefore, inconsistent restart and/or
initial files may be used for hybrid runs, whereas they may not be
used for branch runs. For a hybrid run using the fully active
component set (B) (see section \ref{subsec_compsets}), CAM and CLM
will start from the netCDF initial files of a previous CCSM run,
whereas POP and CSIM will start from binary restart files of that same
CCSM run.
The model will not continue in a bit-for-bit fashion with respect to
the reference case under this scenario. The resulting climate,
however, should be continuous as long as no namelists or model source
code are changed in the hybrid run. The variables
$RUN_REFCASE and $RUN_REFDATE in env_conf are
used to specify the previous (reference) case and starting date of the
initial/restart files to be used. In a hybrid run, the coupler will
start-up using the "delay" capabilities.
The following environment variables in env_conf define
a hybrid run:
- $CASE : new case name
- $RUN_TYPE : hybrid
- $START_DATE : YYYYMMDD (date where to start this run)
- $RUN_REFCASE: reference case name (for initial/restart data)
- $RUN_REFDATE: YYYYMMDD (date in $RUN_REFCASE for initial/restart data)
Note that the combination of $RUN_REFCASE and $RUN_REFDATE specify
the initial/restart reference case data needed to initialize the
hybrid run. The following holds for a hybrid run:
- All models must be able to read a restart file and/or initial
condition files from a different case and change both the case
name and the start date internally to start a new case.
- The coupler must send the start date to the components at initialization.
- All models must set the case name internally from namelist input.
- All models must set the base date through namelist input or
at initialization from the coupler.
- The coupler and ocean models start up in "delay" mode.
- What is a branch run?
-
A branch run is initialized using binary restart files from a previous
run for each model component. The case name is generally changed for a
branch run, although it does not have to be.
In the case of a branch run, the setting of $RUN_STARTDATE in
env_conf is ignored since each model component will obtain the start
date from its own restart dataset. At run time, the coupler validates
that all the models are coordinated in time and therefore have the
same start date. This is the same mechanism that is used for
performing a restart run (where $CONTINUE_RUN is set to TRUE).
Branch runs are typically used when sensitivity or parameter studies
are required or when settings for history file output streams need to
be modified. Under this scenario, the new case must be able to
produce bit-for-bit exact restart in the same manner as a continuation
run if no source code or namelist inputs are modified. All models
must use full bit-for-bit restart files to carry out this type of run.
Only the case name changes.
The following environment variables in env_conf define
a branch run:
- $CASE : new case name
- $RUN_TYPE : branch
- $RUN_REFCASE: reference case (for restart data)
- $RUN_REFDATE: YYYYMMDD (date in $RUN_REFCASE for restart data)
The following holds for a branch run:
- All models must be able to restart exactly from a branch run
when compared to the equivalent continuation run given the
same source code and namelist inputs.
- The base date set in the components must be read from a restart file.
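The variable lists given for startup, hybrid and branch runs can be collected into a simple consistency check. The Python sketch below is an illustration only: it assumes the env_conf settings have already been read into a dictionary, and the names REQUIRED and check_env_conf are hypothetical. $START_DATE is not required for a branch run because, as noted above, it is ignored there.

```python
import re

# Variables that define each run type, as listed in the sections above.
REQUIRED = {
    "startup": ["CASE", "RUN_TYPE", "START_DATE"],
    "hybrid": ["CASE", "RUN_TYPE", "START_DATE", "RUN_REFCASE", "RUN_REFDATE"],
    "branch": ["CASE", "RUN_TYPE", "RUN_REFCASE", "RUN_REFDATE"],
}
DATE_VARS = ("START_DATE", "RUN_REFDATE")


def check_env_conf(env):
    """Return a list of problems found in an env_conf-style dict."""
    run_type = env.get("RUN_TYPE")
    if run_type not in REQUIRED:
        return [f"RUN_TYPE must be one of {sorted(REQUIRED)}"]
    problems = []
    for name in REQUIRED[run_type]:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is required for a {run_type} run")
        elif name in DATE_VARS and not re.fullmatch(r"\d{8}", value):
            problems.append(f"{name} must have the form YYYYMMDD")
    return problems


# A hybrid run that forgot to name the reference date.
print(check_env_conf({"CASE": "Test2", "RUN_TYPE": "hybrid",
                      "START_DATE": "00010101", "RUN_REFCASE": "Test1"}))
```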
- What can I change in a branch run?
- What is a restart?
- Model input data
- Overview of model input data.
-
CCSM4 input data are provided as part of the release via several input
data tar files. The tar files are typically broken down by components
and/or resolutions. These files should be downloaded and untarred into
a single input data root directory (see section \ref{subsec_inputdata_tree}).
Each tar file will place files under a
common directory named inputdata/. The inputdata/
directory contains numerous
subdirectories and CCSM4 assumes that
the directory structure and filenames will be preserved.
The officially released input data root directory name is set in the
env_mach.$MACH file via the environment variable
$DIN_LOC_ROOT. A default setting of
$DIN_LOC_ROOT is provided for each machine in env_mach.$MACH.
The user should edit this value if it does not correspond to their
inputdata/ root. Multiple users can share the same inputdata/
directory. The files existing in the various subdirectories of
inputdata/ should not have Unix write permission on them.
An empty input data root directory tree is also provided as a future
place holder for custom user-generated input datasets. This is set in
the env_mach.$MACH file via the environment variable
$DIN_LOC_ROOT_USER. If the user wishes to use any user-modified
input datasets in place of the officially released version, these
should be placed in the appropriate subdirectory of
$DIN_LOC_ROOT_USER/.
The appropriate CCSM resolved component scripts (in
$CASEROOT/Buildnml_prestage/) must then also be modified to use the new
filenames. Any datasets placed in $DIN_LOC_ROOT_USER/
should have unique names that do not correspond to any datasets in
$DIN_LOC_ROOT/. The contents
of $DIN_LOC_ROOT/ should not be modified. The user should
be careful to preserve these changes, since invoking
configure -cleanall will remove all user made
changes.
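The requirement that datasets in $DIN_LOC_ROOT_USER/ have names that do not occur in $DIN_LOC_ROOT/ can be checked with a short script. The following Python sketch is one possible way to do so and is not part of the CCSM tools; the function names and the example file names are hypothetical.

```python
import os
import tempfile


def file_names(root):
    """Collect the base names of all files below root."""
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        found.update(filenames)
    return found


def colliding_names(din_loc_root, din_loc_root_user):
    """Return the sorted file names that appear in both directory trees."""
    return sorted(file_names(din_loc_root) & file_names(din_loc_root_user))


# Demonstration with throwaway directories and made-up dataset names.
with tempfile.TemporaryDirectory() as top:
    released = os.path.join(top, "inputdata")
    custom = os.path.join(top, "inputdata_user")
    os.makedirs(os.path.join(released, "atm"))
    os.makedirs(os.path.join(custom, "atm"))
    open(os.path.join(released, "atm", "sst.nc"), "w").close()
    open(os.path.join(custom, "atm", "sst.nc"), "w").close()       # clashes
    open(os.path.join(custom, "atm", "sst_mine.nc"), "w").close()  # unique
    clashes = colliding_names(released, custom)

print(clashes)
```

Any name reported by colliding_names would need to be changed in the user tree before the resolved scripts are pointed at it.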
- What does it mean to prestage input data?
-
Prestaging input data is copying the needed input data into $DIN_LOC_ROOT_CSMDATA
- How do I interact with the input database?
-
check_input_data
SYNOPSIS
check_input_data [options]
OPTIONS
-inputdata inputdata Set the inputdata directory. Optional
inputdata defaults to the CSMDATA environment variable.
-check Check whether data is available in "inputdata",
-export Export missing data into "inputdata",
-prestage prestage Prestage data from inputdata to prestage
-datalistdir dir Directory which will be searched for .input_data_list files
Default is . Optional.
-help [or -h] Print usage to STDOUT. Optional.
SUMMARY
This utility checks, exports and prestages necessary input data for CCSM.
The utility first searches for .input_data_list files in the then
prints their locations and passes them to the caseroot
Tools/get-input-data utility via the -flist option.
Default directory to search for .input_data_list files is
- How do I perform hybrid/branch runs and prestage restart data?
-
To start up a branch or hybrid run, restart and/or initial data from a
previous run must be made available to each model component. As is
discussed below, restart tar files of the form
$CASE.ccsm.r.yyyy-mm-dd-sssss.id.tar
where id corresponds to a unique creation time stamp, are
periodically generated. The restart tar files contain data that is
required to start up either a hybrid or branch run.
The simplest way to make this data available to the hybrid or branch
run at initialization is to untar the appropriate reference case
restart tar file in, or copy the reference case restart files into, the
$DOUT_S_ROOT/restart/ short-term archiving directory of the branch or
hybrid run case.
For example, assume that a new hybrid case, Test2, is to be run on machine
bluefire, using restart and initial data from case Test1,
at date yyyy-mm-dd-sssss. Also assume that the short-term archiving directory
($DOUT_S_ROOT in env_mach.bluefire) is set
to /ptmp/$LOGNAME/archive/Test2. Then the restart tar file
Test1.ccsm.r.yyyy-mm-dd-sssss.id.tar
should be untarred in
/ptmp/$LOGNAME/archive/Test2/restart/
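The file name pattern and the target directory from this example can be put together in a short Python sketch. It only assembles strings following the $CASE.ccsm.r.yyyy-mm-dd-sssss.id.tar form given above. The function names are hypothetical, and the date, time stamp, and user name in the example are made-up placeholders, since the format of the id field is not specified here.

```python
def restart_tar_name(ref_case, date, seconds, stamp):
    """Build a name of the form $CASE.ccsm.r.yyyy-mm-dd-sssss.id.tar."""
    return f"{ref_case}.ccsm.r.{date}-{seconds:05d}.{stamp}.tar"


def restart_target_dir(dout_s_root):
    """Directory of the new case in which the tar file should be untarred."""
    return dout_s_root.rstrip("/") + "/restart/"


# Placeholder values: reference case Test1, new case Test2.
tar_name = restart_tar_name("Test1", "0001-01-06", 0, "123456")
target = restart_target_dir("/ptmp/someuser/archive/Test2")
print(tar_name)
print(target)
```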
- What does it mean to load balance a run?
-
CCSM load balance refers to the allocation of processors to different
components such that efficient resource utilization occurs for a given
model case and the resulting throughput is in some sense optimized.
Because of the constraints in how processors can be allocated
efficiently to different components, this usually results in a handful
of ``sweet spots'' for processor usage for any given component set,
resolution, and machine.
- How do I determine the optimal load balance for a run?
-
CCSM components run independently and are tied together only through
MPI communication with the coupler. For example, data sent by the
atm component to the land component is sent first to the coupler
component which then sends the appropriate data to each land component
process. The coupler component communicates with the atm, land, and ice
components once per hour and with the ocean only once a day. The overall
coupler calling sequence currently looks like
Coupler
-------
do i=1,ndays                        ! days to run
   do j=1,24                        ! hours
      if (j.eq.1) call ocn_send()
      call lnd_send()
      call ice_send()
      call ice_recv()
      call lnd_recv()
      call atm_send()
      if (j.eq.24) call ocn_recv()
      call atm_recv()
   enddo
enddo
For scientific reasons, the coupler receives hourly data from
the land and ice models before receiving hourly data from the
atmosphere. Because of this execution sequence, it is important to
allocate processors in a way that assures that atm processing is not
held up waiting for land or ice data. It is easy to naively allocate
processors to components in such a way that unnecessary time is spent
blocking on communication and idle processors result.
While the coupler is largely responsible for inter-component
communication, it also carries out some computations such as flux
calculations and grid interpolations. These are not indicated in the
above pseudo-code.
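To make the ordering in the coupler pseudo-code concrete, the Python sketch below enumerates the calls in the same sequence and counts them. It models only the call order, not the actual communication or the coupler's own computations, and the function name coupler_calls is hypothetical.

```python
from collections import Counter


def coupler_calls(ndays, hours_per_day=24):
    """Yield (day, hour, call) tuples in the order given by the pseudo-code."""
    for day in range(1, ndays + 1):
        for hour in range(1, hours_per_day + 1):
            if hour == 1:
                yield day, hour, "ocn_send"   # ocean is sent to once per day
            for call in ("lnd_send", "ice_send", "ice_recv",
                         "lnd_recv", "atm_send"):
                yield day, hour, call
            if hour == hours_per_day:
                yield day, hour, "ocn_recv"   # ocean is received once per day
            yield day, hour, "atm_recv"


calls = list(coupler_calls(ndays=1))
counts = Counter(name for _day, _hour, name in calls)
print(counts["ocn_send"], counts["atm_recv"], len(calls))
```

Over one simulated day the ocean takes part in one send and one receive, while each of the other components takes part in 24 of each.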
Since all MPI ``sends'' and ``receives'' are blocking, the components might wait
during the send and/or receive communication phase. Between the
communication phases, each component carries out its internal computations.
In general, a component's time loop looks like:
General Physical Component
--------------------------
do i=1,ndays
   do j=1,24
      call compute_stuff_1()
      call cpl_recv()
      call compute_stuff_2()
      call cpl_send()
      call compute_stuff_3()
   enddo
enddo
So, compute_stuff_1 and compute_stuff_3 are carried out between
the send and the receive, and compute_stuff_2 is carried out between
the receive and send. This results in a communication pattern that is
represented below. We note that for each ocean communication, there
are 24 ice, land, and atm communications. However, aggregated over a
day, the communication pattern can be represented schematically below
and serves as a template for load balancing CCSM4.
ocn  r---------------------------s
     ^                           |
ice  ^    r   s                  |
     ^    ^   |                  |
lnd  ^ r  ^   |   s              |
     ^ ^  ^   |   |              |
atm  ^ ^  ^   |   |   r        s |
     ^ ^  ^   v   v   ^        v v
cpl  s-s--s---r---r---s--------r-r
     time->
s = send
r = recv
CCSM4 runtime statistics can be found in coupler, csim, and pop
log files whereas cam and clm create files of the form
timing. containing timing statistics. As an
example, near the end of the coupler log file, the line
(shr_timer_print) timer 2: 1 calls, 355.220s, id: t00 - main integration
indicates that 355 seconds were spent in the main time
integration loop. This time is also referred to as the ``stepon''
time. Simply put, load balancing involves reassigning processors to
components so as to minimize this statistic for a given number of
processors. Due to the CCSM processing sequences, it is impossible to
keep all processors 100% busy. Generally, a well balanced
configuration will show that the atm and ocean processors are well
utilized whereas the ice and land processors may indicate considerable
idle time. It is more important to keep the atm and ocean processors
busy as the number of processors assigned to atm and ocean is much
larger than the number assigned to ice and land.
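As an illustration of how the ``stepon'' time could be pulled out of a coupler log, the Python sketch below parses a timer line of the form quoted above. The regular expression is inferred from that single sample line only, and the function name parse_timer_line is hypothetical; other timer lines in a real log may be laid out differently.

```python
import re

# Pattern inferred from the one sample line in the text (an assumption).
TIMER_RE = re.compile(
    r"\(shr_timer_print\)\s+timer\s+(\d+):\s+(\d+)\s+calls,"
    r"\s+([\d.]+)s,\s+id:\s+(\S+)\s+-\s+(.*)"
)

def parse_timer_line(line):
    """Return the fields of a shr_timer_print line, or None if it does not match."""
    m = TIMER_RE.search(line)
    if m is None:
        return None
    return {
        "timer": int(m.group(1)),
        "calls": int(m.group(2)),
        "seconds": float(m.group(3)),
        "id": m.group(4),
        "label": m.group(5).strip(),
    }

sample = "(shr_timer_print) timer 2: 1 calls, 355.220s, id: t00 - main integration"
info = parse_timer_line(sample)
print(info["seconds"], info["label"])
```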
The script getTiming.csh, in the directory
$CCSMROOT/Tools, can be used
to aid in the collection of run time statistics needed to examine the
load balance efficiency.
The following examples illustrate
some issues involved in load balancing a CCSM4 run
for a T42_gx1v3 run on bluesky.
Case                        LB1       LB2
=========================================
OCN cpus                     40        48
ATM cpus                     32        40
ICE cpus                     16        20
LND cpus                      8        12
CPL cpus                      8         8
total CPUs                  104       128
stepon (s)                  336       280
node seconds              34944     35840
simulated yrs/day          7.05      8.45
simulated yrs/day/cpu      .067      .066
In the above example, adding more processors in the correct balance
resulted in an ensemble that was "faster" (computed more years per
wall clock day) and statistically just as efficient (years per day per
cpu). The example below shows that assigning more processors to a
given run may speed up that run (generates more simulated years per
day) but may be less processor efficient.
Case                        LB3       LB4
=========================================
OCN cpus                     32        48
ATM cpus                     16        40
ICE cpus                      8        20
LND cpus                      4        12
CPL cpus                      4         8
total CPUs                   64       128
stepon (s)                  471       280
node seconds              30144     35840
simulated yrs/day          5.03      8.45
simulated yrs/day/cpu      .078      .066
Learning how to analyze run time statistics and properly assign
processors to components takes considerable time and is beyond the
scope of this document.
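The derived rows in the tables above can be recomputed from the total CPU count and the stepon time. The Python sketch below shows one way to do this. It rests on two assumptions that are not stated in the text: that each timing run simulated 10 days and that a model year has 365 days. With those values the sketch reproduces the "node seconds" and "simulated yrs/day" rows; the per-cpu values agree with the tables to within the last printed digit. The function name balance_metrics is hypothetical.

```python
SECONDS_PER_DAY = 86400
DAYS_PER_YEAR = 365          # assumption: 365-day model year

def balance_metrics(total_cpus, stepon_seconds, run_days=10):
    """Return (cpu-seconds, simulated years per wall-clock day, years/day/cpu).

    run_days=10 is an assumption inferred from the table values.
    """
    cpu_seconds = total_cpus * stepon_seconds
    years_per_day = (SECONDS_PER_DAY / stepon_seconds) * run_days / DAYS_PER_YEAR
    return cpu_seconds, years_per_day, years_per_day / total_cpus

for name, cpus, stepon in [("LB1", 104, 336), ("LB2", 128, 280), ("LB3", 64, 471)]:
    cpu_s, ypd, ypd_cpu = balance_metrics(cpus, stepon)
    print(f"{name}: {cpu_s} cpu-seconds, {ypd:.2f} yrs/day, {ypd_cpu:.3f} yrs/day/cpu")
```

Comparing the last two numbers across candidate layouts is what distinguishes a configuration that is merely faster from one that is also as processor efficient.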
- How do I switch machines during a run?
-
- If you want to switch to another machine during a model run, you MUST
first be on the machine that is being switched to and copy the $CASE directory
over to the new machine.
- While in the $CASE directory on the new machine, you should issue the command
> $CCSMROOT/scripts/switch_machine -newmach name-of-new-machine
- What is the RESUBMIT flag and how is it used?
-
Variable set in env_run.xml to determine if the model should resubmit
itself at the end of a run. If $RESUBMIT is 0, then the run
script will not resubmit itself. If $RESUBMIT is greater
than 0, then the case run script will resubmit itself, decrement
$RESUBMIT by 1 and set the value of $CONTINUE_RUN to TRUE.
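The resubmission rule can be sketched in a few lines of Python. This is only a model of the logic described above: a plain dictionary stands in for env_run.xml, and the function name after_run is hypothetical. The actual run script edits the xml file and submits a batch job.

```python
def after_run(env):
    """Apply the RESUBMIT rule; return (updated settings, whether to resubmit)."""
    updated = dict(env)
    if updated["RESUBMIT"] > 0:
        updated["RESUBMIT"] -= 1           # one fewer resubmission remaining
        updated["CONTINUE_RUN"] = "TRUE"   # the next run continues from restarts
        return updated, True
    return updated, False

env = {"RESUBMIT": 2, "CONTINUE_RUN": "FALSE"}
runs = 1                                   # the initial submission
while True:
    env, again = after_run(env)
    if not again:
        break
    runs += 1
print(runs, env)
```

Under this reading, setting $RESUBMIT to 2 yields three runs in total: the initial submission plus two automatic resubmissions.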
Output Data
-
CCSM4 is comprised of a collection of distinct models optimized for a
very high-speed, parallel multi-processor computing environment. Each
component produces its own output stream consisting of history,
restart and output log files. Component history files are in netCDF
format whereas component restart files are in binary format and are
used to either exactly restart the model or to serve as initial
conditions for other model cases.
Standard output generated from each CCSM component is saved in a "log
file" located in each component's subdirectory under $EXEROOT.
Each time the CCSM is run, a single coordinated timestamp
is incorporated in the filenames of all output log files associated
with that run. This common timestamp is generated by the run script
and is of the form YYMMDD-hhmmss, where YYMMDD are the Year, Month,
Day and hhmmss are the hour, minute and second that the run began
(e.g. ocn.log.040526-082714). Log files can also be copied to a user
specified directory using the variable $LOGDIR in
env_run.xml. The default is ``'', so no extra copy of the log
file occurs.
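The timestamp convention can be illustrated with a short Python sketch. The helper names log_stamp and log_name are hypothetical, and the "atm" prefix is an assumption; only the ocn.log example appears in the text. The point is that one start time is formatted once and reused for every component's log file.

```python
from datetime import datetime

def log_stamp(start):
    """Format a run start time as YYMMDD-hhmmss."""
    return start.strftime("%y%m%d-%H%M%S")

def log_name(component, start):
    """Build a log file name that shares the run's common timestamp."""
    return f"{component}.log.{log_stamp(start)}"

start = datetime(2004, 5, 26, 8, 27, 14)   # the run start time from the example
for comp in ("atm", "ocn"):                # "atm" prefix is assumed
    print(log_name(comp, start))
```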
By default, each component writes monthly averaged history files in
netCDF format and also writes binary restart files. The history and
log files are controlled independently by each component. Restart
files, on the other hand, are written by each component at regular
intervals dictated by the flux coupler via the setting of
$REST_OPTION and $REST_N in env_run.xml.
Restart files are also known as "checkpoint" files. They allow the
model to stop and then start again with bit-for-bit exact capability
(i.e. the model output is exactly the same as if it had never been
stopped). The coupler coordinates the writing of restart files as
well as the model execution time. All components receive information
from the coupler and write restarts or stop as specified by the
coupler. Coupler namelist input in env_run.xml sets the run
length and restart frequency via the settings of the environment
variables $STOP_OPTION, $STOP_N,
$REST_OPTION and $REST_N. Each component's
log, diagnostic, history, and restart files can be saved to the local
mass store system using the CCSM4 long-term archiver.
The raw history data does not lend itself well to easy time-series
analysis. For example, CAM writes one large netCDF history file (with
all the requested variables) at each requested output period. While
this allows for very fast model execution, it makes it difficult to
analyze time series of individual variables without having to access
the entire data volume. Thus, the raw data from major CCSM
integrations is usually postprocessed into more user-friendly
configurations, such as single files containing long time-series of
each output field, and made available to the community (see section
\ref{sec_postprocess}).
Archiving is a phase of the CCSM production process where model output
is moved from each component's executable directory to a local disk
area (short-term archiving) and subsequently to a long-term storage
system (long-term archiving). It has no impact on the production run
except to clean up disk space and help manage user quotas.
Short and long-term archiving environment variables are set in the
env_mach.$MACH file. Although short-term and long-term
archiving are implemented independently in the scripts, there is a
dependence between the two since the short-term archiver must be
turned on in order for the long-term archiver to be activated.
By default, short-term archiving is enabled and long-term archiving is
disabled. Several important points need to be made about archiving:
-
All output data is initially written to $EXEROOT/run/.
-
Unless a user explicitly turns off short-term archiving, files
will be moved to the short-term archive area by default at the end of
a model run.
-
Users should generally turn off short-term archiving when developing
new CCSM code.
-
If long-term archiving is not enabled, users should monitor quotas and
usage in the $DOUT_S_ROOT/ directory and should manually clean up
these areas on a regular basis.
- What is short-term archiving?
-
Short-term archiving is executed as part of running the
$CASE.$MACH.run script. The short-term archiving script,
ccsm_s_archive, resides in the ccsm_utils/Tools ($UTILROOT/Tools)
directory. Short-term archiving is executed after the CCSM run is
completed if $DOUT_S is set to TRUE in env_mach.$MACH.
The short-term archiving area is determined by the setting of
$DOUT_S_ROOT in env_mach.$MACH.
The short-term archiver does the following:
-
copies complete sets of generated restart/initial files and restart
pointer files from each component's executable directory to
$DOUT_S_ROOT/restart/.
-
moves all history, log, diagnostic, restart and initial files from
each component's executable directory to that component's specific
directory under $DOUT_S_ROOT/
-
tars up the contents of the directory
$DOUT_S_ROOT/restart/
and places the tarred set in the directory
$DOUT_S_ROOT/restart.tar/
with an appended unique date string
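The three steps listed above can be sketched in Python as follows. This is not the ccsm_s_archive script. The function name short_term_archive, the "*.r.*" pattern used to recognise restart files, the tar file name and the demo file names are all assumptions made for illustration, and the sketch handles a single component directory.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path

def short_term_archive(exe_dir, dout_s_root, component, stamp):
    """Illustrative three-step short-term archive for one component."""
    exe_dir, root = Path(exe_dir), Path(dout_s_root)
    restart_dir = root / "restart"
    comp_dir = root / component
    tar_dir = root / "restart.tar"
    for d in (restart_dir, comp_dir, tar_dir):
        d.mkdir(parents=True, exist_ok=True)

    # Step 1: copy restart files (assumed pattern) into restart/.
    for f in exe_dir.glob("*.r.*"):
        shutil.copy2(f, restart_dir / f.name)

    # Step 2: move all files out of the executable directory
    # into the component's own archive directory.
    for f in list(exe_dir.iterdir()):
        if f.is_file():
            shutil.move(str(f), str(comp_dir / f.name))

    # Step 3: tar the restart set and tag it with a date string.
    tar_path = tar_dir / f"restart.{stamp}.tar"
    with tarfile.open(tar_path, "w") as tar:
        tar.add(restart_dir, arcname="restart")
    return tar_path

with tempfile.TemporaryDirectory() as tmp:
    exe = Path(tmp) / "exe" / "ocn"
    exe.mkdir(parents=True)
    demo_files = ("case.pop.r.0001-01-11", "case.pop.h.0001-01.nc",
                  "ocn.log.040526-082714")   # made-up file names
    for name in demo_files:
        (exe / name).write_text("demo")
    tar_path = short_term_archive(exe, Path(tmp) / "archive", "ocn", "040526-082714")
    archived = sorted(p.name for p in (Path(tmp) / "archive" / "ocn").iterdir())
    restarts = sorted(p.name for p in (Path(tmp) / "archive" / "restart").iterdir())
    remaining = list(exe.iterdir())
    tar_exists = tar_path.exists()
    print(archived, restarts, remaining, tar_exists)
```

Copying the restart set before moving the files is what leaves a complete, self-contained restart directory alongside the per-component archive.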
The ccsm_s_archive script is written quite generally.
However, there may be certain user cases where it needs to be modified
for a production run because different sets of files need to be
stored. If this is the case, ccsm_s_archive should be copied
to the user's $CASEROOT/ directory and modified there since in general
this file is shared among different production runs. In addition, the
path to ccsm_s_archive in the $CASE.$MACH.run file also must
be modified.
- What is long-term archiving?
-
Long-term archiving is done via a separate CCSM script that can be run
interactively or submitted in batch mode. Long-term archiving saves
files onto the local mass store system. It also can copy data files
to another machine via scp. Normally, the long-term
archiver is submitted via batch automatically at the end of every CCSM
production run. The long-term archive script is generated by
configure and is a machine-dependent batch
script called $CASE.$MACH.l_archive.
The environment variables which control the behavior of long-term
archiving are set in the file, env_mach.$MACH (see section
\ref{subsec_msout}) and correspond to:
-
$DOUT_L_MS
-
$DOUT_L_MSROOT
-
$DOUT_L_MSNAME (optional, generally used at NCAR only)
-
$DOUT_L_MSPWD (optional, generally used at NCAR only)
-
$DOUT_L_MSRPD (optional, generally used at NCAR only)
-
$DOUT_L_MSPRJ (optional, generally used at NCAR only)
Not all of these parameters are used for all mass store systems. The
long-term archiver calls ccsm_l_archive which in turn calls
ccsm_mswrite to actually execute the mass store writes. The
script ccsm_mswrite is configured to test the local mass store
and execute the appropriate command to move data onto the local mass
store. Both the ccsm_l_archive and ccsm_mswrite scripts
reside in the ccsm_utils/Tools ($UTILROOT/Tools/) directory.
The long-term archiver is also capable of copying files to another
machine or site via scp. This requires that
scp passwords be set up transparently between the
two machines and will also likely require modification to the
ccsm_l_archive script to specify which files
should be moved. The parameters in env_mach.$MACH that
turn this on are:
-
$DOUT_L_RCP
-
$DOUT_L_RCP_ROOT
The above feature is not currently supported.
Although the ccsm_l_archive script is written
quite generally, there may be cases where it needs to be modified for
a given production run because different sets of files need to be
stored. If this is the case, ccsm_l_archive
should be copied to the user's $CASEROOT/ directory,
modified and the path to ccsm_l_archive in
$CASE.$MACH.run also must be changed accordingly.
Adding Non-default behavior to scripts
- Script basics
-
This section provides a brief overview of the design of the scripts
and can be used by CCSM developers to help understand how the
scripts are implemented and how they operate.
The highest level scripts, create_newcase and
create_test, are in the $CCSMROOT/ccsm4/scripts directory.
All supporting utilities are contained in the
$CCSMROOT/ccsm4/scripts/ccsm_utils/ directory.
create_newcase copies the Case.template/ directory recursively
to a new $CASEROOT/ directory and subsequently modifies these files with
the appropriate sed commands. The files and directories in Case.template
form the baseline set of files for every new case.
The Components/ directory contains a unique script for every CCSM4
component. The template files set up the component scripts in the
$CASEROOT/Build*/ directories. These include namelist generation and
data prestaging scripts for the CCSM4 components as well as scripts
for building internal libraries and CCSM4 component executables. The
filename convention is component.template (e.g. cam.template).
Template scripts for each component selected in the env_conf
file are executed when configure is called. There are also
templates for each CCSM4 internal library that needs to be built.
The ccsm_utils/Machines/ directory contains all machine-specific information. The
only other directory where machine-specific information exists is
$CCSMROOT/ccsm4/models/bld/, where machine-specific compiler flags are
placed. In the Machines/ directory, each machine may have
up to five files. For each supported machine there are always three
files: env*.$MACH, batch*.$MACH, and run*.$MACH.
If long-term archiving
exists for that machine, l_archive*.$MACH will also exist. In
addition, some machines may also have modules.*.$MACH, if modules are
used on that machine. The env*.$MACH file is copied directly to the
$CASEROOT/ directory and renamed env_mach.$MACH by the script
create_newcase. The batch*.$MACH, run*.$MACH and
l_archive*.$MACH scripts are used by configure (in conjunction
with other tools) to generate the machine-dependent build, run, and
long-term archiving scripts in the $CASEROOT/ directory.
The ccsm_utils/Testcases/ directory contains scripts which automatically generate
CCSM test cases. For each test case, there are two scripts. Each
test case is designated by a unique six character name, e.g. ER.01a, and has
its own setup script. These scripts are used by create_test to
generate a wide variety of CCSM tests (see section \ref{sec_testing}).
Finally, the Tools/ directory contains a large suite of scripts. Some
of these scripts are used by create_newcase and
configure. Others are simply used as utilities by the ``resolved''
scripts in the Buildnml_Prestage/ directory and the ``resolved'' run and
build scripts in the $CASEROOT/ directory.
Users will generally not need to modify any of the scripts under
ccsm_utils/. Exceptions where such modifications may be needed would
be porting to a new machine, modifying the default archiving behavior,
adding new resolutions, or adding new test cases. Such issues are
described in more detail later (see section \ref{sec_usecases}). Users should
feel free to modify files in ccsm_utils/ as needed. If modifications
to ccsm_utils/ files are needed, they must be done by
users in their personal workarea and not in code directories that
are shared among more than one user.
- Adding a New Component Set
A new component set can be created by adding a new compset in file
$CCSMROOT/scripts/ccsm_utils/Case.template/config_compsets.xml
- Adding a New Machine
$CCSMROOT/scripts/ccsm_utils/Machines/
- Adding a New Grid
A new grid can be created by adding a new horiz_grid in file
$CCSMROOT/scripts/ccsm_utils/Case.template/config_grid.xml
Description of env_xxx files in $CASEROOT
env_case.xml
- Non-user-modifiable xml file that determines grid and model components
- Is cached in LockedFiles/ at create_newcase time and cannot be edited after the create_newcase
command is invoked
env_conf.xml
- User modifiable non-machine specific xml file that sets environment variables that are used by template scripts in
$CASEROOT/Tools/Templates
to generate resolved namelists in the directories
$CASEROOT/Buildnml_prestage/ and $CASEROOT/Buildexe/
- Once configure -case is invoked, this file is cached in LockedFiles/ and cannot be modified
unless you invoke
> configure -cleannamelist
> configure -case
env_build.xml
- User modifiable non-machine specific xml file that sets build options
- IMPORTANT Although this file is not cached in LockedFiles, it should not be modified after the $CASE.$MACH.build script is run
env_mach_pes.xml
- User modifiable machine specific xml file that sets task and thread counts for model components.
- This file is cached in LockedFiles and cannot be modified once configure -case is invoked unless you first invoke
> ./configure -cleanmach
> ./configure -case
IMPORTANT If this file is changed, then the file, env_decomp, is also modified and the model needs to be rebuilt, if it already has been built.
env_run.xml
- User modifiable machine specific xml file that sets run-control variables
which may be modified during the course of a model run.
- These variables comprise, among others, driver namelist
settings for the stop time, restart frequency, energy and
water budget frequencies, history frequency and a flag to
determine if the run should be flagged as a continuation
run.
- In general, the user needs to only set the
variables $STOP_OPTION and $STOP_N in
env_run.xml. The other driver namelist settings will then be
given consistent and reasonable default values. These
default settings will always guarantee that restart files
are produced at the end of the model run.
Other
- How do I create a new ccsm startup case which initializes the ocean model with a spun-up initial condition?
Steps to create a new ccsm startup case which initializes
the ocean model with a spun-up initial condition (aka a "startup/spunup" case).
-
Starting with a ccsm4 tag, use create_newcase to create a new case, $case
By default, this is a "startup" run (env_conf:setenv RUN_TYPE startup)
-
Configure the new case, using "configure -case"
-
Customize the pop2 namelist script. This customization will allow
the ocean model to start from an existing pop2 restart file created from a
"spun up" case. To do this, you will need to do the following:
-
cp $CCSMROOT/models/ocn/pop2/input_templates/pop2_in_build.csh $case/SourceMods/src.pop2
-
edit $case/SourceMods/src.pop2/pop2_in_build.csh, setting init_ts_suboption = 'spunup' in the
namelist init_ts_nml. (Presently, the default value is 'null'). That is,
"set init_ts_suboption = null" is changed to "set init_ts_suboption = spunup"
-
Get a copy of all of the spun-up ocean restart files (all of the pop.r*. files,
including the ascii pop.r.*.hdr file, if it exists) and put all of these files
into your execution directory. In this document, "the ocean restart file"
refers to the *.pop.r.* file, which contains the temperature, salinity, and
ocean tracer fields. There are potentially other ocean restart files, all of which
contain the same filename pattern (*.pop.r*.*). You must copy all of the ocean
restart files to your case directory.
-
Redefine the input ocean-model dataset, replacing the standard filename with
the ocean restart filename (*.pop.r.*). Do this by editing
$case/Buildnml_Prestage/pop2.buildnml_prestage.csh (older ccsm4 tags) or
$case/Buildconf/pop2.buildnml_prestage.csh (newer ccsm4 tags)
-
setenv OCN_PRESTAGE FALSE # (only in tags prior to ccsm4_0_beta08)
-
go to the section marked
"1.2.3 put any nonstandard user-modifications to inputdata filenames here."
and add the following line, substituting the full path and filename of your
spunup ocean dataset:
set init_ts_data = <put the full name of your ocean file here>
(use the *.pop.r.* filename)
Note: if the ocean restart file is a binary file, the ocean code will
internally look for the companion "*.hdr" file in the same location as
the ocean restart file.
-
build and run as usual
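The namelist edit in the customization step above amounts to a one-line substitution in the copied pop2_in_build.csh. As a hedged illustration, the Python sketch below applies that substitution to the text of a script. The function name enable_spunup is hypothetical, and the fragment is a stand-in; the real script contains much more than this one setting.

```python
def enable_spunup(script_text):
    """Change init_ts_suboption from null to spunup in a csh script's text."""
    old = "set init_ts_suboption = null"
    new = "set init_ts_suboption = spunup"
    if old not in script_text:
        # Fail loudly rather than silently leaving the default in place.
        raise ValueError("expected setting not found; edit the script by hand")
    return script_text.replace(old, new, 1)

# Stand-in fragment, not the contents of the real pop2_in_build.csh.
fragment = "# ... other settings ...\nset init_ts_suboption = null\n"
print(enable_spunup(fragment))
```

Raising an error when the expected line is absent guards against tags in which the default value or the wording of the setting differs.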