During the fall of 1994, the Climate System Model (CSM) principal investigators recognized that future users of model-generated data would be diverse in both discipline and location. Therefore, a CSM Data Management Working Group was assigned to evaluate and select a standard data format for use within CSM. The working group consisted of model developers, data users, and software engineers from within NCAR (CGD and SCD). The objective of the group was to reach a consensus on the optimal format with respect to CSM data management issues. By July of 1995 it was determined that netCDF met the criteria established by the working group and would best serve the needs of potential users in the expected CSM community.
During the winter of 1996-1997, CSM selected the NCAR Graphics command language interface (NCAR Command Language, or NCL) as its primary tool for processing CSM datasets. NCL can access netCDF files, which is only one of many reasons it was chosen as the CSM processor (see Rationale for a Climate System Model Processor and Climate System Model (CSM) Output Processor). Although NCL was selected by CSM for data processing within the project, the CSM netCDF datasets can just as easily be accessed, processed, and examined with a number of readily available tools.
CSM Data Management Issues
It was obvious that the data format selected by CSM could not be restricted to the machine-native binary formats in which all model components had been archiving their output. A format was needed that would be accessible to a wide audience of researchers using a variety of computational platforms.
Furthermore, CSM simply does not have the resources to build and maintain a data format, then support a user community. The adoption of a standard format would allow investigators to readily access, manipulate, and display the data, using readily available tools, without being concerned about the internal storage details of the datasets themselves.
An ideal data format would be one which addressed the following criteria:
| Compactness | Minimize Storage Space |
| Functionality | Allow User Easy/Quick Access |
| Simplicity | Easy to Learn and Use |
| Flexibility | Capable of Handling Any Hardware Representation |
| Self-Describing | Internally Contains All Information |
| Support | Full Third Party Support |
| Popularity | Recognized and Used by Large Community |
| Usability | Imported by Available Analysis Tools |
| Availability | Easily Obtained and Free |
Clearly, no one format will be optimal in all the categories listed above. The CSM working group focused on the following four categories: self-description, portability, performance, and functionality. Compactness was also an issue, in that the data should be manageable relative to our native binary datasets and the primary archive should be easy to access, extract, and condense.
The CSM data management working group focused on netCDF, but did discuss the advantages and disadvantages of the other formats. It was decided that unless netCDF failed miserably in one of the evaluation criteria, it should be the accepted CSM standard data format. Frankly, the major concern regarding netCDF was how it would impact model performance. This was addressed in a cooperative manner between CSM and SCD programmers and netCDF developers. Modifications to enhance performance within the CRAY supercomputing environment were introduced into the netCDF library, and were subsequently accepted and adopted into the most recent version of netCDF.
netCDF is self-describing, portable, flexible, and is considered a standard (see netCDF Factsheet).
netCDF is used by a large, diverse community engaged in a variety of scientific research projects (see netCDF Users).
netCDF is in the public domain, well documented, and supported by a third party (see netCDF Documentation).
netCDF is used by a number of organizations, universities, and research institutions (see Organizations Using netCDF ).
netCDF is used by an ever-growing number of data analysis, processing, and visualization tools (see Software for Manipulating or Displaying netCDF Data ).
netCDF is a UCAR Unidata product, which gives CSM ready access to its developers.
Furthermore, Unidata has responded to the requirements of CSM in terms of performance and data compression, and the resulting modifications appear in the netCDF library.
The National Science Foundation (NSF) supports the Information Infrastructure Technology and Applications (IITA) program. One of IITA's primary functions is to provide funding to enhance the netCDF libraries.
Finally, one issue which is often overlooked but is a major concern within CSM is data management. Many different experiments will be run. Because netCDF is self-describing, each experiment can be documented within its own datasets. This means that CSM does not have to devote resources (i.e., people) to maintaining separate experiment documentation. We will be investigating the use of data management software which supports netCDF.
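To make the self-describing point concrete, the following is a minimal Python sketch, not part of the CSM software, showing how experiment documentation can travel inside a netCDF file as global attributes. It assumes the classic netCDF file layout (big-endian header: magic bytes, record count, dimension list, global attribute list, variable list). The helper names `write_global_attrs`, `read_global_attrs`, and `demo`, and the attribute values, are hypothetical. In practice the netCDF library handles all of this; the hand-written version only shows where the metadata lives. The reader is deliberately limited to files produced by this writer (no dimensions, text attributes only).

```python
import os
import struct
import tempfile

NC_CHAR = 2                         # classic-format type code for text
NC_ATTRIBUTE = 0x0C                 # tag that introduces an attribute list
ABSENT = struct.pack(">ii", 0, 0)   # marks an empty list in the header


def _pad4(raw):
    """Pad bytes to a 4-byte boundary, as the classic format requires."""
    return raw + b"\x00" * (-len(raw) % 4)


def write_global_attrs(path, attrs):
    """Write a classic netCDF file holding only global text attributes."""
    # magic number, zero records, empty dimension list
    out = [b"CDF\x01", struct.pack(">i", 0), ABSENT]
    if attrs:
        out.append(struct.pack(">ii", NC_ATTRIBUTE, len(attrs)))
        for name, text in attrs.items():
            n, v = name.encode("ascii"), text.encode("ascii")
            out.append(struct.pack(">i", len(n)) + _pad4(n))
            out.append(struct.pack(">ii", NC_CHAR, len(v)) + _pad4(v))
    else:
        out.append(ABSENT)
    out.append(ABSENT)              # empty variable list
    with open(path, "wb") as f:
        f.write(b"".join(out))


def read_global_attrs(path):
    """Read back attributes from a file written by write_global_attrs."""
    with open(path, "rb") as f:
        buf = f.read()
    if buf[:4] != b"CDF\x01":
        raise ValueError("not a classic netCDF file")
    pos = 4 + 4 + 8                 # skip magic, numrecs, empty dim list
    tag, count = struct.unpack_from(">ii", buf, pos)
    pos += 8
    attrs = {}
    for _ in range(count if tag == NC_ATTRIBUTE else 0):
        (nlen,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        name = buf[pos:pos + nlen].decode("ascii")
        pos += nlen + (-nlen % 4)
        _nc_type, vlen = struct.unpack_from(">ii", buf, pos)  # always NC_CHAR here
        pos += 8
        attrs[name] = buf[pos:pos + vlen].decode("ascii")
        pos += vlen + (-vlen % 4)
    return attrs


def demo():
    """Round-trip hypothetical experiment metadata through a file."""
    meta = {
        "title": "hypothetical CSM experiment",       # illustrative value
        "history": "created by an example script",    # illustrative value
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "experiment.nc")
        write_global_attrs(path, meta)
        return read_global_attrs(path)
```

Calling `demo()` returns the same dictionary that was written, recovered from the file alone. This is the property that lets each experiment carry its own documentation rather than relying on records kept elsewhere.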
CSM netCDF Convention
A CSM netCDF Convention has been developed for use within the CSM effort. It is based upon the COARDS netCDF Convention, but has been broadened to meet the needs of CSM.