An Informed Guide to Climate Data Sets

I-COADS Data Sets
Variable(s): Air Temperature, Cloudiness, Sea Level Pressure, Sea Surface Temperature, Specific Humidity, Surface Winds, Other Derived Variables
Land or Ocean: Ocean
Current Period of Record: 1784-1997
Resolution: Monthly, Global, 2° x 2° (1784-1997), 1° x 1° (1960-1997)
Description: Gridded archive of surface marine weather observations. No processing beyond quality control.
Reference: Woodruff et al. (1987)
Data Set Location: NOAA-CIRES Climate Diagnostics Center (netcdf/ascii format)



Technical Overview

The current version of data is known as Release 2.0. This release encompasses the data and monthly summary products from three separate COADS updates: Releases 1a (1980-97), 1b (1950-79), and 1c (1784-1949).

The International Comprehensive Ocean-Atmosphere Data Set (I-COADS) is the most extensive collection of surface marine data available for the world ocean over the past century and a half. See the I-COADS Website for additional information and electronic documentation about the project, the different releases, and the full suite of observational products and monthly summary statistics. Selected I-COADS statistics are available in netCDF format at CDC.

In I-COADS Release 1 (1854-1949) processing, the basic observational data were edited, using a "trimming" procedure to identify outliers with respect to climatological 3.5 standard deviation (sigma) limits derived from data for two periods (1854-1909 and 1910-49). Fourteen summary statistics were then calculated for each of 19 observed and derived variables for each month of each year and decade of the period of record, January 1854 through December 1949, using 2-degree latitude x 2-degree longitude boxes. Two statistics were extracted into netCDF from Release 1: the mean and number of observations. In addition, long-term mean files were constructed at CDC from the basic monthly means for 1950-79 for thirteen variables.
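As an illustration of the two statistics extracted into netCDF (the mean and the number of observations per box), the following is a minimal sketch of binning individual observations into 2-degree boxes. It is not the I-COADS processing code: the function name, the box alignment (edges on multiples of the box size, starting at 90S and 0E), and the use of NaN for empty boxes are assumptions made for the example.

```python
import numpy as np

def box_mean_and_count(lat, lon, values, box_deg=2.0):
    """Bin point observations into lat/lon boxes; return (mean, count) grids.

    Hypothetical helper: box edges are assumed to fall on multiples of
    box_deg, starting at 90S and 0E. Empty boxes are left as NaN.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float) % 360.0   # map longitudes to [0, 360)
    values = np.asarray(values, dtype=float)

    n_lat = int(round(180.0 / box_deg))
    n_lon = int(round(360.0 / box_deg))

    # Row/column index of the box containing each observation.
    # Clipping puts lat = +90 into the last row.
    i = np.clip(np.floor((lat + 90.0) / box_deg).astype(int), 0, n_lat - 1)
    j = np.clip(np.floor(lon / box_deg).astype(int), 0, n_lon - 1)

    total = np.zeros((n_lat, n_lon))
    count = np.zeros((n_lat, n_lon), dtype=int)
    np.add.at(total, (i, j), values)   # unbuffered add handles repeated boxes
    np.add.at(count, (i, j), 1)

    mean = np.full((n_lat, n_lon), np.nan)
    filled = count > 0
    mean[filled] = total[filled] / count[filled]
    return mean, count
```

Keeping the count alongside the mean matters in practice: it is the only direct indication of how well sampled each monthly box value is.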

The netCDF files are grouped according to the time period and spatial domain covered, and the gridbox size (2-degree or 1-degree latitude x longitude). For data starting in 1950, the files also are subdivided into "standard" and "enhanced" products, which reflect the "trimming" (quality control) procedure used and the data mixture. In the standard products, the data have been edited using climatological 3.5 standard-deviation limits, with the observations limited (as nearly as practical) to those from ships. In the "enhanced" products, in contrast, the data have been edited using broader 4.5 sigma limits to better represent extreme climate events, with observations from ships plus from other in situ marine platform types (e.g., drifting and moored buoys).
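The practical difference between the 3.5 and 4.5 sigma limits can be sketched as follows. This is a simplified illustration only: it assumes symmetric limits around a single climatological mean and sigma for the box and month, and the function name and sample values are invented for the example. The actual trimming procedure is documented by the I-COADS project and may differ in detail.

```python
import numpy as np

def within_trimming_limits(values, clim_mean, clim_sigma, n_sigma):
    """Return True where an observation lies within clim_mean +/- n_sigma * clim_sigma.

    Simplified, hypothetical stand-in for a trimming test (symmetric limits assumed).
    """
    values = np.asarray(values, dtype=float)
    return np.abs(values - clim_mean) <= n_sigma * clim_sigma

# Invented SST reports (deg C) for one box and month, with an assumed climatology.
sst = np.array([18.2, 19.0, 20.1, 24.3, 27.0])
clim_mean, clim_sigma = 19.5, 1.2

standard = within_trimming_limits(sst, clim_mean, clim_sigma, 3.5)
enhanced = within_trimming_limits(sst, clim_mean, clim_sigma, 4.5)

print(standard)  # [ True  True  True False False]
print(enhanced)  # [ True  True  True  True False]
```

In this example the 24.3 report is rejected under the 3.5 sigma limits but retained under the 4.5 sigma limits, which is the sense in which the broader limits retain more of the extreme values.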

The 1-degree data are available starting in 1960, and in global and equatorial formats. The equatorial products cover the latitude band 10.5N to 10.5S, and are global with respect to longitude. As opposed to the global 1-degree boxes, the equatorial 1-degree boxes are shifted half a degree in latitude (only) in comparison to the global domain, such that the center-latitude of the central row of boxes is the equator (e.g., 0-1E, 0.5S-0.5N). Both global and equatorial formats have a standard and enhanced product.
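The half-degree latitude shift is easiest to see by writing out the box edges and centers for the two 1-degree layouts. The sketch below assumes, as the description above implies, that global 1-degree boxes have edges on whole degrees and that the equatorial band runs from 10.5S to 10.5N; the function name is hypothetical.

```python
import numpy as np

def lat_box_edges(kind):
    """Latitude edges (degrees north) of 1-degree boxes for the two layouts.

    Hypothetical helper; 'global' assumes edges on whole degrees.
    """
    if kind == "global":
        return np.linspace(-90.0, 90.0, 181)    # 180 boxes
    if kind == "equatorial":
        return np.linspace(-10.5, 10.5, 22)     # 21 boxes
    raise ValueError("kind must be 'global' or 'equatorial'")

def box_centers(edges):
    """Midpoint of each box."""
    return 0.5 * (edges[:-1] + edges[1:])

global_centers = box_centers(lat_box_edges("global"))
equatorial_centers = box_centers(lat_box_edges("equatorial"))

# Global rows straddle the equator with an edge at 0; the equatorial
# layout has a row centered on it.
print(global_centers[89:91])    # [-0.5  0.5]
print(equatorial_centers[10])   # 0.0
```

Because the two grids do not share box boundaries in latitude, values from the equatorial product cannot be overlaid on the global 1-degree grid without regridding.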

Current update plans can be found on the official I-COADS web site.


Expert User Guidance
General Description

The Comprehensive Ocean-Atmosphere Data Set (COADS) is the most extensive and widely used digital collection of quality-controlled surface weather observations available for the world oceans for studies of marine climate and its variability. The data set begins in 1854, the year marking the beginning of an internationally organized system for recording shipboard meteorological observations, and currently extends through 1997 with updates every few years (work is ongoing to extend the archive back to 1800). COADS includes monthly values for each year of sea surface temperature (SST), air temperature, wind, cloudiness, barometric pressure, and humidity, as well as derived variables such as turbulent heat and momentum pseudo-fluxes ("pseudo" because they neglect transfer coefficients). The number of observations and the standard deviation of the individual observations that make up each monthly value are also archived for each variable. This important data set forms the basis for our empirical knowledge of surface marine climate and its variability during the past century and a half.

The majority of the observations come from ships-of-opportunity, supplemented in recent years by research vessels, moored environmental buoys, drifting buoys, and near-surface measurements from hydrographic profiles. The data are binned into 2° x 2° latitude-longitude boxes (1° x 1° summaries are also available since 1960). Each variable is subjected to quality-control procedures to remove outliers and duplicates. The screened values are not corrected for changes in instrumentation, observing practice, ship type, etc.; missing grid boxes are not filled in; and no "analysis" of the data is performed (e.g., no spatial or temporal smoothing or interpolation).

Due to the uneven distribution of commercial shipping routes and changes in those routes over time, data coverage is poor in certain regions and periods (see "Coverage Maps" for data distribution by variable by decade). Broadly speaking, the North Atlantic, western South Atlantic and northern Indian Oceans contain the highest density of observations, with reasonable coverage back to about 1870. Data coverage is limited in the North Pacific before about 1946 and in the Tropics before about 1960; the Southern Oceans remain poorly sampled throughout the record.

The lack of spatial and temporal smoothing in the COADS archive, along with large uncertainties in individual monthly mean values due to inadequate sampling, makes it difficult to produce comprehensible maps of a particular climatic variable for a specific month and year without additional processing of the data. Aggregating the data over many months and/or years, and judicious smoothing and/or interpolation in space, can dramatically enhance the large-scale coherence of anomaly patterns by reducing noise associated with random errors and sampling fluctuations (recall that the standard error of the mean decreases as the inverse square root of the number of observations; see Trenberth et al., 1992 for further discussion). Smoothing/interpolation procedures may be as simple as running means in the zonal and meridional directions, followed by linear interpolation across a specified number of missing grid boxes (cf. Deser and Wallace, 1990; Mitchell and Wallace ?? for examples in the Tropical Pacific). More sophisticated procedures such as statistical optimization methods may also be employed. For example, Kaplan (1997, 1999) uses Empirical Orthogonal Function (EOF) analysis as a basis for interpolating across missing grid boxes and smoothing in space and time for SST and SLP (similar procedures are employed for SST in Rayner et al., 1996).
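The simplest of the procedures mentioned above (a running mean followed by linear interpolation across a limited number of missing boxes) might look like the sketch below. It is a generic one-dimensional illustration, not the method of any of the cited studies: the function names, the default window and gap lengths, and the choice to leave a box missing when its own value is missing are all assumptions. It would be applied along each row (zonal) and then each column (meridional) of a gridded field, and it does not handle longitude wrap-around.

```python
import numpy as np

def running_mean(x, window=3):
    """Centered running mean along a 1-D series, ignoring missing (NaN) neighbors.

    Boxes whose own value is missing stay missing; filling is left to
    fill_short_gaps so the two steps remain distinct (an assumption).
    """
    x = np.asarray(x, dtype=float)
    half = window // 2
    out = np.full(x.shape, np.nan)
    for k in range(x.size):
        if np.isnan(x[k]):
            continue
        segment = x[max(0, k - half): k + half + 1]
        out[k] = np.nanmean(segment)
    return out

def fill_short_gaps(x, max_gap=2):
    """Linearly interpolate across runs of at most max_gap missing values.

    Gaps at either end of the series (no data on one side) are not filled.
    """
    x = np.asarray(x, dtype=float).copy()
    missing = np.isnan(x)
    if missing.all():
        return x
    idx = np.arange(x.size)
    interpolated = np.interp(idx, idx[~missing], x[~missing])
    k = 0
    while k < x.size:
        if not missing[k]:
            k += 1
            continue
        start = k
        while k < x.size and missing[k]:
            k += 1
        bounded = start > 0 and k < x.size     # data on both sides of the gap
        if bounded and (k - start) <= max_gap:
            x[start:k] = interpolated[start:k]
    return x

row = np.array([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0, np.nan])
print(fill_short_gaps(row, max_gap=2))  # only the single-box gap is filled
```

Limiting the gap length is the important design choice here: interpolating across long runs of missing boxes manufactures values where there were no measurements at all.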

The quality-controlled surface marine climate observations that constitute the COADS archive also serve as input to more sophisticated "data sets" which employ statistical and empirical procedures directed at improving data homogeneity and signal-to-noise ratios; for example, filling in missing grid boxes, correcting for changes in instrumentation and observing practice, and smoothing in space and time. These "value-added" data sets are tremendously useful for certain applications (for example, as contourable fields or as model boundary conditions), but it should be remembered that observational analyses of climatic variations are limited ultimately by the quality, quantity and distribution of the original measurements (cf. Hurrell and Trenberth, 1999). As such, the limited processing of the data contained in the COADS archive remains a virtue and serves as "ground truth" for our empirical knowledge of surface marine climate and its variability since the mid-19th century.


Strategies for making optimal use of COADS data

Physical consistency among variables

Because COADS contains many climatic variables which are measured independently but are physically related, evaluating the data for physical consistency provides a powerful tool for assessing the reliability of climatic signals. This approach has been adopted in many studies for wind and temperature: for example, SLP gradients may be compared with winds following a geostrophic or frictional momentum balance, and SSTs may be compared against marine air temperatures due to the close coupling between these quantities via the turbulent flux of sensible heat. Relevant studies on wind and SLP comparisons include Ramage (1987), Wright (1988), Harrison (1989), Cardone et al. (1990), Deser and Blackmon (1993) and Ward and Hoskins (1996). Some of these studies report spurious trends in certain regions and time periods: for example, an upward trend in wind speed associated with the change from visual estimation of sea state to measurements by anemometers. Useful studies on SST and air temperature relationships include those of Wright (1986), Trenberth et al. (1992), Kent et al. (1993) and Folland and Parker (1995). These studies document an upward trend in SST associated with the change from bucket to engine-intake temperatures over much of the world oceans. Data sets which attempt to correct for these spurious trends include Bottomley et al. (1990) and da Silva et al. (1994). The former study also advocates the use of nighttime marine air temperatures due to the possible contamination of daytime air temperatures by solar heating of thermistor shields.
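As a sketch of the wind/SLP consistency check, the geostrophic wind implied by a gridded SLP field can be computed and then compared with the reported winds. The code below is an illustration under stated assumptions, not the method of any of the studies cited: it assumes pure geostrophic balance (no friction), a constant air density, a regular latitude-longitude grid with no missing boxes, and it is not meaningful near the equator where the Coriolis parameter vanishes. The function and constant names are hypothetical.

```python
import numpy as np

OMEGA = 7.292e-5    # Earth's rotation rate, rad/s
R_EARTH = 6.371e6   # mean Earth radius, m
RHO_AIR = 1.2       # assumed constant near-surface air density, kg/m^3

def geostrophic_wind(slp_pa, lat_deg, lon_deg):
    """Geostrophic wind (u_g, v_g) in m/s from SLP in Pa on a (lat, lon) grid.

    u_g = -(1 / (rho f)) dp/dy,  v_g = (1 / (rho f)) dp/dx
    """
    slp = np.asarray(slp_pa, dtype=float)
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    lon = np.deg2rad(np.asarray(lon_deg, dtype=float))

    f = 2.0 * OMEGA * np.sin(lat)[:, None]          # Coriolis parameter per row

    # Pressure gradients per radian, converted to per metre on the sphere.
    dp_dy = np.gradient(slp, lat, axis=0) / R_EARTH
    dp_dx = np.gradient(slp, lon, axis=1) / (R_EARTH * np.cos(lat)[:, None])

    u_g = -dp_dy / (RHO_AIR * f)
    v_g = dp_dx / (RHO_AIR * f)
    return u_g, v_g

# Idealized field: pressure falling 1 hPa per degree toward the north.
lat = np.array([30.0, 40.0, 50.0])
lon = np.array([0.0, 10.0, 20.0])
slp = 101500.0 - 100.0 * (lat - 30.0)[:, None] + np.zeros(3)[None, :]
u_g, v_g = geostrophic_wind(slp, lat, lon)
```

For this Northern Hemisphere example the implied wind is westerly (positive u_g) with no meridional component. Over the open ocean, observed surface winds are typically weaker than the geostrophic value and turned toward low pressure because of friction, so a comparison of this kind is best made on trends and anomalies rather than on absolute magnitudes.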

Removal of the mean annual cycle before averaging in space and time

If one is interested in climatic signals on time scales longer than the annual cycle, it is best to work with monthly anomalies rather than monthly totals, where monthly anomalies are defined as departures from the long-term mean annual cycle: i.e., the difference between the value for a given month of a particular year and the climatological value for that month based upon a sufficiently long period of record. If monthly totals are needed, the climatological annual cycle may then be added back in to the monthly anomalies. Before defining monthly anomalies, it may be desirable to form smoothed climatological background fields to further reduce noise levels. The reason that it is preferable to work with monthly anomalies rather than monthly totals in a data set such as COADS is that missing data may result in severe aliasing of variability associated with the annual cycle into variability on longer time scales (e.g., interannual and beyond). For example, time averages (seasonal, annual, etc.) of monthly totals in which different months are missing in different years will result in spurious signals associated with inadequate sampling of the annual cycle rather than real climatic variability. Similarly, inadequate sampling of spatial gradients due to missing grid boxes (with different boxes missing in different months/years) will alias signals associated with long-term mean spatial gradients into spurious temporal variability. The study by Bottomley et al. (1990) provides a nice illustration of the practical application of some of these procedures to SST data.
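The aliasing problem can be demonstrated with a small synthetic example. The sketch below builds a series for one grid box that contains only a repeating annual cycle (so there is no real interannual variability), removes different months in different years, and compares annual means formed from monthly totals with annual means formed from monthly anomalies. The function name, the synthetic cycle, and the pattern of missing months are all invented for the illustration.

```python
import numpy as np

def monthly_anomalies(monthly, base=slice(None)):
    """Departures from the long-term mean annual cycle.

    monthly: array of shape (n_years, 12), NaN where a month is missing.
    base:    slice of years used to define the climatology (all years by default).
    Returns (anomalies, climatology).
    """
    monthly = np.asarray(monthly, dtype=float)
    clim = np.nanmean(monthly[base], axis=0)   # one climatological value per calendar month
    return monthly - clim, clim

# Synthetic SST-like series: identical annual cycle every year.
months = np.arange(12)
cycle = 20.0 + 5.0 * np.cos(2.0 * np.pi * months / 12.0)
data = np.tile(cycle, (4, 1))
data[1, 0:3] = np.nan   # second year lacks Jan-Mar (the warm months of this cycle)
data[2, 5:8] = np.nan   # third year lacks Jun-Aug (the cold months of this cycle)

anom, clim = monthly_anomalies(data)

annual_from_totals = np.nanmean(data, axis=1)
annual_from_anomalies = np.nanmean(anom, axis=1)

print(np.round(annual_from_totals, 2))     # differs from year to year
print(np.round(annual_from_anomalies, 2))  # zero (to rounding) in every year
```

The annual means of the totals differ by nearly 3 degrees between the second and third years even though nothing changed except which months were sampled, whereas the anomaly-based annual means are zero throughout, as they should be for a series with no interannual signal.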


Additional comments by variable

SST

Trenberth et al. (1992) provide a comprehensive discussion of the sources of errors in estimating monthly mean SSTs from ship data, and provide a detailed analysis of noise levels by month and region. Comparison is also made between COADS SSTs and air temperatures, and between SSTs from COADS and Bottomley et al. (1990). Much of the discussion of errors and noise levels may be carried over to other variables such as SLP or wind. As far as we know, there are no analogous, comprehensive studies of noise levels in other COADS fields, although this is desirable.

Clara Deser
30 May 2001


Relevant Articles

Publications Discussing COADS data


Coverage Maps

Click on the links below to view data coverage for the time period indicated. Percentage of non-missing data in each time period is plotted. The minimum number of observations needed per month per grid box was 1.

Air TemperatureCloudinessSLP
(1801-1820, 1821-1840, 1841-1860) (1801-1820, 1821-1840, 1841-1860) (1801-1820, 1821-1840, 1841-1860)
(1861-1880, 1881-1900, 1901-1920) (1861-1880, 1881-1900, 1901-1920) (1861-1880, 1881-1900, 1901-1920)
(1921-1940, 1941-1960, 1961-1980) (1921-1940, 1941-1960, 1961-1980) (1921-1940, 1941-1960, 1961-1980)
(1981-1997) (1981-1997) (1981-1997)
SSTSpecific HumidityU-Wind
(1801-1820, 1821-1840, 1841-1860) (1801-1820, 1821-1840, 1841-1860) (1801-1820, 1821-1840, 1841-1860)
(1861-1880, 1881-1900, 1901-1920) (1861-1880, 1881-1900, 1901-1920) (1861-1880, 1881-1900, 1901-1920)
(1921-1940, 1941-1960, 1961-1980) (1921-1940, 1941-1960, 1961-1980) (1921-1940, 1941-1960, 1961-1980)
(1981-1997) (1981-1997) (1981-1997)

Updated: 10/15/03
Maintained by asphilli@ucar.edu