The mission of the Geophysical Statistics Project (GSP) is to encourage the application of statistical analysis and the development of new statistical methods in the geophysical sciences. Although GSP is administered through the Climate and Global Dynamics (CGD) Division, this project is an NCAR-wide effort and the research reported in the following sections reflects this breadth of scientific collaboration across the center's divisions. Project leadership has been provided by Mark Berliner (November, 1996, to November, 1997) and Doug Nychka (August, 1997-present). Co-Principal investigators for the project are Richard Katz (Environmental and Societal Impacts Group, ESIG) and Roland Madden (Climate Analysis Section, CAS). The material in this report is organized by statistical topics, but within each section, is subdivided into subject matter headings.

Barbara Bailey is involved in modeling the spatial and temporal distribution of cloud cover. This is in collaboration with Berliner, Nychka, William Collins (Climate Modeling Section, CMS), and Jeffrey Kiehl (CMS). Clouds play a fundamental role in controlling the amount of solar and infrared radiation available to the climate system. Most clouds are smaller in area than a typical grid resolution of climate models, so a parameterization of clouds effects is necessary. In particular, prediction of cloud cover amount is of great importance to climate modeling. The parameterization of cloud shape uses a small number of Fourier coefficients of the radius vector function to describe the contour function of a cloud. This approach has been successfully applied to the infrared cloud cover measurements from the Tropical Ocean and Global Atmosphere Program (TOGA) Coupled Ocean-Atmosphere Response Experiment (COARE). To parameterize cloud cover, preliminary results indicate that a first-order, time-lagged, nearest-neighbor model gives accurate predictions. Because the space/time relationship between grid cell neighbors is very complicated, a neural network is fit to describe this relationship. Current research has considered large-scale variables, such as relative humidity and available kinetic energy, to classify the cloud cover according to different cloud types. Even in the absence of using previous cloud cover, models with these covariates appear to give useful parameterizations.

*(This figure
(20K) shows results from a neural network parameterization of
climate variables that measure mean convection and mean relative
humidity to predict cloud cover over the TOGA COARE experiment region
(10ºN-10ºS and 135ºE-175ºW). The top plot (a) is
an image plot of the root mean square error between observed and
predicted cloud fraction over the grid sites for a period of 30 hours
(land mass is white). The distribution of the root mean square error
(RMSE) statistics as a fraction of surface area are reported in
(b). Finally in (c) the data and fitted values for a grid site are
plotted over time. The agreement indicates the ability of the
statistical model to track changes in cloud fraction.)*

To gain a stochastic understanding of the complicated dynamics of surface wind systems over the tropical Pacific Ocean, Christopher Wikle performed exploratory analyses on tropical wind data. Data has been analyzed from four dramatically different data sources: operational wind estimates from the National Centers for Environmental Prediction (NCEP) with spatial resolution of approximately 200 km; satellite scatterometer-derived wind estimates with approximately 25 km spatial resolution; modeled winds from a cloud resolving model with a spatial resolution of 3 km; and wind estimates from the NCAR Electra aircraft, measured during the TOGA COARE intensive observation period, with resolution of 1 km . Wikle examined the empirical space-time dynamics of these datasets through space-time correlation function analysis, spectral analysis, and the examination of the coefficient and error-covariance matrices from vector autoregression models. Comparisons of energy versus wavenumber spectra from these different data sources overlap to form a continuous curve from scales ranging between 6 km and 6000 km. The consistency of the energy spectrum demonstrates quantitative linkages consistent with dynamical notions of continuous enstrophy and energy cascades. These characterizations will be beneficial in blending wind measurements of different spatial resolution into a single coherent field.

*(This
figure (5K) shows average wavenumber spectral estimates for
u- and v-wind components for zonal and meridional sample
paths from three dramatically different data sources: low-resolution
NCEP reanalysis wind fields (blue), moderate resolution European
Research Satellite-1 (ERS-1) scatterometer winds (green), and
high-resolution aircraft observations from the NCAR Electra (red). The
observations were taken during the TOGA COARE intensive observation
period (spanning November, 1992, to February, 1993) on days when the
Electra aircraft took observations at the 30 m level for flight paths
of at least 60 km. The spectra were estimated using a
pre-whitening/post-coloring procedure. Also shown is a
weighted-least-squares linear fit of these estimates. The regression
is weighted by the relative precision of the spectral estimates as
determined by a correlated spectral bootstrap procedure. The
surprising power-law relation over spatial scales ranging from
kilometers to thousands of kilometers suggests the possible existence
of inertial sub-ranges supporting the forward cascade of enstrophy and
the inverse cascade of energy.)*

Satellite-derived wind estimates have high spatial resolution but are limited in global coverage; in contrast, operational analyses provided by the major weather centers provide complete wind fields but are of low spatial resolution. The goal is to blend these data in a manner that incorporates the space-time dynamics inherent in the surface wind field. This is a critical task as no complete, high-resolution wind observations exist over the world's oceans. Such high-resolution datasets are critical to improving our understanding of tropical disturbances, as well as for driving ocean circulation models. To blend the wind information from various scales, a collaboration of Wikle, Ralph Milliff (Oceanography Section, OS), Berliner, and Nychka have developed Bayesian hierarchical models that preserve the spatial and temporal dependencies in the wind fields. Atmospheric and oceanographic data typically contain many different scales of spatial and temporal variability. Such data are often non-stationary in space and time and may involve many observation/prediction locations. These factors limit the effectiveness of traditional space-time statistical methods. Hierarchical spatio-temporal models that incorporate complicated dynamics are new in statistics and represent a distinct contribution in the geosciences. We have successfully implemented the hierarchical model in a Markov chain Monte Carlo framework via a Gibbs sampler. One important feature in this development is the use of vector autoregressive models in the spectral domain. This transformation serves two purposes. First, it provides the most natural way to include knowledge of the underlying dynamics, both through large-scale equatorial wave dynamics in the state process and through turbulence models in the error processes. Secondly, the spectral domain provides a natural way for reducing the dimensionality of the dynamical process.

*(This
figure (22K) shows east-west component (u) wind speeds in
high-resolution sampling swaths obtained from the polar-orbiting
sattelite-based ERS-1 scatterometer instrument. Information from
January 17-19, 1993, is shown with (a) corresponding to
January 17 observations, (b) showing January
18 observations in color and January 17
observations in gray, and (c) showing January 19 observations in color
and January 18 and 17 observations in dark and light gray,
respectively.)*

Jeffrey Andrew Royle, in collaboration with Tim Kittel (Ecosystems Dynamics and the Atmosphere Section, EDAS), David Schimel (EDAS), Berliner, and Nychka, has produced precipitation and temperature data records with complete temporal coverage (i.e., space-time prediction). This methodology can be extended to understand regional influence of El Niño-Southern Oscillation (ENSO) effects on temperature and precipitation. As part of this effort, efficient numerical algorithms have been developed to allow for solution of large spatial prediction problems.

Many physical processes are naturally bivariate or multivariate,
and for interpreting these fields, statistics can be sharpened by
models incorporating the multivariate relationships. An example of
this principle is the spatial interpolation/extrapolation of wind
fields based on the *u* and *v* wind components. The
approach developed by Royle in collaboration with Wikle, Milliff, and
Berliner uses a hierarchical model to specify the joint distribution
of two variables. This is an improvement over the standard methods of
co-kriging because it allows for model components that are physically
interpretable and avoids the difficulties of specifying
cross-covariance functions for complicated physical processes. For
example, a hierarchical spatial model for scatterometer wind data
builds spatial structure into the model via a hidden pressure
variable. This simplifies the bivariate component of the model,
facilitates extension to temporal dynamics, and, surprisingly,
provides a reasonable reconstruction of the pressure field associated
with the geostrophic component of the wind.

*(This figure
(10K) shows: (a) ECMWF's analysis grid locations and wind field on
scatterometer data locations at 0Z on 19 November 1996; and (b)
predicted wind and (standardized) pressure fields using a hiearchical
spatial model that parameterizes spatial dependence via a hidden
(i.e., unobserved) pressure field.)*

A grand challenge across the geosciences is to extend spatial statistics from the moderate size datasets considered in the statistics literature to the massive and substantive records assembled to study the atmosphere and ocean. Wikle, Royle, Berliner, Nychka, and Milliff collaborated on this project. One breakthrough is the modeling of the spatial structure directly instead of starting with the covariance function of the field. This has resulted in a computational strategy based on hierarchical models for large datasets. The key is to exploit iterative solution methods for the large, sparse linear systems that are implied by the hierarchical models. Another component is the efficiency of simulating a complex multivariate normal distribution using the sparse structure from the hierarchical models. The models avoid a costly Cholesky decomposition of large covariance matrices by modeling spatial dependencies in the field directly.

To quantify long-term trends in stratospheric ozone due to anthropogenic activities, it is important to understand the long-term variability in ozone due to other natural forcings. Some of these include the solar cycle, the release of volcanic aerosols, and the locking of the quasi-biennial oscillation with the seasonal cycle. Wendy Meiring, Gary Grunwald (visitor, University of Colorado-Denver), Richard Jones (visitor, University of Colorado-Denver), Susan Solomon (NOAA Aeronomy Laboratory), and Robert Portman (NOAA Aeronomy Laboratory) investigated how the vertical stratospheric ozone profile at midlatitudes depends on the season and other atmospheric variables. Statistical research projects include the development of techniques for modeling and obtaining variability estimates of curves (e.g., vertical profiles). These include models for seasonal and altitude heterogeneity in the variance field, as well as approaches for modeling space-time correlation based on observations that are irregularly sampled in space and time. An important scientific contribution of these models is the ability to discern different temporal dependence at different levels in the profile. This may give a more sensitive characterization of the influence of different factors on the ozone profile and help to separate the effects of aerosols from the solar cycle.

*(This figure
(23K) shows observations from all ozonesondes at Hohenpeissenberg
during January, April, July, and October of 1980-1982. The blue lines
are seasonally varying vertical ozone profile fits based on a
seasonally varying Laguerre polynomial expansion. The red line is a
Laguerre polynomial expansion fit that does not vary with the
season.)*

Meiring and Richard Levine, in collaboration with Madden, David
Erickson (Atmospheric Chemistry Division, ACD), and William Randel
(ACD) studied the behavior of the temporal behavior of CO_{2}
on a seasonal timescale and the spectral density on a synoptic
timescale as a function of spatial location. This analysis gives
insight into the interaction between dynamics of circulation and the
effect of different sources and sinks for this gas. A related project
is combining statistics based on monitoring site observations, with
statistics based on numerical geophysical models. Here a Bayesian
approach is proposed where the prior distributions are based on
geophysical model results.

Levine, in collaboration with Berliner and Tom M.L. Wigley (CAS), has focused on predicting climate change due to human-induced increases in greenhouse gases, aerosol concentrations, and impacts of other activities of man on the climate system. The statistical challenge lies in detecting a climate change signal in climate observations over time and attributing this potential signal to particular anthropogenic forcings. Using a Bayesian formulation, detection and attribution methodologies in the meteorological literature can be interpreted from a statistical viewpoint. This new characterization of the global warming problem is new and has shed light on the advantages and short-comings of the techniques presently implemented by climate modelers. This work also suggests alternative, and more appropriate statistical procedures for inferring climate change signals in climate data.

Adapted Observations for Field Experiments

Zhan-Qian (John) Lu, in collaboration with Berliner and Christopher Snyder (Mesoscale and Microscale Meteorology Division, MMM), has worked on the problem of taking adaptive (targeted) observations. The impetus for this work is the need for strategies for the collection of data in large field experiments in a way that maximizes the use of often limited sampling resources. The Fronts Atlantic Storm Track Experiment (FASTEX) project has been used as an example and as a test bed for guidance in this research. However, the basic issues in this problem have lead to a rigorous formulation of the targeted observation problem and an identification of useful statistical design criteria. This mathematical foundation can also suggest the best way to approximate the full problem on lower dimensional subspaces to make the computation of a design feasible. Statistical designs for targeted observations were tested in a simple atmospheric model, and they compare favorably to various heuristic approaches that have been proposed in the meteorological community.Traditional methods for spatial designs have been hampered by the
difficulty of computing an exact optimal design from a statistical
criterion. Royle and Nychka have investigated a more flexible approach
to locate points using a space filling criterion that is independent
of the covariance structure in the spatial field. Coupled with this
approach are simple swapping algorithms for finding designs that can
satisfy complicated geographic or other physical constraints. An
important design problem in monitoring the environment is siting a
network to be useful for temporal trend estimation. Some other design
contributions are: (a) constructing designs for optimal estimation of
covariance structure, (b) choosing optimal subsets of climate
monitoring (HCN) stations, (c) optimal design for forecast
verification, and

*(This
figure (23K) shows a comparison of two space-filling designs for
monitoring ambient ozone. The color images indicate the prediction
standard errors for estimating 8-hour daily ozone from a network of
20 Midwestern stations (green 6 PPB, yellow 8 PPB). Plot
(a) is a subset of 20 points determined from an existing
monitoring network of approximately 160 stations. The full network are
smaller dots in this plot. Plot (b) is a set of 20 points where the
locations are only constrained to be within the rectangular region and
over land. Note the better coverage and lower prediction error
associated with the unconstrained design over northern
Michigan.)*

Bailey collaborated with James Hurrell (CAS) and Kevin Trenberth (CAS) by fitting linear models to tropospheric temperature data in an attempt to correct for systematic biases. A linear model with autoregressive errors of order one was fit to temperature data derived from Microwave Sounding Units (MSUs) on polar orbiting satellites. The change of satellites was modeled as a step function, and the model indicates that there is systematic change from the overall mean during the time periods of interest.

Although runoff is important for assessing the effects of increased precipitation, it has been difficult to find adequate statistical models to fit observed hydrologic series. Lu and Berliner developed a switching, Markov time series model that is then fit to hydrologic data employing a Bayesian strategy. The model works well, capturing the sharp rises and slow falls that are characteristic for non-Gaussian series. One-day predictions based on this model are also an improvement over more traditional methods.

Jones and Grunwald developed stochastic models for daily rainfall that simultaneously model occurrence and intensity. With these models, dependence of a previous day's rainfall incorporated in a natural way. This form of dependence has been shown to exist in observational data and is contrary to the usual assumptions in rainfall modeling. The results yield estimates of seasonal patterns in mean daily rainfall amounts and probability distributions for rainfall that account for both occurrence and intensity.