From cjw at ucar.edu Mon Jan 1 15:54:53 2007 From: cjw at ucar.edu (Chris Webster) Date: Mon Jan 1 15:54:55 2007 Subject: [CF-metadata] Re: attributes for min/max data values for visualization Message-ID: <4599913D.3080207@ucar.edu> Don Murray wrote: > This is what I would like to see - the actual minimum and maximum > values for the dataset as optional attributes for the variables. > The problem we face now is having to read the data to determine > this. If we had the attributes available, we could circumvent that. So what would/do we call these attributes? --Chris From Steven.C.Hankin at noaa.gov Tue Jan 2 12:35:48 2007 From: Steven.C.Hankin at noaa.gov (Steve Hankin) Date: Tue Jan 2 12:35:02 2007 Subject: [CF-metadata] Getting back to ensembles In-Reply-To: <1167398778.14692.12.camel@localhost.localdomain> References: <20061229103525.GA28828@met.reading.ac.uk> <1167398778.14692.12.camel@localhost.localdomain> Message-ID: <459AB414.60408@noaa.gov> Hi Jonathan, Bryan, et. al., I think Bryan has stated the essence of the problem well, below. Jonathan, you have pointed out that the next challenge is to find words that describe the semantic distinctions that we are trying to draw with minimal ambiguity. So I suggest that we turn our attention to that challenge. I do not have a specific proposal of wording to offer. Instead (like a broken record (boy, that dates me!)) I'd suggest we begin with a discussion of what our requirement is, and see if the wording falls naturally out from that. Borrowing some of Bryan's words: *Requirement:* CF must support standardized terminology in multiple semantic domains. It must do so in a manner that will permit tools to be built that utilize the distinctions between these domains. The domains that must be kept distinct include at a minimum the ones listed just below, however, this list must be extensible. 1. scientifically significant measured quantities 2. 
parameters describing measurement techniques or processes (including who made the measurements)
3. identification of CF data structures (grids, axes, coordinates, coordinate geometry information, ...)
4. others (??)

I'm no metadata expert, so please correct me if the following assertion is wrong: There is nothing to prevent the same name from existing in multiple vocabularies. For example, if "platform_orientation" (mentioned as an example in a previous email) really is both a "scientifically significant measured quantity" and a "parameter describing measurement technique", then it can exist separately under both domains. I do not think it is a requirement that the distinction between semantic domains can be inferred from the name alone. Stated another way -- we do not need to determine the context of the name from the name; we should always already know the context in which we are encountering a name.

- Steve

================================================================

Bryan Lawrence wrote:
> Jonathan
>
> I think we all understand the distinction between things which are
> measured, and things which are about measurement techniques, or who made
> the measurements (or simulations).
>
> We want to start building smart tools that can assist data users
> understand these distinctions faster and (as Roy implies, more
> accurately), but for these smart tools to work we need to build
> semantics directly into our vocabs. One way of doing this (and it's
> simply a first step) is to separate the vocabs. It's a design decision
> as to whether one does that by rules within a vocab or between vocabs.
> Either way, in our ontology building, we can start with the science
> vocabs, and the job is that little bit more tractable (and I think
> Jonathan underestimates how hard this is going to be ... although it's only
> possible because of the quality of what we have now!)
>
> At the moment it is only your voice which is arguing against separating
> these vocabs.
Are there any others who have a problem with the proposed > separation? (Which is simply the creation of an additional table, to be > called standard_metadata, which we will continue to control in the same > way as we control standard_names ... and yes, at some future time we > might deprecate some existing standard names and define them as standard > metadata --- and to forestall Jonathan's objection: this wont hurt > existing files because this will all be version controlled :-) > > Bryan > > On Fri, 2006-12-29 at 10:35 +0000, Jonathan Gregory wrote: > >> Dear Roy >> >> >>> Consider a case where a metadata record has two fields, one for geographic >>> coverage and one for parameter. If selection drop-downs for these are >>> covered by two separate lists - either vocabs or within an ontology - then >>> 'sea_temperature' will not appear in the geographic coverage drop-down and >>> 'Atlantic_Ocean' will not appear in the paramer drop-down. Were both >>> drop-downs covered by a single ' Standard Name list' then both terms would >>> appear. This not only increases the risk of field population with nonsense >>> (the type of error I was visualising - admittedly it's still possible to >>> call temperature salinity), but also makes the drop-down appear eccentric to >>> say the least. >>> >> We distinguish between lists for (a) standard names (b) the possible values of >> quantities which have a standard name. "atlantic_ocean" is not a standard name; >> it is a possible value for a variable whose standard name is "region". A menu >> of standard names would includes sea_surface_temperature, rainfall_flux, >> latitude, region and land_cover (to list a few from the present table) and >> also (if my proposal is agreed, to meet Paco's requirement) source, institution >> and experiment_id. These are all names for things which a data variable or a >> coordinate variable could contain. The *values* of these variables are dealt >> with in other ways. 
sea_surface_temperature, rainfall_flux and latitude are >> numeric, so no list is needed. The others are string-valued. At present only >> region has standardised values; the possible values are given by >> http://www.cgd.ucar.edu/cms/eaton/cf-metadata/region.html >> It's quite likely we might develop a standard list for land_cover. As we have >> discussed a lot, it would be useful to make links to other people's controlled >> vocabularies if we can, and the proposal also includes a new attribute to point >> to external lists of the possible values for a quantity (Bryan's suggestion). >> >> >>> Jon's comment that we can carry on as we are now and change later worries me a little. So many times in my work with metadata I have found that aggregation is infinitely easier than teasing things apart. >>> >> It is right to be cautious, but I think this reasonable concern of yours is >> that things should be sufficiently informative. I agree with you. That's why >> we spend so much time making sure we know exactly what quantity is being >> identified by a standard name, and why quantities with different physical >> dimensions (units) have different standard names. In this case, I think we >> are talking about a categorisation that can be introduced whenever we need it. >> There are only 814 standard names at present, so it would not be a big job to >> classify them in future, given a clear criterion for doing it - which is what >> we lack, since we don't have a need for it (as far as I can see). >> >> Cheers >> >> Jonathan >> _______________________________________________ >> CF-metadata mailing list >> CF-metadata@cgd.ucar.edu >> http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata >> > _______________________________________________ > CF-metadata mailing list > CF-metadata@cgd.ucar.edu > http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata > -- -- Steve Hankin, NOAA/PMEL -- Steven.C.Hankin@noaa.gov 7600 Sand Point Way NE, Seattle, WA 98115-0070 ph. 
(206) 526-6080, FAX (206) 526-6744 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.cgd.ucar.edu/pipermail/cf-metadata/attachments/20070102/36e29f2a/attachment.html From j.m.gregory at reading.ac.uk Wed Jan 3 11:10:12 2007 From: j.m.gregory at reading.ac.uk (Jonathan Gregory) Date: Wed Jan 3 11:10:15 2007 Subject: [CF-metadata] Getting back to ensembles Message-ID: <20070103181012.GA28718@met.reading.ac.uk> Dear Steve > I think Bryan has stated the essence of the problem well, below. > "I think we all understand the distinction between things which are > measured, and things which are about measurement techniques, or who made > the measurements (or simulations)." That is not a requirement, though. > CF must support standardized terminology in multiple semantic > domains. It must do so in a manner that will permit tools to be > built that utilize the distinctions between these domains. This is a requirement (thanks). But do these tools need the distinctions to be drawn in the metadata stored in the file? Or do they just need to know that there are distinctions that can be drawn, to help people construct or examine files? > 1. scientifically significant measured quantities > 2. parameters describing measurement techniques or processes > (including who made the measurements) > 3. identification of CF data structures (grids, axes, coordinates, > coordinate geometry information, ...) > I'm no metadata expert, so please correct me if the following > assertion is wrong: There is nothing to prevent the same name from > existing in multiple vocabularies. For example, if > "platform_orientation" (mentioned as an example in a previous email) > really is both a "scientifically significant measured quantity" and a > "parameter describing measurement technique", then is can exists > separately under both domains. It do not think it is a requirement > that the distinction between semantic domains can be inferred from the > name alone. 
If the categories are overlapping, that suggests to me that what we should do is add a column to the standard name table to indicate which possible categories a quantity belongs to. It does not require introducing a new attribute as an alternative to standard_name for metadata stored in the netCDF file. Using different attributes would require that you decide which it is, in each case stored in the file, and that would seem arbitrary. Why would you need to do that? I mean, what tool has a requirement for it?

Best wishes

Jonathan

From godin at mbari.org Wed Jan 3 17:09:34 2007
From: godin at mbari.org (Michael Godin)
Date: Wed Jan 3 17:11:17 2007
Subject: [CF-metadata] Proposed standard names for biological model outputs
In-Reply-To: <1167869079.16373.3.camel@localhost.localdomain>
References: <49FA8637609AC24FB26C007791DE215937E936@ice.shore.mbari.org> <1167869079.16373.3.camel@localhost.localdomain>
Message-ID: <1167869374.16373.9.camel@localhost.localdomain>

Hi Roy,

It seems that a majority of biologists here agree with your biologists and would prefer to use "concentration" to express moles/volume, as the usual units for biomass involve mass. However, since CF-metadata uses both "concentration" and "mass_concentration" to represent mass/volume, it is probably appropriate to specify "mole_concentration" to express moles/volume (which is roughly equivalent to using "mole_fraction" to express moles/moles). I also concur with Jonathan's earlier recommendation to use the "expressed_as" modifier.

So, my revised list of proposed standard names is as follows (all units are mol m-3):

Prefix to all descriptions: Mole concentration means moles (amount of substance) per unit volume and is used in the construction mole_concentration_of_X_in_Y, where X is a material constituent of Y.

mole_concentration_of_organic_detritus_in_sea_water_expressed_as_nitrogen
Description: Organic detritus are particles of debris from decaying plants and animals.
The construction expressed_as_nitrogen indicates that the indicated mole concentration is of nitrogen atoms due to the organic detritus.

mole_concentration_of_organic_detritus_in_sea_water_expressed_as_silica
Description: Organic detritus are particles of debris from decaying plants and animals. The construction expressed_as_silica indicates that the indicated mole concentration is of silica atoms due to the organic detritus.

mole_concentration_of_diatoms_in_sea_water_expressed_as_nitrogen
Description: Diatoms are single-celled phytoplankton with an external skeleton made of silica. The construction expressed_as_nitrogen indicates that the indicated mole concentration is of nitrogen atoms due to the diatoms.

mole_concentration_of_mesozooplankton_in_sea_water_expressed_as_nitrogen
Description: Mesozooplankton are large protozoans (single-celled organisms) and small metazoans (multi-celled organisms) sized between 2x10^-4 m and 2x10^-2 m that feed on other plankton and telonemia. The construction expressed_as_nitrogen indicates that the indicated mole concentration is of nitrogen atoms due to the mesozooplankton.

mole_concentration_of_microzooplankton_in_sea_water_expressed_as_nitrogen
Description: Microzooplankton are protozoans (single-celled organisms) sized between 2x10^-5 m and 2x10^-4 m that feed on other plankton and telonemia. The construction expressed_as_nitrogen indicates that the indicated mole concentration is of nitrogen atoms due to the microzooplankton.

mole_concentration_of_phytoplankton_in_sea_water_expressed_as_nitrogen
Description: Phytoplankton are autotrophic prokaryotic or eukaryotic algae that live near the water surface where there is sufficient light to support photosynthesis. The construction expressed_as_nitrogen indicates that the indicated mole concentration is of nitrogen atoms due to the phytoplankton.
mole_concentration_of_ammonium_in_sea_water mole_concentration_of_nitrate_in_sea_water mole_concentration_of_silicate_in_sea_water Regards, Mike > -----Original Message----- > From: cf-metadata-bounces@cgd.ucar.edu > [mailto:cf-metadata-bounces@cgd.ucar.edu] On Behalf Of Roy Lowry > Sent: Friday, December 22, 2006 12:06 AM > To: cf-metadata@cgd.ucar.edu > Subject: RE: [CF-metadata] Proposed standard names for biological > model outputs > > Michael, > > This discussion seems to have stalled and it would be nice to get your > proposed standard names into the system. As regards the 'biomass' > versus 'molar concentration' issue further talks with biological > colleagues indicate that the association of the units 'moles' with > biomass is a totall alien concept. To them canonical units biomass > are kg/m3 (usually expressed as mg/m2 or ug/l). So, how about if we > go with your initial suggestions and reserve the term 'biomass' for > the inevitable future request? > > Cheers, Roy. > > >>> "Godin, Michael" 12/6/2006 4:42 pm >>> > Ron, > > The small group of biological oceanographers I have spoken with thus > far fall into two camps: those who use ecosystem model outputs, and > those who don't (I'm still waiting to talk with more of each). Of > those who use ecosystem model outputs (and publish papers in the > field), the term to describe moles of X per unit volume water due to Y > is simply "concentration" (with no "molar" or "amount-of-substance" > prefix). Those who don't use biological models call it > "biomass" (even those who use physical models). > > I also asked about terms for moles per kilogram, and was told that > such a measure is rarely used by biologists, as it is too akin to the > chemist's concepts of molality and molinity, which tend to imply a > dissolved solution. 
Similarly, it appears that oceanographers avoid > expressing "mass of X per unit volume water," as it is non-trivial to > measure the dry mass of biological samples; and the resulting quantity > would have to be called "density", which could be confused with water > density. > > Mike > > -----Original Message----- > From: cf-metadata-bounces@cgd.ucar.edu > [mailto:cf-metadata-bounces@cgd.ucar.edu] On Behalf Of Roy Lowry > Sent: Wednesday, December 06, 2006 12:50 AM > To: cf-metadata@cgd.ucar.edu > Subject: Re: [CF-metadata] Proposed standard names for biological > model outputs > > Hello Jonathan, > > Let's see what Mike turns up when he talks to his MBARI colleagues > about the best way to describe substance held in biological material. > A straw poll of four biologists in BODC indicated that biomass was the > better understood term in this context. > > I agree with you about molality. I again asked around the BODC data > scientists and nobody could give me a definition of molality - > including a couple of people with chemical oceanography PhDs. > > Longer term we need to 'get smart' and provide the technology to > manage synonyms operationally. > > Cheers, Roy. > > >>> Jonathan Gregory 12/05/06 6:27 PM >>> > Dear Roy > > > I can see a future request for > 'Nitrogen_molar_biomass_of_phytoplankton' and nobody realising that it > is the same thing as the pre-existing > 'molar_concentration_of_nitrogen_in_sea_water_due_to_phytoplankton'. > > This kind of thing is certainly a problem but I don't think we can > avoid it. > When people approach things from different backgrounds they have > different expectations. We just have to point that the quantity exists > under a different name already. This has happened before. > > Of course, we can minimise it by using familiar terms, and that is one > reason for doing so. 
However, I somewhat disagree with Steve's > preference for the technical terms of specialised fields, as often > these terms are unclear and confused - at least, I have got that > impression from previous exercises to devise new standard names. In > their own fields they are jargon which is understood, and the > background is known, but to outsiders they can seem unintuitive and > unclear. Obviously this is not always the case. We have to take each > case on its merits. > > Best wishes > > Jonathan > _______________________________________________ > CF-metadata mailing list > CF-metadata@cgd.ucar.edu > http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata > > > -- > This message (and any attachments) is for the recipient only. NERC is > subject to the Freedom of Information Act 2000 and the contents of > this email and any reply you make may be disclosed by NERC unless it > is exempt from release under the Act. Any material supplied to NERC > may be stored in an electronic records management system. > > > _______________________________________________ > CF-metadata mailing list > CF-metadata@cgd.ucar.edu > http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata > > -- > This message (and any attachments) is for the recipient only. NERC > is subject to the Freedom of Information Act 2000 and the contents > of this email and any reply you make may be disclosed by NERC unless > it is exempt from release under the Act. Any material supplied to > NERC may be stored in an electronic records management system. 
From j.m.gregory at reading.ac.uk Thu Jan 4 01:11:39 2007
From: j.m.gregory at reading.ac.uk (Jonathan Gregory)
Date: Thu Jan 4 01:11:44 2007
Subject: [CF-metadata] Proposed standard names for biological model outputs
Message-ID: <20070104081139.GC29666@met.reading.ac.uk>

Dear Michael

These look very good to me, and thanks for your clear definitions. I have one question, about silica (=SiO2). Do you mean silicon?

In the discussions about Christiane's aerosol names we appear to have agreed that we need only say X_expressed_as_Y when X and Y are different; that is the rule you have used, so yours and hers are consistent.

Best wishes

Jonathan

From rkl at bodc.ac.uk Thu Jan 4 06:05:05 2007
From: rkl at bodc.ac.uk (Roy Lowry)
Date: Thu Jan 4 06:05:47 2007
Subject: [CF-metadata] Proposed standard names for biological model outputs
Message-ID:

Michael,

Nice job. I love the definitions. However, I agree with Jonathan that, although numerically equivalent in the world of moles, 'expressed_as_silicon' is more consistent and less confusing than 'expressed_as_silica'.

Cheers, Roy.

--
This message (and any attachments) is for the recipient only.
NERC is subject to the Freedom of Information Act 2000 and the contents of this email and any reply you make may be disclosed by NERC unless it is exempt from release under the Act. Any material supplied to NERC may be stored in an electronic records management system.

From dstuebe at umassd.edu Thu Jan 4 09:32:02 2007
From: dstuebe at umassd.edu (David Stuebe)
Date: Thu Jan 4 09:32:06 2007
Subject: [CF-metadata] CF-metadata Digest, Vol 46, Issue 1
Message-ID: <1f31dac10701040832s713f430fg20b827ccd04c90f2@mail.gmail.com>

Re: attributes for min/max data values for visualization (Chris Webster)

This is a very important point that Chris has made regarding the need for min/max data in visualization. For my work with visualization of FVCOM unstructured data, I have only encountered this as an issue while working with multi-domain data sets. Since I have multiple files, one for each domain plus a master file, I only store min/max data in the master file. I have found that the min/max data are only useful in this context, where certain data may not be useful based on its range and can therefore be neglected without reading from disk.

My highly >>non<< standard file structure for multi-domain data is to break apart a single file into separate files which are identical to the original and complete for visualization of that subdomain plus its ghost zones. The master file then contains the data regarding which cells are ghost cells and the min and max (spatial and data extents) for each sub-domain. This information can be checked first when the user requests a particular plot from the visualization program.

Before we get into details like naming schemes for min/max values, what are the contexts in which these optional min/max data are useful? This will be an important determining factor in how they are stored.

David

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.cgd.ucar.edu/pipermail/cf-metadata/attachments/20070104/e51b6f2b/attachment.html From Godin at mbari.org Thu Jan 4 09:51:56 2007 From: Godin at mbari.org (Godin, Michael) Date: Thu Jan 4 09:51:59 2007 Subject: [CF-metadata] Proposed standard names for biological model outputs Message-ID: Ron and Jonathan, Good catch. Not sure why I typed "silica" when I meant "silicon". It is actually quite incorrect ("silicate" would have been closer to reality), and the reference to "silica atoms" in the definition seems truly bizarre. So, here's the revision, with full description: mole_concentration_of_organic_detritus_in_sea_water_expressed_as_silicon Description: Mole concentration means moles (amount of substance) per unit volume and is used in the construction mole_concentration_of_X_in_Y, where X is a material constituent of Y. Organic detritus are particles of debris from decaying plants and animals. The construction expressed_as_silicon indicates that the indicated mole concentration is of silicon atoms due to the organic detritus. Cheers, Mike -----Original Message----- From: cf-metadata-bounces@cgd.ucar.edu [mailto:cf-metadata-bounces@cgd.ucar.edu] On Behalf Of Roy Lowry Sent: Thursday, January 04, 2007 5:05 AM To: cf-metadata@cgd.ucar.edu Subject: Re: [CF-metadata] Proposed standard names for biological model outputs Michael, Nice job. I love the definitions. However, I agree with Jonathan that although numerically equivanent in the world of moles 'expressed_as_silicon is more consistent and less confusing than 'expressed_as_silica'. Cheers, Roy. >>> Jonathan Gregory 01/04/07 8:11 AM >>> Dear Michael These look very good to me, and thanks for your clear definitions. I have one question, about silica (=SiO2). Do you mean silicon? In the discussions about Christiane's aerosol names we appear to have agreed that we need only say X_expressed_as_Y when X and Y are different; that is the rule you have used, so yours and hers are consistent. 
Best wishes

Jonathan

From christiane.textor at aero.jussieu.fr Thu Jan 4 10:43:37 2007
From: christiane.textor at aero.jussieu.fr (Christiane Textor)
Date: Thu Jan 4 10:46:51 2007
Subject: [CF-metadata] aerosol and chemistry names - continuation
In-Reply-To: <20061223214244.GA13364@met.reading.ac.uk>
References: <20061223214244.GA13364@met.reading.ac.uk>
Message-ID: <459D3CC9.4080707@aero.jussieu.fr>

Happy new year!

Dear Jonathan,

Thank you very much for your comments.

===========
CHANGED:

surface_dry_deposition_mass_flux_of_ozone_in_stomata
surface_dry_deposition_mass_flux_of_ozone_into_stomata
It is still surface deposition, since plants grow at the Earth surface.

****
Chemical production and destruction both have the prefix 'gross' now.

****
hexachlorbiphenyl is changed to hexachlorobiphenyl

===========
UPDATES:

The issue of X_expressed_as_Y seems to be settled: X=Y is the standard, and the extension _expressed_as_Y is only added if X and Y differ. The name particulate_organic_matter seems to be accepted as well.
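The X_expressed_as_Y rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper constituent_name and its arguments are hypothetical and are not part of CF or of any library, and the two example names are taken from the list proposed earlier in this thread.

```python
def constituent_name(quantity, species, medium, expressed_as=None):
    """Build <quantity>_of_<species>_in_<medium>[_expressed_as_<Y>].

    Hypothetical helper for illustration only; not part of CF or any library.
    """
    name = f"{quantity}_of_{species}_in_{medium}"
    # Append the suffix only when the reported substance Y differs from X.
    if expressed_as is not None and expressed_as != species:
        name += f"_expressed_as_{expressed_as}"
    return name


# X (diatoms) differs from Y (nitrogen), so the suffix is added.
print(constituent_name("mole_concentration", "diatoms", "sea_water", "nitrogen"))
# X equals Y (nitrate), so no suffix is added.
print(constituent_name("mole_concentration", "nitrate", "sea_water", "nitrate"))
```

Passing the same string for the species and for expressed_as, or omitting expressed_as, yields the plain name, which is the "X=Y is the standard" case.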
If there are no further objections, I would like to move the following quantities from the proposed to the almost-accepted table:

surface_dry_deposition_mass_flux_of_X
surface_wet_deposition_mass_flux_of_X
mass_fraction_of_X_dry_aerosol_in_air
mole_fraction_of_dimethyl_sulfide_in_air
chemical_gross_production_rate_of_mole_concentration_of_X
chemical_gross_destruction_rate_of_mole_concentration_of_X

===========
DISCUSSION

Your comment:
> I am unsure what atmosphere_emission_mass_flux (and production and re_emission)
> means. I understand "emission" to mean something coming from a source with a
> fixed location.

atmosphere_emission means a source within the atmosphere, e.g. from the surface or from an airplane.
atmosphere_production means production within the atmosphere; this includes direct sources and chemical production from precursors.
re-emission refers to the source of a pollutant that is not directly emitted by human activities, but re-emitted after previously being deposited and accumulated in soils or water.
These names were understood in our community (for the HTAP experiments); are they not clear enough?

****
Your comment on water_in_ambient_aerosol_optical_depth:

I had proposed ' _ambient_aerosol_optical_depth as explained in an email from 10/10/2006:

> - optical depth: ambient
>
> As the optical depth depends on the water content, the word 'ambient' avoids ambiguities.
>
> - optical depth or thickness?
>
> atmosphere_optical_thickness_due_to_X is defined by CF, however:
> 1) very long name,
> 2) does not fit with cloud_optical_depth,
> 3) is there any opt. depth not in the atmosphere?
> this is why I proposed X_aerosol_ambient_optical_depth

There were no objections to this proposal, which is why these names appear in the almost-accepted table. water_in_ambient_aerosol_optical_depth follows the same scheme: it is the optical depth due to the water contained in aerosol.

****
I am looking forward to your comments.
Best regards,
Christiane

--
Christiane Textor
Service d'Aéronomie INSU CNRS, Tour 46, RDC # 2
Université Pierre et Marie Curie, Boite 102
4 place Jussieu
75252 Paris Cédex 05
France
Tel: ++33 1.44.27.21.82
Fax: ++33 1.44.27.21.81
Email: christiane.textor@aero.jussieu.fr

From Godin at mbari.org Thu Jan 4 17:31:58 2007
From: Godin at mbari.org (Godin, Michael)
Date: Thu Jan 4 17:32:02 2007
Subject: [CF-metadata] Indicating data lineage or provenance
Message-ID:

I am heartened by all the work this group has put into standardizing the metadata for representing multiple models as an ensemble. However, a particularly thorny issue has been for the most part ignored (I think it has been called a "nightmare"), so I'd like to see if some of the list participants would be willing to work together to form a proposal for indicating the provenance of derived data (for example, initial conditions, larger nested grids, and assimilated data that go into models).

So here are the (draft) requirements that I believe need to be addressed:
- derived data users need to be provided the information they need to understand the differences between data (covering the same temporal/spatial region) from different models and different realizations of the same model.
- skeptics (public, governmental, other modelers, observationalists) should be able to request specific observational data that went into a model realization (granted, the request may be for data that would not otherwise be made publicly available).
- the specification of source data should not only indicate the source data files (or URLs) and variables, but also the temporal/spatial/realization bounds on the supplied data.

I don't know if such a set of requirements can be addressed in a netCDF file, or if it would require a link to an external XML (or other format) file.
I am also unsure if any other community has solved the above set of requirements - both the OGC's Layer definition within their Web Map Context Document standard, and the FGDC's Lineage definition within their Content Standard for Digital Geospatial Metadata allow one to specify a lot of metadata about lineage and provenance, but neither really meets the requirements above. My initial thought for doing this within a netCDF file would be to specify a global multi-line string attribute called something like "lineage" or "provenance" and populate it with a series of DAP2.0-like URIs (of course, this would not be global in the case of ensembles -- it would have to be a 3D set of strings!). The DAP2.0 URIs would not have to be publicly accessible, and the syntax would have to allow combinations of hyperslab operators and queries -- which I do not believe any DAP server actually allows -- but would allow one to specify precise data ranges. Thanks for your consideration, Mike _____________________________________________ Michael A. Godin Software Engineer Monterey Bay Aquarium Research Institute Phone: 831-775-2063 http://www.mbari.org -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.cgd.ucar.edu/pipermail/cf-metadata/attachments/20070104/f671ef20/attachment-0001.html From graybeal at mbari.org Thu Jan 4 17:47:17 2007 From: graybeal at mbari.org (John Graybeal) Date: Thu Jan 4 17:47:24 2007 Subject: [CF-metadata] Indicating data lineage or provenance In-Reply-To: References: Message-ID: To provide some data in response to Mike's question, and then a question of my own: I, along with Maureen Edwards of the UK, are tasked by OceanSITES with presenting a nominal solution to provenance in netCDF. How far we can get, and how quickly, is definitely TBD, but the notion I have devolves to separate files. (Yes I do hate that, but provenance on a whole mooring system is pretty complicated to put into a netCDF file). 
So I'd probably suggest a link (URL) from netCDF to a registered SensorML instance (registrations of which are being pursued on another project I'm involved with). Similar to Mike's solution but with important differences.

One point being, this is a more general problem than just model provenance. Observation and processing provenance is also desirable to represent in netCDF files.

So the question is, how much of this does the CF standard want to take on directly, and how much does it want to defer to other standards or efforts?

(No I really didn't put Mike up to this, and he really is only 8 doors from me. But neither of us knew...)

John

--
----------
John Graybeal -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Initiative: http://marinemetadata.org || Shore Side Data System: http://www.mbari.org/ssds

From b.n.lawrence at rl.ac.uk Fri Jan 5 01:20:15 2007
From: b.n.lawrence at rl.ac.uk (Bryan Lawrence)
Date: Fri Jan 5 01:20:21 2007
Subject: [CF-metadata] Indicating data lineage or provenance
In-Reply-To:
References:
Message-ID: <1167985215.19332.14.camel@localhost.localdomain>

I think the most important thing to get right is the content model (i.e. what does one want to know/record), and then worry about whether it is in the netcdf file or an accompanying file (serialised however).

For the record, our (BADC) attempts at dealing with this are at http://proj.badc.rl.ac.uk/ndg/wiki/NumSim (and despite what I said, the schema is an xml xsd, rather than serialisation independent).

The NumSim project has been somewhat in abeyance due to other priorities, but it's about to be revamped in the context of a new funding line (to support both research use and public engagement with climate simulations), and we expect to be documenting all our simulation data in the next twelve months using it. As part of that activity we'll be giving it a makeover in partnership with the Met Office and looking at how it relates to the NMM work coming out of Reading (which is being looked at in a number of contexts). Ideally NumSim ought to be a human readable/generated subset/component of NMM, and our vision is that it ought to meet the requirement you're outlining here.

In any case you will see that NumSim allows the explicit linking to datasets which are used as boundary conditions and initial conditions for simulations.
It'd be great if you wanted to make some specific criticisms of what we have now in terms of the content model. Then we should worry about how we use it :-)

Cheers
Bryan

From Francisco.Doblas-Reyes at ecmwf.int Fri Jan 5 03:05:23 2007
From: Francisco.Doblas-Reyes at ecmwf.int (Francisco Doblas-Reyes)
Date: Fri Jan 5 03:05:27 2007
Subject: [CF-metadata] Getting back to ensembles
In-Reply-To: <20061223210347.GA13158@met.reading.ac.uk>
References: <20061223210347.GA13158@met.reading.ac.uk>
Message-ID: <459E22E3.2090405@ecmwf.int>

Hi,

It seems that the only point from the original message posted by Jonathan http://www.cgd.ucar.edu/pipermail/cf-metadata/2006/001397.html is the issue of whether to use the standard_name attribute or to create a new type.
Taking into account the type of data we try to encode, I fully support his proposal of the variable having "a dimension which serves as an index over the members of an ensemble, in which the ensemble members are derived from different models, integrations, institutions supplying the data, etc., and the data from each ensemble member is a function of the same spatio-temporal coordinates and/or other physical independent variables".

To provide metadata identifying the members of the multi-forecast system ensemble, there is the option of using "auxiliary coordinate variables (with values not necessarily unique or ordered) with the ensemble index dimension to contain metadata identifying the institution (string), source (string), experiment_id (string) and realization (numeric or string) of the data". This agrees quite well with the requirements in my original posting: http://www.cgd.ucar.edu/pipermail/cf-metadata/2006/001147.html although additional metadata may be required.

We intend to make the string-valued metadata self-describing, and a web site will be provided with additional information. The use of a table specifying a string-valued "vocabulary" with attributes of the auxiliary coordinate variables may require some additional experience and, surely, the participation of more forecast institutions.

We plan to make available in the coming days a Thredds server with examples of multi-forecast system files using the criteria described above (with the standard_name attribute). This might be helpful by providing some examples.

Best wishes
Paco

--
________________________________________
Francisco J. Doblas-Reyes
European Centre for Medium-Range Weather Forecasting (ECMWF)
Shinfield Park, RG2 9AX
Reading, UK
Tel: +44 (0)118 9499 655
Fax: +44 (0)118 9869 450
f.doblas-reyes@ecmwf.int
_______________________________________

From rkl at bodc.ac.uk Sat Jan 6 09:20:15 2007
From: rkl at bodc.ac.uk (Roy Lowry)
Date: Sat Jan 6 09:21:23 2007
Subject: [CF-metadata] Indicating data lineage or provenance
Message-ID:

Dear All,

This issue is also of great concern to the SeaDataNet project, particularly in the case where multiple operational centres have grabbed a common raw dataset off the GTS and processed it independently creating 'near duplicates', which are difficult to identify. Standardised encoded provenance metadata has occurred to me as a possible solution to this problem.

We all seem to need the same thing, so I think collaboration is the order of the day. Could this be a candidate for a CF Twiki project advertised to other interested communities?

Cheers, Roy.

_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata

--
This message (and any attachments) is for the recipient only. NERC is subject to the Freedom of Information Act 2000 and the contents of this email and any reply you make may be disclosed by NERC unless it is exempt from release under the Act.
Any material supplied to NERC may be stored in an electronic records management system.

From j.m.gregory at reading.ac.uk Sat Jan 6 10:32:46 2007
From: j.m.gregory at reading.ac.uk (Jonathan Gregory)
Date: Sat Jan 6 10:32:53 2007
Subject: [CF-metadata] aerosol and chemistry names - continuation
Message-ID: <20070106173246.GA27027@met.reading.ac.uk>

Dear Christiane

> The issue of X_expressed_as_Y seems to be settled: X=Y is the standard,
> and the extension _expressed_as_Y is only added if X\=Y.
> The name particulate_organic_matter seems to be accepted as well.

Yes. Thank you for carrying this debate through to a happy conclusion.

> If there are no further objections, I would like to move the following
> quantities from the proposed to the almost-accepted table:
>
> surface_dry_deposition_mass_flux_of_X
> surface_wet_deposition_mass_flux_of_X
> mass_fraction_of_X_dry_aerosol_in_air
> mole_fraction_of_dimethyl_sulfide_in_air
> chemical_gross_production_rate_of_mole_concentration_of_X
> chemical_gross_destruction_rate_of_mole_concentration_of_X

They all look fine to me.

> atmosphere_emission means a source within the atmosphere, e.g. from the
> surface or from an air plane.

Ah, OK. Why is the surface included? You have separate surface fluxes.

> atmosphere_production means production within the atmosphere, this
> includes direct sources and chemical production from precursors.

So
  production = emission (from sources) + chemical net production
It seems to me potentially confusing to have production in these two different senses on the left and right of the equation. What about saying "addition" on the left e.g. NOx is added to the atmosphere by emission from aircraft and by chemical production. The opposite of "addition" might be "removal", which comes about by deposition and chemical destruction. Do we also need to distinguish gross and net addition?

> re-emission refers to the source of a pollutant that is not directly
> emitted by human activities, but re-emitted after previously being
> deposited and accumulated in soils or water.

I see. Is reemission included in emission? This might be a source of confusion.

> These names were understood in our community (for the HTAP experiments),
> are they not clear enough?

Definitions help, of course, thanks. However it might be that wrong guesses by the ignorant could be reduced.

> water_in_ambient_aerosol_optical_depth follows this systematic, it is
> the optical depth due to the water contained in aerosol.

Sorry to revisit this. That is what I understood it to mean, but I didn't believe it! I think that this long phrase is quite difficult to parse, and it would be easier to understand
  optical_depth_due_to_water_in_ambient_aerosol
Perhaps, as with named surfaces, it might be acceptable (if not too complicated) to say X_optical_depth if X is one word (since that is convenient and what people usually say e.g. for cloud and aerosol), and optical_depth_due_to_X if X is several words (to make it easier to understand).

But the existing standard_name is
  atmosphere_optical_thickness_due_to_aerosol
- not depth. Optical depth and thickness are both in the American Met Soc glossary, for instance. Optical depth means the optical thickness above some specified altitude (the idea of depth being that one is looking from above), and optical thickness is along any path. I suppose that by saying "atmosphere" we are defining the path to be the whole atmosphere, so they are synonymous. A more general quantity, such as the existing standard name
  optical_thickness_of_atmosphere_layer_due_to_aerosol
should be thickness, not depth. Hence I still have a preference for thickness.
Best wishes

Jonathan

From graybeal at mbari.org Sat Jan 6 19:04:46 2007
From: graybeal at mbari.org (John Graybeal)
Date: Sat Jan 6 19:05:04 2007
Subject: [CF-metadata] Indicating data lineage or provenance
In-Reply-To:
References:
Message-ID:

Roy,

Based on our experience so far with provenance-aware data systems, I suspect it is a very good (read: powerful) solution to this problem.

There are multiple standards that encode provenance information. Would the CF project support an evaluation of the application of those standards; or are you looking for an embedded (into netCDF) solution; or is that a question to be discussed on the TWiki?

Note also that one aspect of Mike's requirement, namely referencing a subset of a data set, is not so fully addressed (that I know of); participants of a recent AGU session hope to kick off a discussion on this topic. But we have imagined some reasonably effective approaches using existing encoding standards.

John

--
----------
John Graybeal -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Initiative: http://marinemetadata.org || Shore Side Data System: http://www.mbari.org/ssds

From rkl at bodc.ac.uk Sun Jan 7 09:36:40 2007
From: rkl at bodc.ac.uk (Roy Lowry)
Date: Sun Jan 7 09:37:34 2007
Subject: [CF-metadata] Indicating data lineage or provenance
Message-ID:

Hi John,

My primary concern is that there is communication so we get a single solution, not yet another set of 'near duplicates'. Documentation and evaluation of what exists in an open forum like the CF Trac Twiki or an area on the MMI site would seem an excellent way to achieve this.

As far as SeaDataNet is concerned, a model that could be implemented either within NetCDF or as an XML document would be required as the project uses multiple protocols.
Cheers, Roy. >>> John Graybeal 01/07/07 2:04 AM >>> Roy, Based on our experience so far with provenance-aware data systems, I suspect it is a very good (read: powerful) solution to this problem. There are multiple standards that encode provenance information. Would the CF project support an evaluation of the application of those standards; or are you looking for an embedded (into netCDF) solution; or is that a question to be discussed on the TWiki? Note also that one aspect of Mike's requirement, namely referencing a subset of a data set, is not so fully addressed (that I know of); participants of a recent AGU session hopes to kick off a discussion on this topic. But we have imagined some reasonably effective approaches using existing encoding standards. John At 4:20 PM +0000 1/6/07, Roy Lowry wrote: >Dear All, > >This issue is also of great concern to the SeaDataNet project, particularly in the case where multiple operational centres have grabbed a common raw dataset off the GTS and processed it independently creating 'near duplicates', which are difficult to identify. Standardised encoded provenance metadata has occurred to me as a possible solution tothis problem. > >We all seem to need the same thing, so I think collaboration is the order of the day. Could this be a candidate for a CF Twiki project advertised to other interested communities? > >Cheers, Roy. > >>>> John Graybeal 01/05/07 12:47 AM >>> >To provide some data in response to Mike's question, and then a question of my own: > >I, along with Maureen Edwards of the UK, are tasked by OceanSITES with presenting a nominal solution to provenance in netCDF. How far we can get, and how quickly, is definitely TBD, but the notion I have devolves to separate files. (Yes I do hate that, but provenance on a whole mooring system is pretty complicated to put into a netCDF file). 
So I'd probably suggest a link (URL) from netCDF to a registered SensorML instance (registrations of which are being pursued on another project I'm involved with). Similar to Mike's solution but with important differences. > >One point being, this is a more general problem than just model provenance. Observation and processing provenance is also desirable to represent in netCDF files. > >So the question is, how much of this does the CF standard want to take on directly, and how much does it want to defer to other standards or efforts? > >(No I really didn't put Mike up to this, and he really is only 8 doors from me. But neither of us knew...) > >John > >At 4:31 PM -0800 1/4/07, Godin, Michael wrote: >>Content-class: urn:content-classes:message >>Content-Type: multipart/alternative; >> boundary="=_reb-r50C4DCF4-t459D9D0C" >> >>I am heartened by all the work this group has put into standardizing the metadata for representing multiple models as an ensemble. However, a particularly thorny issue has been for the most part ignored (I think it has been called a "nightmare"), so I'd like to see if some of the list participants would be willing to work together to form a proposal for indicating the provenance of derived data (for example, initial conditions, larger nested grids, and assimilated data that go into models). >> >>So here are the (draft) requirements that I believe need to be addressed: >>- derived data users need to be provided the information they need to understand the differences between data (covering the same temporal/spatial region) from different models and different realizations of the same model. >>- skeptics (public, governmental, other modelers, observationalists) should be able to request specific observational data that went into a model realization (granted, the request may be for data that would not otherwise be made publicly available). 
>>- the specification of source data should not only indicate the source data files (or URLs) and variables, but also the temporal/spatial/realization bounds on the supplied data. >> >>I don't know if such a set of requirements can be addressed in a netCDF file, or if it would require a link to an external XML (or other format) file. I am also unsure if any other community has solved the above set of requirements - both the OGC's Layer definition within their Web Map Context Document standard, and the FGDC's Lineage definition within their Content Standard for Digital Geospatial Metadata allow one to specify a lot of metadata about lineage and provenance, but neither really meets the requirements above. > > >>My initial thought for doing this within a netCDF file would be to specify a global multi-line string attribute called something like "lineage" or "provenance" and populate it with a series of DAP2.0-like URIs (of course, this would not be global in the case of ensembles -- it would have to be a 3D set of strings!). The DAP2.0 URIs would not have to be publicly accessible, and the syntax would have to allow combinations of hyperslab operators and queries -- which I do not believe any DAP server actually allows -- but would allow one to specify precise data ranges. >> >>Thanks for your consideration, >>Mike >> >>_____________________________________________ >> >>Michael A. 
Godin >> >>Software Engineer >> >>Monterey Bay Aquarium Research Institute >> >>Phone: 831-775-2063 http://www.mbari.org >> >> >> >>_______________________________________________ >>CF-metadata mailing list >>CF-metadata@cgd.ucar.edu >>http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata > > >-- >---------- >John Graybeal -- 831-775-1956 >Monterey Bay Aquarium Research Institute >Marine Metadata Initiative: http://marinemetadata.org || Shore Side Data System: http://www.mbari.org/ssds >_______________________________________________ >CF-metadata mailing list >CF-metadata@cgd.ucar.edu >http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata > > >-- >This message (and any attachments) is for the recipient only. NERC >is subject to the Freedom of Information Act 2000 and the contents >of this email and any reply you make may be disclosed by NERC unless >it is exempt from release under the Act. Any material supplied to >NERC may be stored in an electronic records management system. > > >_______________________________________________ >CF-metadata mailing list >CF-metadata@cgd.ucar.edu >http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata -- ---------- John Graybeal -- 831-775-1956 Monterey Bay Aquarium Research Institute Marine Metadata Initiative: http://marinemetadata.org || Shore Side Data System: http://www.mbari.org/ssds _______________________________________________ CF-metadata mailing list CF-metadata@cgd.ucar.edu http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata -- This message (and any attachments) is for the recipient only. NERC is subject to the Freedom of Information Act 2000 and the contents of this email and any reply you make may be disclosed by NERC unless it is exempt from release under the Act. Any material supplied to NERC may be stored in an electronic records management system. 
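The "provenance" attribute idea quoted in the message above (a multi-line string of DAP2.0-like URIs, each carrying a hyperslab so that the bounds of the supplied data are explicit) could look roughly like the following. This is only a sketch under stated assumptions: the class and function names are hypothetical, the example.org URLs are placeholders, and only the standard DAP2 index-range form var[start:stride:stop] is used. It does not attempt the combined hyperslab-plus-query syntax, which, as the message notes, existing DAP servers probably do not support.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SourceSlab:
    """One piece of source data: where it lives, which variable, which index ranges.

    Hypothetical helper for illustration; not part of CF or of any DAP library.
    """
    url: str                            # dataset location (placeholder URLs below)
    variable: str                       # variable name inside that dataset
    ranges: List[Tuple[int, int, int]]  # (start, stride, stop) for each dimension

    def to_uri(self) -> str:
        # DAP2-style projection: one [start:stride:stop] group per dimension.
        slab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in self.ranges)
        return f"{self.url}?{self.variable}{slab}"


def provenance_attribute(sources: List[SourceSlab]) -> str:
    """Join the source URIs into one multi-line string, one URI per line."""
    return "\n".join(source.to_uri() for source in sources)


# Assumed example inputs: an observational field and a larger nested grid.
sources = [
    SourceSlab("http://example.org/dap/obs/sst.nc", "sst",
               [(0, 1, 30), (10, 1, 20), (40, 1, 80)]),
    SourceSlab("http://example.org/dap/model/outer_grid.nc", "temp",
               [(0, 6, 120), (0, 1, 4)]),
]
provenance = provenance_attribute(sources)
print(provenance)
```

The resulting string could be stored as a global attribute for a single realization. For an ensemble it would have to become a string-valued variable along the ensemble dimension, as the message points out.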
From jdb at mail.nerc-essc.ac.uk Mon Jan 8 03:31:02 2007 From: jdb at mail.nerc-essc.ac.uk (Jon Blower) Date: Mon Jan 8 03:31:06 2007 Subject: [CF-metadata] CF-metadata Digest, Vol 46, Issue 1 In-Reply-To: <1f31dac10701040832s713f430fg20b827ccd04c90f2@mail.gmail.com> References: <1f31dac10701040832s713f430fg20b827ccd04c90f2@mail.gmail.com> Message-ID: <2bb6ee950701080231r71b0d412s33f36cb750aba67b@mail.gmail.com> Dear all, I think there are at least two types of min/max attributes that one could associate with a particular variable in a netCDF file: 1) The *actual* min and max of the variable's values in that particular file (not including missing values). Perhaps stored as data_min and data_max? Useful for both visualisation and data mining (as Bryan pointed out). 2) If the netCDF file is part of a larger dataset (e.g. it represents one timestep in a forecast sequence) it would be useful to store a "reasonable" (or actual, if possible) min and max for the *whole dataset* (taking into account the measurement method used, geographical region etc). Perhaps stored as dataset_min and dataset_max? The latter pair of attributes would really help when creating visualisations of timeseries across multiple netCDF files and could be a "soft" range: i.e. it could be possible for some data to lie outside this range (for a forecast dataset it would be impossible to define the whole dataset's min and max in advance unless it were defined to be much wider than is really useful for visualisation). I think that all viz tools should highlight data that are "out of range": it seems to me to be a serious flaw to do otherwise as it hides potentially vital data. If a file specifies neither pair of attributes, perhaps there could be a community dictionary of suggested ranges per standard name, but this is flawed for reasons discussed on this thread. Might be better than nothing though. 
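The two kinds of range described above can be illustrated with a small sketch. It assumes plain Python lists rather than a netCDF library, and the function names are hypothetical. The attribute names data_min/data_max and dataset_min/dataset_max are only the tentative names floated in this message, not agreed CF attributes. The first function computes the actual range of one file's values while skipping missing values; the second reports which valid values fall outside a "soft" whole-dataset range so that a viz tool could highlight them.

```python
import math
from typing import List, Optional, Sequence, Tuple


def is_valid(value: float, fill_value: Optional[float]) -> bool:
    """True unless the value is NaN or equals the declared fill value."""
    if math.isnan(value):
        return False
    return fill_value is None or value != fill_value


def data_range(values: Sequence[float],
               fill_value: Optional[float] = None) -> Optional[Tuple[float, float]]:
    """Actual (min, max) of the valid values, or None if nothing is valid."""
    valid = [v for v in values if is_valid(v, fill_value)]
    if not valid:
        return None
    return min(valid), max(valid)


def out_of_range(values: Sequence[float], soft_min: float, soft_max: float,
                 fill_value: Optional[float] = None) -> List[int]:
    """Indices of valid values lying outside the soft [soft_min, soft_max] range."""
    return [i for i, v in enumerate(values)
            if is_valid(v, fill_value) and not (soft_min <= v <= soft_max)]


# Assumed sample data; 1.0e12 stands in for a _FillValue.
FILL = 1.0e12
field = [5400.0, FILL, 5750.5, 5120.25]

file_range = data_range(field, FILL)                 # candidate data_min / data_max
flagged = out_of_range(field, 5200.0, 5800.0, FILL)  # checked against a soft dataset range
print(file_range, flagged)
```

Keeping the out-of-range check separate from the range computation reflects the point that the whole-dataset range is soft: values outside it are flagged rather than discarded.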
Regards, Jon

On 1/4/07, David Stuebe wrote:
> Re: attributes for min/max data values for visualization
> (Chris Webster)
>
>
> This is a very important point that Chris has made regarding the need for
> min/max data in visualization. For my work with visualization of FVCOM
> unstructured data, I have only encountered this as an issue while working
> with multi-domain data sets. Since I have multiple files, one for each
> domain plus a master file, I only store min/max data in the master file.
>
> I have found that the min/max data are only useful in this context, where
> certain data may not be useful based on its range and can therefore be
> neglected without reading from disk. My highly >>non<< standard file
> structure for multi-domain data is to break apart a single file into
> separate files which are identical to the original and complete for
> visualization of that subdomain plus its ghost zones. The master file then
> contains the data regarding which cells are ghost cells and the min and max
> (spatial and data extents) for each sub-domain. This information can be
> checked first when the user requests a particular plot from the
> visualization program.
>
> Before we get into details like naming schemes for min max values, what are
> the contexts in which this optional min/max data are useful? This will be an
> important determining factor in how it is stored.
> > David > > > > _______________________________________________ > CF-metadata mailing list > CF-metadata@cgd.ucar.edu > http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata > > > -- -------------------------------------------------------------- Dr Jon Blower Tel: +44 118 378 5213 (direct line) Technical Director Tel: +44 118 378 8741 (ESSC) Reading e-Science Centre Fax: +44 118 378 6413 ESSC Email: jdb@mail.nerc-essc.ac.uk University of Reading 3 Earley Gate Reading RG6 6AL, UK -------------------------------------------------------------- From julian.hill at metoffice.gov.uk Mon Jan 8 04:15:34 2007 From: julian.hill at metoffice.gov.uk (Hill, Julian) Date: Mon Jan 8 04:15:39 2007 Subject: [CF-metadata] CF-metadata Digest, Vol 46, Issue 1 In-Reply-To: <2bb6ee950701080231r71b0d412s33f36cb750aba67b@mail.gmail.com> References: <1f31dac10701040832s713f430fg20b827ccd04c90f2@mail.gmail.com> <2bb6ee950701080231r71b0d412s33f36cb750aba67b@mail.gmail.com> Message-ID: <1168254934.4849.51.camel@eld451.desktop.frd.metoffice.com> Dear Jon, An addition to this would also be a theoretical max and min. Where theory indicates that all valid values lie within a specific range we could add this to the attributes. I.e. probabilities lying between 0 and 1 etc. Kind regards Julian On Mon, 2007-01-08 at 10:31 +0000, Jon Blower wrote: > Dear all, > > I think there are at least two types of min/max attributes that one > could associate with a particular variable in a netCDF file: > > 1) The *actual* min and max of the variable's values in that > particular file (not including missing values). Perhaps stored as > data_min and data_max? Useful for both visualisation and data mining > (as Bryan pointed out). > > 2) If the netCDF file is part of a larger dataset (e.g. 
it represents > one timestep in a forecast sequence) it would be useful to store a > "reasonable" (or actual, if possible) min and max for the *whole > dataset* (taking into account the measurement method used, > geographical region etc). Perhaps stored as dataset_min and > dataset_max? > > The latter pair of attributes would really help when creating > visualisations of timeseries across multiple netCDF files and could be > a "soft" range: i.e. it could be possible for some data to lie outside > this range (for a forecast dataset it would be impossible to define > the whole dataset's min and max in advance unless it were defined to > be much wider than is really useful for visualisation). I think that > all viz tools should highlight data that are "out of range": it seems > to me to be a serious flaw to do otherwise as it hides potentially > vital data. > > If a file specifies neither pair of attributes, perhaps there could be > a community dictionary of suggested ranges per standard name, but this > is flawed for reasons discussed on this thread. Might be better than > nothing though. > > Regards, Jon > > On 1/4/07, David Stuebe wrote: > > Re: attributes for min/max data values for visualization > > (Chris Webster) > > > > > > This is a very important point that Chris has made regarding the need for > > min/max data in visualization. For my work with visualization of FVCOM > > unstructured data, I have only encountered this as an issue while working > > with multi-domain data sets. Since I have multiple files, one for each > > domain plus a master file, I only store min/max data in the master file. > > > > I have found that the min/max data are only useful in this context, where > > certain data may not be useful based on its range and can therefore be > > neglected without reading from disk. 
My highly >>non<< standard file
> > structure for multi-domain data is to break apart a single file into
> > separate files which are identical to the original and complete for
> > visualization of that subdomain plus its ghost zones. The master file then
> > contains the data regarding which cells are ghost cells and the min and max
> > (spatial and data extents) for each sub-domain. This information can be
> > checked first when the user requests a particular plot from the
> > visualization program.
> >
> > Before we get into details like naming schemes for min max values, what are
> > the contexts in which this optional min/max data are useful? This will be an
> > important determining factor in how it is stored.
> >
> > David
> >
> >
> >
> > _______________________________________________
> > CF-metadata mailing list
> > CF-metadata@cgd.ucar.edu
> > http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata
> >
> >
> >
>
>
--
Dr Julian Hill
Marine Data Research Scientist
Met Office Hadley Centre for Climate Prediction and Research
FitzRoy Road Exeter EX1 3PB United Kingdom
Tel: +44 (0)1392 884278 Fax: +44(0)1392 885681
Datasets are available from http://www.hadobs.org

From jamie.kettleborough at metoffice.gov.uk Mon Jan 8 04:17:03 2007
From: jamie.kettleborough at metoffice.gov.uk (Kettleborough, Jamie)
Date: Mon Jan 8 04:17:24 2007
Subject: [CF-metadata] Getting back to ensembles
In-Reply-To: <459E22E3.2090405@ecmwf.int>
References: <20061223210347.GA13158@met.reading.ac.uk> <459E22E3.2090405@ecmwf.int>
Message-ID: <1168255023.2731.57.camel@eld408.desktop.frd.metoffice.com>

Hello Paco,

how are you going to label these variables in the NetCDF file? - as per
your original proposal, or something different?

We are going to want to start producing ensemble based files fairly
soon, probably in the late spring or early summer.
We (CF community) seem to be a bit stuck at the moment on how to represent these in a CF way - whether to use 'standard_name', 'structure_element', or 'standard_metadata'. Given the current timescales I think we may have to produce files with support for ensemble meta data that is not covered by CF - our (in practice BADC's) tooling will have to support that extension. Any advice on this welcome - is this a reasonable approach? Of course the Met Office Ensemble data and BADC data delivery tools could be seen as a possible test implementation for a representation of ensembles. Jamie (please don't read this as a sulk that the meta data issue has not been resolved - I just want to try and be pragmatic and future proof ourselves so when we have figured out how to represent ensembles within CF we can do it with minimal effort). On Fri, 2007-01-05 at 10:05 +0000, Francisco Doblas-Reyes wrote: > We intend to make the string-valued metadata self-describing, and a > web > site will be provided with additional information. The use of a table > specifying a string-valued "vocabulary" with attributes of the > auxiliary > coordinate variables may require some additional experience and, > surely, > the participation of more forecast institutions. From Francisco.Doblas-Reyes at ecmwf.int Mon Jan 8 05:13:47 2007 From: Francisco.Doblas-Reyes at ecmwf.int (Francisco Doblas-Reyes) Date: Mon Jan 8 05:13:51 2007 Subject: [CF-metadata] Getting back to ensembles In-Reply-To: <1168255023.2731.57.camel@eld408.desktop.frd.metoffice.com> References: <20061223210347.GA13158@met.reading.ac.uk> <459E22E3.2090405@ecmwf.int> <1168255023.2731.57.camel@eld408.desktop.frd.metoffice.com> Message-ID: <45A2357B.8030604@ecmwf.int> Hi Jamie, Following the pragmatic option, we have already started producing some files. 
A preliminary set using DEMETER seasonal hindcasts has been stored in the ENSEMBLES OPeNDAP server: http://ensembles.ecmwf.int/thredds/catalogServices?catalog=http://ensembles.ecmwf.int/thredds/variables.xml However, I feel happier with the headers of a slightly different set of ensemble seasonal and interannual forecasts that will be stored very soon in the same server. An example of the typical headers can be found in the attachment. Of course, I used a minimal number of variables to describe the metadata. More variables might be needed in a true operational setup, as I suggested in my original posting. Until the Met Office and BADC make their files available, I'm more than happy to carry on working on the ENSEMBLES OPeNDAP server at ECMWF and make any changes that will be necessary following the discussions in the CF list. Furthermore, we already have some users that can test the files I'm producing and give feedback. Best regards, Paco Kettleborough, Jamie wrote: > Hello Paco, > > how are you going to label these variables in the NetCDF file? - as per > your original proposal, or something different? > > We are going to want to start to producing ensemble based files fairly > soon, probably in the late spring or early summer. We (CF community) > seem to be a bit stuck at the moment on how to represent these in a CF > way - whether to use 'standard_name', 'structure_element', or > 'standard_metadata'. Given the current timescales I think we may have > to produce files with support for ensemble meta data that is not covered > by CF - our (in practice BADC's) tooling will have to support that > extension. Any advice on this welcome - is this a reasonable > approach? > > Of course the Met Office Ensemble data and BADC data delivery tools > could be seen as a possible test implementation for a representation of > ensembles. 
> > Jamie > > (please don't read this as a sulk that the meta data issue has not been > resolved - I just want to try and be pragmatic and future proof > ourselves so when we have figured out how to represent ensembles within > CF we can do it with minimal effort). > > On Fri, 2007-01-05 at 10:05 +0000, Francisco Doblas-Reyes wrote: >> We intend to make the string-valued metadata self-describing, and a >> web >> site will be provided with additional information. The use of a table >> specifying a string-valued "vocabulary" with attributes of the >> auxiliary >> coordinate variables may require some additional experience and, >> surely, >> the participation of more forecast institutions. > _______________________________________________ > CF-metadata mailing list > CF-metadata@cgd.ucar.edu > http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata -- ________________________________________ Francisco J. Doblas-Reyes European Centre for Medium-Range Weather Forecasting (ECMWF) Shinfield Park, RG2 9AX Reading, UK Tel: +44 (0)118 9499 655 Fax: +44 (0)118 9869 450 f.doblas-reyes@ecmwf.int _______________________________________ -------------- next part -------------- netcdf MM_129_mon_2001 { dimensions: longitude = 144 ; latitude = 71 ; level = 5 ; time = 21 ; time_bnd = 2 ; ensemble = 36 ; string4 = 4 ; string15 = 15 ; string50 = 50 ; variables: float longitude(longitude) ; longitude:data_type = "float" ; longitude:units = "degrees_east" ; longitude:axis = "X" ; longitude:standard_name = "longitude" ; longitude:topology = "circular" ; longitude:modulo = 360 ; longitude:valid_min = 0. ; longitude:valid_max = 359. ; float latitude(latitude) ; latitude:data_type = "float" ; latitude:units = "degrees_north" ; latitude:axis = "Y" ; latitude:standard_name = "latitude" ; latitude:valid_min = -89. ; latitude:valid_max = 89. 
; float reftime(time) ; reftime:units = "days since 1950-01-01 00:00:00" ; reftime:standard_name = "forecast_reference_time" ; reftime:long_name = "forecast reference time" ; int leadtime(time) ; leadtime:units = "days" ; leadtime:standard_name = "forecast_period" ; leadtime:long_name = "Time elapsed since the start of the forecast" ; leadtime:bounds = "time_bnd" ; int time_bnd(time, time_bnd) ; time_bnd:units = "days" ; int realization(ensemble) ; realization:standard_name = "realization" ; realization:long_name = "Number of the simulation in the ensemble" ; char experiment_id(ensemble, string4) ; experiment_id:standard_name = "experiment_id" ; experiment_id:long_name = "Experiment identifier" ; char source(ensemble, string50) ; source:standard_name = "source" ; source:long_name = "Method of production of the data" ; char institution(ensemble, string15) ; institution:standard_name = "institution" ; institution:long_name = "Institution responsible for the forecast system" ; float sc ; sc:data_type = "float" ; sc:units = "m" ; sc:axis = "Z" ; sc:standard_name = "height" ; sc:positive = "up" ; float level(level) ; level:data_type = "float" ; level:units = "hPa" ; level:axis = "Z" ; level:standard_name = "air_pressure" ; level:positive = "up" ; float geopotential(time, ensemble, level, latitude, longitude) ; geopotential:data_type = "float" ; geopotential:units = "m2 s-2" ; geopotential:unit_long = "square_meter_per_square_second" ; geopotential:standard_name = "geopotential" ; geopotential:long_name = "geopotential" ; geopotential:cell_methods = "leadtime: mean (interval 1 day)" ; geopotential:coordinates = "reftime leadtime experiment_id source realization institution" ; geopotential:_FillValue = 1.e+12f ; // global attributes: :Conventions = "CF-1.0" ; :Generator = "SeasPy v1.1" ; :Created = "Mon Dec 18 12:57:27 2006" ; :Title = "ENSEMBLES project" ; :References = "http://www.ecmwf.int/research/EU_projects/ENSEMBLES/index.html, 
http://www.ecmwf.int/research/EU_projects/ENSEMBLES/experiments/index.html" ; :Comment = "Data interpolated from original model grid into a regular grid. Data restrictions: none" ; data: longitude = 0, 2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20, 22.5, 25, 27.5, 30, 32.5, 35, 37.5, 40, 42.5, 45, 47.5, 50, 52.5, 55, 57.5, 60, 62.5, 65, 67.5, 70, 72.5, 75, 77.5, 80, 82.5, 85, 87.5, 90, 92.5, 95, 97.5, 100, 102.5, 105, 107.5, 110, 112.5, 115, 117.5, 120, 122.5, 125, 127.5, 130, 132.5, 135, 137.5, 140, 142.5, 145, 147.5, 150, 152.5, 155, 157.5, 160, 162.5, 165, 167.5, 170, 172.5, 175, 177.5, 180, 182.5, 185, 187.5, 190, 192.5, 195, 197.5, 200, 202.5, 205, 207.5, 210, 212.5, 215, 217.5, 220, 222.5, 225, 227.5, 230, 232.5, 235, 237.5, 240, 242.5, 245, 247.5, 250, 252.5, 255, 257.5, 260, 262.5, 265, 267.5, 270, 272.5, 275, 277.5, 280, 282.5, 285, 287.5, 290, 292.5, 295, 297.5, 300, 302.5, 305, 307.5, 310, 312.5, 315, 317.5, 320, 322.5, 325, 327.5, 330, 332.5, 335, 337.5, 340, 342.5, 345, 347.5, 350, 352.5, 355, 357.5 ; latitude = 87.5, 85, 82.5, 80, 77.5, 75, 72.5, 70, 67.5, 65, 62.5, 60, 57.5, 55, 52.5, 50, 47.5, 45, 42.5, 40, 37.5, 35, 32.5, 30, 27.5, 25, 22.5, 20, 17.5, 15, 12.5, 10, 7.5, 5, 2.5, 0, -2.5, -5, -7.5, -10, -12.5, -15, -17.5, -20, -22.5, -25, -27.5, -30, -32.5, -35, -37.5, -40, -42.5, -45, -47.5, -50, -52.5, -55, -57.5, -60, -62.5, -65, -67.5, -70, -72.5, -75, -77.5, -80, -82.5, -85, -87.5 ; reftime = 18748, 18748, 18748, 18748, 18748, 18748, 18748, 18932, 18932, 18932, 18932, 18932, 18932, 18932, 18932, 18932, 18932, 18932, 18932, 18932, 18932 ; leadtime = 15, 46, 76, 107, 138, 168, 199, 15, 45, 76, 107, 135, 166, 196, 227, 257, 288, 319, 349, 380, 410 ; time_bnd = 0, 31, 31, 61, 61, 92, 92, 123, 123, 153, 153, 184, 184, 214, 0, 30, 30, 61, 61, 92, 92, 120, 120, 151, 151, 181, 181, 212, 212, 242, 242, 273, 273, 304, 304, 334, 334, 365, 365, 395, 395, 426 ; realization = 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 
4, 5, 6, 7, 8 ; experiment_id = "1004", "1004", "1004", "1004", "1004", "1004", "1004", "1004", "1004", "1004", "1004", "1004", "1004", "1004", "1004", "1004", "1004", "1004", "1000", "1000", "1000", "1000", "1000", "1000", "1000", "1000", "1000", "1001", "1001", "1001", "1001", "1001", "1001", "1001", "1001", "1001" ; source = "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 1 ", "IFS, System 1, Method 10 ", "IFS, System 1, Method 10 ", "IFS, System 1, Method 10 ", "IFS, System 1, Method 10 ", "IFS, System 1, Method 10 ", "IFS, System 1, Method 10 ", "IFS, System 1, Method 10 ", "IFS, System 1, Method 10 ", "IFS, System 1, Method 10 ", "IFS, System 0, Method 1 ", "IFS, System 0, Method 1 ", "IFS, System 0, Method 1 ", "IFS, System 0, Method 1 ", "IFS, System 0, Method 1 ", "IFS, System 0, Method 1 ", "IFS, System 0, Method 1 ", "IFS, System 0, Method 1 ", "IFS, System 0, Method 1 " ; institution = "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF ", "ECMWF " ; level = 850, 500, 200, 100, 50 ; geopotential = From jamie.kettleborough at metoffice.gov.uk Mon Jan 8 05:40:28 2007 From: jamie.kettleborough at metoffice.gov.uk (Kettleborough, Jamie) Date: Mon Jan 8 05:40:34 2007 Subject: [CF-metadata] Getting back to ensembles 
In-Reply-To: <45A2357B.8030604@ecmwf.int> References: <20061223210347.GA13158@met.reading.ac.uk> <459E22E3.2090405@ecmwf.int> <1168255023.2731.57.camel@eld408.desktop.frd.metoffice.com> <45A2357B.8030604@ecmwf.int> Message-ID: <1168260028.10076.18.camel@eld408.desktop.frd.metoffice.com> Hello Paco, thanks for giving us sight of these - do you care that they are not, strictly speaking, CF compliant as they use standard_names (like source, etc) that are not yet in the standard_name list? Should the standard_names just be dropped for now and rely on the long_names? Though I think this is all tied up with process (apologies for mixing threads). If we are going to have 'test files' and test implementations don't we need some way of flagging the bits in the files that are provisional or being tested? Jamie On Mon, 2007-01-08 at 12:13 +0000, Francisco Doblas-Reyes wrote: > Hi Jamie, > > Following the pragmatic option, we have already started producing some > files. A preliminary set using DEMETER seasonal hindcasts has been > stored in the ENSEMBLES OPeNDAP server: > http://ensembles.ecmwf.int/thredds/catalogServices?catalog=http://ensembles.ecmwf.int/thredds/variables.xml > > However, I feel happier with the headers of a slightly different set of > ensemble seasonal and interannual forecasts that will be stored very > soon in the same server. An example of the typical headers can be found > in the attachment. Of course, I used a minimal number of variables to > describe the metadata. More variables might be needed in a true > operational setup, as I suggested in my original posting. > > Until the Met Office and BADC make their files available, I'm more than > happy to carry on working on the ENSEMBLES OPeNDAP server at ECMWF and > make any changes that will be necessary following the discussions in the > CF list. Furthermore, we already have some users that can test the files > I'm producing and give feedback. 
> > Best regards, > Paco > > > > Kettleborough, Jamie wrote: > > Hello Paco, > > > > how are you going to label these variables in the NetCDF file? - as per > > your original proposal, or something different? > > > > We are going to want to start to producing ensemble based files fairly > > soon, probably in the late spring or early summer. We (CF community) > > seem to be a bit stuck at the moment on how to represent these in a CF > > way - whether to use 'standard_name', 'structure_element', or > > 'standard_metadata'. Given the current timescales I think we may have > > to produce files with support for ensemble meta data that is not covered > > by CF - our (in practice BADC's) tooling will have to support that > > extension. Any advice on this welcome - is this a reasonable > > approach? > > > > Of course the Met Office Ensemble data and BADC data delivery tools > > could be seen as a possible test implementation for a representation of > > ensembles. > > > > Jamie > > > > (please don't read this as a sulk that the meta data issue has not been > > resolved - I just want to try and be pragmatic and future proof > > ourselves so when we have figured out how to represent ensembles within > > CF we can do it with minimal effort). > > > > On Fri, 2007-01-05 at 10:05 +0000, Francisco Doblas-Reyes wrote: > >> We intend to make the string-valued metadata self-describing, and a > >> web > >> site will be provided with additional information. The use of a table > >> specifying a string-valued "vocabulary" with attributes of the > >> auxiliary > >> coordinate variables may require some additional experience and, > >> surely, > >> the participation of more forecast institutions. 
> > _______________________________________________ > > CF-metadata mailing list > > CF-metadata@cgd.ucar.edu > > http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata > > plain text document attachment (nc.out) > netcdf MM_129_mon_2001 { > dimensions: > longitude = 144 ; > latitude = 71 ; > level = 5 ; > time = 21 ; > time_bnd = 2 ; > ensemble = 36 ; > string4 = 4 ; > string15 = 15 ; > string50 = 50 ; > variables: > float longitude(longitude) ; > longitude:data_type = "float" ; > longitude:units = "degrees_east" ; > longitude:axis = "X" ; > longitude:standard_name = "longitude" ; > longitude:topology = "circular" ; > longitude:modulo = 360 ; > longitude:valid_min = 0. ; > longitude:valid_max = 359. ; > float latitude(latitude) ; > latitude:data_type = "float" ; > latitude:units = "degrees_north" ; > latitude:axis = "Y" ; > latitude:standard_name = "latitude" ; > latitude:valid_min = -89. ; > latitude:valid_max = 89. ; > float reftime(time) ; > reftime:units = "days since 1950-01-01 00:00:00" ; > reftime:standard_name = "forecast_reference_time" ; > reftime:long_name = "forecast reference time" ; > int leadtime(time) ; > leadtime:units = "days" ; > leadtime:standard_name = "forecast_period" ; > leadtime:long_name = "Time elapsed since the start of the forecast" ; > leadtime:bounds = "time_bnd" ; > int time_bnd(time, time_bnd) ; > time_bnd:units = "days" ; > int realization(ensemble) ; > realization:standard_name = "realization" ; > realization:long_name = "Number of the simulation in the ensemble" ; > char experiment_id(ensemble, string4) ; > experiment_id:standard_name = "experiment_id" ; > experiment_id:long_name = "Experiment identifier" ; > char source(ensemble, string50) ; > source:standard_name = "source" ; > source:long_name = "Method of production of the data" ; > char institution(ensemble, string15) ; > institution:standard_name = "institution" ; > institution:long_name = "Institution responsible for the forecast system" ; > float sc ; > sc:data_type = 
"float" ; > sc:units = "m" ; > sc:axis = "Z" ; > sc:standard_name = "height" ; > sc:positive = "up" ; > float level(level) ; > level:data_type = "float" ; > level:units = "hPa" ; > level:axis = "Z" ; > level:standard_name = "air_pressure" ; > level:positive = "up" ; > float geopotential(time, ensemble, level, latitude, longitude) ; > geopotential:data_type = "float" ; > geopotential:units = "m2 s-2" ; > geopotential:unit_long = "square_meter_per_square_second" ; > geopotential:standard_name = "geopotential" ; > geopotential:long_name = "geopotential" ; > geopotential:cell_methods = "leadtime: mean (interval 1 day)" ; > geopotential:coordinates = "reftime leadtime experiment_id source realization institution" ; > geopotential:_FillValue = 1.e+12f ; > > // global attributes: > :Conventions = "CF-1.0" ; > :Generator = "SeasPy v1.1" ; > :Created = "Mon Dec 18 12:57:27 2006" ; > :Title = "ENSEMBLES project" ; > :References = "http://www.ecmwf.int/research/EU_projects/ENSEMBLES/index.html, http://www.ecmwf.int/research/EU_projects/ENSEMBLES/experiments/index.html" ; > :Comment = "Data interpolated from original model grid into a regular grid. 
Data restrictions: none" ; > data: > > longitude = 0, 2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20, 22.5, 25, 27.5, 30, > 32.5, 35, 37.5, 40, 42.5, 45, 47.5, 50, 52.5, 55, 57.5, 60, 62.5, 65, > 67.5, 70, 72.5, 75, 77.5, 80, 82.5, 85, 87.5, 90, 92.5, 95, 97.5, 100, > 102.5, 105, 107.5, 110, 112.5, 115, 117.5, 120, 122.5, 125, 127.5, 130, > 132.5, 135, 137.5, 140, 142.5, 145, 147.5, 150, 152.5, 155, 157.5, 160, > 162.5, 165, 167.5, 170, 172.5, 175, 177.5, 180, 182.5, 185, 187.5, 190, > 192.5, 195, 197.5, 200, 202.5, 205, 207.5, 210, 212.5, 215, 217.5, 220, > 222.5, 225, 227.5, 230, 232.5, 235, 237.5, 240, 242.5, 245, 247.5, 250, > 252.5, 255, 257.5, 260, 262.5, 265, 267.5, 270, 272.5, 275, 277.5, 280, > 282.5, 285, 287.5, 290, 292.5, 295, 297.5, 300, 302.5, 305, 307.5, 310, > 312.5, 315, 317.5, 320, 322.5, 325, 327.5, 330, 332.5, 335, 337.5, 340, > 342.5, 345, 347.5, 350, 352.5, 355, 357.5 ; > > latitude = 87.5, 85, 82.5, 80, 77.5, 75, 72.5, 70, 67.5, 65, 62.5, 60, 57.5, > 55, 52.5, 50, 47.5, 45, 42.5, 40, 37.5, 35, 32.5, 30, 27.5, 25, 22.5, 20, > 17.5, 15, 12.5, 10, 7.5, 5, 2.5, 0, -2.5, -5, -7.5, -10, -12.5, -15, > -17.5, -20, -22.5, -25, -27.5, -30, -32.5, -35, -37.5, -40, -42.5, -45, > -47.5, -50, -52.5, -55, -57.5, -60, -62.5, -65, -67.5, -70, -72.5, -75, > -77.5, -80, -82.5, -85, -87.5 ; > > reftime = 18748, 18748, 18748, 18748, 18748, 18748, 18748, 18932, 18932, > 18932, 18932, 18932, 18932, 18932, 18932, 18932, 18932, 18932, 18932, > 18932, 18932 ; > > leadtime = 15, 46, 76, 107, 138, 168, 199, 15, 45, 76, 107, 135, 166, 196, > 227, 257, 288, 319, 349, 380, 410 ; > > time_bnd = > 0, 31, > 31, 61, > 61, 92, > 92, 123, > 123, 153, > 153, 184, > 184, 214, > 0, 30, > 30, 61, > 61, 92, > 92, 120, > 120, 151, > 151, 181, > 181, 212, > 212, 242, > 242, 273, > 273, 304, > 304, 334, > 334, 365, > 365, 395, > 395, 426 ; > > realization = 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, > 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5, 6, 7, 8 ; > > experiment_id = > "1004", > 
"1004", > "1004", > "1004", > "1004", > "1004", > "1004", > "1004", > "1004", > "1004", > "1004", > "1004", > "1004", > "1004", > "1004", > "1004", > "1004", > "1004", > "1000", > "1000", > "1000", > "1000", > "1000", > "1000", > "1000", > "1000", > "1000", > "1001", > "1001", > "1001", > "1001", > "1001", > "1001", > "1001", > "1001", > "1001" ; > > source = > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 1 ", > "IFS, System 1, Method 10 ", > "IFS, System 1, Method 10 ", > "IFS, System 1, Method 10 ", > "IFS, System 1, Method 10 ", > "IFS, System 1, Method 10 ", > "IFS, System 1, Method 10 ", > "IFS, System 1, Method 10 ", > "IFS, System 1, Method 10 ", > "IFS, System 1, Method 10 ", > "IFS, System 0, Method 1 ", > "IFS, System 0, Method 1 ", > "IFS, System 0, Method 1 ", > "IFS, System 0, Method 1 ", > "IFS, System 0, Method 1 ", > "IFS, System 0, Method 1 ", > "IFS, System 0, Method 1 ", > "IFS, System 0, Method 1 ", > "IFS, System 0, Method 1 " ; > > institution = > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF ", > "ECMWF " ; > > level = 850, 500, 200, 100, 50 ; > > geopotential = From Francisco.Doblas-Reyes at ecmwf.int 
Mon Jan 8 06:20:20 2007 From: Francisco.Doblas-Reyes at ecmwf.int (Francisco Doblas-Reyes) Date: Mon Jan 8 06:20:24 2007 Subject: [CF-metadata] Getting back to ensembles In-Reply-To: <1168260028.10076.18.camel@eld408.desktop.frd.metoffice.com> References: <20061223210347.GA13158@met.reading.ac.uk> <459E22E3.2090405@ecmwf.int> <1168255023.2731.57.camel@eld408.desktop.frd.metoffice.com> <45A2357B.8030604@ecmwf.int> <1168260028.10076.18.camel@eld408.desktop.frd.metoffice.com> Message-ID: <45A24514.5000702@ecmwf.int> Hi Jamie, Thanks for the comments. I remember the discussion on the need to flag provisional files. However, we haven't advertised this server for official dissemination, so I didn't think the flagging was necessary at this stage. We use the OPeNDAP server only for testing (for instance, to check which software is able to access, read and display 5-dim fields from an aggregated server). That is also the reason why I decided to keep the standard_names of the variables not yet included in the list, so as to have examples with all the required elements. Paco Kettleborough, Jamie wrote: > Hello Paco, > > thanks for giving us sight of these - do you care that they are not, > strictly speaking, CF compliant as they use standard_names (like source, > etc) that are not yet in the standard_name list? Should the > standard_names just be dropped for now and rely on the long_names? > > Though I think this is all tied up with process (apologies for mixing > threads). If we are going to have 'test files' and test implementations > don't we need some way of flagging the bits in the files that are > provisional or being tested? > > Jamie > -- ________________________________________ Francisco J.
Doblas-Reyes European Centre for Medium-Range Weather Forecasting (ECMWF) Shinfield Park, RG2 9AX Reading, UK Tel: +44 (0)118 9499 655 Fax: +44 (0)118 9869 450 f.doblas-reyes@ecmwf.int _______________________________________ From graybeal at mbari.org Mon Jan 8 09:42:54 2007 From: graybeal at mbari.org (John Graybeal) Date: Mon Jan 8 09:42:57 2007 Subject: [CF-metadata] CF-metadata Digest, Vol 46, Issue 1 In-Reply-To: <2bb6ee950701080231r71b0d412s33f36cb750aba67b@mail.gmail.com> References: <1f31dac10701040832s713f430fg20b827ccd04c90f2@mail.gmail.com> <2bb6ee950701080231r71b0d412s33f36cb750aba67b@mail.gmail.com> Message-ID: I think this concept assumes the netCDF file will never be used in any other context (e.g., as a timestep in a shorter or longer forecast sequence), and the data set will not change (because if it does, every file must be reviewed for min/max values that need updating). Is that a correct understanding? I wonder if we want to promote characterizing whole datasets inside every constituent of the set. Does CF do this in other contexts? John At 10:31 AM +0000 1/8/07, Jon Blower wrote: >... >2) If the netCDF file is part of a larger dataset (e.g. it represents >one timestep in a forecast sequence) it would be useful to store a >"reasonable" (or actual, if possible) min and max for the *whole >dataset* (taking into account the measurement method used, >geographical region etc). 
-- ---------- John Graybeal -- 831-775-1956 Monterey Bay Aquarium Research Institute Marine Metadata Initiative: http://marinemetadata.org || Shore Side Data System: http://www.mbari.org/ssds From jdb at mail.nerc-essc.ac.uk Mon Jan 8 11:35:12 2007 From: jdb at mail.nerc-essc.ac.uk (Jon Blower) Date: Mon Jan 8 11:35:16 2007 Subject: [CF-metadata] CF-metadata Digest, Vol 46, Issue 1 In-Reply-To: References: <1f31dac10701040832s713f430fg20b827ccd04c90f2@mail.gmail.com> <2bb6ee950701080231r71b0d412s33f36cb750aba67b@mail.gmail.com> Message-ID: <2bb6ee950701081035h7292da6ah9d5362b8998a0aad@mail.gmail.com> Dear John, > I think this concept assumes the netCDF file will never be used in any other context (e.g., as a timestep in a shorter or longer forecast sequence), and the data set will not change (because if it does, every file must be reviewed for min/max values that need updating). Is that a correct understanding? I was thinking that the min/max for the whole dataset cannot be the actual min/max, unless this is somehow known in advance (e.g. for a static dataset). I was suggesting that each file in the dataset should contain a "reasonable" min/max, given the context of the dataset (e.g. a numerical model of the Arctic ocean temperature would have a different dataset_min and dataset_max from a set of observations in the tropics). Actually, come to think of it, if the actual min/max for a variable is stored in each file, it's a relatively cheap operation to calculate the overall min/max for a set of files (depending on the number of files in the dataset, obviously). So maybe the most important attribute pair is the actual min/max of each variable in a particular file (which I suggested could be called data_min and data_max, to distinguish from valid_min and valid_max). 
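To illustrate the point about combining per-file extremes, here is a minimal Python sketch. It is not part of any CF convention. It assumes that each file's variable carries the proposed data_min and data_max attributes (names suggested above, not yet agreed) and that these have already been read into (min, max) pairs; the function name dataset_range is hypothetical.

```python
from typing import Iterable, Tuple


def dataset_range(per_file_ranges: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine per-file (data_min, data_max) pairs into one overall range.

    Each pair is assumed to hold the actual extremes of one variable in
    one file, as in the proposed (hypothetical) data_min/data_max attributes.
    """
    ranges = list(per_file_ranges)
    if not ranges:
        raise ValueError("at least one (data_min, data_max) pair is required")
    # The overall minimum is the smallest per-file minimum, and the overall
    # maximum is the largest per-file maximum; no data values are read.
    overall_min = min(lo for lo, _ in ranges)
    overall_max = max(hi for _, hi in ranges)
    return overall_min, overall_max


# Example: three timestep files of one forecast sequence (made-up values).
print(dataset_range([(271.3, 290.1), (270.8, 289.5), (272.0, 291.2)]))
```

The cost grows with the number of files rather than with the size of the data, which is why the per-file pair may be the more useful one to store.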
Regards, Jon On 1/8/07, John Graybeal wrote: > I think this concept assumes the netCDF file will never be used in any other context (e.g., as a timestep in a shorter or longer forecast sequence), and the data set will not change (because if it does, every file must be reviewed for min/max values that need updating). Is that a correct understanding? > > I wonder if we want to promote characterizing whole datasets inside every constituent of the set. Does CF do this in other contexts? > > John > > At 10:31 AM +0000 1/8/07, Jon Blower wrote: > >... > >2) If the netCDF file is part of a larger dataset (e.g. it represents > >one timestep in a forecast sequence) it would be useful to store a > >"reasonable" (or actual, if possible) min and max for the *whole > >dataset* (taking into account the measurement method used, > >geographical region etc). > > -- > ---------- > John Graybeal -- 831-775-1956 > Monterey Bay Aquarium Research Institute > Marine Metadata Initiative: http://marinemetadata.org || Shore Side Data System: http://www.mbari.org/ssds > -- -------------------------------------------------------------- Dr Jon Blower Tel: +44 118 378 5213 (direct line) Technical Director Tel: +44 118 378 8741 (ESSC) Reading e-Science Centre Fax: +44 118 378 6413 ESSC Email: jdb@mail.nerc-essc.ac.uk University of Reading 3 Earley Gate Reading RG6 6AL, UK -------------------------------------------------------------- From earmstro at mail.jpl.nasa.gov Mon Jan 8 16:48:06 2007 From: earmstro at mail.jpl.nasa.gov (Ed Armstrong) Date: Mon Jan 8 16:48:21 2007 Subject: [CF-metadata] COARDS name for a time offset In-Reply-To: <20061223154723.GA12073@met.reading.ac.uk> References: <20061219182458.GA453@met.reading.ac.uk> <458C6244.9030600@noaa.gov> <20061223154723.GA12073@met.reading.ac.uk> Message-ID: Hi Jonathan, Thank you for the reply. Sorry for the Christmas hiatus, but I am getting my brain in gear. 
I like your idea of a new standard name 'time_offset', that is, the relative difference from a predefined time. This is what I had in mind from the beginning. What is the process for proposing and registering a new CF standard name? At 3:47 PM +0000 2006/12/23, Jonathan Gregory wrote: >Dear Ed > >Thanks for the further information and example. > >> I would like to specify >> a time offset such that for a certain wind pixel: >geographical indices j,i >> time[0] + sst_dtime[0,j,i] + this_offset = time of observation of >> the wind pixel. > >I presume that this_offset is dimensioned [nj,ni] and that 0 is an >example of a >time-index. From the example I understand that the point of this is to specify >2D time arrays (nj,ni) for SST, wind etc. rather than 3D (nt,nj,ni). This can >be done because the time-offset from time(t) of each variable at each location >is the same for all values of time index t. > >I agree with you that there is no existing CF convention which would be able >to indicate this procedure. However, most CF features are optional anyway so >CF compliance is a fairly minimal requirement. I suspect that yours is quite a >specialised requirement, so maybe we can approach it with a combination of CF >conventions, and your own GHRSST conventions. > >For instance, we could introduce a new standard name of time_offset, which has >time units but is for a time-difference rather than an absolute time. We have >a standard name of forecast_period with the same characteristics, but that >would not be appropriate here. You could give the sst_dtime variable this >standard name, and indicate it as an auxiliary coordinate variable of the SST >variable, by listing it in the coordinates attribute. You then >require a GHRSST >convention that if a data variable has a 1D coordinate variable of >time, and an >auxiliary coordinate variable with the time dimension and a standard_name of >time_offset, the absolute time of datum(t,j,i) is >time(t)+t_dependent_time_offset(t,j,i). 
> >Then for each non-SST variable such as wind speed, you need another 2D >variable e.g. wind_dtime, again with standard_name of time_offset. You can >indicate both the sst_dtime and the wind_dtime as auxiliary coordinate vars >of the wind_speed variable, and extend the GHRSST convention to say that if >there is also an auxiliary coordinate variable with standard_name of >time_offset but which does not have the time dimension, that should >be added as >well, so that the absolute time of datum(t,j,i) is >time(t)+t_dependent_time_offset(t,j,i)+t_independent_time_offset(j,i) > >I wonder why you store time(t) at all? t could just be an index dimension, >and the sst_dtime could be an absolute time(t,j,i) rather than an offset. That >would simplify this a bit. Some comments here: If we were to reengineer the file format, we might make this change. But as it stands we have to stick with sst_dtime being a relative (offset) time. > >Best wishes > >Jonathan -- ~ed Edward M. Armstrong Physical Oceanography DAAC Tel: 818 393-6710 Jet Propulsion Laboratory Fax: 818 393-2710 edward.armstrong@jpl.nasa.gov From earmstro at mail.jpl.nasa.gov Mon Jan 8 17:23:24 2007 From: earmstro at mail.jpl.nasa.gov (Ed Armstrong) Date: Mon Jan 8 17:23:35 2007 Subject: [CF-metadata] COARDS name for a time offset In-Reply-To: <4592EE80.8060008@unidata.ucar.edu> References: <20061223192319.GB12784@met.reading.ac.uk> <4592EE80.8060008@unidata.ucar.edu> Message-ID: Hi John, Thanks for the reply. Some comments below. At 2:06 PM -0800 2006/12/27, John Caron wrote: >Sorry my previous post was incomplete (trying to finish it as I >walked out the door). 
>Try it again: > >heres a Grid with 1D coordinates (list coordinates explicitly for clarity): > >1) short sea_surface_temperature(time, lat, lon) ; > sea_surface_temperature:coordinates = "lon lat time" ; > float lat(lat) ; > float lon(lon) ; > int time(time) ; > >heres a Grid with 2D lat/lon coordinates: > >2) short sea_surface_temperature( time, nj, ni) ; > sea_surface_temperature:coordinates = "lon lat time" ; > float lat(nj, ni) ; > float lon(nj, ni) ; > int time(time) ; > >we were thinking that this is a Swath: > >3) short sea_surface_temperature( nj, ni) ; > sea_surface_temperature:coordinates = "lon lat time" ; > float lat(nj, ni) ; > float lon(nj, ni) ; > int time(nj, ni) ; > >the difference being that Swaths have a 2D time coordinate. > >Now consider > > short sea_surface_temperature( timeRef, nj, ni) ; > sea_surface_temperature:coordinates = "lon lat" ; > > int timeRef(timeRef) ; > timeRef:long_name = "reference time of sst file" ; > > short sst_dtime(time, nj, ni) ; > sst_dtime:long_name = "time difference from reference time" ; > > float lat(nj, ni) ; > float lon(nj, ni) ; > >As it is written, the coordinate system for sea_surface_temperature >is (lat, lon, timeRef). This looks like a Grid with 2D lat/lon >coordinates (case 2 above). There is no reference to sst_dtime > >But I suppose its more accurate to see it as a swath: > > short sea_surface_temperature( nj, ni) ; > sea_surface_temperature:coordinates = "lon lat sst_dtime" ; > > short sst_dtime(time, nj, ni) ; > sst_dtime:long_name = "time difference from reference time" > sst_dtime:formula = "timeRef + sst_dtime" ; > >where I have imagined that we have come up with some convention for >calculating the time coordinate values. > >Im guessing it is useful to see it as a Grid (ignore 2D time) or a >Swath (ignore 1D time). Because the lat/lon coordinates are the same >for each timeRef coordinate, it seems to me to really be a hybrid of >the 2. 
> > short sea_surface_temperature( timeRef, nj, ni) ; > sea_surface_temperature:coordinates = "lon lat timeRef >sst_dtime" ; > > int timeRef(timeRef) ; > timeRef:long_name = "reference time of sst file" ; > > short sst_dtime(time, nj, ni) ; > sst_dtime:long_name = "time difference from reference time" ; > sst_dtime:formula = "timeRef + sst_dtime" ; > >It would seem that there are 2 time coordinates, an approximate time >for showing the data as a Grid, and an actual "pixel-by-pixel" time. > >Ed, can you tell us more about this dataset? Its derived from Swath >data, and resampled? to a fixed lat/lon array? What causes the >constant offsets of different variables? Does each variable have a >different offset? The GHRSST L2P dataset is swath data, not resampled. There is a unique lat/lon for each pixel, thus a 2D array for both lat and lon. The offsets in time are caused by non-satellite ancillary fields (wind, aerosols, solar insolation, sea ice) in the file that have different observation times (or prediction times if model forecast fields) from the SST satellite data (which are referenced to the sst_dtime array). Each of the ancillary variables could have a different time offset. I like the solution proposed by Jonathan with a 'time_offset' standard name that can then be used with the relative time from array sst_dtime to calculate a new relative time from the absolute time specified in array 'time'. Not quite an elegant solution, but within the framework of the current file format it will work. -- ~ed Edward M. 
Armstrong Physical Oceanography DAAC Tel: 818 393-6710 Jet Propulsion Laboratory Fax: 818 393-2710 edward.armstrong@jpl.nasa.gov From r.gorman at niwa.co.nz Mon Jan 8 18:27:14 2007 From: r.gorman at niwa.co.nz (Richard Gorman) Date: Mon Jan 8 18:27:24 2007 Subject: [CF-metadata] Standard names added In-Reply-To: References: Message-ID: <45A2EF72.50200@niwa.co.nz> Hello Alison, In September you informed us of some additions to the standard names table. 
Checking the table, I see that these included > sea_surface_wave_directional_variance_spectral_density with units m2 s-1 rad-1 and > sea_surface_wave_variance_spectral_density with units m2 s-1 I believe that these should be m2 s rad-1 and m2 s, respectively. Integrating the variance spectral density over frequency (units s-1) should give a sea level variance (units m2). Apologies for not noticing sooner. Regards, Richard -- ========================================================= Richard Gorman National Institute of Water and Atmospheric Research PO Box 11-115, Hamilton, 3251, New Zealand Tel: +64 7 856 1736 Mob: 021 074 7490 Fax: +64 7 856 0151 Email: r.gorman@niwa.co.nz Web: http://www.niwa.co.nz ========================================================= From caron at unidata.ucar.edu Mon Jan 8 18:33:11 2007 From: caron at unidata.ucar.edu (John Caron) Date: Mon Jan 8 18:33:19 2007 Subject: [CF-metadata] COARDS name for a time offset In-Reply-To: References: <20061223192319.GB12784@met.reading.ac.uk> <4592EE80.8060008@unidata.ucar.edu> Message-ID: <45A2F0D7.1010205@unidata.ucar.edu> Ed Armstrong wrote: > > Hi John, > > Thanks for the reply. Some comments below. > > At 2:06 PM -0800 2006/12/27, John Caron wrote: > >> Sorry my previous post was incomplete (trying to finish it as I walked >> out the door). 
>> Try it again: >> >> heres a Grid with 1D coordinates (list coordinates explicitly for >> clarity): >> >> 1) short sea_surface_temperature(time, lat, lon) ; >> sea_surface_temperature:coordinates = "lon lat time" ; >> float lat(lat) ; >> float lon(lon) ; >> int time(time) ; >> >> heres a Grid with 2D lat/lon coordinates: >> >> 2) short sea_surface_temperature( time, nj, ni) ; >> sea_surface_temperature:coordinates = "lon lat time" ; >> float lat(nj, ni) ; >> float lon(nj, ni) ; >> int time(time) ; >> >> we were thinking that this is a Swath: >> >> 3) short sea_surface_temperature( nj, ni) ; >> sea_surface_temperature:coordinates = "lon lat time" ; >> float lat(nj, ni) ; >> float lon(nj, ni) ; >> int time(nj, ni) ; >> >> the difference being that Swaths have a 2D time coordinate. >> >> Now consider >> >> short sea_surface_temperature( timeRef, nj, ni) ; >> sea_surface_temperature:coordinates = "lon lat" ; >> >> int timeRef(timeRef) ; >> timeRef:long_name = "reference time of sst file" ; >> >> short sst_dtime(time, nj, ni) ; >> sst_dtime:long_name = "time difference from reference >> time" ; >> >> float lat(nj, ni) ; >> float lon(nj, ni) ; >> >> As it is written, the coordinate system for sea_surface_temperature is >> (lat, lon, timeRef). This looks like a Grid with 2D lat/lon >> coordinates (case 2 above). There is no reference to sst_dtime >> >> But I suppose its more accurate to see it as a swath: >> >> short sea_surface_temperature( nj, ni) ; >> sea_surface_temperature:coordinates = "lon lat sst_dtime" ; >> >> short sst_dtime(time, nj, ni) ; >> sst_dtime:long_name = "time difference from reference time" >> sst_dtime:formula = "timeRef + sst_dtime" ; >> >> where I have imagined that we have come up with some convention for >> calculating the time coordinate values. >> >> Im guessing it is useful to see it as a Grid (ignore 2D time) or a >> Swath (ignore 1D time). 
Because the lat/lon coordinates are the same >> for each timeRef coordinate, it seems to me to really be a hybrid of >> the 2. >> >> short sea_surface_temperature( timeRef, nj, ni) ; >> sea_surface_temperature:coordinates = "lon lat timeRef >> sst_dtime" ; >> >> int timeRef(timeRef) ; >> timeRef:long_name = "reference time of sst file" ; >> >> short sst_dtime(time, nj, ni) ; >> sst_dtime:long_name = "time difference from reference >> time" ; >> sst_dtime:formula = "timeRef + sst_dtime" ; >> >> It would seem that there are 2 time coordinates, an approximate time >> for showing the data as a Grid, and an actual "pixel-by-pixel" time. >> >> Ed, can you tell us more about this dataset? Its derived from Swath >> data, and resampled? to a fixed lat/lon array? What causes the >> constant offsets of different variables? Does each variable have a >> different offset? > > > > > The GHRSST L2P dataset is swath data, not resampled. There is a unique > lat/lon for each pixel, thus a 2D array for both lat and lon. The > offsets in time are caused by non-satellite ancillary fields (wind, > aerosols, solar insolation, sea ice) in the file that have different > observation times (or prediction times if model forecast fields) from > the SST satellite data (which are referenced to the sst_dtime > array). Each of the ancillary variables could have a different time offset. > > I like the solution proposed by Jonathan with a 'time_offset' standard > name that can then be used with the relative time from array sst_dtime > to calculate a new relative time from the absolute time specified in > array 'time'. Not quite an elegant solution, but within the framework > of the current file format it will work. thanks for the additional info. my main point is that there are 2 time variables here. this brings up a number of issues, for example, to which should the time_offset be added? 
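For reference, the calculation Jonathan proposed earlier in this thread (absolute time of datum(t,j,i) = time(t) + t_dependent_time_offset(t,j,i) + t_independent_time_offset(j,i)) can be sketched in a few lines of Python. This is only an illustration of that proposed GHRSST-style rule, not an agreed CF convention. The function and argument names are hypothetical, all three inputs are assumed to share the same time units and epoch, and plain nested lists stand in for the netCDF arrays.

```python
from typing import List


def pixel_times(
    ref_time: List[float],            # time(t): 1D reference time per swath
    dtime: List[List[List[float]]],   # e.g. sst_dtime(t, j, i): offset with a time dimension
    offset: List[List[float]],        # e.g. a wind offset(j, i): no time dimension
) -> List[List[List[float]]]:
    """Return the per-pixel observation time [t][j][i] for an ancillary variable.

    Implements time(t) + t_dependent_offset(t, j, i) + t_independent_offset(j, i),
    the rule proposed in the thread. For the SST variable itself, pass an
    all-zero `offset`.
    """
    return [
        [
            [ref_time[t] + dtime[t][j][i] + offset[j][i] for i in range(len(offset[j]))]
            for j in range(len(offset))
        ]
        for t in range(len(ref_time))
    ]


# Made-up values: two reference times on a 2 x 2 pixel grid.
ref = [1000, 2000]
sst_dt = [[[0, 1], [2, 3]], [[10, 11], [12, 13]]]
wind_off = [[100, 100], [200, 200]]
print(pixel_times(ref, sst_dt, wind_off)[1][1][1])
```

Under this reading the time-independent offset is added to the per-pixel SST time, not to the 1D reference time alone; whether that is the intended interpretation is exactly the question raised above.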
From caron at unidata.ucar.edu Tue Jan 9 08:57:18 2007 From: caron at unidata.ucar.edu (John Caron) Date: Tue Jan 9 08:59:21 2007 Subject: [CF-metadata] COARDS name for a time offset In-Reply-To: References: <20061223192319.GB12784@met.reading.ac.uk> <4592EE80.8060008@unidata.ucar.edu> Message-ID: <45A3BB5E.2000507@unidata.ucar.edu> ok, let me try this again: abstractly, we have swath data with 2D time coordinates. since the swath data occurs at the same lat/lon pixels, it is useful to make these into 3D variables: variable[t, y, x] which have coordinates: lat[y,x] lon[y,x] time_for_variable[t,y,x] the times for different variables may be different. to save space, we want to calculate the time variables, instead of storing them, as: time_for_variable[t, y, x] = reference_time[t] + sst_dtime[t,y,x] + time_offset_for_variable or is it time_for_variable[t, y, x] = reference_time[t] + sst_dtime[t,y,x] + time_offset_for_variable[y,x] as jonathan suggests?? does this capture the issue correctly? if so, and we want to proceed, perhaps a "formula" attribute (a la grid_mapping) would be a good generalized way to do these calculations? From christiane.textor at aero.jussieu.fr Tue Jan 9 09:58:28 2007 From: christiane.textor at aero.jussieu.fr (Christiane Textor) Date: Tue Jan 9 10:01:55 2007 Subject: [CF-metadata] aerosol and chemistry names - continuation In-Reply-To: <20070106173246.GA27027@met.reading.ac.uk> References: <20070106173246.GA27027@met.reading.ac.uk> Message-ID: <45A3C9B4.4060304@aero.jussieu.fr> Dear Jonathan and others, I have updated the tables, please have a look at http://wiki.esipfed.org/index.php/CF_Standard_Names_-_Accepted_names_for_TF_HTAP and http://wiki.esipfed.org/index.php/CF_Standard_Names_-_Proposed_names_for_TF_HTAP The remaining issues concern: ============================ 1) emission fluxes >>atmosphere_emission means a source within the atmosphere, e.g. from the >>surface or from an air plane. > > Ah, OK. Why is the surface included? 
You have separate surface fluxes. > The updated names are atmosphere_emission_mass_flux_of_X, there is no surface. I have added a comment: Integrate 3D emission field vertically to 2d field. > >>atmosphere_production means production within the atmosphere, this >>includes direct sources and chemical production from precursors. > > So production = emission (from sources) + chemical net production > It seems to me potentially confusing to have production in these two different > senses on the left and right of the equation. What about saying "addition" on > the left e.g. NOx is added to the atmosphere by emission from aircraft and by > chemical production. The opposite of "addition" might be "removal", which > comes about by deposition and chemical destruction. Do we also need to > distinguish gross and net addition? > I agree that production is not the best term here. Could we use atmosphere_source and atmosphere_removal? > I see. Is reemission included in emission? This might be a source of confusion. > re-emission is not included in emission, and as it occurs only for a few species, I do not think that it is necessary to add. > ============================ 2) optical depth or thickness Ok, convinced: I changed it to thickness. ============================ 3) X_optical_thickness or optical_thickness_due_to_X (depth changed to thickness) > Perhaps, as with named surfaces, it might be acceptable (if not too > complicated) to say X_optical_depth if X is one word (since that is convenient > and what people usually say e.g. for cloud and aerosol), and > optical_depth_due_to_X if X is several words (to make it easier to understand). > (Please help me: Which named surfaces? is this linked to surface in emission variables?) So we come back to atmosphere_optical_thickness_due_to_X. I agree that this is easier to understand. I would like to add 'atmosphere' because it could also be e.g. in the ocean. I have changed the table accordingly. 
See http://wiki.esipfed.org/index.php/CF_Standard_Names_-_Proposed_names_for_TF_HTAP X) something additional: particulate_organic_matter_dry_aerosol Expressed as mass of particulate organic matter. I have added a comment asking for the scale factor to obtain carbon mass (particulate organic matter = Factor * particulate organic carbon) "If possible, indicate scale factor to obtain carbon mass." Different factors are used by the modelers, and it is of advantage to know what is assumed for scaling OC and OM. OK? Best wishes Christiane -- Christiane Textor Service d'Aéronomie INSU CNRS, Tour 46, RDC # 2 Université Pierre et Marie Curie, Boite 102 4 place Jussieu 75252 Paris Cédex 05 France Tel: ++33 1.44.27.21.82 Fax: ++33 1.44.27.21.81 Email: christiane.textor@aero.jussieu.fr From j.m.gregory at reading.ac.uk Wed Jan 10 04:02:50 2007 From: j.m.gregory at reading.ac.uk (Jonathan Gregory) Date: Wed Jan 10 04:02:53 2007 Subject: [CF-metadata] Re: attributes for min/max data values for visualization Message-ID: <20070110110250.GN4006@met.reading.ac.uk> Dear John > I wonder if we want to promote characterizing whole datasets inside every constituent of the set. Does CF do this in other contexts? No, because CF does not have a concept of "dataset" yet i.e. there are no specific conventions for linking files. Cheers Jonathan From Steven.C.Hankin at noaa.gov Wed Jan 10 10:15:34 2007 From: Steven.C.Hankin at noaa.gov (Steve Hankin) Date: Wed Jan 10 10:14:46 2007 Subject: [CF-metadata] CF "datasets" In-Reply-To: <20070110110250.GN4006@met.reading.ac.uk> References: <20070110110250.GN4006@met.reading.ac.uk> Message-ID: <45A51F36.3070505@noaa.gov> Hi Jonathan et al., As a general matter, I think we ought to get to the heart of this "dataset" concept. CF "datasets", that are not simple files, have been in routine use for a long time. It is routine that the data object we think of as a "CF thingy" is actually an aggregation of many files. 
From the data user's point of view this is wholly transparent. He/she does not (and should not) need to know the file management details of the CF dataset. From the data creator's point of view this has also been a non-issue, because time aggregations have the simplifying property that each individual file is a CF dataset in its own right. The min/max concept under discussion illustrates one of the special cases where one has to be careful extending files into datasets. The "ensembles" discussions that have occupied us recently illustrate some much more challenging cases, for example, requiring that global attributes from individual files get promoted into arrays of strings in the "dataset" along the ensemble axis. - Steve ===================================== Jonathan Gregory wrote: > Dear John > > >> I wonder if we want to promote c