Report to Climate Committee on SOLA Meeting

By Doug Goodin.

Date: Wed, 24 Sep 1997 10:32:35 -0500 (CDT)

To All Climstan meeting participants,

Greetings. As part of my premeeting tasks, I was asked by David to circulate a copy of a report I made last year following my participation in the SOLA workshop, and also a list of climate-related terms. Attached to this first message is an updated version of my report. It's brief, but summarizes many of my thoughts on the data standards issues. The glossary of terms will arrive attached to a second message.

I am looking forward to the meeting as a chance to finally meet and interact with members of the committee. If anyone feels that either of these documents is incomplete or in need of revision, please let me know.

 

Cheers,

Doug

==============================================================
Douglas G. Goodin, Ph.D. | Tel: (913) 532-6727
Department of Geography | Fax: (913) 532-7310
Kansas State University | E-mail: dgoodin@ksu.edu
Manhattan, KS 66506-0801 | http://www.ksu.edu/rssg/doug.htm
==============================================================

This short report summarizes some ideas for the revised climate standards. Most of these ideas came as a result of my attending the Science On-Line Antarctica (SOLA) meeting in Lake Tahoe last September. I circulated my original report to members of the climate committee, and David Greenland asked me to recirculate it prior to the upcoming October meeting. Unfortunately, the original was lost following a hard drive failure on my PC (and no backup, shame on me!). I am therefore reconstructing this mostly from memory, with a few new ideas tossed in.

Attending the SOLA workshop was very useful in organizing my thinking about data formatting problems. Although the data concerns of the SOLA group are somewhat different from those of the climate committee, there are many similarities. Both involve complex, multi-institute, multi-investigator problems, and both aim to satisfy a diverse "client group." Before I consider the "nuts and bolts" of data formatting, it might be useful to relay some of the overriding concerns of the scientists present at the SOLA meetings. First, there was emphatic agreement about the importance of accurate and available metadata. The desire for high-quality metadata was no surprise to me, but the persistent discussion of it led me to conclude that some of the SOLA investigators were uncertain about the lineage of the data they used. I am sure that LTER investigators (who in fact overlap SOLA to a large extent) would have the same concerns. In looking at data archives from various sites, I'm impressed with the availability of metadata, but some attention to its format may also be wise.

A second interesting development of the SOLA workshop was the desire on the part of investigators not just for data, but for analysis tools as well. This included more than just the canned and commonly available statistics or mathematics packages. Investigators wanted access to process models. One example was the solar radiation models developed by Ralph Dubayah of the University of Maryland (a workshop attendee). While the consensus seemed to be that availability of such models would be a good thing, there was spirited discussion of how best to accomplish this. Intellectual property issues were raised. Availability of supporting information, such as sensitivity analyses for the models (i.e., metadata for the models), was also a concern. It was agreed that some sort of peer review would be required before an analysis tool could be made available to a user community. Professional recognition and rewards were also discussed. Clearly, an investigator would need to make her/his model available voluntarily, essentially placing it in the public domain. Users would need to be sensitive to this and recognize and acknowledge the contributor.

I mention these two major points because I believe both have relevance to the climate standards problem. Data format standards were also discussed at SOLA; the material below represents some ideas circulated at the workshop with some of my own thoughts thrown in. In talking with data users and managers here at KNZ and at other sites, I have been struck by the fact that (very) few users of climate data are climatologists or have more than a basic knowledge of climate. In devising the guidelines below, I placed myself in an analogous situation by considering my own use of ecological data in the context of my rudimentary knowledge of biology/ecology. My conclusions, which should surprise no one, are that I would want the data to be in a simple and easily understandable format, and that I would want to be able to use them with confidence even though I know little about their origins. I also recognize the need to create a system that will not unduly complicate the lives of data managers. After the SOLA meeting, I devised several scenarios for standardization. Each scenario includes two components:

(1) how the data are stored (i.e., in what format), and (2) how the data are accessed. Following the example of remote sensing satellite data, I have designated storage options by level. Level 0 data are at the finest temporal resolution; level 1 data are level 0 data processed to a coarser time scale. For example, level 0 temperature data might consist of hourly observations, while level 1 temperature data might be daily, weekly, or monthly averages. Level 2 data would be derived measures, for example potential ET derived from temperature and humidity data.

Distribution options were divided into on-line and off-line storage. Off-line storage would consist of static ASCII files similar to those now in place. On-line availability in this case refers to the presence of some dedicated search engine capable of being interactively queried by the user and returning data of some requested type and time increment. This approach would be paired with level 0 data. For example, a user might query the system to retrieve only weekly temperature data from one site. The data management system would then seek out the appropriate hourly data, perform the averaging on-line, and create the user's file. Of course, combinations of these options are possible, so that level 1 data might also be accessible through a data management system. At additional levels of complication, level 2 data might also be created on the fly by such a system.

Of these options, I favor those that offer the best combination of simplicity and ease of implementation. I therefore favor a format which includes level 0 data, but no on-line management system. I circulated my original report to data managers, and those who replied agreed. Lloyd Swift of the Coweeta LTER noted that most researchers are already highly software literate and knowledgeable in handling data, and thus not in need of a "fancy front end" for the database. Individual investigators have generally already worked out a combination of software and procedures that works well for them. Forcing a certain package on them would probably create problems and ultimately defeat our purpose by making the data less useful. Based on the arguments above, I favor the implementation of standardized level 0 data.
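To make the level 0 / level 1 distinction concrete, here is a minimal sketch of how a user might average hourly observations into daily means with their own software. It is only an illustration under stated assumptions: the function name daily_means, the in-memory list of (timestamp, value) pairs, and the sample values are all hypothetical and do not represent any site's actual file format.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(hourly):
    """Average (timestamp, value) pairs into one mean per calendar day.

    Hypothetical helper: assumes level 0 data arrive as hourly
    observations, with None marking a missing value.
    """
    by_day = defaultdict(list)
    for stamp, value in hourly:
        if value is None:  # skip missing observations
            continue
        by_day[stamp.date()].append(value)
    # Sort by date so the level 1 output is in chronological order.
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Made-up level 0 record: hourly air temperature in degrees C.
level0 = [
    (datetime(1997, 9, 1, 0), 10.0),
    (datetime(1997, 9, 1, 1), 12.0),
    (datetime(1997, 9, 1, 2), None),
    (datetime(1997, 9, 2, 0), 20.0),
    (datetime(1997, 9, 2, 1), 22.0),
]

level1 = daily_means(level0)
for day, value in level1.items():
    print(day.isoformat(), round(value, 2))
```

The same grouping step could key on week or month instead of calendar day, which is one reason a standardized level 0 file leaves the choice of time increment to the user.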

The actual details of constructing the data files still need to be resolved; however, I think a group discussion with managers and users would be the best approach to this part of the problem. Several possible formats have already been suggested and circulated by David Greenland, Caroline Bledsoe, and others. These will be good starting points for discussions at the meeting.

An additional point of discussion should be our approach toward level 2 data. Should it be the responsibility of data managers to implement procedures for calculating these sorts of variables? Which variables should be calculated? Which algorithms should be used? Who should develop them? These questions echo the concerns of the SOLA group. Here again, I think that, at least for the present, we should keep things as simple as possible. We should concentrate on a standardized format for basic, raw data, and let individual users take responsibility for converting data into desired information.
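As an illustration of what a user-side level 2 calculation might look like, the sketch below derives vapor pressure deficit from temperature and relative humidity. Note the assumptions: vapor pressure deficit is used here as a simpler stand-in for the potential ET example, the Tetens approximation for saturation vapor pressure is one common choice among several, and the function names are hypothetical. It is not a recommendation of a particular algorithm.

```python
import math


def saturation_vapor_pressure_kpa(temp_c):
    """Saturation vapor pressure over water in kPa (Tetens approximation).

    Assumption: air temperature is given in degrees C.
    """
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapor_pressure_deficit_kpa(temp_c, rh_percent):
    """Vapor pressure deficit in kPa from temperature and relative humidity.

    Hypothetical level 2 helper: rh_percent is expected in the range 0-100.
    """
    es = saturation_vapor_pressure_kpa(temp_c)
    return es * (1.0 - rh_percent / 100.0)


# Made-up observation: 20 degrees C at 50 percent relative humidity.
print(round(vapor_pressure_deficit_kpa(20.0, 50.0), 3))
```

Even this small example involves a choice of formula and units, which is the kind of decision the questions above would require the committee to settle before level 2 data could be standardized.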

The comments above were the central points of my report from the SOLA workshop. I would like to emphasize that my opinions are included as starting points for discussion. I'm sure that input from the group will result in even stronger ideas. Throughout my involvement in this project, I have consistently noted a strong desire for standardized data on the part of investigators involved in multi-site research. I'm confident we can make progress in supplying this.