Archive for September, 2010


Why bother with words when abbreviations are so much more cryptic? As pd points out, there is a new version of the Global Historical Climate Network dataset: version 3. There isn’t an official announcement yet, but others have noticed.

GHCN-M is the monthly dataset. Version 3 is still in beta, so we’re all still learning.

The file format is different: it is more like USHCNv2. And, as in USHCN, each datum has a set of flags that indicate quality checks (isolated value, inconsistent with climatology, month has missing days, and so on). One of the flags is a source flag: each monthly datum is tagged with its source (UK Met Office, CLIMAT report, MCDW, and so on).
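To make the per-datum flags concrete, here is a minimal sketch of reading one line of the monthly data file. It is not the ccc-gistemp code. The column layout (an 11-character station identifier, a 4-digit year, a 4-character element name, then twelve groups of a 5-character value followed by three 1-character flags), the -9999 missing-value marker, and the hundredths-of-a-degree scaling are all assumptions about the v3 format; the names `parse_v3_line` and `MonthlyDatum` are made up for illustration. Check everything against the README shipped with the data, especially while v3 is in beta.

```python
from typing import List, NamedTuple, Optional, Tuple

MISSING = -9999  # assumed missing-value marker


class MonthlyDatum(NamedTuple):
    value: Optional[float]  # degrees C, or None when missing
    dm_flag: str            # data-measurement flag (assumed position)
    qc_flag: str            # quality-check flag (assumed position)
    source_flag: str        # where the datum came from (assumed position)


def parse_v3_line(line: str) -> Tuple[str, int, str, List[MonthlyDatum]]:
    """Split one fixed-width record into id, year, element and 12 data."""
    station_id = line[0:11]
    year = int(line[11:15])
    element = line[15:19]
    months = []
    for m in range(12):
        start = 19 + 8 * m  # each month occupies 8 characters (assumed)
        raw = int(line[start:start + 5])
        dm, qc, src = line[start + 5], line[start + 6], line[start + 7]
        # Assumed scaling: values stored as hundredths of a degree C.
        value = None if raw == MISSING else raw / 100.0
        months.append(MonthlyDatum(value, dm, qc, src))
    return station_id, year, element, months
```

Keeping the three flags next to each value means a later step can filter on the quality flag, or count data by source, without re-reading the file.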

Unlike GHCN v2, there is only one record for each station in GHCN v3. There are no “duplicates”. This makes one job (the job of Step 0) easier. We don’t have to decide how to select or combine multiple records for the same station; that’s been done for us. On the other hand, we may have wanted to combine records in a different way.

I’ve been modifying ccc-gistemp to experiment with GHCN v3. At first I thought I could use the v2.inv file supplied by GISTEMP, but the GHCN station identifiers for the contiguous US have changed (so that they’re based on their USHCN station identifiers—probably a good thing). Writing code to parse the new v3 .inv file is straightforward enough.
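For a flavour of what parsing the .inv file involves, here is a minimal sketch. It is not the code I committed. The column positions in `INV_FIELDS` are assumptions about the v3 inventory layout (check them against the README that comes with the data), and `Station` and `parse_inv_line` are names invented for this example. Only a handful of fields are read.

```python
from typing import NamedTuple

# Assumed 0-based column slices for the v3 .inv file; verify before use.
INV_FIELDS = {
    "id": slice(0, 11),
    "latitude": slice(12, 20),
    "longitude": slice(21, 30),
    "name": slice(38, 68),
    "popcls": slice(73, 74),
}


class Station(NamedTuple):
    id: str
    latitude: float
    longitude: float
    name: str
    popcls: str  # population class: R, S or U


def parse_inv_line(line: str, fields=INV_FIELDS) -> Station:
    """Pick the named fixed-width fields out of one inventory line."""
    return Station(
        id=line[fields["id"]],
        latitude=float(line[fields["latitude"]]),
        longitude=float(line[fields["longitude"]]),
        name=line[fields["name"]].strip(),
        popcls=line[fields["popcls"]],
    )
```

Passing the layout in as a parameter keeps the guesswork in one place: if the beta format shifts a column, only the table changes.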

Of course the v3 .inv file doesn’t have the night-time satellite brightness that GISTEMP uses in its analysis (globally, since 2010-01-16). So I also added a parameter to use the GHCN population index (POPCLS in the documentation) globally.

This result should be considered preliminary.

When making comparisons with official GISTEMP there are several caveats:

  • Only GHCN v3 data is used. No SCAR READER (and no Hohenpeissenberg correction).
  • In Step 2, urban adjustment, the GHCN v3 analysis uses the POPCLS field for the rural/urban designation. The field has three values, R/S/U, for Rural/Semi-Rural/Urban. R maps to rural in the analysis, the others count as urban. The current GISTEMP analysis uses night-time satellite brightnesses.
  • Each GHCN v3 station is treated as a single record. GISTEMP, using v2 data, combines duplicate records for the same station into one record (sometimes more than one); this record may not be the same as the GHCN v3 record. And in particular…
  • (because I appended a ‘G’ to all the 11-digit v3 station identifiers) the “hand picked” list of deletions and adjustments is not used. The most obvious example of where this matters is St Helena, 14761901000.
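The second and fourth caveats can be sketched in a few lines. This is an illustration, not the ccc-gistemp implementation: `is_rural`, `tag_v3_id` and the `hand_picked` set are hypothetical names, and the set holds only the St Helena identifier mentioned above as a stand-in for the real list of deletions and adjustments.

```python
def is_rural(popcls: str) -> bool:
    """POPCLS rule from the caveats: R is rural; S and U count as urban."""
    code = popcls.strip().upper()
    if code not in ("R", "S", "U"):
        raise ValueError("unexpected POPCLS value: %r" % popcls)
    return code == "R"


def tag_v3_id(station_id: str) -> str:
    """Append 'G' to an 11-digit v3 station identifier."""
    if len(station_id) != 11:
        raise ValueError("expected an 11-digit identifier: %r" % station_id)
    return station_id + "G"


# Hypothetical stand-in for the hand-picked list, keyed by the
# untagged identifier. Tagged identifiers never match it, which is
# why those deletions and adjustments are not applied to v3 data.
hand_picked = {"14761901000"}
st_helena = tag_v3_id("14761901000")
applies = st_helena in hand_picked  # False: the 'G' suffix prevents a match
```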

I changed ccc-gistemp to use GHCN v3 and wrote this post ages ago, but when I met Jay Lawrimore at the Exeter workshop, he said I should probably hold off posting. Here’s the record of my GHCN v3 changes in googlecode (made on 2010-09-04).

Surface Temperatures Workshop

David Jones and I attended the Surface Temperatures workshop at the Met Office in Exeter this week. This is the kickoff meeting for an ambitious new project to produce a far more comprehensive databank of surface temperature records than currently exists, especially at finer time resolution (daily and sub-daily) and incorporating many station records which are not currently available.

There were around 80 attendees from around the world, including climate scientists, meteorologists, computer people, statisticians, metrologists, and ourselves. This was the first outing for our new Climate Code Foundation, although many people there were aware of Clear Climate Code. This was the first time either of us had attended a climate science meeting. We were made welcome, our motivation and focus were respected, and our voices were heard. The project principles established at the meeting include a strong commitment to openness and transparency, and although some scientists don’t share our conviction of the importance of code publication, the project is committed to publishing all its code.

We were not paid for our participation or for our expenses. In the final meeting we were asked to contribute to software aspects of the project, and said that this may be possible depending on resources.

A mind-boggling side-light: estimates of the volume of non-digitized or hard-copy data range into the hundreds of millions of pages; NCDC alone has a digital archive of 56 million page images, and literally thousands of boxes of unscanned hard-copy in its basement. Many national weather services, and other governmental, non-governmental, and commercial organisations, also have large paper or imaged archives; the Galaxy Zoo people are working with the Met Office and the National Maritime Museum on an amazingly cool new crowd-sourced project to recover weather records from millions of imaged pages of Royal Navy log books. There was a strong emphasis at the meeting on the need to retain original data and to make any dataset fully traceable to that original data (I imagine a web interface in which one can drill down to page images of the original weather station hard-copy records). It was clear to the meeting that this traceability requirement implies software publication.

The Climate Code Foundation

We are pleased to announce the creation of the Climate Code Foundation. The Foundation is a non-profit organisation founded by David Jones, Nick Barnes, and Philippa Davey to promote public understanding of climate science. The Foundation will continue work on the Clear Climate Code project, and also related activities, encouraging climate scientists to improve and publish their software.

The Foundation intends to work with climate scientists, funding bodies, national and international organisations, and science publishers. We hope to establish climate science in the forefront of science software quality and transparency.

Members of the Foundation are attending the Surface Temperatures workshop at the UK Met Office in September 2010, to promote better and more open software practices within that project.

If you support the Foundation’s goals, there are many ways to contribute to its work.