Archive for the ‘status’ Category

ccc-gistemp release 0.6.0

[ed: 2010-10-29: This was an announcement for release 0.6.0, but that has a bug (see comments). Please use release 0.6.1 instead. I've edited the article to update the links]

I’m pleased to announce ccc-gistemp release 0.6.1 (making the previous buggy 0.6.0 obsolete). ccc-gistemp is our project to reimplement the NASA GISS GISTEMP algorithm in clearer code (in Python). This release is the first release made under the aegis of the Climate Code Foundation.

Many of the significant changes in this release have already been previewed in earlier blog posts:

Further details are available in the release notes.

We intend to carry on our work on ccc-gistemp, and we urge you to download our code, try it, and read it. We welcome contributions.

The work for release 0.6.0 was carried out by David Jones and Nick Barnes, and the work for release 0.6.1 was carried out by David Jones.

ccc-gistemp release 0.5.1

I am pleased to announce ccc-gistemp 0.5.1 (the astute reader will note that there is no announcement for release 0.5.0. It is available but does not work in Python 2.5.1 so I fixed that for release 0.5.1).

Compared to the previous release, the changes are not so grand. This release incorporates many incremental improvements to clarity. It also has a couple of bug fixes: to cope with the fact that the GISTEMP source tarfile that we used changed its layout (see this comment here for example); and to once again run on Python 2.4 (a thoroughly ancient version, please try and use Python 2.6).

I have spent a large amount of time trying to clarify Step 2, the peri-urban adjustment described in Hansen et al 1999. I encourage you to try out this release, read the code, and help us improve it.

David Jones, Nick Barnes, and Ronan Lamy have contributed to this release.

Trendy!

tool/vischeck.py has been recently updated so that it computes and draws trends (the work was done by me and Nick Barnes). Here are some recent comparisons redrawn with trends:

The “before 1992 / after 1992 stations” from “The 1990s station dropout does not have a warming effect”:


The short trends are done with the last 30 years of data for each series (which, since one series ends in 1991, is a different period for each). Notice how similar the recent trends are.

Reprising the Urban Adjustment post:

I don’t think I’ve done a combined land and ocean chart comparing hemispheres for the blog before, but here it is now:

Nick Barnes added the calculation of R² whilst I was writing this post, causing me to redraw all the charts.
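For anyone wondering what a "short trend" with R² amounts to, here is a minimal sketch of an ordinary least-squares fit over the last 30 years of a series. It is not the code in tool/vischeck.py; the function names and the dict-of-years input are assumptions made for illustration, and it ignores missing years.

```python
def linear_trend(years, values):
    """Ordinary least-squares fit of values against years.

    Returns (slope, intercept, r_squared).
    """
    n = len(years)
    mean_x = sum(years) / float(n)
    mean_y = sum(values) / float(n)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    syy = sum((y - mean_y) ** 2 for y in values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared is the squared correlation between years and values.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared


def short_trend(series, span=30):
    """Trend over the last `span` years of a {year: anomaly} dict."""
    years = sorted(series)[-span:]
    return linear_trend(years, [series[y] for y in years])


# Made-up series that warms by exactly 0.02 per year and ends in 1991.
example = dict((year, 0.02 * (year - 1950)) for year in range(1950, 1992))
slope, intercept, r2 = short_trend(example)
```

Because each series uses its own last 30 years, a series ending in 1991 is fitted over 1962 to 1991 while a series running to the present is fitted over a later window.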

Nick has also been exploiting ccc-gistemp’s new parameters.py module, and did a run with the somewhat experimental 250km smoothing rather than the traditional 1200km smoothing. The parameter is named gridding_radius and it affects gridding in Step 3; setting it to 250km essentially reduces each station’s influence to very roughly the size of the cell used in gridding.
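To give a feel for what the radius does, here is a rough sketch that assumes a station's weight falls off linearly from 1 at the cell centre to 0 at the gridding radius. That weighting is an assumption made for this illustration, and station_weight is a hypothetical helper, not a function in ccc-gistemp; only the gridding_radius name comes from parameters.py.

```python
def station_weight(distance_km, gridding_radius=1200.0):
    """Weight given to a station at distance_km from a cell centre.

    Assumes a linear fall-off to zero at gridding_radius (in km).
    """
    if distance_km >= gridding_radius:
        return 0.0
    return 1.0 - distance_km / gridding_radius


# Compare the two radii for stations at a few distances from a cell centre.
for radius in (1200.0, 250.0):
    weights = [station_weight(d, radius) for d in (0, 200, 600, 1000)]
    print(radius, weights)
```

With the smaller radius, a station 600 km away contributes nothing to the cell, where with 1200 km it would still contribute with half weight.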

The effect on the trends is most visible in the Northern Hemisphere:

Trends are just one minor example of the way in which the ccc-gistemp code can be continuously improved. We don’t just draw trends for one graph; we improve the code so that all graphs can have trends.

ccc-gistemp release 0.4.0

[Updated: ccc-gistemp release 0.4.1 is now available]

I am pleased to announce ccc-gistemp release 0.4.0. This release is much clearer than previous releases. Give it a go.

  • Almost all of our code has now been rewritten to remove the Fortran style which remained from the original conversion from GISTEMP. Previous releases had greatly improved steps 0-2; this release continues the improvement work there and also carries those improvements through steps 3-5. Almost all of the code now has sensible variable and function names, clearer data handling, and helpful comments. Many unused variables and functions have been removed. The current core algorithm has 3740 lines of code, of which more than half are either comments, documentation strings, or blank.
  • Rounding has been completely eliminated from the system. Previously, rounding and truncation code was used to exactly emulate GISTEMP. Rounding made the code less clear, and Dr Reto Ruedy of NASA GISS confirmed that rounding was not important to the algorithm, so it has been removed. All temperature data is now handled internally as floating point degrees Celsius (previously it was a mixture of integer tenths, floating point tenths, and floating point degrees) and all location information is handled as floating point degrees latitude and longitude (previously it was a mixture of floating point degrees and integer hundredths).
  • In a normal run of ccc-gistemp, no data passes through intermediate files. Much of GISTEMP is concerned with generating and consuming intermediate files, to separate phases and to avoid keeping the whole dataset in memory at once (an important consideration when GISTEMP was originally written). We have now completely replaced this with an in-memory pipeline, which is clearer, automatically pipelines all the processing where possible, and avoids all code concerned with serialization and deserialization.
    We now have separate code to generate data files between the distinct steps of the GISTEMP algorithm, and to allow running a step from a data file instead of in a pipeline. This allows the running of single steps, and is useful for testing purposes.
  • Parameters, such as the 1200 km radius used when gridding, and the number, 3, of rural stations required to adjust an urban station, which were scattered throughout the code, are now all to be found, with explanatory comments, in code/parameters.py
  • It’s now possible to omit Step 4 and produce a land-only index, which closely matches GISTEMP.
  • It’s also possible to omit Step 2, and run the algorithm without the urban heat-island adjustment.
  • GISTEMP recently switched to using nighttime brightness to determine urban/rural stations. We made the corresponding change, which is switchable.
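As an illustration of the unit clean-up described above, here is a small sketch of converting the older integer representations to floating point once, at input, so that the rest of the code only ever sees degrees. The function names and the -9999 missing-value marker are assumptions for this example, not names taken from ccc-gistemp.

```python
def tenths_to_celsius(value_in_tenths, missing=-9999):
    """Convert an integer count of tenths of a degree to float degrees C.

    Returns None for the (assumed) missing-value marker.
    """
    if value_in_tenths == missing:
        return None
    return value_in_tenths / 10.0


def hundredths_to_degrees(value_in_hundredths):
    """Convert integer hundredths of a degree (lat/long) to float degrees."""
    return value_in_hundredths / 100.0


temperature = tenths_to_celsius(123)     # 12.3 degrees C
latitude = hundredths_to_degrees(-4525)  # -45.25 degrees
gap = tenths_to_celsius(-9999)           # None: no reading
```

Doing the conversion in one place means no later step has to remember which unit a particular number is in.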

Note that none of these changes altered any of our results by more than 0.01 degrees C, except for the change to urban station identification, for which the changes in our results (none greater than 0.03 degrees C) closely match the changes in the GISTEMP results.

The work for this release has been done by David Jones, Paul Ollis, and Nick Barnes.

[Updated: this release has been swiftly followed by ccc-gistemp release 0.4.1, to fix a bug reported in comments here.]

GISTEMP Land Index

GISS publish a land-only temperature anomaly (referred to as their “traditional analysis”).

As I pointed out in an earlier article, ccc-gistemp can now create a land index by omitting Step 4: python tool/run.py -s0-3,5.
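The step specification reads as a list of steps and ranges. A minimal sketch of how such a string could be expanded follows; parse_steps is a hypothetical helper written for this illustration, not the parser that tool/run.py actually uses.

```python
def parse_steps(spec):
    """Expand a step specification such as '0-3,5' into [0, 1, 2, 3, 5]."""
    steps = []
    for part in spec.split(','):
        if '-' in part:
            # A range such as '0-3' is inclusive at both ends.
            first, last = part.split('-')
            steps.extend(range(int(first), int(last) + 1))
        else:
            steps.append(int(part))
    return steps


land_only = parse_steps('0-3,5')
print(land_only)
```

Step 4, which merges in the ocean data, is simply absent from the expanded list.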

Here’s how we compare with official GISTEMP:

The 1990s station dropout does not have a warming effect

Tamino gives his results for his GHCN-based temperature reconstruction. It is well worth reading. He also gives a comparison between stations that are reporting after 1992, and those that “dropped out” before 1992. He concludes that there is no significant difference in the overall trend, which refutes the claim that the 1990s station dropout has a warming effect. His results are preliminary and for the Northern Hemisphere only.

Tamino’s analysis uses only the land stations; in order to write this blog post I tweaked ccc-gistemp so that we can produce a land index (python tool/run.py -s 1-3,5 now skips step 4, avoids merging in the ocean data, and effectively produces a global average based only on land data).

It is very easy to subset the input to ccc-gistemp and run it with smaller input datasets. So in this case I can split the input data into stations reporting since 1992, and those that have no records since 1992, and run ccc-gistemp separately on each input. I created tool/v2split.py to split the input data. Specifically I ran step 0 (which merges USHCN, Antarctic, and Hohenpeissenberg data into the GHCN data) to create work/v2.mean_comb then split that file into those stations reporting in 1992 and after, and those not reporting after the cutoff. Then I ran steps 1,2,3, and 5 of ccc-gistemp to create a land index:

It is certainly not the case that the warming trend is stronger in the data from the post-cutoff stations. [edit 2010-03-22: In a subsequent post I add trend lines to this chart]

The differences between these results and Tamino’s are interesting. Both show good agreement for most of the 20th century. These data show more divergence than Tamino’s in the 1800s. Is that because we’re using Southern Hemisphere data as well, or is it because of the difference in station combining? Further investigation is merited.

We hope to make “experiments” of this sort easier to perform using ccc-gistemp and encourage anyone interested to download the code and play with it.
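For the curious, the station split used in this experiment can be sketched in a few lines. This is not the code in tool/v2split.py. It assumes each line of the v2.mean-style file starts with an 11-character station identifier, then a 1-character duplicate number, then a 4-digit year; that layout, and the names split_by_cutoff and fake_line, are assumptions made for the example.

```python
def split_by_cutoff(lines, cutoff=1992):
    """Split v2.mean-style lines into (reporting, dropped) lists.

    A station counts as reporting if any of its records has a year at or
    after the cutoff; all of that station's records go in the first list.
    """
    lines = list(lines)
    still_reporting = set()
    for line in lines:
        if int(line[12:16]) >= cutoff:
            still_reporting.add(line[:11])
    reporting = [line for line in lines if line[:11] in still_reporting]
    dropped = [line for line in lines if line[:11] not in still_reporting]
    return reporting, dropped


def fake_line(station, year):
    # 11-character station id, 1 duplicate digit, 4-digit year, 12 values.
    return '%s0%04d%s' % (station, year, '  123' * 12)


# Made-up records: one station reports into 1995, the other stops in 1991.
sample = [
    fake_line('11111111111', 1985),
    fake_line('11111111111', 1995),
    fake_line('22222222222', 1985),
    fake_line('22222222222', 1991),
]
reporting, dropped = split_by_cutoff(sample)
```

Each output list can then be written to its own file and fed to the later steps as a separate run.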

Update: Nick B obliges with a graph of the differences:

On integers, floating-point numbers, and rounding

Progress continues on the ccc-gistemp project. Anyone interested is welcome to go on over to the source code browse page and peruse it.

  • Paul Ollis has done excellent work separating all the I/O code from the main algorithm, and refactoring it so that data can flow through the entire program without passing through several intermediate data files.
  • David Jones has made a tool for indexing plain-text data files for random access, and has been working on SVG-based visualisation tools. Together, one day these will let us provide a snappy graphical interface for answering questions like “how did the peri-urban adjustment on this station work?”
  • I have been working on removing rounding from the whole system. Until now we have often found ourselves having to round values in order to maintain exact equivalence with GISS results (which may have been rounded for output to an intermediate data file which is read by a later phase). For example, rounding temperatures to the nearest tenth degree Celsius, or latitude and longitude values to the nearest tenth degree. I mentioned this in email with Dr Reto Ruedy of GISS, and he assured me that all such rounding is incidental to the algorithm – an accident of history. So we are removing it from our version, to help clarify the algorithm. We will end up with the only explicit rounding in the system being done in order to write the final result files.
  • Next I am hoping we will extract the main numerical parameters of the algorithm – for instance, the 1200km station radius for gridding, the 3 rural stations required for peri-urban adjustment – to a separate module, where they can be easily modified by anyone interested in experimenting with different values.
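To make the rounding point above concrete, here is a small sketch of what happens when values pass through an intermediate file that stores integer tenths of a degree. The helper names and the text format are invented for this illustration and do not correspond to any actual GISTEMP or ccc-gistemp file format.

```python
def write_tenths(values):
    """Serialise temperatures as integer tenths of a degree, as text."""
    return ' '.join('%d' % round(v * 10) for v in values)


def read_tenths(text):
    """Read the text back into floating point degrees."""
    return [int(field) / 10.0 for field in text.split()]


def mean(values):
    return sum(values) / len(values)


# Made-up values carrying more precision than the file format keeps.
original = [12.34, 12.34, 12.34]
round_tripped = read_tenths(write_tenths(original))

print(mean(original), mean(round_tripped))
```

The rounding here comes from the file format rather than from any step of the calculation, which is the sense in which it is incidental to the algorithm.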

We are aiming for a release 0.4.0 of ccc-gistemp to happen around the end of February or in early March, time permitting. The specification of this version is something like “no I/O, no rounding, and explicit parameters”, and we’re pretty close to that now.

Rounding in GISTEMP has prompted a lot of discussion in the blogosphere, and since I have been working in that area in ccc-gistemp, I thought I could write a few words here to clarify it. There is a lot of general misunderstanding of computer arithmetic, even among professional programmers. I have dealt with the nitty-gritty of it in various capacities in the past, and hopefully can convey some of my expertise.

ccc-gistemp release 0.3.0

I am pleased to announce ccc-gistemp release 0.3.0. This includes a number of bug fixes and features in our framework and tools, and a great deal of clarification work especially in steps 1 (station combination) and 2 (peri-urban adjustment). Really, it’s much better. Give it a go.

Much of GISTEMP was concerned with generating and consuming intermediate files, to separate phases and to avoid keeping the whole dataset in memory at once (an important consideration when GISTEMP was originally written). In 0.3.0 this has largely been replaced by an iterator-based approach, which is clearer, automatically pipelines all the processing where possible, and avoids all code concerned with serialization and deserialization.
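The general shape of an iterator-based approach can be sketched with Python generators. The stages below (read_records, drop_missing, to_anomalies), the record layout, and the -999.9 missing marker are all invented for this illustration; they are not the actual ccc-gistemp steps.

```python
def read_records(lines):
    """Parse 'station year value' lines, one record at a time."""
    for line in lines:
        station, year, value = line.split()
        yield station, int(year), float(value)


def drop_missing(records, missing=-999.9):
    """Pass through only records that carry a real value."""
    for record in records:
        if record[2] != missing:
            yield record


def to_anomalies(records, baseline):
    """Subtract each station's baseline value from its records."""
    for station, year, value in records:
        yield station, year, value - baseline[station]


# Made-up input and baseline values.
raw = ['A 2000 10.5', 'A 2001 -999.9', 'B 2000 3.0']
baseline = {'A': 10.0, 'B': 2.5}

# Nothing is computed until the final stage is consumed, and no stage
# ever writes its output to a file for the next one to read back.
pipeline = to_anomalies(drop_missing(read_records(raw)), baseline)
results = list(pipeline)
```

Each record flows through every stage before the next record is read, so no stage needs the whole dataset in memory at once and no serialization code is involved.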

We have retained intermediate files between the distinct steps of the GISTEMP algorithm, for compatibility with GISTEMP and for testing purposes. We have also retained some code to round or truncate some data at the points where Fortran truncates it for serialization. This will be removed in future.

Some of the original GISS code was already in Python, and survived almost unchanged in 0.2.0. Much of the rest of 0.2.0, especially the more complex arithmetical processing in step 2, was more-or-less transliterated from the Fortran. A lot of this code has been rewritten in 0.3.0, especially improving the clarity of the station-combining code (in step1.py) and the peri-urban adjustment (now in step2.py).

There has been a rearrangement of the code: the code/ directory now only contains code which we consider part of the GISTEMP algorithm. Everything else – input data fetching, run framework, testing, debugging utilities – is in the tool/ directory. This division will continue, to allow us to add useful tools while still reducing and clarifying the core code.

There is better code for comparing results, and a regression test against genuine GISTEMP results.

All-Python ccc-gistemp release

I am proud to announce release 0.2.0 of ccc-gistemp.  This is an all-Python reimplementation of GISTEMP, the NASA GISS surface temperature analysis.  Please feel free to download and play with it.  It will automatically fetch input data across the internet, and produce textual and graphical result files.

This release works on Windows, Linux, Mac OS X, FreeBSD, and probably anywhere else you can get Python to work.  The only dependency is on Python (2.5.2 or later, as we discovered today that the code to fetch input data trips over a bug in earlier Python libraries).

The results of running this release match GISTEMP results very closely indeed:

Comparison of ccc-gistemp with GISTEMP, on common input data

In fact, the annual global, northern hemisphere, and southern hemisphere anomaly results are identical, as are the southern hemisphere monthly anomalies.  The global monthly anomalies differ 7 times, out of more than 1000, each time by one digit in the least-significant place.
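A comparison of this kind can be sketched as follows, assuming both sets of monthly anomalies have been loaded into dicts keyed by (year, month) and holding integer hundredths of a degree. The function name, the data layout, and the sample values are assumptions for illustration; this is not the comparison tool shipped with the release.

```python
def count_differences(ours, theirs):
    """Compare two {(year, month): anomaly} dicts.

    Returns (number of months compared, list of (key, difference)).
    """
    common = sorted(set(ours) & set(theirs))
    diffs = [(key, ours[key] - theirs[key])
             for key in common if ours[key] != theirs[key]]
    return len(common), diffs


# Made-up anomalies in hundredths of a degree.
ours = {(2009, 1): 55, (2009, 2): 47, (2009, 3): 48}
theirs = {(2009, 1): 55, (2009, 2): 46, (2009, 3): 48}
compared, diffs = count_differences(ours, theirs)
print(compared, diffs)
```

A difference of 1 in this representation is a difference of one digit in the least-significant place of the published figures.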

This ends phase 1 of the CCC-GISTEMP project.  However, although there is no remaining Fortran, ksh, or C source code, much of step1.py is still GISS code, and a lot of the large-scale structure of the code is still dictated by its 1980s Fortran heritage.  For instance, the data is broken up into pieces because it couldn’t all fit into memory at once [ed: 2010-01-19: this particular instance is Issue 25 and it's now fixed].  This obscures the underlying algorithms being applied.  Phase 2 of CCC-GISTEMP will refactor the code to eliminate this obscurity.  We expect one side-effect to be an increase in speed.

Thanks to all who have contributed, including David Jones, Paul Ollis, Gareth Rees, John Keyes, and Richard Hendricks. Thanks also to Reto Ruedy at GISS, who has been helpful and responsive.

GISTEMP 2009 anomaly

GISS haven’t published a 2009 anomaly yet (as of writing, 2010-01-11T14:30Z), but new GHCN records were made available on 2010-01-07. I’ve just made a fresh run of our ccc-gistemp code with all fresh inputs to produce this graph:
global historical temperature anomaly

Because I’m using fully up to date inputs, this run of ccc-gistemp produces an anomaly for 2009. That red tick at the end is the extra year, 2009, that we produce.

I predict that when GISTEMP publish their 2009 anomaly, it will be +0.58 K.

[minor edits: screwed up year in opening paragraph, and colour of labels in graph]