Finding bugs in GISTEMP

As part of our work, it is natural that we find problems with the GISTEMP code.  All code has bugs, and there are few better ways to find them than by analysing the code in detail for re-implementation.

To date, we have found:

  • a bug in STEP0, in reading the last decimal place of USHCN temperature records, which made a very small difference to the GISTEMP results;
  • a problem in STEP2, whereby the rounding behavior in some Fortran implementations could cause an infinite loop;
  • a collection of problems in STEP5, all basically down to a problem in using sorted indexes with an unsorted array, which didn’t make any difference to the final results.
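
To make the STEP5 class of problem concrete, here is a minimal Python sketch of how sorted indexes and an unsorted array can get mixed up. It is not the GISTEMP code and does not reproduce the actual STEP5 bug; the data and the function names (kth_smallest_correct, kth_smallest_buggy) are made up purely for illustration.

```python
# Hypothetical data; nothing here is taken from GISTEMP.
values = [30, 10, 20]

# "Sorted indexes": the positions in `values`, listed in ascending
# order of the value stored at each position.
order = sorted(range(len(values)), key=lambda i: values[i])


def kth_smallest_correct(values, order, k):
    """Look up the k-th smallest value by going through the index list."""
    return values[order[k]]


def kth_smallest_buggy(values, order, k):
    """Illustrative mistake: k is a position in sorted order, but it is
    used directly as a position in the unsorted array."""
    return values[k]


print(order)                                   # sorted indexes
print(kth_smallest_correct(values, order, 0))  # smallest value
print(kth_smallest_buggy(values, order, 0))    # first value, not the smallest

# An order-insensitive aggregate gives the same answer either way.
print(sum(values[i] for i in order) == sum(values))
```

The last line shows one general way a mix-up like this can fail to change a result: anything that does not depend on ordering, such as a sum, comes out the same. That is offered only as an illustration, not as an account of why the STEP5 results were unaffected.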

We have found a number of other problems with the code – this project would not exist if the GISTEMP code were perfect – but these are the only places in which the actual semantics of the code definitely differ from the intentions of the programmer.

On a number of occasions we have had difficulty understanding the code, and it is only after a long analysis that we have concluded that the code has valid semantics and is simply unclear.  These occasions justify this project.

On some other occasions we have questioned whether the intended algorithm is a sensible way to compute a global surface temperature anomaly dataset.  That question is one which we hope the project can get into at a later date, after our reimplementation has made the various algorithms clear, and readily amenable to inspection and change.  Of course, there are many other approaches adopted by various climate science groups around the world, and one reasonable answer to such criticisms would be “if you don’t like ours, use one of theirs”.

So problems with the code fall into various categories, and when one first struggles to understand a section of code it isn’t always immediately obvious which category one faces.  It would be somewhere between obnoxiously stupid and stupidly obnoxious to trumpet “this code is broken” on the blog, especially when there’s a reasonable chance that it is our understanding that is at fault.

My practice to date is as follows: we share questions and doubts among the project team in an ad-hoc way (e.g. the most recent problem was detailed in doc/step5-notes in the SVN repository).  When we are sure that there is a problem in the code, I contact Reto Ruedy at GISS – keeper of the GISTEMP code.  On each occasion in the past he has quickly looked over the GISTEMP code, confirmed the problem, fixed it, and posted an update to the GISTEMP site.  It’s only after such an exchange that we have made any sort of announcement (most recently, here on the blog).

Of course, if GISS failed to respond to our questions or suggestions, or if we felt they were responding unreasonably, then we might take other steps.  But that has never happened to date: they have never failed to reply quickly, courteously, and with gratitude for our help.
