Archive for December, 2009
GISTEMP tab
Posted by drj | Filed under status
I added a tab page about GISTEMP which has more detail on the status of ccc-gistemp. Of note from that page:
It is our opinion that the GISTEMP code performs substantially as documented in Hansen, J.E., and S. Lebedeff, 1987: Global trends of measured surface air temperature. J. Geophys. Res., 92, 13345-13372., the GISTEMP documentation, and other papers describing updates to the procedure.
How close are we to GISTEMP?
Posted by drj | Filed under status
This close:
The two graphs are almost on top of each other. I’ll add 0.02K to the black line to separate them a bit:
We can now see the red series that the black series was hiding, and we can see that the differences between the two series are minute: 1 or 2 centikelvin here and there. Red is official GISTEMP, black is our ccc-gistemp code.
What exactly am I comparing? GISTEMP’s global temperature anomalies, one set from their website, one set from our ccc-gistemp code. I’m running the vischeck command:
code/vischeck.py -o 2 result/GLB.Ts+dSST.txt result/GLB.Ts.ho2.GHCN.CL.PA.txt
(the -o option is used to produce the offset graphs, bottom picture)
The first file is GLB.Ts+dSST.txt, which I downloaded from NASA yesterday. The second file, GLB.Ts.ho2.GHCN.CL.PA.txt, is the result of me running ccc-gistemp yesterday.
But it’s not a very careful comparison. The inputs I am using are SBBX.HadR2 and v2.mean downloaded on 2009-12-04 and an hcn_doe_mean_data downloaded in June (!). Also, the version of the GISTEMP code we are coding against is quite old (about a year), and GISTEMP has been updated several times since. For example, GISTEMP currently uses USHCN version 2; ccc-gistemp does not (yet). The fact that we’re not keeping up with GISTEMP is Issue 7.
Furthermore the exact output may depend on the Fortran compilers being used, the architecture on which I’m running, and the Python versions we’re using.
The bottom line is that we’re already very close to the GISTEMP output, well within any meaningful error threshold. As we get closer we’ll need to be a lot more careful about keeping track of exactly what inputs and software tools are being used. We’ve requested from GISS a copy of the exact inputs and outputs for one of the runs, so that we have a fixed set for comparison purposes.
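For readers who want a feel for what such a comparison involves, here is a minimal sketch in Python. It is not the code in vischeck.py: the function names and the toy numbers are invented for illustration, and it assumes each series has already been read into a dict mapping year to annual anomaly in kelvin.

```python
def max_abs_difference(a, b):
    """Largest absolute difference over the years both series cover."""
    common = sorted(set(a) & set(b))
    if not common:
        raise ValueError("series have no years in common")
    return max(abs(a[y] - b[y]) for y in common)


def offset_series(series, offset):
    """Return a copy of the series shifted by a constant, for plotting."""
    return {year: value + offset for year, value in series.items()}


# Toy values, invented for illustration; NOT real GISTEMP output.
official = {2006: 0.54, 2007: 0.57, 2008: 0.44}
ours = {2006: 0.55, 2007: 0.57, 2008: 0.43}

worst = max_abs_difference(official, ours)
shifted = offset_series(ours, 0.02)  # same 0.02 K shift as in the lower graph
```

A fixed set of inputs and outputs from GISS would let a check like this be run against a known-good result rather than against whatever happened to be downloaded that day.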
Site updates
Posted by Nick.Barnes | Filed under Uncategorized
After several late nights trying to correct misapprehensions in a few places in the blogosphere, I have updated the goals and about pages to remove any ambiguity between this project as it currently stands and the CCC project at Ravenbrook which immediately preceded it.
Hopefully now some critics will join us and work to improve climate science software.
Finding bugs in GISTEMP
Posted by Nick.Barnes | Filed under policies
As part of our work, it is natural that we find problems with the GISTEMP code. All code has bugs, and there are few better ways to find them than by analysing the code in detail for re-implementation.
To date, we have found:
- a bug in STEP0, in reading the last decimal place of USHCN temperature records, which made a very small difference to the GISTEMP results;
- a problem in STEP2, whereby the rounding behavior in some Fortran implementations could cause an infinite loop;
- a collection of problems in STEP5, all basically down to a problem in using sorted indexes with an unsorted array, which didn’t make any difference to the final results.
We have found a number of other problems with the code – this project would not exist if the GISTEMP code were perfect – but these are the only places in which the actual semantics of the code definitely differ from the intentions of the programmer.
Project history
Posted by Nick.Barnes | Filed under Uncategorized
A potted history of the project so far:
- I had the idea for the project in 2007, after the first release of GISTEMP code. I saw it criticised online for various failings, from the ridiculous (e.g. “I demanded this code and now you’ve released it I don’t understand it”) to the sublime (e.g. the many attacks on a line of code which quite legitimately translated temperatures in Fahrenheit into tenths of degrees Celsius). It was plain to me that any software with results which might determine critical public policy should be more accessible than this. Ideally it ought to be possible for any interested member of the public to download the source code and inspect it.
- I presented my ideas to colleagues at Ravenbrook Limited in the spring of 2008. It was agreed that Ravenbrook should pursue such a project on a pro bono basis: we’d use our systems to host an open-source project, but nobody would be paid for their time.
- David Jones and I got started on the code over the summer of 2008, and presented our first results at PyCon UK in September 2008.
- There was considerable interest at the conference and online, including a number of offers of help. Wanting to widen participation in the project, but not keen to host and support the infrastructure, we decided to use a Google Code project, and a Google Groups mailing list, and to consider a wiki or blog. We set those up and various volunteers started work, including John Keyes who later created and hosts this blog and Paul Ollis who has contributed a considerable amount of code.
- Our real lives intervened, and David and I didn’t do anything very much on the CCC project until the autumn of 2009, when we restarted work on the Python reimplementation and on this blog. Just in time for the CRU email hacking incident to stir up a lot of public interest in climate code quality.
Project goals
Posted by Nick.Barnes | Filed under goals
Others online seem to have misunderstood what we are about. I have reworded the project goals accordingly. This public Clear Climate Code project is not working for Ravenbrook’s benefit. Ravenbrook’s contribution to the project is (to a small degree) motivated by self-interest, and I expect the same may be true of other contributors.
Sceptics are welcome
Posted by Nick.Barnes | Filed under policies
Our project goals are well-defined:
1. To produce clear climate science software;
2. To encourage the production of clear climate science software;
3. To increase public confidence in climate science results.
The following are not project goals, and will not form part of the project:
1. To pick fights and flame wars with sceptics and/or denialists;
2. To judge or arbitrate in climate science.
I am not a scientist and I didn’t set up the project to make judgements about climate science. By doing ClearClimateCode I hope to help actual climate scientists to do actual climate science, and to help others to trust the results.
My personal beliefs on some aspects of climate science are pretty well-documented (if you make the reasonable and correct guess that I am the Nick Barnes who sometimes hangs out on blog comment threads): I am certain that anthropogenic global warming is real and a serious global crisis. And those beliefs form a strong motivation for me to start and take part in this project. But this project is not intended to be a platform for promoting those beliefs. The blogosphere is full of places to vent views about these subjects; this is not one.
In particular, the project welcomes sceptics to take part: write code, read code, criticise code.
If you truly doubt the climate science consensus and are (therefore rightly) alarmed at moves for critical public policy to be based on that consensus, then I expect you are keen to discover and publicise the truth about the global temperature record. Working on the project will allow you to do that. Please, join the mailing list, download the code, work with us.
(this important post is partly cut-and-paste from a message I sent last year to the project mailing list)
We find bug in GISTEMP; GISS fixes it
Posted by Nick.Barnes | Filed under Uncategorized
Reto Ruedy of GISS has changed GISTEMP to fix a collection of minor bugs in STEP5’s SBBXotoBX.f, which David Jones and I found while re-implementing STEP5 in Python. The fix did not have any effect on the final numeric outputs of GISTEMP.
This particular program combines land and ocean temperature data. Each sub-box (an area of about 64,000 km^2) is given an “ocean weight”, depending on the amount of ocean data and the distance to the nearest surface station. Then the land and ocean series for each sub-box are given weights depending on the ocean weight and on the number of valid monthly temperatures. Then the 200 series for each box (the land and ocean series for each of 100 sub-boxes) are combined in order of decreasing weight to form a single series for the box.
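As a rough check on that sub-box area, here is a back-of-envelope calculation in Python. It assumes a spherical Earth with a mean radius of 6371 km (an assumption made for this sketch, not a value taken from the GISTEMP code) and uses the figure of 8000 equal-area sub-boxes given in the status post.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean radius; not taken from GISTEMP
SUB_BOXES = 8000           # equal-area parts of the Earth's surface
SUB_BOXES_PER_BOX = 100    # sub-boxes in each box

surface_km2 = 4 * math.pi * EARTH_RADIUS_KM ** 2
sub_box_km2 = surface_km2 / SUB_BOXES
boxes = SUB_BOXES // SUB_BOXES_PER_BOX

print(f"{sub_box_km2:.0f} km^2 per sub-box, {boxes} boxes")
```

The result is a little under 64,000 km^2 per sub-box, which agrees with the figure quoted above, and implies 80 boxes in all.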
The error was in the way the land and ocean series were combined after sorting into order: sometimes the index of an entry in the sorted set was used to index into the unsorted set.
As it happens, with the parameters used for this program, in particular the Rintrp parameter set to zero, this error has no effect because the ocean weight is always either 1 or 0, so after sorting the second half of the set of data series always has zero weight.
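This class of bug is easy to reproduce in miniature. The Python sketch below is not a translation of SBBXotoBX.f; the lists, names and weights are invented for illustration. It sorts a set of weights into decreasing order, then shows the difference between reaching the unsorted data correctly (through the sort order) and incorrectly (using the position in the sorted list as if it were an index into the unsorted list).

```python
# Invented example data: one label and one weight per series.
labels = ["land-a", "ocean-a", "land-b", "ocean-b"]
weights = [0.2, 0.9, 0.5, 0.0]

# order[k] is the index, in the unsorted lists, of the k-th heaviest series.
order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)

# Correct: go through `order` to reach the unsorted data.
correct = [labels[order[k]] for k in range(len(order))]

# Buggy: use the sorted position k directly as an index into unsorted data.
buggy = [labels[k] for k in range(len(order))]

print(correct)
print(buggy)
```

The two lists differ wherever the sort actually moved something. A mix-up of this kind goes unnoticed when the entries it touches all carry zero weight, which is the situation described above.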
In an email to David and me, Reto Ruedy expressed thanks to us and to the CCC-GISTEMP project.
Detailed ccc-gistemp status
Posted by Nick.Barnes | Filed under status
Thank you, David, for kicking off the blog. Thank you, John Keyes, for setting it up and hosting it. This entry is a brief description of the current status of the CCC-GISTEMP project. Anyone interested should feel free to wander over to the project page to browse or download the code.
GISTEMP as published by NASA consists of six steps, numbered 0 to 5. Each step includes some FORTRAN code and one or more driver shell scripts (written in the slightly-obscure ksh shell), and takes one or more input files from preceding steps or from external data sources, and sometimes some config files. Most steps produce a number of executables (by compiling the FORTRAN), some intermediate files, and one or more output files (which are either consumed by subsequent steps or are the outputs of GISTEMP as a whole). Some of the data files are in formatted text, some are in big-endian binary FORTRAN formats. The ksh driver scripts would rename and delete some intermediate files as necessary as they went along.
The first thing we did in CCC-GISTEMP was to regularize this structure. The ksh scripts were rewritten in /bin/sh and consolidated into a single run.sh file. All the files were placed in consistent subdirectories (all the config files in config/, all the executables in bin/, all the intermediate files in work/, log files in log/, and so on). Some files have been renamed, and no files are now deleted during a run, so intermediate files can be inspected.
Following that, we have done the following re-implementation work:
- David Jones wrote a preflight script, which fetches any source data as necessary.
- I rewrote STEP0 as step0.py. This reads the input met station data from various sources and consolidates it into a single consistent plain-text data file. Our version does not use the various intermediate data files of GISTEMP STEP0.
- I rewrote STEP1 as step1.py. This takes the data file and performs some adjustments (for instance, step-changes where a weather station has been replaced), as specified by configuration files. STEP1 was already partly in Python and partly in C. I have rewritten the C but I have not tinkered much with the existing Python; just enough to consolidate it into a single file. This stage produces a number of intermediate files in DB2 format; I haven’t changed that.
- Paul Ollis rewrote STEP2 as a number of Python files. This applies peri-urban adjustment and calculates anomaly values. Paul has maintained the structure of the FORTRAN code quite closely in the Python.
- David Jones rewrote STEP3 as step3.py. This produces monthly weighted anomaly values for each of the geographical “boxes” and “sub-boxes” – a division of the Earth’s surface into 8000 parts of equal area – according to a weighted averaging system based on the distance of each station from the centre of the sub-box.
- STEP4 is still in FORTRAN. This is an optional step which updates a boxed sea-surface temperature file based on recent sea-surface temperature measurements. We haven’t done anything to this, and in fact our current ./run.sh file doesn’t run any of this code.
- We are in the process of rewriting STEP5 as step5.py. This combines the land data from step 3 and the sea data from step 4 into a single data set (according to land/ocean weighting in boxes with both land and ocean), and outputs a set of formatted text files giving monthly and annual temperature anomalies for a number of zones and for the globe as a whole.
- We have a little script step5res.py, which takes the global anomaly file produced by step 5 and turns it into a chart using Google Charts.
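The distance weighting mentioned under STEP3 can be sketched as follows. This is an illustration only, not code from step3.py: the function names are invented, the Earth radius is an assumed mean value, and the 1200 km default radius with a linear taper follows the description in Hansen and Lebedeff (1987). The real step also has to combine the station records themselves, which this sketch does not attempt.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def station_weight(distance_km, radius_km=1200.0):
    """Linear taper: 1 at the sub-box centre, 0 at radius_km and beyond."""
    return max(0.0, 1.0 - distance_km / radius_km)


# Hypothetical station and sub-box centre, both on the equator.
d = great_circle_km(0.0, 0.0, 0.0, 5.0)
w = station_weight(d)
```

A station at the centre of a sub-box gets weight 1, one halfway to the radius gets 0.5, and anything at or beyond the radius contributes nothing.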
There was a long hiatus this year, but David and I are both active in the CCC project again and we hope to complete our first-cut Python version of GISTEMP soon.
Welcome to CCC
Posted by drj | Filed under status
Clear Climate Code is an open project created by Ravenbrook; we aim to write and maintain software for climate modelling and analysis, with an emphasis on clarity and correctness. Our goals are:
- To produce clear climate science software;
- To encourage the production of clear climate science software;
- To increase public confidence in climate science results;
- To promote Ravenbrook’s software consultancy services.
[Updated to add: of course, these are the goals of Ravenbrook's internal project, out of which this open project has grown. We don't expect third parties to sign up to goal 4, and of course they may have other goals of their own. - Nick B]
We are not new, but our blog is. Nick Barnes had the idea for the project in 2007 and he and David Jones started work on it at Ravenbrook in 2008. We talked about it at PyCon UK 2008. Since then we have been joined by some contributors (on our mailing list), including John Keyes who has provided hosting for this blog.
Currently we are working on ccc-gistemp which is a reimplementation of the GISTEMP algorithm in Python. We are nearing the end of “step 1” of that project, at which point we will have a Python program that uses exactly the same inputs as GISTEMP, and produces the same intermediate files, and the same outputs (right now, we have such a program but bits of it still use some of the Fortran code from GISS).