Archive for the ‘status’ Category
ccc-gistemp release 0.4.0
Posted by Nick.Barnes | Filed under announcement
I am pleased to announce ccc-gistemp release 0.4.0. This release is much clearer than previous releases. Give it a go.
- Almost all of our code has now been rewritten to remove the Fortran style which remained from the original conversion from GISTEMP. Previous releases had greatly improved steps 0-2; this release continues the improvement work there and also carries those improvements through steps 3-5. Almost all of the code now has sensible variable and function names, clearer data handling, and helpful comments. Many unused variables and functions have been removed. The current core algorithm has 3740 lines of code, of which more than half are either comments, documentation strings, or blank.
- Rounding has been completely eliminated from the system. Previously, rounding and truncation code was used to exactly emulate GISTEMP. Rounding made the code less clear, and Dr Reto Ruedy of NASA GISS confirmed that rounding was not important to the algorithm, so it has been removed. All temperature data is now handled internally as floating point degrees Celsius (previously it was a mixture of integer tenths, floating point tenths, and floating point degrees) and all location information is handled as floating point degrees latitude and longitude (previously it was a mixture of floating point degrees and integer hundredths).
- In a normal run of ccc-gistemp, no data passes through intermediate files. Much of GISTEMP is concerned with generating and consuming intermediate files, to separate phases and to avoid keeping the whole dataset in memory at once (an important consideration when GISTEMP was originally written). We have now completely replaced this with an in-memory pipeline, which is clearer, automatically pipelines all the processing where possible, and avoids all code concerned with serialization and deserialization.
We now have separate code to generate data files between the distinct steps of the GISTEMP algorithm, and to allow running a step from a data file instead of in a pipeline. This allows the running of single steps, and is useful for testing purposes.
- Parameters, such as the 1200 km radius used when gridding, and the number, 3, of rural stations required to adjust an urban station, which were scattered throughout the code, are now all to be found, with explanatory comments, in code/parameters.py.
- It’s now possible to omit Step 4 and produce a land-only index, which closely matches GISTEMP.
- It’s also possible to omit Step 2, and run the algorithm without the urban heat-island adjustment.
- GISTEMP recently switched to using nighttime brightness to determine urban/rural stations. We made the corresponding change, which is switchable.
Note that none of these changes altered any of our results by more than 0.01 degrees C, except for the change to urban station identification, for which the changes in our results (none greater than 0.03 degrees C) closely match the changes in the GISTEMP results.
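As a minimal sketch of the unit change described above (this is not the actual ccc-gistemp code; the function names and the -9999 missing-value sentinel are assumptions made for illustration), converting the older integer representations to floating point once, at input, might look like this:

```python
MISSING_TENTHS = -9999  # assumed sentinel for a missing reading


def tenths_to_celsius(value):
    """Convert integer tenths of a degree Celsius to float degrees.

    Returns None for the (assumed) missing-value sentinel.
    """
    if value == MISSING_TENTHS:
        return None
    return value / 10.0


def hundredths_to_degrees(value):
    """Convert integer hundredths of a degree of latitude or longitude to float degrees."""
    return value / 100.0
```

Once every value has been converted like this, later code only ever sees one unit per quantity and needs no rounding or rescaling of its own.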
The work for this release has been done by David Jones, Paul Ollis, and Nick Barnes.
GISTEMP Land Index
Posted by drj | Filed under announcement
GISS publish a land-only temperature anomaly (referred to as their “traditional analysis”).
As I pointed out in an earlier article, ccc-gistemp can now create a land index by omitting Step 4: python tool/run.py -s0-3,5.
Here’s how we compare with official GISTEMP:
The 1990s station dropout does not have a warming effect
Posted by drj | Filed under announcement
Tamino gives his results for his GHCN based temperature reconstruction. It is well worth reading. He also gives a comparison between stations that are reporting after 1992, and those that “dropped out” before 1992. He concludes that there is no significant difference in the overall trend. In other words, his comparison refutes the claim that the 1990s station dropout has a warming effect. His results are preliminary and for the Northern Hemisphere only.
Tamino’s analysis uses only the land stations; in order to write this blog post I tweaked ccc-gistemp so that we can produce a land index (python tool/run.py -s 1-3,5 now skips step 4, avoids merging in the ocean data, and effectively produces a global average based only on land data).
It is very easy to subset the input to ccc-gistemp and run it with smaller input datasets. So in this case I can split the input data into stations reporting since 1992, and those that have no records since 1992, and run ccc-gistemp separately on each input. I created tool/v2split.py to split the input data. Specifically I ran step 0 (which merges USHCN, Antarctic, and Hohenpeissenberg data into the GHCN data) to create work/v2.mean_comb then split that file into those stations reporting in 1992 and after, and those not reporting after the cutoff. Then I ran steps 1,2,3, and 5 of ccc-gistemp to create a land index:
It is certainly not the case that the warming trend is stronger in the data from the post-cutoff stations.
The differences between these results and Tamino’s are interesting. Both show good agreement for most of the 20th century. These data show more divergence than Tamino’s in the 1800s. Is that because we’re using Southern Hemisphere data as well, or is it because of the difference in station combining? Further investigation is merited.
We hope to make “experiments” of this sort easier to perform using ccc-gistemp and encourage anyone interested to download the code and play with it.
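For anyone wanting to try a similar split, here is a rough sketch of the idea. It is not the actual tool/v2split.py. It assumes v2.mean-style fixed-width records in which the first 11 characters are the station identifier, the next character is a duplicate number, and the following 4 characters are the year; the function name is made up for illustration.

```python
def split_by_cutoff(lines, cutoff=1992):
    """Split records into (reporting, dropped) by each station's last year.

    A station counts as "reporting" if it has any record in the cutoff
    year or later; all of that station's records go into the same output.
    """
    lines = list(lines)
    last_year = {}
    for line in lines:
        station = line[:11]          # assumed: 11-character station id
        year = int(line[12:16])      # assumed: 4-digit year after the duplicate digit
        last_year[station] = max(year, last_year.get(station, year))
    reporting = [l for l in lines if last_year[l[:11]] >= cutoff]
    dropped = [l for l in lines if last_year[l[:11]] < cutoff]
    return reporting, dropped
```

Each of the two outputs can then be written to its own file and used as the input to a separate run.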
Update: Nick B obliges with a graph of the differences:
On integers, floating-point numbers, and rounding
Posted by Nick.Barnes | Filed under status
Progress continues on the ccc-gistemp project. Anyone interested is welcome to go on over to the source code browse page and peruse it.
- Paul Ollis has done excellent work separating all the I/O code from the main algorithm, and refactoring it so that data can flow through the entire program without passing through several intermediate data files.
- David Jones has made a tool for indexing plain-text data files for random access, and has been working on SVG-based visualisation tools. Together, one day these will let us provide a snappy graphical interface for answering questions like “how did the peri-urban adjustment on this station work?”
- I have been working on removing rounding from the whole system. Until now we have often found ourselves having to round values in order to maintain exact equivalence with GISS results (which may have been rounded for output to an intermediate data file which is read by a later phase). For example, rounding temperatures to the nearest tenth degree Celsius, or latitude and longitude values to the nearest tenth degree. I mentioned this in email with Dr Reto Ruedy of GISS, and he assured me that all such rounding is incidental to the algorithm – an accident of history. So we are removing it from our version, to help clarify the algorithm. We will end up with the only explicit rounding in the system being done in order to write the final result files.
- Next I am hoping we will extract the main numerical parameters of the algorithm – for instance, the 1200km station radius for gridding, the 4 rural stations required for peri-urban adjustment – to a separate module, where they can be easily modified by anyone interested in experimenting with different values.
We are aiming for a release 0.4.0 of ccc-gistemp to happen around the end of February or in early March, time permitting. The specification of this version is something like “no I/O, no rounding, and explicit parameters”, and we’re pretty close to that now.
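To give a flavour of what “explicit parameters” could mean in practice, here is a small sketch. The constant and function names are hypothetical, not the ones in ccc-gistemp; only the 1200 km value comes from the text above. The rural-station count would be a second named constant alongside it.

```python
# Hypothetical stand-in for a parameters module: one named constant
# per tunable value, each with an explanatory comment.
GRIDDING_RADIUS_KM = 1200.0  # station radius used when gridding


def stations_in_range(distances_km, radius_km=GRIDDING_RADIUS_KM):
    """Return the distances that fall within the gridding radius."""
    return [d for d in distances_km if d <= radius_km]
```

Someone experimenting with a different radius can then either edit the constant in one place or pass radius_km explicitly, for example stations_in_range(distances, radius_km=600.0).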
Rounding in GISTEMP has prompted a lot of discussion in the blogosphere, and since I have been working in that area in ccc-gistemp, I thought I could write a few words here to clarify it. There is a lot of general misunderstanding of computer arithmetic, even among professional programmers. I have dealt with the nitty-gritty of it in various capacities in the past, and hopefully can convey some of my expertise.
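A tiny illustration of why intermediate rounding matters for exact equivalence, even though it barely affects the answer. The readings are made up for the example. Rounding each value to the nearest tenth before averaging moves each value by at most 0.05, so the mean can move by at most 0.05 too:

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


readings = [12.34, 12.34, 12.44, 12.44]        # made-up temperatures, degrees C
rounded = [round(x, 1) for x in readings]      # rounded to the nearest tenth first

exact_mean = mean(readings)                    # about 12.39
rounded_mean = mean(rounded)                   # about 12.35
difference = abs(exact_mean - rounded_mean)    # about 0.04, within the 0.05 bound
```

A difference of this size is enough to change the last printed digit of a result, which is why matching GISS output exactly required reproducing the rounding.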
ccc-gistemp release 0.3.0
Posted by Nick.Barnes | Filed under announcement
I am pleased to announce ccc-gistemp release 0.3.0. This includes a number of bug fixes and features in our framework and tools, and a great deal of clarification work especially in steps 1 (station combination) and 2 (peri-urban adjustment). Really, it’s much better. Give it a go.
Much of GISTEMP was concerned with generating and consuming intermediate files, to separate phases and to avoid keeping the whole dataset in memory at once (an important consideration when GISTEMP was originally written). In 0.3.0 this has largely been replaced by an iterator-based approach, which is clearer, automatically pipelines all the processing where possible, and avoids all code concerned with serialization and deserialization.
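The iterator-based style can be sketched with Python generators. The stages, record format, and names below are invented for illustration and do not correspond to actual GISTEMP steps; the point is only that each stage pulls records from the previous one on demand, so nothing is written to disk and the whole dataset never has to be in memory at once.

```python
def read_records(lines):
    """Parse 'station value' lines into (station, value) pairs, lazily."""
    for line in lines:
        station, value = line.split()
        yield station, float(value)


def drop_missing(records, missing=-9999.0):
    """Skip records carrying the (assumed) missing-value sentinel."""
    for station, value in records:
        if value != missing:
            yield station, value


def to_anomaly(records, baseline):
    """Subtract a baseline from each value."""
    for station, value in records:
        yield station, value - baseline


def run_pipeline(lines, baseline):
    """Chain the stages; no work happens until the result is iterated."""
    return to_anomaly(drop_missing(read_records(lines)), baseline)
```

For example, list(run_pipeline(["A 10.0", "B -9999", "C 12.5"], 10.0)) yields one record for A and one for C, with B dropped along the way.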
We have retained intermediate files between the distinct steps of the GISTEMP algorithm, for compatibility with GISTEMP and for testing purposes. We have also retained some code to round or truncate some data at the points where Fortran truncates it for serialization. This will be removed in future.
Some of the original GISS code was already in Python, and survived almost unchanged in 0.2.0. Much of the rest of 0.2.0, especially the more complex arithmetical processing in step 2, was more-or-less transliterated from the Fortran. A lot of this code has been rewritten in 0.3.0, especially improving the clarity of the station-combining code (in step1.py) and the peri-urban adjustment (now in step2.py).
There has been a rearrangement of the code: the code/ directory now only contains code which we consider part of the GISTEMP algorithm. Everything else – input data fetching, run framework, testing, debugging utilities – is in the tool/ directory. This division will continue, to allow us to add useful tools while still reducing and clarifying the core code.
There is better code for comparing results, and a regression test against genuine GISTEMP results.
All-Python ccc-gistemp release
Posted by Nick.Barnes | Filed under announcement
I am proud to announce release 0.2.0 of ccc-gistemp. This is an all-Python reimplementation of GISTEMP, the NASA GISS surface temperature analysis. Please feel free to download and play with it. It will automatically fetch input data across the internet, and produce textual and graphical result files.
This release works on Windows, Linux, Mac OS X, FreeBSD, and probably anywhere else you can get Python to work. The only dependency is on Python (2.5.2 or later, as we discovered today that the code to fetch input data trips over a bug in earlier Python libraries).
The results of running this release match GISTEMP results very closely indeed:
In fact, the annual global, northern hemisphere, and southern hemisphere anomaly results are identical, as are the southern hemisphere monthly anomalies. The global monthly anomalies differ 7 times, out of more than 1000, each time by one digit in the least-significant place.
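A comparison of this kind can be sketched as follows. This is not our actual comparison code; it assumes the two series have already been parsed into equal-length lists of integer anomalies (for instance, in hundredths of a degree), and the function name is made up.

```python
def count_differences(ours, theirs):
    """Return (number of differing entries, largest absolute difference).

    Both arguments are assumed to be equal-length sequences of integers.
    """
    diffs = [abs(a - b) for a, b in zip(ours, theirs) if a != b]
    return len(diffs), max(diffs) if diffs else 0
```

A result such as (7, 1) would mean seven differing values, each off by one unit in the last place.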
This ends phase 1 of the CCC-GISTEMP project. However, although there is no remaining Fortran, ksh, or C source code, much of step1.py is still GISS code, and a lot of the large-scale structure of the code is still dictated by its 1980s Fortran heritage. For instance, the data is broken up into pieces because it couldn’t all fit into memory at once [ed: 2010-01-19: this particular instance is Issue 25 and it's now fixed]. This obscures the underlying algorithms being applied. Phase 2 of CCC-GISTEMP will refactor the code to eliminate this obscurity. We expect one side-effect to be an increase in speed.
Thanks to all who have contributed, including David Jones, Paul Ollis, Gareth Rees, John Keyes, and Richard Hendricks. Thanks also to Reto Ruedy at GISS, who has been helpful and responsive.
GISTEMP 2009 anomaly
Posted by drj | Filed under status
GISS haven’t published a 2009 anomaly yet (as of writing, 2010-01-11T14:30Z), but new GHCN records were made available on 2010-01-07. I’ve just made a fresh run of our ccc-gistemp code with all fresh inputs to produce this graph:
Because I’m using fully up to date inputs, this run of ccc-gistemp produces an anomaly for 2009. That red tick at the end is the extra year, 2009, that we produce.
I predict that when GISTEMP publish their 2009 anomaly, it will be +0.58 K.
[minor edits: screwed up year in opening paragraph, and colour of labels in graph]
GISTEMP tab
Posted by drj | Filed under status
I added a tab page about GISTEMP which has more detail on the status of ccc-gistemp. Of note from that page:
It is our opinion that the GISTEMP code performs substantially as documented in Hansen, J.E., and S. Lebedeff, 1987: Global trends of measured surface air temperature. J. Geophys. Res., 92, 13345-13372., the GISTEMP documentation, and other papers describing updates to the procedure.
How close are we to GISTEMP?
Posted by drj | Filed under status
This close:
The two graphs are almost on top of each other. I’ll add 0.02K to the black line to separate them a bit:
We can now see the red series that the black series was hiding, and we can see that the differences between the 2 series are minute at most. 1 or 2 centikelvin here and there. Red is official GISTEMP, black is our ccc-gistemp code.
What exactly am I comparing? GISTEMP’s global temperature anomalies, one set from their website, one set from our ccc-gistemp code. I’m running the vischeck command:
code/vischeck.py -o 2 result/GLB.Ts+dSST.txt result/GLB.Ts.ho2.GHCN.CL.PA.txt
(the -o option is used to produce the offset graphs, bottom picture)
The first file is GLB.Ts+dSST.txt, which I downloaded from NASA yesterday. The second file, GLB.Ts.ho2.GHCN.CL.PA.txt, is the result of me running ccc-gistemp yesterday.
But it’s not a very careful comparison. The inputs I am using are SBBX.HadR2 and v2.mean downloaded on 2009-12-04 and an hcn_doe_mean_data downloaded in June (!). Also, the version of the GISTEMP code we are coding against is quite old (about a year) and has been updated several times. For example, GISTEMP currently use USHCN version 2, ccc-gistemp does not (yet). The fact that we’re not keeping up with GISTEMP is Issue 7.
Furthermore the exact output may depend on the Fortran compilers being used, the architecture on which I’m running, and the Python versions we’re using.
The bottom line is that we’re already very close to the GISTEMP output, well within any meaningful error threshold. As we get closer we’ll need to be a lot more careful about keeping track of exactly what inputs and software tools are being used. We’ve requested from GISS a copy of the exact inputs and outputs for one of the runs, so that we have a fixed set for comparison purposes.
Detailed ccc-gistemp status
Posted by Nick.Barnes | Filed under status
Thank you, David, for kicking off the blog. Thank you, John Keyes, for setting it up and hosting it. This entry is a brief description of the current status of the CCC-GISTEMP project. Anyone interested should feel free to wander over to the project page to browse or download the code.
GISTEMP as published by NASA consists of six steps, numbered 0 to 5. Each step includes some FORTRAN code and one or more driver shell scripts (written in the slightly-obscure ksh shell), and takes one or more input files from preceding steps or from external data sources, and sometimes some config files. Most steps produce a number of executables (by compiling the FORTRAN), some intermediate files, and one or more output files (which are either consumed by subsequent steps or are the outputs of GISTEMP as a whole). Some of the data files are in formatted text, some are in big-endian binary FORTRAN formats. The ksh driver scripts would rename and delete some intermediate files as necessary as they went along.
The first thing we did in CCC-GISTEMP was to regularize this structure. The ksh scripts were rewritten in /bin/sh and consolidated into a single run.sh file. All the files were placed in consistent subdirectories (all the config files in config/, all the executables in bin/, all the intermediate files in work/, log files in log/, and so on). Some files have been renamed, and no files are now deleted during a run, so intermediate files can be inspected.
Following that, we have done the following re-implementation work:
- David Jones wrote a preflight script, which fetches any source data as necessary.
- I rewrote STEP0 as step0.py. This reads the input met station data from various sources and consolidates it into a single consistent plain-text data file. Our version does not use the various intermediate data files of GISTEMP STEP0.
- I rewrote STEP1 as step1.py. This takes the data file and performs some adjustments (for instance, step-changes where a weather station has been replaced), as specified by configuration files. STEP1 was already partly in Python and partly in C. I have rewritten the C but I have not tinkered much with the existing Python; just enough to consolidate it into a single file. This stage produces a number of intermediate files in DB2 format; I haven’t changed that.
- Paul Ollis rewrote STEP2 as a number of Python files. This applies peri-urban adjustment and calculates anomaly values. Paul has maintained the structure of the FORTRAN code quite closely in the Python.
- David Jones rewrote STEP3 as step3.py. This produces monthly weighted anomaly values for each of the geographical “boxes” and “sub-boxes” – a division of the Earth’s surface into 8000 parts of equal area – according to a weighted averaging system based on the distance of each station from the centre of the sub-box.
- STEP4 is still in FORTRAN. This is an optional step which updates a boxed sea-surface temperature file based on recent sea-surface temperature measurements. We haven’t done anything to this, and in fact our current ./run.sh file doesn’t run any of this code.
- We are in the process of rewriting STEP5 as step5.py. This combines the land data from step 3 and the sea data from step 4 into a single data set (according to land/ocean weighting in boxes with both land and ocean), and outputs a set of formatted text files giving monthly and annual temperature anomalies for a number of zones and for the globe as a whole.
- We have a little script step5res.py, which takes the global anomaly file produced by step 5 and turns it into a chart using Google Charts.
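The distance-based weighting mentioned for STEP3 above can be sketched roughly like this. It is not step3.py. It assumes a weight that falls linearly from 1 at the centre of the sub-box to 0 at a 1200 km radius, a spherical Earth of radius 6371 km, and a plain weighted mean, which is simpler than the way GISTEMP actually combines station records. All names are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0      # assumed mean Earth radius
GRIDDING_RADIUS_KM = 1200.0   # assumed cut-off radius for station influence


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against float drift
    return EARTH_RADIUS_KM * math.acos(cos_angle)


def station_weight(distance_km, radius_km=GRIDDING_RADIUS_KM):
    """Linear taper (an assumption): 1 at the centre, 0 at the radius and beyond."""
    return max(0.0, 1.0 - distance_km / radius_km)


def weighted_anomaly(centre, stations):
    """Weighted mean anomaly at centre=(lat, lon).

    stations is a list of (lat, lon, anomaly). Returns None if no
    station lies within range.
    """
    total = weight_sum = 0.0
    for lat, lon, anomaly in stations:
        w = station_weight(great_circle_km(centre[0], centre[1], lat, lon))
        total += w * anomaly
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else None
```

Stations beyond the radius get weight zero and so drop out of the average entirely.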
There was a long hiatus this year, but David and I are both active in the CCC project again and we hope to complete our first-cut Python version of GISTEMP soon.