Archive for the ‘status’ Category
Trendy!
Posted by drj | Filed under status
tool/vischeck.py has recently been updated so that it computes and draws trends (the work was done by me and Nick Barnes). Here are some recent comparisons redrawn with trends:
The “before 1992 / after 1992 stations” from “The 1990s station dropout does not have a warming effect”:
The short trends are done with the last 30 years of data for each series (which, since one series ends in 1991, is a different period for each). Notice how similar the recent trends are.
Reprising the Urban Adjustment post:
I don’t think I’ve done a combined land and ocean chart comparing hemispheres for the blog before, but here it is now:
Nick Barnes added the calculation of R² whilst I was writing this post, causing me to redraw all the charts.
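For readers who want to see what a 30-year trend and its R² amount to, here is a minimal sketch using ordinary least squares. It is not the code in tool/vischeck.py; the function names and the year-to-anomaly dictionary are assumptions made for illustration.

```python
def linear_trend(years, values):
    """Ordinary least-squares fit of values against years.

    Returns (slope, intercept, r_squared).
    """
    n = len(years)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    syy = sum((y - mean_y) ** 2 for y in values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A perfectly flat series is fitted exactly by a flat line.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def recent_trend(series, span=30):
    """Fit a trend to the final `span` calendar years of `series`.

    `series` maps year -> anomaly (hypothetical layout); years with no
    data are simply absent and are skipped.
    """
    last = max(series)
    years = [y for y in range(last - span + 1, last + 1) if y in series]
    return linear_trend(years, [series[y] for y in years])
```

Because the window is counted back from each series' own final year, a series ending in 1991 and one ending in 2009 are fitted over different periods, as in the charts above.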
Nick has also been exploiting ccc-gistemp’s new parameters.py module, and did a run with the somewhat experimental 250km smoothing rather than the traditional 1200km smoothing. The parameter is named gridding_radius and it affects gridding in Step 3; setting it to 250km essentially reduces each station’s influence to very roughly the size of the cell used in gridding.
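For a rough idea of what gridding_radius controls, here is a minimal sketch of the linear distance weighting described by Hansen and Lebedeff (1987): a station's weight falls from 1 at the cell centre to 0 at the radius. The function names and the Earth radius constant are assumptions for illustration, not ccc-gistemp's actual code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def station_weight(distance_km, gridding_radius=1200.0):
    """Weight falling linearly from 1 at the cell centre to 0 at the radius."""
    return max(0.0, 1.0 - distance_km / gridding_radius)
```

With these definitions a station 600 km from a cell centre contributes with weight 0.5 at the traditional radius, but contributes nothing at all when the radius is 250 km.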
The effect on the trends is most visible in the Northern Hemisphere:
Trends are just one minor example of the way in which the ccc-gistemp code can be continuously improved. We don’t just draw trends for one graph, we improve the code so that all graphs can have trends.
ccc-gistemp release 0.4.0
Posted by Nick.Barnes | Filed under announcement
[Updated: ccc-gistemp release 0.4.1 is now available]
I am pleased to announce ccc-gistemp release 0.4.0. This release is much clearer than previous releases. Give it a go.
- Almost all of our code has now been rewritten to remove the Fortran style which remained from the original conversion from GISTEMP. Previous releases had greatly improved steps 0-2; this release continues the improvement work there and also carries those improvements through steps 3-5. Almost all of the code now has sensible variable and function names, clearer data handling, and helpful comments. Many unused variables and functions have been removed. The current core algorithm has 3740 lines of code, of which more than half are either comments, documentation strings, or blank.
- Rounding has been completely eliminated from the system. Previously, rounding and truncation code was used to exactly emulate GISTEMP. Rounding made the code less clear, and Dr Reto Ruedy of NASA GISS confirmed that rounding was not important to the algorithm, so it has been removed. All temperature data is now handled internally as floating point degrees Celsius (previously it was a mixture of integer tenths, floating point tenths, and floating point degrees) and all location information is handled as floating point degrees latitude and longitude (previously it was a mixture of floating point degrees and integer hundredths).
- In a normal run of ccc-gistemp, no data passes through intermediate files. Much of GISTEMP is concerned with generating and consuming intermediate files, to separate phases and to avoid keeping the whole dataset in memory at once (an important consideration when GISTEMP was originally written). We have now completely replaced this with an in-memory pipeline, which is clearer, automatically pipelines all the processing where possible, and avoids all code concerned with serialization and deserialization.
We now have separate code to generate data files between the distinct steps of the GISTEMP algorithm, and to allow running a step from a data file instead of in a pipeline. This allows the running of single steps, and is useful for testing purposes.
- Parameters, such as the 1200 km radius used when gridding, and the number, 3, of rural stations required to adjust an urban station, which were scattered throughout the code, are now all to be found, with explanatory comments, in code/parameters.py.
- It’s now possible to omit Step 4 and produce a land-only index, which closely matches GISTEMP.
- It’s also possible to omit Step 2, and run the algorithm without the urban heat-island adjustment.
- GISTEMP recently switched to using nighttime brightness to determine urban/rural stations. We made the corresponding change, which is switchable.
Note that none of these changes altered any of our results by more than 0.01 degrees C, except for the change to urban station identification, for which the changes in our results (none greater than 0.03 degrees C) closely match the changes in the GISTEMP results.
The work for this release has been done by David Jones, Paul Ollis, and Nick Barnes.
[Updated: this release has been swiftly followed by ccc-gistemp release 0.4.1, to fix a bug reported in comments here.]
GISTEMP Land Index
Posted by drj | Filed under announcement
GISS publish a land-only temperature anomaly (referred to as their “traditional analysis”).
As I pointed out in an earlier article, ccc-gistemp can now create a land index by omitting Step 4: python tool/run.py -s0-3,5.
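The step specification is a comma-separated list of steps and ranges. As a sketch of how such a string can be expanded, here is a small hypothetical helper; parse_steps is my own name for it and this is not necessarily how tool/run.py parses its -s option.

```python
def parse_steps(spec):
    """Expand a step specification such as "0-3,5" into a sorted list of ints."""
    steps = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "0-3" includes both endpoints.
            first, last = part.split("-", 1)
            steps.update(range(int(first), int(last) + 1))
        else:
            steps.add(int(part))
    return sorted(steps)
```

Expanding "0-3,5" this way gives steps 0, 1, 2, 3 and 5, so Step 4 (the ocean merge) is left out.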
Here’s how we compare with official GISTEMP:
The 1990s station dropout does not have a warming effect
Posted by drj | Filed under announcement
Tamino gives his results for his GHCN based temperature reconstruction. It is well worth reading. He also gives a comparison between stations that are reporting after 1992, and those that “dropped out” before 1992. He concludes that there is no significant difference in the overall trend; in other words, his results refute the claim that the 1990s station dropout has a warming effect. His results are preliminary and for the Northern Hemisphere only.
Tamino’s analysis uses only the land stations; in order to write this blog post I tweaked ccc-gistemp so that we can produce a land index (python tool/run.py -s 1-3,5 now skips step 4, avoids merging in the ocean data, and effectively produces a global average based only on land data).
It is very easy to subset the input to ccc-gistemp and run it with smaller input datasets. So in this case I can split the input data into stations reporting since 1992, and those that have no records since 1992, and run ccc-gistemp separately on each input. I created tool/v2split.py to split the input data. Specifically I ran step 0 (which merges USHCN, Antarctic, and Hohenpeissenberg data into the GHCN data) to create work/v2.mean_comb then split that file into those stations reporting in 1992 and after, and those not reporting after the cutoff. Then I ran steps 1,2,3, and 5 of ccc-gistemp to create a land index:
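The split itself is simple. Here is a minimal sketch of the idea, not the actual tool/v2split.py. It assumes the GHCN v2.mean record layout (an 11-character station identifier, a 1-character duplicate number, a 4-digit year, then twelve monthly values) and well-formed input; split_by_cutoff is a hypothetical name.

```python
def split_by_cutoff(lines, cutoff=1992):
    """Split v2.mean-style records into (reporting, dropped) lists.

    A station goes into `reporting` if any of its records is for the
    year `cutoff` or later; otherwise all its records go into `dropped`.
    """
    lines = list(lines)
    last_year = {}
    for line in lines:
        # Assumed layout: station id in columns 0-10, year in columns 12-15.
        station, year = line[:11], int(line[12:16])
        last_year[station] = max(year, last_year.get(station, year))
    reporting = [l for l in lines if last_year[l[:11]] >= cutoff]
    dropped = [l for l in lines if last_year[l[:11]] < cutoff]
    return reporting, dropped
```

Grouping by station rather than by record matters: a station that is still reporting keeps its entire history, including its pre-1992 records, in the first group.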
It is certainly not the case that the warming trend is stronger in the data from the post-cutoff stations.
The differences between these results and Tamino’s are interesting. Both show good agreement for most of the 20th century. These data show more divergence than Tamino’s in the 1800s. Is that because we’re using Southern Hemisphere data as well, or is it because of the difference in station combining? Further investigation is merited.
We hope to make “experiments” of this sort easier to perform using ccc-gistemp and encourage anyone interested to download the code and play with it.
Update: Nick B obliges with a graph of the differences:
On integers, floating-point numbers, and rounding
Posted by Nick.Barnes | Filed under status
Progress continues on the ccc-gistemp project. Anyone interested is welcome to go on over to the source code browse page and peruse it.
- Paul Ollis has done excellent work separating all the I/O code from the main algorithm, and refactoring it so that data can flow through the entire program without passing through several intermediate data files.
- David Jones has made a tool for indexing plain-text data files for random access, and has been working on SVG-based visualisation tools. Together, one day these will let us provide a snappy graphical interface for answering questions like “how did the peri-urban adjustment on this station work?”
- I have been working on removing rounding from the whole system. Until now we have often found ourselves having to round values in order to maintain exact equivalence with GISS results (which may have been rounded for output to an intermediate data file which is read by a later phase). For example, rounding temperatures to the nearest tenth degree Celsius, or latitude and longitude values to the nearest tenth degree. I mentioned this in email with Dr Reto Ruedy of GISS, and he assured me that all such rounding is incidental to the algorithm – an accident of history. So we are removing it from our version, to help clarify the algorithm. We will end up with the only explicit rounding in the system being done in order to write the final result files.
- Next, I am hoping we will extract the main numerical parameters of the algorithm – for instance, the 1200km station radius for gridding, the 3 rural stations required for peri-urban adjustment – to a separate module, where they can be easily modified by anyone interested in experimenting with different values.
We are aiming for a release 0.4.0 of ccc-gistemp to happen around the end of February or in early March, time permitting. The specification of this version is something like “no I/O, no rounding, and explicit parameters”, and we’re pretty close to that now.
Rounding in GISTEMP has prompted a lot of discussion in the blogosphere, and since I have been working in that area in ccc-gistemp, I thought I could write a few words here to clarify it. There is a lot of general misunderstanding of computer arithmetic, even among professional programmers. I have dealt with the nitty-gritty of it in various capacities in the past, and hopefully can convey some of my expertise.
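As a small taste of the subject, here is a sketch (not ccc-gistemp code, and with made-up anomaly values) of two things that tend to surprise people: binary floating point cannot hold most decimal fractions exactly, and rounding intermediate values can change the last digit of a final result.

```python
def round_tenths(x):
    """Round a temperature in degrees Celsius to the nearest tenth."""
    return round(x * 10) / 10


def mean(values):
    return sum(values) / len(values)


# Illustrative values only; these are not real station data.
anomalies = [0.14, 0.14, 0.14, 0.22]

# Round once, when producing the final result.
late = round_tenths(mean(anomalies))

# Round every value first, as if each had passed through an
# intermediate file that only holds tenths of a degree.
early = round_tenths(mean([round_tenths(a) for a in anomalies]))

# Neither 0.1 nor 0.2 is exactly representable in binary, so their
# sum is not the same floating-point number as 0.3.
inexact = (0.1 + 0.2 != 0.3)
```

Here late comes out at 0.2 and early at 0.1: a difference of one digit in the least significant place, which is the size of discrepancy that intermediate rounding can introduce.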
ccc-gistemp release 0.3.0
Posted by Nick.Barnes | Filed under announcement
I am pleased to announce ccc-gistemp release 0.3.0. This includes a number of bug fixes and features in our framework and tools, and a great deal of clarification work especially in steps 1 (station combination) and 2 (peri-urban adjustment). Really, it’s much better. Give it a go.
Much of GISTEMP was concerned with generating and consuming intermediate files, to separate phases and to avoid keeping the whole dataset in memory at once (an important consideration when GISTEMP was originally written). In 0.3.0 this has largely been replaced by an iterator-based approach, which is clearer, automatically pipelines all the processing where possible, and avoids all code concerned with serialization and deserialization.
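To illustrate the general shape of an iterator-based approach, here is a minimal sketch using Python generators. The stage names, the whitespace-separated "station year tenths" record format, and the -9999 missing-data marker are assumptions for the example, not the actual ccc-gistemp interfaces.

```python
def parse(lines):
    """Yield (station, year, tenths) from 'station year tenths' text lines."""
    for line in lines:
        station, year, tenths = line.split()
        yield station, int(year), int(tenths)


def drop_missing(records, missing=-9999):
    """Skip records whose value is the (assumed) missing-data marker."""
    for record in records:
        if record[2] != missing:
            yield record


def to_celsius(records):
    """Convert integer tenths of a degree to floating point degrees Celsius."""
    for station, year, tenths in records:
        yield station, year, tenths / 10.0


def pipeline(lines):
    """Chain the stages; nothing runs until the result is iterated."""
    return to_celsius(drop_missing(parse(lines)))
```

Each stage pulls one record at a time from the stage before it, so no stage needs to write its output to a file or hold the whole dataset in memory.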
We have retained intermediate files between the distinct steps of the GISTEMP algorithm, for compatibility with GISTEMP and for testing purposes. We have also retained some code to round or truncate some data at the points where Fortran truncates it for serialization. This will be removed in future.
Some of the original GISS code was already in Python, and survived almost unchanged in 0.2.0. Much of the rest of 0.2.0, especially the more complex arithmetical processing in step 2, was more-or-less transliterated from the Fortran. A lot of this code has been rewritten in 0.3.0, especially improving the clarity of the station-combining code (in step1.py) and the peri-urban adjustment (now in step2.py).
There has been a rearrangement of the code: the code/ directory now only contains code which we consider part of the GISTEMP algorithm. Everything else – input data fetching, run framework, testing, debugging utilities – is in the tool/ directory. This division will continue, to allow us to add useful tools while still reducing and clarifying the core code.
There is better code for comparing results, and a regression test against genuine GISTEMP results.
All-Python ccc-gistemp release
Posted by Nick.Barnes | Filed under announcement
I am proud to announce release 0.2.0 of ccc-gistemp. This is an all-Python reimplementation of GISTEMP, the NASA GISS surface temperature analysis. Please feel free to download and play with it. It will automatically fetch input data across the internet, and produce textual and graphical result files.
This release works on Windows, Linux, Mac OS X, FreeBSD, and probably anywhere else you can get Python to work. The only dependency is on Python (2.5.2 or later, as we discovered today that the code to fetch input data trips over a bug in earlier Python libraries).
The results of running this release match GISTEMP results very closely indeed:
In fact, the annual global, northern hemisphere, and southern hemisphere anomaly results are identical, as are the southern hemisphere monthly anomalies. The global monthly anomalies differ 7 times, out of more than 1000, each time by one digit in the least-significant place.
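A comparison of this kind boils down to counting the entries where two series disagree. Here is a minimal sketch, assuming each series has already been read into a dictionary mapping (year, month) to an integer anomaly in hundredths of a kelvin; compare_series is a hypothetical name, not one of our tools.

```python
def compare_series(ours, theirs):
    """Compare two anomaly series over the keys they have in common.

    Returns (number compared, dict of key -> difference where the values
    differ, largest absolute difference).
    """
    common = sorted(set(ours) & set(theirs))
    diffs = {k: ours[k] - theirs[k] for k in common if ours[k] != theirs[k]}
    largest = max((abs(d) for d in diffs.values()), default=0)
    return len(common), diffs, largest
```

Working in integer hundredths keeps the comparison exact, so "differs by one digit in the least-significant place" corresponds to a largest difference of 1.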
This ends phase 1 of the CCC-GISTEMP project. However, although there is no remaining Fortran, ksh, or C source code, much of step1.py is still GISS code, and a lot of the large-scale structure of the code is still dictated by its 1980s Fortran heritage. For instance, the data is broken up into pieces because it couldn’t all fit into memory at once [ed: 2010-01-19: this particular instance is Issue 25 and it's now fixed]. This obscures the underlying algorithms being applied. Phase 2 of CCC-GISTEMP will refactor the code to eliminate this obscurity. We expect one side-effect to be an increase in speed.
Thanks to all who have contributed, including David Jones, Paul Ollis, Gareth Rees, John Keyes, and Richard Hendricks. Thanks also to Reto Ruedy at GISS, who has been helpful and responsive.
GISTEMP 2009 anomaly
Posted by drj | Filed under status
GISS haven’t published a 2009 anomaly yet (as of writing, 2010-01-11T14:30Z), but new GHCN records were made available on 2010-01-07. I’ve just made a fresh run of our ccc-gistemp code with all fresh inputs to produce this graph:
Because I’m using fully up to date inputs, this run of ccc-gistemp produces an anomaly for 2009. That red tick at the end is the extra year, 2009, that we produce.
I predict that when GISTEMP publish their 2009 anomaly, it will be +0.58 K.
[minor edits: screwed up year in opening paragraph, and colour of labels in graph]
GISTEMP tab
Posted by drj | Filed under status
I added a tab page about GISTEMP which has more detail on the status of ccc-gistemp. Of note from that page:
It is our opinion that the GISTEMP code performs substantially as documented in Hansen, J.E., and S. Lebedeff, 1987: Global trends of measured surface air temperature. J. Geophys. Res., 92, 13345-13372., the GISTEMP documentation, and other papers describing updates to the procedure.
How close are we to GISTEMP?
Posted by drj | Filed under status
This close:
The two graphs are almost on top of each other. I’ll add 0.02K to the black line to separate them a bit:
We can now see the red series that the black series was hiding, and we can see that the differences between the 2 series are minute at most. 1 or 2 centikelvin here and there. Red is official GISTEMP, black is our ccc-gistemp code.
What exactly am I comparing? GISTEMP’s global temperature anomalies, one set from their website, one set from our ccc-gistemp code. I’m running the vischeck command:
code/vischeck.py -o 2 result/GLB.Ts+dSST.txt result/GLB.Ts.ho2.GHCN.CL.PA.txt
(the -o option is used to produce the offset graphs, bottom picture)
The first file is GLB.Ts+dSST.txt, which I downloaded from NASA yesterday. The second file, GLB.Ts.ho2.GHCN.CL.PA.txt, is the result of me running ccc-gistemp yesterday.
But it’s not a very careful comparison. The inputs I am using are SBBX.HadR2 and v2.mean downloaded on 2009-12-04 and an hcn_doe_mean_data downloaded in June (!). Also, the version of the GISTEMP code we are coding against is quite old (about a year) and has been updated several times. For example, GISTEMP currently use USHCN version 2; ccc-gistemp does not (yet). The fact that we’re not keeping up with GISTEMP is Issue 7.
Furthermore the exact output may depend on the Fortran compilers being used, the architecture on which I’m running, and the Python versions we’re using.
The bottom line is that we’re already very close to the GISTEMP output, well within any meaningful error threshold. As we get closer we’ll need to be a lot more careful about keeping track of exactly what inputs and software tools are being used. We’ve requested from GISS a copy of the exact inputs and outputs for one of the runs, so that we have a fixed set for comparison purposes.