ccc-gistemp release 0.4.0

[Updated: ccc-gistemp release 0.4.1 is now available]

I am pleased to announce ccc-gistemp release 0.4.0. This release is much clearer than previous releases. Give it a go.

  • Almost all of our code has now been rewritten to remove the Fortran style that remained from the original conversion from GISTEMP. Previous releases had greatly improved steps 0-2; this release continues the improvement work there and also carries those improvements through steps 3-5. Almost all of the code now has sensible variable and function names, clearer data handling, and helpful comments. Many unused variables and functions have been removed. The current core algorithm has 3740 lines of code, of which more than half are comments, documentation strings, or blank lines.
  • Rounding has been completely eliminated from the system. Previously, rounding and truncation code was used to exactly emulate GISTEMP. Rounding made the code less clear, and Dr Reto Ruedy of NASA GISS confirmed that rounding was not important to the algorithm, so it has been removed. All temperature data is now handled internally as floating point degrees Celsius (previously it was a mixture of integer tenths, floating point tenths, and floating point degrees) and all location information is handled as floating point degrees latitude and longitude (previously it was a mixture of floating point degrees and integer hundredths).
  • In a normal run of ccc-gistemp, no data passes through intermediate files. Much of GISTEMP is concerned with generating and consuming intermediate files, to separate phases and to avoid keeping the whole dataset in memory at once (an important consideration when GISTEMP was originally written). We have now completely replaced this with an in-memory pipeline, which is clearer, automatically pipelines the processing where possible, and does away with all the code concerned with serialization and deserialization. (A minimal sketch of this style of pipeline appears after this list.)
    We now have separate code to generate data files between the distinct steps of the GISTEMP algorithm, and to run a step from a data file instead of as part of the pipeline. This makes it possible to run single steps, which is useful for testing.
  • Parameters which were previously scattered throughout the code, such as the 1200 km radius used when gridding and the minimum number of rural stations (3) required to adjust an urban station, are now all gathered, with explanatory comments, in code/parameters.py. (A sketch of such a module also appears after this list.)
  • It’s now possible to omit Step 4 and produce a land-only index, which closely matches GISTEMP.
  • It’s also possible to omit Step 2, and run the algorithm without the urban heat-island adjustment.
  • GISTEMP recently switched to using nighttime brightness to determine urban/rural stations. We made the corresponding change, which is switchable.
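
To give a feel for the in-memory pipeline mentioned above, here is a minimal sketch of a chain of Python generators. It is an illustration only: the function names, record layout, and input path are invented for the example and are not the actual ccc-gistemp interfaces.

    # Illustrative sketch only: each stage lazily consumes the previous one,
    # so no intermediate files are written and only one record is in flight
    # at a time.

    def read_records(path):
        """Yield one raw station record per line of the input file."""
        with open(path) as f:
            for line in f:
                yield line.rstrip('\n')

    def parse(records):
        """Convert raw lines into (station_id, values) pairs."""
        for line in records:
            station_id, rest = line[:11], line[11:]
            yield station_id, [float(v) for v in rest.split()]

    def drop_empty(records):
        """Discard stations that have no data at all."""
        for station_id, values in records:
            if values:
                yield station_id, values

    # A whole run is then a single composed iterator; later steps consume
    # this stream in the same way.
    pipeline = drop_empty(parse(read_records('input/stations.txt')))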
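
Similarly, here is a sketch of the kind of module described in the parameters item. The two values come from the description above; the variable names are illustrative guesses and may not match those actually used in code/parameters.py.

    # Sketch only: names are illustrative, values are those quoted above.

    # Radius, in kilometres, within which station records contribute to a
    # grid cell when gridding.
    gridding_radius = 1200.0

    # Minimum number of rural stations required before an urban station's
    # record is adjusted.
    urban_adjustment_min_rural_stations = 3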

Note that none of these changes altered any of our results by more than 0.01 degrees C, except for the change to urban station identification, for which the changes in our results (none greater than 0.03 degrees C) closely match the changes in the GISTEMP results.

The work for this release has been done by David Jones, Paul Ollis, and Nick Barnes.

[Updated: this release has been swiftly followed by ccc-gistemp release 0.4.1, to fix a bug reported in comments here.]

17 Responses to “ccc-gistemp release 0.4.0”

  1. ligne Says:

    great job! keep up the excellent work :-)

    just another data point for your supported platforms list: it Works For Me on Debian etch (i386) and Python 2.5.

  2. sod Says:

    thanks for this work!

  3. drj Says:

    Thanks. And it’s always useful to have reports of ccc-gistemp working (as well as not).

  4. carrot eater Says:

    Very smart of you to make features easily switchable.

    I know I could just look in the code, but is the urban/rural now based on nightlights alone? The point has been reasonably raised by various people that cities in low-income countries may be dark at night.

  5. drj Says:

    @carrot eater: Yes. ccc-gistemp gets it from a value in columns 102 to 106 of the v2.inv file (supplied by NASA GISS). See is_rural() in step2.py and the code to read that file.

    You can get the old behaviour with the use_global_brightness parameter.
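
    For illustration, here is a rough sketch of how a fixed-width field like this might be read and used. This is not the actual step2.py code, and the brightness threshold of 10 is an assumption for the example only.

        # Rough sketch, not the real code: read the nighttime brightness
        # field ("columns 102 to 106", i.e. Python slice 101:106) from one
        # line of v2.inv and apply an assumed threshold.

        def brightness(line):
            return int(line[101:106])

        def looks_rural(line, threshold=10):
            return brightness(line) <= threshold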

  6. Tim W Says:

    Firstly, thank you very much for putting in the work to make this code as clear as possible. I’m sure this will be a great aid in resolving many thermometer-related issues by providing a means to actually test them. Well done.

    I was running version 0.4.0 and ran into a problem with the regression.py module. I am using Windows XP and Python 2.6.4, and got the following error:

    Traceback (most recent call last):
      File "tool\regression.py", line 27, in <module>
        from code import script_support
    ImportError: cannot import name script_support

    I downloaded the file from a previous release and tried to run it, with awful results...

    Thanks,

    Tim

    [I have moved this from the Code page — Nick Barnes]

  7. Nick.Barnes Says:

    You are quite right, Tim. Thanks for this bug report, which I have moved to our defect-tracking system. I removed the script_support module from the code directory because it was no longer needed there, without checking the modules in the tool directory. I’ll fix this.

  8. Nick.Barnes Says:

    I have fixed the bug that Tim reported, and made release 0.4.1.

  9. Dick Veldkamp Says:

    Now that there’s such a wonderful new version, requests will pour in to make the program even nicer!

    One thing about which I have been thinking for a while is this: why use (semi) rectangular gridding to calculate the global average temperature? Wouldn’t it be more natural to put a Delaunay triangulation on the globe with the stations as vertices? Station weights would follow naturally from the dimensions of the triangles. Also the polar regions would be included without special tricks.

    Is there a reason not to use this method, apart from convention?

    Keep up the good work!

  10. drj Says:

    @Dick: Delaunay triangulation is an interesting idea. We don’t do it, of course, because ccc-gistemp does what GISTEMP does, which is to use an equal-area grid of boxes. In the future we hope to make the gridding step more flexible and allow a choice of grids.

    One question for your Delaunay idea: how big would the triangles be that touch St Helena (or any other such isolated station), and why should those stations get more influence than others?

  11. Dick Veldkamp Says:

    Re: Delaunay triangulation

    @DRJ: I think that St Helena does not get more weight than it does in rectangular gridding. Suppose the grid cell containing St Helena has no other stations; then St Helena would fix the temperature for that entire cell.

    If we do not apply any physical modelling to the problem of finding the global average temperature, it seems to me that for each point on the Earth we can make no better temperature estimate than by interpolation based on triangles. The station weights follow from this.

    Well, maybe that is not true. Continuing with the “What’s the best estimate?” question, why not construct some smooth surface (by kriging, say) and find the average temperature based on that?

    I would love to try out a couple of these things, but unfortunately my regular work leaves me no time just now.

  12. carrot eater Says:

    I’m getting very tempted to try out your product now.

    Any gut feeling about whether it will have trouble running properly if I read in USHCN data from the raw or TOB file, instead of the fully adjusted F52.avg.gz file?

    Because step 0 reads in and acts on the flags in the F52.avg file, marked as ‘E’, ‘X’ or ‘Q’, I’d have to edit out these lines in step 0:

    flag = line[m*7+17]
    if ((flag in 'EQ') or

    Is there anything else that could go wrong? I doubt it would make much of any difference to the global numbers, but the differences between the USHCN raw, TOB and final adjustment are not negligible, so being able to choose the input file would be a fun capability.

  13. drj Says:

    @carrot eater: I just reviewed the USHCN v2 readme.txt file and the code, and had a quick look at the latest raw USHCN v2 file. I see no reason why it wouldn’t work, and I don’t think you’d need to edit the code.

    The raw and F52 data files are in the same format. The raw file still has flags in the same column positions, and it would still seem sensible to ignore E and Q values (although the raw file only seems to use the Q flag).

    You can download whichever USHCN v2 file you want and unpack it to input/ushcnv2; ccc-gistemp will use that without downloading a new one.
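
    For illustration, something along these lines would pick out the unflagged monthly values. The flag offset (m*7 + 17) and the set of flags come from the snippet quoted in comment 12, so treat this as a sketch rather than the shipped step 0 code.

        # Sketch based on the snippet quoted above: return the month
        # indices (0-11) whose flag character is not in the skipped set.
        def usable_months(line, skip='EQ'):
            return [m for m in range(12) if line[m*7 + 17] not in skip]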

  14. carrot eater Says:

    drj

    You’re right; the raw file does have the ‘Q’ flag in it, and flagged values should be excluded there as well. I’ll try it out sometime. Thank you for the reply.

    Any idea why the ‘X’ flag isn’t excluded by GISTEMP as well? On my reading of the readme file, the values in F52 with the flags E, Q and X are all in-filled using FILNET, just for different reasons.

    If GISS does not want to use infilled values, I don’t understand why it doesn’t exclude all of them.

  15. Nick.Barnes Says:

    I don’t recall why GISTEMP uses the ‘X’ values. I recommend that you do try running with the ‘raw’, ‘tob’, and ‘F52’ datasets, and use the tool/compare_results.py script to compare the results.
    If I have time later in the week, I might do this myself, and make a blog post here to show the results.

  16. Kicker Says:

    clearclimatecode.org – the best. Keep it up!
    Thanks

    Kicker

  17. Global Warming Contrarians Part 1.1: Amateur Temperature Records « Planet James Says:

    […] (generally independent of each other) including Zeke Hausfather of The Blackboard, Nick Barnes of Clear Climate Code (a project to replicate GISS results using clearer processing code), Tamino of Open Mind, Joseph of […]
