Posts Tagged ‘gistemp’

GISTEMP 2009 anomaly anomaly

In a previous article I predicted that the 2009 GISTEMP anomaly would be +0.58. In fact when it was published it came in at +0.57. This 0.01 K difference is well within any reasonable error bounds and typical of the sort of error you get from rounding. Still, it bothered me. How unlucky was I to get agreement for all the years except the most recent one?

Today I realised that although I was using up to date land data I wasn’t using up to date ocean data. I have just fetched fresh ocean data and rerun ccc-gistemp. Of course the 2009 anomaly comes out as +0.57 K, same as GISS:

Overview of GISTEMP intermediate files

When ccc-gistemp runs, the data files in the input/ directory are processed in a number of steps to produce the results in the result/ directory. On the way many intermediate files are written to the work/ directory. Generally the intermediate files are written by one stage of the process and consumed by a later stage. GISS’s GISTEMP works in a broadly similar way, but the details are slightly different. One of the first things we did when working with GISTEMP was to reorganise where the intermediate files were written.

Many of the intermediate files were only written because the computers on which GISTEMP was originally intended to run were extremely resource poor and the whole dataset could not always be processed in working memory. Hence, data was written into several intermediate files and processed piece by piece. This is no longer necessary, and since ccc-gistemp release 0.2.0 we have made changes that mean that far fewer intermediate files are written.

A consequence is that there is now a manageable number of files in the work/ directory, and a listing of them tells a story about how GISTEMP works:

         5 Jan 20 13:27 GHCN.last_year
  44716518 Jan 20 13:28 v2.mean_comb
  29802728 Jan 20 13:30 Ts.txt
  39696368 Jan 20 13:31 Ts.bin
  20853712 Jan 20 13:31 Ts.GHCN.CL
   2107900 Jan 20 13:31 ANN.dTs.GHCN.CL
    354106 Jan 20 13:33 PApars.pre-flags
    371742 Jan 20 13:33 PApars.list
  19233584 Jan 20 13:34 Ts.GHCN.CL.PA
         0 Jan 20 13:34 BX.Ts.GHCN.CL.PA.1200
  50240120 Jan 20 13:50 SBBX1880.Ts.GHCN.CL.PA.1200
  34001576 Jan 20 13:50 SBBX.HadR2
    176152 Jan 20 13:51 ZON.Ts.ho2.GHCN.CL.PA.1200.step1
    176152 Jan 20 13:51 ZON.Ts.ho2.GHCN.CL.PA.1200
     15974 Jan 20 13:51 ANNZON.Ts.ho2.GHCN.CL.PA.1200

The above is a listing of my work/ directory having done a fresh run using subversion revision 199 sources. Each row lists: file size in bytes, timestamp, file name.

The first thing to note is the timestamps. The first file is written at 13:27 and the last file at 13:51. On my machine ccc-gistemp took about 25 minutes for this run.

I’ll go through the files in order and try to explain what each one is. Bear in mind that some of these files will probably disappear in future versions as we reduce the number of times data is written to disk and read back in again.


GHCN.last_year

This file is used to pass the highest year that is found in the GHCN input data (input/v2.mean) from step0 (where this file is created) to step2 (where this file is read). The contents are the highest year. This is not a very elegant way to pass this information. It’s needed because in step2 a Fortran binary file is created with fixed record lengths, and the length of the record is related to the highest year that has data, so that highest year needs to be known before the binary file is created.
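
As a rough illustration of what step0 has to do here, the sketch below scans v2.mean-style lines for the highest year and writes it out. It assumes a layout of a 12-character record identifier followed by a 4-digit year; the function names and the sample lines are made up for illustration and are not taken from ccc-gistemp.

```python
import io


def highest_year(lines):
    """Return the largest year found in v2.mean-style lines.

    Assumes columns 0-11 hold the record identifier and columns
    12-15 hold the year (an assumption about the v2.mean layout).
    """
    return max(int(line[12:16]) for line in lines if line.strip())


def write_last_year(lines, out):
    """Write the highest year to `out` and return it."""
    year = highest_year(lines)
    out.write("%d\n" % year)
    return year


# Made-up sample records: identifier, year, then 12 monthly values.
sample = [
    "1016035500002008" + "  123" * 12,
    "1016035500002009" + "  118" * 12,
    "1016035500001998" + "  101" * 12,
]
out = io.StringIO()
last_year = write_last_year(sample, out)
```

Four digits plus a newline would account for the 5 bytes shown in the listing, though the real file may be written differently.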


v2.mean_comb

This large file is the output from step0. It contains all the temperature data that GISTEMP will go on to use combined into one file. The temperature data are combined from: GHCN, USHCN, SCAR, and one input file for Hohenpeissenberg. The combining process is not entirely trivial: data from USHCN do not simply replace data from GHCN, they are adjusted by the mean monthly difference (see the function include_US in


Ts.txt

This is the output of step1. Duplicate records (multiple series for the same weather station) are combined, if possible; an adjustment is made for the St Helena record (listed in the file config/; records and partial records listed in the file config/Ts.strange.RSU.list.IN are removed; an adjustment is made for a discontinuity in the Lihue record (listed in the file config/Ts.discont.RS.alter.IN).

The output format of this file is different from the v2.mean format used in the previous step. Metadata from the file input/v2.inv is included in this file.


Ts.bin

At the beginning of step2 the Ts.txt file from step1 is converted to this Fortran binary file. The binary file is easier to access using a Fortran program. At one time in the past the binary file would have been significantly quicker as well, but I doubt that matters these days. Since ccc-gistemp is now entirely in Python it’s likely that we’ll remove the binary file, preserving it only as an option to match the GISTEMP intermediate files.

In the GISS code, this file is then split into 6 files so that the gridding, in step3, can proceed by using only a subset of the station data held in memory at once. Keeping all the data in one file greatly reduced the number of intermediate files created in ccc-gistemp.

The next few files are all internal to step2, the Urban adjustment.


Ts.GHCN.CL

The same data as Ts.bin but trimmed to make the file size slightly smaller. Even more pointless in this day and age.


ANN.dTs.GHCN.CL

Annual anomalies for each station, computed in step2. Each of the 12 months has an average computed, and the anomaly for a year is computed from the difference between each month in that year and that month’s average.

There is far less data in this file than the monthly series in Ts.txt, and it is this data that is used to make the urban adjustment.
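
The annual anomaly calculation described above can be sketched roughly as follows. This is a simplified illustration, not the step2 code: it averages whatever monthly differences are available for a year, and the real code may well treat missing months and seasons more carefully. The function names and the sample station are invented.

```python
def monthly_means(series):
    """Average each calendar month over all years.

    `series` maps year -> list of 12 monthly values (None = missing).
    """
    means = []
    for m in range(12):
        vals = [months[m] for months in series.values()
                if months[m] is not None]
        means.append(sum(vals) / len(vals) if vals else None)
    return means


def annual_anomalies(series):
    """Map year -> mean of (month value - that month's average)."""
    means = monthly_means(series)
    result = {}
    for year, months in sorted(series.items()):
        diffs = [v - means[m] for m, v in enumerate(months)
                 if v is not None and means[m] is not None]
        result[year] = sum(diffs) / len(diffs) if diffs else None
    return result


# Invented station data: three years of constant temperatures.
station = {
    2007: [1.0] * 12,
    2008: [2.0] * 12,
    2009: [3.0] * 12,
}
anoms = annual_anomalies(station)
```

Twelve monthly values per year collapse to one number per year, which is why this file is so much smaller than the monthly series.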


PApars.pre-flags

By analysing urban and rural stations, step2 creates this table of parameters that control what adjustments are going to be made (to urban stations).


PApars.list

The parameters in PApars.pre-flags are annotated with a flag that affects the exact adjustment made.


Ts.GHCN.CL.PA

The data from Ts.GHCN.CL are read in and urban stations are adjusted according to the parameters previously computed and stored in PApars.list. This file contains the adjusted data and is the output of step2.


BX.Ts.GHCN.CL.PA.1200

An empty file created by step3. The GISS version of GISTEMP creates a gridded dataset (SBBX1880.Ts.GHCN.CL.PA.1200, see below) with 8000 cells (subboxes) and from this also creates a gridded dataset with 80 cells (boxes), which is what this file would be. Currently ccc-gistemp does not create the 80 cell version.


SBBX1880.Ts.GHCN.CL.PA.1200

This is the gridded output of step3. It’s created by considering each grid cell in turn, and combining (usually several) station records into one record for each cell. From this point on only gridded data is used.
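
To give a feel for what combining several station records into one cell record might look like, here is a deliberately simplified sketch. It assumes that the “1200” in the file name is a radius in kilometres and that a station’s weight falls off linearly with distance from the cell centre; both are assumptions made for this example. The real step3 combination is more involved than a plain weighted average, and all names here are hypothetical.

```python
import math

RADIUS_KM = 1200.0  # assumed meaning of "1200" in the file names
EARTH_KM = 6371.0   # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance using the haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_KM * math.asin(math.sqrt(a))


def cell_series(cell_lat, cell_lon, stations):
    """Distance-weighted average of station series for one cell.

    `stations` is a list of (lat, lon, series); every series has the
    same length and uses None for missing values.
    """
    if not stations:
        return []
    n = len(stations[0][2])
    totals = [0.0] * n
    weights = [0.0] * n
    for lat, lon, series in stations:
        w = 1.0 - distance_km(cell_lat, cell_lon, lat, lon) / RADIUS_KM
        if w <= 0:
            continue  # station is outside the assumed radius
        for i, v in enumerate(series):
            if v is not None:
                totals[i] += w * v
                weights[i] += w
    return [t / w if w > 0 else None for t, w in zip(totals, weights)]


# Two invented stations equally far from the cell centre, one far away.
stations = [
    (0.0, 5.0, [0.0, None]),
    (0.0, -5.0, [2.0, 4.0]),
    (60.0, 100.0, [99.0, 99.0]),
]
cell = cell_series(0.0, 0.0, stations)
```

In this example the distant station contributes nothing, and the second value comes from the only nearby station that has data for it.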


SBBX.HadR2

This file contains sea surface data from Hadley and Reynolds version 2 (the “Had” and “R2” in its name). It’s the result of step4, which takes the input file of the same name (in the input/ directory) and adds in any updates that have been downloaded. Usually we run without any updates, and in this case step4 simply copies this file from the input/ directory.


ZON.Ts.ho2.GHCN.CL.PA.1200.step1

This is a temporary file used by step5. Step5 takes the two subbox files, SBBX1880.Ts.GHCN.CL.PA.1200 containing land data and SBBX.HadR2 containing ocean data, and merges the land and ocean data together to create a gridded file with 80 boxes. The gridded file appears in the result/ directory: BX.Ts.ho2.GHCN.CL.PA.1200.

The data in the gridded file are combined to produce a data series for each of 8 latitudinal zones, then from those 8 zones another 6 are produced for large scale regions, including the 3 for Northern Hemisphere, Southern Hemisphere, and Global average. That zonal data is stored in this file.


ZON.Ts.ho2.GHCN.CL.PA.1200

The zonal data are read and an alternative computation is done where the global average is the simple average of the northern and southern hemispheres, as opposed to the previous calculation, which uses an area weighted average.
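
The difference between the two global averages can be illustrated with a small sketch. The zone boundaries below are assumed for the example (roughly 64, 44 and 24 degrees in each hemisphere), and the function names are invented; this shows one way the two calculations can diverge, namely when a zone has no data, and is not the step5 code.

```python
import math

# Assumed zone boundaries in degrees latitude, north to south.
BOUNDS = [90, 64, 44, 24, 0, -24, -44, -64, -90]


def zone_areas():
    """Fraction of the sphere's surface in each of the 8 zones."""
    s = [math.sin(math.radians(b)) for b in BOUNDS]
    return [(s[i] - s[i + 1]) / 2 for i in range(8)]


def weighted_mean(values, weights):
    """Weighted mean that skips missing (None) values."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total if total else None


def global_two_ways(zones):
    """Return (area weighted global, simple mean of hemispheres).

    Assumes each hemisphere has at least one zone with data.
    """
    areas = zone_areas()
    area_weighted = weighted_mean(zones, areas)
    north = weighted_mean(zones[:4], areas[:4])
    south = weighted_mean(zones[4:], areas[4:])
    return area_weighted, (north + south) / 2


# Invented anomalies; the southernmost zone has no data.
zones = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, None]
area_weighted, simple = global_two_ways(zones)
```

With every zone present the two numbers agree, because the hemispheres have equal area. With the southernmost zone missing, the area weighted figure leans towards the north while the simple average still gives each hemisphere half the weight.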


ANNZON.Ts.ho2.GHCN.CL.PA.1200

Annual anomalies are computed from the monthly data series for each zone.

Whilst the data for this and the previous file are being computed, the text files that hold summaries of this data are written to the result/ directory.

GISTEMP 2009 anomaly

GISS haven’t published a 2009 anomaly yet (as of writing, 2010-01-11T14:30Z), but new GHCN records were made available on 2010-01-07. I’ve just made a fresh run of our ccc-gistemp code with all fresh inputs to produce this graph:
[Graph: global historical temperature anomaly]

Because I’m using fully up to date inputs, this run of ccc-gistemp produces an anomaly for 2009. That red tick at the end is the extra year, 2009, that we produce.

I predict that when GISTEMP publish their 2009 anomaly, it will be +0.58 K.

[minor edits: screwed up year in opening paragraph, and colour of labels in graph]