Gavin’s bug

(I call it Gavin’s bug because he found it, not because he created it).

Gavin’s bug concerns Step 3, and in particular the part where, for each cell, a list of stations within 1200 km is created. Creating this list of “nearby” stations means potentially consulting every station, and there can be a great many of them. So for each of the 80 boxes (see overview.txt), the code restricts the search list to an extended box. Here’s a typical box (in white) and the extended box (in green):

In principle it’s faster to first cull the entire station list by rejecting those stations outside of an enclosing rectangle based on latitude and longitude, because we can do that quickly without needing to do the trigonometry for the proper “within 1200 km” test.
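To make the idea concrete, here is a minimal sketch of the cull-then-test pattern. It is not the ccc-gistemp code: the names `great_circle_km`, `in_rectangle` and `nearby` are hypothetical, the Earth radius is an assumed value, stations are plain (lat, lon) tuples, and longitude wrap-around at ±180 is ignored for brevity.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius; the real code may use another value


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in degrees (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_rectangle(station, extent):
    """Cheap test with no trigonometry; extent is (south, north, west, east) in degrees."""
    lat, lon = station
    south, north, west, east = extent
    return south <= lat <= north and west <= lon <= east


def nearby(stations, centre, extent, radius_km=1200.0):
    """Cull by rectangle first, then apply the exact distance test to the survivors."""
    candidates = [s for s in stations if in_rectangle(s, extent)]
    return [s for s in candidates
            if great_circle_km(centre[0], centre[1], s[0], s[1]) <= radius_km]


# Hypothetical stations as (lat, lon); only the first lies inside the rectangle.
stations = [(50.0, 5.0), (50.0, 40.0), (10.0, 0.0)]
result = nearby(stations, centre=(50.0, 0.0), extent=(39.0, 61.0, -20.0, 20.0))
```

The cull is only safe if the rectangle contains every point that could pass the exact test. If the rectangle is too small, a station can be rejected by the cheap test even though it is within 1200 km.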

The bug is that the rectangle chosen for the culling is not large enough. Here’s the code from revision 665 of

# Extend box, by half a box east and west and by arc north
# and south.
extent = [box[0] - arcdeg,
          box[1] + arcdeg,
          box[2] - 0.5 * (box[3] - box[2]),
          box[3] + 0.5 * (box[3] - box[2])]
if box[0] <= -90 or box[1] >= 90:
    # polar
    extent[2] = -180.0
    extent[3] = +180.0

Note that the comment says that the box is extended by half a width east and west. The polar boxes (there are 4 for the North Pole, and they all meet at the pole) have to be special-cased because otherwise stations close to the pole would be missed.

The bug concerns the boxes that span latitudes 44 to 64 (there are 8 of these in each hemisphere, see overview.txt). The extended box isn’t quite wide enough:

But wait… that circle (above) is centred on the northwest corner of the box. When it comes to cells, it’s the centre of the cell that is used to select the stations within 1200 km, and the northwest cell is set inside the box a little bit. Is it enough?

The white circle is 1200 km around the centre of the northwest cell. It just fits. Phew!

So even though there are points within 1200 km of the box that are not inside the extended box, no stations are missed, because the cells inside the box are too large to be close enough to the edge.
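The geometry can be checked numerically. The sketch below rests on several assumptions that are not in the code quoted above: an Earth radius of 6371 km, a box width of 45 degrees (360 divided by the 8 boxes in the band), and cells laid out 10 per side with equal area, with each cell's centre taken as the midpoint of its latitude and longitude bounds. The function name `max_lon_reach_deg` is hypothetical. It uses the standard result that a circle of angular radius r centred at latitude φ reaches a longitude difference of asin(sin r / cos φ), provided the circle does not cover the pole.

```python
import math

EARTH_RADIUS_KM = 6371.0    # assumption: mean Earth radius
RADIUS_KM = 1200.0
BOX_WIDTH_DEG = 360.0 / 8   # 8 boxes around the 44-64 band, so 45 degrees each
CELLS_PER_SIDE = 10         # assumption: 10 x 10 equal-area cells per box


def max_lon_reach_deg(lat_deg, radius_km=RADIUS_KM):
    """Largest longitude difference reached by a circle of radius_km centred at lat_deg."""
    r = radius_km / EARTH_RADIUS_KM                 # angular radius in radians
    colat = math.radians(90.0 - abs(lat_deg))       # angular distance to the nearer pole
    if r >= colat:
        return 180.0                                # circle covers the pole: all longitudes
    return math.degrees(math.asin(math.sin(r) / math.cos(math.radians(lat_deg))))


# East/west extension applied by the culling rectangle: half a box.
half_box = 0.5 * BOX_WIDTH_DEG

# Circle centred on the northwest corner of the box (latitude 64).
corner_reach = max_lon_reach_deg(64.0)

# Northernmost row of cells: equal area means equal steps in sin(latitude).
s_north = math.sin(math.radians(64.0))
s_south = math.sin(math.radians(44.0))
cell_south = math.degrees(math.asin(s_north - (s_north - s_south) / CELLS_PER_SIDE))
cell_lat = 0.5 * (64.0 + cell_south)                # assumed centre latitude of that row
cell_inset = 0.5 * BOX_WIDTH_DEG / CELLS_PER_SIDE   # centre sits this far inside the box edge
cell_reach = max_lon_reach_deg(cell_lat)

print("corner circle escapes the rectangle:", corner_reach > half_box)
print("cell-centre circle fits:", cell_reach < half_box + cell_inset)
```

Under those assumptions the corner circle reaches a little over 25 degrees of longitude, more than the 22.5 degree extension, while the circle around the northwest cell centre reaches about 24 degrees against an allowance of 24.75 degrees. The first branch of `max_lon_reach_deg` also shows why the polar boxes need the full -180 to +180 range.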

Perhaps at one time computers were slow enough for this “optimisation” to make a difference, but it’s been irrelevant for probably at least 20 years. Whilst the code seems to be correct, it’s not clearly correct.

It would be simpler and clearer without it. So it’s gone.

It does change the results by a tiny, tiny bit. I may write up the (essentially bogus) reasons for that in another post.

7 Responses to “Gavin’s bug”

  1. Nick.Barnes Says:

    Could the inbox() function go as well?

  2. Gareth Rees Says:

    It seems kind of wrong to mark issue 95 as WontFix, though, since there was in fact a defect (the code wasn’t clear enough) and you did fix it.

  3. drj Says:

    @Gareth: I agree. Really what I wanted to say was “NotABug”. I’ve created Issue 102 for the code clarity bug.

  4. drj Says:

    @Nick: yes! now it’s gone too.

  5. Robert Way Says:

    Hey all at CCC

    See the following, you might find it interesting!

  6. rr Says:

    A recent posting at Andy Revkin’s Dot Earth blog was a conversation with Gavin Schmidt. In the comments, a denier had a question for Gavin and Gavin replied (through Revkin) here:

    Note Gavin’s first bullet in his reply:

    “- the algorithms, code and data for GISTEMP are all online and publicly accessible, and have been independently replicated by the project.”

    Would Nick Barnes or other members of who worked on the GISTEMP project care to comment on Gavin’s use of the term “independently replicated”? Seems like quite a stretch to me.

    Gavin is clear on what this project is about isn’t he?

  7. Nick.Barnes Says:

    Certainly Gavin is aware of and understands our project, although I don’t suppose it impinges on his radar very much. I wouldn’t use the term “independently replicated”, but I don’t think it’s a major stretch. Although in the early history of ccc-gistemp we were very particular to match exactly the precise details of every step of GISTEMP, so that we could understand and reproduce them, subsequent development has allowed many independent variations and experiments, and the fundamental results have been very robust under such variation.
