OKCon CCC Presentation

Saturday past was OKCon 2010, and we were in London to give a presentation about Clear Climate Code (well, Nick, Paul, and I were). Specifically, I was there to monkey the slides, and Nick was there to stand up and talk.

A PDF of the slides (3.5e6 octets) is available from our googlecode download page; you can also find a zip of PNGs there if you need it.

It was an interesting conference; thanks to Open Knowledge Foundation for organising, and everyone else for attending.

10 Responses to “OKCon CCC Presentation”

  1. barry Says:

    Hi there – I have an off-topic query, something I’ve been wondering for a long time. I think Nick Barnes and Steve Mosher might be able to help me.

    What occasioned the demise of the analysis of surfacestations.org data at Climate Audit that kicked off late 2007?

    I was a lurker there and followed the discussion with great interest. This was the final of four threads on the matter:

    http://climateaudit.org/2007/10/14/ushcn-3/

    Why did it stop?

  2. naught101 Says:

    Nice work. The code size is especially interesting. I know that it’s not really important, but have you tried any benchmark comparisons?

    Also, have you considered something like Drupal’s Project manager (http://drupal.org/project/project), for on-site project management? You could then allow people to add and manage their own projects. Of course, then you need a drupal site and a CVS repository, but they aren’t too hard to set up.

  3. Nick.Barnes Says:

    @naught101: If by “benchmark” you mean speed comparisons, then we do some on an informal ad-hoc basis. On my development machine, the FORTRAN version took a couple of minutes to run, and the Python takes about 18 minutes.

    However, speed is not an important requirement for this project: if it was ten times slower than this, we’d probably put in some effort to speed it up, but as it stands the speed is broadly acceptable (although it can be irritating during development). The key requirement is clarity, and in pursuit of that we have made some specific changes which slow the code down (for instance, moving some very-frequently-executed common code out into a function in a separate module: it is called a billion times during execution, and every time we pay a small cost for module lookup). There are some changes we could make to speed the whole system up, some of which we have rejected, at least for the time being, because they would reduce clarity or make usage more complex. For instance, we could use numpy, a Python library for numeric and scientific programming.
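
    To give a feel for the kind of cost involved, here is an illustrative micro-benchmark (it is not code from ccc-gistemp, just a sketch) comparing a call through a module attribute lookup with a call through a local binding:

        import timeit

        # Bind math.sqrt to a local name once, versus looking it up through
        # the math module attribute on every call.
        setup = "import math; sqrt_local = math.sqrt"
        via_module = "for i in range(1000): math.sqrt(i)"
        via_local = "for i in range(1000): sqrt_local(i)"

        print("module lookup:", timeit.timeit(via_module, setup=setup, number=10000))
        print("local binding:", timeit.timeit(via_local, setup=setup, number=10000))

    The difference per call is tiny, but multiplied by a billion calls it adds up, which is why we accept it only where it clearly buys clarity.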

    One ccc-gistemp developer has tried using pypy, a JIT implementation for Python, with considerable success (he got a speedup factor of about 5). See his report on his experiments.

  4. Nick.Barnes Says:

    @barry: Gosh, that is very off-topic. I’m not very familiar with the details of the surface stations project. I did read Menne et al 2010, which seemed to be a pretty robust paper. Further discussion of this should probably be taken elsewhere, as it’s not really relevant to CCC.

  5. barry Says:

    Sorry, Nick. I’ve been trying to contact John Vliet with no luck and was directed here. Didn’t mean to bring gossip to a technical site.

  6. Marcus Says:

    Hi all, thanks for putting together great and useable code! I’ve tested out a bunch of different things (urban adjustments, station drop-out, <10 brightness only calculation, etc.). One odd result:

    http://tinyurl.com/27q32x8
    http://tinyurl.com/25bdz7o

    [charts from the two links above inlined here by drj]
    The red line in both charts is my attempt to run the GISS code with v2.mean.adj instead of v2.mean, leaving out step2. Basically, it is my attempt to see how the NOAA GHCN adjustments compare to the GISS adjustment routine, given the same ocean and spatial interpolation routines.

    The black line in the first chart is GISS without step2; the black line in the second chart is GISS run normally.

    So, my question is: why is there such a large deviation between GISS using v2.mean.adj and GISS using v2.mean in the last 15 years? My first thought was Arctic amplification… but that was the whole point of doing the comparison using Clear Climate Code, so that interpolation wouldn’t be an issue.

    Thoughts? (Also, I was a little surprised that it ran at all… I expected there to be some format difference between v2.mean.adj and v2.mean that would ruin things.)

    Oh, and my next thought was that it is possible that my v2.mean is older than my v2.mean.adj by a month or so… but that would be a pretty radical change for GHCN?

    -Marcus

    (and again, thanks for such a great tool!)

  7. Marcus Says:

    Checking with v2.mean and v2.mean.adj downloaded simultaneously:

    http://tinyurl.com/2v7eqww

    That’s not it. So, really, what it looks like to me is that GHCN has picked a couple years to really push the temperature record colder (especially 1995 and much of 2005-2009). Seems odd to me.

    Unless there is a formatting difference between v2.mean and v2.mean.adj that could make this a spurious result?
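
    If it helps, a rough check along these lines is what I have in mind for ruling that out. It is my own throwaway sketch, and it assumes the usual GHCN v2 layout (a 12-character record id, a 4-digit year, then twelve 5-character monthly values), so treat the column positions as an assumption worth double-checking:

        def keys_and_lengths(path):
            """Collect (record id, year) keys and the set of line lengths."""
            keys, lengths = set(), set()
            with open(path) as f:
                for line in f:
                    line = line.rstrip("\n")
                    lengths.add(len(line))
                    keys.add((line[:12], line[12:16]))
            return keys, lengths

        mean_keys, mean_lengths = keys_and_lengths("v2.mean")
        adj_keys, adj_lengths = keys_and_lengths("v2.mean.adj")

        print("line lengths in v2.mean    :", sorted(mean_lengths))
        print("line lengths in v2.mean.adj:", sorted(adj_lengths))
        print("records only in v2.mean    :", len(mean_keys - adj_keys))
        print("records only in v2.mean.adj:", len(adj_keys - mean_keys))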

    -Marcus

  8. Nick.Barnes Says:

    @Marcus: Thanks very much for performing this experiment, and thank you for your kind remarks about ccc-gistemp. I’m intrigued by your result, and have no way of accounting for it. There are various adjustments applied by GHCN, and I don’t have the details at my fingertips (I’m at a conference today). My next step would be to look at intermediate steps, probably starting with the step3 output (gridded land-only data). Unfortunately ccc-gistemp doesn’t yet have good visualisation tools for gridded data, so I’d have to knock something up.

    A quick-and-dirty approach would be to use tool/compare_results.py to generate a regression-test comparison between your two final results. That will at least give more statistical information, some of it on a gridded basis, about the difference between your two result sets.
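
    If you want a picture in the meantime, a quick plot of the difference between the two final series is straightforward. Here is a sketch; it assumes the two results have first been boiled down to plain “year anomaly” text files, which is a hypothetical intermediate format for illustration rather than anything ccc-gistemp writes for you:

        import matplotlib.pyplot as plt

        def read_series(path):
            """Read a hypothetical 'year anomaly' text file into a dict."""
            series = {}
            with open(path) as f:
                for line in f:
                    year, anomaly = line.split()
                    series[int(year)] = float(anomaly)
            return series

        a = read_series("result_v2mean.txt")      # hypothetical file names
        b = read_series("result_v2mean_adj.txt")
        years = sorted(set(a) & set(b))

        plt.plot(years, [b[y] - a[y] for y in years])
        plt.xlabel("year")
        plt.ylabel("anomaly difference (deg C)")
        plt.title("v2.mean.adj run minus v2.mean run")
        plt.show()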

  9. Marcus Says:

    Ah. I hadn’t used the compare_results code yet.

    A quick check suggests that Box 03 has a lot of issues after 1994, with residues in the 6-7 range (both positive and negative), and Box 10 shows up once. And all of the top-10 largest monthly residues are in the northern hemisphere after 1993. I’ll try and modify compare_results to give me more info on the largest monthly box residues so I can identify the other boxes that might be involved, but won’t get to it for a day or two…
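
    For the ranking I have something very simple in mind, along these lines. This is a standalone sketch rather than an actual patch to compare_results.py, and the residues list here is hypothetical:

        def largest_residues(residues, n=10):
            """Return the n (box, year, month, residue) entries with the
            largest absolute residue."""
            return sorted(residues, key=lambda r: abs(r[3]), reverse=True)[:n]

        # Hypothetical usage, once the per-box monthly residues are extracted:
        # residues = [(3, 1995, 1, 6.4), (10, 2007, 12, -6.1), ...]
        # for box, year, month, residue in largest_residues(residues):
        #     print(box, year, month, residue)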

  10. carrot eater Says:

    Marcus,
    That is a wonderful experiment – a great use of ccc.

    But I am puzzled by the result. So please do keep giving updates as you investigate.
