ccc-gistemp release 0.5.1

I am pleased to announce ccc-gistemp 0.5.1. (The astute reader will note that there was no announcement for release 0.5.0. It is available, but it does not work in Python 2.5.1, so I fixed that for release 0.5.1.)

Compared to the previous release, the changes are not so grand. This release incorporates many incremental improvements to clarity. It also has a couple of bug fixes: one to cope with a change in the layout of the GISTEMP source tarfile that we use (see this comment here for example), and one to make the code run once again on Python 2.4 (a thoroughly ancient version; please try to use Python 2.6).

I have spent a large amount of time trying to clarify Step 2, the peri-urban adjustment described in Hansen et al. 1999. I encourage you to try out this release, read the code, and help us improve it.

David Jones, Nick Barnes, and Ronan Lamy have contributed to this release.

Opening up the IPCC

Updated: I have now (2010-06-29) submitted this comment to the IAC. Thank you, all signatories.

We have a rare opportunity to affect the conduct and perception of climate science. If you believe this is important, please read on, and comment.

The Intergovernmental Panel on Climate Change (IPCC) produces reports which review and summarize the science of climate change. These reports are then used by inter-governmental treaties, bodies, conferences, and national governments, as the basis for international and national policies on climate change. In other words, the IPCC’s work is vitally important. The Clear Climate Code project has the goal of “increasing public confidence in climate science results”, and the perception of IPCC reports directly affects this goal.

There has been a lot of controversy about the accuracy and balance of IPCC reports. In response, in March the UN asked the InterAcademy Council (representing the national science academies of many different countries) to conduct a review of the IPCC processes and procedures. A committee has been established and the review is underway. The committee is now soliciting public comment. This is a rare opportunity to influence the way in which the science of climate change is conducted, reviewed, synthesized, and communicated.

I have written the following comment, and am hereby soliciting signatures. If you agree with this comment and would like to be added as a signatory, please either contact me directly, or post a comment to this blog post, giving your name and affiliation, as you wish it to appear in the list of signatories. Please also spread the word about this blog post, and encourage your friends, colleagues, and contacts to sign it.

[edited to add: as people send me their endorsements, I will update the list of signatories here in the post. I cannot make other changes, since this is now receiving signatures.]

Comment to the InterAcademy Council Review of the Intergovernmental Panel on Climate Change.

1. Summary

The IPCC procedures should be amended to increase the transparency of
the science and of the IPCC process itself. The proposed amendments
are small, but would have a large effect on confidence in IPCC
reports.

“Sunlight is said to be the best of disinfectants” – Louis D. Brandeis, 1913.

2. The Problem

IPCC reports contribute to global public policy debates and processes,
which may have major effects on the daily lives of every person in the
world. Every government and large enterprise has already been
affected. As the century continues, the effects of policies based on
IPCC work will increase in their scope and impact: they will create
whole new industrial sectors, thousands of businesses, and many ways
of life.

For this reason, the IPCC reports and the processes which create them
have been under increasing scrutiny. Questions are asked and doubts
are raised, both about the IPCC process and about the underlying
scientific research. Both the research, and the processes of review
and synthesis, have been criticised for opacity. Very serious
accusations have been made: of a lack of rigor, of group-think, of
conflicts of interest, of deception, and even of conspiracy and fraud.

This has led to doubts about the validity of IPCC conclusions, and to
serious difficulty in making national and international policy
regarding climate change.

All this is well-known and need not be rehearsed further here.
Indeed, the recognition of these problems has led directly to the
United Nations request for a review, and the establishment of this IAC
review committee.

3. The Solution

A key part of any solution to these problems is to increase the
transparency of the research underlying IPCC reports, and of the IPCC
process itself. While the research and the process remain closed and
opaque to commentators and to the public, doubts will flourish and
will impede progress.

3.1. Bibliography

The IPCC AR4 WG1 report included references to around 5000 items of
peer-reviewed research. Thousands more were referred to by the WG2 and
WG3 reports. To assess or fully understand any part of an IPCC
report, an interested reader will want to follow the bibliographic
references and read the underlying research. For this reason the
bibliographic function of an IPCC report is very important. However,
the IPCC AR4 bibliography does not perform it well.

Each chapter of each report of AR4 has its own separate bibliography.
These bibliographies are not linked together, within a report or
between reports. The formats of these bibliographies vary. There
is no way to see whether any given paper is referred to in more than one
working-group report, in more than one chapter, or at all. In the
online published text of each chapter of AR4 each citation does not
link to the matching reference in that chapter’s bibliography. In
turn, in each chapter’s bibliography, each reference does not link to
any online materials relating to that piece of research.

AR5 should have a single unified bibliography, containing all
references in all working group reports. Each citation in the body of
a report should link to the matching entry in the bibliography. If a
reference is to material which is published online, the bibliography
should link to that publication. The bibliography should also
reproduce whatever part of the publication and supporting materials is
available for reproduction (possibly just the abstract, but see
below). To protect these references against future change or loss,
wherever possible the IPCC should also archive copies of any online
publication on its own server (for instance, at the IPCC Data
Distribution Centre http://www.ipcc-data.org/).

There are many free tools available for managing online bibliographic
databases and repositories such as this. Such tools allow
collaborative enterprises such as the IPCC to readily create,
populate, update, search, and publish bibliographic data. The IPCC
should adopt such a tool, and mandate its use by lead authors and
contributing lead authors.

3.2. Underlying Research

Each piece of research lies somewhere on a spectrum of transparency
and openness. Some publications are open-access: freely available
for anyone to read and assess. For instance, some are published in
open-access journals. Many are not open-access, but describe results
such as datasets which are publicly available. Still more may have
some additional materials, such as computer source code used to
produce or analyse the datasets, freely available for download.
Finally, a great deal of research is entirely closed: only the
abstract is available, and neither the scientific paper, nor the data
described in the paper, nor the computer source code (or other
processing details), is generally open.

In recent years, and especially since AR4, it has become clear that
public confidence in research is directly connected to this spectrum
of transparency. The more open the research, the less vulnerable it
is to criticism, and especially to the more serious accusations of
fabrication and fraud. As argued above, this criticism seriously
damages the reputation of the IPCC and impedes progress in the use of
the IPCC reports.

For this reason, all contributors to AR5 should be encouraged to open
their work as much as possible: to make their contributed papers
available online, to publish their datasets and supporting materials
such as computer source code, design documents, and additional text,
images, and charts. This can be very simply done by the IPCC
routinely gathering and publishing information about the transparency
of each piece of underlying research. This information can easily be
stored in the IPCC bibliographic database.

As noted above, whenever possible a publication, and/or supporting
material, should be copied to an IPCC repository, to protect against
change or loss. As publications in climate science become more open,
such reproduction should be increasingly possible.

3.3. The IPCC Process

Much of the IPCC process itself is already open. Draft reports,
review comments, and responses are all published. However, the IPCC
reports themselves are not open. It is not possible to freely
reproduce and disseminate them. The IPCC should immediately change
this, and adopt an open licensing policy. All IPCC reports, past and
future, should be freely available under a license which conforms to
the Open Knowledge Definition http://www.opendefinition.org/, for
example the Creative Commons Attribution Share-Alike license CC-BY-SA http://www.opendefinition.org/licenses/cc-by-sa/.

The existing transparency should also be increased. There have been
prominent recent calls for the review and synthesis process to take
place in public, for instance by adopting a wiki-style drafting
mechanism. Such a move would protect the IPCC against certain
accusations of group-think (or even conspiracy). However, such a move
is somewhat outside the scope of the detailed recommendations below.

4. Recommendations

This is a series of concrete recommendations for amendments to the
document “Principles Governing IPCC Work, Appendix A – Procedures for
the preparation, review, acceptance, adoption, approval and
publication of IPCC Reports”, with the effect of implementing the
solutions described above.

In section 4.1, “Introduction to Review Process”, this paragraph should
be added:
    The IPCC Secretariat should identify, implement, and provide a
    bibliographic system and repository for the use of Coordinating
    Lead Authors, Lead Authors, and Review Editors.
    The content of this bibliographic system and repository shall be
    shared between all the Working Groups and the Task Force on
    National Greenhouse Gas Inventories, and shall be publicly
    available on or before completion of the Report for a period of at
    least five years.

In section 4.2.3, “Preparation of Draft Report”, this sentence should
be added to the first paragraph:
    Contributions should include, wherever possible, access
    instructions for any original data, supplementary materials,
    computer source code used for analysis or processing, and an
    indication of the public availability and licensing of such
    materials.

In Annex 1, under “Lead Authors”, this paragraph should be added:
    Lead Authors shall record all contributed material in the IPCC
    bibliographic system. Where any access to original data,
    supplementary materials, or computer source code is provided, Lead
    Authors shall record such access in the IPCC bibliographic system
    and, wherever possible, copy such material to the IPCC repository.

In section 4.2, “Reports Accepted by Working Groups and Reports
prepared by the Task Force on National Greenhouse Gas Inventories”,
this paragraph should be added:
    Reports accepted by Working Groups, or prepared by the Task Force
    on National Greenhouse Gas Inventories, shall be made publicly
    available under the Creative Commons Attribution Share-Alike
    license CC-BY-SA.

In section 4.4, “Reports Approved and/or Adopted by the Panel”, this
paragraph should be added:
    The Synthesis Report shall be made publicly available under the
    Creative Commons Attribution Share-Alike license CC-BY-SA.

Furthermore, the IPCC should make its existing reports publicly available
under the same CC-BY-SA license.

5. Conclusion

The IPCC reports have been questioned and attacked on many fronts, and
this has been a source of great difficulty in making national and
international policy regarding climate change. A principal ground for
complaint has been the transparency of the underlying science and of
the IPCC process of review and synthesis. Progress can be enabled by
addressing these complaints: by making the science and the process far
more open.

The IPCC doesn’t have a direct influence on the working practices of
the thousands of researchers who contribute work to its reports.
However, it can shine a bright light on those practices by the simple
and cheap step of requesting and recording certain information in its
bibliography, and by making that bibliography readily available to the
public.

Finally, by making its own processes more open, and by making its own
reports more freely available, the IPCC can both avoid any further
criticism on these grounds and set a leading example for the research
community from which it is drawn.

---

Signatories

  • Nicholas Barnes, Founder, Clear Climate Code project
  • David Jones, Founder, Clear Climate Code project
  • Richard Drake, Founder, Open Climate Initiative
  • Rufus Pollock, Founder, Open Knowledge Foundation
  • Jonathan Gray, Community Coordinator, Open Knowledge Foundation
  • Joshua Halpern, Professor of Chemistry, Howard University
  • Tim Lambert, School of Computer Science and Engineering, University of New South Wales
  • Peter Murray-Rust, University of Cambridge and Open Knowledge Foundation
  • Andrew Montford, Author, The Hockey Stick Illusion
  • Subbiah Arunachalam, Distinguished Fellow, Centre for Internet and Society, Bangalore, India
  • Dave Berry, ex Deputy Director of the UK National e-Science Centre
  • Peter Suber, Berkman Fellow, Harvard University
  • Lucia Liljegren of the Blackboard
  • Carrick Talmadge, Senior Scientist, University of Mississippi
  • Ivo Grigorov (Centre National de la Recherche Scientifique/DTU-Aqua)
  • William Eichinger, William Ashton Professor of Engineering, University of Iowa
  • Nick Levine
  • Philippa Davey
  • Leif Burrough
  • David L. Hagen
  • Scott McKay
  • Ronald Broberg
  • Ted Lemon
  • Martin Brumby
  • Gerry Morrow
  • David Bishop
  • Conrad Taylor
  • John Shade
  • Allen McMahon
  • Robert Thomson
  • Eamon Watters
  • Bruce Cunningham
  • Greg Freemyer
  • Chad Herman
  • Barry Woods
  • Jack Mosevich
  • Stephen L. Jones
  • Zeke Hausfather
  • Daniel Godet
  • Laurence Childs
  • Peter O’Neil
  • Phillip Bratby
  • Colin Brooks
  • Andrew Smith
  • Peter Walsh
  • Louis Hooffstetter
  • Steve Fitzpatrick
  • Stephen Gaalema
  • Charles Minning
  • Brian Crounse

Airport Warming

More or less on a whim I split the GHCN data into two sets: those stations marked as being at an airport, and those stations not marked as being at an airport. This is easy to do because the v2.inv file puts an ‘A’ in column 81 (counting from 0) for airport stations.
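
The split can be sketched in a few lines of Python. This is a minimal illustration rather than the script used to make the chart: the function name split_by_airport and the two sample lines are invented, and it assumes that each line of v2.inv begins with an 11-character station identifier. The only detail taken from the description above is the ‘A’ in column 81.

```python
AIRPORT_COLUMN = 81  # counting from 0, as described above


def split_by_airport(inv_lines):
    """Return (airport_ids, other_ids) from the lines of a v2.inv-style file.

    Assumption: each line starts with an 11-character station identifier.
    """
    airport, other = set(), set()
    for line in inv_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        station_id = line[:11]
        # Slicing (rather than indexing) yields '' for short lines, so a
        # line with no character in column 81 counts as non-airport.
        flag = line[AIRPORT_COLUMN:AIRPORT_COLUMN + 1]
        if flag == "A":
            airport.add(station_id)
        else:
            other.add(station_id)
    return airport, other


# Synthetic lines (not real inventory records), padded so that the flag
# lands in column 81.
sample = [
    "00000000001 EXAMPLE AIRFIELD".ljust(AIRPORT_COLUMN) + "A",
    "00000000002 EXAMPLE VILLAGE".ljust(AIRPORT_COLUMN) + " ",
]
airport_ids, other_ids = split_by_airport(sample)
```

The two sets of identifiers can then be used to filter the temperature records into two input files, one per run.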

Here’s the airport versus non-airport comparison for ccc-gistemp:

Certainly for the most recent 50 years it doesn’t seem to matter much whether you use exclusively airport-based measurements or exclude airport-based measurements (considering the global anomaly).

My earlier post about the 1990s station “dropout” used a similar technique of splitting the input data into two sets.

OKCon CCC Presentation

Saturday past was OKCon 2010 and we were in London to give a presentation about Clear Climate Code (well, Nick, Paul, and I were). Specifically, I was there to monkey the slides, and Nick was there to stand up and talk.

A PDF of the slides (3.5e6 octets) is available from our googlecode download page; you can also find a zip of PNGs there if you need it.

It was an interesting conference; thanks to the Open Knowledge Foundation for organising, and everyone else for attending.

Trendy!

tool/vischeck.py has recently been updated so that it computes and draws trends (the work was done by me and Nick Barnes). Here are some recent comparisons redrawn with trends:

The “before 1992 / after 1992 stations” from “The 1990s station dropout does not have a warming effect”:


The short trends are done with the last 30 years of data for each series (which, since one series ends in 1991, is a different period for each). Notice how similar the recent trends are.

Reprising the Urban Adjustment post:

I don’t think I’ve done a combined land and ocean chart comparing hemispheres for the blog before, but here it is now:

Nick Barnes added the calculation of R² whilst I was writing this post, causing me to redraw all the charts.
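
For readers who want to reproduce a short trend by hand, here is a minimal sketch of an ordinary least-squares fit over the last 30 years of a series, together with the coefficient of determination. It is not the code in tool/vischeck.py: the function names linear_trend and recent_trend are invented, the series is assumed to be a mapping from year to annual anomaly, and the data are synthetic.

```python
def linear_trend(years, values):
    """Ordinary least-squares fit of values against years.

    Returns (slope, intercept, r_squared).
    """
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (year, value) pairs")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    syy = sum((y - mean_y) ** 2 for y in values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a straight-line fit, R squared equals the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared


def recent_trend(series, span=30):
    """Fit a trend to the final `span` calendar years of `series`.

    `series` maps year -> anomaly; years without data are simply absent.
    """
    last_year = max(series)
    years = [y for y in sorted(series) if y > last_year - span]
    return linear_trend(years, [series[y] for y in years])


# Synthetic series ending in 1991, rising by 0.02 per year.
series = {year: 0.02 * (year - 1950) for year in range(1900, 1992)}
slope, intercept, r_squared = recent_trend(series)
```

Because the window is counted back from each series' own final year, two series that end in different years are fitted over different periods, as noted above.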

Nick has also been exploiting ccc-gistemp’s new parameters.py module, and did a run with the somewhat experimental 250km smoothing rather than the traditional 1200km smoothing. The parameter is named gridding_radius and it affects gridding in Step 3; setting it to 250km essentially reduces each station’s influence to very roughly the size of the cell used in gridding.
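
To get a feel for what the radius does, here is a small sketch. It assumes that a station's weight in a cell falls off linearly with distance from the cell centre, reaching zero at the gridding radius; this post does not state the weighting function, so treat that as an assumption. The function name station_weight is invented and is not taken from ccc-gistemp.

```python
def station_weight(distance_km, gridding_radius_km=1200.0):
    """Weight of a station at distance_km from a cell centre.

    Assumed form: 1 at the centre, falling linearly to 0 at the radius,
    and 0 beyond it.
    """
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return max(0.0, 1.0 - distance_km / gridding_radius_km)


# Compare the influence of the same stations under the two radii.
distances = (0.0, 125.0, 600.0)
weights_1200 = [station_weight(d, 1200.0) for d in distances]
weights_250 = [station_weight(d, 250.0) for d in distances]
```

Under this assumption a station 600 km from a cell centre still contributes with the 1200km radius but contributes nothing with the 250km radius.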

The effect on the trends is most visible in the Northern Hemisphere:

Trends are just one minor example of the way in which the ccc-gistemp code can be continuously improved. We don’t just draw trends for one graph, we improve the code so that all graphs can have trends.

ccc-gistemp release 0.4.0

[Updated: ccc-gistemp release 0.4.1 is now available]

I am pleased to announce ccc-gistemp release 0.4.0. This release is much clearer than previous releases. Give it a go.

  • Almost all of our code has now been rewritten to remove the Fortran style which remained from the original conversion from GISTEMP. Previous releases had greatly improved steps 0-2; this release continues the improvement work there and also carries those improvements through steps 3-5. Almost all of the code now has sensible variable and function names, clearer data handling, and helpful comments. Many unused variables and functions have been removed. The current core algorithm has 3740 lines of code, of which more than half are either comments, documentation strings, or blank.
  • Rounding has been completely eliminated from the system. Previously, rounding and truncation code was used to exactly emulate GISTEMP. Rounding made the code less clear, and Dr Reto Ruedy of NASA GISS confirmed that rounding was not important to the algorithm, so it has been removed. All temperature data is now handled internally as floating point degrees Celsius (previously it was a mixture of integer tenths, floating point tenths, and floating point degrees) and all location information is handled as floating point degrees latitude and longitude (previously it was a mixture of floating point degrees and integer hundredths).
  • In a normal run of ccc-gistemp, no data passes through intermediate files. Much of GISTEMP is concerned with generating and consuming intermediate files, to separate phases and to avoid keeping the whole dataset in memory at once (an important consideration when GISTEMP was originally written). We have now completely replaced this with an in-memory pipeline, which is clearer, automatically pipelines all the processing where possible, and avoids all code concerned with serialization and deserialization.
    We now have separate code to generate data files between the distinct steps of the GISTEMP algorithm, and to allow running a step from a data file instead of in a pipeline. This allows the running of single steps, and is useful for testing purposes.
  • Parameters, such as the 1200 km radius used when gridding, and the number, 3, of rural stations required to adjust an urban station, which were scattered throughout the code, are now all to be found, with explanatory comments, in code/parameters.py.
  • It’s now possible to omit Step 4 and produce a land-only index, which closely matches GISTEMP.
  • It’s also possible to omit Step 2, and run the algorithm without the urban heat-island adjustment.
  • GISTEMP recently switched to using nighttime brightness to determine urban/rural stations. We made the corresponding change, which is switchable.
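
The idea of an in-memory pipeline with optional steps can be illustrated with Python generators. This is a toy sketch, not the ccc-gistemp implementation: the step functions, the record layout (station, year, value), and the missing-value marker are all invented for the example.

```python
from functools import reduce

MISSING = -9999  # hypothetical missing-value marker for this example


def drop_missing(records):
    """Toy step: discard records whose value is missing."""
    for station, year, value in records:
        if value != MISSING:
            yield station, year, value


def tenths_to_degrees(records):
    """Toy step: convert integer tenths of a degree to floating point degrees."""
    for station, year, value in records:
        yield station, year, value / 10.0


def run_pipeline(source, steps):
    """Chain generator steps lazily, each consuming the previous one's output.

    Omitting a step simply means leaving it out of `steps`.
    """
    return reduce(lambda stream, step: step(stream), steps, source)


raw = [("A", 2000, 123), ("A", 2001, MISSING), ("B", 2000, -45)]
full_run = list(run_pipeline(raw, [drop_missing, tenths_to_degrees]))
partial_run = list(run_pipeline(raw, [drop_missing]))
```

Since each step only pulls records from the one before it, nothing has to be written to disk between steps, and a step can be left out without touching the others.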

Note that none of these changes altered any of our results by more than 0.01 degrees C, except for the change to urban station identification, for which the changes in our results (none greater than 0.03 degrees C) closely match the changes in the GISTEMP results.

The work for this release has been done by David Jones, Paul Ollis, and Nick Barnes.

[Updated: this release has been swiftly followed by ccc-gistemp release 0.4.1, to fix a bug reported in comments here.]

GISTEMP Land Index

GISS publish a land-only temperature anomaly (referred to as their “traditional analysis”).

As I pointed out in an earlier article, ccc-gistemp can now create a land index by omitting Step 4: python tool/run.py -s0-3,5.

Here’s how we compare with official GISTEMP:

GISTEMP Urban Adjustment

After some recent tweaks by me to the ccc-gistemp sources it is now possible to run a pipeline of the GISTEMP process with some of the steps omitted. An earlier post shows how I can omit Step 4 to create a land-only index. My recent changes allow Step 2 to be omitted. Step 2 is the urban adjustment step (in which stations marked as urban have their trend adjusted).

Omitting Step 2 will therefore give us an idea of the magnitude of the effect of the urban adjustment. It so happens that my writing this blog post overlaps with Nick Barnes implementing GISTEMP’s new scheme for identifying urban stations (corresponding to GISTEMP’s update of 2010-01-16). That gives me an opportunity to show both the new and old adjustment schemes against a “no adjustment” baseline:

In making this graph Step 4 has been omitted, giving us a land index. This is primarily to amplify the differences: land covers the lesser fraction of the Earth; so including the ocean data (which does not require an urban adjustment) makes the difference smaller.

And for each hemisphere:

Northern:

Southern:

To make a “no urban adjustment” run of ccc-gistemp: «python tool/run.py -s 0,1,3,5»; and to make an “urban adjustment” land-index: «python tool/run.py -s 0,1,2,3,5».
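
Step specifications of this kind, whether written as a list (0,1,3,5) or with a range (0-3,5), are easy to expand. The following is a hypothetical parser written for illustration; it is not the code in tool/run.py, and the name parse_steps is invented.

```python
def parse_steps(spec):
    """Expand a step specification such as '0-3,5' into a sorted list of ints."""
    steps = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = part.split("-", 1)
            steps.update(range(int(first), int(last) + 1))
        else:
            steps.add(int(part))
    return sorted(steps)


no_adjustment = parse_steps("0,1,3,5")    # Step 2 (and Step 4) omitted
with_adjustment = parse_steps("0-3,5")    # only Step 4 omitted
```

Comparing the two lists makes it plain which steps a given run leaves out.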

The 1990s station dropout does not have a warming effect

Tamino gives his results for his GHCN-based temperature reconstruction. It is well worth reading. He also gives a comparison between stations that are reporting after 1992, and those that “dropped out” before 1992. He concludes that there is no significant difference in the overall trend, which refutes the claim that the 1990s station dropout has a warming effect. His results are preliminary and for the Northern Hemisphere only.

Tamino’s analysis uses only the land stations; in order to write this blog post I tweaked ccc-gistemp so that we can produce a land index (python tool/run.py -s 1-3,5 now skips step 4, avoids merging in the ocean data, and effectively produces a global average based only on land data).

It is very easy to subset the input to ccc-gistemp and run it with smaller input datasets. So in this case I can split the input data into stations reporting since 1992, and those that have no records since 1992, and run ccc-gistemp separately on each input. I created tool/v2split.py to split the input data. Specifically I ran step 0 (which merges USHCN, Antarctic, and Hohenpeissenberg data into the GHCN data) to create work/v2.mean_comb, then split that file into those stations reporting in 1992 and after, and those not reporting after the cutoff. Then I ran steps 1, 2, 3, and 5 of ccc-gistemp to create a land index:

It is certainly not the case that the warming trend is stronger in the data from the post-cutoff stations. [edit 2010-03-22: In a subsequent post I add trend lines to this chart]
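
For anyone wanting to try a similar experiment, the split can be sketched as below. This is not tool/v2split.py: the function name split_by_cutoff is invented, the sample lines are synthetic, and the record layout is an assumption (an 11-character station identifier, one further character, then a 4-digit year in columns 12 to 15). A record for a year is treated as "reporting" regardless of whether its monthly values are missing, which a real tool might want to check.

```python
def split_by_cutoff(lines, cutoff_year=1992):
    """Split v2.mean-style lines into (reporting, dropped) lists.

    A station counts as reporting if any of its records is for
    cutoff_year or later; all of a station's records go to the same list.
    """
    lines = [line for line in lines if line.strip()]
    last_year = {}
    for line in lines:
        station, year = line[:11], int(line[12:16])
        last_year[station] = max(year, last_year.get(station, year))
    reporting, dropped = [], []
    for line in lines:
        if last_year[line[:11]] >= cutoff_year:
            reporting.append(line)
        else:
            dropped.append(line)
    return reporting, dropped


def make_line(station, year):
    """Build a synthetic record: id, one extra digit, year, 12 monthly values."""
    return f"{station}0{year}" + "  123" * 12


sample = [
    make_line("00000000001", 1990),
    make_line("00000000001", 1995),
    make_line("00000000002", 1985),
]
reporting, dropped = split_by_cutoff(sample)
```

Each output list can then be written to its own file and used as the input to a separate run.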

The differences between these results and Tamino’s are interesting. Both show good agreement for most of the 20th century. These data show more divergence than Tamino’s in the 1800s. Is that because we’re using Southern Hemisphere data as well, or is it because of the difference in station combining? Further investigation is merited.

We hope to make “experiments” of this sort easier to perform using ccc-gistemp and encourage anyone interested to download the code and play with it.

Update: Nick B obliges with a graph of the differences:

On integers, floating-point numbers, and rounding

Progress continues on the ccc-gistemp project. Anyone interested is welcome to go on over to the source code browse page and peruse it.

  • Paul Ollis has done excellent work separating all the I/O code from the main algorithm, and refactoring it so that data can flow through the entire program without passing through several intermediate data files.
  • David Jones has made a tool for indexing plain-text data files for random access, and has been working on SVG-based visualisation tools. Together, one day these will let us provide a snappy graphical interface for answering questions like “how did the peri-urban adjustment on this station work?”
  • I have been working on removing rounding from the whole system. Until now we have often found ourselves having to round values in order to maintain exact equivalence with GISS results (which may have been rounded for output to an intermediate data file which is read by a later phase). For example, rounding temperatures to the nearest tenth degree Celsius, or latitude and longitude values to the nearest tenth degree. I mentioned this in email with Dr Reto Ruedy of GISS, and he assured me that all such rounding is incidental to the algorithm – an accident of history. So we are removing it from our version, to help clarify the algorithm. We will end up with the only explicit rounding in the system being done in order to write the final result files.
  • Next I am hoping we will extract the main numerical parameters of the algorithm – for instance, the 1200km station radius for gridding, the 3 rural stations required for peri-urban adjustment – to a separate module, where they can be easily modified by anyone interested in experimenting with different values.
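
To illustrate the kind of rounding described in the list above, here is a small sketch of what happens when a value passes through an intermediate file that stores integer tenths of a degree. The helper names and the numbers are invented for the example; this is not code from GISTEMP or ccc-gistemp.

```python
def to_tenths(celsius):
    """Emulate writing a temperature to a file as integer tenths of a degree."""
    return int(round(celsius * 10))


def from_tenths(tenths):
    """Emulate a later phase reading the value back as degrees Celsius."""
    return tenths / 10.0


# Three synthetic values that a later phase adds together.
values = [0.14, 0.14, 0.14]

direct_sum = sum(values)
via_file_sum = sum(from_tenths(to_tenths(v)) for v in values)
difference = direct_sum - via_file_sum
```

Each value loses 0.04 on its way through the file, and those losses accumulate in the sum. Keeping floating point values in memory throughout, and rounding only when writing the final results, avoids this.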

We are aiming for a release 0.4.0 of ccc-gistemp to happen around the end of February or in early March, time permitting. The specification of this version is something like “no I/O, no rounding, and explicit parameters”, and we’re pretty close to that now.

Rounding in GISTEMP has prompted a lot of discussion in the blogosphere, and since I have been working in that area in ccc-gistemp, I thought I could write a few words here to clarify it. There is a lot of general misunderstanding of computer arithmetic, even among professional programmers. I have dealt with the nitty-gritty of it in various capacities in the past, and hopefully can convey some of my expertise.