
## Byrd

Posted by drj | Filed under Uncategorized

In this comment Bob Koss notices something a little odd about some of the 1980 anomalies near Byrd station (-80.0-119.4). He points out that there is a patch of suspiciously zero anomalies north of Byrd, illustrated by this graphic (the dot is Byrd, roughly):

Here’s an extract from ccc-gistemp’s Step 3 output (`work/v2.step3.out`) showing that several cells have flat 0 anomalies for 1980:

-77.2-121.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -77.2-112.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -77.2-103.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -77.2-094.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -74.8-130.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -74.8-121.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -74.8-112.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -74.8-103.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -74.8-094.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -72.7-139.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -72.7-130.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -72.7-121.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -72.7-112.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -72.7-103.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -70.9-139.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -70.9-130.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -70.9-121.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -70.9-112.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -70.9-103.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0 -69.2-121.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0

(This file is a v2.mean-style file in which the 11-character station identifier is in fact the centre of the grid cell, rounded to the nearest 0.1 degree in latitude and longitude.)
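For anyone who wants to poke at these records themselves, here is a minimal sketch of pulling one apart. It assumes the layout visible in the extract above: a 12-character identifier (5 characters of latitude, 6 of longitude, then a `C`), a 4-digit year, and 12 monthly integers with -9999 meaning "missing". The function name `parse_cell_record` is my own invention for this example, not something from ccc-gistemp, and I have only checked it against lines shaped like the ones shown.

```python
import re

MISSING = -9999  # sentinel for "no data", as seen in the extract above


def parse_cell_record(line):
    """Split one v2.mean-style cell record into (lat, lon, year, values).

    Hypothetical helper for illustration only; not part of ccc-gistemp.
    """
    cell_id, year = line[:12], int(line[12:16])
    lat = float(cell_id[0:5])   # e.g. "-77.2"
    lon = float(cell_id[5:11])  # e.g. "-121.5"
    # Pick out signed integers from the rest of the line. This works whether
    # or not the fixed-width column padding has survived copy and paste.
    raw = [int(v) for v in re.findall(r"-?\d+", line[16:])]
    values = [None if v == MISSING else v for v in raw]
    return lat, lon, year, values


record = "-77.2-121.5C1980-9999-9999 0 0 0 0 0 0 0 0 0 0"
lat, lon, year, values = parse_cell_record(record)
print(lat, lon, year, values)
```

Using a regular expression rather than strict 5-character slices is a deliberate choice: negative values such as `-9999-9999` butt up against each other with no separator, and the regex copes with that either way.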

It turns out that all of those cells have a time series starting in 1980. And this leads to me having to explain a curiosity in the algorithm used to produce anomalies.

The anomalies produced by the gridding step, Step 3, have a baseline of 1951 to 1980 (see `gridding_reference_period` in `parameters.py`). This means that just before the end of Step 3 each cell’s time series is adjusted: for each of the 12 months of the year, an average is computed over the period 1951 to 1980, and this average is subtracted from that month’s time series.

If there is a cell that is unlucky enough to have its series start in 1980, then the average for the period 1951 to 1980 is simply the 1980 value. That’s subtracted from the series, so after Step 3 all the anomalies for 1980 will be 0 (see list above).

This doesn’t happen a lot. Recall that the GISTEMP analysis uses a grid with 8000 cells. We can see that the vast majority of the cells (that have any series at all) have 30 years of data for the period 1951 to 1980:

(The chart above represents a simplification, but a reasonable one: A year is marked as “present” if it has *any* valid monthly data)

The vast majority of the cells have a full 30 years for the base period.
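The counting behind that chart can be sketched in a few lines. This is an illustration of the "a year is present if it has any valid monthly data" rule, not the script I actually used; the function name and the dict-of-lists data layout are assumptions made for the example.

```python
def base_period_years_present(monthly_by_year, base=(1951, 1980)):
    """Count base-period years that have at least one valid monthly value.

    monthly_by_year: dict mapping year -> list of 12 values (None = missing).
    Hypothetical helper for illustration only.
    """
    first, last = base
    return sum(
        1
        for year, months in monthly_by_year.items()
        if first <= year <= last and any(v is not None for v in months)
    )


# A made-up cell whose series starts in 1980: only one base-period year.
cell = {year: [None, None] + [0] * 10 for year in range(1980, 1990)}
print(base_period_years_present(cell))  # prints 1
```

A cell with a full record would score 30 here, and a cell with nothing before 1981 would score 0.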

What if there are no values between 1951 and 1980? In that case the algorithm uses an average over the entire series for its baseline. There are 39 such cells (that’s the tiny blip on the left of the graph).
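The baseline rule described above, including the fallback, can be written down roughly like this. It is a sketch of the idea for a single calendar month of a single cell, with made-up temperatures; the real Step 3 code is organised differently, and `monthly_anomalies` is a name I have picked for this example.

```python
def monthly_anomalies(series, base=(1951, 1980)):
    """Anomalies for ONE calendar month of one cell.

    series: dict mapping year -> value (None = missing).
    The baseline is the mean over the base period; if the series has no
    valid values in the base period, the mean over the whole series is used.
    Illustrative sketch only, not the ccc-gistemp implementation.
    """
    valid = {y: v for y, v in series.items() if v is not None}
    if not valid:
        return {}
    in_base = [v for y, v in valid.items() if base[0] <= y <= base[1]]
    reference = in_base if in_base else list(valid.values())
    baseline = sum(reference) / len(reference)
    return {y: v - baseline for y, v in valid.items()}


# Made-up values for a series that starts in 1980.
late_starter = {1980: -28.4, 1981: -27.9, 1982: -29.0}
print(monthly_anomalies(late_starter)[1980])  # prints 0.0

# Made-up values for a series with nothing in 1951 to 1980.
no_base = {1990: 1.0, 1991: 3.0}
print(monthly_anomalies(no_base))  # prints {1990: -1.0, 1991: 1.0}
```

The first example is the Byrd situation in miniature: with 1980 as the only year in the base period, the baseline is the 1980 value itself, so the 1980 anomaly cannot be anything other than zero.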

So some cells will have a different baseline, and some cells will have a pathetically small number of values to average over in the baseline (the 1980 case above is the extreme example). This is unfortunate if you’re the sort of person who worries about individual years in localised regions, but it doesn’t affect the larger scale averages at all.

Koss also asks why the cell just north and west of Byrd does not have a yearly anomaly for 1980. A reasonable question since it’s close to Byrd and Byrd has data for 1980.

Here’s a Google Earth screenshot, showing stations south of -70. Taken from roughly above the South Pole with 0 degrees longitude to the right.

Note that this shows stations after Step 2 (I ran the command «`tool/kml.py --lat -90,-70 work/v2.step2.out > step2.kml`»), and Step 2 discards all stations with a short series, which means throwing out several Antarctic stations.

Let’s zoom in a little on Byrd and add the gridded cells (a select few of the gridded cells from Step 3):

For the cells (with locations of the form -NN.N-EEE.E), the 4 digit number is the first year with data (technically, the first year with enough data to produce a yearly anomaly, but that’s the same in these cases).

So why does cell -77.2-121.5 (just north of Byrd) have a series that starts in 1980? Answer: Because it uses data from Byrd, and Byrd’s data starts in 1980. The little squiggle inside the icon is the yearly anomaly series and you can just about see that -77.2-121.5 is the same squiggle as Byrd.

What about cell -77.2-130.5 (just north and west of Byrd)? Why does that have no anomaly for 1980? Because its series starts in 1985. Why is that? Because this cell uses data from Gill and not Byrd (again, compare the squiggle to Gill’s squiggle). Why is that? Because there are 3 stations within 1200 km of this cell. I recently added a bit of logging to Step 3 and it’s now possible to see for each cell what stations contributed and the weights of their contribution:

    $ grep -e -77.2-130.5C log/step3.log
    -77.2-130.5C stations [('700893760009', 0.10965896795479568), ('700893240009', 0.0), ('700891250000', 0.0)]

70089376000, Gill, -80.00-178.60

70089324000, Byrd, -80.00-119.40

70089125000, Byrd Station, -80.02-119.53
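As a sanity check on the 1200 km claim, here is a rough distance calculation from the centre of cell -77.2-130.5 to the three stations listed above. It uses the standard haversine formula on a spherical Earth of radius 6371 km, which is my assumption for this sketch; ccc-gistemp has its own distance code, which may differ in detail.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in km."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


cell = (-77.2, -130.5)  # cell centre, from the cell identifier
stations = {
    "Gill": (-80.00, -178.60),
    "Byrd": (-80.00, -119.40),
    "Byrd Station": (-80.02, -119.53),
}
for name, (lat, lon) in stations.items():
    d = great_circle_km(cell[0], cell[1], lat, lon)
    print(f"{name}: {d:.0f} km, within 1200 km: {d < 1200}")
```

All three should come out inside the 1200 km radius, with Gill a good deal farther from the cell centre than either Byrd record.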

Byrd and Byrd Station are presumably different stations in more or less the same location. Byrd Station has a temperature record starting in 1957 (yay for International Geophysical Year) running continuously up to 1971, and then sporadic southern summers up to 1987. Byrd is presumably the modern station; its record starts in 1980 and runs up to 2010 (not quite continuously).

When considering the order in which to combine stations to produce a series for a cell, stations are considered “longest record first” by a simple count of the total number of months of valid data. It turns out that Gill, 70089376000, has 246 months of data, while Byrd, 70089324000, has 216 months of data. So Gill comes first (even though Byrd is much closer).
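The "longest record first" ordering amounts to a sort on a month count. A minimal sketch, using toy records rather than the real Gill and Byrd data, and with function names that are mine rather than ccc-gistemp's:

```python
def count_valid_months(record):
    """record: dict mapping year -> list of 12 monthly values (None = missing)."""
    return sum(v is not None for months in record.values() for v in months)


def longest_first(records):
    """Return station ids ordered by number of valid months, longest first.

    Hypothetical helper for illustration; distance plays no part in the order.
    """
    return sorted(
        records,
        key=lambda station_id: count_valid_months(records[station_id]),
        reverse=True,
    )


# Toy records: "A" has 18 valid months, "B" has 24.
records = {
    "A": {2000: [1.0] * 12, 2001: [1.0] * 6 + [None] * 6},
    "B": {2000: [1.0] * 12, 2001: [1.0] * 12},
}
print(longest_first(records))  # prints ['B', 'A']
```

Note that nothing in the sort key knows where the station is, which is exactly why a distant station with 246 months goes ahead of a nearby one with 216.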

Referring back to the log output for the three stations, we see that records 700893240009 and 700891250000 contributed with 0.0 weight. In other words, not at all. Why is that?

In order for a station to combine with a cell series it has to have a certain number of years of common overlap: parameters.gridding_min_overlap. This is normally 20. Each month of the year is considered separately; it turns out that February is the month for which Gill and Byrd have the greatest overlap, with 18 years in common. Just falling short of the cut. If both Gill and Byrd continue to have temperature records then in 2012 their February record will be combined. This still won’t be enough to compute a yearly anomaly for 1980 for that cell, because only February will combine, not the other months of the year. And one month is not enough to compute a yearly anomaly.
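The overlap test can be sketched as a per-month count of years in which both series have data. The constant below stands in for `parameters.gridding_min_overlap`, the data are toy values arranged to reproduce the "18 Februaries in common" situation, and the function names are assumptions for this example rather than the names used in Step 3.

```python
MIN_OVERLAP = 20  # stands in for parameters.gridding_min_overlap


def overlap_by_month(a, b):
    """For each calendar month, count the years where both series have data.

    a, b: dicts mapping year -> list of 12 monthly values (None = missing).
    """
    common_years = a.keys() & b.keys()
    return [
        sum(
            1
            for year in common_years
            if a[year][m] is not None and b[year][m] is not None
        )
        for m in range(12)
    ]


def months_that_combine(a, b, min_overlap=MIN_OVERLAP):
    """Indexes (0 = January) of months with enough common years to combine."""
    return [m for m, n in enumerate(overlap_by_month(a, b)) if n >= min_overlap]


# Toy data: only February (index 1) is populated in either series.
cell_series = {y: [None, 1.0] + [None] * 10 for y in range(1985, 2003)}
candidate = {y: [None, 2.0] + [None] * 10 for y in range(1980, 2011)}
print(overlap_by_month(cell_series, candidate)[1])  # prints 18
print(months_that_combine(cell_series, candidate))  # prints []
```

Eighteen common Februaries is two short of the threshold, so nothing combines and the candidate ends up with zero weight.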

For completeness we should consider Byrd Station, 70089125000, as well. Gill starts in 1985, and Byrd Station has only 4 monthly values since 1985 (and in fact, only 8 for the whole of the 1980s). So there is no chance of enough overlap.

However, for the cell containing Byrd itself, -80.1-121.5, we can see that its record starts in 1957. Again grepping station contributors out of the log:

    $ grep -e -80.1-121.5 log/step3.log
    -80.1-121.5C stations [('700890090008', 0.079848008479036836), ('700893760009', 0.12098360649264683), ('700893240009', 0.9657843990261038), ('700891250000', 0.96815637481310624)]

The first station is Amundsen–Scott, 70089009000, with a record that starts in 1957 (International Geophysical Year again). We have increased overlap, and so it turns out that Gill *and* both records from Byrd contribute. You can think of Amundsen–Scott as being like a catalyst that allows Gill and Byrd to combine when they ordinarily would not. Amundsen–Scott is only just within range to contribute to this cell. Cells further north will not use Amundsen–Scott, so Gill and Byrd will not combine (unless there is some other station in range to provide an overlapping series).
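If you want to pick out the real contributors without squinting at the log, a line like the ones quoted above can be parsed with the standard library. This sketch assumes the log format is exactly what those two lines show (a cell identifier, the word `stations`, then a Python-style list of `(id, weight)` tuples); other lines in `log/step3.log` may look different, and `parse_step3_log_line` is a name made up for this example.

```python
import ast


def parse_step3_log_line(line):
    """Split a 'CELL stations [(id, weight), ...]' log line into its parts.

    Hypothetical helper; the format is inferred from the lines quoted above.
    """
    cell, _, rest = line.partition(" stations ")
    # literal_eval safely evaluates the list-of-tuples literal.
    contributors = ast.literal_eval(rest.strip())
    return cell, contributors


line = (
    "-80.1-121.5C stations [('700890090008', 0.079848008479036836), "
    "('700893760009', 0.12098360649264683), "
    "('700893240009', 0.9657843990261038), "
    "('700891250000', 0.96815637481310624)]"
)
cell, contributors = parse_step3_log_line(line)
used = [station_id for station_id, weight in contributors if weight > 0.0]
print(cell, used)
```

For this cell all four records have non-zero weight; run on the earlier -77.2-130.5C line, the same filter leaves only Gill.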

**Conclusion**: Some individual cells can have slightly unintuitive series, but they are merely the result of the documented computation (the key facts, namely that records are combined longest first, that stations from up to 1200 km away are used, and that records require an overlap for combining, are all documented in Hansen and Lebedeff 1987). A tiny fraction of cells in the anomaly maps have either misleadingly short anomaly periods or entirely different anomaly periods. This makes no difference to zonal averages. When browsing the maps, such cells should probably be marked or eliminated.