Climate models accurately predicted global warming when reflecting natural ocean cycles
Posted on 21 July 2014 by dana1981
Predicting global surface temperature changes in the short-term is a challenge for climate models. Temperature changes over periods of a decade or two can be dominated by influences from ocean cycles like El Niño and La Niña. During El Niño phases, the oceans absorb less heat, leaving more to warm the atmosphere, and the opposite is true during a La Niña.
We can't yet predict ahead of time how these cycles will change. The good news is that it doesn't matter from a big picture climate perspective, because over the long-term, temperature influences from El Niño and La Niña events cancel each other out. However, when we examine how climate model projections have performed over the past 15 years or so, those natural cycles make a big difference.
A new paper led by James Risbey just out in Nature Climate Change takes a clever approach to evaluating how accurate climate model temperature predictions have been while getting around the noise caused by natural cycles. The authors used a large set of simulations from 18 different climate models (from CMIP5). They looked at each 15-year period since the 1950s, and compared how accurately each model simulation had represented El Niño and La Niña conditions during those 15 years, using the trends in what's known as the Niño3.4 index.
Each individual climate model run has a random representation of these natural ocean cycles, so for every 15-year period, some of those simulations will have accurately represented the actual El Niño conditions just by chance. The study authors compared the simulations that were correctly synchronized with the ocean cycles (blue data in the left frame below) and the most out-of-sync (grey data in the right frame) to the observed global surface temperature changes (red) for each 15-year period.
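To give a feel for that selection step (this is not the authors' code, and their exact matching criterion may differ), here is a minimal Python sketch that ranks ensemble members by how closely the trend of their simulated Niño3.4 index over a 15-year window matches the observed trend, keeping the best- and worst-matched runs:

```python
import numpy as np

def nino34_trend(series):
    """Least-squares linear trend (per time step) of a Nino3.4 index series."""
    t = np.arange(len(series))
    return np.polyfit(t, series, 1)[0]

def split_by_phase(obs_nino34, model_nino34_runs, n_select=4):
    """Return indices of the ensemble members whose 15-year Nino3.4 trend is
    closest to (best) and furthest from (worst) the observed trend.
    obs_nino34: 1-D array of observed monthly Nino3.4 values for the window.
    model_nino34_runs: 2-D array of shape (n_runs, n_months), one row per run."""
    obs_trend = nino34_trend(obs_nino34)
    run_trends = np.array([nino34_trend(run) for run in model_nino34_runs])
    order = np.argsort(np.abs(run_trends - obs_trend))
    return order[:n_select], order[-n_select:]

# Toy usage with synthetic data: one 15-year window of monthly values
rng = np.random.default_rng(0)
obs = 0.05 * rng.standard_normal(180).cumsum()
runs = 0.05 * rng.standard_normal((30, 180)).cumsum(axis=1)
best, worst = split_by_phase(obs, runs)
print("best-matched runs:", best, " worst-matched runs:", worst)
```

One would then compare the global surface temperature trends of those two groups with observations, which is the comparison the figures described here show.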
The authors conclude,
When the phase of natural variability is taken into account, the model 15-year warming trends in CMIP5 projections well estimate the observed trends for all 15-year periods over the past half-century.
It's also clear from the grey figure that models that are out-of-sync with the observed changes in these ocean cycles simulate dramatically higher warming trends over the past 30 years. In other words, the model simulations that happened not to accurately represent these ocean cycles were the ones that over-predicted global surface warming.
The claim that climate models are unreliable is the 6th-most popular contrarian myth. The argument is generally based on the claim that climate models didn't predict the slowdown in global surface warming over the past 15 years. That's in large part because during that time, we've predominantly experienced La Niña conditions. Climate modelers couldn't predict that ahead of time, but the models that happened to accurately simulate those conditions also accurately predicted the amount of global surface warming we've experienced.
Yu Kosaka and Shang-Ping Xie from the Scripps Institution of Oceanography published a paper in Nature last year taking a similar approach to that of Risbey and colleagues. In that study, the authors ran climate model simulations in which they incorporated the observed Pacific Ocean sea surface temperature changes, essentially forcing the models to accurately reflect the influence from these ocean cycles. When they did that, the authors found that the models simulated the observed global surface temperature changes remarkably well.
The results of these studies give us two important pieces of information:
- Natural ocean cycles like El Niño and La Niña have temporarily dampened global surface warming over roughly the past 15 years.
- When climate models accurately reflect those ocean cycle influences, they accurately project global surface warming.
Note: this post has been incorporated into the rebuttal to the myth 'Models are unreliable'.
Charlie A @37:
1) The authors of the paper, SFAIK, make no claim that ENSO is the only factor suppressing recent observed trends in GMST. Therefore you are not entitled to assume that, because the observed 1998-2012 trend in GMST is at the 2.5% limit, the ENSO trend will also be at or near that limit. Indeed, the 4 best modelled trends are unlikely to be within the 2.5% limit of ENSO trends, as they are selected only for having the same phase out of far fewer than 100 realizations. Yet they match the observed trend fairly closely (see the first figure in the OP), thereby falsifying your assumption. (Note, the lower limit is the 2.5% limit, not the 97.5% limit.)
2) Even if an ENSO trend approaching the 2.5% limit were required to explain the depressed observed trend in GMST, it is the trend that needs to be statistically unlikely, not the individual ENSO states in any period. An unusual trend can be formed by a couple of stronger than normal El Nino events at the start of the trend period and a couple of stronger than usual La Nina events at the end, without any of those individual events being 97.5% (for El Nino) or 2.5% (for La Nina) events.
3) In the so-obvious-it-is-unbelievable-that-you-missed-it category, an unusually strong El Nino at the start of the trend period is just as capable of generating a very strong trend as an unusually strong La Nina at the end. Your restricting the test to the latter condition only is uncalled for, and very puzzling given that it is known the 97/98 El Nino was unusually strong:
4) Your claim that the observed ENSO trends were not unusual (based solely on claims regarding the strength of recent La Ninas) is not backed up by the data. For temperature based indices (plotted above), the observed percentile rank of the 1998-2012 ENSO trends are:
NINO1+2: 10% | NINO3: 7.1% | NINO4: 38.1% | NINO3.4: 25.7%
That is, two out of four such indices do show very low percentile ranks. That they do not show lower percentile ranks is probably due to two unusually strong El Ninos appearing in the short record. (Note, the ONI is just the three-month running mean of NINO3.4, and so will differ little from that record.)
5) Single-region temperature indices for ENSO are fatally flawed (IMO) in that they will incorporate the general warming trend due to global warming as a trend towards more, and stronger, El Ninos. Far better are multiple-region indices (such as ENSO 1+2) where the common global warming signal can be cancelled out, or non-temperature indices such as the SOI:
The inverted five-month lagged SOI trend for 1998-2012 has a percentile rank of 2.52%, compared to the GISS LOTI trend's rank of 42.9%. For what it is worth, the inverted, lagged 1998 value ranks at the 95.6th percentile in the SOI, and 2011 ranks at the 0th percentile. The inverted, lagged SOI index for 2011 was -17.3, which is less than the strongest shown on the graph above (which was not lagged). The five-month lagged 2011 La Nina has a percentile rank of 0.8% among all 12-month averages of the SOI index. (A rough sketch of this sort of percentile-rank calculation is appended at the end of this comment.)
So, when you say the 2011 La Nina was not unusually strong, that only indicates over-reliance on one ENSO index, and an unsuitable one in a warming world.
In summary, nearly every claim you make @37 is wrong. To be so comprehensively wrong should be a matter of embarrassment for you. You should certainly pause and reconsider your position.
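As a footnote on the percentile ranks quoted in points 4 and 5: purely as an illustration (my own toy code with synthetic data, not the calculation used above), a percentile rank of the most recent 15-year trend among all overlapping 15-year trends of a monthly index can be computed along these lines:

```python
import numpy as np

def overlapping_trends(series, window=15 * 12):
    """Least-squares trends of every overlapping window (monthly data assumed)."""
    t = np.arange(window)
    return np.array([np.polyfit(t, series[i:i + window], 1)[0]
                     for i in range(len(series) - window + 1)])

def percentile_rank_of_last(series, window=15 * 12):
    """Percentile rank of the final window's trend among all window trends."""
    trends = overlapping_trends(series, window)
    return 100.0 * np.mean(trends <= trends[-1])

# Example with a synthetic monthly index (60 years of white noise)
rng = np.random.default_rng(1)
index = rng.standard_normal(12 * 60)
print(percentile_rank_of_last(index))   # a single draw; roughly uniform on [0, 100] for pure noise
```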
Yes, thank you Russ for withdrawing your claim of cherrypicking.
You still misunderstand the main purpose of the paper, as revealed by your comment "The higher the correlation, the more the method would treat luck as skill." The authors of the paper did not treat luck as skill. Indeed, they conceived their project on the basis of their and everyone else's explicit and repeated acknowledgment that the GCMs get the timing of ENSO events correct entirely by chance! Their main conclusion was, as scaddenp noted, that the GCMs could be improved substantially (not completely!) in their projections of 15-year periods if the GCMs' timing of ENSO events were improved substantially. The authors did not claim any method for accomplishing that improvement of ENSO timing, and did not even claim that it is possible for anyone, ever, to accomplish that improvement. Their paper leaves unchallenged the suspicion that GCMs forever will lack the skill to accurately project the timing of ENSO events. That means their paper leaves unchallenged the suspicion that GCMs forever will lack the skill to much more accurately project global mean surface temperature for 15-year periods.
What the authors did claim (I think; somebody please correct me if I'm wrong) is that:
Charlie A @32 shows the following image, and comments:
From 97 fifteen-year trends, there are five instances of the observations being at the 2.5% limit, and two instances of them being at the 97.5% limit. (Because trends overlap, clusters of trends at the limit are treated as single instances.) That is enough to suggest the models do not capture the full range of natural variability, but not enough data to suggest a bias towards a warm or cool result.
Of the two warm episodes, both are associated with strong positive 15-year trends in the inverted, lagged SOI. Of the five cool episodes, four are associated with strong negative trends in the lagged SOI. That is, six out of seven strong excursions in observed temperatures relative to modelled temperatures are associated with same-sign excursions in the lagged, inverted SOI, and therefore are probably the results of large ENSO trends. The one low excursion not related to ENSO trends occurs in the twenty-year period from 1880 to 1899, in which there were twelve major volcanic eruptions (VEI 4+), leading off with Krakatoa.
When comparing the lagged, inverted SOI trends to GISS LOTI, the match in the early half of the twentieth century is quite good (with the exception of the first 20 years). In the latter half of the twentieth century and the early twenty-first, two discrepancies stand out. One is the major positive trend excursion around 1980 associated with the 1982/83 El Nino. That event coincided with the 1982 El Chichon eruption, the effects of the two events on global temperatures more or less cancelling out. The other is the large disparity in the early twenty-first century, where GMST trends are far more positive than would be expected from the SOI trends. Something, in other words, has been warming the Earth far more strongly than would be expected from looking at natural variation alone.
I'm a bit confused by this as well. I must admit that the maps of the regional trends around the Pacific look inaccurate based on the graphs shown by Russ. This seems to conflict with the bolded text above, and I'm not convinced anyone has really provided a reasonable answer to it. Either:
1) The authors actually mean a different thing when they talk about "Pacific spatial trend patterns" than what Russ believes, and that phrase does not refer to the regional distribution of warming in the Pacific region but rather something else. In this case, what exactly are the authors referring to here?
2) The maps are misleading in some way, making similar trends actually look completely different.
3) The models are in fact inaccurate, and the authors are incorrect in the bolded statement.
It's confusing because the paper's goal seems to be to test whether models can reproduce the correct global temperature trends if the ENSO input is modelled correctly, and it shows that the models are indeed accurate globally. But this almost throwaway line seems to suggest that the spatial distribution of the warming was also predicted correctly, when it really looks like it wasn't.
Some commentators have pointed out that the models aren't expected to get the spatial distribution of warming right, and that's fine; I don't think anyone (excluding Watts, Monckton, et al.) can reasonably expect accuracy where the models are not designed to provide it. But if that's the case, why is the bolded phrase even included in the paper?
I'd go with 1/ more or less. The spatial pattern of interest is the cooling eastern Pacific compared with the warming central-western Pacific. This pattern is visible in both the selected models and the observations, but missing in the anti-phased model. I would definitely say "good" means something different to the authors than it does to Russ. I think it is accurate for the 15-year trend, but whether it applies to the spatial trend depends somewhat on your expectations. However, I think it is a very small point blown right out of proportion when it comes to evaluation of the paper as a whole. The main text barely mentions it.
It is easier to make the comparison looking at the figures at HotWhopper than in the Russ gif, if you don't have access to the paper.
The words used in the context provided definitely seem to signify the spatial trend for the entire Pacific. Therefore the line above is either poorly worded or taken out of context (I don't have access to the paper and so can't verify, but going on history I'd put my money on the latter).
Indeed, I agree that it's not an important point in the context of the paper's goals, but most deniers will be happy to focus on the one incidental discrepancy and ignore the point made by the paper as a whole. This helps them ignore the fact that this paper completely demolishes just about the only argument they were hanging onto - that climate models failed to predict the current period of slower warming. It unambiguously shows that the models did in fact predict the current slowdown in warming - within the bounds of what they attempt to predict.
The text you quoted about spatial trends is from the abstract and it is stated without any context. Perhaps someone who has read the paper can provide that context by giving a description of how the authors support that statement in the main body of the paper.
What the comments on this post highlight is the difficulty our brains have in coming to grips with two very distinct aspects of modeling climate (or any dynamic system):
1) The conceptual and quantitative understanding of mechanism
2) Assumptions about future states that contribute to the quantity being modeled.
Both have to hold true in order to make skillful predictions about future conditions, especially in the short term when essentially random factors can hold sway. Mismatch between predictions and observed conditions (assuming the observations are reliable — that's another topic) can derive from failures of 1) or 2), but 1) is the component that science is most interested in, and is most relevant to long-term prediction. Therefore, to assess the strength of our understanding, we need to figure out how much of the mismatch can be attributed to 2).
Here's an example:
As I understand it, my bank balance changes according to this equation:
change in balance = pay + other income - expenses
I can predict how my bank balance will change in the future if I assume some things that are pretty well understood (my monthly paycheck, typical seasonal utility bills, etc.). However, some aspects of the future are random (unexpected car repairs, warm/cool spells affecting utility bills, etc.) — these cannot be predicted specifically but their statistical properties can be estimated (e.g., average & variance of car repair bills by year, etc.) to yield a stochastic rather than deterministic forecast. Also, I could get an unexpected pay raise (ha!), need to help my brother out financially, etc. All of these factors can generate mismatch between predicted changes in the balance and what actually happens.
But (and here's the important bit): that mismatch does not mean that my mechanistic understanding of the system is faulty, because it stems entirely from item 2). How can I demonstrate that? Well, if I plug the actual values of income & expenses into the equation above it yields a perfect match (hindcasting). Alternatively, (as was done by Risbey et al), I could select those stochastic forecasts that happened to get income and expense values close to what actually occurred, and find that the forecasts of those runs are close to the actual change in my balance.
Examining these runs is not "cherry picking" in any sense of the word, it is a necessary step to separate out the effects of items 1) and 2) on model-data mismatch. If these tests failed, that would imply that my understanding is faulty: some other source of gain or loss must be operating. Perhaps a bank employee is skimming?
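Purely to make the analogy concrete (the numbers and the Python code below are my own invention, not anything from the paper), here is a minimal sketch of that logic: run many stochastic balance forecasts, keep only the runs whose random expenses happened to land near what actually occurred, and see how much better those runs track the actual balance.

```python
import numpy as np

rng = np.random.default_rng(42)
months = 24
pay = 3000.0                                           # known monthly income
actual_expenses = 2800 + rng.normal(0, 300, months)    # what really happened
actual_balance = np.cumsum(pay - actual_expenses)

# Many stochastic forecasts: same mechanism, different random expense realisations
n_runs = 1000
sim_expenses = 2800 + rng.normal(0, 300, (n_runs, months))
sim_balance = np.cumsum(pay - sim_expenses, axis=1)

# Keep the runs whose average expenses happened to land closest to the actual
# average (the analogue of selecting runs in phase with the observed ENSO trend)
gap = np.abs(sim_expenses.mean(axis=1) - actual_expenses.mean())
in_phase = sim_balance[np.argsort(gap)[:20]]

print(f"Actual final balance:        {actual_balance[-1]:.0f}")
print(f"All-run mean final balance:  {sim_balance[:, -1].mean():.0f}")
print(f"In-phase mean final balance: {in_phase[:, -1].mean():.0f}")
```

With most seeds, the in-phase mean lands much closer to the actual final balance than the all-run mean does, which is exactly the point of conditioning on the variability that actually occurred.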
Climate forecasts are necessarily much less precise than my personal economic forecasts, because the system is observed with error and because many more inputs are involved that interact in complicated, nonlinear, spatially explicit ways. But the logic involved is the same.
scaddenp @48:
"The paper does demonstrate that a mean created from runs which are in phase with actual state are a closer match to observed global temperature."
This is so, but it is also a statement of the blatantly obvious. Why would a sane person need proof of this? I'm not asking this as flamebait, I am being completely serious. A very good analogy would be to say "this paper shows that periods during which the door of the darkroom was open are correlated with an increase in ambient illumination."
Seriously? And it's even pretty weak evidence of correlation, as Russ quite correctly pointed out.
The question remains: what does this paper actually demonstrate that wasn't already pretty darned obvious without it? The fact that models have to model reality in order to be valid (including the past) has been long known. So even if this paper is 100% true and valid, it is nothing more than a confirmation of something already known to REASONABLE people. I add that qualifier intentionally.
One might say "Yeah, but there was a time when the existence of phlogiston was considered to be 'obvious'." But these aren't those days. Reference Asimov's "The Relativity of Wrong."
We know what models are for, and at least roughly what evidence they provide and what not. To show that a few models that best (albeit badly) modeled the past also best (albeit very very badly) modeled the present is hardly a revelation. If I were a reviewer I would have rejected it out-of-hand as grandstanding and a waste of everybody's time.
Anne Ominous - Climate deniers frequently note that observations are at the edge of the model envelope, and then claim the models are useless/wrong and we should ignore them. Foolish rhetoric, really, since even perfect models show stochastic variation on different runs, and neither the model mean nor any single individual run will therefore exactly match the trajectory of observations. Climate models aren't expected to track short term observations of climate variations, but rather explore long term trend averages.
This paper is an elegant demonstration that models do reproduce shorter term global temperature trends and patterns when model variations match observations - strong support for the accuracy and physical realism of those models, and their usefulness when exploring longer term trends where those variations average out.
Demonstrating that models are physically accurate enough to model the range of short term variations, and that observations are indeed within the envelope of modeled behavior, is hardly a waste of time. It shows that the models are useful.
Anne... My thought on the relevance of the paper is this: What are the potential outcomes of the experiment?
a) Models phased with La Nina do not show any detectable difference from out-of-phase models.
b) Models phased with La Nina do show a detectable difference from out-of-phase models, and they agree with the observed surface trend.
If the results were (a), that would suggest there is potentially something wrong in the models that is causing them not to track the observed surface temperature trend of the past 15 years.
If the results were (b), then we have an indicator that prevailing La Nina conditions can at least partially explain the observed temperature trend of the past 15 years.
The results ended up agreeing with (b).
I have a question regarding the measurement of the success of the models. I don't see anything in this paper suggesting a calculation of deviance (sum of squares of errors) of the models. I realize that they make multiple predictions; hence, there would be a LOT of calculating to get an overall assessment of the reliability of the model. Yet I would think that a solid calculation of the deviance would make it easy to address questions about the reliability of the models.
So the question is: where are the deviance results?
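To be concrete about what I mean by deviance, here is a toy sketch (made-up numbers, nothing from the paper) for a single variable once model output and observations are on the same time base:

```python
import numpy as np

def sum_of_squared_errors(model, obs):
    """Deviance (sum of squared errors) between a model series and observations,
    both given as 1-D arrays on the same time base."""
    model, obs = np.asarray(model, float), np.asarray(obs, float)
    return float(((model - obs) ** 2).sum())

# Toy example: two invented global-mean temperature anomaly series (deg C)
obs = np.array([0.10, 0.12, 0.15, 0.11, 0.18])
run = np.array([0.09, 0.14, 0.13, 0.15, 0.17])
print(sum_of_squared_errors(run, obs))   # ~0.0026
```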
Anne Ominous @59, the authors of the paper only constrained their selection of model runs to be in phase with temperature trends in the NINO 3.4 region, as demarcated in the figure below:
That area is just 3.1 million km^2, or 1.2% of the Earth's surface (1.9% of the Pacific's area). Were you to select an equivalent area at random from the Earth's surface, and filter model runs to have the same phase of trends in that area, it is highly unlikely that it would sort the model runs into high and low trend groupings. Consequently your analogy is inapt.
This is only an unsurprising result because a number of other studies (formal and informal) have already shown that ENSO trends are probably the major cause in the relatively flat GMST trends over the last 15-20 years. The authors have in fact done what scientists should do - tested a currently popular hypothesis by an alternative method to those that have already been tried to see if it avoids falsification when you do so. It did, which is fairly ho-hum given the other results.
The only problem is that AGW deniers refuse to acknowledge the ENSO connection. They simultaneously (it seems) maintain that:
1) 1998 was only a very hot year because of ENSO, so the very high temperatures in 1998 are not evidence of global warming;
2) Only short term trends including 1998 at or near the start year can be of any interest for testing the validity of global warming; and
3) The slightly positive trend between 1998 and 2012 has nothing whatever to do with the very strong El Nino in 1998 and the strong La Ninas in 2008, and 2011/12.
Some people notice a certain inconsistency in the denier opinion.
Chris Crawford @ 62:
With a huge variety of variables (temperature, precipitation, wind speed, pressure, radiation, etc.) to compare, combined with the fact that model output often represents averages at fixed grid spacing whereas observations are rather randomly distributed with different temporal and spatial resolution, there is no simple mathematical relationship to derive such an error statistic. Climate GCMs are not statistical models: they do not reproduce data at points specified by observations.
What can be done is pattern-matching: does a map of modelled global temperature look like the map of observed global temperature? And so on.
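One way that pattern-matching is made quantitative is an area-weighted spatial pattern correlation between the two maps. A rough sketch of that idea (my own illustration with synthetic fields, not anything from the paper), assuming both maps sit on the same latitude-longitude grid:

```python
import numpy as np

def pattern_correlation(model_map, obs_map, lats):
    """Area-weighted, centred spatial correlation between two fields.
    model_map, obs_map: 2-D arrays (lat, lon) on the same grid.
    lats: 1-D array of grid latitudes in degrees."""
    w = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(model_map)
    w = w / w.sum()                              # normalised area weights
    m = model_map - (w * model_map).sum()        # remove area-weighted means
    o = obs_map - (w * obs_map).sum()
    cov = (w * m * o).sum()
    return cov / np.sqrt((w * m ** 2).sum() * (w * o ** 2).sum())

# Toy usage on a coarse 19 x 36 grid with synthetic trend maps
lats = np.linspace(-90, 90, 19)
rng = np.random.default_rng(2)
obs_map = rng.standard_normal((19, 36))
model_map = obs_map + 0.5 * rng.standard_normal((19, 36))
print(pattern_correlation(model_map, obs_map, lats))   # high, but below 1
```

A value near 1 means the modelled and observed spatial patterns line up well; it says nothing about whether their amplitudes agree.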
Bob Loblaw @ 64
Thanks for explaining that. Yes, it would surely be quite a job to put together a statistical analysis of the reliability of various models, and there would be a lot of tough judgements to be made that would detract from the rigor of the analysis. It *is* certainly possible; the pattern matching you describe can be carried out with mathematical rigor.
I suspect, however, that the value of such a project to the scientific community would be low; in the long run, a well-informed scientist's judgement will always produce better results than any formalized analysis such as I am suggesting. I suppose that such an analysis would be of utility only for debunking deniers' claims that the models don't work. The few knowledgeable deniers have already, in all likelihood, come up with such analyses and realized just how good the climate models are.
Chris:
Keep in mind that portions of a model can be verified in the manner you suggest. Take radiation transfer, for example. It is not difficult to take a vertical profile of radiation measurements, combined with a vertical profile of atmospheric conditions (pressure, humidity, aerosols, radiatively-active gas concentrations such as CO2 and O3, etc.) and compare the observed radiation to a model. You can also examine such things as surface energy balance sub-models (surface evaporation, thermal transfer from the surface to the atmosphere, soil temperatures) or other components of a GCM.
It's the "model the whole world" stage that is difficult to compare in a statistical sense. The model won't be an exact fit, and you can't easily tell whether that is because of a model error or because you don't know something like atmospheric composition well enough. In a physics-based model there aren't a lot of "tuning knobs", and adjusting one to fit one condition - e.g., temperature - may make another condition (precipitation) worse.
The other characteristic in complex models is that you can get good fits over quite a wide range of input variables, due to co-dependence of variables - e.g., add some reflective aerosols to the model atmosphere, but reduce your surface albedo. Trying to tweak results that way is, as you say, not of high scientific value.
To use a crude analogy, it's like having a model that says A+B=C, and you have measurements that say C=4 +/- 0.1, and you think that A is in the range 0.9-1.1 and B is in the range 2.9-3.1, and you start playing around with different values of A and B to try to best match C=4. You'll find an infinite number of values of A and B that will do the job equally well - without learning anything more about the accuracy of your model.
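That equifinality is easy to see numerically; a minimal sketch (my own toy code) that scans the stated ranges of A and B and counts the pairs consistent with the measurement C = 4 +/- 0.1:

```python
import numpy as np

# Scan the stated ranges of A and B and count pairs consistent with C = 4 +/- 0.1
A = np.linspace(0.9, 1.1, 201)
B = np.linspace(2.9, 3.1, 201)
AA, BB = np.meshgrid(A, B)
consistent = np.abs(AA + BB - 4.0) <= 0.1

frac = consistent.mean()
print(f"{consistent.sum()} of {consistent.size} (A, B) pairs fit the observation "
      f"({100 * frac:.0f}%)")
# A large fraction of the parameter space fits equally well, so matching C = 4
# teaches us nothing new about the individual values of A or B.
```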
Thanks for explaining it to me, Bob.
@mammelE @58 - one of the best explanations I have ever read. Unless you strenuously object, I will use that on occasion.