Statistical Significance Tests for the Difference between Two Climate Model Performances
Xu Zhongfeng (徐忠峰)
Institute of Atmospheric Physics, Chinese Academy of Sciences
Statistical metrics, such as the root mean square error (RMSE) and the correlation coefficient, are widely used to quantitatively measure and compare climate model performances. However, these statistics vary to some extent with sampling, so a difference in the metrics does not necessarily indicate a significant difference in model performance. Unfortunately, previous model evaluations have made no effort to test the significance of differences in model performance. To fill this gap, a significance test method is proposed for differences in the statistical metrics that measure model performance. The bootstrap method is used to generate confidence intervals for four commonly used statistical metrics: the absolute mean error, the correlation coefficient, the standard deviation, and the uncentered RMSE. With the support of these confidence intervals, the significance of a difference in statistical metrics can be determined. The significance test can also be applied to other statistical metrics or model skill scores. Comparisons of various CMIP5 models indicate that some models perform significantly better than others; in contrast, no significant difference is detected between different ensemble simulations of the same climate model, suggesting that the significance test gives reasonable results. Furthermore, the method provides a quantitative measure of the impact of observational uncertainties on the significance test. By accounting for uncertainties arising from both sampling errors and observations, the significance test method developed in this study may make model evaluation and intercomparison more objective.
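The core idea described above can be sketched in a few lines of Python. The following is a minimal illustration, not the paper's actual procedure: it uses synthetic data, a simple percentile bootstrap, and the RMSE as the example metric. The function and variable names (`bootstrap_diff_ci`, `model_a`, `model_b`) are hypothetical, and a real application to autocorrelated climate time series would likely require a more careful resampling strategy (e.g., a block bootstrap).

```python
import numpy as np

def rmse(model, obs):
    """Root mean square error of a model series against observations."""
    return np.sqrt(np.mean((model - obs) ** 2))

def bootstrap_diff_ci(obs, model_a, model_b, metric=rmse,
                      n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for
    metric(model_a, obs) - metric(model_b, obs).

    If the interval excludes zero, the difference in the metric
    between the two models is deemed significant at level alpha.
    """
    rng = np.random.default_rng(seed)
    n = obs.size
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample time indices with replacement, keeping obs/model pairing.
        idx = rng.integers(0, n, size=n)
        diffs[i] = metric(model_a[idx], obs[idx]) - metric(model_b[idx], obs[idx])
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Synthetic example: model_a has larger errors against obs than model_b.
rng = np.random.default_rng(42)
obs = rng.normal(0.0, 1.0, 500)
model_a = obs + rng.normal(0.0, 1.0, 500)
model_b = obs + rng.normal(0.0, 0.5, 500)

lo, hi = bootstrap_diff_ci(obs, model_a, model_b)
significant = not (lo <= 0.0 <= hi)  # CI excluding zero -> significant difference
```

The same scheme extends directly to the other metrics in the abstract (correlation coefficient, standard deviation, uncentered RMSE) by swapping the `metric` argument, and observational uncertainty could in principle be folded in by perturbing `obs` within its error range inside the resampling loop.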