Probabilistic 2-meter surface temperature forecasting over Xinjiang based on Bayesian model averaging

Aihaiti, Ailiyaer; Wang, Yu; Ali, Mamtimin; Huo, Wen; Zhu, Lianhua; Liu, Junjian; Gao, Jiacheng; Wen, Cong; Song, Meiqi

doi:10.3389/feart.2022.960156

ORIGINAL RESEARCH article

Front. Earth Sci., 15 August 2022
Sec. Atmospheric Science
Volume 10 - 2022 | https://doi.org/10.3389/feart.2022.960156


Ailiyaer Aihaiti¹, Yu Wang¹, Mamtimin Ali¹*, Wen Huo¹, Lianhua Zhu², Junjian Liu¹, Jiacheng Gao¹, Cong Wen¹ and Meiqi Song¹

¹Institute of Desert Meteorology, China Meteorological Administration, Urumqi/National Observation and Research Station of Desert Meteorology, Taklimakan Desert of Xinjiang/Taklimakan Desert Meteorology Field Experiment Station of CMA/Xinjiang Key Laboratory of Desert Meteorology and Sandstorm/Key Laboratory of Tree-ring Physical and Chemical Research, China Meteorological Administration, Urumqi, China
²School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, China

Based on Bayesian model averaging (BMA), the suitability and characteristics of the BMA model for forecasting 2-m temperature in Xinjiang, China, were analyzed using the forecasts of four models: the Desert Oasis Gobi Regional Analysis Forecast System (DOGRAFS) and the Rapid-Refresh Multiscale Analysis and Prediction System (RMAPS), both developed by the Urumqi Institute of Desert Meteorology of the China Meteorological Administration; the China Meteorological Administration–Global Forecast System (CMA-GFS), developed by the China Meteorological Administration; and the global model of the European Centre for Medium-Range Weather Forecasts (ECMWF). The results showed that (1) the weight of ECMWF in the 2-m temperature forecast remains at about 0.6–0.7 for different training period lengths, whereas the weights of the other model products are below 0.15. (2) The forecasts of the individual models at four representative stations differ considerably, with a maximum forecast error of 6.9°C, whereas the maximum error of the BMA forecast is only about 2°C. In addition, the forecast uncertainty in southern Xinjiang is greater than that in northern Xinjiang. (3) Compared with the multi-model ensemble, the overall prediction performance of the BMA method is more spatially consistent. Additionally, the standard deviation ratio and the correlation coefficient between the BMA forecast and the observations both exceed 0.98, and the RMSE decreases significantly. These results indicate that the BMA method is a feasible way to improve the accuracy of 2-m temperature forecasts in Xinjiang.

1 Introduction

The Xinjiang Meteorological Service has recently strengthened the construction of a fine grid forecast platform based on multi-model forecasts. However, owing to uncertainties in the initial conditions and model parameters, numerical forecasts of meteorological variables such as temperature and precipitation deviate from the observations. Forecasts of elements such as temperature also differ among model products, so a single model product can hardly meet operational forecasting needs (Cai and Yu, 2019; Peng and Zhi, 2019).

Multi-model ensemble forecasts can improve prediction performance and can be used for probabilistic forecasting. Many studies have investigated the Bayesian model averaging (BMA) method for ensemble forecasts (Tan and Jiang, 2016; Ji et al., 2019; Lee and Shin, 2020). For example, Raftery et al. (2005) first applied BMA to an ensemble of dynamical weather models to forecast normally distributed variables, namely temperature and sea level pressure, and found that BMA performed significantly better than the traditional ensemble mean, with a root mean square error (RMSE) 8% lower. Zhi and Wang (2015) used BMA to project the temperature over East Asia for 2011–2035 and found that temperature generally increased under the representative concentration pathway 4.5 (RCP4.5) scenario, with relatively small increases over the ocean. Ji and Zhi (2017) applied BMA to extended-range forecasts of 2-m temperature in East Asia and concluded that it significantly improved ensemble forecast performance.

Additionally, BMA reproduces observations better than traditional methods and can reduce the uncertainty of model simulations. Miao et al. (2014) used BMA, simple model averaging, and reliability ensemble averaging (REA) to evaluate the ability of Coupled Model Intercomparison Project Phase 5 (CMIP5) models to simulate interannual and interdecadal changes in surface temperature over Eurasia. They found that BMA and REA significantly improved the simulations and that BMA had the lowest uncertainty. Brunner et al. (2020) and Zhao et al. (2020) pointed out that, compared with traditional methods, BMA better reduces the deviation between models and observations and better captures uncertainty and local climate features. For the statistical downscaling of large-scale variables, Zhang and Yan (2015) showed that combining the optimum correlation method with BMA outperforms multiple linear regression. Fang and Li (2016) used BMA to estimate the uncertainty, weights, and variances of Paleoclimate Modelling Intercomparison Project Phase 3 (PMIP3) and CMIP5 simulations. They found that BMA accounts for the simulation skill of the different models and, based on multi-model ensembles and training sets, yields more reliable estimates of long-term past variations. Javanshiri et al. (2021) noted that, for predicting the probability of heavy precipitation events, BMA was more accurate, skillful, and reliable than ensemble model output statistics with a censored shifted gamma distribution and had better resolution but poorer discrimination.

The terrain of Xinjiang is complex. Regional numerical models assimilate local observations and satellite data, which allows them to simulate and forecast extreme weather better and gives them advantages over some small-scale regions. However, owing to limited computing resources and storage, current regional numerical models provide only deterministic forecasts. Global numerical models such as ECMWF produce relatively stable forecasts but do not simulate and forecast extreme weather well. Therefore, in this study, global and regional numerical models are combined through the BMA method to produce probabilistic forecasts of 2-m temperature in Xinjiang, China. Section 2 introduces the observations, the four model products, and the BMA method. Section 3 selects the best training period of the BMA model, analyzes the temporal and spatial characteristics of the BMA deterministic and probabilistic forecasts, and evaluates the BMA forecast performance. Sections 4 and 5 present the discussion and main conclusions, respectively.

2 Data and methods

2.1 Data

The 24-h 2-m temperature forecasts (initialized at 0000 UTC) from 30 May to 31 August 2020 used in this study were obtained from four systems: the Xinjiang regional weather forecast systems Desert Oasis Gobi Regional Analysis Forecast System (DOGRAFS) and Rapid-Refresh Multiscale Analysis and Prediction System (RMAPS), both developed by the Urumqi Institute of Desert Meteorology of the China Meteorological Administration; the European Centre for Medium-Range Weather Forecasts (ECMWF) model; and the China Meteorological Administration–Global Forecast System (CMA-GFS) (Zhang and Chen, 2012).

DOGRAFS, which became operational in 2015, is based on version 3.5.1 of the Weather Research and Forecasting (WRF) model and its data assimilation system (WRFDA), with three nested domains and 40 vertical levels. Its horizontal resolution is 9 km over Xinjiang and 3 km over the area from Urumqi to Xiaocaohu. The atmospheric and surface fields of the National Centers for Environmental Prediction (NCEP) GFS forecasts are used as initial conditions. RMAPS is based on version 4.1.2 of WRF and WRFDA, with two nested domains and 50 vertical levels; its horizontal resolutions are 9 km over Central Asia and 3 km over Xinjiang, China. RMAPS also uses the atmospheric and surface fields of the NCEP GFS forecasts as initial conditions and began trial operation at the end of May 2018 (Ju and Liu, 2020; Tang and Li, 2021).

All forecasts are interpolated to 103 observation stations in Xinjiang, China, to evaluate the performance of the BMA method, the individual model products, and their ensemble mean. Figure 1 shows the topography of the study area and the locations of the observation stations. The stations are unevenly distributed, and the terrain is complex. In addition, southern Xinjiang is arid, with large diurnal temperature ranges and complex climatic characteristics (Yao et al., 2022). Furthermore, the terrain in the models' initial fields differs from the actual topography. All of these factors may affect the BMA forecast results (Liu and Ju, 2020; Xin and Li, 2021).


FIGURE 1. Topography of the study area and locations of the observation stations. The blue inverted triangles mark the representative stations X51053, X51705, X51815, and X51855.

2.2 Methods

BMA is a statistical post-processing method for multi-model ensemble forecasts. Instead of selecting the single best member, it takes a weighted average of the member forecasts (Raftery et al., 2005). Let $y$ be the predictand, $y^{T}$ the observations during the training period, and $f_{k}$ ($k = 1, \dots, K$) the forecast of the $k$th of $K$ models. The probability density function (PDF) of the BMA model is then given by

$$p(y) = \sum_{k=1}^{K} p(y \mid f_k)\, p(f_k \mid y^{T}) \tag{1}$$

where $p(y \mid f_k)$ is the conditional PDF of the predictand $y$ given the forecast of model $k$, $p(f_k \mid y^{T})$ is the posterior probability that model $k$ is the best model given the training data $y^{T}$, and $\sum_{k=1}^{K} p(f_k \mid y^{T}) = 1$. In essence, the BMA method uses $p(f_k \mid y^{T})$ as the weight of model $k$. Therefore, the PDF of the BMA model can be expressed as

$$p(y \mid f_1, \dots, f_K) = \sum_{k=1}^{K} \omega_k\, p_k(y \mid f_k) \tag{2}$$

where $\omega_k$ represents the relative contribution of model $k$ to the forecast (i.e., the weight of model $k$), and $\sum_{k=1}^{K} \omega_k = \sum_{k=1}^{K} p(f_k \mid y^{T}) = 1$.

For surface temperature, each conditional PDF can be taken as a normal distribution whose mean is a linear function of the forecast, $a_k + b_k f_k$, and whose variance is $\sigma_k^{2}$:

$$y \mid f_k \sim N\left(a_k + b_k f_k,\ \sigma_k^{2}\right) \tag{3}$$

where $a_k$ and $b_k$ are obtained from the linear relationship between the observations $y^{T}$ and the forecasts $f_k$. Under this assumption, the conditional expectation of the predictand $y$, i.e., the mean of the BMA forecast, is

$$E[y \mid f_1, \dots, f_K] = \sum_{k=1}^{K} \omega_k \left(a_k + b_k f_k\right) \tag{4}$$

Eq. 4 can be interpreted as a deterministic forecast and compared directly with the multi-model ensemble mean or with a single-model forecast.
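To make Eqs. 2–4 concrete, the following minimal Python sketch evaluates the BMA predictive density and the deterministic BMA forecast for one station and one forecast time. It assumes that the weights $\omega_k$, the coefficients $a_k$ and $b_k$, and the standard deviations $\sigma_k$ have already been estimated; the function names and the input numbers are illustrative only and are not values from this study.

```python
from statistics import NormalDist


def bma_pdf(y, forecasts, weights, a, b, sigmas):
    """BMA predictive density of Eq. 2 using the normal kernels of Eq. 3."""
    return sum(
        w * NormalDist(ak + bk * fk, sk).pdf(y)
        for fk, w, ak, bk, sk in zip(forecasts, weights, a, b, sigmas)
    )


def bma_mean(forecasts, weights, a, b):
    """Deterministic BMA forecast of Eq. 4: weighted mean of bias-corrected forecasts."""
    return sum(w * (ak + bk * fk) for fk, w, ak, bk in zip(forecasts, weights, a, b))


# Hypothetical inputs for four models at one station and time.
forecasts = [24.0, 26.0, 28.0, 22.0]  # raw 2-m temperature forecasts, °C
weights = [0.65, 0.12, 0.08, 0.15]    # must sum to 1
a, b = [0.0] * 4, [1.0] * 4           # no bias correction in this toy case
sigmas = [1.5] * 4                    # kernel standard deviations, °C

print(bma_mean(forecasts, weights, a, b))
print(bma_pdf(24.0, forecasts, weights, a, b, sigmas))
```

Because the weights sum to 1 and each kernel is a proper normal density, the mixture integrates to 1, and its mean is exactly the weighted average in Eq. 4.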

Under this normal linear assumption, the parameters of the BMA model are estimated from the observations and model forecasts in the training period. Estimating $a_k$ and $b_k$ amounts to a simple bias correction of each model, whereas the weights and the variance are estimated by maximizing the log-likelihood function. Assuming that forecast errors are independent in space (across stations) and time (across forecast times), the log-likelihood function of the BMA model is

$$\ell(\omega_1, \dots, \omega_K, \sigma^{2}) = \sum_{s,t} \log\left[\sum_{k=1}^{K} \omega_k\, p_k(y_{st} \mid f_{kst})\right] \tag{5}$$

where the sum runs over all stations $s$ and all times $t$ in the training period of length $N$. Eq. 5 has no analytical maximizer, so the expectation-maximization (EM) algorithm is used to estimate the parameters.
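The estimation procedure can be sketched in Python as follows. The sketch assumes, as in Eq. 5, a common variance $\sigma^2$ for all models, and it fixes $a_k$ and $b_k$ beforehand by ordinary least squares; training cases from all stations and days are pooled into one list, which plays the role of the double sum over $s$ and $t$. It is an illustration under these assumptions rather than the authors' code, and the function names and the convergence tolerance are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def fit_bias(obs, fc):
    """Least-squares fit obs ≈ a + b * fc for one model (a_k and b_k in Eq. 3)."""
    fbar, ybar = fmean(fc), fmean(obs)
    sxx = sum((f - fbar) ** 2 for f in fc)
    sxy = sum((f - fbar) * (y - ybar) for f, y in zip(fc, obs))
    b = sxy / sxx
    return ybar - b * fbar, b


def fit_bma_em(obs, forecasts, tol=1e-6, max_iter=500):
    """Estimate BMA weights and a common variance sigma^2 with the EM algorithm.

    obs:       N training observations y_st pooled over stations and days.
    forecasts: K lists; forecasts[k][n] is model k's forecast for case n.
    Returns (weights, sigma2, [(a_k, b_k), ...]).
    """
    K, N = len(forecasts), len(obs)
    coef = [fit_bias(obs, fc) for fc in forecasts]
    mu = [[a + b * f for f in fc] for (a, b), fc in zip(coef, forecasts)]
    w = [1.0 / K] * K
    sigma2 = fmean((obs[n] - mu[k][n]) ** 2 for k in range(K) for n in range(N))
    prev_ll = -math.inf
    for _ in range(max_iter):
        sd = math.sqrt(sigma2)
        # E-step: weighted kernel densities and responsibilities z[k][n].
        dens = [[w[k] * NormalDist(mu[k][n], sd).pdf(obs[n]) for n in range(N)]
                for k in range(K)]
        totals = [sum(dens[k][n] for k in range(K)) for n in range(N)]
        z = [[dens[k][n] / totals[n] for n in range(N)] for k in range(K)]
        ll = sum(math.log(t) for t in totals)  # Eq. 5 at the current parameters
        # M-step: update the weights and the common variance.
        w = [fmean(z[k]) for k in range(K)]
        sigma2 = sum(z[k][n] * (obs[n] - mu[k][n]) ** 2
                     for k in range(K) for n in range(N)) / N
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, sigma2, coef


# Synthetic demo: model 0 is accurate, model 1 is biased and noisy.
rng = random.Random(0)
truth = [rng.uniform(15, 35) for _ in range(300)]
good = [t + rng.gauss(0, 0.5) for t in truth]
poor = [t + 2 + rng.gauss(0, 3) for t in truth]
weights, sigma2, coef = fit_bma_em(truth, [good, poor])
print(weights, sigma2)
```

Since EM never decreases the log-likelihood, stopping when the gain falls below `tol` is a simple and safe convergence rule; the accurate model should receive the larger weight in this toy example.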

In addition, this study uses the continuous ranked probability score (CRPS), forecast accuracy, relative error, Brier score (BS), RMSE, and Taylor diagram to evaluate how the BMA method corrects the multi-model ensemble and how well it performs.

For a forecast distribution $F$ and an observation $x$, the CRPS can be written as

$$\mathrm{CRPS}(F, x) = E_F|X - x| - \frac{1}{2} E_F|X - X'| \tag{6}$$

where $X$ and $X'$ are independent random variables with distribution function $F$ and finite first moment (Gneiting and Raftery, 2007). Lower CRPS values indicate better probabilistic forecasts.
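As a sketch of how Eq. 6 can be evaluated, the Python code below computes the CRPS in two settings: (i) for a finite ensemble, taking $F$ as the empirical distribution of the members, and (ii) for a BMA mixture of normal kernels, using the standard result $E|Y| = 2s\,\varphi(m/s) + m\,[2\Phi(m/s) - 1]$ for $Y \sim N(m, s^2)$, where $\varphi$ and $\Phi$ are the standard normal PDF and CDF. The paper does not state how its CRPS values were computed, so the function names and this choice of evaluation are assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def crps_ensemble(members, x):
    """Eq. 6 with F taken as the empirical distribution of a finite ensemble."""
    m = len(members)
    term1 = sum(abs(xi - x) for xi in members) / m
    term2 = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return term1 - 0.5 * term2


def _abs_moment(m, s):
    """E|Y| for Y ~ N(m, s^2)."""
    if s == 0:
        return abs(m)
    z = m / s
    return 2 * s * STD_NORMAL.pdf(z) + m * (2 * STD_NORMAL.cdf(z) - 1)


def crps_normal_mixture(weights, means, sds, x):
    """Eq. 6 in closed form for a BMA mixture of normal kernels."""
    # X - x is normal within each component, so E|X - x| is a weighted sum.
    term1 = sum(w * _abs_moment(x - mu, s) for w, mu, s in zip(weights, means, sds))
    # X - X' for components i and j is N(mu_i - mu_j, s_i^2 + s_j^2).
    term2 = sum(
        wi * wj * _abs_moment(mi - mj, (si ** 2 + sj ** 2) ** 0.5)
        for wi, mi, si in zip(weights, means, sds)
        for wj, mj, sj in zip(weights, means, sds)
    )
    return term1 - 0.5 * term2


# Hypothetical values: a four-member ensemble and a two-component BMA mixture.
print(crps_ensemble([24.1, 26.5, 21.0, 25.2], 23.7))
print(crps_normal_mixture([0.7, 0.3], [23.9, 25.0], [1.2, 1.2], 23.7))
```

For a single normal kernel the mixture formula reduces to the familiar closed-form CRPS of a normal distribution, which makes a convenient check.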

The forecast accuracy can be expressed as

$$\text{forecast accuracy}_s = \frac{1}{T} \sum_{t=1}^{T} \begin{cases} 1, & |f_{st} - y_{st}| \le 2\,^{\circ}\mathrm{C} \\ 0, & |f_{st} - y_{st}| > 2\,^{\circ}\mathrm{C} \end{cases} \tag{7}$$

where $f_{st}$ and $y_{st}$ represent the forecast and observation at station $s$ and time $t$, respectively, and $T$ is the number of forecast times (Cui and Peng, 2002).
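Eq. 7 is simply the hit rate of forecasts within 2°C of the observation at one station. A minimal Python sketch, with a hypothetical function name and made-up values, is:

```python
def forecast_accuracy(forecasts, observations, threshold=2.0):
    """Eq. 7 for one station: fraction of times t with |f_st - y_st| <= threshold (°C)."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need two non-empty series of equal length")
    hits = sum(1 for f, y in zip(forecasts, observations) if abs(f - y) <= threshold)
    return hits / len(forecasts)


# Hypothetical 4-day series at one station (°C): errors 0.4, 2.6, 1.4, 0.2.
print(forecast_accuracy([24.1, 26.5, 21.0, 25.2], [23.7, 23.9, 22.4, 25.0]))  # prints 0.75
```

An error of exactly 2°C counts as a hit, matching the $\le$ in Eq. 7.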

Let $P_{mi}$ and $P_{oi}$ be the probabilities that the numerical model (or BMA) forecasts and the observations, respectively, fall within the $i$th interval, and let $k$ be the number of intervals (Fu et al., 2013). The BS is then given by

$$BS = \frac{1}{k} \sum_{i=1}^{k} \left(P_{mi} - P_{oi}\right)^{2} \tag{8}$$
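The following Python sketch implements Eq. 8, assuming that $P_{mi}$ and $P_{oi}$ are estimated as relative frequencies of forecast and observed values falling into each of $k$ temperature intervals. The paper does not list the intervals it used, so the interval edges and helper names here are hypothetical.

```python
from bisect import bisect_right


def interval_frequencies(values, edges):
    """Relative frequency of values in each interval [edges[i], edges[i + 1]).

    Values outside [edges[0], edges[-1]) are counted in the denominator but in
    no interval, so the returned frequencies may sum to less than 1.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return [c / len(values) for c in counts]


def interval_brier_score(p_model, p_obs):
    """Eq. 8: mean squared difference between interval probabilities over k intervals."""
    if len(p_model) != len(p_obs):
        raise ValueError("probability vectors must have the same length")
    return sum((pm - po) ** 2 for pm, po in zip(p_model, p_obs)) / len(p_model)


# Hypothetical 2 °C-wide temperature intervals and values at one station.
edges = [20.0, 22.0, 24.0, 26.0]
p_m = interval_frequencies([21.0, 23.0, 23.0, 25.0], edges)  # [0.25, 0.5, 0.25]
p_o = interval_frequencies([21.0, 21.0, 23.0, 25.0], edges)  # [0.5, 0.25, 0.25]
print(interval_brier_score(p_m, p_o))
```

A score of 0 means the forecast and observed interval frequencies agree exactly.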

3 Results

3.1 Selection of the best training period

The BMA method requires the data to be divided into training and forecast periods, and the length of the training period affects the BMA forecast results (Zhi and Peng, 2018). Therefore, the best training period for the BMA model must be determined before forecasting the 2-m temperature in Xinjiang. Because the data cover 92 days, the first 70 days were used for sliding training, and training periods of 41 to 70 days were tested. Figure 2 shows the CRPS scores and RMSEs for the different training periods. The CRPS score and RMSE show the same trend: both decrease as the training period lengthens up to 47 days and increase thereafter, reaching their minimum at 47 days. Therefore, a 47-day training period (1 June to 17 July) was selected for the deterministic and probabilistic BMA forecasts of 2-m temperature, and the remaining 45 days (18 July to 31 August) were used as the forecast period to evaluate the BMA forecast and the multi-model ensemble.
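One plausible way to organize this training-period search is sketched below in Python. Because the paper does not specify the exact sliding scheme, the sketch assumes that every candidate length $L$ is scored on the same set of test days, each forecast with the $L$ days immediately preceding it; `score_day` is a hypothetical user-supplied function that would fit BMA on the training days and return the CRPS (or RMSE) for the test day. Here it is replaced by a toy score.

```python
from statistics import fmean


def select_training_length(candidate_lengths, test_days, score_day):
    """Pick the training length with the lowest mean score over common test days.

    score_day(train_days, day) is a hypothetical callable: it should fit BMA on
    the days in train_days and return the CRPS (or RMSE) of the forecast for day.
    """
    test_days = list(test_days)
    mean_scores = {}
    for length in candidate_lengths:
        if min(test_days) < length:
            raise ValueError(f"not enough history for a {length}-day window")
        mean_scores[length] = fmean(
            score_day(range(day - length, day), day) for day in test_days
        )
    best = min(mean_scores, key=mean_scores.get)
    return best, mean_scores


# Toy score with a minimum at 47 days, standing in for a real BMA fit.
def toy_score(train_days, day):
    return 1.0 + (len(train_days) - 47) ** 2 / 100


best, scores = select_training_length(range(41, 71), range(70, 92), toy_score)
print(best)  # prints 47
```

Scoring every candidate on identical test days keeps the comparison between training lengths fair.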


FIGURE 2. Verification metrics of (A) CRPS score and (B) RMSE with different training period lengths for the BMA forecast.

Additionally, to show the contribution of each model to the 2-m temperature forecast under different training periods, Figure 3 shows a boxplot of the weights of the four models across the sliding training periods. Except for ECMWF, the weights of the models change little across training periods, indicating that each model contributes relatively stably to the 2-m temperature forecast. The ECMWF weight remained within 0.6–0.7, the RMAPS weight was below 0.1, and the DOGRAFS and CMA-GFS weights were 0.1–0.15. Thus, for the 2-m temperature forecasts at the 103 stations in Xinjiang, ECMWF forecast information dominates, followed by DOGRAFS, CMA-GFS, and RMAPS.


FIGURE 3. Boxplot of weights of four models under different training periods for the BMA forecast.

3.2 Probability forecast of Bayesian model averaging

After the best training period was selected, the deterministic forecasts of the BMA method and the multi-model ensemble were analyzed. The same numerical model performs quite differently at different stations, and different models perform differently at the same station. The BMA forecast error is within 2°C at most stations but exceeds 2°C at some. To compare the observations, the BMA probabilistic and deterministic forecasts, and the individual model forecasts, four stations where the forecasts differ greatly were selected as representative stations. Figure 4 shows the BMA probability forecast curve, the BMA deterministic forecast, the deterministic forecasts of the individual models, and their ensemble mean for 2-m temperature with a 24-h lead time at these four stations. Taking station X51053 as an example (Figure 4A), the observed 2-m temperature is 23.7°C (solid gray line); the maximum and minimum errors of the four models are 4.9°C and 0.63°C, respectively (solid green and blue lines); and the error of the multi-model ensemble mean reaches 3.1°C (solid black line). After the multi-model forecasts are processed by the BMA method, the error of the BMA deterministic forecast is 1°C.


FIGURE 4. Deterministic forecasts and BMA probability forecasts of 2-m temperature at stations (A) X51053, (B) X51705, (C) X51815, and (D) X51855 with a lead time of 24 h. The black curve and black dotted line represent the BMA probability forecast curve and the BMA deterministic forecast, respectively. The gray and black solid lines represent the observation and the multi-model ensemble mean deterministic forecast, respectively; the remaining solid lines represent the deterministic forecasts of the four models. The shading represents the probability within a 2°C-wide interval centered on the BMA deterministic forecast.

At representative stations X51705, X51815, and X51855, although the minimum error of the individual models and the multi-model ensemble mean was 1°C, the models differed significantly, with a maximum forecast error of 6.9°C. Moreover, the same model performed differently at different stations. The maximum error of the BMA deterministic forecast, weighted over the four models, is approximately 2°C, indicating that the BMA method can effectively reduce the forecast error relative to the observations. Additionally, except at station X51705, the observations at the representative stations fall essentially within the uncertainty range (i.e., the solid gray line lies within the shading). As Figure 4 shows, the larger the shaded probability (i.e., the more concentrated the PDF), the more likely the observation (gray line) is to fall within the interval; in other words, the lower the forecast uncertainty.

To further analyze the regional characteristics of the BMA probability forecast uncertainty (i.e., the probability that the observed value falls within a 2°C-wide interval centered on the BMA deterministic forecast), Figure 5 shows the spatial distribution of this probability for the 24-h 2-m temperature forecast at each station in Xinjiang. The probability exceeds 0.6 at most stations in Xinjiang. In southern Xinjiang, it is 0.6–0.8 at most stations and below 0.6 at some. In northern Xinjiang, it exceeds 0.7 at most stations and reaches 0.9–1 at stations in the west. Thus, the forecast uncertainty in southern Xinjiang is greater than that in northern Xinjiang; that is, the uncertainty of the BMA 2-m temperature forecast in Xinjiang decreases from lower to higher latitudes.
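The probability mapped in Figure 5 can be computed directly from the BMA mixture as the sum of each weighted normal kernel's mass inside the window. The Python sketch below assumes the bias-corrected kernel means and standard deviations are already known; the function name and the example values are hypothetical. It also illustrates the point made for Figure 4: a sharper PDF puts more probability in the same 2°C window.

```python
from statistics import NormalDist


def bma_interval_probability(weights, means, sds, width=2.0):
    """Probability that y lies in a window of the given width (°C) centered on the BMA mean.

    means are the bias-corrected kernel means a_k + b_k * f_k (Eq. 3); the
    window center is their weighted average, i.e. the BMA forecast of Eq. 4.
    """
    center = sum(w * m for w, m in zip(weights, means))
    lo, hi = center - width / 2, center + width / 2
    return sum(
        w * (NormalDist(m, s).cdf(hi) - NormalDist(m, s).cdf(lo))
        for w, m, s in zip(weights, means, sds)
    )


# Hypothetical stations: a flatter PDF versus a sharper one.
print(bma_interval_probability([0.7, 0.3], [23.9, 25.0], [1.5, 1.5]))
print(bma_interval_probability([0.7, 0.3], [23.9, 25.0], [0.6, 0.6]))
```

For a single standard normal kernel, the 2°C window covers ±1 standard deviation, about 0.68 of the mass.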


FIGURE 5. Spatial distribution of the 2-m temperature uncertainty of the BMA forecast at each station (i.e., the probability within a 2°C-wide interval centered on the BMA deterministic forecast).

3.3 Evaluation of the Bayesian model averaging forecast

The preceding analysis shows that the models perform differently at the four stations and that the BMA method effectively reduces the forecast error relative to the observations. To compare the multi-model ensemble mean with the BMA forecast at every station, Figure 6 shows the CRPS scores of both. The CRPS scores of the multi-model ensemble mean vary considerably across stations: they exceed 4 at some stations in central Xinjiang and even 7 at some, are approximately 1–4 at other stations, and fall below 1 at a few (Figure 6A). Spatially, therefore, the simple ensemble mean performs poorly and inconsistently. In contrast, the CRPS score of the BMA forecast is between 1 and 2 at some stations and below 1 at most (Figure 6B). This shows that the BMA method outperforms the multi-model ensemble mean and that its overall prediction performance is spatially consistent.


FIGURE 6. Spatial distribution of the CRPS score for (A) multi-model ensembles and (B) BMA forecast of 2-m temperature with a lead time of 24 h.

Figure 7 shows the spatial distribution of the RMSE of the BMA deterministic forecast, the four models, and their multi-model ensemble mean against the observations during the forecast period. For most stations in Xinjiang, the RMSE of the DOGRAFS, RMAPS, and CMA-GFS forecasts exceeded 2°C (Figures 7C, D, F); at some stations, the RMSE exceeded 3°C for RMAPS and 5°C for CMA-GFS. The RMSE of the ECMWF forecast is between 1°C and 4°C at most stations (Figure 7E) and between 1°C and 3°C at stations in the northwest of northern Xinjiang. The RMSE of the multi-model ensemble mean is between 2°C and 5°C at most stations (Figure 7B). In contrast, the RMSE of the BMA forecast is below 2°C at most stations and between 2°C and 3°C at some (Figure 7A). In other words, the CMA-GFS forecast has large errors at most stations during the forecast period, and the errors of the other three models remain between 2°C and 5°C. The multi-model ensemble mean does not reduce the forecast error, whereas the error of the BMA forecast is lower than that of every individual model and shows no obvious regional difference.


FIGURE 7. Spatial distribution of RMSE between (A) BMA, (B) multi-model ensemble mean, (C) DOGRAFS, (D) RMAPS, (E) ECMWF, (F) CMA-GFS and observed 2-m temperature during the forecast period.

Furthermore, Figure 8 shows box plots of the Brier score, relative error, and forecast accuracy of the BMA forecast and the individual model forecasts of 2-m temperature at the observation stations during the forecast period. For the single-model forecasts, these three metrics are widely spread, which means that the accuracy of each single model varies significantly among stations. For the BMA forecast, the metrics are tightly concentrated: the Brier score and relative error are close to 0 at most stations, and the median forecast accuracy is close to 0.8. Compared with the single-model forecasts, the BMA forecasts are therefore both more accurate and more spatially consistent.


FIGURE 8. Box plot of the (A) Brier score, (B) relative error, and (C) forecast accuracy analysis of the BMA forecast and different model forecasts of 2-m temperature at observation stations during the forecasting period.

In addition, for a more intuitive comparison between the BMA forecast and the individual model (and multi-model ensemble mean) forecasts of 2-m temperature in Xinjiang, Figure 9 shows a Taylor diagram of the forecasts against the observations (the forecast-period mean at each station). The distance between each forecast point and the observation point (the hollow point on the abscissa) represents the (centered) RMSE between the forecast and the observation. The distance from each point to the origin represents the ratio of the standard deviation of the forecast to that of the observation, and the angle between each point and the horizontal axis represents the correlation coefficient between the forecast and the observation (the cosine of the angle equals the correlation). The correlation coefficients between the deterministic forecasts of the four models and the observations are approximately 0.9, their RMSEs are above 0.5, and their standard deviation ratios exceed 1. Compared with the individual models, the multi-model ensemble mean improves only the correlation. In contrast, the standard deviation ratio and the correlation coefficient of the BMA forecast both exceed 0.98, and its RMSE decreases significantly.
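The geometry of a Taylor diagram rests on the law of cosines relating the centered RMS difference $E'$, the standard deviations, and the correlation $R$: $E'^2 = \sigma_f^2 + \sigma_o^2 - 2\sigma_f\sigma_o R$. The Python sketch below computes the normalized quantities plotted in a diagram such as Figure 9 (standard deviations divided by that of the observations) and checks this relation. The function name and the sample values are hypothetical and not taken from the study.

```python
from statistics import fmean, pstdev


def taylor_stats(forecast, obs):
    """Normalized Taylor-diagram statistics of a forecast series against observations.

    Returns (std_ratio, correlation, normalized centered RMS difference).
    """
    fbar, obar = fmean(forecast), fmean(obs)
    sf, so = pstdev(forecast), pstdev(obs)
    cov = fmean((f - fbar) * (o - obar) for f, o in zip(forecast, obs))
    r = cov / (sf * so)
    # Centered RMS difference: RMSE after removing each series' own mean.
    crmsd = fmean(((f - fbar) - (o - obar)) ** 2 for f, o in zip(forecast, obs)) ** 0.5
    return sf / so, r, crmsd / so


# Hypothetical station means for one model and the observations (°C).
fc = [20.1, 22.3, 25.0, 23.8, 27.2]
ob = [19.5, 22.0, 24.1, 24.5, 26.0]
ratio, r, e = taylor_stats(fc, ob)
# Law of cosines behind the diagram: e^2 = ratio^2 + 1 - 2 * ratio * r
print(ratio, r, e, ratio ** 2 + 1 - 2 * ratio * r - e ** 2)
```

Because the distance plotted on the diagram is the centered RMS difference, a constant bias does not appear in it; that bias is handled separately by the $a_k$, $b_k$ correction in BMA.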


FIGURE 9. Taylor diagram of BMA and multi-model forecast of 2-m temperature during the forecast period.

These results indicate that the 2-m temperature forecasts of the four models and their ensemble mean differ from the observations in both dispersion and spatial distribution, whereas the BMA method significantly reduces these differences and brings the forecast closer to the observations.

4 Discussion

Notably, the regional numerical models adopted in this study are forecast products routinely used by the Xinjiang Meteorological Bureau for daily weather forecasting. In this study, we evaluated the performance and errors of four models for 2-m temperature forecasts in Xinjiang while producing probabilistic forecasts with the BMA method. In general, ECMWF performed better than the other three models. Additionally, the deterministic 2-m temperature forecasts of the different models are inconsistent across regions. The BMA method compensates for this spatial inconsistency, effectively reduces the RMSE of the model forecasts, and provides probabilistic forecasts.

In addition, the reliability (uncertainty) of the BMA forecast can be judged from the BMA deterministic and probabilistic forecasts. Zhi and Peng (2018) and Peng and Zhi (2019) studied 2-m temperature probability forecasts over East Asia in different seasons and pointed out that forecast uncertainty is greater over land than over the ocean and greater at high latitudes than at low latitudes. For 2-m temperature in Xinjiang, the uncertainty of the BMA forecast is greater in southern than in northern Xinjiang, which may be related to the arid climate and the desert in southern Xinjiang.

5 Conclusion

In this study, first, based on the deterministic forecasts of the DOGRAFS, RMAPS, ECMWF, and CMA-GFS models, an analysis of the applicability of the BMA method for 2-m temperature forecasts in Xinjiang, China, was conducted. Second, the deterministic and probabilistic forecast characteristics of the BMA method were discussed, and the BMA forecast and different models (and their ensemble mean) were evaluated and compared. The results showed the following:

(1) During the sliding training, the CRPS score and RMSE exhibited the same trend: both decreased as the training period lengthened to 47 days and increased for longer periods. Therefore, a 47-day training period was selected for the BMA model. In addition, the contribution of each model to the 2-m temperature forecast was relatively stable across training periods: the weight of ECMWF remained at about 0.6–0.7, and the weights of the other models were below 0.15.

(2) At the four representative stations, although the minimum error of the individual models and the multi-model ensemble mean is only 0.63°C, the model forecasts differ, and the maximum forecast error reaches 6.9°C. Moreover, the same model performs differently at different stations. However, the maximum error of the BMA forecast is only approximately 2°C, showing that BMA effectively reduces the forecast error relative to the observations. Regarding forecast uncertainty, the probability is 0.6–0.8 at most stations in southern Xinjiang and above 0.7 at most stations in northern Xinjiang, indicating that the uncertainty of the BMA forecast is greater in southern Xinjiang than in northern Xinjiang.

(3) The CRPS score of the multi-model ensemble mean varied substantially in space, ranging from 1 to 7, whereas the CRPS score of the BMA method was below 2 at every station, indicating that the overall forecast performance of the BMA method is spatially consistent. During the forecast period, the RMSE of the four model forecasts exceeded 2°C at most stations, with a maximum above 5°C, whereas the RMSE of the BMA forecast was within 2°C at most stations. The RMSE of the BMA forecast was lower than that of every individual model and showed no obvious regional difference. Additionally, the standard deviation ratio and the correlation coefficient between the BMA forecast and the observations both exceed 0.98, and the RMSE decreases significantly.

Machine learning algorithms such as the support vector machine, light gradient boosting machine, and long short-term memory have been widely used in forecasting meteorological elements (Wang et al., 2018; Fan et al., 2019; Hamid et al., 2020; Qadeer et al., 2020). Compared with machine learning algorithms, statistical post-processing methods such as BMA are relatively easy to model but not sufficiently flexible (Javanshiri et al., 2021). Further research could compare and combine BMA and other statistical methods with machine learning algorithms to evaluate the post-processing methods suitable for Xinjiang. These conclusions provide theoretical support for the post-processing of regional numerical models in Xinjiang.

Data availability statement

The datasets used in this study can be provided by MA (ali@idm.cn) upon request.

Author contributions

All authors contributed to the study's conception and design. Material preparation, data collection, and data curation were performed by JG, CW, and MS. The methodology and software were developed by WH, LZ, and JL. The investigation, visualization, writing of the original draft, and analysis were performed by AA, YW, and MA. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Funding

This work was supported by the China Desert Meteorological Research Fund (Grant No. Sqj2021001), the National Key Research and Development Program (Grant No. 2018YFC1507105), and the National Natural Science Foundation of China (Grant No. 41875023).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Brunner, L., McSweeney, C., Ballinger, A. P., Befort, D. J., Benassi, M., Booth, B., et al. (2020). Comparing methods to constrain future European climate projections using a consistent framework. J. Clim. 33 (20), 8671–8692. doi:10.1175/jcli-d-19-0953.1
