A study on the impact of multicollinearity on stepwise regression prediction equations

  • Abstract: Stepwise regression, a modeling method widely used in meteorological forecasting, does not directly account for multicollinearity among the selected predictors, which can degrade the forecast performance of the resulting prediction equation. This paper proposes using condition number analysis to choose, from a large pool of preliminarily screened predictors (independent variables), the predictor combination with the weakest multicollinearity, and building the prediction model on that combination. Typhoon track prediction, a well-known difficulty in forecasting major meteorological disasters, serves as the test case. Using large samples, 12 prediction equations for typhoon track longitude and latitude were built with each of the two methods, giving condition number prediction equations and stepwise regression prediction equations. The comparison shows that, because condition number analysis effectively controls multicollinearity among the predictors, the monthly condition number equations for typhoon track prediction, built with the same predictors and predictands (dependent variables) as the stepwise regression equations, fit the historical modeling samples slightly less accurately but predict independent samples markedly better. For July, August, and September the mean prediction error of the condition number equations is 153.9 km, against 229.2 km for the corresponding stepwise regression equations, a difference of 75.3 km (a reduction of about 33%). Further tests show that stepwise regression equations built with F = 1.0, 2.0, and 3.0 also have markedly larger prediction errors than the condition number equations. In addition, because of multicollinearity in their predictor sets, the stepwise regression equations produced unreasonably large prediction errors at individual track points in the independent-sample experiments.
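
The condition number screening described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact procedure, so the following are assumptions made only for illustration: the condition number is taken as the ratio of the largest to the smallest singular value of the standardized predictor matrix (as computed by `numpy.linalg.cond`), the subset size `k` is fixed in advance, and the subset with the lowest condition number is chosen by exhaustive search. The function names `condition_number` and `select_low_collinearity_subset` are hypothetical.

```python
from itertools import combinations

import numpy as np


def condition_number(X):
    """2-norm condition number of the standardized predictor matrix.

    X has shape (n_samples, n_predictors). Columns are assumed to be
    non-constant, so their standard deviations are non-zero.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.linalg.cond(Z)


def select_low_collinearity_subset(X, k):
    """Return (column indices, condition number) of the size-k predictor
    subset with the smallest condition number.

    Exhaustive search over all subsets: an illustrative assumption, not
    necessarily the selection rule used in the paper.
    """
    best_cols, best_kappa = None, np.inf
    for cols in combinations(range(X.shape[1]), k):
        kappa = condition_number(X[:, list(cols)])
        if kappa < best_kappa:
            best_cols, best_kappa = cols, kappa
    return best_cols, best_kappa


# Synthetic demonstration: predictor 2 is almost a copy of predictor 0,
# so any subset containing both is strongly collinear.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
x2 = x0 + 0.01 * rng.normal(size=200)
X_demo = np.column_stack([x0, x1, x2])

cols, kappa = select_low_collinearity_subset(X_demo, k=2)
print("selected predictors:", cols, "condition number:", float(kappa))
```

In this synthetic case the search avoids pairing the two nearly identical predictors, which is the behaviour the abstract attributes to condition number analysis. A regression fitted on the selected subset would then play the role of the condition number prediction equation.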

     
