A study on the impact of multicollinearity on stepwise regression prediction equations.
Abstract
The accuracy of the traditional stepwise regression meteorological prediction equation (SRMPE) is limited by multicollinearity among its predictors. This paper introduces the condition number into the prediction modeling to minimize that multicollinearity. In the novel SRMPE modeling, the condition number is used to determine, among the possible sets formed from a number of preliminarily screened predictors (independent variables), the predictor set with the lowest multicollinearity, which is then used to construct the novel SRMPE. The condition-number-based modeling is illustrated with typhoon track prediction, a well-known difficult problem in meteorological disaster prediction. Twelve stepwise regression prediction equations for typhoon track latitude and longitude were built with both the traditional and the novel modeling methods, using the same large set of samples. The comparison shows that, with the same predictors (independent variables) and predictands (dependent variables), the novel model fits the historical modeling samples slightly less accurately than the traditional model, but its prediction accuracy on independent samples is clearly improved: the average prediction error of the novel model for July, August, and September is 153.9 km, which is 75.3 km smaller than that of the traditional model (a reduction of 33%). The improvement is attributed to the effective reduction of multicollinearity through the computation and analysis of the condition number during modeling. It is further shown that for F = 1.0, 2.0 and 3.0 the prediction errors of the traditional stepwise regression prediction equations are also clearly larger than those of the novel model.
Furthermore, in the independent-sample prediction experiments, the traditional prediction model produced extremely large, unreasonable errors at individual points of the typhoon tracks, owing to the multicollinearity in its predictor set.
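The selection step described above, choosing the predictor set with the lowest multicollinearity as measured by the condition number, can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the exhaustive search over subsets of a fixed size `k`, the column standardization, and the synthetic data are all hypothetical choices made for the example.

```python
from itertools import combinations

import numpy as np


def condition_number(X):
    """2-norm condition number of the column-standardized predictor matrix.

    Assumption: predictors are standardized so that differing units do not
    dominate the result. Constant columns are not handled here.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.linalg.cond(Z)


def select_lowest_condition_subset(X, k):
    """Search all size-k column subsets and return the one whose
    condition number is smallest, together with that condition number."""
    best_cols, best_cond = None, np.inf
    for cols in combinations(range(X.shape[1]), k):
        c = condition_number(X[:, list(cols)])
        if c < best_cond:
            best_cols, best_cond = cols, c
    return best_cols, best_cond


def fit_linear(X, y):
    """Ordinary least squares fit with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


# Synthetic candidates (hypothetical data): column 2 nearly duplicates column 0.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
x2 = x0 + 0.01 * rng.normal(size=200)
X = np.column_stack([x0, x1, x2])
y = 2.0 * x0 - 1.0 * x1 + 0.1 * rng.normal(size=200)

cols, cond = select_lowest_condition_subset(X, k=2)
coef = fit_linear(X[:, list(cols)], y)
print("selected columns:", cols, "condition number:", cond)
print("intercept and coefficients:", coef)
```

In this toy setup the nearly collinear pair (columns 0 and 2) has a far larger condition number than the other pairs, so it is never selected. An exhaustive search is practical only for a small number of pre-screened candidates.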