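Application of the random forest machine learning algorithm to predicting the diseased panicle rate of wheat scab in Jiangsu Province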

Application of the random forest machine algorithm in forecasting diseased panicle rate of wheat scab in Jiangsu province

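  • Abstract: Using diseased panicle rate records and growth-stage observations of wheat scab from 13 cities in Jiangsu Province for 2002 to 2018, together with daily meteorological data for the corresponding periods, the random forest machine learning algorithm is applied to quantify, by growth stage and by region, the main meteorological feature variables affecting diseased panicle rate and their contribution rates. Prediction models are built for different forecast start times and then validated. The results show that the contribution rates of important feature variables rank by growth stage as: heading and flowering stage > jointing stage > overwintering period. Humidity, rainy spells of ≥3 consecutive days, and sunshine during the heading and flowering stage play the dominant role in scab; sunshine, precipitation, humidity and rainy days during the jointing stage, and temperature and snowfall during the overwintering period, all exert early-stage influences on scab. The ranking of the identified important feature variables is consistent with the patterns of development, release, infection and epidemic spread of the scab pathogen. The accuracy of the random forest prediction models depends on the number of important feature variables, the region where scab occurs, the Mtry parameter setting, and the growth stage. Predictions can be made as early as the beginning of March, giving a lead time of nearly 3 months; the closer the forecast start time is to the milk-ripe stage and the more important feature variables are input, the higher the prediction accuracy. The fluctuation trends of simulated and observed diseased panicle rates are fully consistent, and the models simulate the "moderate" and "relatively severe" scab grades well, indicating that the random forest algorithm offers high reliability and potential for operational use in scab prediction.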

     

    Abstract: Identifying the meteorological and biotic factors that significantly affect wheat scab, and developing models that predict diseased panicle rates at different stages, are important for improving forecasts of scab severity and for protecting the ecological environment of farmland. On the basis of observations of diseased panicle rate and winter wheat phenology, together with daily meteorological elements, in 13 cities in Jiangsu Province of China from 2002 to 2018, the dominant meteorological elements that affect diseased panicle rate are identified, and the contributions of individual elements to diseased panicle rate are assessed for different phenological stages in various regions. Models initialized at different times for predicting diseased panicle rates are developed using the random forest (RF) regression algorithm, and their reliability is verified against observed diseased panicle rates. Meteorological and biotic factors during the heading and flowering stage make the largest contribution to final diseased panicle rates, followed by those in the jointing stage and the overwintering period. The dominant factors that determine final diseased panicle rates are relative humidity, rainy spells of at least 3 consecutive days, and sunshine during the heading and flowering stage. Sunshine duration, precipitation, relative humidity and rainy days during the jointing stage have significant influences on final diseased panicle rates. Temperature and snowfall during the overwintering period have a large precursor impact on final diseased panicle rates. The identified relative importance of key variables in each growth period is consistent with the theory on the development, release, infection, and epidemic of scab.
The accuracy of the RF-based models for predicting diseased panicle rates varies with the number of important characteristic variables, the region, the value of the parameter Mtry, and the growth period. The earliest time at which the models yield usable predictions of diseased panicle rates is the beginning of March, which gives a maximum valid forecast lead time of about 3 months. As the initialization time approaches the maturity period and the number of important characteristic variables used as inputs increases, the accuracy of the models increases and the discrepancy between predicted and observed diseased panicle rates is significantly reduced. The models show better skill in predicting the medium and serious categories of scab. This study indicates that the RF algorithm can provide reliable predictions of scab and thus has great application potential.
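
The two ideas the abstract relies on, the Mtry parameter and the ranking of feature variables by importance, can be illustrated with a minimal sketch. This is not the authors' implementation or data. It is a small pure-Python random forest regressor in which `mtry` is the number of candidate features drawn at each split, and importance is measured as the increase in mean squared error when one feature column is shuffled. It is computed here on the training data for brevity, where out-of-bag samples would normally be used. The feature names (humidity, rainy spells, an irrelevant noise column), the synthetic target, and all function names are hypothetical.

```python
import random
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(rows, targets, feature_ids):
    """Return the (feature, threshold) pair with the lowest total SSE, or None."""
    best, best_score = None, float("inf")
    for f in feature_ids:
        # Dropping the largest value guarantees a non-empty right branch.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, targets, mtry, rng, depth=0, max_depth=4, min_size=5):
    """Grow a regression tree; only `mtry` random features are tried per split."""
    if depth >= max_depth or len(rows) < min_size:
        return mean(targets)
    split = best_split(rows, targets, rng.sample(range(len(rows[0])), mtry))
    if split is None:
        return mean(targets)
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    grow = lambda idx: build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                  mtry, rng, depth + 1, max_depth, min_size)
    return (f, t, grow(left), grow(right))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, targets, n_trees=30, mtry=2, seed=0):
    """Fit each tree on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], mtry, rng))
    return forest

def predict(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)

def permutation_importance(forest, rows, targets, seed=0):
    """Increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    def mse(data):
        return mean((predict(forest, r) - y) ** 2 for r, y in zip(data, targets))
    base = mse(rows)
    scores = []
    for f in range(len(rows[0])):
        col = [r[f] for r in rows]
        rng.shuffle(col)
        shuffled = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, col)]
        scores.append(mse(shuffled) - base)
    return scores

# Synthetic, hypothetical data: [humidity (%), rainy spells, irrelevant noise].
data_rng = random.Random(42)
X = [[data_rng.uniform(60, 95), data_rng.randint(0, 4), data_rng.uniform(0, 1)]
     for _ in range(120)]
y = [0.5 * (h - 60) + 3.0 * s + data_rng.gauss(0, 0.5) for h, s, _ in X]

forest = fit_forest(X, y, n_trees=30, mtry=2)
importance = permutation_importance(forest, X, y)
for name, score in zip(["humidity", "rainy_spells", "noise"], importance):
    print(f"{name}: {score:.2f}")
```

In this toy setting the two features that drive the synthetic target should score well above the noise column, which mirrors the kind of ranking the abstract reports. Changing `mtry` alters how often each feature is offered as a split candidate, which is one way the parameter can affect model accuracy.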

     
