Meteorological data estimation based on singular value decomposition

史加荣; 杨柳

doi:10.11676/qxxb2020.005

Abstract

Daily mean temperature, mean relative humidity, sunshine duration and diurnal temperature range observed at 662 meteorological stations in China from 2004 to 2013 are used to study the estimation of missing meteorological data by singular value decomposition (SVD). To reduce the adverse influence of randomness, the daily data are averaged over the 10 years. The relative error of the singular value decomposition and the similarity matrix are both used to verify that the meteorological data are approximately low-rank, and the correlations between the meteorological elements are discussed. After the principal basis vectors are analyzed, three groups of estimation experiments are designed. In the first group, 6 stations are randomly chosen for testing and the remaining stations are used for training to obtain the 5 best basis vectors; for each test station, 12 observations are randomly selected and the unknown values are estimated from the chosen basis vectors. The mean estimation errors for mean temperature, mean relative humidity, sunshine duration and diurnal temperature range are 8.00%, 7.83%, 17.17% and 10.82%, respectively. In the second group, 70% of the stations are randomly selected for training and the remaining stations are used to validate the estimation performance. The results indicate that the number of basis vectors can be chosen between 5 and 15, and that estimation performance improves as the number of basis vectors or the number of observations increases. In the third group, the unobserved data for the first half of 1952 at 10 stations are estimated from the observations made in the second half of that year, and the estimates are reasonable and reliable.
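The first experiment described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic seasonal curves standing in for the 10-year daily averages, the function names (`relative_truncation_error`, `fit_basis`, `estimate_series`, `synthetic_stations`) are hypothetical, the fit on the observed days is assumed to be ordinary least squares, and the error is assumed to be a norm-based relative error, since the abstract does not state the exact definition.

```python
import numpy as np

def relative_truncation_error(X, r):
    """Relative Frobenius error of the best rank-r approximation of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sqrt(np.sum(s[r:] ** 2) / np.sum(s ** 2)))

def fit_basis(X_train, r):
    """Leading r left singular vectors of the (days x stations) training matrix."""
    U, _, _ = np.linalg.svd(X_train, full_matrices=False)
    return U[:, :r]

def estimate_series(basis, obs_idx, obs_values):
    """Fit basis coefficients on the observed days (least squares, an assumption
    of this sketch), then reconstruct the full daily series."""
    coeffs, *_ = np.linalg.lstsq(basis[obs_idx], obs_values, rcond=None)
    return basis @ coeffs

def synthetic_stations(n_stations, rng, n_days=365, noise=0.05):
    """Synthetic stand-in data: a mean level plus annual and semi-annual cycles."""
    t = 2 * np.pi * np.arange(n_days) / n_days
    patterns = np.column_stack(
        [np.ones(n_days), np.cos(t), np.sin(t), np.cos(2 * t), np.sin(2 * t)]
    )
    weights = np.vstack([
        rng.uniform(5, 25, n_stations),      # station mean level
        rng.normal(0, 5, (2, n_stations)),   # annual cycle amplitudes
        rng.normal(0, 2, (2, n_stations)),   # semi-annual cycle amplitudes
    ])
    return patterns @ weights + noise * rng.standard_normal((n_days, n_stations))

rng = np.random.default_rng(0)
X = synthetic_stations(60, rng)            # 60 hypothetical stations
X_train, X_test = X[:, :54], X[:, 54:]     # 6 stations held out for testing
rank = 5                                   # number of basis vectors

low_rank_error = relative_truncation_error(X_train, rank)
basis = fit_basis(X_train, rank)

errors = []
for j in range(X_test.shape[1]):
    truth = X_test[:, j]
    # 12 randomly chosen observed days per test station
    obs_idx = rng.choice(truth.size, size=12, replace=False)
    estimate = estimate_series(basis, obs_idx, truth[obs_idx])
    errors.append(np.linalg.norm(estimate - truth) / np.linalg.norm(truth))

mean_error = float(np.mean(errors))
print(f"rank-{rank} relative truncation error: {low_rank_error:.4f}")
print(f"mean relative estimation error: {mean_error:.4f}")
```

Because the synthetic curves are low-rank by construction, the errors printed here say nothing about the accuracy reported for the real station data; the sketch only shows how a small number of observations can determine a full series once the basis vectors are fixed.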
