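[Keywords (Chinese version)]
[Abstract (Chinese version)]
The organic matter content of the topsoil in cultivated land is closely related to crop growth and development. Knowing the spatial distribution of soil organic matter (SOM) is therefore important for targeted soil fertility improvement and for guiding agricultural production. This study used as base data the center points of 5 922 polygons of cultivated land resource management units in Huixian City, Henan Province. The points were randomly divided into training and validation data sets at ratios of 8:2, 7:3 and 6:4. With soil type as an auxiliary qualitative variable, a random forest model was used to model and predict the complex non-linear relationships between SOM content and two groups of factors: natural environmental variables (aspect, curvature, slope, elevation, soil texture and the normalized difference vegetation index, NDVI) and socio-economic factors (drainage capacity and irrigation status). The results showed that: ① with a training-to-validation ratio of 8:2, the random forest model generally achieved higher prediction accuracy; ② with 80% of the base data used for training, the correlation between the predicted map and the existing map reached 0.859; ③ when validated against 303 field samples, the Pearson correlation coefficient between predicted and measured values was 0.595. Ranking the influencing factors by importance showed that soil texture was the most important factor affecting topsoil SOM content of agricultural land in the study area. As an effective machine learning and data mining method, the random forest model can therefore represent the relationship between the input variables and SOM content reasonably well. The predicted map agrees with actual conditions but does not capture fine differences in SOM content well.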
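The abstract does not say how the random split into training and validation sets was carried out or which software was used. The following is a minimal sketch of a seeded random split of the 5 922 points at the three reported ratios. It assumes simple shuffling and rounding of the training-set size. The helper name `split_train_validation` and the seed value are hypothetical.

```python
import random


def split_train_validation(point_ids, train_fraction, seed=0):
    """Randomly split point IDs into training and validation sets.

    Hypothetical helper: the shuffle-then-round rule and the fixed seed are
    assumptions for illustration, not details reported in the study.
    """
    ids = list(point_ids)
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)  # assumed rounding convention
    return ids[:n_train], ids[n_train:]


# 5 922 polygon center points, split at 8:2, 7:3 and 6:4
points = range(5922)
for frac in (0.8, 0.7, 0.6):
    train, valid = split_train_validation(points, frac)
    print(f"{frac:.0%} training: {len(train)} training, {len(valid)} validation")
```

With this rounding rule, the 8:2 split yields 4 738 training and 1 184 validation points. A different rounding convention could shift these counts by one.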
[Keywords]
[Abstract]
Topsoil organic matter content strongly influences crop growth, so understanding the spatial distribution of soil organic matter (SOM) is important for guiding agricultural production and improving soil fertility. This study used as base data the center points of 5 922 polygons in the map of cultivated land management units of Huixian City, Henan Province. A random forest (RF) model was applied to these points to characterize the complex non-linear relationships between topsoil SOM content and its influencing factors at the county scale. Each point carried soil type as an auxiliary qualitative variable, together with environmental variables (aspect, curvature, slope, elevation, soil texture and NDVI) and socio-economic factors (drainage capacity and irrigation status). The 5 922 points were randomly divided into training and validation data sets at ratios of 8:2, 7:3 and 6:4, and the accuracy of the predicted SOM maps was evaluated in three ways. With a training-to-validation ratio of 8:2, the RF model generally achieved higher prediction accuracy, and with this ratio the correlation between the predicted and the existing SOM maps was 0.859. The Pearson correlation coefficient between predicted and measured values at 303 field points was 0.595. Ranking the variables by importance showed that soil texture was the most important factor affecting the distribution of SOM in the agricultural land of the study area. These results demonstrate that RF, as a machine learning and data mining approach, can model the relationships between the input variables and SOM content. The predicted maps are reliable overall but do not capture fine differences in SOM content.
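The field validation reports a Pearson correlation coefficient between predicted and measured SOM values. The coefficient is computed as r = Σ(p − p̄)(m − m̄) / √(Σ(p − p̄)² · Σ(m − m̄)²). The study does not state which software it used, so the following is only a minimal pure-Python sketch of this statistic. The helper name `pearson_r` is hypothetical, and the toy values are invented for illustration rather than taken from the study.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences.

    Hypothetical helper for illustration; not the study's own code.
    """
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sums of cross-products and squared deviations; the 1/n factors cancel.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_m = sum((m - mean_m) ** 2 for m in measured)
    if ss_p == 0 or ss_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_p * ss_m)


# Toy values only, not data from the study
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 4.0])
print(round(r, 3))  # 0.718
```

In practice, `predicted` would hold the RF predictions at the validation sites and `measured` the corresponding laboratory SOM values.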
[CLC number]
S159-3
[Funding]
Supported by the National Natural Science Foundation of China (40971128).