Abstract: The content of soil organic matter (SOM) in topsoil strongly influences crop growth, so understanding its spatial distribution is important for guiding agricultural production and improving soil fertility. Using the 5 922 center points of polygons in the map of cultivated land management units of Huixian City, Henan Province, as the basic data, this study used a random forest (RF) model to evaluate the complex non-linear relationship between topsoil organic matter content and its influential factors at the county scale. Each point carried soil type as an auxiliary qualitative variable, environmental variables (slope, curvature, elevation, soil texture, NDVI) and socio-economic factors (drainage capacity, irrigation status). The 5 922 center points were randomly divided into a training data set and a verification data set at ratios of 8:2, 7:3 and 6:4. The accuracy of the predicted SOM map was then evaluated in three ways. The results showed that with a training-to-verification ratio of 8:2 the prediction accuracy of the RF model was generally higher than with the other ratios, and the correlation between the predicted and the existing SOM maps was 0.859. The Pearson correlation coefficient between the predicted and measured data at 303 field points was 0.595. Ranking the influential factors by importance showed that soil texture was the most important variable affecting the distribution of SOM in the agricultural land of the study area. The results demonstrate that the RF method, as a machine learning and data mining approach, can simulate the relationships between the input variables and SOM content; the resulting maps give reliable predictions of SOM but cannot reveal fine-scale differences in SOM.
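
The evaluation protocol described above (random partition of the 5 922 points at three ratios, then a Pearson correlation between predicted and measured values) can be illustrated with a minimal sketch. It uses only the Python standard library and is not the study's own code: the function names `split_points` and `pearson_r`, the fixed seed, and the rounding rule for the training-set size are all assumptions made for illustration, and the random forest fit itself is left out.

```python
import math
import random


def split_points(n_points, train_fraction, seed=0):
    """Randomly partition point indices into training and verification sets.

    The seed and the use of round() for the training-set size are
    illustrative assumptions, not details taken from the study.
    """
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    n_train = round(n_points * train_fraction)
    return indices[:n_train], indices[n_train:]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


N_POINTS = 5922  # number of center points reported in the abstract

# Sizes of the training and verification sets for each ratio in the abstract.
split_sizes = {}
for label, fraction in (("8:2", 0.8), ("7:3", 0.7), ("6:4", 0.6)):
    train_idx, verify_idx = split_points(N_POINTS, fraction)
    split_sizes[label] = (len(train_idx), len(verify_idx))
```

In such a setup, a model would be fitted on the training indices for each ratio, and `pearson_r` would then be applied to the predicted and measured SOM values of an independent set of points, such as the 303 field points mentioned above.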