Abstract: Research papers are the main vehicle for expressing and disseminating soil knowledge, and they also promote communication among soil researchers. Figures in these papers are an important means of revealing patterns and trends in soil data. However, the values represented by points, bars and other figure elements are difficult to extract, which limits further use of these data. An automated, high-precision technique for extracting numerical values from figures in soil papers is therefore urgently needed. In this study, we proposed a deep-learning-based technical framework for extracting numerical values from figures in soil papers. First, we catalogued common figure elements and their symbols, then collected and manually labelled a set of figures to form a training dataset. Second, starting from the YOLO v8 base model, which detects multiple targets across the whole image in a single pass, we trained an optimized model for detecting figure elements in soil papers over several rounds of training. Third, to convert the detected figure elements into actual values, we designed an algorithm that automatically calculates the numerical values in 2D scatter plots and histograms. Tests on figures not used in training showed that the technique effectively extracted figure elements and that the extracted numerical values agreed closely with manually extracted values. The proposed framework is therefore feasible and provides a new approach for making efficient use of figure data in soil papers.
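To illustrate the third step, the sketch below shows one common way to turn detected elements into values. It is a hypothetical example, not the algorithm used in this study. It assumes linear axes and assumes that the pixel positions and labelled values of two ticks per axis are already known (for example, from detected tick marks and recognised tick labels). `AxisCalibration`, `scatter_values` and `bar_heights` are illustrative names. Detection boxes are assumed to be `(x1, y1, x2, y2)` pixel tuples, as produced by a typical object detector.

```python
from dataclasses import dataclass


@dataclass
class AxisCalibration:
    """Linear map from a pixel coordinate to a data value along one axis.

    Assumption: the axis is linear, and two reference ticks with known pixel
    positions and labelled values are available.
    """
    pixel_a: float
    value_a: float
    pixel_b: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        if self.pixel_a == self.pixel_b:
            raise ValueError("reference ticks must be at different pixel positions")
        # Data units per pixel; negative for a y axis because image y grows downward.
        scale = (self.value_b - self.value_a) / (self.pixel_b - self.pixel_a)
        # Linear interpolation (or extrapolation) from the first reference tick.
        return self.value_a + (pixel - self.pixel_a) * scale


def box_center(box):
    """Centre (x, y) of a detection box given as (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def scatter_values(point_boxes, x_axis, y_axis):
    """Convert detected scatter-point boxes into (x, y) data values.

    Assumption: a marker's value is read at the centre of its box.
    """
    values = []
    for box in point_boxes:
        cx, cy = box_center(box)
        values.append((x_axis.to_value(cx), y_axis.to_value(cy)))
    return values


def bar_heights(bar_boxes, y_axis):
    """Value of each vertical bar, read at the top edge of its box.

    Image y grows downward, so the top edge is the smaller coordinate y1.
    """
    return [y_axis.to_value(box[1]) for box in bar_boxes]
```

Suppose the x axis is calibrated from ticks at pixels 100 and 500 (values 0 and 40), and the y axis from ticks at pixels 400 and 100 (values 0 and 30). A point box centred at pixel (300, 200) then maps to roughly (20, 20), and a bar whose top edge lies at pixel y = 250 maps to roughly 15. Reading points at box centres and bars at their top edges is a simple convention chosen for this sketch. Logarithmic axes would instead need the interpolation done on the logarithm of the tick values.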