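Preliminary Construction of a Technical Framework for Numerical Value Extraction from Figures in Soil Science Literature Based on Deep Learning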

CLC number:

S15; TP399

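Funding:

Supported by the National Key Research and Development Program of China (2020YFC1807401), the National Science and Technology Fundamental Resources Investigation Program of China (2021FY100703), and the Application Demonstration Project of the Network Security and Informatization Special Program of the Chinese Academy of Sciences (CAS-WX2022SF-0201).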


    Abstract:

    To address the low efficiency of extracting numerical values from figures in soil science literature, this study proposes a deep-learning-based technical framework for figure value extraction. First, the common elements of figures in soil science literature and the symbols used to represent them were catalogued, and relevant figures were collected and manually labelled to form a training dataset. Second, starting from the YOLO v8 model, which uses global image information to detect multiple targets in a single pass, a detection model suited to figure elements in soil science literature was trained and optimized over several rounds of iteration. Finally, an algorithm that converts image coordinates to numerical coordinates was developed, enabling automatic extraction of values from two-dimensional scatter plots and bar charts. Tests on independent samples not used in training showed that the trained model identified figure elements effectively and that the computed values agreed closely with manually extracted ones (coefficient of determination of the linear regression, R², greater than 0.99). The proposed framework is therefore feasible and offers a new approach to making further use of figure data in soil science literature.
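The abstract does not give the details of the image-to-value conversion algorithm, so the following is only a minimal sketch of one common way to do such a conversion, not the authors' implementation. It assumes linear axes, two detected tick marks per axis whose printed values are already known, bounding boxes in `(x_min, y_min, x_max, y_max)` pixel form as a detector might return them, and bars with positive values rising from the baseline. The names `AxisCalibration`, `scatter_point_value`, and `bar_value` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisCalibration:
    """Two reference ticks on one axis: pixel position and printed value (assumed known)."""
    pixel_a: float
    value_a: float
    pixel_b: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        # Linear interpolation (or extrapolation) between the two reference ticks.
        if self.pixel_a == self.pixel_b:
            raise ValueError("reference ticks must lie at different pixel positions")
        scale = (self.value_b - self.value_a) / (self.pixel_b - self.pixel_a)
        return self.value_a + (pixel - self.pixel_a) * scale


def box_center(box):
    """Return the pixel center of a (x_min, y_min, x_max, y_max) bounding box."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2, (y_min + y_max) / 2


def scatter_point_value(box, x_axis, y_axis):
    """Map the center of a detected scatter marker to (x, y) data values."""
    cx, cy = box_center(box)
    return x_axis.to_value(cx), y_axis.to_value(cy)


def bar_value(box, y_axis):
    """Map the top edge of a detected bar to its data value.

    Image y grows downward, so the top edge of the bar is y_min.
    Assumes a positive bar; a negative bar would need y_max instead.
    """
    _, y_min, _, _ = box
    return y_axis.to_value(y_min)


# Illustrative calibration (made-up pixel positions, not data from the paper).
x_axis = AxisCalibration(pixel_a=100, value_a=0, pixel_b=500, value_b=100)
y_axis = AxisCalibration(pixel_a=400, value_a=0, pixel_b=200, value_b=50)

point = scatter_point_value((290, 240, 310, 260), x_axis, y_axis)
bar = bar_value((150, 300, 180, 400), y_axis)
```

With this illustrative calibration, a marker centered at pixel (300, 250) maps to (50.0, 37.5), and a bar whose top edge sits at pixel row 300 maps to 25.0. Because the y-axis calibration carries its own sign, the downward-growing image axis needs no special handling. Logarithmic axes would require converting tick values to log space before interpolating, which this sketch does not do.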

Cite this article

LIU Jie, MA Haiyi, GUO Zhiying, JIA Mengsi, WANG Changkun, PAN Xianzhang. Preliminary Construction of Technical Framework for Numerical Value Extraction from Figures in Soil Literatures Based on Deep Learning[J]. Soils, 2025, 57(2): 445-451.

History
  • Received: 2023-12-25
  • Last revised: 2024-05-12
  • Accepted: 2024-05-14
  • Published online: 2025-05-08