Article Abstract
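Cheng Qikai, Lei Daoyu, Shi Xiang, Liu Yinpeng. Scientific Research Equipment Cost Evaluation Framework Based On Large Language Model[J]. Technology Intelligence Engineering (情报工程), 2024, 10(5): 099-114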
Scientific Research Equipment Cost Evaluation Framework Based On Large Language Model
  
DOI:10.3772/j.issn.2095-915X.2024.05.009
Keywords: Information Extraction; Large Language Model; Efficient Fine-tuning; Cost Evaluation Framework
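Funding: National Natural Science Foundation of China, General Program, "Recognition of Argumentation Logic in Scientific Proposition Texts Based on Machine Reading Comprehension" (72174157); National Natural Science Foundation of China, Key Program, "Theoretical Transformation of Scientific and Technological Information Resources and Knowledge Management Empowered by Data Intelligence" (72234005).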
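Author  Affiliation
Cheng Qikai (程齐凯)  1. School of Information Management, Wuhan University, Wuhan 430072; 2. Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072
Lei Daoyu (雷道宇)  1. School of Information Management, Wuhan University, Wuhan 430072; 2. Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072
Shi Xiang (石湘)  1. School of Information Management, Wuhan University, Wuhan 430072; 2. Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072
Liu Yinpeng (刘寅鹏)  1. School of Information Management, Wuhan University, Wuhan 430072; 2. Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072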
Abstract:
      [Objective/Significance] This study proposes a framework, based on large language models, for assessing the cost of scientific research equipment. It targets two limitations of traditional cost assessment methods: imprecise estimates and low efficiency. The framework automatically extracts experimental material and equipment information from research papers and applies a cost estimation model for research equipment, offering a new tool for evaluating research costs accurately and efficiently and for using research resources effectively.
      [Methods/Processes] Taking physics and computer science as examples, the study builds a training dataset from paper data provided by the arXiv database and the Papers with Code website. The base model LLaMA2-13b is fine-tuned with LoRA so that it can extract detailed information about experimental equipment and materials from papers in the target domains. Extracted entities are linked and disambiguated against Wikipedia, and a cost estimation formula based on average-case analysis is designed to account for price fluctuations of materials and equipment. The framework is then validated on the field of computer vision.
      [Limitations] Experiments were conducted only in computer science and physics, and the dataset relies mainly on publicly available paper data, which may limit the generalizability and accuracy of the framework.
      [Results/Conclusions] Empirical analysis of research papers in computer science and physics demonstrates the effectiveness of the framework. The LoRA fine-tuned LLaMA2 model achieves high accuracy and recall on the information extraction task, showing that the framework can reliably extract experimental material and equipment information. The cost estimation analysis in computer vision indicates that computational resources have become one of the key factors constraining research output in that field, and that specific model architectures or research paradigms have performance ceilings. These findings are consistent with actual research practice, indicating that the framework reflects the realities of scientific work and can serve as a reference for optimizing resources in research projects.
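The abstract mentions an average-case cost estimation formula but does not state it. Purely as an illustration of what an average-case estimate could look like, the sketch below multiplies each item's quantity by the mean of its observed prices and sums over items. The class `EquipmentItem`, the function `average_case_cost`, the item names, and all prices are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EquipmentItem:
    """One extracted equipment or material entry (hypothetical structure)."""
    name: str                     # canonical name, e.g. after entity linking
    quantity: int                 # number of units reported in the paper
    observed_prices: list[float]  # price observations in one currency (made-up values)


def average_case_cost(items: list[EquipmentItem]) -> float:
    """Sum over items of quantity times mean observed price.

    This is an assumed form of an average-case estimate, not the
    formula published by the authors.
    """
    return sum(item.quantity * mean(item.observed_prices) for item in items)


# Illustrative input only; names and prices are invented.
items = [
    EquipmentItem("GPU model A", 8, [9000.0, 11000.0, 10000.0]),
    EquipmentItem("workstation B", 1, [4000.0, 6000.0]),
]
total = average_case_cost(items)
print(f"Estimated equipment cost: {total:.2f}")
```

Averaging over several price observations is one simple way to smooth out price fluctuations; a weighted mean or a time-windowed average would be equally plausible readings of "average-case analysis".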