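Hu Zewen, Wang Mengya, Han Yarong. Topic Recognition and Automatic Classification of Chinese Blockchain Patent Technology Based on Machine Learning[J]. Digital Library Forum, 2023, 19(12): 32-43 |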
Topic Recognition and Automatic Classification of Chinese Blockchain Patent Technology Based on Machine Learning |
Original title: 基于机器学习的中国区块链专利技术主题识别与自动分类研究 |
Submitted: 2023-10-25 |
DOI:10.3772/j.issn.1673-2286.2023.12.004 |
Keywords: LDA Topic Model; Machine Learning; Blockchain; Topic Recognition; Automatic Classification |
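Funding: This research was supported by the National Social Science Fund of China project "Methods and Applications for Identifying Potential 'High-Quality' Works in Massive Scientific and Technical Literature" (No. 20CTQ031) and the Jiangsu Universities "Qing Lan Project". |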
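Author | Affiliation |
Hu Zewen | Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science and Technology |
Wang Mengya | Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science and Technology |
Han Yarong | Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science and Technology |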
Abstract: |
Automatically recognizing technology topics in the blockchain field and classifying them into topic categories provides intelligence support for broadening R&D topics and advancing the field. Using Chinese blockchain patents from the Derwent patent database as the sample, this paper designs and implements a machine-learning-based model for blockchain technology topic recognition and automatic classification. Topics are first recognized with the LDA topic model; a classification system of technology topic categories is then formed on the feature vector space of the patent documents; finally, the topics are classified automatically with traditional machine learning and deep learning models. The results show that the LDA topic model effectively identifies topic categories in the blockchain technology field and constructs the feature vector space of those categories. A total of 18 technology topics are identified and grouped by research direction into four topic categories: blockchain architecture research, blockchain industry application research, data storage and data security protection research, and high-tech application research. Combining the LDA topic model with traditional machine learning and deep learning methods effectively achieves automatic classification of the field's technology topic categories. Among the classifiers, support vector machine, LightGBM, LSTM, BP neural network, and logistic regression perform best, with accuracy of 84%-87% and precision of 79%-83%; of these, logistic regression gives the strongest classification results. |