Fine-grained multi-label patent classification method based on unbalanced loss function

Wei Chao (魏超); Mao Yilei (毛一雷); Li Linshan (李琳珊); Wang Yibo (王弋波); Li Miaoyu (李妙钰)

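Article abstract

Wei Chao, Mao Yilei, Li Linshan, Wang Yibo, Li Miaoyu. Fine-grained multi-label patent classification method based on unbalanced loss function[J]. 高技术通讯 (Chinese edition), 2025, 35(4): 393-402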

DOI:

Keywords: multi-label patent classification; deep learning; unbalanced loss function

Funding:

Author	Affiliation
Wei Chao	Institute of Scientific and Technical Information of China, Beijing 100038
Mao Yilei
Li Linshan
Wang Yibo
Li Miaoyu

Abstract:

Fine-grained multi-label patent classification suffers from imbalanced classification labels, which degrades classification accuracy. To address this, the paper focuses on deep-learning-based multi-label text classification: it uses an unbalanced loss function as the classifier's objective function and optimizes the classifier through fine-tuning, guiding the classifier to rebalance the imbalanced categories and alleviating the label imbalance problem. An experimental dataset is built from English-language patents in the "photovoltaics" field from 2017 to 2022 for empirical verification. The best results are a micro-averaged F1 of 0.4969, a macro-averaged F1 of 0.3329, and a Hamming loss of 0.1745, improvements of 25%, 80%, and 8%, respectively, over a model based on binary cross entropy (BCE). The experimental results show that the method achieves multi-label patent classification at the "main group/subgroup" level, improves overall multi-label classification performance, and improves performance on few-sample categories.
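The abstract does not say which unbalanced loss function the authors use, so the following is only a minimal sketch under stated assumptions. It uses a focal-style reweighting of BCE as a stand-in for an imbalance-aware objective, and it computes the three reported metrics (micro-averaged F1, macro-averaged F1, Hamming loss) on a toy example. The function names, the `gamma` parameter, and the toy labels are all hypothetical and are not taken from the paper.

```python
import math


def bce_loss(probs, targets, eps=1e-12):
    """Mean binary cross entropy over the labels of one sample."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def focal_loss(probs, targets, gamma=2.0, eps=1e-12):
    """Focal-style BCE (assumed stand-in for an imbalance-aware loss).

    Each term is scaled by (1 - p_t) ** gamma, so labels the model already
    predicts confidently contribute less; gamma = 0 recovers plain BCE.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if y == 1 else 1.0 - p  # probability of the true outcome
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def micro_macro_f1(y_true, y_pred):
    """Return (micro-F1, macro-F1) for binary label matrices (lists of rows)."""
    n_labels = len(y_true[0])
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t == 1 and p == 1:
                tp[j] += 1
            elif t == 0 and p == 1:
                fp[j] += 1
            elif t == 1 and p == 0:
                fn[j] += 1

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp), sum(fp), sum(fn))  # pool counts over all labels
    macro = sum(f1(a, b, c) for a, b, c in zip(tp, fp, fn)) / n_labels
    return micro, macro


def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are wrong."""
    wrong = sum(
        t != p
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
    )
    return wrong / (len(y_true) * len(y_true[0]))


# Toy data: the third label is rare and the classifier misses it entirely.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
micro_f1, macro_f1 = micro_macro_f1(y_true, y_pred)
h_loss = hamming_loss(y_true, y_pred)

# Loss comparison on one sample with three label probabilities.
probs, targets = [0.9, 0.2, 0.3], [1, 0, 1]
bce_value = bce_loss(probs, targets)
focal_value = focal_loss(probs, targets, gamma=2.0)
```

In the toy data the missed rare label pulls macro-averaged F1 well below micro-averaged F1, which is why a macro-averaged score is the more sensitive indicator of performance on few-sample categories.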
