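SHI Jiaoxiang, ZHU Lijun, WEI Chao, ZHANG Xuanxuan. FinTech Named Entity Recognition Based on Transfer Learning and Active Learning [J]. China Science & Technology Resources Review, 2022, (2): 35-45.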
FinTech Named Entity Recognition Based on Transfer Learning and Active Learning
Submitted: 2021-07-06
Keywords: named entity recognition; few-shot; active learning; transfer learning; BERT
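Funding: National Key R&D Program of China project "R&D and Application Demonstration of a Disruptive Technology Perception and Response Platform", subproject "Horizon Scanning System" (2019YFA0707202);
China Postdoctoral Science Foundation, 65th batch of General Program grants, project "Research on Manifold-Regularized Autoencoder Methods for Policy Text Representation and Topic Word Extraction" (2019M650804).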
Authors: SHI Jiaoxiang, ZHU Lijun, WEI Chao, ZHANG Xuanxuan
Affiliation: Institute of Scientific and Technical Information of China, Beijing 100038
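Chinese abstract (translated):
Named entity recognition plays an important role in advancing the construction of intelligent systems and scientific and technical information services. Domain-specific entity recognition suffers from high annotation costs and limited recognition accuracy. To address these problems, by introducing general-domain information and reducing the influence of outliers, we design an active transfer learning method based on semantic similarity and uncertainty measures. The method uses a pre-trained transfer learning model to improve classification accuracy and incorporates active learning sampling strategies to reduce annotation costs. A series of experiments on FinTech and general-domain corpora shows that the method effectively improves recognition accuracy and reduces annotation costs.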
English abstract:
Named Entity Recognition (NER) plays an important role in building intelligent systems and scientific and technical information services. To address the high labeling cost and low recognition accuracy of NER in specialized domains, we propose a novel sampling framework, Active Transfer Learning based on Semantic Similarity and Uncertainty (ATLSSU), which adds extra semantic information from the general domain and reduces the impact of outliers. The method uses a pre-trained transfer learning (TL) model to improve classification accuracy and integrates active learning (AL) sampling strategies to reduce labeling costs. We perform a series of experiments on a FinTech corpus and a general-domain corpus. The results show that our method effectively improves performance and reduces annotation costs.
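The abstract does not state the exact ATLSSU scoring rule. The following is only a minimal sketch, under stated assumptions, of the general idea of combining an uncertainty measure with a semantic-similarity term so that uncertain but isolated sentences (outliers) are not prioritized for labeling. It uses the generic information-density weighting idea and is not presented as the authors' method. It assumes a sequence-labeling model that outputs per-token label probabilities and some sentence-embedding model (for example, pooled BERT vectors). The function names, the `beta` weight, and the toy inputs are all hypothetical.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one token's predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_uncertainty(token_probs: Sequence[Sequence[float]]) -> float:
    """Mean token entropy over a sentence (one common NER uncertainty measure)."""
    if not token_probs:
        return 0.0
    return sum(token_entropy(p) for p in token_probs) / len(token_probs)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def density(embeddings: Sequence[Sequence[float]], i: int) -> float:
    """Average cosine similarity of sentence i to every other pool sentence.
    A low value flags a likely outlier."""
    sims = [cosine(embeddings[i], e) for j, e in enumerate(embeddings) if j != i]
    return sum(sims) / len(sims) if sims else 0.0


def select_batch(token_probs_per_sentence, embeddings, k: int,
                 beta: float = 1.0) -> List[int]:
    """Hypothetical scorer: uncertainty * density**beta; returns top-k indices."""
    scores = []
    for i, tp in enumerate(token_probs_per_sentence):
        d = max(density(embeddings, i), 0.0)  # clip negative similarity
        scores.append(sentence_uncertainty(tp) * d ** beta)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy pool of three unlabeled sentences (hypothetical values).
probs = [
    [[0.5, 0.5], [0.5, 0.5]],  # uncertain and typical of the pool
    [[1.0, 0.0], [1.0, 0.0]],  # confidently predicted
    [[0.5, 0.5]],              # uncertain, but semantically isolated
]
emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(select_batch(probs, emb, k=1))  # → [0]
```

Sentences 0 and 2 are equally uncertain, but sentence 2's embedding is orthogonal to the rest of the pool, so its density term is 0 and only sentence 0 is selected. Pure uncertainty sampling would rank the two as tied. `beta` controls how strongly the similarity term is weighted relative to uncertainty.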