Liu Hui, Liu Yao. Patent Term Extraction Based on Conditional Random Fields[J]. Digital Library Forum (数字图书馆论坛), 2014, (12): 46-49 |
Patent Term Extraction Based on Conditional Random Fields (基于条件随机场的专利术语抽取) |
|
DOI: |
Keywords: Conditional random fields; Term extraction; Sequence labeling |
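Funding: This research was supported by the National Science and Technology Support Program of the 12th Five-Year Plan, under the project "Research on Key Technologies for Patent Information Resource Mining and Discovery" (No. 2013BAH21B02). |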
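Author | Affiliation |
Liu Hui | Institute of Scientific and Technical Information of China |
Liu Yao | Institute of Scientific and Technical Information of China |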
|
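Abstract (translated from the Chinese): |
Patent term extraction is an important task in information extraction from patent literature. It supports the construction of patent-domain vocabularies and benefits Chinese word segmentation, syntactic parsing, and grammatical analysis. This article analyzes the characteristics of patent terms, formulates corresponding corpus annotation rules, and annotates the corpus manually. Conditional random fields (CRFs) are then trained and tested on the annotated data to extract terms in the communications domain. The annotation uses character-based sequence labeling. Precision, recall, and F-score reach 80.9%, 75.6%, and 78.2% respectively, outperforming methods that use words, part-of-speech tags, and similar information as features, which indicates that the proposed patent term extraction method is effective. |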
English abstract: |
Patent term extraction is an important task in patent information extraction; it benefits lexicon construction, word segmentation, and parsing. The corpus is labeled manually according to rules written after analyzing the characteristics of patent terms. CRFs (Conditional Random Fields) are adopted to train and test on the labeled data. Sequence labeling is based on single Chinese characters. Experimental results show that the precision, recall, and F-score are 80.12%, 74.2%, and 76.9% respectively, which are superior to those of methods based on sequence labeling of words. The results illustrate that the established model for extracting patent terms is effective. |
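To illustrate what character-based sequence labeling means here, the following is a minimal sketch, not the authors' implementation. The abstract does not state the tag set, so a B/I/O scheme is assumed; the function names `char_bio_tags` and `decode_terms`, the sample sentence, and the term offsets are all hypothetical. A CRF would be trained to predict such tags per character; the sketch only shows the encoding and decoding steps.

```python
from typing import List, Tuple


def char_bio_tags(sentence: str, terms: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Tag each character as B (term start), I (inside a term), or O (outside).

    `terms` holds (start, end) character offsets with `end` exclusive.
    The B/I/O tag set is an assumption; the paper's actual tags are not given.
    """
    tags = ["O"] * len(sentence)
    for start, end in terms:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return list(zip(sentence, tags))


def decode_terms(pairs: List[Tuple[str, str]]) -> List[str]:
    """Recover term strings from a (character, tag) sequence."""
    terms: List[str] = []
    current = ""
    for ch, tag in pairs:
        if tag == "B":
            # A new term starts; flush any term still being built.
            if current:
                terms.append(current)
            current = ch
        elif tag == "I" and current:
            current += ch
        else:
            # "O", or a stray "I" with no open term, closes the current term.
            if current:
                terms.append(current)
            current = ""
    if current:
        terms.append(current)
    return terms


if __name__ == "__main__":
    # Hypothetical example: the term 条件随机场 occupies offsets 2..7.
    pairs = char_bio_tags("采用条件随机场进行训练", [(2, 7)])
    for ch, tag in pairs:
        print(ch, tag)
    print(decode_terms(pairs))
```

Tagging single characters avoids depending on a word segmenter, whose errors on unseen technical terms would otherwise propagate into the labels.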