Xing Xiaozhao, Zheng Yanning. Research on Inventors' Name Disambiguation for Chinese Patent Information [J]. Digital Library Forum, 2018, (10): 2-8 |
中文专利发明人重名消解问题研究 |
Research on Inventors' Name Disambiguation for Chinese Patent Information |
Submitted: 2018-10-08 |
DOI:10.3772/j.issn.1673-2286.2018.10.001 |
Keywords (Chinese): 重名消解;中文专利;发明人;相似度;向量空间模型 |
Keywords (English): Name Disambiguation; Chinese Patent Information; Inventor; Similarity; Vector Space Model |
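Funding: This research was supported by the Youth Project of the Innovation Research Fund of the Institute of Scientific and Technical Information of China, "Research on Key Technologies for Research Team Identification Based on Social Network Analysis" (No. QN2018-01). |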
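Author | Affiliation |
Xing Xiaozhao | Institute of Scientific and Technical Information of China |
Zheng Yanning | Institute of Scientific and Technical Information of China |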
Abstract views: 2150 |
Full-text downloads: 1979 |
Abstract: |
Analysis of patent inventors provides strong data support for evaluating technical talent and identifying research teams. However, many Chinese names are shared by different people, which biases inventor-based research results. This paper proposes a rule-based method for disambiguating the names of inventors in Chinese patents. Applicant names that are inconsistent because of mergers, splits, restructuring, or strategic transformation are matched with a cosine similarity algorithm based on the vector space model. Addresses that are inconsistent because house numbers are written irregularly are matched with a hierarchical algorithm based on postal code and street address. Co-inventor similarity is calculated with the Jaccard coefficient. Finally, an empirical study on the Electric Vehicle Thematic Database of the Institute of Scientific and Technical Information of China (ISTIC) is used to verify the soundness and effectiveness of the method. |
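The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how applicant names are turned into vectors, so the character-bigram tokenization below is an assumption, and the function names `char_bigrams`, `cosine_similarity`, and `jaccard` are hypothetical. This is not the authors' implementation.

```python
import math
from collections import Counter


def char_bigrams(name: str) -> Counter:
    """Count vector of character bigrams (assumed tokenization, not from the paper)."""
    return Counter(name[i:i + 2] for i in range(len(name) - 1))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the bigram count vectors of two names."""
    va, vb = char_bigrams(a), char_bigrams(b)
    # Only bigrams present in both names contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    # Names shorter than two characters have no bigrams; return 0.0 for them.
    return dot / norm if norm else 0.0


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Under these assumptions, two applicant name strings would be compared with `cosine_similarity(name_a, name_b)` and two patents' co-inventor sets with `jaccard(set_a, set_b)`. The thresholds at which a rule treats two records as the same inventor are not given in the abstract and are left open here.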