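朱玉强, 江涛, 李翼飞. Practice of Author Name Disambiguation in Chinese-English Translation of Foreign Language Database [J]. 数字图书馆论坛 (Digital Library Forum), 2022, (2): 33~39 |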
Title (Chinese): 外文数据库英译中文作者姓名消歧实践 |
Title (English): Practice of Author Name Disambiguation in Chinese-English Translation of Foreign Language Database |
Submitted: 2022-01-25 |
DOI:10.3772/j.issn.1673-2286.2022.02.005 |
Keywords: Name Disambiguation; Address Disambiguation; Data Governance; Foreign Language Database |
Funding: This research was supported by the 2021 Hainan Provincial Philosophy and Social Science Planning Project (No. hnsz2021-19). |
Author | Affiliation |
朱玉强 | Shandong Normal University Library |
江涛 | Hainan Medical University Library |
李翼飞 | Shandong Normal University Library |
Abstract: |
Chinese author names (and addresses) rendered in English in foreign-language databases are often ambiguous: several records may point to the same person, or a single record may point to different people. To address this, the article simulates the manual search-and-check procedure, integrating corpora such as multi-source data, academic social networks, knowledge encyclopedias, and online translation websites. Programs built on automated manipulation of web page document objects, regular expressions, and short-text similarity calculation carry out the disambiguation of the English-rendered Chinese author names and addresses. The results show that the algorithm architecture is stable, effective, and highly extensible, and that its success rate is accepted by practitioners. The approach offers new ideas and methods for data preprocessing and cleaning. |
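The abstract names regular expressions and short-text similarity as two of the techniques used, but gives no implementation details. The following is only a minimal sketch of how those two ideas might be combined, not the authors' program: the function names, the record layout (`name` and `address` keys), the 0.8 threshold, and the use of Python's standard-library `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize_name(raw: str) -> str:
    """Reduce a romanized name to an order-independent lowercase key.

    Hypothetical helper: "Zhu, Yuqiang", "Yuqiang Zhu", and "Zhu Yu-Qiang"
    all map to the same key.
    """
    joined = raw.replace("-", "")                  # "Yu-Qiang" -> "YuQiang"
    letters = re.sub(r"[^A-Za-z\s]", " ", joined)  # drop commas, periods, etc.
    return " ".join(sorted(letters.lower().split()))


def address_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two short address strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_author(rec_a: dict, rec_b: dict, threshold: float = 0.8) -> bool:
    """Treat two records as one person if the name keys match and the
    addresses are similar enough. The threshold is an assumed value."""
    if normalize_name(rec_a["name"]) != normalize_name(rec_b["name"]):
        return False
    return address_similarity(rec_a["address"], rec_b["address"]) >= threshold


if __name__ == "__main__":
    r1 = {"name": "Zhu, Yuqiang", "address": "Library, Shandong Normal University"}
    r2 = {"name": "Yuqiang Zhu", "address": "Library, Shandong Normal University"}
    print(same_author(r1, r2))
```

A real pipeline would presumably also draw on the external corpora the abstract lists (academic social networks, encyclopedias, translation sites) before deciding; this sketch covers only the string-matching step.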