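Zhang Yi, Wang Xingguang, Chen Min, Liu Yu. A semantics-based method for extracting geographic scopes of texts [J]. 高技术通讯 (Chinese High Technology Letters, Chinese edition), 2012, 22(2): 165-170 |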
A semantics-based method for extracting geographic scopes of texts (基于语义的文本地理范围提取方法) |
Revised: 2011-06-20 |
DOI: |
Keywords: geographic information retrieval (GIR), geographic scope of texts, evidence theory |
Funding: supported by the 863 Program (2007AA120502) and the National Natural Science Foundation of China (41171296) |
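Author | Affiliation |
Zhang Yi (张毅) | Institute of Remote Sensing and Geographic Information Systems, Peking University |
Wang Xingguang (王星光) | Institute of Remote Sensing and Geographic Information Systems, Peking University |
Chen Min (陈敏) | Institute of Remote Sensing and Geographic Information Systems, Peking University |
Liu Yu (刘瑜) | Institute of Remote Sensing and Geographic Information Systems, Peking University |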
Abstract: |
To process geographic information in Web documents, this paper presents a novel method for automatically extracting the geographic scopes of texts. The method assigns a multi-scale geographic scope to a document through a three-stage process for handling geographic semantics. First, toponyms in the document are recognized with the support of a geographic knowledge base. Second, ambiguous toponyms are disambiguated using both geographic and non-geographic semantics, and the pieces of disambiguation evidence are combined using evidence theory. Finally, a geo-referenced tree of the document is constructed based on a relevant cognitive theory, and the focal geographic entities are computed from the semantic relationships between entities, which determines the geographic location of the document. The method was implemented in GeoSearcher, a prototype system for geographic information retrieval, and the evaluation results show that it achieves relatively high accuracy. |
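The abstract says that disambiguation evidence is combined using evidence theory but gives no formulas. As an illustration only, the sketch below applies Dempster's rule of combination, the standard combination rule in Dempster-Shafer evidence theory, to two hypothetical evidence sources for one ambiguous toponym. The candidate places, the mass values, and the names `combine`, `geo_evidence`, and `text_evidence` are assumptions made for this example and are not taken from the paper.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of candidate referents
    to a mass in [0, 1]; the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contradict each other; track the conflicting mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0 - 1e-12:
        raise ValueError("evidence sources are in total conflict")
    # Renormalize so the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical candidate referents for the ambiguous toponym "Springfield".
IL = "Springfield, Illinois"
MA = "Springfield, Massachusetts"
frame = frozenset({IL, MA})  # "cannot tell which one"

# Hypothetical masses from a geographic cue and a non-geographic cue.
geo_evidence = {frozenset({IL}): 0.6, frame: 0.4}
text_evidence = {frozenset({IL}): 0.5, frozenset({MA}): 0.3, frame: 0.2}

result = combine(geo_evidence, text_evidence)
best = max(result, key=result.get)
print(sorted(best), round(result[best], 4))
```

Mass assigned to the whole frame expresses ignorance rather than support for either candidate, which is why a weak cue can be combined without forcing a choice.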