Ji Guang (季光), Wang Guiling, Han Yanbo. [J]. High Technology Letters (English edition), 2013, 19(2): 203-207 |
|
Creating customized data services from web pages |
|
Keywords: web data extraction, structured data, user labeling, customization, data service |
Author Name | Affiliation |
Ji Guang (季光) | |
Wang Guiling | |
Han Yanbo | |
|
Abstract: |
To extract structured data from a web page according to customized requirements, a user labels some DOM elements on the page with attribute names. The common features of the labeled elements are used in two ways: to guide the user through the labeling process, which minimizes user effort, and to retrieve the attribute values. To turn the attribute values into a structured result, the attribute pattern must be induced. For this purpose, a space-optimized suffix tree called an attribute tree is built; it transforms the document object model (DOM) tree into a simpler form while preserving useful properties such as the order of the attribute sequence. The pattern is induced bottom-up on the attribute tree and is then used to build the structured result. Experiments show that the approach achieves high precision, recall, and structural correctness. |
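As a rough illustration of the labeling step only, the sketch below shows one way that a feature shared by labeled elements could be used to retrieve the remaining attribute values. It is not the authors' method: the abstract does not say which features are used, so the root-to-node path of (tag, class) pairs is an assumption, and the sample page and the names `PAGE`, `tag_paths`, and `extract` are hypothetical. The attribute tree and the bottom-up pattern induction are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; well-formed XHTML so the stdlib parser can read it.
PAGE = """
<html><body>
  <div class="item"><h2>Book A</h2><span class="price">10</span></div>
  <div class="item"><h2>Book B</h2><span class="price">12</span></div>
  <div class="item"><h2>Book C</h2><span class="price">15</span></div>
  <div class="ad"><h2>Sponsored</h2></div>
</body></html>
"""

def tag_paths(root):
    """Map each element to its root-to-node path of (tag, class) steps.

    Using this path as the "common feature" is an assumption made for
    illustration; the abstract does not specify the features.
    """
    paths = {}

    def walk(node, prefix):
        path = prefix + ((node.tag, node.get("class")),)
        paths[node] = path
        for child in node:
            walk(child, path)

    walk(root, ())
    return paths

def extract(root, labeled):
    """Retrieve attribute values from a few user-labeled elements.

    labeled maps an attribute name to the elements the user labeled.
    Every element whose path equals the path of a labeled element is
    treated as another value of that attribute.
    """
    paths = tag_paths(root)
    result = {}
    for name, elems in labeled.items():
        wanted = {paths[e] for e in elems}
        # root.iter() walks in document order, so value order is preserved.
        result[name] = [e.text for e in root.iter() if paths[e] in wanted]
    return result

root = ET.fromstring(PAGE.strip())
first_item = root.find("./body/div")
# The user labels one title and one price on the first item only.
labels = {
    "title": [first_item.find("h2")],
    "price": [first_item.find("span")],
}
print(extract(root, labels))
# {'title': ['Book A', 'Book B', 'Book C'], 'price': ['10', '12', '15']}
```

In this sketch the `h2` inside the `div` with class `ad` is skipped because its path differs from the labeled one. Grouping the retrieved values into records would still require a pattern over the attribute sequence, which is the role the abstract assigns to the attribute tree.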