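Hang Jianqin, Zhang Mingyu, Hu Zewen. A Textbook Corpus Approach to Constructing a Self-adaptive Subject Word List: Taking Economics-Related Majors as an Example [J]. Technology Intelligence Engineering, 2024, 10(3): 114-127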
A Textbook Corpus Approach to Constructing a Self-adaptive Subject Word List: Taking Economics-Related Majors as an Example
DOI: 10.3772/j.issn.2095-915X.2024.03.009
Keywords: subject word list; agglomerative clustering algorithm; semantic co-occurrence degree; central word of a word cluster
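Funding: National Social Science Fund of China project "Research on Methods and Applications for Identifying Potential 'High-Quality' Works in Massive Scientific and Technical Literature" (20CTQ031); National Social Science Fund of China general project "Survey and Comparative Study of Dialect Grammar in the Transitional Zone of the Four Provinces and Municipalities of Northwestern Hubei" (20BYY039); Jiangsu Provincial University Philosophy and Social Science general project "The Influence of Learning Anxiety on the Chinese Language Learning of International Students from Belt and Road Countries, and Countermeasures" (2023SJYB0753).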
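Author | Affiliation
Hang Jianqin (杭建琴) | Research Center for Language and Language Education, Central China Normal University, Wuhan 430079
Zhang Mingyu (张鸣宇) | 1. Research Center for Language and Language Education, Central China Normal University, Wuhan 430079; 2. School of International Education, Wuhan University, Wuhan 430079
Hu Zewen (胡泽文) | School of Management Engineering, Nanjing University of Information Science and Technology, Nanjing 210044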
Abstract:
[Purpose/Significance] Building a specialized word list for learners of Chinese as a non-native language is important both for their subject-area study and for the development of International Chinese Language Education as a discipline. [Methods/Processes] Specialized word lists aimed at foreign learners are scarce, and those that exist are built with a narrow range of methods. To address this, the paper first crawls novels, news, and forum posts from websites to build a reference corpus, then selects textbooks according to the Ministry of Education's catalog of specialized course offerings to build a specialized textbook corpus. The TF-IDF-TF algorithm is used to select specialized subject words, a word co-occurrence matrix is constructed, and agglomerative clustering groups the subject words into clusters. Within each cluster, the semantic relatedness of the subject words is computed, the word with the highest semantic co-occurrence degree is chosen as the cluster's central word, and the word list is arranged by semantic relatedness. Finally, a specialized subject word list for international students is constructed, taking economics majors as an example. [Results/Conclusions] The resulting economics subject word list extracts specialized vocabulary well and groups closely related subject words into the same cluster. Learners can quickly retrieve relevant word clusters for self-adaptive specialized learning, and the method provides a basis for building subject word lists for other majors.
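The clustering and central-word steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define TF-IDF-TF, so the sketch starts from an already selected vocabulary, and it assumes sentence-level co-occurrence counts, average-linkage merging, and "semantic co-occurrence degree" computed as a word's summed co-occurrence with the other words in its cluster. All function names and the toy sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sentences, vocab):
    """Count how often each pair of vocabulary words shares a sentence."""
    counts = Counter()
    for tokens in sentences:
        present = sorted(set(tokens) & vocab)
        for a, b in combinations(present, 2):  # pairs are stored as (smaller, larger)
            counts[(a, b)] += 1
    return counts


def co(counts, a, b):
    """Symmetric lookup into the pair counts (0 if the pair never co-occurs)."""
    return counts[(a, b) if a < b else (b, a)]


def agglomerate(words, counts, n_clusters):
    """Average-linkage agglomerative clustering, with co-occurrence as similarity."""
    clusters = [[w] for w in sorted(words)]

    def sim(c1, c2):
        return sum(co(counts, a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the two clusters with the highest average co-occurrence.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: sim(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def central_word(cluster, counts):
    """Assumed definition: the word with the largest summed co-occurrence in its cluster."""
    return max(cluster, key=lambda w: sum(co(counts, w, o) for o in cluster if o != w))


# Toy, hand-written "sentences" (already tokenized); not data from the paper.
sentences = [
    ["demand", "supply", "price"],
    ["demand", "price"],
    ["supply", "price"],
    ["inflation", "currency", "interest"],
    ["inflation", "interest"],
    ["currency", "interest"],
]
vocab = {"demand", "supply", "price", "inflation", "currency", "interest"}

counts = cooccurrence(sentences, vocab)
clusters = agglomerate(vocab, counts, n_clusters=2)
centers = {central_word(c, counts): sorted(c) for c in clusters}
print(centers)
```

In this toy input, the market-related words and the monetary words never share a sentence, so they end up in separate clusters, each keyed by the word that co-occurs most with its cluster mates. A real pipeline would need Chinese word segmentation and a principled stopping criterion for the merging, neither of which the abstract specifies.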