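Zi Kangli* **, Wang Shi*, Cao Cungen*. SOM-NCSCM+: research on a Chinese headline generation method based on an extractive neural network [J]. 高技术通讯 (Chinese edition), 2023, 33(8): 836-848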
SOM-NCSCM+: research on a Chinese headline generation method based on an extractive neural network
DOI: 10.3772/j.issn.1002-0470.2023.08.006
Keywords: Chinese headline generation; neural network model; topic model; clustering model; sequence labeling
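Zi Kangli* ** (*Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (**University of Chinese Academy of Sciences, Beijing 100049)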
As a branch of text summarization, headline generation helps people obtain information efficiently. To address the lack of large-scale, high-quality annotated Chinese data for headline generation, and exploiting the observation that a headline can often be formed from words in the content, this paper proposes a Chinese headline generation method and model based on an extractive deep neural network. The model is enhanced with a clustering model and a topic model, combining unsupervised learning with a supervised sequence labeling model. On Chinese news data that lacks manually annotated categories, the clustering model and the topic model automatically mine latent features within the data and produce data clusters and topic words that assist headline generation, which makes the model more adaptable to Chinese news data that covers different topics and has uneven annotation quality. Experimental results on a publicly available Chinese news headline generation dataset show that the model outperforms the baseline models on the evaluation metrics, including micro F1, BLEU, ROUGE and compression ratio.
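
To make the extractive, sequence-labeling view of headline generation concrete, the following is a minimal Python sketch, not the SOM-NCSCM+ model itself. It assumes each content token receives a binary label (1 = keep in the headline, 0 = drop), that micro F1 is computed over the "keep" labels pooled across all examples, and that compression ratio means kept tokens divided by content tokens. The abstract does not give these definitions, so the function names, the label scheme, and the example sentence are all hypothetical.

```python
from typing import List, Sequence


def extract_headline(tokens: Sequence[str], labels: Sequence[int]) -> str:
    """Build a headline by concatenating the tokens labeled 1 (keep)."""
    return "".join(tok for tok, lab in zip(tokens, labels) if lab == 1)


def micro_f1(gold: List[Sequence[int]], pred: List[Sequence[int]]) -> float:
    """Micro-averaged F1 for the 'keep' label, pooled over all sequences.

    Assumed definition: counts of true/false positives and false negatives
    are summed over every token of every example before computing F1.
    """
    tp = fp = fn = 0
    for gold_seq, pred_seq in zip(gold, pred):
        for g, p in zip(gold_seq, pred_seq):
            if p == 1 and g == 1:
                tp += 1
            elif p == 1 and g == 0:
                fp += 1
            elif p == 0 and g == 1:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def compression_ratio(tokens: Sequence[str], labels: Sequence[int]) -> float:
    """Assumed definition: number of kept tokens / number of content tokens."""
    return sum(labels) / len(tokens)


# Hypothetical, pre-segmented news sentence with gold and predicted labels.
tokens = ["北京", "今天", "发布", "新", "政策", "，", "市民", "关注"]
gold_labels = [1, 0, 1, 1, 1, 0, 0, 0]
pred_labels = [1, 0, 1, 0, 1, 0, 0, 1]

headline = extract_headline(tokens, pred_labels)
score = micro_f1([gold_labels], [pred_labels])
ratio = compression_ratio(tokens, pred_labels)
```

In a model of the kind the abstract describes, the predicted labels would come from the supervised sequence labeling network, with cluster assignments and topic words supplied as additional input features. Here the labels are hard-coded purely for illustration.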
