DING Weijie(丁伟杰)* **, MAO Tingyun***, CHEN Lili***, ZHOU Mingwei***, YUAN Ying* **, HU Wentao* **. [J]. High Technology Letters (English edition), 2024, 30(4): 389-396
An alert-situation text data augmentation method based on MLM

DOI: 10.3772/j.issn.1006-6748.2024.04.006
Chinese keywords:
English keywords: deep learning, text data augmentation, masked language model (MLM), alert-situation text classification
Funding:
Authors: DING Weijie(丁伟杰)* **, MAO Tingyun***, CHEN Lili***, ZHOU Mingwei***, YUAN Ying* **, HU Wentao* **
Affiliations:
(* Department of Computer and Information Security, Zhejiang Police College, Hangzhou 310053, P. R. China)
(** Key Laboratory of Public Security Information Application Based on Big-Data Architecture, Ministry of Public Security, Hangzhou 310053, P. R. China)
(*** Zhejiang Dahua Technology Co., Ltd, Hangzhou 310053, P. R. China)
Chinese abstract:

English abstract:
The performance of deep learning models relies heavily on the quality and quantity of training data, and insufficient training data leads to overfitting. In alert-situation text classification, however, a large amount of training data is usually difficult to obtain. This paper proposes a text data augmentation method based on a masked language model (MLM), which aims to improve the generalization capability of deep learning models by expanding the training data. The method uses a Mask strategy to randomly conceal words in the text, then uses the MLM to predict the masked words from their context and replace them, thereby generating new training data. Three Mask strategies (character level, word level, and N-gram) are designed, and the performance of each strategy is analyzed under different Mask ratios. The experimental results show that the word-level Mask strategy outperforms the traditional data augmentation method.
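The mask-predict-replace loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `tokenize`, `choose_mask_positions`, `augment`, and `toy_predict` are hypothetical, word segmentation is assumed to be whitespace splitting, the N-gram strategy is assumed to mean masking contiguous runs of word tokens, and masks are assumed to be filled one at a time. The predictor is a pluggable callable; a real setup would presumably wrap a pretrained MLM in its place.

```python
import random
from typing import Callable, List, Sequence

MASK = "[MASK]"  # placeholder; a real MLM defines its own mask token


def tokenize(text: str, level: str) -> List[str]:
    """Split text into maskable units (assumption: chars, or whitespace words)."""
    if level == "char":
        return list(text)  # intended for unsegmented text, e.g. Chinese
    return text.split()  # "word" and "ngram" both work on word tokens here


def choose_mask_positions(n_tokens: int, ratio: float,
                          rng: random.Random, ngram: int = 1) -> List[int]:
    """Pick about ratio * n_tokens indices, in contiguous runs of up to `ngram`."""
    if n_tokens == 0:
        return []
    budget = max(1, round(n_tokens * ratio))
    starts = list(range(n_tokens))
    rng.shuffle(starts)
    chosen = set()
    for start in starts:
        if len(chosen) >= budget:
            break
        for i in range(start, min(start + ngram, n_tokens)):
            if len(chosen) < budget:
                chosen.add(i)
    return sorted(chosen)


def augment(tokens: Sequence[str], ratio: float,
            predict: Callable[[List[str], int], str],
            rng: random.Random, ngram: int = 1) -> List[str]:
    """Mask the selected tokens, then replace each mask with a prediction."""
    out = list(tokens)
    positions = choose_mask_positions(len(out), ratio, rng, ngram)
    for i in positions:
        out[i] = MASK
    for i in positions:
        # The predictor sees the partially filled sequence as context.
        out[i] = predict(out, i)
    return out


def toy_predict(context: List[str], index: int) -> str:
    """Hypothetical stand-in for an MLM: always returns the same token."""
    return "<pred>"


rng = random.Random(0)
words = tokenize("caller reports a stolen bicycle near the station", "word")
print(" ".join(augment(words, 0.3, toy_predict, rng)))
```

Passing `ngram=1` with word tokens corresponds to the word-level strategy, `level="char"` to the character-level one, and `ngram=2` or more to the N-gram one. Varying `ratio` gives the different Mask ratios the abstract refers to.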