文章摘要
Cai Qiang(蔡强),Li Jing,Li Haisheng,Zuo Min.[J].高技术通讯(英文),2020,26(2):211~216
Improved image captioning with subword units training and transformer
  
DOI:doi:10.3772/j.issn.1006-6748.2020.02.011
中文关键词: 
英文关键词: image captioning, transformer, byte pair encoding (BPE), reinforcement learning
基金项目:
Author NameAffiliation
Cai Qiang(蔡强) (School of Computer and Information Engineering, Beijing Techology and Business University, Beijing 100048, P.R.China) (Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing 100048, P.R.China) (National Engineering Laboratory for Agri-Product Quality Traceability, Beijing 100048, P.R.China) 
Li Jing (School of Computer and Information Engineering, Beijing Techology and Business University, Beijing 100048, P.R.China) (Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing 100048, P.R.China) (National Engineering Laboratory for Agri-Product Quality Traceability, Beijing 100048, P.R.China) 
Li Haisheng (School of Computer and Information Engineering, Beijing Techology and Business University, Beijing 100048, P.R.China) (Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing 100048, P.R.China) (National Engineering Laboratory for Agri-Product Quality Traceability, Beijing 100048, P.R.China) 
Zuo Min (School of Computer and Information Engineering, Beijing Techology and Business University, Beijing 100048, P.R.China) (Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing 100048, P.R.China) (National Engineering Laboratory for Agri-Product Quality Traceability, Beijing 100048, P.R.China) 
Hits: 1454
Download times: 1385
中文摘要:
      
英文摘要:
      Image captioning models typically operate with a fixed vocabulary, but captioning is an open-vocabulary problem. Existing work addresses the image captioning of out-of-vocabulary words by labeling it as unknown in a dictionary. In addition, recurrent neural network (RNN) and its variants used in the caption task have become a bottleneck for their generation quality and training time cost. To address these 2 essential problems, a simpler but more effective approach is proposed for generating open-vocabulary caption, long short-term memory (LSTM) unit is replaced with transformer as decoder for better caption quality and less training time. The effectiveness of different word segmentation vocabulary and generation improvement of transformer over LSTM is discussed and it is proved that the improved models achieve state-of-the-art performance for the MSCOCO2014 image captioning tasks over a back-off dictionary baseline model.
View Full Text   View/Add Comment  Download reader
Close

分享按钮