CUI Wencheng(崔文成), SHI Wentao, SHAO Hong. MKGViLT: visual-and-language transformer based on medical knowledge graph embedding[J]. High Technology Letters, 2025, 31(1): 73-85
MKGViLT: visual-and-language transformer based on medical knowledge graph embedding |
|
DOI: 10.3772/j.issn.1006-6748.2025.01.008
Keywords: knowledge graph (KG), medical visual question answering (MedVQA), vision-and-language transformer
Authors: CUI Wencheng(崔文成), SHI Wentao, SHAO Hong
Affiliation: School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, P. R. China
|
|
Abstract:
Medical visual question answering (MedVQA) aims to enhance diagnostic confidence and deepen patients' understanding of their health conditions. While the Transformer architecture is widely used in multimodal fields, its application in MedVQA requires further enhancement. A critical limitation of contemporary MedVQA systems lies in their inability to integrate lifelong knowledge with specific patient data to generate human-like responses. Existing Transformer-based MedVQA models need stronger capabilities for interpreting answers through the application of medical image knowledge. The medical knowledge graph visual-and-language transformer (MKGViLT), designed to work jointly with medical knowledge graphs (KGs), addresses this challenge. MKGViLT incorporates an enhanced Transformer structure to effectively extract features and combine modalities for MedVQA tasks. By drawing on richer background knowledge, MKGViLT delivers better-grounded answers, thereby enhancing performance. The efficacy of MKGViLT is evaluated on the SLAKE and P-VQA datasets; experimental results show that MKGViLT surpasses the most advanced methods on the SLAKE dataset.
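The abstract describes combining knowledge-graph embeddings with image and question features in one Transformer input. The paper's actual architecture is not given here; the following is a minimal toy sketch of the general pattern such KG-augmented multimodal models follow (concatenating image-patch, question-token, and KG-entity embeddings into a single tagged sequence). All names, dimensions, and values below are illustrative assumptions, not the authors' code.

```python
# Toy sketch (assumed, not from the paper): building a fused input sequence
# of image, text, and knowledge-graph tokens for a multimodal transformer.

EMBED_DIM = 4  # toy embedding size; real models use hundreds of dimensions

# Stand-in KG entity embedding table (learned in a real model).
kg_entity_embeddings = {
    "lung":      [0.1, 0.2, 0.0, 0.3],
    "pneumonia": [0.4, 0.1, 0.2, 0.0],
}

def embed_tokens(tokens, table):
    """Look up each token's embedding; unknown tokens get a zero vector."""
    zero = [0.0] * EMBED_DIM
    return [table.get(t, zero) for t in tokens]

def build_input_sequence(image_patches, question_tokens, kg_entities, kg_table):
    """Concatenate the three modality sequences into one input sequence.

    Each vector is tagged with a segment label ("IMG"/"TXT"/"KG"), mimicking
    the segment-type embeddings a multimodal transformer uses to tell
    image, text, and knowledge tokens apart.
    """
    seq = [("IMG", p) for p in image_patches]
    seq += [("TXT", e) for e in embed_tokens(question_tokens, {})]
    seq += [("KG", e) for e in embed_tokens(kg_entities, kg_table)]
    return seq

# Example: two stand-in image patches, a two-word question, one KG entity.
patches = [[0.5] * EMBED_DIM, [0.6] * EMBED_DIM]
seq = build_input_sequence(patches, ["what", "organ"], ["lung"],
                           kg_entity_embeddings)
print(len(seq))      # 5 tokens: 2 image + 2 text + 1 KG
print(seq[-1][0])    # "KG"
```

The fused, segment-tagged sequence would then be fed to a Transformer encoder; the KG tokens let attention heads draw on background medical knowledge when answering.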
|
|
|