Article abstract
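MENG Yuebo, WANG Bo, LIU Guanghui. Multi-perspective decoupling enhancement and integration for fine-grained classification[J]. Chinese High Technology Letters, 2024, 34(12): 1266-1278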
Multi-perspective decoupling enhancement and integration for fine-grained classification
  
DOI: 10.3772/j.issn.1002-0470.2024.12.003
Keywords: fine-grained; multi-perspective attention (MPA); progressive dynamic weighted fusion (PDWF); image classification
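Author affiliations
MENG Yuebo (School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055) (Xi'an Key Laboratory of Intelligent Technology for Building Manufacturing, Xi'an 710055)
WANG Bo
LIU Guanghui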
Abstract:
      To address the large intra-class variation caused by external factors such as background, lighting conditions, object pose, and shooting angle in fine-grained image classification, this paper proposes a fine-grained classification algorithm based on multi-perspective decoupling, enhancement, and integration. First, to reduce the interference of these external factors, a multi-perspective attention (MPA) module is designed. The module decomposes the model into several perspectives and forces each perspective to focus on a different scale, thereby decoupling the interference factors; self-attention modeling of the features then guides each perspective to further mine key features. Second, a progressive dynamic weighted fusion (PDWF) strategy is proposed to effectively integrate the decoupled multi-perspective information. It dynamically adjusts the fusion coefficients according to the channel and spatial relationships captured under the different perspectives, achieving high-order fusion of multi-scale information. Finally, a progressive training method is adopted to promote interaction between perspectives and to further capture and integrate the complementary semantic information in the multi-scale features. Experiments on three public datasets, CUB-200-2011, Stanford-Cars, and FGVC-Aircraft, show that the proposed method achieves classification accuracies of 90.5%, 95.5%, and 94.2%, respectively, outperforming current mainstream fine-grained image classification methods.
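The abstract describes PDWF only at a high level. The sketch below is a hypothetical, parameter-free illustration of the general idea of progressively fusing several perspective feature maps with data-dependent weights derived from channel and spatial statistics. It is not the authors' PDWF: the sigmoid gates, the scalar scoring rule, the left-to-right pairwise fusion order, and the names `gated`, `fuse_pair`, and `progressive_fusion` are all assumptions. It also assumes every perspective has already been resized to a common (C, H, W) shape; in a trained network the gates would typically be learned layers rather than fixed statistics.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated(x):
    """Re-weight a (C, H, W) map by parameter-free channel and spatial gates (assumed design)."""
    channel_gate = sigmoid(x.mean(axis=(1, 2)))   # (C,): global average pooling per channel
    spatial_gate = sigmoid(x.mean(axis=0))        # (H, W): average over channels per position
    return x * channel_gate[:, None, None] * spatial_gate[None, :, :]

def fuse_pair(a, b):
    """Fuse two perspectives with data-dependent weights that sum to 1 (hypothetical rule)."""
    ga, gb = gated(a), gated(b)
    scores = np.array([ga.mean(), gb.mean()])     # one scalar score per perspective
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over the two perspectives
    return weights[0] * ga + weights[1] * gb, weights

def progressive_fusion(perspectives):
    """Fold perspectives in order: f_1 = x_1, f_k = fuse_pair(f_{k-1}, x_k)."""
    fused = perspectives[0]
    history = []
    for x in perspectives[1:]:
        fused, w = fuse_pair(fused, x)
        history.append(w)                         # fusion coefficients used at each step
    return fused, history

# Usage: three hypothetical perspectives with 8 channels on a 7x7 grid.
rng = np.random.default_rng(0)
views = [rng.standard_normal((8, 7, 7)) for _ in range(3)]
fused, weights = progressive_fusion(views)
print(fused.shape)  # (8, 7, 7)
print([w.round(3) for w in weights])
```

Folding perspectives one at a time means each step weighs the running fused map against the next perspective, which is one simple reading of "progressive"; a joint softmax over all perspectives at once would be an equally plausible alternative.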