JIANG Tao (蒋涛)* **, HU Zhentao*, WANG Kaige*, QIU Qian*, REN Xing*. [J]. High Technology Letters, 2025, 31(3): 257-265
|
Dual-channel graph convolutional network with multi-order information fusion for skeleton-based action recognition
| |
DOI: 10.3772/j.issn.1006-6748.2025.03.005
Keywords: human action recognition, graph convolutional network, spatiotemporal fusion, feature extraction
Authors: JIANG Tao (蒋涛)* **, HU Zhentao*, WANG Kaige*, QIU Qian*, REN Xing*
* School of Artificial Intelligence, Henan University, Zhengzhou 450000, P. R. China
** School of Artificial Intelligence, Hezhou University, Hezhou 542800, P. R. China
|
Abstract:
Skeleton-based human action recognition identifies actions from dynamic skeletal data, which carries both temporal and spatial characteristics. The task faces challenges such as viewpoint variation, low recognition accuracy, and high model complexity. Skeleton-based graph convolutional networks (GCNs) generally outperform other deep learning methods in recognition accuracy, but they often underutilize temporal features and suffer from high model complexity, leading to increased training and validation costs, especially on large-scale datasets. This paper proposes a dual-channel graph convolutional network with multi-order information fusion (DM-AGCN) for human action recognition. The network integrates a high-frame-rate skeleton channel to capture action dynamics and a low-frame-rate channel to preserve static semantic information, effectively balancing temporal and spatial features. This dual-channel architecture allows temporal and spatial information to be processed separately. Additionally, DM-AGCN extracts joint keypoints and bidirectional bone vectors from skeleton sequences, and employs a three-stream graph convolutional structure to extract features that describe human movement. Experimental results on the NTU-RGB+D dataset show that DM-AGCN achieves an accuracy of 89.4% on the X-Sub benchmark and 95.8% on the X-View benchmark, while reducing model complexity to 3.68 GFLOPs (giga floating-point operations). On the Kinetics-Skeleton dataset, the model achieves a Top-1 accuracy of 37.2% and a Top-5 accuracy of 60.3%, further validating its effectiveness across different benchmarks.
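The abstract's two key input constructions, bidirectional bone vectors derived from joint keypoints and the fast/slow dual-channel split of a skeleton sequence, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the skeleton dimensions, the parent map, and the subsampling stride of 4 are all illustrative assumptions.

```python
import numpy as np

# Hypothetical skeleton sequence: T frames, V joints, 3D coordinates.
T, V, C = 64, 25, 3
rng = np.random.default_rng(0)
joints = rng.random((T, V, C)).astype(np.float32)

# Illustrative parent map: bone i runs from parent[i] to joint i
# (a simple chain here; a real skeleton uses the dataset's joint tree).
parent = np.arange(V) - 1
parent[0] = 0  # root has no parent, so its bone is zero-length

def bone_vectors(joints, parent):
    """Bidirectional bone vectors: the forward stream is the
    difference joint - parent; the backward stream is its negation."""
    forward = joints - joints[:, parent, :]
    backward = -forward
    return forward, backward

fwd, bwd = bone_vectors(joints, parent)

# Dual-channel sampling: the fast channel keeps every frame to
# capture dynamics; the slow channel subsamples (stride assumed 4)
# to retain static semantic information at lower cost.
stride = 4
fast_channel = joints             # shape (64, 25, 3)
slow_channel = joints[::stride]   # shape (16, 25, 3)
```

Joint, forward-bone, and backward-bone tensors would then feed the three graph-convolutional streams described above, with each channel processed at its own frame rate.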
|
|
|
|