                | Wu Jin (吴进), An Yiyuan, Dai Wei, Zhao Bo. [J]. High Technology Letters (English edition), 2021, 27(4): 381-387 |  
				|  |  
                | Behavior recognition algorithm based on the improved R3D and LSTM network fusion |  
                |  |  
                | DOI:10.3772/j.issn.1006-6748.2021.04.006 |  
				| Chinese keywords: |  
                | English keywords: behavior recognition, three-dimensional residual convolutional neural network (R3D), long short-term memory (LSTM), dropout, batch normalization (BN) |  
                | Funding: |  
                | Author Name | Affiliation |  
                | Wu Jin (吴进) | (School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) |  
                | An Yiyuan | (School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) |  
                | Dai Wei | (School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) |  
                | Zhao Bo | (School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) |  
		| Chinese abstract: |  
		|  |  
                | English abstract: |  
                | Because behavior recognition operates on video frame sequences, this paper proposes a behavior recognition algorithm that combines a 3D residual convolutional neural network (R3D) with a long short-term memory (LSTM) network. First, the residual module is extended to three dimensions so that it extracts temporal and spatial features simultaneously. Second, the pooling layer window size is changed to preserve the integrity of the time-domain features, and batch normalization (BN) and dropout layers are added to ease network training and reduce over-fitting. Third, because the global average pooling (GAP) layer is affected by the size of the feature map and prevents the network from being deepened further, a convolution layer and a max-pooling layer are added to the R3D network. Finally, because LSTM can memorize information and extract more abstract temporal features, an LSTM network is introduced into the R3D network. Experimental results show that the R3D+LSTM network achieves a 91% recognition rate on the UCF-101 dataset. |  
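The abstract's point about the pooling window can be illustrated with a small shape calculation. This is only a sketch: the abstract does not state the clip size or the window sizes, so the 16-frame 112x112 clip and the (2, 2, 2) and (1, 2, 2) windows below are assumptions chosen for illustration, and `pool3d_output_shape` is a hypothetical helper, not code from the paper.

```python
def pool3d_output_shape(shape, kernel, stride=None, padding=(0, 0, 0)):
    """Return the (T, H, W) output size of a 3D pooling layer.

    Uses the standard floor-mode formula: (size + 2*pad - kernel) // stride + 1.
    The stride defaults to the kernel size, as is common for pooling layers.
    """
    stride = stride or kernel
    return tuple(
        (size + 2 * p - k) // s + 1
        for size, k, s, p in zip(shape, kernel, stride, padding)
    )


# Assumed input clip: 16 frames of 112x112 pixels (not given in the abstract).
clip = (16, 112, 112)

# A cubic window halves the temporal length along with the spatial size.
cubic = pool3d_output_shape(clip, kernel=(2, 2, 2))

# A window of depth 1 pools only spatially, so all 16 time steps survive.
spatial_only = pool3d_output_shape(clip, kernel=(1, 2, 2))

print(cubic)         # (8, 56, 56)
print(spatial_only)  # (16, 56, 56)
```

Under these assumptions, the depth-1 window keeps the full temporal length, which is one way a pooling window change could preserve time-domain features before they are passed to a recurrent layer such as an LSTM.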