A Spatial-temporal Attention Module for 3D Convolution Network in Action Recognition

SHENGWEI ZHOU, LIANG BAI, HAORAN WANG, ZHIHONG DENG, XIAOMING ZHU, CHENG GONG

Abstract


Action recognition is an important but challenging task in computer vision. The 3D convolutional neural network is one of the mainstream methods for action recognition because it can process three-dimensional information effectively. However, the performance of 3D convolutional neural networks is currently not outstanding. The main reason is that the information in a video is concentrated in the key regions of its key frames, yet a 3D convolutional neural network usually cannot extract this critical information effectively. We therefore propose a temporal attention mechanism and a spatial attention mechanism and combine them into a module called STAM, which lets models focus more on the key information. We introduce STAM into 3D ResNet and conduct experiments on the UCF101 and HMDB51 datasets. The results demonstrate that the proposed attention module effectively improves the performance of 3D convolutional neural networks.
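
The abstract does not specify how the temporal and spatial attention are computed, so the following is only an illustrative sketch of the general idea of reweighting key frames and key regions, not the authors' STAM design. Everything in it is an assumption made for illustration: the function names (`temporal_attention`, `spatial_attention`, `stam_sketch`), the use of mean pooling followed by a sigmoid, the sequential temporal-then-spatial order, and the absence of learned parameters. The feature map is a nested list indexed as `x[c][t][h][w]` (channel, frame, height, width).

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dims(x):
    # Shape (C, T, H, W) of a nested-list feature map x[c][t][h][w].
    return len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])


def temporal_attention(x):
    # Hypothetical: one weight per frame, from the mean activation of that frame.
    C, T, H, W = dims(x)
    weights = []
    for t in range(T):
        total = sum(x[c][t][h][w]
                    for c in range(C) for h in range(H) for w in range(W))
        weights.append(sigmoid(total / (C * H * W)))
    return weights


def spatial_attention(x):
    # Hypothetical: one weight per location, from the mean over channels and frames.
    C, T, H, W = dims(x)
    return [[sigmoid(sum(x[c][t][h][w]
                         for c in range(C) for t in range(T)) / (C * T))
             for w in range(W)]
            for h in range(H)]


def stam_sketch(x):
    # Assumed order: reweight frames first, then reweight spatial locations.
    C, T, H, W = dims(x)
    wt = temporal_attention(x)
    xt = [[[[x[c][t][h][w] * wt[t] for w in range(W)]
            for h in range(H)] for t in range(T)] for c in range(C)]
    ws = spatial_attention(xt)
    return [[[[xt[c][t][h][w] * ws[h][w] for w in range(W)]
              for h in range(H)] for t in range(T)] for c in range(C)]
```

In this sketch the output has the same shape as the input, so such a block could in principle sit between layers of a 3D network; a trained module would normally replace the fixed pooling with learned layers.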

Keywords


Action recognition, 3D convolution, Attention, Neural Network


DOI
10.12783/dtcse/cisnrc2019/33302
