[2018 AAAI] Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

Table of content (full-version) [paper] [github]

Summary
References

Summary

Skeleton-based action recognition 논문 (포즈 추정 정보만 이용하여 행동을 이식하는 분야)
Spatial-Temporal Graph Convolutional Networks (ST-GCN) 제안
- 노드: body joints, feature vector (coordinate vectors + estimation confidence)
- 엣지: spatial(intra-body edges, neighbor node <= 1), temporal(inter-frame edges, temporal kernel size = 9)
- 그래프 관계: 각 joint마다 인접한 spatial, temporal joint set들을 분할
  - Unilabeling: same label
  - Distance: root node (0), neighbor nodes (1)
  - Spatial configuration: gravity center에서의 거리에 따라서 (0, 1, 2)

[그래프 관계짓는 방법]

Implementation
- ST-GCN
  - Kipf and Welling method [2]
  - For each A, including learnable weight matrix M (initialized as all-one matrix)
- Architecture
  - Input: (3, 300, 18, 2) tensor, (#coordinates + confidence, #frames, #joints, #person)
  - Batch normalization, 9 ST-GCN (ResNet mechanism + dropout, 64/64/64/pooling/128/128/128/pooling/256/256/256 channels)
  - Output: 256-dim feature vector (applying grobal average pooling), Softmax
- 8 TITANX GPU
- Dataset: Kinetics, NTU-RGB+D

[전체 아키텍처]

References

[1] Yan, Sijie, Yuanjun Xiong, and Dahua Lin. “Spatial temporal graph convolutional networks for skeleton-based action recognition.” Thirty-Second AAAI Conference on Artificial Intelligence. 2018.

[2] Kipf, Thomas N., and Max Welling. “Semi-supervised classification with graph convolutional networks.” arXiv preprint arXiv:1609.02907 (2016).