

Summary

  • A skeleton-based action recognition paper (the field of recognizing actions using only pose-estimation output)
  • Proposes Spatial-Temporal Graph Convolutional Networks (ST-GCN)
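    • Nodes: body joints; each node's feature vector holds the joint coordinates plus the pose-estimation confidence
    • Edges: spatial (intra-body edges between naturally connected joints, neighbor distance D ≤ 1) and temporal (inter-frame edges linking the same joint across consecutive frames, temporal kernel size = 9)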
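    • Graph relations: for each joint, the neighboring spatial and temporal joint sets are partitioned into labeled subsets
      • Uni-labeling: every neighbor gets the same label
      • Distance partitioning: root node (0), 1-hop neighbor nodes (1)
      • Spatial configuration partitioning: labels depend on distance to the skeleton's gravity center: root (0), neighbors closer to the center (1), neighbors farther from the center (2)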
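The three partitioning strategies above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the paper locates the gravity center from the joint coordinates, whereas this sketch approximates "distance to the gravity center" with the hop distance to a single designated `center` joint. The helper names (`hop_distances`, `neighbor_set`, `partition_labels`) are hypothetical.

```python
from collections import deque


def hop_distances(num_nodes, edges, source):
    """BFS hop distance from `source` to every joint (inf if unreachable)."""
    adj = [[] for _ in range(num_nodes)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    dist = [float("inf")] * num_nodes
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == float("inf"):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def neighbor_set(num_nodes, edges, root):
    """Spatial neighborhood with D = 1: the root plus its 1-hop neighbors."""
    dist = hop_distances(num_nodes, edges, root)
    return [v for v in range(num_nodes) if dist[v] <= 1]


def partition_labels(num_nodes, edges, root, strategy, center):
    """Assign a subset label to every neighbor of `root`.

    Assumption: the gravity center is approximated by the joint `center`,
    and "distance" means hop distance in the skeleton graph.
    """
    neighbors = neighbor_set(num_nodes, edges, root)
    if strategy == "uni":
        # Uni-labeling: one subset for the whole neighborhood.
        return {v: 0 for v in neighbors}
    if strategy == "distance":
        # Distance partitioning: root (0) vs. 1-hop neighbors (1).
        return {v: 0 if v == root else 1 for v in neighbors}
    if strategy == "spatial":
        to_center = hop_distances(num_nodes, edges, center)
        labels = {}
        for v in neighbors:
            if to_center[v] == to_center[root]:
                labels[v] = 0  # the root (or a neighbor equally far from the center)
            elif to_center[v] < to_center[root]:
                labels[v] = 1  # closer to the center (centripetal)
            else:
                labels[v] = 2  # farther from the center (centrifugal)
        return labels
    raise ValueError(f"unknown strategy: {strategy}")
```

For a toy skeleton with joint 0 as the center (chain 0-1-2-3 plus a branch 0-4), rooting at joint 2 gives `{1: 1, 2: 0, 3: 2}` under the spatial configuration strategy: joint 1 is on the center side and joint 3 points away from it.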


[Figure: how the graph relations (neighbor partitions) are defined]

  • Implementation
    • ST-GCN
      • Graph convolution follows the method of Kipf and Welling [2]
      • Each adjacency matrix A is multiplied element-wise by a learnable edge-importance matrix M (initialized as an all-one matrix)
    • Architecture
      • Input: (3, 300, 18, 2) tensor, i.e. (#channels = 2D coordinates + confidence, #frames, #joints, #persons)
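      • Batch normalization on the input, then 9 ST-GCN units (ResNet-style residual connections + dropout; 64/64/64/pooling/128/128/128/pooling/256/256/256 output channels, where each "pooling" is a temporal stride of 2 in the following unit)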
      • Output: 256-dim feature vector (after global average pooling), followed by a Softmax classifier
    • Trained on 8 TITANX GPUs
    • Datasets: Kinetics, NTU-RGB+D
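For one frame, the partitioned graph convolution with the learnable mask can be written as f_out = Σ_j (Λ_j^(-1/2) A_j Λ_j^(-1/2) ⊙ M_j) f_in W_j, where Λ_j^ii = Σ_k A_j^ik + α and ⊙ is element-wise multiplication. The pure-Python sketch below implements that formula on list-of-lists matrices. Applying M_j after normalization, α = 0.001, and one mask per partition are assumptions made for illustration, and the function names are hypothetical.

```python
def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalize(adj, alpha=0.001):
    """Symmetric normalization Lambda^(-1/2) A Lambda^(-1/2), Lambda_ii = sum_k A_ik + alpha."""
    deg = [sum(row) + alpha for row in adj]
    n = len(adj)
    return [[adj[i][k] / (deg[i] ** 0.5 * deg[k] ** 0.5) for k in range(n)]
            for i in range(n)]


def spatial_graph_conv(f_in, adj_parts, weights, importances):
    """Single-frame spatial graph convolution summed over K partitions.

    f_in:        N x C_in joint features
    adj_parts:   K adjacency matrices A_j (N x N), one per partition label
    weights:     K weight matrices W_j (C_in x C_out)
    importances: K edge-importance masks M_j (N x N), initialized to all ones
    """
    n, c_out = len(f_in), len(weights[0][0])
    f_out = [[0.0] * c_out for _ in range(n)]
    for a_j, w_j, m_j in zip(adj_parts, weights, importances):
        a_norm = normalize(a_j)
        # Element-wise edge-importance mask on the normalized adjacency.
        masked = [[a_norm[i][k] * m_j[i][k] for k in range(n)] for i in range(n)]
        contrib = matmul(matmul(masked, f_in), w_j)
        # Accumulate this partition's contribution into the output.
        for i in range(n):
            for c in range(c_out):
                f_out[i][c] += contrib[i][c]
    return f_out
```

With uni-labeling, `adj_parts` would hold the single matrix A + I; with distance partitioning it would hold I (root) and A (neighbors), so two weight matrices are learned. A real implementation batches this over frames and persons with tensor operations and follows it with the temporal convolution (kernel size 9); this sketch only shows the per-frame arithmetic.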


[Figure: overall architecture]
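A quick shape trace through the nine ST-GCN units helps check the numbers in the architecture summary. This sketch assumes "same" temporal padding for the kernel size of 9 and models each "pooling" step as a temporal stride of 2 in the 4th and 7th units; persons are kept as a separate axis purely for readability. The function names are hypothetical.

```python
def temporal_out_len(t, kernel=9, stride=1):
    """Frames left after a temporal conv padded by (kernel - 1) // 2 on each side."""
    pad = (kernel - 1) // 2
    return (t + 2 * pad - kernel) // stride + 1


def trace_stgcn(in_shape=(3, 300, 18, 2)):
    """Return the (channels, frames, joints, persons) shape after each ST-GCN unit."""
    channels, frames, joints, persons = in_shape
    # (output channels, temporal stride); stride 2 plays the role of "pooling".
    units = [(64, 1), (64, 1), (64, 1),
             (128, 2), (128, 1), (128, 1),
             (256, 2), (256, 1), (256, 1)]
    shapes = []
    for out_channels, stride in units:
        frames = temporal_out_len(frames, stride=stride)
        channels = out_channels  # graph conv only changes the channel count
        shapes.append((channels, frames, joints, persons))
    return shapes
```

Starting from (3, 300, 18, 2), the frame axis shrinks to 150 and then 75, so the last unit outputs (256, 75, 18, 2); global average pooling over the remaining axes then yields the 256-dim feature vector that is fed to the Softmax classifier.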


References

[1] Yan, Sijie, Yuanjun Xiong, and Dahua Lin. “Spatial temporal graph convolutional networks for skeleton-based action recognition.” Thirty-Second AAAI Conference on Artificial Intelligence. 2018.

[2] Kipf, Thomas N., and Max Welling. “Semi-supervised classification with graph convolutional networks.” arXiv preprint arXiv:1609.02907 (2016).