[2019 CVPR] Multi-Label Image Recognition with Graph Convolutional Networks
Table of contents (full version) [paper] [github]
Summary
- Topic: multi-label image recognition (each image carries multiple labels)
- Objects in an image have a complex topology among themselves, so modeling label dependency is important in this field → GCN
[Concept of multi-label image recognition]
- Overall framework
- Representation learning
- Input: (448 × 448) image
- Module: ResNet-101 (ImageNet pretrained) produces a (2048 × 14 × 14) feature map, followed by global max pooling (GMP)
- Output: 2048-dim feature vector
- Graph convolutional network
- Input: (C × 300) word embedding features (pretrained, GloVe [2])
- Module: two GCN layers (1024 and 2048 dimensions)
- Formula: $H^{(2)} = h(\hat{A} H^{(1)} W^{(1)})$, $H^{(3)} = h(\hat{A} H^{(2)} W^{(2)})$
- W: learnable transformation matrix (H holds the node features of each layer)
- A: correlation matrix ($\hat{A}$ is its normalized form)
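- Data-driven way: built from the label pairs that co-occur in the training set, converted to conditional probabilities
- Asymmetric: the probability that a tennis racket is present when a person is in the image is lower than the probability that a person is present when a tennis racket is in the image.
- Binary correlation matrix: rare label pairs can act as noise, so the probabilities are binarized to {0, 1} with a threshold
- Re-weighted correlation matrix: the binary matrix can cause over-smoothing, as if the node features were clustered together, so fixed weights are assigned in place of the binary values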
- h(·): non-linear operator (LeakyReLU)
- Output: (C × 2048) inter-dependent object classifiers
- Final stage
- Dot product of the image feature with the classifiers → predicted scores → sigmoid → multi-label classification loss
- Representation learning
[Overall framework]
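The GCN branch and the final stage described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy sizes (4, 5, 6 in place of 300, 1024, 2048), the random weights, the symmetric degree normalization used for the normalized matrix, the LeakyReLU slope of 0.2, and all function names are assumptions made for the example. The non-linearity is applied after both layers, following the formula in the summary.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def leaky_relu(m, slope=0.2):
    """Element-wise LeakyReLU; the slope value is an assumption."""
    return [[v if v > 0 else slope * v for v in row] for row in m]


def normalize_adj(a):
    """Symmetric normalization D^-1/2 A D^-1/2 (assumed form of the normalized A)."""
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in a]
    n = len(a)
    return [[inv_sqrt[i] * a[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def gcn_forward(a_hat, h, weights):
    """Apply H_next = h(A_hat @ H @ W) once per weight matrix."""
    for w in weights:
        h = leaky_relu(matmul(matmul(a_hat, h), w))
    return h


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def predict(x, classifiers):
    """Dot product of the image feature with each class classifier, then sigmoid."""
    scores = [sum(xi * wi for xi, wi in zip(x, w)) for w in classifiers]
    return scores, [sigmoid(s) for s in scores]


def multilabel_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over the C classes."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
C, d_in, d_hid, D = 3, 4, 5, 6              # toy stand-ins for C, 300, 1024, 2048
A = [[1.0, 1.0, 0.0],
     [1.0, 1.0, 1.0],
     [0.0, 1.0, 1.0]]                       # toy correlation matrix
a_hat = normalize_adj(A)
H1 = rand_matrix(C, d_in)                   # stand-in for word embeddings (C x d_in)
classifiers = gcn_forward(a_hat, H1, [rand_matrix(d_in, d_hid), rand_matrix(d_hid, D)])
x = [random.uniform(0.0, 1.0) for _ in range(D)]   # stand-in for the pooled image feature
scores, probs = predict(x, classifiers)
loss = multilabel_loss(probs, [1, 0, 1])
print(len(classifiers), len(classifiers[0]), probs, loss)
```

Because the classifiers come out of the GCN rather than being free parameters, each class's classifier depends on the embeddings of its correlated classes, which is what "inter-dependent" refers to.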
Experimental results
- Dataset
- MS-COCO, VOC 2007
- Ablation studies
- Type of word embedding
- Varying the threshold
- Varying the fixed weight of the re-weighted A
- Number of GCN layers
- Additional experiments
- Per-class t-SNE comparison of vanilla ResNet and ML-GCN
- Experiments from an image retrieval perspective
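Two of the ablations above vary knobs used when building the correlation matrix: the binarization threshold and the fixed re-weighting value. The sketch below shows where each knob enters, under stated assumptions: the names `tau` and `p`, the toy label sets, the function names, and the exact re-weighting rule (neighbors of a class share total weight `p`, the class itself keeps `1 - p`) are illustrative choices, not taken from the authors' code.

```python
def conditional_prob(label_sets, num_classes):
    """P[i][j] estimates P(label j | label i) from co-occurrence counts."""
    count = [0] * num_classes
    co = [[0] * num_classes for _ in range(num_classes)]
    for labels in label_sets:
        for i in labels:
            count[i] += 1
            for j in labels:
                if i != j:
                    co[i][j] += 1
    return [[co[i][j] / count[i] if count[i] else 0.0 for j in range(num_classes)]
            for i in range(num_classes)]


def binarize(prob, tau):
    """Keep an edge only if its conditional probability reaches the threshold."""
    return [[1.0 if v >= tau else 0.0 for v in row] for row in prob]


def reweight(a, p):
    """Neighbors of class i share total weight p; class i itself keeps 1 - p."""
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbor_sum = sum(a[i][j] for j in range(n) if j != i)
        for j in range(n):
            if i == j:
                out[i][j] = 1.0 - p
            elif neighbor_sum > 0:
                out[i][j] = p * a[i][j] / neighbor_sum
    return out


# Toy training labels: 0 = person, 1 = tennis racket, 2 = dog (hypothetical data).
label_sets = [{0, 1}, {0}, {0}, {0, 2}, {0, 1}, {2}]
P = conditional_prob(label_sets, 3)
A_bin = binarize(P, tau=0.3)
A_rw = reweight(A_bin, p=0.2)

# P is asymmetric: P(racket | person) differs from P(person | racket).
print(P[0][1], P[1][0])

# Raising the threshold prunes more edges from the binary matrix.
for tau in (0.1, 0.3, 0.6):
    edges = sum(sum(row) for row in binarize(P, tau))
    print(tau, edges)
```

A low threshold keeps noisy rare pairs, while a high one removes genuine correlations; similarly, a larger `p` gives more weight to neighboring classes and less to the class's own embedding.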
References
[1] Chen, Zhao-Min, et al. "Multi-Label Image Recognition with Graph Convolutional Networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
[2] Pennington, Jeffrey, Richard Socher, and Christopher Manning. "GloVe: Global Vectors for Word Representation." Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.