Informative Class Activation Maps: Summary [XAI-20]

Wonju Seo 2021. 9. 6. 11:19

Paper title: Informative Class Activation Maps

Paper URL: https://arxiv.org/abs/2106.10472

Summary of key points:

1) Similar to the Combinational CAM reviewed previously (https://wewinserv.tistory.com/167), this method does not rely only on the CAM generated for the target class (i.e., the highest-probability class); it also uses the CAMs of other classes. This produces more accurate bounding boxes and better performance than the original CAM on weakly supervised object localization (WSOL).

2) First, a review of CAM. Given $K$ feature maps $g_1,...,g_K$ and the weight matrix $W\in \mathbb{R}^{M\times K}$ of the fully connected layer, $w_k^y$ denotes the weight of the $k$-th feature map for class $y$.

$g_k(a,b)$ can be interpreted as the importance of each point $(a,b)$ in the $k$-th feature map, and the importance of location $(a,b)$ for a particular class $y$ is given by:

$$M_y(a,b) = \sum_k w_k^y \, g_k(a,b)$$

CAM highlights the parts of the feature space that matter most for classifying a particular class $y$. The input to the softmax layer for class $y$ is:

$$\sum_{a,b} M_y(a,b) = n(x)_y$$

Intuitively, $w_k^y$ is the overall importance of the $k$-th feature map for class $y$, and $M_y(a,b)$ is the importance of the feature maps at location $(a,b)$ in leading the input image $x$ to class $y$.
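The two CAM relations above can be checked numerically. The following is a minimal NumPy sketch, not code from the paper: the array sizes, the random inputs, and the function name `class_activation_maps` are assumptions made for illustration.

```python
import numpy as np

def class_activation_maps(features, weights):
    """Compute one CAM per class.

    features: (K, H, W) array holding the feature maps g_k(a, b).
    weights:  (M, K) array holding the FC weights w_k^y.
    Returns an (M, H, W) array with maps[y, a, b] = sum_k w_k^y * g_k(a, b).
    """
    return np.einsum("mk,khw->mhw", weights, features)

# Hypothetical sizes: K feature maps of H x W, M classes.
rng = np.random.default_rng(0)
K, H, W, M = 8, 7, 7, 5
features = rng.random((K, H, W))
weights = rng.standard_normal((M, K))

cams = class_activation_maps(features, weights)

# Summing a CAM over all positions gives the softmax input n(x)_y ...
logits = cams.sum(axis=(1, 2))
# ... which matches applying the FC layer to the spatially summed features.
logits_direct = weights @ features.sum(axis=(1, 2))
assert np.allclose(logits, logits_direct)
```

The sketch sums over positions, as in the equation above; with global average pooling the logits only differ by the constant factor $1/(H \cdot W)$.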

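Building on prior work, the authors extend the analysis from pairs of an input image and a label to pairs of an image region and a label, so that the regions with high mutual information with the label can be captured.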

Mutual information is the expectation of the point-wise mutual information (PMI) between two variables:

$$\mathbb{I}(X,Y) = \mathbb{E}_{x,y}\left[PMI(x,y)\right]$$

$$PMI(x,y) = n(x)_y - \log \sum_{y'=1}^{M} \exp\left(n(x)_{y'}\right) + \log M$$

The PMI approaches $\log M$ when $y$ is the maximum argument of the log-sum-exp. To find the regions most meaningful for classification, the authors compute the difference between the PMI of the true label and the average PMI of the other labels.
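As a small illustration of the PMI formula and its $\log M$ limit, here is a NumPy sketch. The function name `pmi_from_logits` and the example logits are assumptions for illustration; the max-shift is only a standard trick to keep the log-sum-exp numerically stable.

```python
import numpy as np

def pmi_from_logits(logits):
    """PMI(x, y) = n(x)_y - log sum_y' exp(n(x)_y') + log M, for every label y."""
    logits = np.asarray(logits, dtype=float)
    shift = logits.max()  # subtract the max before exponentiating for stability
    log_sum_exp = shift + np.log(np.exp(logits - shift).sum())
    return logits - log_sum_exp + np.log(logits.size)

# Equal logits: the image carries no information about any label, so PMI is 0.
uniform = pmi_from_logits([0.0, 0.0, 0.0, 0.0])

# One logit dominates the log-sum-exp: its PMI approaches log M (here M = 4).
peaked = pmi_from_logits([50.0, 0.0, 0.0, 0.0])

assert np.allclose(uniform, 0.0)
assert np.isclose(peaked[0], np.log(4))
```

The first two terms of the formula are the log-softmax of the logits, so the PMI of a label is its log-probability shifted by $\log M$.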

$$Diff(PMI(x)) = PMI(x,y^*) - \frac{1}{M-1}\sum_{y'\neq y^*} PMI(x,y')$$

$$= \sum_{(a,b)\in g} w^{y^*} g(a,b) - \frac{1}{M-1}\sum_{y'\neq y^*} w^{y'} g(a,b)$$
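The step between the two lines of this difference can be spelled out as follows. This is a short derivation sketch using only the PMI definition and the CAM relation $n(x)_y = \sum_{(a,b)} w^{y} g(a,b)$; the shorthand $Z(x)$ for the label-independent terms is introduced here for convenience and is not notation from the paper.

$$\begin{aligned}
Z(x) &:= -\log \sum_{y''=1}^{M} \exp\left(n(x)_{y''}\right) + \log M \\
Diff(PMI(x)) &= \left(n(x)_{y^*} + Z(x)\right) - \frac{1}{M-1}\sum_{y'\neq y^*} \left(n(x)_{y'} + Z(x)\right) \\
&= n(x)_{y^*} - \frac{1}{M-1}\sum_{y'\neq y^*} n(x)_{y'} \\
&= \sum_{(a,b)\in g} \left( w^{y^*} g(a,b) - \frac{1}{M-1}\sum_{y'\neq y^*} w^{y'} g(a,b) \right)
\end{aligned}$$

Because $Z(x)$ is the same for every label, averaging it over the other $M-1$ labels returns $Z(x)$ itself, and it cancels against the $Z(x)$ of the true label.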

As a result, infoCAM is defined as follows.

$$M_y^{Diff}(R) = \sum_{(a,b)\in R} w^{y} g(a,b) - \frac{1}{M-1}\sum_{y'\neq y} w^{y'} g(a,b)$$

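infoCAM highlights the regions that decide the classification boundary against the other labels. (Judging from the equation above, it focuses on the parts activated for the desired class, while the parts attended to by the other classes are averaged and subtracted, which appears to suppress the background.)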
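A per-location version of the infoCAM definition can be sketched in NumPy as below. This is an illustrative sketch under assumed shapes and random inputs, not the authors' implementation; the function name `info_cam` is hypothetical. It also checks that summing the map over the whole feature map reproduces the PMI difference between the chosen label and the average of the other labels.

```python
import numpy as np

def info_cam(features, weights, y):
    """Per-location infoCAM for class y.

    features: (K, H, W) feature maps, weights: (M, K) FC weights.
    Returns an (H, W) map: the CAM of y minus the mean CAM of the other M-1 classes.
    """
    cams = np.einsum("mk,khw->mhw", weights, features)
    others = np.delete(cams, y, axis=0)   # CAMs of all labels except y
    return cams[y] - others.mean(axis=0)  # mean over M-1 maps = 1/(M-1) * sum

# Hypothetical sizes and inputs.
rng = np.random.default_rng(0)
features = rng.random((8, 7, 7))
weights = rng.standard_normal((5, 8))
y = 2

diff_map = info_cam(features, weights, y)

# Consistency check against the PMI difference over the full feature map.
logits = weights @ features.sum(axis=(1, 2))
shift = logits.max()
log_sum_exp = shift + np.log(np.exp(logits - shift).sum())
pmi = logits - log_sum_exp + np.log(logits.size)
diff_pmi = pmi[y] - np.delete(pmi, y).mean()
assert np.isclose(diff_map.sum(), diff_pmi)
```

Summing `diff_map` over a region $R$ instead of the whole map gives $M_y^{Diff}(R)$ for that region.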

Next, infoCAM+ uses the most-unlike label instead of the average over the other labels, and is defined as follows.

$$M_y^{Diff+}(R) = \sum_{(a,b)\in R} w^{y} g(a,b) - w^{y'} g(a,b)$$

$$y' = \arg\min_m \sum_{(a,b)\in R} w^{m} g(a,b)$$
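The infoCAM+ variant can be sketched the same way. In this hedged NumPy example the function name `info_cam_plus`, the optional `region` mask argument, and the toy inputs are assumptions for illustration; the most-unlike label is taken literally as the argmin over all labels $m$, as written in the formula above.

```python
import numpy as np

def info_cam_plus(features, weights, y, region=None):
    """Per-location infoCAM+ for class y.

    features: (K, H, W) feature maps, weights: (M, K) FC weights.
    region:   optional boolean (H, W) mask for R; defaults to the whole map.
    Returns (map, y_prime), where y_prime is the label with the smallest
    summed activation over the region.
    """
    cams = np.einsum("mk,khw->mhw", weights, features)
    if region is None:
        region = np.ones(cams.shape[1:], dtype=bool)
    scores = cams[:, region].sum(axis=1)  # one summed score per label
    y_prime = int(np.argmin(scores))      # most-unlike label
    return cams[y] - cams[y_prime], y_prime

# Toy input: 2 constant feature maps, 3 labels with weights +1, 0, -1.
features = np.ones((2, 3, 3))
weights = np.array([[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]])

plus_map, y_prime = info_cam_plus(features, weights, y=0)
assert y_prime == 2
assert np.allclose(plus_map, 4.0)  # 2 - (-2) at every location
```

If the chosen label $y$ itself has the smallest score, the map is zero everywhere under this literal reading.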

The figure below shows the infoCAM procedure.

(It looks complicated, but there is really nothing complicated about it: the infoCAM is extracted from the feature maps, upsampled to the input resolution, and turned into a bounding box to solve the WSOL problem.)
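The map-to-box step can be sketched as follows. This post does not give the details of that step, so everything specific here is an assumption for illustration: nearest-neighbour upsampling, min-max normalisation, a relative threshold of 0.2, a single box around all pixels above the threshold, and the function name `cam_to_bbox`.

```python
import numpy as np

def cam_to_bbox(cam, image_size, threshold=0.2):
    """Turn a low-resolution activation map into one bounding box.

    cam:        (h, w) activation map (for example an infoCAM).
    image_size: (H, W) of the input image.
    threshold:  assumed fraction of the normalised range kept as foreground.
    Returns (x_min, y_min, x_max, y_max) in pixels, or None if nothing is kept.
    """
    cam = np.asarray(cam, dtype=float)
    h, w = cam.shape
    H, W = image_size
    # Nearest-neighbour upsampling: map every output pixel to a source cell.
    rows = (np.arange(H) * h) // H
    cols = (np.arange(W) * w) // W
    up = cam[rows][:, cols]
    # Min-max normalise to [0, 1]; a constant map has no foreground.
    lo, hi = up.min(), up.max()
    if hi <= lo:
        return None
    mask = (up - lo) / (hi - lo) >= threshold
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy 4x4 map with a bright 2x2 centre, upsampled to an 8x8 image.
cam = np.zeros((4, 4))
cam[1:3, 1:3] = 1.0
bbox = cam_to_bbox(cam, image_size=(8, 8))
assert bbox == (2, 2, 5, 5)
```

A more careful version would typically keep only the largest connected component before drawing the box; that is left out here to keep the sketch short.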

3) Looking at the results, infoCAM achieved the best performance on both the CUB-200-2011 and Tiny-ImageNet datasets.

In the results above, ADL is the Attention-based Dropout Layer, which randomly removes the most discriminative part of the image region so that the CNN-based classifier takes the entire object into account.

The ablation study shows that using all of the components together gives the best performance.

The visualization results are shown below. infoCAM captures the bounding boxes better than the original CAM.

I am not sure whether this is a recent trend, but methods that use the CAMs generated for other classes to produce a more precise CAM seem to be appearing. Using the other classes suppresses the background, which in turn leads to better localization of the target-class object.
