
PyTorch implementation of Score-CAM

This repository provides an unofficial PyTorch implementation of Score-CAM [1]. Score-CAM is a CAM (class activation mapping) [2] based visual explanation method like Grad-CAM [3] and Grad-CAM++ [4], but Score-CAM does not depend on gradients and can provide stable visual explanations. The code of this repository also contains additional functions, like CSKIP. See the following sections for more details about the additional functions.

The features of this implementation are:

  • Versatile: The code of this repository is applicable to many types of neural networks, not only the models provided by the torchvision module but also custom CNN models.
  • Portable: This repository is easily transplanted into user projects. At the moment, all users need to do is copy a single Python file into their project.
  • Few dependencies: The core module of this repository has few dependencies, which makes it easier to transplant into user projects. The current implementation depends only on the numpy and torch modules.
Top image of Score-CAM

Installation

The core module of this repository, scorecam, requires only NumPy and PyTorch.

pip3 install numpy torch

Additionally, the example code examples.py requires OpenCV, Matplotlib and Torchvision.

pip3 install opencv-python matplotlib torchvision

Usage

Minimal example

import numpy as np
import cv2 as cv
import torchvision

# Import ScoreCAM class.
from scorecam import ScoreCAM

# Load NN model.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# Load input image.
image = cv.imread("resources/sample_image_01.jpg", cv.IMREAD_COLOR)
image = cv.cvtColor(image, cv.COLOR_BGR2RGB)
image = cv.resize(image, (224, 224), interpolation=cv.INTER_CUBIC)

# Normalize the image.
IMAGENET1K_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET1K_STD  = np.array([0.229, 0.224, 0.225])
x = (image / 255.0 - IMAGENET1K_MEAN) / IMAGENET1K_STD

# Create Score-CAM instance.
scorecam = ScoreCAM(model, actmap="layer4")

# Compute visual explanation.
# The argument 'coi' means 'class of interest' and the number 242
# is the index of the label 'boxer' (breed of dog) in ImageNet.
L = scorecam.compute(x, coi=242)
print(L)
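
The shape and type of the returned explanation L are defined by this repository's scorecam module; the following is only a rough sketch of how such a heatmap could be overlaid on the input image with OpenCV, assuming L can be converted to a 2-D numpy array.

# Rough visualization sketch (assumption: L is convertible to a 2-D numpy array).
heatmap = np.asarray(L, dtype=np.float64)
heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)
heatmap = cv.resize(heatmap, (image.shape[1], image.shape[0]))

# Color the heatmap and blend it with the original image (converted back to BGR).
colored = cv.applyColorMap(np.uint8(255 * heatmap), cv.COLORMAP_JET)
overlay = cv.addWeighted(cv.cvtColor(image, cv.COLOR_RGB2BGR), 0.5, colored, 0.5, 0)
cv.imwrite("scorecam_overlay.png", overlay)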

Custom scoring function

If your CNN model is not a classification network, the class of interest does not make sense and you need a custom scoring function for Score-CAM. In such a case, you can pass a Python function as the coi argument.

For example, imagine that your CNN is YOLO and outputs a tensor with shape (B, 5 + C, H, W), where B is the batch size, C is the number of classes, and (H, W) is the output resolution. If you want to analyze the detection result for class index c at spatial location (h, w), the custom function can be written as follows:

# Define a custom scoring function.
coi_fn = lambda output: output[:, c, h, w]

L = scorecam.compute(x, coi=coi_fn)

Note that the following code

L = scorecam.compute(x, coi=target_index)

is equivalent to

# Define a scoring function.
coi_fn = lambda output: output[:, target_index]

L = scorecam.compute(x, coi=coi_fn)

where target_index is an integer.

What is Score-CAM?

Score-CAM is a CAM-based method for computing visual explanations for CNNs. Other CAM-based methods depend on gradients of the CNN output, but Score-CAM does not. Score-CAM scores each channel of the activation map by the prediction result for a masked image, which is defined as the Hadamard product of the input image and that channel of the activation map upsampled to the input resolution. The following figure is a sketch of the Score-CAM procedure.

Sketch of Score-CAM procedure
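
To make the procedure concrete, here is a minimal, self-contained sketch of the core Score-CAM loop written directly in PyTorch. It is independent of this repository's scorecam module, and the function and variable names are illustrative only.

import torch
import torch.nn.functional as F

def score_cam_sketch(model, x, activations, target_class):
    # x: input tensor of shape (1, 3, H, W).
    # activations: tensor of shape (1, C, h, w) captured from the layer of
    # interest (for example, via a forward hook).
    model.eval()
    num_channels = activations.shape[1]
    weights = torch.zeros(num_channels)
    with torch.no_grad():
        for k in range(num_channels):
            # Upsample the k-th channel to the input resolution.
            a = F.interpolate(activations[:, k:k + 1], size=x.shape[2:],
                              mode="bilinear", align_corners=False)
            # Normalize to [0, 1] so the channel acts as a soft mask.
            a = (a - a.min()) / (a.max() - a.min() + 1e-8)
            # Score the masked image (Hadamard product) for the target class.
            logits = model(x * a)
            weights[k] = torch.softmax(logits, dim=1)[0, target_class]
    # The explanation is the ReLU of the weighted sum of the channels.
    cam = torch.relu((weights.view(1, -1, 1, 1) * activations).sum(dim=1))
    return cam.squeeze(0)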

Additional function 1: CSKIP (channel skipping)

Background

One of the weak points of Score-CAM is its inference speed. Score-CAM is normally much slower than other CAM-based methods such as Grad-CAM or Grad-CAM++. This is considered the price of its stability, but we can reduce the computation time of Score-CAM with a very simple trick. The main cause of the long computation time is that Score-CAM requires many forward passes to compute the output visual explanation. For example, if the activation map has 512 channels, then forward inference on 512 masked images is required.

Method

However, we can easily imagine that only a very limited number of channels of the activation map actually contribute to the output visual explanation. This means that we can reduce the computation time of Score-CAM by omitting "unnecessary" channels from the calculation of the visual explanation. Although there may be many ways to measure the "necessity" of each channel of the activation map, this repository uses the maximum value of each channel as the measure of necessity.

So the acceleration procedure is summarized as follows:

  • Get the activation map from the input image,
  • Sort the channels of the activation map by the maximum value of each channel,
  • Keep only the top K channels and drop the other channels from the activation map,
  • Compute the Score-CAM visual explanation using the reduced activation map,

where K is a hyperparameter. We call this method CSKIP (Channel SKIPping); a sketch of the channel selection follows.
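
The sketch below is written against a generic activation tensor; it is illustrative only and not the exact code used inside this repository's ScoreCAM class.

import torch

def select_top_channels(activations, k):
    # activations: tensor of shape (1, C, h, w) from the target layer.
    # Use the maximum value of each channel as the measure of "necessity".
    channel_max = activations.amax(dim=(2, 3)).squeeze(0)   # shape (C,)
    top_idx = torch.topk(channel_max, k).indices
    # Keep only the top K channels; the remaining channels are skipped.
    return activations[:, top_idx]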

How to use

You can easily apply CSKIP by adding cskip=True and cskip_out=K to the arguments of the function ScoreCAM.compute, like the following:

L = scorecam.compute(x, coi=242, cskip=True, cskip_out=16)

The hyperparameter K controls the acceleration ratio. If the activation map has 512 channels and the number of remaining channels K is 16, then you can expect roughly 32 (= 512 / 16) times faster inference.

Numerical results

The following are the experimental results obtained in the author's environment with the following settings:

  • pre-trained model: ResNet18 trained on ImageNet V1
  • The layer to extract the activation map: layer4
  • input image: resources/sample_image_01.jpg
  • Class of interest: 242 (boxer)
  • Number of channels to be kept on CSKIP: 16

As you can see, the visualization result of Score-CAM with CSKIP is almost the same as the visualization without CSKIP, while the computation time is much shorter.

Comparison of Score-CAM visualizations with and without CSKIP

Device     | Vanilla           | CSKIP             | Acceleration ratio
-----------|-------------------|-------------------|-------------------
CPU        | 7.621 [sec/image] | 0.246 [sec/image] | x 30.98
CUDA (GPU) | 0.851 [sec/image] | 0.028 [sec/image] | x 30.32

Note that the experiment environment is:

  • CPU: Intel(R) Core(TM) i5-9300H CPU @ 2.40GHz
  • RAM: 32GB
  • GPU: NVIDIA GeForce GTX 1660 Ti

Limitations of CSKIP

As you may have already guessed, excessive channel reduction by CSKIP degrades the visualization heatmap. The left figure below plots the channel reduction ratio (horizontal axis) against SSIM (structural similarity), a metric commonly used to measure the difference between two images (vertical axis). The dataset used for this result is 500 images randomly chosen from the validation data of ILSVRC 2012.

The figure on the right shows the relationship between the reduction ratio and the computation time measured on the CPU. As you can see, the relationship is almost linear.

Generally, two images are said to be similar if their SSIM is 0.95 or higher. Therefore, setting the reduction ratio to around 0.5 seems to be a good choice, which cuts the computation time roughly in half.

Limitations of CSKIP
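
For reference, the SSIM comparison between a full Score-CAM heatmap and a CSKIP heatmap could be computed as in the following sketch, assuming scikit-image is installed (it is not a dependency of this repository) and that both heatmaps are available as 2-D numpy arrays.

import numpy as np
from skimage.metrics import structural_similarity

def heatmap_ssim(cam_full, cam_cskip):
    # Rescale both heatmaps to [0, 1] before comparing them.
    def rescale(m):
        m = np.asarray(m, dtype=np.float64)
        return (m - m.min()) / (m.max() - m.min() + 1e-8)
    return structural_similarity(rescale(cam_full), rescale(cam_cskip), data_range=1.0)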


References

[1] H. Wang, Z. Wang, M. Du, F. Yang, Z. Zhang, S. Ding, P. Mardziel, and X. Hu, "Score-CAM: Score-weighted visual explanations for convolutional neural networks", CVPR, 2020.

[2] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, "Is Object Localization for Free? - Weakly-Supervised Learning With Convolutional Neural Networks", CVPR, 2015.

[3] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization", ICCV, 2017.

[4] A. Chattopadhyay, A. Sarkar, P. Howlader, and V. Balasubramanian, "Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks", WACV, 2018.

