
PyTorch implementation of Score-CAM

This repository provides an unofficial PyTorch implementation of Score-CAM [1]. Score-CAM is a CAM (class activation mapping) [2] based visual explanation method like Grad-CAM [3] and Grad-CAM++ [4], but Score-CAM does not depend on gradients and can provide stable visual explanations. The code of this repository also contains additional functions, like CSKIP. See the following sections for more details about the additional functions.

The features of this implementation are:

  • Versatile: The code of this repository is applicable to many types of neural networks, not only the models provided by the torchvision module but also custom CNN models.
  • Portable: This repository is easily transplanted into user projects. At the moment, all users need to do is copy a single Python file into their project.
  • Few dependencies: The core module of this repository has few dependencies, which makes it easier to transplant into user projects. The current implementation depends only on the numpy and torch modules.
Top image of Score-CAM

Installation

The core module of this repository, scorecam, requires only NumPy and PyTorch.

pip3 install numpy torch

Additionally, the example code examples.py requires OpenCV, Matplotlib and Torchvision.

pip3 install opencv-python matplotlib torchvision

Usage

Minimal example

import numpy as np
import cv2 as cv
import torchvision

# Import ScoreCAM class.
from scorecam import ScoreCAM

# Load NN model.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# Load input image.
image = cv.imread("resources/sample_image_01.jpg", cv.IMREAD_COLOR)
image = cv.cvtColor(image, cv.COLOR_BGR2RGB)
image = cv.resize(image, (224, 224), interpolation=cv.INTER_CUBIC)

# Normalize the image.
IMAGENET1K_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET1K_STD  = np.array([0.229, 0.224, 0.225])
x = (image / 255.0 - IMAGENET1K_MEAN) / IMAGENET1K_STD

# Create Score-CAM instance.
scorecam = ScoreCAM(model, actmap="layer4")

# Compute visual explanation.
# The argument 'coi' means 'class of interest' and the number 242
# is the index of the label 'boxer' (breed of dog) in ImageNet.
L = scorecam.compute(x, coi=242)
print(L)
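
The shape and type of the returned explanation L are defined by this repository's scorecam module; the following is only a rough sketch of how such a heatmap could be overlaid on the input image with OpenCV, assuming L can be converted to a 2-D numpy array.

# Rough visualization sketch (assumption: L is convertible to a 2-D numpy array).
heatmap = np.asarray(L, dtype=np.float64)
heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)
heatmap = cv.resize(heatmap, (image.shape[1], image.shape[0]))

# Color the heatmap and blend it with the original image (converted back to BGR).
colored = cv.applyColorMap(np.uint8(255 * heatmap), cv.COLORMAP_JET)
overlay = cv.addWeighted(cv.cvtColor(image, cv.COLOR_RGB2BGR), 0.5, colored, 0.5, 0)
cv.imwrite("scorecam_overlay.png", overlay)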

Custom scoring function

If your CNN model is not a classification network, the class of interest does not make sense and you need a custom scoring function for Score-CAM. In such a case, you can pass a Python function as the coi argument.

For example, imagine that your CNN is YOLO and outputs a tensor with shape (B, 5 + C, H, W), where B is the batch size, C is the number of classes, and (H, W) is the output resolution. If you want to analyze the detection result for class index c at spatial location (h, w), the custom function can be written as follows:

# Define a custom scoring function.
coi_fn = lambda output: output[:, c, h, w]

L = scorecam.compute(x, coi=coi_fn)

Note that the following code

L = scorecam.compute(x, coi=target_index)

is equivalent to

# Define a scoring function.
coi_fn = lambda output: output[:, target_index]

L = scorecam.compute(x, coi=coi_fn)

where target_index is an integer.

What is Score-CAM?

Score-CAM is a CAM-based method for computing visual explanations for CNNs. Other CAM-based methods depend on gradients of the CNN output, but Score-CAM does not. Score-CAM scores each channel of the activation map by the prediction result for a masked image, which is defined as the Hadamard product of the input image and that channel of the activation map upsampled to the input resolution. The following figure is a sketch of the Score-CAM procedure.

Sketch of Score-CAM procedure
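
To make the procedure concrete, here is a minimal, self-contained sketch of the core Score-CAM loop written directly in PyTorch. It is independent of this repository's scorecam module, and the function and variable names are illustrative only.

import torch
import torch.nn.functional as F

def score_cam_sketch(model, x, activations, target_class):
    # x: input tensor of shape (1, 3, H, W).
    # activations: tensor of shape (1, C, h, w) captured from the layer of
    # interest (for example, via a forward hook).
    model.eval()
    num_channels = activations.shape[1]
    weights = torch.zeros(num_channels)
    with torch.no_grad():
        for k in range(num_channels):
            # Upsample the k-th channel to the input resolution.
            a = F.interpolate(activations[:, k:k + 1], size=x.shape[2:],
                              mode="bilinear", align_corners=False)
            # Normalize to [0, 1] so the channel acts as a soft mask.
            a = (a - a.min()) / (a.max() - a.min() + 1e-8)
            # Score the masked image (Hadamard product) for the target class.
            logits = model(x * a)
            weights[k] = torch.softmax(logits, dim=1)[0, target_class]
    # The explanation is the ReLU of the weighted sum of the channels.
    cam = torch.relu((weights.view(1, -1, 1, 1) * activations).sum(dim=1))
    return cam.squeeze(0)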

Additional function 1: CSKIP (channel skipping)

Background

One of the weak points of Score-CAM is its inference speed. Score-CAM is normally much slower than other CAM-based methods such as Grad-CAM or Grad-CAM++. This is considered the price of its stability, but we can reduce the computation time of Score-CAM with a very simple trick. The main cause of the long computation time is that Score-CAM requires many forward passes to compute the output visual explanation. For example, if the activation map has 512 channels, then forward inference on 512 masked images is required.

Method

However, we can easily imagine that only a very limited number of channels of the activation map actually contribute to the output visual explanation. This means that we can reduce the computation time of Score-CAM by omitting "unnecessary" channels from the calculation of the visual explanation. Although there may be many ways to measure the "necessity" of each channel of the activation map, this repository uses the maximum value of each channel as the measure of necessity.

So the acceleration procedure is summarized as follows:

  • Get the activation map from the input image,
  • Sort the channels of the activation map by the maximum value of each channel,
  • Keep only the top K channels and drop the other channels from the activation map,
  • Compute the Score-CAM visual explanation using the reduced activation map,

where K is a hyperparameter. We call this method CSKIP (Channel SKIPping); a sketch of the channel selection follows.
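
The sketch below is written against a generic activation tensor; it is illustrative only and not the exact code used inside this repository's ScoreCAM class.

import torch

def select_top_channels(activations, k):
    # activations: tensor of shape (1, C, h, w) from the target layer.
    # Use the maximum value of each channel as the measure of "necessity".
    channel_max = activations.amax(dim=(2, 3)).squeeze(0)   # shape (C,)
    top_idx = torch.topk(channel_max, k).indices
    # Keep only the top K channels; the remaining channels are skipped.
    return activations[:, top_idx]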

How to use

You can easily apply CSKIP by adding cskip=True and cskip_out=K to the arguments of the function ScoreCAM.compute, like the following:

L = scorecam.compute(x, coi=242, cskip=True, cskip_out=16)

The hyperparameter K controls the acceleration ratio. If the activation map has 512 channels and the number of remaining channels K is 16, then you can expect roughly 32 (= 512 / 16) times faster inference.

Numerical results

The following are the experimental results obtained in the author's environment with the following settings:

  • pre-trained model: ResNet18 trained on ImageNet V1
  • The layer to extract the activation map: layer4
  • input image: resources/sample_image_01.jpg
  • Class of interest: 242 (boxer)
  • Number of channels to be kept on CSKIP: 16

As you can see, the visualization result of Score-CAM with CSKIP is almost the same as the visualization without CSKIP, while the computation time is much shorter.

Comparison of Score-CAM visualizations with and without CSKIP

Device     | Vanilla           | CSKIP             | Acceleration ratio
-----------|-------------------|-------------------|-------------------
CPU        | 7.621 [sec/image] | 0.246 [sec/image] | x 30.98
CUDA (GPU) | 0.851 [sec/image] | 0.028 [sec/image] | x 30.32

Note that the experiment environment is:

  • CPU: Intel(R) Core(TM) i5-9300H CPU @ 2.40GHz
  • RAM: 32GB
  • GPU: NVIDIA GeForce GTX 1660 Ti

Limitations of CSKIP

As you may have already guessed, excessive channel reduction by CSKIP degrades the visualization heatmap. The left figure below plots the channel reduction ratio (horizontal axis) against SSIM (structural similarity), a metric commonly used to measure the difference between two images (vertical axis). The dataset used for this result is 500 images randomly chosen from the validation data of ILSVRC 2012.

The figure on the right shows the relationship between the reduction ratio and the computation time measured on the CPU. As you can see, the relationship is almost linear.

Generally, two images are said to be similar if their SSIM is 0.95 or higher. Therefore, setting the reduction ratio to around 0.5 seems to be a good choice, which cuts the computation time roughly in half.

Limitations of CSKIP
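
For reference, the SSIM comparison between a full Score-CAM heatmap and a CSKIP heatmap could be computed as in the following sketch, assuming scikit-image is installed (it is not a dependency of this repository) and that both heatmaps are available as 2-D numpy arrays.

import numpy as np
from skimage.metrics import structural_similarity

def heatmap_ssim(cam_full, cam_cskip):
    # Rescale both heatmaps to [0, 1] before comparing them.
    def rescale(m):
        m = np.asarray(m, dtype=np.float64)
        return (m - m.min()) / (m.max() - m.min() + 1e-8)
    return structural_similarity(rescale(cam_full), rescale(cam_cskip), data_range=1.0)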


References

[1] H. Wang, Z. Wang, M. Du, F. Yang, Z. Zhang, S. Ding, P. Mardziel, and X. Hu, "Score-CAM: Score-weighted visual explanations for convolutional neural networks", CVPR, 2020.

[2] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, "Is Object Localization for Free? - Weakly-Supervised Learning With Convolutional Neural Networks", CVPR, 2015.

[3] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization", ICCV, 2017.

[4] A. Chattopadhyay, A. Sarkar, P. Howlader, and V. Balasubramanian, "Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks", WACV, 2018.

