pytorch / captum

Model interpretability and understanding for PyTorch

Home Page: https://captum.ai

License: BSD 3-Clause "New" or "Revised" License

Python 94.94% Shell 0.56% JavaScript 1.70% Makefile 0.02% Batchfile 0.03% CSS 1.53% HTML 0.01% TypeScript 1.21%
interpretability interpretable-ai interpretable-ml feature-importance feature-attribution

captum's Introduction

PyTorch Logo


PyTorch is a Python package that provides two high-level features:

  • Tensor computation (like NumPy) with strong GPU acceleration
  • Deep neural networks built on a tape-based autograd system

You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed.

Our trunk health (Continuous Integration signals) can be found at hud.pytorch.org.

More About PyTorch

Learn the basics of PyTorch

At a granular level, PyTorch is a library that consists of the following components:

  • torch: a Tensor library like NumPy, with strong GPU support
  • torch.autograd: a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch
  • torch.jit: a compilation stack (TorchScript) to create serializable and optimizable models from PyTorch code
  • torch.nn: a neural networks library deeply integrated with autograd, designed for maximum flexibility
  • torch.multiprocessing: Python multiprocessing, but with magical memory sharing of torch Tensors across processes; useful for data loading and Hogwild training
  • torch.utils: DataLoader and other utility functions for convenience

Usually, PyTorch is used either as:

  • A replacement for NumPy to use the power of GPUs.
  • A deep learning research platform that provides maximum flexibility and speed.

Elaborating Further:

A GPU-Ready Tensor Library

If you use NumPy, then you have used Tensors (a.k.a. ndarray).

Tensor illustration

PyTorch provides Tensors that can live either on the CPU or the GPU and accelerates the computation by a huge amount.

We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs such as slicing, indexing, mathematical operations, linear algebra, reductions. And they are fast!
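For illustration, a minimal sketch (not part of the original text) of a few such routines; the values are arbitrary:

import torch

# Create a tensor on the CPU, then move it to the GPU if one is available.
x = torch.randn(3, 4)
if torch.cuda.is_available():
    x = x.cuda()

y = x[:, :2]        # slicing / indexing
z = x * 2 + 1       # elementwise math
m = x @ x.t()       # linear algebra (matrix multiplication)
s = x.sum(dim=0)    # reduction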

Dynamic Neural Networks: Tape-Based Autograd

PyTorch has a unique way of building neural networks: using and replaying a tape recorder.

Most frameworks such as TensorFlow, Theano, Caffe, and CNTK have a static view of the world. One has to build a neural network and reuse the same structure again and again. Changing the way the network behaves means that one has to start from scratch.

With PyTorch, we use a technique called reverse-mode auto-differentiation, which allows you to change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, Chainer, etc.

While this technique is not unique to PyTorch, it's one of the fastest implementations of it to date. You get the best of speed and flexibility for your crazy research.

Dynamic graph
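As a small illustration of what "replaying a tape" means in practice (a minimal sketch, not part of the original text): the graph is recorded as ordinary Python control flow runs, and backward() replays it.

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
while y.norm() < 10:     # data-dependent loop: the tape records whatever actually ran
    y = y * 2
y.sum().backward()       # reverse-mode autodiff replays the recorded operations
print(x.grad)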

Python First

PyTorch is not a Python binding into a monolithic C++ framework. It is built to be deeply integrated into Python. You can use it naturally like you would use NumPy / SciPy / scikit-learn etc. You can write your new neural network layers in Python itself, using your favorite libraries and use packages such as Cython and Numba. Our goal is to not reinvent the wheel where appropriate.

Imperative Experiences

PyTorch is designed to be intuitive, linear in thought, and easy to use. When you execute a line of code, it gets executed. There isn't an asynchronous view of the world. When you drop into a debugger or receive error messages and stack traces, understanding them is straightforward. The stack trace points to exactly where your code was defined. We hope you never spend hours debugging your code because of bad stack traces or asynchronous and opaque execution engines.

Fast and Lean

PyTorch has minimal framework overhead. We integrate acceleration libraries such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed. At the core, its CPU and GPU Tensor and neural network backends are mature and have been tested for years.

Hence, PyTorch is quite fast — whether you run small or large neural networks.

The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives. We've written custom memory allocators for the GPU to make sure that your deep learning models are maximally memory efficient. This enables you to train bigger deep learning models than before.

Extensions Without Pain

Writing new neural network modules, or interfacing with PyTorch's Tensor API, is designed to be straightforward, with minimal abstractions.

You can write new neural network layers in Python using the torch API or your favorite NumPy-based libraries such as SciPy.
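For example, a minimal custom layer written purely in Python (the layer itself is hypothetical, chosen only for illustration):

import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    """A plain Linear layer followed by a single learnable scale."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.scale * self.linear(x)

layer = ScaledLinear(8, 4)
out = layer(torch.randn(2, 8))   # autograd tracks the custom layer automatically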

If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and with minimal boilerplate. No wrapper code needs to be written. You can see a tutorial here and an example here.

Installation

Binaries

Commands to install binaries via Conda or pip wheels are on our website: https://pytorch.org/get-started/locally/

NVIDIA Jetson Platforms

Python wheels for NVIDIA's Jetson Nano, Jetson TX1/TX2, Jetson Xavier NX/AGX, and Jetson AGX Orin are provided here and the L4T container is published here

They require JetPack 4.2 and above, and @dusty-nv and @ptrblck are maintaining them.

From Source

Prerequisites

If you are installing from source, you will need:

  • Python 3.8 or later (for Linux, Python 3.8.1+ is needed)
  • A compiler that fully supports C++17, such as clang or gcc (gcc 9.4.0 or newer is required)

We highly recommend installing an Anaconda environment. You will get a high-quality BLAS library (MKL) and you get controlled dependency versions regardless of your Linux distro.

NVIDIA CUDA Support

If you want to compile with CUDA support, select a supported version of CUDA from our support matrix, then install the following:

Note: You could refer to the cuDNN Support Matrix for cuDNN versions with the various supported CUDA, CUDA driver and NVIDIA hardware

If you want to disable CUDA support, export the environment variable USE_CUDA=0. Other potentially useful environment variables may be found in setup.py.

If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xavier), instructions to install PyTorch for Jetson Nano are available here

AMD ROCm Support

If you want to compile with ROCm support, install

  • AMD ROCm 4.0 and above installation
  • ROCm is currently supported only for Linux systems.

If you want to disable ROCm support, export the environment variable USE_ROCM=0. Other potentially useful environment variables may be found in setup.py.

Intel GPU Support

If you want to compile with Intel GPU support, follow these instructions

If you want to disable Intel GPU support, export the environment variable USE_XPU=0. Other potentially useful environment variables may be found in setup.py.

Install Dependencies

Common

conda install cmake ninja
# Run this command from the PyTorch directory after cloning the source code using the “Get the PyTorch Source“ section below
pip install -r requirements.txt

On Linux

conda install intel::mkl-static intel::mkl-include
# CUDA only: Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda121  # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo

# (optional) If using torch.compile with inductor/triton, install the matching version of triton
# Run from the pytorch directory after cloning
# For Intel GPU support, please explicitly `export USE_XPU=1` before running command.
make triton

On MacOS

# Add this package on intel x86 processor machines only
conda install intel::mkl-static intel::mkl-include
# Add these packages if torch.distributed is needed
conda install pkg-config libuv

On Windows

conda install intel::mkl-static intel::mkl-include
# Add these packages if torch.distributed is needed.
# Distributed package support on Windows is a prototype feature and is subject to changes.
conda install -c conda-forge libuv=1.39

Get the PyTorch Source

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive

Install PyTorch

On Linux

If you would like to compile PyTorch with new C++ ABI enabled, then first run this command:

export _GLIBCXX_USE_CXX11_ABI=1

If you're compiling for AMD ROCm then first run this command:

# Only run this if you're compiling for ROCm
python tools/amd_build/build_amd.py

Install PyTorch

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py develop

Aside: If you are using Anaconda, you may experience an error caused by the linker:

build/temp.linux-x86_64-3.7/torch/csrc/stub.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1

This is caused by ld from the Conda environment shadowing the system ld. You should use a newer version of Python that fixes this issue. The recommended Python version is 3.8.1+.

On macOS

python3 setup.py develop

On Windows

Choose Correct Visual Studio Version.

PyTorch CI uses Visual C++ BuildTools, which come with Visual Studio Enterprise, Professional, or Community Editions. You can also install the build tools from https://visualstudio.microsoft.com/visual-cpp-build-tools/. The build tools do not come with Visual Studio Code by default.

If you want to build legacy Python code, please refer to Building on legacy code and CUDA

CPU-only builds

In this mode PyTorch computations will run on your CPU, not your GPU

conda activate
python setup.py develop

Note on OpenMP: The desired OpenMP implementation is Intel OpenMP (iomp). In order to link against iomp, you'll need to manually download the library and set up the building environment by tweaking CMAKE_INCLUDE_PATH and LIB. The instruction here is an example for setting up both MKL and Intel OpenMP. Without these configurations for CMake, Microsoft Visual C OpenMP runtime (vcomp) will be used.

CUDA based build

In this mode PyTorch computations will leverage your GPU via CUDA for faster number crunching

NVTX is needed to build PyTorch with CUDA. NVTX is part of the CUDA distribution, where it is called "Nsight Compute". To add it to an existing CUDA installation, run the CUDA installer again and check the corresponding checkbox. Make sure that CUDA with Nsight Compute is installed after Visual Studio.

Currently, VS 2017 / 2019, and Ninja are supported as the generator of CMake. If ninja.exe is detected in PATH, then Ninja will be used as the default generator, otherwise, it will use VS 2017 / 2019.
If Ninja is selected as the generator, the latest MSVC will get selected as the underlying toolchain.

Additional libraries such as Magma, oneDNN, a.k.a. MKLDNN or DNNL, and Sccache are often needed. Please refer to the installation-helper to install them.

You can refer to the build_pytorch.bat script for some other environment variable configurations

cmd

:: Set the environment variables after you have downloaded and unzipped the mkl package,
:: else CMake would throw an error as `Could NOT find OpenMP`.
set CMAKE_INCLUDE_PATH={Your directory}\mkl\include
set LIB={Your directory}\mkl\lib;%LIB%

:: Read the content in the previous section carefully before you proceed.
:: [Optional] If you want to override the underlying toolset used by Ninja and Visual Studio with CUDA, please run the following script block.
:: "Visual Studio 2019 Developer Command Prompt" will be run automatically.
:: Make sure you have CMake >= 3.12 before you do this when you use the Visual Studio generator.
set CMAKE_GENERATOR_TOOLSET_VERSION=14.27
set DISTUTILS_USE_SDK=1
for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe" -version [15^,17^) -products * -latest -property installationPath`) do call "%i\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%CMAKE_GENERATOR_TOOLSET_VERSION%

:: [Optional] If you want to override the CUDA host compiler
set CUDAHOSTCXX=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\HostX64\x64\cl.exe

python setup.py develop
Adjust Build Options (Optional)

You can optionally adjust the configuration of CMake variables (without building first) by doing the following. For example, adjusting the pre-detected directories for cuDNN or BLAS can be done this way.

On Linux

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py build --cmake-only
ccmake build  # or cmake-gui build

On macOS

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build --cmake-only
ccmake build  # or cmake-gui build

Docker Image

Using pre-built images

You can also pull a pre-built docker image from Docker Hub and run with docker v19.03+

docker run --gpus all --rm -ti --ipc=host pytorch/pytorch:latest

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders), the default shared memory segment size the container runs with is not enough; increase the shared memory size with either the --ipc=host or --shm-size command-line option to nvidia-docker run.

Building the image yourself

NOTE: Must be built with a docker version > 18.06

The Dockerfile is supplied to build images with CUDA 11.1 support and cuDNN v8. You can pass the PYTHON_VERSION=x.y make variable to specify which Python version is to be used by Miniconda, or leave it unset to use the default.

make -f docker.Makefile
# images are tagged as docker.io/${your_docker_username}/pytorch

You can also pass the CMAKE_VARS="..." environment variable to specify additional CMake variables to be passed to CMake during the build. See setup.py for the list of available variables.

make -f docker.Makefile

Building the Documentation

To build documentation in various formats, you will need Sphinx and the readthedocs theme.

cd docs/
pip install -r requirements.txt

You can then build the documentation by running make <format> from the docs/ folder. Run make to get a list of all available output formats.

If you get a katex error run npm install katex. If it persists, try npm install -g katex

Note: if you installed nodejs with a different package manager (e.g., conda) then npm will probably install a version of katex that is not compatible with your version of nodejs and doc builds will fail. A combination of versions that is known to work is [email protected] and [email protected]. To install the latter with npm you can run npm install -g [email protected]

Previous Versions

Installation instructions and binaries for previous PyTorch versions may be found on our website.

Getting Started

Three-pointers to get you started:

Resources

Communication

Releases and Contributing

Typically, PyTorch has three minor releases a year. Please let us know if you encounter a bug by filing an issue.

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR because we might be taking the core in a different direction than you might be aware of.

To learn more about making a contribution to PyTorch, please see our Contribution page. For more information about PyTorch releases, see the Release page.

The Team

PyTorch is a community-driven project with several skillful engineers and researchers contributing to it.

PyTorch is currently maintained by Soumith Chintala, Gregory Chanan, Dmytro Dzhulgakov, Edward Yang, and Nikita Shulga with major contributions coming from hundreds of talented individuals in various forms and means. A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Koepf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary Devito.

Note: This project is unrelated to hughperkins/pytorch with the same name. Hugh is a valuable contributor to the Torch community and has helped with many things Torch and PyTorch.

License

PyTorch has a BSD-style license, as found in the LICENSE file.

captum's People

Contributors

99warriors, agaction, amyreese, aobo-y, bilalsal, caraya10, cicichen01, crawlingcub, cyrjano, diegoolano, dkrako, edward-io, gabrieltseng, j0nreynolds, jessijzhao, miguelmartin75, mruberry, nanohanno, narinek, orionr, pingjunchen, progamergov, reubend, shubhammuttepawar, shuwenw, stanislavglebik, thatch, vivekmig, yucu, zpao


captum's Issues

Dealing with .view in DeepLift

Hi, I am an undergrad student looking to apply Captum's implementation of DeepLift to a Graph Convolution Network.

Below is a snippet of the code in the forward function that is causing problems:

to_conv1d = batch_sortpooling_graphs.view((-1, 1, self.k * self.total_latent_dim))
conv1d_res = self.conv1d_params1(to_conv1d)
conv1d_res = self.conv1d_activation(conv1d_res)
conv1d_res = self.maxpool1d(conv1d_res)
conv1d_res = self.conv1d_params2(conv1d_res)
conv1d_res = self.conv1d_activation(conv1d_res)

to_dense = conv1d_res.view(len(graph_sizes), -1)

if self.output_dim > 0:
    out_linear = self.out_params(to_dense)
    reluact_fp = self.conv1d_activation(out_linear)
else:
    reluact_fp = to_dense
return self.conv1d_activation(reluact_fp)

As you can see, my code requires several reshapes of the tensors as it moves from the input to the 1d convolution layer and finally to the dense layer. Running as is gives me the following error:

Traceback (most recent call last):
  File "main.py", line 625, in <module>
    attribution = dl.attribute(input, additional_forward_args=[15], target=1)
  File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_core/deep_lift.py", line 202, in attribute
    additional_forward_args=additional_forward_args,
  File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_utils/gradient.py", line 92, in compute_gradients
    grads = torch.autograd.grad(torch.unbind(output), inputs)
  File "/home/user/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 157, in grad
    inputs, allow_unused)
  File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_core/deep_lift.py", line 284, in _backward_hook
    inp - inp_ref for inp, inp_ref in zip(module.input, module.input_ref)
  File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_core/deep_lift.py", line 284, in <genexpr>
    inp - inp_ref for inp, inp_ref in zip(module.input, module.input_ref)
RuntimeError: The size of tensor a (160) must match the size of tensor b (19) at non-singleton dimension 2

The shapes of each tensors are as follows:

batch_sortpooling_graphs: torch.Size([1, 19, 97])
conv1d_res (immediately after line 1): torch.Size([1, 1, 1843])
to_dense: torch.Size([1, 160])

May I ask if anyone has any idea how to circumvent this such that the DeepLift can work with tensor reshapes? Thank you!

Computing contributions w.r.t. logits rather than final activations

Often, in practice, we wish to compute the contributions w.r.t. the logits of the final sigmoid/softmax, rather than w.r.t. the final network output itself. This is to avoid artifacts that can be caused by the saturating nature of the sigmoid/softmax, and comes into play when comparing attributions between examples. It is particularly relevant if gradient*input is used as an attribution method, because for examples with very confident predictions, the sigmoid/softmax outputs tend to saturate and the gradients will approach zero. I'm wondering if it may be worth mentioning this in the documentation - in the current "getting started", the toy model has a sigmoid output:

(screenshot)

I'm concerned that a naive user may try to compare the magnitudes of attributions across different examples without realizing that, for sigmoid/softmax outputs, it may be worth removing the final nonlinearity before doing such a comparison. We discuss this in Section 3.6 of the deeplift paper. Ideally there would be an option in Captum to ignore the final nonlinearity, but I realize it may not be trivial to add that option. Sorry if this is already addressed and I missed it.
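A minimal workaround sketch (not an official Captum option), using a toy stand-in for the tutorial model: drop the final nonlinearity and attribute against the logits.

import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Toy stand-in for a model that ends in a sigmoid.
model = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1), nn.Sigmoid())

# Attribute against the logits by dropping the last module (assumes a Sequential
# whose final child is the sigmoid/softmax).
logits_model = nn.Sequential(*list(model.children())[:-1])

inputs = torch.rand(2, 3)
baselines = torch.zeros(2, 3)
ig = IntegratedGradients(logits_model)
# The wrapped model has a single output per example, so no target is needed.
attributions = ig.attribute(inputs, baselines)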

DeepLIFT fails when reusing MaxPool2d layer

Using Captum v0.1, so I'm not sure whether this happens with current master.

Something I have noticed when trying out DeepLIFT with CNNs is that reusing MaxPool2d layers instead of explicitly defining one per usage results in RuntimeErrors. Maybe this is related to #199

For example, consider the CIFAR10 tutorial.
If we were to change the network structure to just reuse the self.pool1 as follows:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        # self.pool2 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()
        self.relu3 = nn.ReLU()
        self.relu4 = nn.ReLU()

    def forward(self, x):
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool1(self.relu2(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = self.relu3(self.fc1(x))
        x = self.relu4(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

Training works just fine, but attributing with DeepLIFT should fail due to size mismatch, such as (unfortunately I can't download the dataset right now, using a local version):

~\envs\lib\site-packages\captum\attr\_core\deep_lift.py in <genexpr>(.0)
    282          """
    283         delta_in = tuple(
--> 284             inp - inp_ref for inp, inp_ref in zip(module.input, module.input_ref)
    285         )
    286         delta_out = tuple(

RuntimeError: The size of tensor a (10) must match the size of tensor b (28) at non-singleton dimension 3

Is this a bug or necessary convention? Note that reusing pooling layers actually occurs in official PyTorch tutorials.

captum insights port?

I am running the example application and wanted to ask if it's possible to set a particular port for the app?

Thanks

"RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor "

Hi, I am trying to interpret my intent classification model using your IMDB tutorial and I'm facing the following error: "RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor". This error is raised during the forward pass of an RNN (LSTM) which takes a packed sequence (pack_padded_sequence) as input.

GradientShap's `attribute` method `baselines` argument should be None

class GradientShap(GradientAttribution):
    def __init__(self, forward_func):
        r"""
        Args:
            forward_func (function): The forward function of the model or
                any modification of it
        """
        GradientAttribution.__init__(self, forward_func)

    def attribute(
        self,
        inputs,
        baselines,
        n_samples=5,
        stdevs=0.0,
        target=None,
        additional_forward_args=None,
        return_convergence_delta=False,
    ):

According to the docs, the baselines parameter in the attribute method of GradientShap is optional, and is replaced with a zero-filled Tensor as the same size as the input if not provided. However at the moment it's a required argument.
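Until that is changed, a minimal workaround sketch (with a toy model standing in for the real one) is to pass the documented default explicitly:

import torch
from captum.attr import GradientShap

model = torch.nn.Linear(3, 2)     # toy stand-in for the real model
inputs = torch.rand(4, 3)

gs = GradientShap(model)
# Pass the zero baseline explicitly until the argument becomes optional.
attributions = gs.attribute(inputs, baselines=torch.zeros_like(inputs), target=0)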

Computing LayerConductance in IMDB sentiment analysis Tutorial

I am trying to compute layer conductance in the IMDB tutorial, and I keep getting a scalar issue. Any guidance on how I should pass the input (test_input_tensor) to get the attributions.

cond = LayerConductance(model, model.convs)
cond_vals = cond.attribute(test_input_tensor,target=1)

Thank you!

Models failing with error - Module has no input attribute

I am working with a number of models from the torchreid library. When I use DeepLift on these models, some work and some do not. For example, the DenseNet, MLFN, and MuDeep models work fine, but the OSNet, ResNetMid, and ResNet-50 (and some others) model do not. (N.B. I modified the models to not use inplace=True for nn.ReLU().)

The models that fail usually do so with an error along the lines of 'Sigmoid' object has no attribute 'input' (they also fail for the same reason if ReLU is used). However, I can't see what I need to change in these models in order for them to work with DeepLift.

What is different about these models that causes this error? I understand the error message, but I don't understand why the module doesn't have an input attribute.
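One plausible cause (a guess, not confirmed by the original issue): DeepLift's hooks store .input/.input_ref on each module, so a single activation instance that is called more than once in forward() ends up with inconsistent state. A minimal sketch of the usual rewrite:

import torch.nn as nn

class BlockSharedActivation(nn.Module):      # pattern that tends to break DeepLift
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 8)
        self.fc2 = nn.Linear(8, 8)
        self.act = nn.ReLU()                  # one instance, used twice below
    def forward(self, x):
        return self.act(self.fc2(self.act(self.fc1(x))))

class BlockSeparateActivations(nn.Module):   # one activation instance per call site
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 8)
        self.fc2 = nn.Linear(8, 8)
        self.act1 = nn.ReLU()
        self.act2 = nn.ReLU()
    def forward(self, x):
        return self.act2(self.fc2(self.act1(self.fc1(x))))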

RuntimeError: expected device cpu but got device cuda:0 when training and visualizing model on IMDB

I was trying to reproduce the Interpreting text models: IMDB Sentiment Analysis but training my model instead of just loading a pretrained one.

I adapted the code of the original CNN tutorial but when I get to the point of calling interpret_sentence the following error occurs:

RuntimeError                              Traceback (most recent call last)
<ipython-input-23-68d49a3d040b> in <module>()
----> 1 interpret_sentence(model, 'It was a fantastic performance !', label=1)
      2 interpret_sentence(model, 'Best film ever', label=1)
      3 interpret_sentence(model, 'Such a great show!', label=1)
      4 interpret_sentence(model, 'It was a horrible movie', label=0)
      5 interpret_sentence(model, 'I\'ve never watched something as bad', label=0)

2 frames
<ipython-input-22-cbf5d478566f> in interpret_sentence(model, sentence, min_len, label)
     29     # compute attributions and approximation delta using integrated gradients
     30     attributions_ig, delta = ig.attribute(
---> 31         input_embedding, reference_embedding, n_steps=500, return_convergence_delta=True
     32     )
     33 

/usr/local/lib/python3.6/dist-packages/captum/attr/_core/integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta)
    232                 end_point,
    233                 additional_forward_args=additional_forward_args,
--> 234                 target=target,
    235             )
    236             return _format_attributions(is_inputs_tuple, attributions), delta

/usr/local/lib/python3.6/dist-packages/captum/attr/_utils/attribution.py in compute_convergence_delta(self, attributions, start_point, end_point, target, additional_forward_args)
    232         row_sums = [_sum_rows(attribution) for attribution in attributions]
    233         attr_sum = torch.tensor([sum(row_sum) for row_sum in zip(*row_sums)])
--> 234         return attr_sum - (end_point - start_point)
    235 
    236 

RuntimeError: expected device cpu but got device cuda:0

I am not sure, but I suppose the problem is that torch.tensor being created without any device argument. Can I work around this issue?

In this Colab Notebook you can reproduce the error.

Returning only the gradients/"multipliers"

Hi all,

Just wanted to put this particular use-case on your radar. Sometimes we find that it is useful to get access to just the gradients ("multipliers"), before they are multiplied by the difference-from-reference to get the final attribution. Specifically, we use the multipliers to estimate how the network might have responded had it seen slightly different inputs. We refer to these estimates as "hypothetical contribution scores". If you are curious how these hypothetical contributions look, here's a notebook (on a fork of the DeepSHAP repository) where I compute hypothetical contributions in the context of genomic data: https://github.com/AvantiShri/shap/blob/0b0350ba3a42af275f6e99ca2e3c5877d7d94f8a/notebooks/deep_explainer/PyTorch%20Deep%20Explainer%20DeepSEA%20example.ipynb

You've all done an awesome job with this repository, and I will definitely point it to the pytorch users in my lab once the release is formally announced. I totally understand if the ability to return just the multipliers is not something that you are likely to incorporate in the main release; I'm sure we can easily fork the repository and add that feature in for our lab's purposes.

Thanks again!
Av

Internal Server Error

I am running 'captum' on OS X 10.11.6 (also Ubuntu 16.04LTS).
The example 'python -m captum.insights.example' gets an Internal Server Error when I try to connect to http://localhost:51283/ with Safari.

Any ideas?

============================= test session starts ==============================
platform darwin -- Python 3.6.7, pytest-5.0.1, py-1.8.0, pluggy-0.13.0
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/davidlaxer/captum/.hypothesis/examples')
rootdir: /Users/davidlaxer/captum
plugins: hypothesis-3.88.3
collected 212 items                                                            

tests/attr/test_approximation_methods.py ....                            [  1%]
tests/attr/test_common.py ........                                       [  5%]
tests/attr/test_data_parallel.py ssssssssssssssss                        [ 13%]
tests/attr/test_deeplift_basic.py ......                                 [ 16%]
tests/attr/test_deeplift_classification.py .....F..                      [ 19%]
tests/attr/test_gradient.py ........                                     [ 23%]
tests/attr/test_gradient_shap.py ...                                     [ 25%]
tests/attr/test_input_x_gradient.py .........                            [ 29%]
tests/attr/test_integrated_gradients_basic.py ........................   [ 40%]
tests/attr/test_integrated_gradients_classification.py ........          [ 44%]
tests/attr/test_internal_influence.py ..........                         [ 49%]
tests/attr/test_layer_activation.py ......                               [ 51%]
tests/attr/test_layer_conductance.py .............                       [ 58%]
tests/attr/test_layer_gradient_x_activation.py ......                    [ 60%]
tests/attr/test_neuron_conductance.py .........                          [ 65%]
tests/attr/test_neuron_gradient.py ........                              [ 68%]
tests/attr/test_neuron_integrated_gradients.py ........                  [ 72%]
tests/attr/test_saliency.py .........                                    [ 76%]
tests/attr/test_targets.py ...................................           [ 93%]
tests/attr/test_utils_batching.py .........                              [ 97%]
tests/attr/models/test_base.py .                                         [ 98%]
tests/attr/models/test_pytext.py ss                                      [ 99%]
tests/insights/test_contribution.py ..                                   [100%]

=================================== FAILURES ===================================
_____________ Test.test_softmax_classification_batch_zero_baseline _____________

self = <tests.attr.test_deeplift_classification.Test testMethod=test_softmax_classification_batch_zero_baseline>

    def test_softmax_classification_batch_zero_baseline(self):
        num_in = 40
        input = torch.arange(0.0, num_in * 3.0, requires_grad=True).reshape(3, num_in)
        baselines = 0 * input
    
        model = SoftmaxDeepLiftModel(num_in, 20, 10)
        dl = DeepLift(model)
    
>       self.softmax_classification(model, dl, input, baselines)

tests/attr/test_deeplift_classification.py:54: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/attr/test_deeplift_classification.py:117: in softmax_classification
    self._assert_attributions(model, attributions, input, baselines, delta, target2)
tests/attr/test_deeplift_classification.py:129: in _assert_attributions
    "some samples".format(delta),
E   AssertionError: False is not true : The sum of attribution values tensor([0.0008, 0.0023, 0.0039]) is not nearly equal to the difference between the endpoint for some samples
=============================== warnings summary ===============================
/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/lib/pretty.py:91
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/lib/pretty.py:91: DeprecationWarning: IPython.utils.signatures backport for Python 2 is deprecated in IPython 6, which only supports Python 3
    from IPython.utils.signatures import signature

/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/utils/module_paths.py:28
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/utils/module_paths.py:28: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift_batch
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift_batch_4D_input
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift_multi_ref
tests/attr/test_deeplift_basic.py::Test::test_relu_linear_deeplift
tests/attr/test_deeplift_basic.py::Test::test_tanh_deeplift
tests/attr/test_deeplift_classification.py::Test::test_convnet_with_maxpool1d
tests/attr/test_deeplift_classification.py::Test::test_convnet_with_maxpool2d
tests/attr/test_deeplift_classification.py::Test::test_convnet_with_maxpool3d
tests/attr/test_deeplift_classification.py::Test::test_sigmoid_classification
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_batch_multi_baseline
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_batch_zero_baseline
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_multi_baseline
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_zero_baseline
tests/attr/test_targets.py::Test::test_multi_target_deep_lift
tests/attr/test_targets.py::Test::test_multi_target_deep_lift_shap
tests/attr/test_targets.py::Test::test_simple_target_deep_lift
tests/attr/test_targets.py::Test::test_simple_target_deep_lift_shap
tests/attr/test_targets.py::Test::test_simple_target_deep_lift_shap_single_tensor
tests/attr/test_targets.py::Test::test_simple_target_deep_lift_shap_tensor
  /Users/davidlaxer/captum/captum/attr/_core/deep_lift.py:327: UserWarning: Setting forward, backward hooks and attributes on non-linear
                 activations. The hooks and attributes will be removed
              after the attribution is finished
    after the attribution is finished"""

tests/attr/test_gradient.py::Test::test_apply_gradient_reqs
tests/attr/test_layer_conductance.py::Test::test_matching_conv_with_baseline_conductance
tests/attr/test_layer_conductance.py::Test::test_matching_pool1_conductance
tests/attr/test_layer_conductance.py::Test::test_matching_pool2_conductance
tests/attr/test_neuron_gradient.py::Test::test_matching_intermediate_gradient
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_input_linear1
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_input_relu2
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear1
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear2
tests/attr/test_targets.py::Test::test_multi_target_deep_lift
tests/attr/test_targets.py::Test::test_multi_target_input_x_gradient
tests/attr/test_targets.py::Test::test_multi_target_saliency
tests/attr/test_targets.py::Test::test_simple_target_deep_lift
tests/attr/test_targets.py::Test::test_simple_target_input_x_gradient
tests/attr/test_targets.py::Test::test_simple_target_saliency
tests/attr/test_targets.py::Test::test_simple_target_saliency_tensor
  /Users/davidlaxer/captum/captum/attr/_utils/gradient.py:27: UserWarning: Input Tensor 0 did not already require gradients, required_grads has been set automatically.
    "required_grads has been set automatically." % index

tests/attr/test_gradient.py::Test::test_apply_gradient_reqs
  /Users/davidlaxer/captum/captum/attr/_utils/gradient.py:34: UserWarning: Input Tensor 1 had a non-zero gradient tensor, which is being reset to 0.
    "which is being reset to 0." % index

tests/attr/test_gradient.py::Test::test_apply_gradient_reqs
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear2
  /Users/davidlaxer/captum/captum/attr/_utils/gradient.py:27: UserWarning: Input Tensor 2 did not already require gradients, required_grads has been set automatically.
    "required_grads has been set automatically." % index

tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear1
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear2
  /Users/davidlaxer/captum/captum/attr/_utils/gradient.py:27: UserWarning: Input Tensor 1 did not already require gradients, required_grads has been set automatically.
    "required_grads has been set automatically." % index

tests/attr/models/test_base.py::Test::test_interpretable_embedding_base
  /Users/davidlaxer/captum/captum/attr/_models/base.py:168: UserWarning: In order to make embedding layers more interpretable they will
          be replaced with an interpretable embedding layer which wraps the
          original embedding layer and takes word embedding vectors as inputs of
          the forward function. This allows to generate baselines for word
          embeddings and compute attributions for each embedding dimension.
          The original embedding layer must be set
          back by calling `remove_interpretable_embedding_layer` function
          after model interpretation is finished.
    after model interpretation is finished."""

tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/matplotlib/colors.py:101: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
    ret = np.asscalar(ex)

tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/matplotlib/image.py:424: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
    a_min = np.asscalar(a_min.astype(scaled_dtype))

tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/matplotlib/image.py:425: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
    a_max = np.asscalar(a_max.astype(scaled_dtype))

-- Docs: https://docs.pytest.org/en/latest/warnings.html
=========================== short test summary info ============================
SKIPPED [1] tests/attr/test_data_parallel.py:116: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:187: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:254: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:38: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:68: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:98: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:137: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:168: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:219: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:24: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:56: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:84: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:123: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:154: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:200: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:235: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/models/test_pytext.py:81: Skip the test since PyText is not installed
SKIPPED [1] tests/attr/models/test_pytext.py:68: Skip the test since PyText is not installed
FAILED tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_batch_zero_baseline
======= 1 failed, 193 passed, 18 skipped, 60 warnings in 1188.87 seconds =======

$ python -m captum.insights.example

Fetch data and view Captum Insights at http://localhost:51283/

<IPython.lib.display.IFrame object at 0x1211f1c18>

(screenshot)

Could I use captum for object localisation?

Hello,
Can I use this library for object localisation tasks? Do you think you could prepare a very easy tutorial for this? I bet this would be very helpful for many people, since labelling images with bounding boxes or polygons is really time consuming, as you know.

On custom architecture

@orionr @zpao @asmeurer @asuhan @kostmo great work by the team, this is what I was looking for. I have a few queries:

  1. Can Captum be used for architectures like object detection and semantic segmentation?
  2. Would I be able to see the intermediate learnings during training?

Not able to Load Vectors

Hey Guys,

While trying to run this tutorial, I am facing issues in loading the GloVe vectors. After loading the vectors, it shows me a vocabulary size equal to 2, but ideally it should be more than 10000. Can anyone help me out with this?

(screenshot)

My pytorch version is 1.3.1
Torchtext version is 0.5.1

Help me in this. Thanks !

#fix_error

Captum for regression problem

Hi all,

I am wondering if there are examples I could learn from in order to use Captum for a regression problem, as well as with volume data. My problem setting is feeding volume data of size WxHxD (64x64x64) to a 3D convnet which has only one neuron in the top layer that outputs a real number. Thanks.
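A minimal sketch of what this could look like, using a tiny stand-in for the 3D convnet (the architecture and shapes are illustrative only); with a single real-valued output per example, no target index is needed:

import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Tiny stand-in for a 3D convnet regressor with one output neuron.
model = nn.Sequential(
    nn.Conv3d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(4, 1),
)
model.eval()

volume = torch.rand(1, 1, 16, 16, 16)        # (batch, channel, D, H, W)
ig = IntegratedGradients(model)
# The attribution has the same shape as the input volume.
attributions = ig.attribute(volume, baselines=torch.zeros_like(volume))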

ImportError: cannot import name 'LayerIntegratedGradients'

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input> in <module>
     13
     14 from captum.attr import visualization as viz
---> 15 from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
     16 from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

ImportError: cannot import name 'LayerIntegratedGradients'

how to get captum insights working

It gives an error after visualizer.render()
(screenshot)

and how do I get this saved image?

# show a screenshot if using notebook non-interactively
from IPython.display import Image
Image(filename='img/captum_insights.png')

Captum Insights not working in Google Colab

When trying to run the Getting started with Captum Insights tutorial in a Google Colab notebook, I stumbled upon the following issue: When calling visualizer.render(debug=False), the result looks like in the screenshot below.

(screenshot)

The reason for this behavior is that Captum's render() method does not redirect requests as e.g. shown in TensorBoard's _display_colab() method. While the current implementation works fine with regular IPython notebooks, Colab requires some additional tweaks as described in the TensorBoard code.

Do you have any plans to support Colab or is this even a priority? If no one is already working on this, I could make a PR adding some code similar to TensorBoard as a proof of concept.

Building failure for captum wheel package

When I built a python wheel package for captum with the following command:

BUILD_INSIGHTS=1 python setup.py bdist_wheel --python-tag py3

I got an error message:

error: can't copy 'captum/insights/frontend/widget/static/extension.js': doesn't exist or not a regular file

I found some errors on the setup.py file, where the paths for extension.js, index.js and index.js.map were not correct.

One solution is the following:

diff --git a/setup.py b/setup.py
index 87f5068..ee0a379 100755
--- a/setup.py
+++ b/setup.py
@@ -150,9 +150,9 @@ if __name__ == "__main__":
             (
                 "share/jupyter/nbextensions/jupyter-captum-insights",
                 [
-                    "captum/insights/frontend/widget/static/extension.js",
-                    "captum/insights/frontend/widget/static/index.js",
-                    "captum/insights/frontend/widget/static/index.js.map",
+                    "captum/insights/widget/static/extension.js",
+                    "captum/insights/widget/static/index.js",
+                    "captum/insights/widget/static/index.js.map",
                 ],
             ),
             (

Captum for Bert Sentence Classification

Hi there,

I tried to apply the Captum Q&A tutorial to a BERT sentence classification task, but I am facing difficulties adapting the baselines/references part of the code for classification and the new HuggingFace tokenizer.

Just want to check if someone is working on the same topic, so we can share experiences.

how would it be possible to calculate the Integrated gradients for model which has embedding

Hi everyone,

I am applying the integrated gradients method to my dataset, which has categorical and numerical data; I convert the categorical data into embeddings and concatenate them with the numerical features. However, the integrated gradients output for all the categorical values is zero, while it is calculated correctly for the numerical ones.
I have tried to do it with LayerIntegratedGradients, but since I do not have the developer version of Captum installed, it failed.
Any suggestions?
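A minimal sketch of the LayerIntegratedGradients approach (it does require a Captum release that ships LayerIntegratedGradients; the tabular model below is a toy stand-in). Integer category ids carry no gradient, so the attribution is taken at the embedding layer's output instead:

import torch
import torch.nn as nn
from captum.attr import LayerIntegratedGradients

class TabularModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(10, 4)        # categorical branch
        self.fc = nn.Linear(4 + 3, 2)         # 3 numerical features, 2 classes
    def forward(self, cat_ids, num_feats):
        x = torch.cat([self.emb(cat_ids), num_feats], dim=1)
        return self.fc(x)

model = TabularModel()
cat_ids = torch.tensor([1, 7])
num_feats = torch.rand(2, 3)

lig = LayerIntegratedGradients(model, model.emb)
# Attribution of the embedding output for the categorical input; the numerical
# features are passed through unchanged via additional_forward_args.
emb_attr = lig.attribute(cat_ids, additional_forward_args=(num_feats,), target=0)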

Issue with resnet18 model

I tried to use any of the saliency methods and I get this error:
AttributeError: 'AvgPool2d' object has no attribute 'divisor_override'

I do not understand why that happens.

Toy Example breaks with CUDA on compute_convergence_delta for Integrated Gradients

For the toy example with cuda

model = ToyModel()
model = model.cuda()
model.eval()

input = torch.rand(2, 3).cuda()
baseline = torch.zeros(2, 3).cuda()

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(input, baseline, target=0, return_convergence_delta=True)

fails with the error

~/anaconda3/envs/heterokaryon/lib/python3.7/site-packages/captum/attr/_utils/attribution.py in compute_convergence_delta(self, attributions, start_point, end_point, target, additional_forward_args)
    232         row_sums = [_sum_rows(attribution) for attribution in attributions]
    233         attr_sum = torch.tensor([sum(row_sum) for row_sum in zip(*row_sums)])
--> 234         return attr_sum - (end_point - start_point)
    235 
    236 

RuntimeError: expected device cpu and dtype Float but got device cuda:0 and dtype Float

presumably since attr_sum is not on GPU. Turning return_convergence_delta to False results in no error.

Similar issues may arise in other places, though I haven't checked.
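Until this is fixed, a minimal workaround sketch that continues the snippet above is to skip the built-in delta and, if the completeness check is still needed, compute it by hand on the device the tensors already live on:

attributions = ig.attribute(input, baseline, target=0)
with torch.no_grad():
    out_diff = model(input)[:, 0] - model(baseline)[:, 0]
# Sum attributions over all non-batch dimensions and compare with the output difference.
manual_delta = attributions.sum(dim=tuple(range(1, attributions.dim()))) - out_diff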

What is the desired output for _select_targets in common.py?

Hi again,

I have a question about the _select_targets function, specifically when used in the DeepLift implementation. I figured out that the output passed into this function is based on the output from the last layer of the architecture. For my architecture, my last layer is a log_softmax. Sorry if it is a silly question, but should I return the predicted class (there are only 2 classes), the loss value, or the class probability of the target class as the output?

Attached below is the code snippet for _select_targets for your reference.

def _select_targets(output, target):
    output = output[0]
    num_examples = output.shape[0]
    dims = len(output.shape)

    if target is None:
        return output
    elif isinstance(target, int) or isinstance(target, tuple):
        return _verify_select_column(output, target)
    elif isinstance(target, torch.Tensor):
        if torch.numel(target) == 1 and isinstance(target.item(), int):
            return _verify_select_column(output, target.item())
        elif len(target.shape) == 1 and torch.numel(target) == num_examples:
            assert dims == 2, "Output must be 2D to select tensor of targets."
            return torch.gather(output, 1, target.reshape(len(output), 1))
        else:
            raise AssertionError(
                "Tensor target dimension %r is not valid." % (target.shape,)
            )
    elif isinstance(target, list):
        assert len(target) == num_examples, "Target list length does not match output!"
        if type(target[0]) is int:
            assert dims == 2, "Output must be 2D to select tensor of targets."
            return torch.gather(output, 1, torch.tensor(target).reshape(len(output), 1))
        elif type(target[0]) is tuple:
            return torch.stack(
                [output[(i,) + targ_elem] for i, targ_elem in enumerate(target)]
            )
        else:
            raise AssertionError("Target element type in list is not valid.")
    else:
        raise AssertionError("Target type %r is not valid." % target)

is this a typo?

In the README,

Next we will use IntegratedGradients algorithms to assign attribution scores to each input feature with respect to the second target output.

and then target=0 is set; should it be the first target output?

Captum Insights not working in SageMaker

When I try to run Captum Insights from a SageMaker notebook terminal on port 6006 by browsing to <sagemaker_notebook_address>/proxy/6006/, the tab name shows "Captum Insights", but the web page is blank. The same method works fine on my local system, or fine with tensorboard/flask apps through SageMaker. It seems to be a problem with Captum+SageMaker specifically.

(screenshot)

Alternatively, when attempting to run tutorials/CIFAR_TorchVision_Captum_Insights.ipynb I get this error from within a notebook:

(screenshot)

(I get the same error with visualizer.render(), just with less details)


Details:

I upgraded my SageMaker pytorch_p36 conda environment to torch==1.3.0. I installed captum from source with git clone https://github.com/pytorch/captum.git and then installed Insights with:

conda install -c conda-forge yarn
BUILD_INSIGHTS=1 python setup.py develop

Then ran the example with python captum/insights/example.py

And tried to access via <sagemaker_notebook_address>/proxy/6006/ (the same way I access a running tensorboard server)

I also tried it with/without modifying line 66 in insights/server.py from tcp.bind(("", 0)) to tcp.bind(("", 6006)) in order to use port 6006 (since this port seemed to work fine for running a tensorboard server).

Integrated gradients using with pack_padded_sequence returns error

Hi all,

I am using the integrated gradients (IG) method from the Captum package. I apply an LSTM to varying-length sequences and then try to get IG attributions from the trained model using the following line of code:

attr, delta = ig.attribute((data, seq_lengths), target=1, return_convergence_delta=True)

but I am getting the following error:

RuntimeError: lengths array must be sorted in decreasing order when enforce_sorted is True. You can pass enforce_sorted=False to pack_padded_sequence and/or pack_sequence to sidestep this requirement if you do not need ONNX exportability.

However, I have sorted the lengths of the arrays in each batch in decreasing order.
Please note that if I use IG without pack_padded_sequence, it works perfectly.

Regarding the previous error, I set enforce_sorted=False in pack_padded_sequence, but then I get another error:

RuntimeError: Length of all samples has to be greater than 0, but found an element in 'lengths' that is <= 0

Here are the lengths of all the samples; none of them is less than or equal to zero:

tensor([23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20,
14, 10])

any help would be much appreciated.
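One hedged guess at the cause: IG scales every tensor in inputs between the baseline and the input, which turns the integer lengths into invalid values for pack_padded_sequence. A minimal sketch of a possible workaround, assuming the model's forward signature is forward(data, seq_lengths), is to keep the lengths out of inputs entirely:

# Only the float input is attributed; the lengths are passed through unchanged.
attr, delta = ig.attribute(
    data,
    additional_forward_args=(seq_lengths,),
    target=1,
    return_convergence_delta=True,
)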

Import error for Occlusion

Getting an import error for Occlusion when running the Interpreting vision for ResNet tutorial.

Error details:

ImportError                               Traceback (most recent call last)
<ipython-input> in <module>
     15 from captum.attr import IntegratedGradients
     16 from captum.attr import GradientShap
---> 17 from captum.attr import Occlusion
     18 from captum.attr import NoiseTunnel
     19 from captum.attr import visualization as viz

ImportError: cannot import name 'Occlusion' from 'captum.attr' (/home/ubuntu/opt/anaconda3/envs/pytorch/lib/python3.7/site-packages/captum/attr/__init__.py)

Cannot install the latest version

When I tried to install the latest version, I got the errors below.


    error: can't copy 'captum/insights/frontend/widget/static/extension.js': doesn't exist or not a regular file
    ----------------------------------------

ERROR: Command errored out with exit status 1: /root//.pyenv/versions/3.7.4/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-xijz5fxd/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-xijz5fxd/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-lcp60r6o/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

It seems to be caused by the wrong js path, captum/insights/frontend/widget/static/extension.js.

Captum for BERT

Hi,
Thanks for the great work. The LSTM tutorial looks very nice.
Are there any suggestions on how to use Captum for Transformer-based / BERT-like pre-trained contextualized word embeddings? If I want to see the attribution of each token in the word embedding layer, would I also need the FFN layer used for fine-tuning on downstream tasks in order to get the gradients? The current code is implemented with torch/text; I would really appreciate it if you could give some hints on how to integrate it with BERT models (e.g. huggingface/transformers).

Thank you.

BibTeX for citation

Hi folks,
Is there a proper .bib format available for Captum for the purposes of citation in research papers?

Thanks!

Undesirable behavior of LayerActivation in networks with inplace ReLUs

Hi,
I was trying to use captum.attr._core.layer_activation.LayerActivation to get the activation of the first convolutional layer in a simple model. Here is my code:

import numpy as np
import torch
import torch.nn as nn
from captum.attr._core.layer_activation import LayerActivation

torch.manual_seed(23)
np.random.seed(23)
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                      nn.ReLU(inplace=True),
                      nn.Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                      nn.ReLU(inplace=True))

layer_act = LayerActivation(model, model[0])
input = torch.randn(1, 3, 5, 5)
mylayer = model[0]
# compare the layer output computed directly with the activation returned by Captum
print(torch.norm(mylayer(input) - layer_act.attribute(input), p=2))

In effect, I compute the activation in two different ways and compare them afterwards. I obviously expected a value close to zero to be printed as the output; however, this is what I got:

tensor(3.4646, grad_fn=<NormBackward0>)

I hypothesize that the inplace ReLU layer following the convolutional layer modifies its output in place, since there were many zeros in the activation computed by Captum (i.e. layer_act.attribute(input)). Indeed, when I changed the architecture of the network to the following (making the first ReLU non-inplace):

model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                      nn.ReLU(),
                      nn.Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                      nn.ReLU(inplace=True))

then the outputs matched.

System information

  • Python 3.7.0
  • torch 1.3.0
  • Captum 0.1.0

rand_img_dist defined but not used in the official tutorial

Hi,

In the tutorial Model Interpretation for Pretrained ResNet Model, in the occlusion experiment, rand_img_dist = torch.cat([input * 0, input * 1]) is defined but never used; you may want to remove it.

occlusion = Occlusion(model)

rand_img_dist = torch.cat([input * 0, input * 1])
attributions_occ = occlusion.attribute(input,
                                       strides=(3, 50, 50),
                                       target=pred_label_idx,
                                       sliding_window_shapes=(3, 60, 60),
                                       baselines=0)

_ = viz.visualize_image_attr_multiple(np.transpose(attributions_occ.squeeze().cpu().detach().numpy(), (1,2,0)),
                                      np.transpose(transformed_img.squeeze().cpu().detach().numpy(), (1,2,0)),
                                      ["original_image", "heat_map"],
                                      ["all", "positive"],
                                      show_colorbar=True,
                                      outlier_perc=2,
                                     )

CUDA OOM Error

Hi,

I am currently integrating Captum into my deep learning toolkit; thanks for providing this library.

When I try to run IntegratedGradients on a standard densenet201 model that sits on a CUDA device (11 GB VRAM), I am getting an out-of-memory error even for a single input image.

Just a quick check: Is this normal behaviour?
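For context, the call looks roughly like the sketch below. My understanding (which may be wrong) is that IG expands the input to n_steps interpolated copies, so memory grows with n_steps unless internal_batch_size (if the installed version's attribute() supports it) is used to chunk that expansion; model, img, and target_idx are placeholders here:

from captum.attr import IntegratedGradients

ig = IntegratedGradients(model)
attributions = ig.attribute(
    img,                       # single image on the CUDA device, e.g. shape (1, 3, 224, 224)
    target=target_idx,
    n_steps=50,
    internal_batch_size=4,     # evaluate 4 interpolation steps at a time instead of all 50
)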

How to interpret BERT for SequenceClassification?

Hi @NarineK and captum team, thanks for all the great work on interpretability with PyTorch.

As others here (see #150, #249), I am trying to interpret a BERT classifier finetuned on a binary classification task, using the transformers library from HuggingFace.
Indeed, I have

model = BertForSequenceClassification.from_pretrained('finetuned-bert-base-cased')

I have not had much success at this so far, starting from the SQuAD example https://github.com/pytorch/captum/blob/master/tutorials/Bert_SQUAD_Interpret.ipynb

So far, I left almost everything else untouched and redefined

def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, add_special_tokens=False)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

which I call with input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id) and a custom forward method that reads

def custom_forward(inputs, token_type_ids=None, position_ids=None, attention_mask=None, position=0):
    outputs = predict(inputs, token_type_ids=token_type_ids, position_ids=position_ids, attention_mask=attention_mask)
    preds = outputs[0]
    # preds looks like:
    # tensor([[-1.9723,  2.2183]], grad_fn=<AddmmBackward>)
    return torch.tensor([torch.softmax(preds, dim=1)[0][1]], requires_grad=True)

which I use in lig = LayerIntegratedGradients(custom_forward, model.bert.embeddings).

When calling lig.attribute (as in the tutorial), I get

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Can you help me debug the above? I guess I am messing something up with the custom_forward method, and maybe also construct_input_ref_pair... or more.

I am happy to post a working solution once done with this!
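One thing I suspect (a sketch of a possible fix, not a confirmed answer): wrapping the output in torch.tensor(..., requires_grad=True) creates a brand-new leaf tensor that is detached from the autograd graph, which would explain the "appears to not have been used in the graph" error. Returning the sliced softmax directly keeps gradients flowing (predict is the helper defined above; treating index 1 as the positive class is my assumption):

def custom_forward(inputs, token_type_ids=None, position_ids=None, attention_mask=None):
    outputs = predict(inputs, token_type_ids=token_type_ids,
                      position_ids=position_ids, attention_mask=attention_mask)
    preds = outputs[0]                        # logits, shape (batch, 2)
    return torch.softmax(preds, dim=1)[:, 1]  # probability of class 1, still part of the graph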

Plan for perturbation-based methods

Hello,
Kudos for the great work. I believe this has great potential.
I wonder what is in your roadmap, especially regarding perturbation-based attribution methods (Occlusion, LIME/KernelSHAP, Shapley Value sampling, etc.).

Are these planned at all? While they are orders of magnitude slower, these methods have the advantage that they can be applied to any black-box model (i.e. any network architecture is supported out of the box, with no need to instrument layers or implement custom modules). Implementing them in Captum should also be easier. Moreover, Shapley value attributions have unique theoretical properties that might be important when speed is not critical.

While it makes sense to focus on gradient-based methods first, maybe the structure of the library should be such that these methods can be easily added in the future.

Scripting/tracing Captum classes

Hello, I was experimenting with Captum and I was wondering if there was any way to trace/script an attribution model in order to just obtain the final heatmap as output of the serialized file.

I did not find any reference in the documentation or in the code, and I did not manage to integrate it myself by creating intermediate classes, for example by wrapping the Saliency class in a torch.nn.Module.

Is there something I am missing / is it in the future plans?
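For reference, the wrapping attempt described above looks roughly like the sketch below; whether Saliency's internals (which call autograd under the hood) can actually be traced or scripted is exactly the open question:

import torch
from captum.attr import Saliency

class SaliencyWrapper(torch.nn.Module):
    """Expose Saliency.attribute through a plain forward() so torch.jit.trace sees a Module."""

    def __init__(self, model):
        super().__init__()
        self.saliency = Saliency(model)

    def forward(self, x):
        return self.saliency.attribute(x)   # heatmap with the same shape as x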

Captum Insights build fails on Linux Ubuntu 18.04

Cannot build and launch Captum Insights on Linux Ubuntu 18.04 (inside a VirtualBox VM):

(captum) elena@elena-VirtualBox:~/eStep/XAI/Software/captum$ conda install -c conda-forge yarn
Collecting package metadata (repodata.json): done
Solving environment: done

All requested packages already installed.

(captum) elena@elena-VirtualBox:~/eStep/XAI/Software/captum$ BUILD_INSIGHTS=1 python setup.py develop
-- Building version 0.2.0
-- Building Captum Insights
Running: ./scripts/build_insights.sh
~/eStep/XAI/Software/captum/captum/insights/frontend ~/eStep/XAI/Software/captum

Install Dependencies

yarn install v1.22.0
[1/4] Resolving packages...
[2/4] Fetching packages...
info [email protected]: The platform "linux" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
info [email protected]: The platform "linux" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
warning " > @babel/[email protected]" has unmet peer dependency "@babel/core@^7.0.0-0".
warning "@babel/plugin-proposal-class-properties > @babel/[email protected]" has unmet peer dependency "@babel/core@^7.0.0".
warning " > [email protected]" has unmet peer dependency "@babel/core@^7.0.0".
warning " > [email protected]" has unmet peer dependency "webpack@>=2".
warning "react-scripts > @typescript-eslint/eslint-plugin > [email protected]" has unmet peer dependency "typescript@>=2.8.0 || >= 3.2.0-dev || >= 3.3.0-dev || >= 3.4.0-dev || >= 3.5.0-dev || >= 3.6.0-dev || >= 3.6.0-beta || >= 3.7.0-dev || >= 3.7.0-beta".
warning " > [email protected]" has unmet peer dependency "prop-types@^15.0.0".
warning " > [email protected]" has unmet peer dependency "[email protected]".
error An unexpected error occurred: "EPERM: operation not permitted, symlink '../../../parser/bin/babel-parser.js' -> '/home/elena/eStep/XAI/Software/captum/captum/insights/frontend/node_modules/@babel/core/node_modules/.bin/parser'".
info If you think this is a bug, please open a bug report with the information provided in "/home/elena/eStep/XAI/Software/captum/captum/insights/frontend/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
Traceback (most recent call last):
  File "setup.py", line 105, in <module>
    build_insights()
  File "setup.py", line 88, in build_insights
    subprocess.check_call(command)
  File "/home/elena/anaconda3/envs/captum/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command './scripts/build_insights.sh' returned non-zero exit status 1.
(captum) elena@elena-VirtualBox:~/eStep/XAI/Software/captum$

Request: example with multilabel attribution

The provided vision examples and documentation are excellent for single-class classification, but I am struggling to implement a multi-label use case.

For my use case, I use a single-channel image of a cell nucleus as input. The target is a tensor that describes whether or not the cell was positive for each of 22 different protein markers, e.g. tensor([0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=torch.float64)
...that is, each cell can be positive for multiple markers, not only one. This is a simple multi-label classification task, where my model is the boilerplate torchvision.models.resnet18 with a custom final layer that accommodates the desired output.

I use the CIFAR vision example as a starting point as follows:
[screenshot of the adapted CIFAR example code]

But I get AssertionError: Tensor target dimension torch.Size([22]) is not valid. I see from the docstring for saliency.attribute that targets for outputs with more than two dimensions should be passed as tuples, but when I pass tuple(labels[ind]) instead, I get AssertionError: Cannot choose target column with output shape torch.Size([1, 22]).
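For now, the only fallback I can see (a sketch; model, image, and the 22-marker count come from the setup above) is to attribute one marker index at a time, since target appears to expect a scalar class index per call for a (batch, num_labels) output:

from captum.attr import Saliency

saliency = Saliency(model)
attributions = {}
for marker_idx in range(22):                 # one attribution map per protein marker
    attributions[marker_idx] = saliency.attribute(image, target=marker_idx)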

Ideally, I'd like to set up an AttributionVisualizer that looks like the following mock-up:

[mock-up screenshot of the desired AttributionVisualizer layout]

...where I can click each element of the prediction (e.g. CK19) and see the corresponding attribution image for that marker.

Any chance that a multi-label classification example like this could be supplied?

Much thanks!

Documentation on `baseline` argument in DeepLiftShap

Hi all,

Thank you so much for the invitation to Captum. Very grateful to all of you for putting this together! I had a quick question regarding the documentation. Currently, the arguments description for DeepLiftShap says "The first dimension in baseline tensors defines the distribution from which we randomly draw samples". However, when I look at the code, it seems as though all the baselines are used for all the inputs (i.e. I'm not seeing any code that I would associate with sampling). Is my understanding correct? I actually prefer the deterministic behavior, because in my lab we typically supply multiple baselines per input and we want all the baselines to be used.
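For concreteness, the usage pattern in question is sketched below (model and inputs are placeholders); the first dimension of baselines holds the multiple references per input that we would like to see used deterministically rather than sampled:

import torch
from captum.attr import DeepLiftShap

dl_shap = DeepLiftShap(model)
baselines = torch.zeros(20, *inputs.shape[1:])   # 20 reference samples sharing the inputs' trailing dims
attr = dl_shap.attribute(inputs, baselines=baselines, target=0)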

Thanks,
Avanti
