
deel-ai / influenciae

👋 Influenciae is a TensorFlow Toolbox for Influence Functions

Home Page: https://deel-ai.github.io/influenciae

License: Other

Makefile 0.18% Python 99.82%
influence-functions explainable-ai explainability outlier-detection misclassification fairness-ai

influenciae's People

Contributors

agustin-picard, dv-ai, fel-thomas, lucashervier



influenciae's Issues

Dataset to evaluate usage

It's not clear to me how the API is meant to be used with "dataset_to_evaluate != None".

When dataset_to_evaluate is different from None, the user wants to retrieve the training samples that are the most influential for a particular set of samples to evaluate.

In the current implementation, the dataset to evaluate must have the same size as the training dataset. In practice, however, the user wants to find the closest training samples for just a few samples.

For example, to evaluate 10 samples, the user has to call the API 10 times, with 10 datasets each containing "training size" copies of the same sample. After that, the user has to sort the influence scores to find, say, the 5 most influential elements, and then parse the training dataset to retrieve them.

I find this particularly complex and suboptimal, since the API ends up being called 10 times; the pattern is sketched below.
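A sketch of this workaround pattern (the method name compute_influence_values and the variables train_size and samples_to_evaluate are hypothetical placeholders, not the actual API):

# hypothetical sketch: one API call per evaluated sample
top5_per_sample = []
for sample, label in samples_to_evaluate:  # e.g. 10 samples
    # dataset_to_evaluate must match the training size, so repeat the
    # single sample train_size times
    eval_ds = tf.data.Dataset.from_tensors((sample, label)).repeat(train_size)
    scores = influence_calculator.compute_influence_values(  # hypothetical name
        train_ds.batch(1), eval_ds.batch(1))
    top5_per_sample.append(tf.argsort(tf.squeeze(scores))[-5:])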

[Bug]: - from_string constructor of "IHVPCalculator" doesn't work for "cgd" and "lissa"

Module

Common

Contact Details

No response

Current Behavior

When using the from_string interface to IHVPCalculator as expected, e.g.:

influence_calculator_cgd = FirstOrderInfluenceCalculator(influence_model, batched_ds, "cgd")

or

influence_calculator_lissa = FirstOrderInfluenceCalculator(influence_model, batched_ds, "lissa")

we get the following error:

TypeError: ConjugateGradientDescentIHVP.__init__() missing 1 required positional argument: 'train_dataset'

Note that this behavior isn't observed when using the "exact" string.

Expected Behavior

The train_dataset argument of ConjugateGradientDescentIHVP.__init__() and LissaIHVP.__init__() should be set automatically to batched_ds (which we already have to pass to FirstOrderInfluenceCalculator); a possible fix is sketched below.
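A sketch of what the fix could look like, resolving the string inside the calculator and forwarding the dataset it already receives (the helper name is hypothetical, and LissaIHVP's signature is assumed analogous to ConjugateGradientDescentIHVP's):

def _ihvp_from_string(name, model, train_dataset):
    # hypothetical helper: forwards the dataset that "cgd" and "lissa" need
    if name == "exact":
        return ExactIHVP(model, train_dataset)
    if name == "cgd":
        return ConjugateGradientDescentIHVP(model, extractor_layer=-1,
                                            train_dataset=train_dataset)
    if name == "lissa":
        return LissaIHVP(model, extractor_layer=-1, train_dataset=train_dataset)
    raise ValueError(f"Unknown IHVP calculator: {name}")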

Version

v0.1.0

Environment

- OS:
- Python version:
- Tensorflow version:
- Packages used version:

Relevant log output

No response

To Reproduce

import numpy as np
np.random.seed(42)

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()

from sklearn.model_selection import train_test_split

import json

from tensorflow.keras.utils import to_categorical
import tensorflow as tf
tf.random.set_seed(42)

from deel.influenciae.common import InfluenceModel, ConjugateGradientDescentIHVP, ExactIHVP, LissaIHVP
from deel.influenciae.influence import FirstOrderInfluenceCalculator, SecondOrderInfluenceCalculator
from deel.influenciae.trac_in import TracIn
from deel.influenciae.utils import ORDER
from deel.influenciae.boundary_based import WeightsBoundaryCalculator
from deel.influenciae.benchmark.base_benchmark import ModelsSaver

def generate_data(N, center1 = np.array([-0.5,0]), center2 = np.array([0.5,0])):
  data = np.random.uniform(low=-1.,high=1.,size=(N,2))
  y = np.array([1*((x[0]-center1[0])**2 + (x[1]-center1[1])**2 <= 0.1 or (x[0]-center2[0])**2 + (x[1]-center2[1])**2 <= 0.1) for x in data])
  return data,y

def plot_data(x,y):
  for i,(pos_x,pos_y) in enumerate(x):
    if y[i]:
      plt.scatter(pos_x,pos_y,c='red',linewidths=0.5)
    else:
      plt.scatter(pos_x,pos_y,c='blue',linewidths=0.5)


x,y = generate_data(20)

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)

train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train)).map(lambda x, y: (tf.cast(x, tf.float32), tf.cast(y, tf.float32)))
test_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test)).map(lambda x, y: (tf.cast(x, tf.float32), tf.cast(y, tf.float32)))


model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
# saver = ModelsSaver([3*i + 10 for i in range(14)],optimizer)


model.compile(loss='binary_crossentropy', optimizer="adam", metrics=['accuracy'])

model.fit(train_ds.batch(16), epochs=2, validation_data=test_ds.batch(16))


# Transform the Tensorflow model into an InfluenceModel (that implements some specific functionality for computing influences)

unreduced_loss = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.NONE)
influence_model = InfluenceModel(model, start_layer=-1, loss_function=unreduced_loss)

# Start by creating an object capable of efficiently computing inverse-hessian-vector products
# ihvp_calculator = ConjugateGradientDescentIHVP(influence_model, extractor_layer=-1,train_dataset=train_ds.batch(1))

# Create the influence calculator object
influence_calculator_exact = FirstOrderInfluenceCalculator(influence_model, train_ds.batch(1), "exact")
influence_calculator_cgd = FirstOrderInfluenceCalculator(influence_model, train_ds.batch(1), "cgd")
influence_calculator_lissa = FirstOrderInfluenceCalculator(influence_model, train_ds.batch(1), "lissa")

[Bug]: Two definitions in the `base_influence.py`

Module

Common

Contact Details

[email protected]

Current Behavior

If we take a close look at the class BaseInfluenceCalculator, we can see that the method _estimate_individual_influence_values_from_batch is defined twice: once as an @abstractmethod and once explicitly.

Expected Behavior

To have a single definition of the method _estimate_individual_influence_values_from_batch in the class BaseInfluenceCalculator.
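For context, a Python class body that binds the same method name twice silently keeps only the last binding, so one of the two definitions is dead code. A minimal illustration (not the library's actual code):

from abc import ABC, abstractmethod

class Example(ABC):
    @abstractmethod
    def compute(self, batch):
        ...

    def compute(self, batch):  # silently replaces the abstract definition
        return 0

Example()  # instantiable: the @abstractmethod contract has been discarded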

Version

v0.1.0

Environment

- OS:
- Python version:
- Tensorflow version:
- Packages used version:

Relevant log output

No response

To Reproduce

N/A

[Bug]: __step() method of WeightsBoundaryCalculator executes eagerly

Module

None

Contact Details

No response

Current Behavior

The execution is eager. This is because AutoGraph does not support name-mangled (double-underscore) methods.

Expected Behavior

Execution in graph mode.

Version

v0.1.0

Environment

- OS:
- Python version:
- Tensorflow version:
- Packages used version:

Relevant log output

WARNING:tensorflow:AutoGraph could not transform <bound method WeightsBoundaryCalculator.__step of <tensorflow.python.eager.function.TfMethodTarget object at 0x7fba8c7b8a00>> and will run it as-is.
Cause: mangled names are not yet supported.

To Reproduce

Just call WeightsBoundaryCalculator.__step.
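For reference, Python mangles any double-underscore method (here to _WeightsBoundaryCalculator__step), which AutoGraph cannot transform, hence the eager fallback. A likely fix, assuming nothing relies on the mangled name, is to rename the method with a single leading underscore:

import tensorflow as tf

class WeightsBoundaryCalculator:
    @tf.function
    def _step(self, x):  # single underscore: no name mangling, so
        return x * 2.0   # AutoGraph can compile it into a graph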

Question

Hello.

Is there any example notebook that computes influence functions for a non-image dataset?
If I use a benchmark dataset, let's say Titanic, what changes should I make in order to compute the influences?

Looking forward to your response!
Thank you
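For what it's worth, nothing in the API above is image-specific: the reproduction script earlier on this page already computes influences on 2D tabular points. A minimal sketch for a tabular dataset such as Titanic, assuming x_train and y_train are already-encoded numeric arrays:

import tensorflow as tf
from deel.influenciae.common import InfluenceModel
from deel.influenciae.influence import FirstOrderInfluenceCalculator

# x_train: (n, d) float features (e.g. age, fare, encoded class),
# y_train: binary labels -- both assumed to be prepared beforehand
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)) \
    .map(lambda x, y: (tf.cast(x, tf.float32), tf.cast(y, tf.float32)))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(train_ds.batch(32), epochs=10)

# the wrapping is the same as for images; only the dataset changes
loss = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.NONE)
influence_model = InfluenceModel(model, start_layer=-1, loss_function=loss)
calculator = FirstOrderInfluenceCalculator(influence_model, train_ds.batch(1), "exact")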

[Bug]: - Calling estimate_influence_values_group of second_order_influence_calculator with a single argument doesn't work as expected

Module

Influence

Contact Details

No response

Current Behavior

The method estimate_influence_values_group of second_order_influence_calculator, when called with a single argument, should, according to the documentation, set group_to_evaluate = group_train, as is the case in first_order_influence_calculator.

This is not the case here, and the rest of the method cannot work properly (ValueError: The dataset must be batched before performing this operation., because we test whether None is a batched dataset).

Expected Behavior

Quick fix: add

if group_to_evaluate is None:
    group_to_evaluate = group_train

at the beginning of the method.

Version

v0.1.0

Environment

No response

Relevant log output

No response

To Reproduce

N/A

[Feature Request]: - Your request

Module

Common

Contact Details

No response

Feature Request

Can we calculate IF for simple models such as logistic regression?

A minimal example

No response
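A minimal sketch of how this should already be possible, assuming the logistic regression is expressed as a single sigmoid Dense layer and train_ds is an existing (features, labels) dataset:

import tensorflow as tf
from deel.influenciae.common import InfluenceModel
from deel.influenciae.influence import FirstOrderInfluenceCalculator

# logistic regression = one dense layer with a sigmoid activation
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid')])
model.compile(loss='binary_crossentropy', optimizer='sgd')
model.fit(train_ds.batch(32), epochs=20)

loss = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.NONE)
influence_model = InfluenceModel(model, start_layer=-1, loss_function=loss)
calculator = FirstOrderInfluenceCalculator(influence_model, train_ds.batch(1), "exact")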

Version

v0.1.0

Environment

- OS:
- Python version:
- Tensorflow version:
- Packages used version:

Benchmark IHVP Calculator

As discussed by @lucashervier, it would be good to have a way to properly benchmark our inverse-hessian-vector-product calculators on different parameter dimensions -- MNIST, CIFAR or ImageNet convolutional networks, for example -- in order to have benchmarks of the amount of memory needed to build the inverse hessian in each case; a starting point is sketched below.

We still need to figure out whether we integrate that into the test suite, or whether we provide a benchmark notebook.
We could meet at some point to discuss it (@lucashervier, @Agustin-Picard). :)
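As a first data point, the dominant memory cost of the exact method is the dense p x p hessian itself, which can be tabulated directly (the parameter counts below are illustrative last-layer sizes, not measurements):

def hessian_gib(num_params, bytes_per_float=4):
    # dense p x p hessian stored in float32
    return num_params ** 2 * bytes_per_float / 1024 ** 3

for name, p in [("MNIST dense head (784*10+10)", 7_850),
                ("CIFAR conv head (512*10+10)", 5_130),
                ("ImageNet head (2048*1000+1000)", 2_049_000)]:
    print(f"{name}: {hessian_gib(p):,.2f} GiB")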

[Bug]: Magic constants in boundary tests

Module

None

Contact Details

[email protected]

Current Behavior

There are constants in the tests for the boundary_based module whose origin is not clear.

Ex: the influence_computed_expected variable in the following test.

def test_compute_influence_values():
    model = Sequential()
    model.add(Input(shape=(3,)))
    model.add(Dense(2, kernel_initializer=tf.constant_initializer([[1, 1, 1], [0, 0, 0]]),
                    bias_initializer=tf.constant_initializer([4.0, 0.0])))

    calculator = WeightsBoundaryCalculator(model)

    inputs_train = tf.zeros((1, 3))
    targets_train = tf.one_hot(tf.zeros((1,), dtype=tf.int32), 2)
    train_set = tf.data.Dataset.from_tensor_slices((inputs_train, targets_train)).batch(1)

    influence_computed_score = calculator._compute_influence_values(train_set)

    # modify the bias term to get equal logits
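    # (presumably: with zero inputs the logits equal the biases [4., 0.];
    # the smallest perturbation that equalizes them shifts each bias by 2,
    # with L2 norm sqrt(2**2 + 2**2) = 2*sqrt(2), and the influence score
    # is the negative of that distance)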
    influence_computed_expected = tf.convert_to_tensor([[-np.sqrt(2.0) * 2.0]], dtype=tf.float32)

    assert tf.reduce_max(tf.abs(influence_computed_expected - influence_computed_score)) < 1E-6

The same happens in the test for the other boundary method.

Expected Behavior

We expect tests to be clear to maximize maintainability in the long term. This sometimes means explaining how the expected value is calculated.

Version

v0.1.0

Environment

- OS:
- Python version:
- Tensorflow version:
- Packages used version:

Relevant log output

No response

To Reproduce

N/A

[Feature Request] Add parallel_iterations and experimental_use_pfor parameters in `_compute_inv_hessian` (ExactIHVP)

Is your feature request related to a problem? Please describe.

When using the first-order-influence-koh-liang branch, I have some trouble computing the exact inverse hessian on a semantic segmentation model. Here is a minimal example and the corresponding output logs:

import tensorflow as tf

from influenciae.common.model_wrappers import InfluenceModel
from influenciae.influence.inverse_hessian_vector_product import ExactIHVP

IMG_SIZE = 768
NUM_CLASSES = 20

inp = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))

# A conv block
x = tf.keras.layers.Conv2D(filters=32, kernel_size=1, strides=(1, 1))(inp)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Activation('relu')(x)
# FCN block
x = tf.keras.layers.UpSampling2D(
    size=(IMG_SIZE // x.shape[1], IMG_SIZE// x.shape[2]),
    interpolation="bilinear",
)(x)
model_output = tf.keras.layers.Conv2D(NUM_CLASSES, kernel_size=(1, 1), padding="same")(x)

# define model
model = tf.keras.Model(inputs=inp, outputs=model_output)
# freeze all layers except last one
for layer in model.layers:
    layer.trainable = False
for layer in model.layers[-1:]:
    layer.trainable = True
print(model.summary())
# define a loss for semantic segmentation fitting reduction None
class CustomLoss2(tf.keras.losses.Loss):

    def __init__(self, num_classes, ignore_label):
        super(CustomLoss2, self).__init__(name='CustomLoss2', reduction=tf.keras.losses.Reduction.NONE)

        self.num_classes = num_classes
        self.ignore_label = ignore_label

    def call(self, y_true, y_pred):

        sample_weights = tf.cast(tf.not_equal(y_true, self.ignore_label), dtype=tf.float32)
        one_hot_gt = tf.stop_gradient(tf.one_hot(y_true, self.num_classes))

        loss = tf.nn.softmax_cross_entropy_with_logits(one_hot_gt, y_pred)
        weighted_loss = tf.multiply(loss, tf.squeeze(sample_weights))

        # Compute mean loss over spatial dimension.
        num_non_zero = tf.reduce_sum(
            tf.cast(tf.not_equal(weighted_loss, 0.0), tf.float32), 1)
        loss_sum_per_sample = tf.reduce_sum(weighted_loss, 1)
        return tf.reduce_sum(tf.math.divide_no_nan(loss_sum_per_sample, num_non_zero), 1)

if __name__ == "__main__":
    random_input = tf.random.normal(shape=(4, IMG_SIZE, IMG_SIZE, 3))
    random_target = tf.random.uniform(shape=(4, IMG_SIZE, IMG_SIZE), minval=0, maxval=NUM_CLASSES-1, dtype=tf.int32)

    random_dataset = tf.data.Dataset.from_tensor_slices((random_input, random_target))

    # define InfluenceModel
    influence_model = InfluenceModel(model, target_layer=-1, loss_function=CustomLoss2(NUM_CLASSES, ignore_label=255))
    # freeze all layers except last one
    for layer in influence_model.layers:
        layer.trainable = False
    for layer in influence_model.layers[-1:]:
        layer.trainable = True
    ihvp_calculator = ExactIHVP(influence_model, random_dataset.take(1).batch(1))

Logs:

(bdd_env) (base) lucas.hervier@soda01:~/bdd100$ python issue_minimal.py 
2022-02-11 10:59:15.926556: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-02-11 10:59:17.602358: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-02-11 10:59:17.658599: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:17.659380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:21:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.86GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2022-02-11 10:59:17.659421: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:17.660154: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties: 
pciBusID: 0000:4a:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.86GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2022-02-11 10:59:17.660173: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-02-11 10:59:17.661903: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-02-11 10:59:17.661930: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-02-11 10:59:17.662492: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2022-02-11 10:59:17.662623: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2022-02-11 10:59:17.663131: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2022-02-11 10:59:17.663545: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2022-02-11 10:59:17.663616: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2022-02-11 10:59:17.663665: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:17.664449: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:17.665198: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:17.665944: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:17.666664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
2022-02-11 10:59:17.666925: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-11 10:59:17.786724: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:17.787447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:21:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.86GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2022-02-11 10:59:17.787484: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:17.788149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties: 
pciBusID: 0000:4a:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.86GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2022-02-11 10:59:17.788187: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:17.788895: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:17.789599: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:17.790300: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:17.790980: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
2022-02-11 10:59:17.791016: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-02-11 10:59:18.257978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-02-11 10:59:18.258015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 1 
2022-02-11 10:59:18.258021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N N 
2022-02-11 10:59:18.258024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 1:   N N 
2022-02-11 10:59:18.258195: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:18.258947: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:18.259658: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:18.260379: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:18.261077: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:18.261784: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22302 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:21:00.0, compute capability: 8.6)
2022-02-11 10:59:18.262082: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-11 10:59:18.262783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22312 MB memory) -> physical GPU (device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:4a:00.0, compute capability: 8.6)
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 768, 768, 3)]     0         
_________________________________________________________________
conv2d (Conv2D)              (None, 768, 768, 32)      128       
_________________________________________________________________
dropout (Dropout)            (None, 768, 768, 32)      0         
_________________________________________________________________
batch_normalization (BatchNo (None, 768, 768, 32)      128       
_________________________________________________________________
activation (Activation)      (None, 768, 768, 32)      0         
_________________________________________________________________
up_sampling2d (UpSampling2D) (None, 768, 768, 32)      0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 768, 768, 20)      660       
=================================================================
Total params: 916
Trainable params: 660
Non-trainable params: 256
_________________________________________________________________
None
2022-02-11 10:59:18.626040: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-02-11 10:59:18.644320: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 3700110000 Hz
2022-02-11 10:59:18.672693: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2022-02-11 10:59:19.060262: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8100
2022-02-11 10:59:19.548581: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-02-11 10:59:19.925576: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
WARNING:tensorflow:Using a while_loop for converting Conv2D
WARNING:tensorflow:Using a while_loop for converting Conv2DBackpropInput
WARNING:tensorflow:Using a while_loop for converting ResizeBilinearGrad
2022-02-11 11:04:57.055515: W tensorflow/core/common_runtime/bfc_allocator.cc:456] Allocator (GPU_0_bfc) ran out of memory trying to allocate 45.00GiB (rounded to 48318382080)requested by op loop_body/PartitionedCall/pfor/PartitionedCall/gradients/gradient_tape/model/conv2d_1/Conv2D/Conv2DBackpropFilter_grad/Conv2D/pfor/Tile
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
2022-02-11 11:04:57.055561: I tensorflow/core/common_runtime/bfc_allocator.cc:991] BFCAllocator dump for GPU_0_bfc
2022-02-11 11:04:57.055569: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (256):   Total Chunks: 24, Chunks in use: 24. 6.0KiB allocated for chunks. 6.0KiB in use in bin. 1.3KiB client-requested in use in bin.
2022-02-11 11:04:57.055575: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (512):   Total Chunks: 1, Chunks in use: 1. 512B allocated for chunks. 512B in use in bin. 384B client-requested in use in bin.
2022-02-11 11:04:57.055581: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (1024):  Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2022-02-11 11:04:57.055587: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (2048):  Total Chunks: 5, Chunks in use: 4. 12.5KiB allocated for chunks. 10.5KiB in use in bin. 10.5KiB client-requested in use in bin.
2022-02-11 11:04:57.055593: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (4096):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-02-11 11:04:57.055598: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (8192):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-02-11 11:04:57.055605: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (16384):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-02-11 11:04:57.055613: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (32768):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-02-11 11:04:57.055621: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (65536):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-02-11 11:04:57.055631: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (131072):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-02-11 11:04:57.055636: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (262144):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-02-11 11:04:57.055641: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (524288):        Total Chunks: 1, Chunks in use: 0. 571.0KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-02-11 11:04:57.055647: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (1048576):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-02-11 11:04:57.055656: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (2097152):       Total Chunks: 3, Chunks in use: 3. 6.75MiB allocated for chunks. 6.75MiB in use in bin. 6.06MiB client-requested in use in bin.
2022-02-11 11:04:57.055664: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (4194304):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-02-11 11:04:57.055671: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (8388608):       Total Chunks: 1, Chunks in use: 1. 9.00MiB allocated for chunks. 9.00MiB in use in bin. 9.00MiB client-requested in use in bin.
2022-02-11 11:04:57.055679: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (16777216):      Total Chunks: 1, Chunks in use: 1. 27.00MiB allocated for chunks. 27.00MiB in use in bin. 27.00MiB client-requested in use in bin.
2022-02-11 11:04:57.055687: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (33554432):      Total Chunks: 4, Chunks in use: 3. 172.68MiB allocated for chunks. 135.00MiB in use in bin. 135.00MiB client-requested in use in bin.
2022-02-11 11:04:57.055695: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (67108864):      Total Chunks: 5, Chunks in use: 5. 360.00MiB allocated for chunks. 360.00MiB in use in bin. 333.00MiB client-requested in use in bin.
2022-02-11 11:04:57.055702: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (134217728):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-02-11 11:04:57.055709: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (268435456):     Total Chunks: 1, Chunks in use: 0. 21.22GiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-02-11 11:04:57.055718: I tensorflow/core/common_runtime/bfc_allocator.cc:1014] Bin for 45.00GiB was 256.00MiB, Chunk State: 
2022-02-11 11:04:57.055732: I tensorflow/core/common_runtime/bfc_allocator.cc:1020]   Size: 21.22GiB | Requested Size: 45.00MiB | in_use: 0 | bin_num: 20, prev:   Size: 72.00MiB | Requested Size: 72.00MiB | in_use: 1 | bin_num: -1, for: loop_body/PartitionedCall/pfor/PartitionedCall/gradients/model/conv2d_1/Conv2D_grad/Conv2DBackpropFilter/pfor/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, stepid: 15496386686427765080, last_action: 4278547630, for: UNUSED, stepid: 15496386686427765080, last_action: 4278547628
2022-02-11 11:04:57.055739: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Next region of size 23385669632
2022-02-11 11:04:57.055747: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6000000 of size 1280 by op ScratchBuffer action_count 4278547493 step 0 next 1
2022-02-11 11:04:57.055753: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6000500 of size 256 by op Fill action_count 4278547503 step 0 next 5
2022-02-11 11:04:57.055758: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6000600 of size 256 by op Fill action_count 4278547504 step 0 next 2
2022-02-11 11:04:57.055764: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6000700 of size 256 by op Sub action_count 4278547495 step 0 next 3
2022-02-11 11:04:57.055768: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6000800 of size 256 by op Sub action_count 4278547496 step 0 next 4
2022-02-11 11:04:57.055774: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6000900 of size 256 by op Fill action_count 4278547505 step 0 next 8
2022-02-11 11:04:57.055778: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6000a00 of size 256 by op Fill action_count 4278547506 step 0 next 9
2022-02-11 11:04:57.055784: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6000b00 of size 256 by op Fill action_count 4278547507 step 0 next 6
2022-02-11 11:04:57.055790: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6000c00 of size 512 by op Add action_count 4278547500 step 0 next 7
2022-02-11 11:04:57.055795: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6000e00 of size 256 by op Fill action_count 4278547508 step 0 next 10
2022-02-11 11:04:57.055801: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6000f00 of size 256 by op Fill action_count 4278547509 step 0 next 11
2022-02-11 11:04:57.055806: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001000 of size 256 by op Fill action_count 4278547519 step 0 next 15
2022-02-11 11:04:57.055812: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001100 of size 256 by op AssignVariableOp action_count 4278547520 step 0 next 18
2022-02-11 11:04:57.055818: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001200 of size 256 by op Mul action_count 4278547522 step 0 next 20
2022-02-11 11:04:57.055823: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001300 of size 256 by op Add action_count 4278547524 step 0 next 22
2022-02-11 11:04:57.055829: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001400 of size 256 by op Equal action_count 4278547529 step 0 next 24
2022-02-11 11:04:57.055835: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001500 of size 256 by op CustomLoss2/weighted_loss/Const action_count 4278547533 step 0 next 26
2022-02-11 11:04:57.055841: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001600 of size 256 by op CustomLoss2/NotEqual_1/y action_count 4278547534 step 0 next 27
2022-02-11 11:04:57.055847: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001700 of size 256 by op model/batch_normalization/FusedBatchNormV3 action_count 4278547560 step 13684086849625510338 next 34
2022-02-11 11:04:57.055852: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001800 of size 256 by op model/batch_normalization/FusedBatchNormV3 action_count 4278547561 step 13684086849625510338 next 35
2022-02-11 11:04:57.055858: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001900 of size 256 by op model/batch_normalization/FusedBatchNormV3 action_count 4278547562 step 13684086849625510338 next 12
2022-02-11 11:04:57.055864: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001a00 of size 256 by op Sub action_count 4278547511 step 0 next 13
2022-02-11 11:04:57.055869: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001b00 of size 256 by op Sub action_count 4278547512 step 0 next 14
2022-02-11 11:04:57.055875: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001c00 of size 256 by op model/batch_normalization/FusedBatchNormV3 action_count 4278547563 step 13684086849625510338 next 36
2022-02-11 11:04:57.055880: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001d00 of size 256 by op model/batch_normalization/FusedBatchNormV3 action_count 4278547564 step 13684086849625510338 next 37
2022-02-11 11:04:57.055886: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6001e00 of size 256 by op gradient_tape/UnsortedSegmentSum/pfor/mul_1 action_count 4278547624 step 0 next 45
2022-02-11 11:04:57.055892: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 7fdcc6001f00 of size 2048 by op UNUSED action_count 0 step 0 next 16
2022-02-11 11:04:57.055898: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6002700 of size 2560 by op Add action_count 4278547516 step 0 next 17
2022-02-11 11:04:57.055903: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6003100 of size 9437184 by op RandomUniformInt action_count 4278547528 step 0 next 19
2022-02-11 11:04:57.055909: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6903100 of size 3072 by op gradient_tape/CustomLoss2/Tile action_count 4278547532 step 0 next 25
2022-02-11 11:04:57.055915: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6903d00 of size 2560 by op gradient_tape/model/conv2d_1/Conv2D/Conv2DBackpropFilter action_count 4278547612 step 13684086849625510338 next 44
2022-02-11 11:04:57.055921: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6904700 of size 2560 by op gradient_tape/UnsortedSegmentSum/pfor/Tile action_count 4278547623 step 0 next 43
2022-02-11 11:04:57.055927: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 7fdcc6905100 of size 584704 by op UNUSED action_count 4278547633 step 0 next 28
2022-02-11 11:04:57.055934: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6993d00 of size 2359296 by op CustomLoss2/ArithmeticOptimizer/ReorderCastLikeAndValuePreserving_float_Cast action_count 4278547544 step 13684086849625510338 next 33
2022-02-11 11:04:57.055940: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6bd3d00 of size 2359296 by op gradient_tape/UnsortedSegmentSum/pfor/UnsortedSegmentSum action_count 4278547632 step 15496386686427765080 next 41
2022-02-11 11:04:57.055946: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc6e13d00 of size 2359296 by op CustomLoss2/softmax_cross_entropy_with_logits action_count 4278547592 step 13684086849625510338 next 42
2022-02-11 11:04:57.055952: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 7fdcc7053d00 of size 39515136 by op UNUSED action_count 4278547619 step 13684086849625510338 next 21
2022-02-11 11:04:57.055957: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcc9603100 of size 28311552 by op Add action_count 4278547525 step 0 next 23
2022-02-11 11:04:57.055963: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdccb103100 of size 75497472 by op model/activation/Relu-0-1-TransposeNCHWToNHWC-LayoutOptimizer action_count 4278547566 step 13684086849625510338 next 31
2022-02-11 11:04:57.055969: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdccf903100 of size 47185920 by op gradient_tape/CustomLoss2/softmax_cross_entropy_with_logits/mul action_count 4278547610 step 13684086849625510338 next 32
2022-02-11 11:04:57.055975: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcd2603100 of size 75497472 by op model/conv2d/BiasAdd-0-1-TransposeNCHWToNHWC-LayoutOptimizer action_count 4278547558 step 13684086849625510338 next 29
2022-02-11 11:04:57.055981: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcd6e03100 of size 75497472 by op model/up_sampling2d/resize/ResizeBilinear action_count 4278547568 step 13684086849625510338 next 30
2022-02-11 11:04:57.055988: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcdb603100 of size 75497472 by op loop_body/PartitionedCall/pfor/PartitionedCall/gradients/CustomLoss2/softmax_cross_entropy_with_logits_grad/Softmax/pfor/Softmax action_count 4278547625 step 15496386686427765080 next 38
2022-02-11 11:04:57.055994: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdcdfe03100 of size 47185920 by op CustomLoss2/softmax_cross_entropy_with_logits action_count 4278547593 step 13684086849625510338 next 39
2022-02-11 11:04:57.056000: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdce2b03100 of size 47185920 by op model/conv2d_1/BiasAdd-0-0-TransposeNCHWToNHWC-LayoutOptimizer action_count 4278547589 step 13684086849625510338 next 40
2022-02-11 11:04:57.056006: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7fdce5803100 of size 75497472 by op loop_body/PartitionedCall/pfor/PartitionedCall/gradients/model/conv2d_1/Conv2D_grad/Conv2DBackpropFilter/pfor/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer action_count 4278547630 step 15496386686427765080 next 46
2022-02-11 11:04:57.056012: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 7fdcea003100 of size 22781677312 by op UNUSED action_count 4278547628 step 15496386686427765080 next 18446744073709551615
2022-02-11 11:04:57.056017: I tensorflow/core/common_runtime/bfc_allocator.cc:1051]      Summary of in-use Chunks by size: 
2022-02-11 11:04:57.056024: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 24 Chunks of size 256 totalling 6.0KiB
2022-02-11 11:04:57.056030: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 512 totalling 512B
2022-02-11 11:04:57.056038: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 1280 totalling 1.2KiB
2022-02-11 11:04:57.056048: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 2560 totalling 7.5KiB
2022-02-11 11:04:57.056057: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 3072 totalling 3.0KiB
2022-02-11 11:04:57.056067: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 2359296 totalling 6.75MiB
2022-02-11 11:04:57.056076: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 9437184 totalling 9.00MiB
2022-02-11 11:04:57.056087: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 28311552 totalling 27.00MiB
2022-02-11 11:04:57.056096: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 47185920 totalling 135.00MiB
2022-02-11 11:04:57.056104: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 5 Chunks of size 75497472 totalling 360.00MiB
2022-02-11 11:04:57.056113: I tensorflow/core/common_runtime/bfc_allocator.cc:1058] Sum Total of in-use chunks: 537.77MiB
2022-02-11 11:04:57.056122: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] total_region_allocated_bytes_: 23385669632 memory_limit_: 23385669632 available bytes: 0 curr_region_allocation_bytes_: 46771339264
2022-02-11 11:04:57.056135: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Stats: 
Limit:                     23385669632
InUse:                       563890432
MaxInUse:                    600314368
NumAllocs:                          92
MaxAllocSize:                 99865600
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2022-02-11 11:04:57.056161: W tensorflow/core/common_runtime/bfc_allocator.cc:467] ***_________________________________________________________________________________________________
2022-02-11 11:04:57.056221: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at tile_ops.cc:198 : Resource exhausted: OOM when allocating tensor with shape[640,1,768,768,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "issue_minimal.py", line 67, in <module>
    ihvp_calculator = ExactIHVP(influence_model, random_dataset.take(1).batch(1))
  File "/home/lucas.hervier/bdd100/bdd_env/lib/python3.8/site-packages/Influenciae-0.0.1-py3.8.egg/influenciae/influence/inverse_hessian_vector_product.py", line 59, in __init__
  File "/home/lucas.hervier/bdd100/bdd_env/lib/python3.8/site-packages/Influenciae-0.0.1-py3.8.egg/influenciae/influence/inverse_hessian_vector_product.py", line 83, in _compute_inv_hessian
  File "/home/lucas.hervier/bdd100/bdd_env/lib/python3.8/site-packages/tensorflow/python/eager/backprop.py", line 1175, in jacobian
    output = pfor_ops.pfor(loop_fn, target_size,
  File "/home/lucas.hervier/bdd100/bdd_env/lib/python3.8/site-packages/tensorflow/python/ops/parallel_for/control_flow_ops.py", line 206, in pfor
    outputs = f()
  File "/home/lucas.hervier/bdd100/bdd_env/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/home/lucas.hervier/bdd100/bdd_env/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 956, in _call
    return self._concrete_stateful_fn._call_flat(
  File "/home/lucas.hervier/bdd100/bdd_env/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/lucas.hervier/bdd100/bdd_env/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/home/lucas.hervier/bdd100/bdd_env/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.ResourceExhaustedError:  OOM when allocating tensor with shape[640,1,768,768,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node loop_body/PartitionedCall/pfor/PartitionedCall/gradients/gradient_tape/model/conv2d_1/Conv2D/Conv2DBackpropFilter_grad/Conv2D/pfor/Tile}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_f_1291]

Function call stack:
f

As you can see, I face an OOM issue when trying to allocate a tensor of shape [640, 1, 768, 768, 32]: 640 is the number of weights (so basically the gradient vector size), 1 is the number of inputs, and [768, 768, 32] is the size of the input once it has gone through all the layers except the last one. As you might notice, this tensor is allocated when we try to do:

hess = tf.squeeze(tape_hess.jacobian(grads, weights))

In the function _compute_inv_hessian in the inverse_hessian_vector_product.py file.
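The 45 GiB in the log is exactly this tensor's footprint:

# shape [640, 1, 768, 768, 32] in float32 (4 bytes per element)
640 * 1 * 768 * 768 * 32 * 4  # = 48_318_382_080 bytes = 45.00 GiB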

Describe the solution you'd like

I know that to compute the hessian we need this tensor. But I was wondering whether we could split it along the grads dimension, and my colleague @dv-ai has found a workaround that only requires a small change in the _compute_inv_hessian function:

Old:

  def _compute_inv_hessian(self, dataset: tf.data.Dataset) -> tf.Tensor:
      """
      Compute the (pseudo)-inverse of the hessian matrix wrt to the model's parameters using backward-mode AD.

      Disclaimer: this implementation trades memory usage for speed, so it can be quite memory intensive, especially
      when dealing with big models.

      Args:
          dataset: tf.data.Dataset
              A TF dataset containing the whole or part of the training dataset for the computation of the inverse
              of the mean hessian matrix.

      Returns:
          A tf.Tensor with the resulting inverse hessian matrix
      """
      weights = self.model.weights
      with tf.GradientTape(persistent=False, watch_accessed_variables=False) as tape_hess:
          tape_hess.watch(weights)
          grads = self.model.batch_gradient(dataset) if dataset._batch_size == 1 \
              else self.model.batch_jacobian(dataset)

      hess = tf.squeeze(tape_hess.jacobian(grads, weights))
      hessian = tf.reduce_mean(tf.reshape(hess, (-1, int(tf.reduce_prod(weights.shape)), int(tf.reduce_prod(weights.shape)))), axis=0)

      return tf.linalg.pinv(hessian)

Alternative:

  def _compute_inv_hessian(self, dataset: tf.data.Dataset) -> tf.Tensor:
      """
      Compute the (pseudo)-inverse of the hessian matrix wrt to the model's parameters using
      backward-mode AD.

      Disclaimer: this implementation trades memory usage for speed, so it can be quite
      memory intensive, especially when dealing with big models.

      Parameters
      ----------
      dataset
          A TF dataset containing the whole or part of the training dataset for the
          computation of the inverse of the mean hessian matrix.

      Returns
      ----------
      inv_hessian
          A tf.Tensor with the resulting inverse hessian matrix
      """
      weights = self.model.weights
      with tf.GradientTape(persistent=True, watch_accessed_variables=False) as tape_hess:
          tape_hess.watch(weights)
          grads = self.model.batch_gradient(dataset) if dataset._batch_size == 1 \
              else self.model.batch_jacobian(dataset) # pylint: disable=W0212

      hess = tf.squeeze(tape_hess.jacobian(grads, weights, parallel_iterations=10, experimental_use_pfor=False))

      hessian = tf.reduce_mean(tf.reshape(hess,
                                          (-1, int(tf.reduce_prod(weights.shape)),
                                           int(tf.reduce_prod(weights.shape)))), axis=0)

      return tf.linalg.pinv(hessian)

By changing persistent to True and setting the parameters parallel_iterations=10 and experimental_use_pfor=False in the .jacobian call, the computation goes through (with experimental_use_pfor=False the jacobian is computed as a loop of gradient calls, which is why the tape has to be persistent).

N.B.: the value 10 is not important as long as it evenly divides the length of the gradient vector (unfortunate for prime lengths, though).

For instance, if I add to my script:

print(ihvp_calculator.inv_hessian)

I got:

tf.Tensor(
[[ 2.3457441  -0.07872738 -0.11368337 ...  0.02131678  0.02238739
   0.04105094]
 [-0.07837234  2.576137   -0.12778574 ...  0.02324321  0.02715976
   0.03761083]
 [-0.11375846 -0.12770845  2.8135462  ...  0.02000072  0.0255051
   0.03220554]
 ...
 [ 0.02132007  0.02319163  0.01998054 ...  0.7005969  -0.01072854
  -0.0270703 ]
 [ 0.02241289  0.02717561  0.02547131 ... -0.0106203   0.87094194
  -0.03467852]
 [ 0.04103031  0.03757853  0.03215647 ... -0.02701988 -0.0346096
   0.77158403]], shape=(640, 640), dtype=float32)

The computation still takes some time, but that makes sense since there are a lot of parameters. Is there any way to set those parameters in the constructor, or at least when calling _compute_inv_hessian? Or otherwise, to automatically split the computation over the different gradients?

Additional remarks

While doing these experiments I also noticed a few things:

  • In compute_hvp you do:

if self.hessian is None:
    self.hessian = tf.linalg.pinv(self.inv_hessian)

But this if statement is always entered, since self.hessian is never assigned anywhere else. Why not assign self.hessian = hessian in _compute_inv_hessian (as sketched below)? Since you need the hessian to compute its inverse anyway, why call tf.linalg.pinv again, which is very costly?
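A sketch of that suggestion, reusing the names from the snippet above:

# at the end of _compute_inv_hessian, cache the forward hessian as well:
self.hessian = hessian
return tf.linalg.pinv(hessian)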

  • In common/model_wrappers.py, I would change the following line in both _gradient and _jacobian:

with tf.GradientTape() as tape:

to:

with tf.GradientTape(watch_accessed_variables=False) as tape:

But maybe there is a good reason not to do it?

Otherwise, it is really nice work, and I know my issue already relates to more advanced use cases; I apologize for that!

Extend the use of `tf.function`

The benchmarks show that the speed in our use cases soars when we vectorize our critical functions well with tf.function.
I propose to go over the code before any new implementation and make sure we wrap as much of the computation as possible in tf.function.

This will require moving / splitting the 'branching code' (e.g. asserts on datasets...) away from the computation code, as sketched below.
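A sketch of the kind of split meant here (all names illustrative):

import tensorflow as tf

def influence_values(dataset):
    # branching code: cheap Python-side validation, kept outside the graph
    if not hasattr(dataset, "_batch_size"):
        raise ValueError("The dataset must be batched before this operation.")
    return _influence_values_graph(dataset)

@tf.function
def _influence_values_graph(dataset):
    # computation code: pure tensor ops, traced once and then reused
    total = tf.constant(0.0)
    for x, _ in dataset:
        total += tf.reduce_mean(x)
    return total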

@Agustin-Picard, I think we can help with that together with @lucashervier, as we already did it in Xplique. We can propose something, a draft or a V0, and discuss whether it fits your vision of the lib. ;)
