inseq-team / inseq
Interpretability for sequence generation models
Home Page: https://inseq.org
License: Apache License 2.0
If the text used as input to the attribute method contains spaces before non-alphanumeric characters, the tokenizer's decoded output no longer matches the input, leading to the assertion error.
Give as input a text containing a special character (e.g. . or ?) preceded by a white space while using a GPT-like model (and tokenizer).
Steps to reproduce the behavior:
import inseq
model = inseq.load_model('gpt2', attribution_method='input_x_gradient') # Or any other gpt-like model
model.attribute(input_texts='Hello . This is an example')
Returns the following error: AssertionError: Forced generations with decoder-only models must start with the input texts.
Python 3.9.15
0.4.0
The decoded text output from the tokenizer should be identical to the input text, allowing the assert to be correctly verified:
assert all(
generated_texts[idx].startswith(input_texts[idx]) for idx in range(len(input_texts))
), "Forced generations with decoder-only models must start with the input texts."
The issue is related to the type of tokenizer used, already reported in huggingface/transformers#21119. To solve the problem, it is recommended to use the clean_up_tokenization_spaces=False
flag when decoding the text.
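The effect of the cleanup step can be sketched with a minimal mock (this is not the actual tokenizer internals, just an illustration of why the `startswith` assertion fails):

```python
# Hypothetical sketch: mimic the cleanup step that joins spaces before
# punctuation during decoding, which is what breaks the assertion.
def cleanup_spaces(text: str) -> str:
    # Simplified version of the cleanup transformers applies by default.
    for punct in [" .", " ?", " !", " ,"]:
        text = text.replace(punct, punct.strip())
    return text

def decode(tokens: list[str], clean_up_tokenization_spaces: bool = True) -> str:
    text = "".join(tokens)
    return cleanup_spaces(text) if clean_up_tokenization_spaces else text

tokens = ["Hello", " .", " This", " is", " an", " example"]
original = "Hello . This is an example"

# Default cleanup alters the text, so the startswith assert fails:
assert decode(tokens) != original
# Disabling the cleanup preserves the input verbatim:
assert decode(tokens, clean_up_tokenization_spaces=False) == original
```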
When loading a model with an already instantiated fast tokenizer from Huggingface, the following error is thrown:
HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'MBart50TokenizerFast(name_or_path='facebook/mbart-large-50-many-to-many-mmt', vocab_size=250054, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '', 'eos_token': '', 'unk_token': '', 'sep_token': '', 'pad_token': '', 'cls_token': '', 'mask_token': '', 'additional_special_tokens': ['ar_AR', 'cs_CZ', 'de_DE', 'en_XX', 'es_XX', 'et_EE', 'fi_FI', 'fr_XX', 'gu_IN', 'hi_IN', 'it_IT', 'ja_XX', 'kk_KZ', 'ko_KR', 'lt_LT', 'lv_LV', 'my_MM', 'ne_NP', 'nl_XX', 'ro_RO', 'ru_RU', 'si_LK', 'tr_TR', 'vi_VN', 'zh_CN', 'af_ZA', 'az_AZ', 'bn_IN', 'fa_IR', 'he_IL', 'hr_HR', 'id_ID', 'ka_GE', 'km_KH', 'mk_MK', 'ml_IN', 'mn_MN', 'mr_IN', 'pl_PL', 'ps_AF', 'pt_XX', 'sv_SE', 'sw_KE', 'ta_IN', 'te_IN', 'th_TH', 'tl_XX', 'uk_UA', 'ur_PK', 'xh_ZA', 'gl_ES', 'sl_SI']}, clean_up_tokenization_spaces=True)'. Use 'repo_type' argument if needed.
Steps to reproduce the behavior:
import inseq
from transformers import (MBartForConditionalGeneration, MBart50TokenizerFast)
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
de_tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt", src_lang="de_DE", tgt_lang="ko_KR")
attr_model = inseq.load_model(model, "attention", tokenizer=de_tokenizer)
OS: macOS
Python version: 3.10.9
Inseq version: 0.4.0
Providing pretrained fast-tokenizers should be supported.
The issue seems to originate in huggingface_model.py, where several type definitions are set as transformers.PreTrainedTokenizer instead of transformers.PreTrainedTokenizerBase, effectively disallowing any fast tokenizer. Line 112 in particular seems to be the issue; changing the type definitions solves it.
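The proposed broadening can be sketched with stub classes that mirror the transformers hierarchy (the real fix simply swaps the annotation; everything below is illustrative, not the actual inseq code):

```python
# Stub classes mirroring the transformers tokenizer hierarchy.
class PreTrainedTokenizerBase: ...          # common ancestor
class PreTrainedTokenizer(PreTrainedTokenizerBase): ...      # "slow"
class PreTrainedTokenizerFast(PreTrainedTokenizerBase): ...  # "fast"

def load_tokenizer(tokenizer) -> PreTrainedTokenizerBase:
    # Checking against the base class admits fast tokenizers too;
    # the narrower isinstance(tokenizer, PreTrainedTokenizer) would not.
    if isinstance(tokenizer, PreTrainedTokenizerBase):
        return tokenizer
    raise TypeError("expected a tokenizer instance or a repo id string")

fast = PreTrainedTokenizerFast()
assert load_tokenizer(fast) is fast  # would fail with the narrower check
```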
The following is a non-exhaustive list of perturbation-based feature attribution methods that could be added to the library:
How can I visualize the attention weights for a decoder-only model like Pythia for a given input prompt?
I went over the tutorial here, which uses an encoder-decoder model, and wanted to try this out for a decoder-only model.
I tried simply replacing the model name, but it does not seem to work:
model = inseq.load_model("EleutherAI/pythia-70m-deduped", "input_x_gradient")
out = model.attribute(
input_texts="Hello everyone, hope you're enjoying the tutorial!",
attribute_target=True,
method="attention"
)
# out[0] is a shortcut for out.sequence_attributions[0]
out.sequence_attributions[0].source_attributions.shape
but I get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-5-6df0e921faca> in <cell line: 11>()
      9 )
     10 # out[0] is a shortcut for out.sequence_attributions[0]
---> 11 out.sequence_attributions[0].source_attributions.shape
AttributeError: 'NoneType' object has no attribute 'shape'
Strangely, however, I can still inspect the outputs using:
out.sequence_attributions[0]._aggregator
out.show()
Is this the intended functioning?
Also, I would love some help interpreting the generated plot.
I'm confused about why there are some full rows and some rows with certain values masked, and what exactly a cell signifies. I know this might be a trivial thing :(
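One likely explanation for the masked cells (a general fact about decoder-only attention, not inseq-specific behavior): causal models only attend to past positions, so the attention matrix is lower-triangular. A tiny sketch of that mask:

```python
# Decoder-only models use a causal mask: position i can only attend to
# positions j <= i. Masked cells in an attention plot are future
# positions, for which no weight is defined.
def causal_mask(n: int) -> list[list[bool]]:
    # True = attendable, False = masked (future position)
    return [[j <= i for j in range(n)] for i in range(n)]

mask = causal_mask(4)
assert mask[0] == [True, False, False, False]  # first token sees only itself
assert mask[3] == [True, True, True, True]     # last token sees everything
```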
I get an error when trying to load an FSMTForConditionalGeneration model. It doesn't seem to have a get_decoder function; the decoder attribute should be used instead. In this thread, we can also collect other seq2seq and decoder-only models from HuggingFace that may not be integrated properly in Inseq.
Steps to reproduce the behavior:
I have only tested it on the fix-macos-issues
branch, but I suspect it is a more general problem.
Loading a FSMT model leads to an AttributeError: 'FSMTForConditionalGeneration' object has no attribute 'get_decoder'
model = inseq.load_model("facebook/wmt19-en-de", "integrated_gradients")
out = model.attribute(
"The developer argued with the designer because her idea cannot be implemented.",
n_steps=100
)
out.show()
Inseq should load the FSMT model without any problems.
I'm trying to generate multiple attributions with a large LM (10B+ params) on a dataset of 2000+ sentences and no constrained decoding.
Apparently, the generate step in the pipeline crashes with CUDA OOM no matter the batch_size I set (even with batch_size=1). The generation itself does not seem to be batched, since the attribution goes smoothly if I pass a smaller set of texts.
Steps to reproduce the behavior:
model = inseq.load_model(
args.model_name_or_path, # 10B+ model
"integrated_gradients",
load_in_8bit=True,
device_map="auto",
)
out = model.attribute(
input_texts=texts, # 2000+ sentences
n_steps=50,
return_convergence_delta=True,
step_scores=["probability"],
batch_size=1,
)
Whereas if I do
model = inseq.load_model(
args.model_name_or_path,
"integrated_gradients",
load_in_8bit=True,
device_map="auto",
)
# raise NotImplementedError()
n_batches = len(texts) // args.batch_size
print("Splitting texts into n batches", n_batches)
batches = np.array_split(texts, n_batches)
for batch in tqdm(batches, desc="Batch", total=len(batches)):
out = model.attribute(
input_texts=batch.tolist(),
n_steps=50,
return_convergence_delta=True,
step_scores=["probability"],
batch_size=len(batch),
internal_batch_size=len(batch),
generation_args=asdict(generation_args),
show_progress=True,
)
it all seems to work.
When computing attributions for a list of sentences, the tqdm iterator prints progress per token, which gives no insight into how far along you are in the corpus of sentences being attributed. I would suggest that, when attributing over a list of strings, tqdm iterates per sentence and drops the per-token iteration.
Happy to have a go at this if you agree this could be nice.
Next to tqdm, it could be helpful to get some runtime report.
After generating many explanations, this could give insight about the average computation time needed for one example.
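A minimal sketch of both suggestions together, in plain Python without the tqdm dependency (`attribute_one` is a purely hypothetical stand-in for a single-sentence attribution call):

```python
import time

# Drive progress per sentence rather than per token, and collect
# per-example runtimes so an average can be reported at the end.
def attribute_corpus(texts, attribute_one):
    runtimes = []
    for i, text in enumerate(texts, start=1):
        start = time.perf_counter()
        attribute_one(text)
        runtimes.append(time.perf_counter() - start)
        print(f"[{i}/{len(texts)}] done")
    return sum(runtimes) / len(runtimes)  # average seconds per example

avg = attribute_corpus(["a", "b", "c"], attribute_one=lambda t: t.upper())
assert avg >= 0.0
```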
Adding this new class to become the default output for the AttributionModel.attribute method. This will entail the following naming changes:
- FeatureAttributionStepOutput --> FeatureAttributionRawStepOutput
- FeatureAttributionOutput --> FeatureAttributionStepOutput
- A new FeatureAttributionOutput, replacing both OneOrMoreFeatureAttributionSequenceOutputs and OneOrMoreFeatureAttributionSequenceOutputsWithStepOutputs.
Advantages:
Initial formulation:
@dataclass
class FeatureAttributionOutput:
"""
Output produced by the `AttributionModel.attribute` method.
Attributes:
sequence_attributions (list of :class:`~inseq.data.FeatureAttributionSequenceOutput`): List
containing all attributions performed on input sentences (one per input sentence, including
source and optionally target-side attribution).
step_attributions (list of :class:`~inseq.data.FeatureAttributionStepOutput`, optional): List
containing all step attributions (one per generation step performed on the batch), returned if
`output_step_attributions=True`.
info (dict with str keys and str values): Dictionary including all available parameters used to
perform the attribution.
"""
- save_attributions and load_attributions inside the class, removing the global methods.
- show method calling the one in every FeatureAttributionSequenceOutput and concatenating outputs if return_html=True.
- join method allowing to extend the sequence_attributions and step_attributions lists if info matches, raising a ValueError: attributions produced under different settings cannot be combined error otherwise.

Hi Inseq team, thanks again for your contribution.
I just noticed a bug when using 'contrast_prob_diff' - output attributions seem to be reversed.
It outputs the attribution of false_answer but not true_answer.
What I did is:
use contrast_prob_diff, and the output attribute for 'morning' and 'evening' is:
use probability (where the variable target_false is useless), and the output attribute for 'morning' and 'evening' is:
The code for the attribute part is: (top: contrast_prob_diff; bottom: probability)
This issue addresses the high space requirements of large attribution scores tensors by adding a scores_precision
parameter to FeatureAttributionOutput.save
method.
Proposant: @g8a9
Currently, tensors in FeatureAttributionOutput
objects (attributions and step scores) are serialized in float32
precision as a default when using out.save()
. While it is possible to compress the representation of these values with ndarray_compact=True
, the resulting JSON files are usually quite large. Using more parsimonious data types could reduce the size of saved objects and facilitate systematic analyses leveraging large amounts of data.
float32
precision should probably remain the default behavior, as we do not want to cause any information loss by default.
float16
and float8
should also be considered, both in the signed and unsigned variants, since leveraging the strictly positive nature of some score types would allow supporting greater precision while halving space requirements. Unsigned values will be used as defaults if no negative scores are present in a tensor.
float16
can be easily used by casting tensors to the native torch.float16
data type, which would preserve precision up to 4 decimal values for scores normalized in the [-1;1] interval (8 for unsigned tensors). This corresponds to 2 or 4 decimal places for float8
. However, this data type is not supported natively in Pytorch, so tensors should be converted to torch.int8
and torch.uint8
instead and transformed in floats upon reloading the object.
Hello, thanks for your contribution! The tool is really helpful and beautiful.
However, when I try to use get_scores_dicts() to output the attributions, I get an 'index out of bounds' error.
I printed out the aggr and it seems to be a bug:
import inseq
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
import pandas as pd
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
data = [
{
'prompt': 'people usually take shower in the',
'target_true': ' morning',
'target_false': ' evening',
}]
for layer in range(24):
print('layer:', str(layer))
attrib_model = inseq.load_model(
model,
"layer_gradient_x_activation",
tokenizer="bigscience/bloom-560m",
target_layer=model.transformer.h[layer].mlp,
)
for i, ex in enumerate(data):
print(ex)
# e.g. "The capital of Spain is"
#prompt = ex["relation"].format(ex["subject"])
prompt = ex["prompt"]
# e.g. "The capital of Spain is Madrid"
true_answer = prompt + ex["target_true"]
# e.g. "The capital of Spain is Paris"
false_answer = prompt + ex["target_false"]
# Contrastive attribution of true vs false answer
out = attrib_model.attribute(
prompt,
true_answer,
attributed_fn="contrast_prob_diff",
contrast_targets=false_answer,
step_scores=["contrast_prob_diff"],
show_progress=False,
)
out.show()
out.get_scores_dicts()
Adding support for Fairseq models on top of the AttributionModel abstraction, similarly to what was done for 🤗 transformers models.
pytorch/fairseq is a core library for training seq2seq models in PyTorch. Adding support would allow for extended experimentation with state-of-the-art models, especially for NMT.
keyonvafa/sequential-rationales
uses different attribution methods on FairseqEncoderDecoderModel
models and can provide inspiration for an implementation aiming to access the internals of such models.
When attributing a larger dataset on a CUDA device with Inseq, an out-of-memory error occurs regardless of the defined batch_size. I believe this is caused by the call to self.encode in attribution_model.py, lines 345 and 347, which operates on the full inputs instead of a single batch and moves all inputs to the CUDA device after encoding.
Steps to reproduce the behavior:
Call the .attribute() method with any batch_size parameter.
OS: macOS
Python version: 3.10
Inseq version: 0.4.0
The input texts should ideally only be encoded or moved to the GPU once they are actually processed.
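The fix could look something like the lazy batching sketched below (`encode` and `to_device` are hypothetical stand-ins for the tokenizer call and the `.to(device)` move, not actual inseq APIs):

```python
# Encode and move each batch lazily, instead of encoding the full
# corpus up front and moving everything to the device at once.
def iter_encoded_batches(texts, batch_size, encode, to_device):
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Only this batch is encoded and resident on the device at a time.
        yield to_device(encode(batch))

batches = list(iter_encoded_batches(
    ["a", "b", "c"], batch_size=2,
    encode=lambda b: [len(t) for t in b],
    to_device=lambda x: x,
))
assert batches == [[1, 1], [1]]
```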
As long as PyTorch 1.12 is still used (basically until 1.13.1 comes out), the "mps" backend seems too unstable to use, failing several of the tests. Even setting PYTORCH_ENABLE_MPS_FALLBACK=1
in the environment does not fully remove this issue.
Steps to reproduce the behavior:
Run make fast-test (or any other command) on macOS with "mps" support.
Tests should run successfully.
A quickfix would be to set the default device in inseq.utils.torch_utils.py
to "cpu" for mps-environments as well for now, until pytorch 1.13.1 is released.
def get_default_device() -> str:
if is_cuda_available() and is_cuda_built():
return "cuda"
elif is_mps_available() and is_mps_built():
return "cpu"
else:
return "cpu"
I'm not sure if I'm looking in the wrong place, or if it is missing, but I cannot find an implementation of get_post_variable_assignment_hook, which is mentioned in the __all__ of inseq.utils and used in the value-zeroing implementation.
In order to facilitate the evaluation of different interpretability techniques, I propose to identify a set of commonly used datasets from the literature, create 🤗 Datasets loading scripts to have them in a shared format, and host them on the Inseq organization in the Hugging Face hub.
This would provide a shared interface for:
The following table summarizes some of the datasets used in the literature:
Name | Task | Data source | Paper | Description |
---|---|---|---|---|
SCAT | Translation | neulab/contextual-mt | Yin et al. '21 | Contextual coreference in translation, with disambiguating context highlights from translators |
Lambada + Rationales | Language Modeling | keyonvafa/sequential-rationales | Vafa et al. '21 | Next word prediction with human-annotated previous relevant context |
Europarl Gold Alignments | Translation | TBD | TBD | Gold alignments for various language pairs in the Europarl corpus |
The ExNLP Datasets website summarizes various sources available for NLP explainability; we should verify what is relevant to generation.
Inference on a Google Colab GPU is very slow. There is no significant difference whether the model runs on CUDA or CPU.
The following model.attribute(...) code runs for around 33 to 47 seconds on both a Colab CPU and GPU. I tried passing the device to the model, and model.device confirms it is running on CUDA, but it still takes very long to run only 2 sentences. (I don't know the underlying attribution computations well enough to know whether this is expected or whether it should be faster. If it is always this slow, it seems practically infeasible to analyse larger corpora.)
import inseq
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(inseq.list_feature_attribution_methods())
model = inseq.load_model("google/flan-t5-small", attribution_method="discretized_integrated_gradients", device=device)
model.to(device)
out = model.attribute(
input_texts=["We were attacked by hackers. Was there a cyber attack?", "We were not attacked by hackers. Was there a cyber attack?"],
)
model.device
Faster inference with a GPU/cuda
(Thanks btw, for the fix for returning the per-token scores in a dictionary, the new method works well :) )
The merge_attributions
method of the FeatureAttributionOutput
class can be used as a static method, but can also be called on an instance of that class.
In the latter case, the method's output is unintended: the merge is performed only on the outputs in the parameter list, not on the instance calling the method.
Therefore, it would be advisable to move the method out of the class and use it as a utility function.
See above.
Example code used as a demonstration of the method behaviour:
>>> import inseq
>>> seq_model = inseq.load_model('gpt2', attribution_method='input_x_gradient')
>>> out1 = seq_model.attribute(input_texts=['hello world!', 'How are you? '])
>>> out2 = seq_model.attribute(input_texts=['I am going to ', 'My name is '])
Correct usage:
>>> inseq.FeatureAttributionOutput.merge_attributions([out1, out2])
FeatureAttributionOutput({
sequence_attributions: list with 4 elements of type GranularFeatureAttributionSequenceOutput:[
...
Misleading behaviour, leading to the loss of attributions contained in out1:
>>> out1.merge_attributions([out2])
FeatureAttributionOutput({
sequence_attributions: list with 2 elements of type GranularFeatureAttributionSequenceOutput:[
...
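The proposed module-level utility can be sketched as follows, with a minimal stand-in class for FeatureAttributionOutput (names and fields are illustrative, not the actual inseq implementation):

```python
from dataclasses import dataclass, field

# Minimal stand-in for FeatureAttributionOutput.
@dataclass
class Output:
    sequence_attributions: list = field(default_factory=list)
    info: dict = field(default_factory=dict)

# A module-level merge has no implicit `self`, so no operand can be
# silently dropped the way out1 was in the instance-method call above.
def merge_attributions(outputs: list[Output]) -> Output:
    if any(o.info != outputs[0].info for o in outputs):
        raise ValueError("attributions produced under different settings cannot be combined")
    merged = Output(info=dict(outputs[0].info))
    for o in outputs:  # every operand contributes, none is lost
        merged.sequence_attributions.extend(o.sequence_attributions)
    return merged

out1, out2 = Output(["s1", "s2"]), Output(["s3", "s4"])
assert len(merge_attributions([out1, out2]).sequence_attributions) == 4
```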
Despite fixing batched attribution so that results are consistent with individual attribution (see #110), the method DiscretizedIntegratedGradients
still produces different results when applied to a batch of examples.
Steps to reproduce the behavior, using the discretized_integrated_gradients method:
import inseq
model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "discretized_integrated_gradients")
out_multi = model.attribute(
[
"This aspect is very important",
"Why does it work after the first?",
"This thing smells",
"Colorless green ideas sleep furiously"
],
n_steps=20,
return_convergence_delta=True,
)
out_single = model.attribute(
[ "Why does it work after the first?" ],
n_steps=20,
return_convergence_delta=True,
)
assert out_single.attributions == out_multi[1].attributions # raises AssertionError
Same as #110
The problem is most likely due to a faulty scaling of the gradients in the _attribute
method of the DiscretizedIntegratedGradients
class.
When leaving the input texts empty for GPT2 with integrated gradients, the saliency map seems to be incorrect and giving false results. The goal is to only give <|endoftext|>, the BOS token, as input (and let GPT-2 generate from nothing basically), which can be done by leaving the input empty.
The problem is here:
inseq/inseq/attr/feat/feature_attribution.py
Line 303 in b5d3610
inseq/inseq/models/decoder_only.py
Lines 177 to 182 in b5d3610
The call to TextSequences
in this method sets skip_special_tokens
to True, removing the <|endoftext|>
from the input. This also prevents a user from giving <|endoftext|>
as the only input (and at the start of the generated text), since it is removed in the input. In that case, when running, there will be an error that the generated text does not begin with the input text.
It can be resolved by temporarily changing the line to:
sequences = TextSequences(
sources=None,
targets=self.attribution_model.convert_tokens_to_string(batch.input_tokens, as_targets=True, skip_special_tokens=False),
)
However, the feature attribution is zero for every <|endoftext|> token in the input and the output. I'm not sure whether this is intended; the same process with the ecco package assigns attribution to this token. Also, the first token (in this case This) gets zero attribution, which is probably not supposed to be the case.
Summary:
- <|endoftext|> cannot be given as input because it is removed during processing.
- The attribution for <|endoftext|> is zero. This is probably not correct.
Steps to reproduce the behavior:
import inseq
model = inseq.load_model("gpt2", "integrated_gradients")
model.attribute(
"",
"This is a demo sentence."
).show()
See bug report. This is the integrated gradients result from the ecco package on the same sentence, also using integrated gradients:
I assume this would be correct; however, they leave the baseline at its default.
Limit the plots displayed through the .show() function
Currently, the .show()
function on the attribution output will display plots for all generated attributions by default. For larger batches, this can lead to huge outputs in a notebook/ large numbers of HTML files being generated. It might be preferable to provide a sensible default value to the number of plots that are displayed in a notebook and allow users to specify for themselves how many/which attributions they want to have visualized by referring to their index.
Similar functionality is already possible now by manually choosing the indices of out.sequence_attributions
and calling the .show()
methods of the individual attribution outputs, so this function would mainly be a convenience function for new users.
see above
Contrastive attributions are currently supported thanks to custom attributed targets (see #138). The current definition of the contrastive attribution custom function can be found here.
The current implementation is problematic for integrated_gradients
and similar methods using multiple approximation steps since the contrastive forward uses static ids instead of embeddings obtained as steps between the original contrastive input and a baseline. Moreover, the current implementation allows only for granular token-based comparisons proposed in the original work by Yin and Neubig (2022), but comparing spans of different lengths could also be desirable.
Given this will add further complexity to the custom attributed function, and given the interest in such an application, it would be ideal to include a pre-registered version of the contrastive attribution step function inside Inseq to enable easy and quick usage.
Step function name: contrast_probs_diff, since the contrastive comparison is done by taking the difference of output probabilities between a regular and a contrastive example.
Extra arguments:
- contrast_input: required, can be either an input text, a sequence of ids, or embeddings for the contrastive example. The function will handle the formatting to match the original input.
- input_start_span_ids, contrast_start_span_ids: two lists containing the initial ids of every span in the input and the contrast that we want to consider as single units for attribution purposes (e.g. [0, 2, 5] for input_start_span_ids). By default None, set to list(range(len(input_ids))) and list(range(len(contrast_ids))) respectively (i.e. every token is treated separately, as in normal feature attribution).
Notes for span ids: for positions not listed in input_start_span_ids, 0 is returned.
Aggregation:
The default abs_max
function used for span aggregation (span_aggregate
) of attributions will return the attributions for the first token of every span (see description in the previous section) if the same spans are used with a ContiguousSpanAggregator
. The aggregate_map
for the step score should also be set to abs_max
upon registration.
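A generic sketch of an abs_max aggregation over spans (keeping the highest-magnitude score per span); the library's actual span_aggregate behavior may differ in detail:

```python
# Given start indices for each contiguous span, keep the score with the
# largest magnitude (preserving its sign) within every span.
def abs_max_spans(scores: list[float], span_starts: list[int]) -> list[float]:
    bounds = span_starts + [len(scores)]
    return [
        max(scores[a:b], key=abs)
        for a, b in zip(bounds, bounds[1:])
    ]

# Spans [0,2), [2,5), [5,6) over six token scores:
assert abs_max_spans([0.1, -0.9, 0.2, 0.3, -0.1, 0.5], [0, 2, 5]) == [-0.9, 0.3, 0.5]
```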
Following our discussion, it might be valuable to add a test case including larger language models that work with prompting such as T0 and its variants. Since availability of these kinds of models is becoming more common (see "Motivation"), we should show an example of feature attribution for them.
The bitsandbytes integration and 8-bit precision (Dettmers et al., 2022) released in August enables the use of larger models on single GPU setups.
I am running IG attributions for a list of strings, and was trying to work with the resulting inseq.data.attribution.FeatureAttributionOutput
object. To inspect what is in there I was looking at the sequence_attributions
, but I can't print this object because the objects in there don't have a __repr__
method.
import inseq
model = inseq.load_model("gpt2", "integrated_gradients")
sens = ["this is the first sentence. followed by a second"]
prefix = [sen[:sen.index('.')+1] for sen in sens]
attributions = model.attribute(
prefix,
generated_texts=sens,
n_steps=500,
internal_batch_size=50
)
print(attributions.sequence_attributions)
Returns:
TypeError: __repr__ returned non-string (type dict)
Allow negative values for attr_pos_end
Allow negative values to be defined for the attr_pos_end parameter.
This would make it possible to e.g. define that the last token (which is often just the EOS token) should be removed from the generated attributions. It probably needs to be evaluated if this is possible for batched attributions, but I think at least for singular attributions, it could be a nice quality-of-life feature. Especially when attributing over multiple sentences where using a positive value is more complicated due to different sentence lengths.
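A negative attr_pos_end could be resolved against the generated sequence length, mirroring Python's negative indexing; the function below is a hypothetical sketch of that normalization, not inseq's actual handling:

```python
# Resolve attr_pos_end against the sequence length: None means the full
# sequence, and negative values count back from the end (so -1 drops
# the final, often EOS, position from the attribution).
def resolve_pos_end(attr_pos_end, seq_len: int) -> int:
    if attr_pos_end is None:
        return seq_len
    if attr_pos_end < 0:
        attr_pos_end += seq_len
    if not 0 < attr_pos_end <= seq_len:
        raise ValueError(f"attr_pos_end out of range for length {seq_len}")
    return attr_pos_end

assert resolve_pos_end(-1, seq_len=10) == 9   # skip the last token
assert resolve_pos_end(None, seq_len=10) == 10
```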
Add support for PEFT models
Currently, only models inheriting from PreTrainedModel are supported. It would be useful to add support for models using Parameter-Efficient Fine-Tuning (🤗 PEFT) methods.
Adding support for 🤗 PEFT models would allow the same analyses to be performed on models optimised and trained to be efficient on consumer hardware.
Mostly TBD, as PEFT adds a small number of trainable parameters on top of those in the original PreTrainedModel.
This issue requests the inclusion of a wrapper for the NoiseTunnel
method to make it available for all attribution classes.
Smoothing techniques like the one proposed by Smilkov et al. 2017 can provide a more robust estimation of feature attributions, but they are largely ignored for NLP applications. Including support for noise-injecting techniques in the library would encourage their adoption in the broader research community.
Saliency cards (Paper | Repository) introduce a structured framework to document feature attribution methods' strengths and applicability to different use-cases. Introducing saliency cards specific to sequential generation tasks would help Inseq users in selecting more principled approaches for their analysis.
Copying from the original paper's abstract:
Saliency methods are a common class of machine learning interpretability techniques that calculate how important each input feature is to a modelโs output. We find that, with the rapid pace of development, users struggle to stay informed of the strengths and limitations of new methods and, thus, choose methods for unprincipled reasons (e.g., popularity). Moreover, despite a corresponding rise in evaluation metrics, existing approaches assume universal desiderata for saliency methods (e.g., faithfulness) that do not account for diverse user needs. In response, we introduce saliency cards: structured documentation of how saliency methods operate and their performance across a battery of evaluative metrics.
Introducing ad-hoc cards in Inseq should be preferable to contributing to the original saliency cards repository, since 1) they will be more easily used and improved by the Inseq community and 2) the original authors focus solely on vision-centric applications.
The following sections are relevant for the integration of saliency cards into Inseq:
Determinism: Determinism measures if a saliency method will always produce the same saliency map given a particular input, label, and model.
Hyperparameter Dependence: Hyperparameter dependence measures a saliency methodโs sensitivity to user-specified parameters. By documenting a methodโs hyperparameter dependence, saliency cards inform users of consequential parameters and how to set them appropriately.
Model Agnosticism: Model agnosticism measures how much access to the model a saliency method requires. Since several future methods need access to specific modules (see #173 for example), this part could document which parameters will need to be defined in the ModelConfig class before usage.
Computational Efficiency: Computational efficiency measures how computationally intensive it is to produce the saliency map. Using the same models, we could report unified benchmarks across different methods (and different parameterizations, in some cases).
Semantic Directness: Saliency methods abstract different aspects of model behavior, and semantic directness represents the complexity of this abstraction (i.e. what the reported scores correspond to). For example, discussing the difference between salience and sensitivity for raw gradients vs. input x gradient (see Appendix B of Geva et al. 2023)
(Added) Granularity: Specifying the granularity of the scores returned by the attribution method (e.g. raw gradient attribution returns one score per hidden dimension of the model embeddings, corresponding to the gradient with respect to the attributed_fn propagated through the model).
(Added) Target dependence: Specifying whether the method relies on model final predictions to derive importance scores, or whether these are extracted from model internal processes (e.g. for raw attention weights).
The Sensitivity Testing and Perceptibility Testing sections describe empirical measurements of minimality/robustness rather than inherent properties of methods. As such, they should be added only in the presence of a reproducible study using Inseq to compare different methods.
The following is a non-exhaustive list of attention-based feature attribution methods that could be added to the library:
Add an optional baselines field to the attribute method of AttributionModel. If not specified, baselines takes a default value of None and preserves the default behavior of using UNK tokens as a "no-information" baseline for attribution methods requiring one (e.g. integrated gradients, deeplift). The argument can take one of the following values:
- str: The baseline is an alternative text. In this case, the text needs to be encoded and embedded inside FeatureAttribution.prepare to fill the baseline_ids and baseline_embeds fields of the Batch class. For now, only strings matching the original input length after tokenization are supported.
- sequence(int): The baseline is a list of input ids. In this case, we embed the ids as described above. Again, the length must match the original input ids length.
- torch.tensor: We would be interested in passing baseline embeddings explicitly, e.g. to allow for baselines not matching the original input shape that could be derived by averaging embeddings of different spans. In this case, the baseline embeddings field of Batch is populated directly (after checking that the shape is consistent with input embeddings) and the baseline ids field will be populated with some special id (e.g. -1) to mark that the ids were not provided. Important: This modality should raise a ValueError if used in combination with a layer method, since layer methods that require a baseline use baseline ids explicitly as inputs for the forward_func used for attribution instead of baseline embeddings.
- tuple of previous types: If we want to specify both source and target baselines when using attribute_target=True, the input will be a tuple of one of the previous types. The same procedure will be applied separately to define source and target baselines, except for the encoding, which will require the tokenizer.as_target_tokenizer() context manager to encode strings.
- list or tuple of lists of previous types: When multiple baselines are specified, we return the expected attribution score (i.e. average, assuming normality) by computing attributions for all available baselines and averaging the final results. See Section 2.2 of Erion et al. 2020 for more details.
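The multiple-baseline behavior can be sketched abstractly as follows (`attribute_fn` is a hypothetical single-baseline attribution call, and the toy attribution at the end exists only to exercise the averaging):

```python
# Attribute once per baseline and average the results to estimate the
# expected attribution, as in expected gradients (Erion et al. 2020).
def expected_attribution(inputs, baselines, attribute_fn):
    per_baseline = [attribute_fn(inputs, b) for b in baselines]
    n = len(per_baseline)
    return [sum(scores) / n for scores in zip(*per_baseline)]

# Toy attribution: difference between input and baseline token scores.
toy = lambda inp, base: [i - b for i, b in zip(inp, base)]
avg = expected_attribution([1.0, 2.0], baselines=[[0.0, 0.0], [1.0, 1.0]], attribute_fn=toy)
assert avg == [0.5, 1.5]
```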
When working on minimal pairs, we might be interested in defining the contribution of specific words in the source or the target prefix not only in absolute terms using a "no-information" baseline, but as the relative effect between the words composing the pair. Adding the possibility of using a custom baseline would enable this type of comparison.
It will be important to validate whether the hooked method makes use of a baseline via the `use_baseline` attribute, raising a warning that the value of the custom input baseline will be ignored otherwise.
Since baselines will support all input types (str, ids, embeds), it would be the right time to enable such support for the input of the attribute function. This could be achieved by an extra `attribution_input` field set to `None` by default that will substitute `input_texts` in the call to `prepare_and_attribute`, and be set to `input_texts` if not specified.
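The dispatch over baseline modalities described above could be sketched as follows. This is a minimal illustration under stated assumptions: `resolve_baseline` and its signature are hypothetical (not part of the inseq API), and the string modality is omitted since it requires the model tokenizer.

```python
from typing import Sequence, Union

import torch


def resolve_baseline(
    baseline: Union[Sequence[int], torch.Tensor, None],
    input_ids: torch.Tensor,
    embed_fn,
):
    """Hypothetical dispatch over the baseline modalities described above.

    Returns a (baseline_ids, baseline_embeds) pair.
    """
    if baseline is None:
        # Preserve the default UNK-based "no-information" baseline.
        return None, None
    if isinstance(baseline, torch.Tensor) and baseline.dim() == 3:
        # Embeddings passed directly: check shape consistency with the input
        # embeddings, then fill the ids field with the sentinel value -1.
        assert baseline.shape[-1] == embed_fn(input_ids).shape[-1]
        ids = torch.full(baseline.shape[:-1], -1, dtype=torch.long)
        return ids, baseline
    # A sequence of input ids: embed them, enforcing matching length.
    ids = torch.as_tensor(baseline).unsqueeze(0)
    assert ids.shape == input_ids.shape, "Baseline length must match inputs"
    return ids, embed_fn(ids)
```

A layer-method check raising `ValueError` for the tensor modality would slot naturally into the same function.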
Adding support for decoder-only models like GPT-2 on top of the AttributionModel abstraction. The change will involve a radical refactoring of the whole attribution pipeline to enable target-only attribution and passing `Batch` objects instead of `EncoderDecoderBatch` when decoder-only attribution is performed.
The output attribution classes would mostly stay the same, with the exception of source attributions becoming optional.
How do I programmatically extract the per-token scores to have them in a list or dictionary, mapped to each token?
I understand how to show the scores per token visually, but I don't know how to extract them from the `out` object for further downstream processing.
model = inseq.load_model("google/flan-t5-base", attribution_method="discretized_integrated_gradients")
out = model.attribute(
input_texts=["We were attacked by hackers. Was there a cyber attack?", "We were not attacked by hackers. Was there a cyber attack?"],
)
out.sequence_attributions[0]
out.show()
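One way to get a plain token-to-score mapping is to work with the 2D attribution matrix directly. The snippet below mocks the aggregated [source_len x target_len] matrix with random values to stay self-contained; with inseq, a comparable matrix can be obtained from the sequence attribution object (e.g. via its aggregation helpers, whose exact names may vary across versions):

```python
import torch

# Mocked aggregated attribution matrix [source_len x target_len]; with inseq
# this would come from the FeatureAttributionSequenceOutput object instead.
source_tokens = ["▁We", "▁were", "▁attacked", "▁by", "▁hackers"]
target_tokens = ["▁Yes", "</s>"]
attr = torch.rand(len(source_tokens), len(target_tokens))

# Nested dict: target token -> {source token -> attribution score}.
# For real text, key on (position, token) pairs to avoid collisions
# between repeated tokens.
per_token_scores = {
    tgt: {src: attr[i, j].item() for i, src in enumerate(source_tokens)}
    for j, tgt in enumerate(target_tokens)
}
```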
Track here minimal fixes needed for v0.3:
- Bump `transformers` to 4.22.0 in `pyproject.toml`, required for M1 support and the new target tokenization API
- Fix the `tqdm` progress bar finishing at N-1 in `attribute`.
The ALTI+ method is an extension of ALTI for encoder-decoder (and by extension, decoder-only) models.
Authors: @gegallego @javiferran
Implementation notes:
The current implementation extracts input features for key, query and value projections and computes intermediate steps using the Kobayashi refactoring to obtain the transformed vectors used in the final ALTI computation.
The computation of attention layer outputs is carried out up to the resultant (i.e. the actual output of the attention layer) in order to check that the result matches the original output of the attention layer forward pass. This is only done for sanity-checking purposes, but it is not especially heavy from a computational perspective, so it can be preserved (e.g. raise an error if the outputs don't match, signaling that the model may not be supported).
Focusing on GPT-2 as an example model, the per-head attention weights and outputs (i.e. matmul of weights and value vectors) are returned here so they can be extracted with a hook and used to compute the transformed vectors needed for ALTI.
Pre- and post-layer norm models are handled differently because the transformed vectors are the final outputs of the attention block, regardless of the position of the layer norm (it needs to be included in any case). In the Kobayashi decomposition of the attention layer, the bias component needs to be kept separate both for the layer norm and the output projections, so we need to check whether this is possible out of the box or whether it needs to be computed in an ad-hoc hook.
If we are interested in the output vectors before the bias is added, we can extract the bias vector alongside the output of the attention module and subtract the former from the latter.
For aggregating ALTI+ scores in order to obtain overall importance we will use the extended rollout implementation that is currently being developed in #173.
Reference implementation mt-upc/transformer-contributions-nmt
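As an illustration of the hook-based extraction mentioned above, per-module attention outputs can be captured with a standard PyTorch forward hook. The sketch below uses a generic `MultiheadAttention` module for self-containedness; for GPT-2 the hook would be registered on the model's attention blocks, whose outputs include the per-head weights when `output_attentions=True`.

```python
import torch

# Storage for tensors captured during the forward pass.
captured = {}


def make_hook(name):
    def hook(module, inputs, output):
        # Store whatever the attention module returns; for HF GPT-2 with
        # output_attentions=True this tuple includes the attention weights.
        captured[name] = output
    return hook


attn = torch.nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
handle = attn.register_forward_hook(make_hook("layer0"))

x = torch.rand(1, 4, 8)  # [batch, seq_len, embed_dim]
out, weights = attn(x, x, x, need_weights=True)
handle.remove()  # always detach hooks after extraction
```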
Even after updating to the newest pytorch version 1.13.1 several issues with the mps-backend still remain when it is enabled in the code. There still seems to be some inconsistency across the different devices depending on the operations that are run, as can be seen below.
The goal of this issue is primarily to collect and highlight these problems.
Steps to reproduce the behavior:
1. Open `inseq/utils/torch_utils` and change `cpu` to `mps` in line 229 to enable the MPS backend
2. Run `make fast-test` to run the tests
Running the tests this way generates the following error report:
========================================================================================== short test summary info ===========================================================================================
FAILED tests/attr/feat/test_feature_attribution.py::test_mcd_weighted_attribution - NotImplementedError: The operator 'aten::remainder.Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
FAILED tests/models/test_huggingface_model.py::test_attribute_slice_seq2seq - RuntimeError: shape '[2, 1]' is invalid for input of size 1
FAILED tests/models/test_huggingface_model.py::test_attribute_decoder - NotImplementedError: The operator 'aten::cumsum.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
==================================================================== 3 failed, 25 passed, 442 deselected, 6 warnings in 76.36s (0:01:16) =====================================================================
When run with the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1
set, the following errors still occur:
========================================================================================== short test summary info ===========================================================================================
FAILED tests/models/test_huggingface_model.py::test_attribute_slice_seq2seq - RuntimeError: shape '[2, 1]' is invalid for input of size 1
FAILED tests/models/test_huggingface_model.py::test_attribute_decoder - AssertionError: assert 26 == 27
==================================================================== 2 failed, 26 passed, 442 deselected, 6 warnings in 113.36s (0:01:53) ====================================================================
These errors do not occur when running the tests on other backends, implying that there is still some inconsistency between mps and the other torch backends.
All tests should run consistently across all torch backends.
As for TransformerLensOrg/TransformerLens#164, this project uses Patrick Kidger's `torchtyping` to annotate tensor types, including shapes. However, `jaxtyping` is recommended for newer projects (it is not JAX-specific, better maintained, and more compatible with type checkers).
The following is a non-exhaustive list of gradient-based feature attribution methods that could be added to the library:
| Method name | Source | In Captum | Code implementation | Status |
|---|---|---|---|---|
| DeepLiftSHAP | - | ✓ | pytorch/captum | |
| GradientSHAP¹ | Lundberg and Lee '17 | ✓ | pytorch/captum | |
| Guided Backprop | Springenberg et al. '15 | ✓ | pytorch/captum | |
| LRP² | Bach et al. '15 | ✓ | pytorch/captum | |
| Guided Integrated Gradients | Kapishnikov et al. '21 | | PAIR-code/saliency | |
| Projected Gradient Descent (PGD)³ | Madry et al. '18, Yin et al. '22 | | uclanlp/NLP-Interpretation-Faithfulness | |
| Sequential Integrated Gradients | Enguehard '23 | | josephenguehard/time_interpret | |
| Greedy PIG⁴ | Axiotis et al. '23 | | | |
| AttnLRP | Achtibat et al. '24 | | rachtibat/LRP-for-Transformers | |
Notes:
The `requirements.txt` pins `typeguard==3.0.1`, but `torchtyping` is incompatible with typeguard versions >3.0 (https://github.com/patrick-kidger/torchtyping#installation).
For me this led to issues with importing inseq, raising the following import error:
ImportError: cannot import name 'LiteralString' from 'typing_extensions'
However, it turned out the issue did not stem from the `typing_extensions` library, but from the `typeguard` version: once I had pinned it to version `2.13.3` the error disappeared.
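A minimal workaround, assuming the pin is applied in the user environment rather than in `pyproject.toml`:

```shell
# Pin typeguard below 3.0, as required by torchtyping
pip install "typeguard>=2.13,<3.0"
```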
🤗 Transformers v4.26.0 introduces the `compute_transition_scores` function to simplify the return of generation log probabilities. Example taken from the docs link above:
from transformers import GPT2Tokenizer, AutoModelForCausalLM
import numpy as np
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token_id = tokenizer.eos_token_id
inputs = tokenizer(["Today is"], return_tensors="pt")
# Example 1: Print the scores for each token generated with Greedy Search
outputs = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True)
transition_scores = model.compute_transition_scores(
outputs.sequences, outputs.scores, normalize_logits=True
)
input_length = inputs.input_ids.shape[1]
generated_tokens = outputs.sequences[:, input_length:]
for tok, score in zip(generated_tokens[0], transition_scores[0]):
# | token | token string | logits | probability
print(f"| {tok:5d} | {tokenizer.decode(tok):8s} | {score.numpy():.3f} | {np.exp(score.numpy()):.2%}")
# Output
| 262 | the | -1.414 | 24.33%
| 1110 | day | -2.609 | 7.36%
| 618 | when | -2.010 | 13.40%
| 356 | we | -1.859 | 15.58%
| 460 | can | -2.508 | 8.14%
We want to use `compute_transition_scores` to calculate the probability step score in Inseq, and all derived scores, to ensure continued compatibility with Transformers.
In principle, `inseq` should work with the `DistributedBloomModel` implemented in the `petals` package to perform feature attribution of the 176B BLOOM model in a distributed setup. However, some compatibility issues currently undermine the interoperability of the two libraries.
Refer to bigscience-workshop/petals#178 for additional details.
Allow users to extract the probabilities associated with each generated token at every attribution step. This behavior can be controlled by a parameter `output_probabilities: bool = False` passed to the `attribute` function.
Uncertainty features such as generation probabilities have proven useful in many cases to estimate the quality of the generated sentence (see Fomicheva et al., 2020 for an example of QE in NMT).
The Huggingface library does not support the extraction of token-by-token probabilities from the `generate` method at the moment.
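In the meantime, per-step probabilities can be recovered manually from the raw `scores` returned by `generate(..., output_scores=True, return_dict_in_generate=True)`. The snippet below mocks `scores` with random logits (vocabulary size 5, three greedy steps) to keep it self-contained:

```python
import torch

# Mocked generate() output: one logits tensor of shape [batch, vocab] per
# generated step, as found in outputs.scores.
scores = tuple(torch.randn(1, 5) for _ in range(3))
generated_ids = [int(s.argmax(dim=-1)) for s in scores]  # greedy choices

# Softmax each step's logits and pick the probability of the chosen token.
step_probs = [torch.softmax(s, dim=-1) for s in scores]
token_probs = [step_probs[i][0, tok].item() for i, tok in enumerate(generated_ids)]
```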
Attribution patterns differ when more than one example is attributed at the same time, even for deterministic methods (e.g. IG).
Attribute the same sentence twice using any attribution model and method, once as the only attributed text and once alongside other sentences. The resulting attributions differ, even though the model's predictions are the same.
import inseq
model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "integrated_gradients")
single_out = model.attribute("This is an example sentence")
multi_out = model.attribute(["This is an example sentence", "This is another example"])
assert single_out.attributions == multi_out[0].attributions # raises AssertionError
The attributions must be the same regardless of the number of examples that are attributed at once.
The question is whether this is a bug or a methodological error due to the architecture of the seq2seq models. A first step would be to start from the BERT tutorial in Captum and change the methods to accept multiple examples in input. If the attribution patterns are consistent, in principle the problem is somewhere in the cross-attention mechanism.
Accessing the visualization (via .show()
) currently requires jupyter. It would be nice to have an option to export it as an image from the console.
With `out` being a `FeatureAttributionOutput`, `html = out.show(return_html=True)` returns an error:
AttributeError: return_html=True can be used only inside an IPython environment.
If the HTML content is returned without error, one can use e.g. imgkit to create images.
Use the newly introduced PySvelte library and the Svelte framework to deduplicate HTML visualization from the main body of the library.
Update: Provided the new capabilities of Gradio v3.0 and the introduction of support for custom components with Gradio Blocks (using Svelte as frontend) and tabbed interfaces enabling multi-visualization widgets, Gradio becomes the most simple and interesting choice for Inseq visualizations.
The following is a non-exhaustive list of evaluation metrics for attribution methods that could be added to the library:
| Method name | Source | Code implementation | Status |
|---|---|---|---|
| Sensitivity | Yeh et al. '19 | pytorch/captum | |
| Infidelity | Yeh et al. '19 | pytorch/captum | |
| Log Odds | Shrikumar et al. '17 | INK-USC/DIG | |
| Sufficiency | DeYoung et al. '20 | INK-USC/DIG | |
| Comprehensiveness | DeYoung et al. '20 | INK-USC/DIG | |
| Human Agreement | Atanasova et al. '20 | copenlu/xai-benchmark | |
| Confidence Indication | Atanasova et al. '20 | copenlu/xai-benchmark | |
| Cross-Model Rationale Consistency | Atanasova et al. '20 | copenlu/xai-benchmark | |
| Cross-Example Rationale Consistency (Dataset Consistency) | Atanasova et al. '20 | copenlu/xai-benchmark | |
| Sensitivity | Yin et al. '22 | uclanlp/NLP-Interpretation-Faithfulness | |
| Stability | Yin et al. '22 | uclanlp/NLP-Interpretation-Faithfulness | |
Notes:
The Log Odds metric is just the negative logarithm of the Comprehensiveness metric. The application of - log can be controlled by a parameter do_log_odds: bool = False
in the same function. The reciprocal can be obtained for the Sufficiency metric.
All metrics that control masking/dropping a portion of the inputs via a `top_k` parameter can benefit from a recursive application to ensure the masking of the most salient tokens at all times, as described in Madsen et al. '21. This could be captured by a parameter `recursive_steps: Optional[int] = None`. If specified, a masking of size `top_k // recursive_steps + int(top_k % recursive_steps > 0)` is performed `recursive_steps` times, with the last step having size equal to `top_k % recursive_steps` if `top_k % recursive_steps > 0`.
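A hypothetical helper realizing such a schedule could look as follows. Note this is an illustrative variant, not the proposal verbatim: the remainder is spread over the first rounds so that the round sizes always sum exactly to `top_k`.

```python
def masking_sizes(top_k: int, recursive_steps: int) -> list:
    """Split a top_k masking budget into recursive_steps masking rounds."""
    base = top_k // recursive_steps
    rem = top_k % recursive_steps
    # The first `rem` rounds mask one extra token each, so the total is top_k.
    return [base + (1 if i < rem else 0) for i in range(recursive_steps)]
```

For example, `masking_sizes(10, 3)` yields `[4, 3, 3]`.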
The Sensitivity and Infidelity methods add noise to input embeddings, which could produce unrealistic input embeddings for the model (see discussion in Sanyal et al. '21). Both Sensitivity and Infidelity can include an additional parameter `discretize: bool = False` that, when turned on, replaces the top-k inputs with their nearest neighbors in the vocabulary embedding space instead of their noised versions; a parameter `sample_topk_neighbors: int = 1` can be used to control the nearest neighbors' pool size used for replacement. Using Stability is more principled in this context since fluency is preserved by the two-step procedure presented by Alzantot et al. '18, which includes a language modeling component.
Sensitivity by Yin et al. '22 is an adaptation to the NLP domain of Sensitivity-n by Yeh et al. '19. An important difference is that the norm of the noise vector causing the prediction to flip is used as a metric in Yin et al. '22, while the original Sensitivity in Captum uses the difference between original and noised prediction scores. The first should be prioritized for implementation.
Cross-Lingual Faithfulness by Zaman and Belinkov '22 (code) is a special case of the Dataset Consistency metric by Atanasova et al. 2020 in which the pair is constituted by an example and its translated variant.
A Comparative Study of Faithfulness Metrics for Model Interpretability Methods, Chan et al. '22
Allow users to perform feature attribution on the target prefix. The behavior is controlled by a new `attribute_target: bool = False` parameter passed to the `AttributionModel.attribute` method.
Attributing only the source is reductive, since the influence of the target prefix is often fundamental in determining the outcome of the next generation step (e.g. a prefix `Ladies and` will strongly bias the next token towards `Gentlemen`, regardless of the source sequence).
The output tensor `.source_attributions` in `FeatureAttributionSequenceOutput` is of type float64 when using the "integrated_gradients" method, rather than the expected float32.
Steps to reproduce the behavior:
import inseq
model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "integrated_gradients")
out = model.attribute(
"The developer argued with the designer because her idea cannot be implemented.",
n_steps=100
)
print(out.sequence_attributions[0].source_attributions.dtype)
Python 3.8.16
The dtype should be float32.
Other methods ("saliency", "input_x_gradient", "deeplift") return float32.
Interestingly, "discretized_integrated_gradients" also returns float32, but "layer_integrated_gradients" returns float64.
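One plausible source of the promotion (an illustration, not a confirmed diagnosis of the Captum internals): mixing a float64 tensor, such as interpolation steps built from double-precision Python floats, into float32 gradient tensors silently upcasts the result under torch's type promotion rules.

```python
import torch

alphas = torch.linspace(0, 1, steps=50, dtype=torch.float64)  # float64 steps
grads = torch.rand(50, 4)                                     # float32 grads

# The product is promoted to float64, and the upcast propagates to the result.
attr = (alphas.unsqueeze(-1) * grads).sum(dim=0)
downcast = attr.to(torch.float32)  # explicit downcast as a workaround
```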
`FeatureAttributionSequenceOutput.show()` raises an error in console mode due to `TokenWithId` having replaced `str` tokens.
Loading a `FeatureAttributionOutput` object with `load()` should instantiate default `AggregableMixin` attributes to guarantee that `show()` will work out of the box after loading.
`format_input_texts` in `AttributionModel` can be moved to `utils/misc`, since it does not require `self`.
The `__repr__` of the `TensorWrapper` and `FeatureAttributionOutput` classes should direct to the prettified `__str__` representation by default.
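The change amounts to delegating `__repr__` to `__str__`, sketched here on a stand-in class (not the actual inseq classes):

```python
class PrettyRepr:
    """Stand-in showing the proposed delegation for TensorWrapper-like classes."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        # Prettified, human-readable representation.
        return f"PrettyRepr with {len(self.data)} fields"

    # Make the console/debugger repr use the prettified form by default.
    __repr__ = __str__


obj = PrettyRepr({"a": 1, "b": 2})
```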
The CLI in `__main__` is not working; it should be adapted to the updated library and refactored so that `inseq attribute text [PARAMS]` calls the normal attribution function. A future `inseq attribute file [PARAMS]` command will be added to directly attribute sentences from a file. Create a separate `commands` folder in the library to group those.