inseq-team / inseq
Interpretability for sequence generation models
Home Page: https://inseq.org
License: Apache License 2.0
If the text used as input to the attribute method contains spaces before non-alphanumeric characters, the tokenizer's decoded output no longer matches the input, leading to the assertion error.
Give as input a text containing a special character (e.g. . or ?) preceded by a white space while using a GPT-like model (and tokenizer).
Steps to reproduce the behavior:
import inseq
model = inseq.load_model('gpt2', attribution_method='input_x_gradient') # Or any other gpt-like model
model.attribute(input_texts='Hello . This is an example')
Returns the following error: AssertionError: Forced generations with decoder-only models must start with the input texts.
Python 3.9.15
0.4.0
The decoded text output from the tokenizer should be identical to the input text, allowing the assert to be correctly verified:
assert all(
generated_texts[idx].startswith(input_texts[idx]) for idx in range(len(input_texts))
), "Forced generations with decoder-only models must start with the input texts."
The issue is related to the type of tokenizer used, already reported in huggingface/transformers#21119. To solve the problem, it is recommended to use the clean_up_tokenization_spaces=False
flag when decoding the text.
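The effect of the cleanup step can be sketched with a minimal mock (this is not the actual tokenizer internals, just an illustration of why the `startswith` assertion fails):

```python
# Hypothetical sketch: mimic the cleanup step that joins spaces before
# punctuation during decoding, which is what breaks the assertion.
def cleanup_spaces(text: str) -> str:
    # Simplified version of the cleanup transformers applies by default.
    for punct in [" .", " ?", " !", " ,"]:
        text = text.replace(punct, punct.strip())
    return text

def decode(tokens: list[str], clean_up_tokenization_spaces: bool = True) -> str:
    text = "".join(tokens)
    return cleanup_spaces(text) if clean_up_tokenization_spaces else text

tokens = ["Hello", " .", " This", " is", " an", " example"]
original = "Hello . This is an example"

# Default cleanup alters the text, so the startswith assert fails:
assert decode(tokens) != original
# Disabling the cleanup preserves the input verbatim:
assert decode(tokens, clean_up_tokenization_spaces=False) == original
```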
When loading a model with an already instantiated fast tokenizer from Huggingface, the following error is thrown:
HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'MBart50TokenizerFast(name_or_path='facebook/mbart-large-50-many-to-many-mmt', vocab_size=250054, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '', 'eos_token': '', 'unk_token': '', 'sep_token': '', 'pad_token': '', 'cls_token': '', 'mask_token': '', 'additional_special_tokens': ['ar_AR', 'cs_CZ', 'de_DE', 'en_XX', 'es_XX', 'et_EE', 'fi_FI', 'fr_XX', 'gu_IN', 'hi_IN', 'it_IT', 'ja_XX', 'kk_KZ', 'ko_KR', 'lt_LT', 'lv_LV', 'my_MM', 'ne_NP', 'nl_XX', 'ro_RO', 'ru_RU', 'si_LK', 'tr_TR', 'vi_VN', 'zh_CN', 'af_ZA', 'az_AZ', 'bn_IN', 'fa_IR', 'he_IL', 'hr_HR', 'id_ID', 'ka_GE', 'km_KH', 'mk_MK', 'ml_IN', 'mn_MN', 'mr_IN', 'pl_PL', 'ps_AF', 'pt_XX', 'sv_SE', 'sw_KE', 'ta_IN', 'te_IN', 'th_TH', 'tl_XX', 'uk_UA', 'ur_PK', 'xh_ZA', 'gl_ES', 'sl_SI']}, clean_up_tokenization_spaces=True)'. Use 'repo_type' argument if needed.
Steps to reproduce the behavior:
import inseq
from transformers import (MBartForConditionalGeneration, MBart50TokenizerFast)
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
de_tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt", src_lang="de_DE", tgt_lang="ko_KR")
attr_model = inseq.load_model(model, "attention", tokenizer=de_tokenizer)
OS: macOS
Python version: 3.10.9
Inseq version: 0.4.0
Providing pretrained fast-tokenizers should be supported.
The issue seems to originate in huggingface_model.py, where several type definitions are set as transformers.PreTrainedTokenizer instead of transformers.PreTrainedTokenizerBase, effectively disallowing any fast tokenizer. Line 112 in particular seems to be the issue; changing the type definitions solves it.
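The proposed broadening can be sketched with stub classes that mirror the transformers hierarchy (the real fix simply swaps the annotation; everything below is illustrative, not the actual inseq code):

```python
# Stub classes mirroring the transformers tokenizer hierarchy.
class PreTrainedTokenizerBase: ...          # common ancestor
class PreTrainedTokenizer(PreTrainedTokenizerBase): ...      # "slow"
class PreTrainedTokenizerFast(PreTrainedTokenizerBase): ...  # "fast"

def load_tokenizer(tokenizer) -> PreTrainedTokenizerBase:
    # Checking against the base class admits fast tokenizers too;
    # the narrower isinstance(tokenizer, PreTrainedTokenizer) would not.
    if isinstance(tokenizer, PreTrainedTokenizerBase):
        return tokenizer
    raise TypeError("expected a tokenizer instance or a repo id string")

fast = PreTrainedTokenizerFast()
assert load_tokenizer(fast) is fast  # would fail with the narrower check
```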
The following is a non-exhaustive list of perturbation-based feature attribution methods that could be added to the library:
How can I visualize the attention weights for a decoder-only model like Pythia for a given input prompt?
I went over the tutorial here, which uses an encoder-decoder model, and wanted to try this out for a decoder-only model.
I tried simply replacing the model name, but it does not seem to work:
model = inseq.load_model("EleutherAI/pythia-70m-deduped", "input_x_gradient")
out = model.attribute(
input_texts="Hello everyone, hope you're enjoying the tutorial!",
attribute_target=True,
method="attention"
)
# out[0] is a shortcut for out.sequence_attributions[0]
out.sequence_attributions[0].source_attributions.shape
but I get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-5-6df0e921faca> in <cell line: 11>()
      9 )
     10 # out[0] is a shortcut for out.sequence_attributions[0]
---> 11 out.sequence_attributions[0].source_attributions.shape
AttributeError: 'NoneType' object has no attribute 'shape'
Strangely, however, I can still inspect the outputs using:
out.sequence_attributions[0]._aggregator
out.show()
Is this the intended functioning?
Also, I would love some help interpreting the generated plot.
I'm confused about why there are some full rows and some rows with certain values masked, and what exactly a cell signifies. I know this might be a trivial thing :(
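One likely explanation for the masked cells (a general fact about decoder-only attention, not inseq-specific behavior): causal models only attend to past positions, so the attention matrix is lower-triangular. A tiny sketch of that mask:

```python
# Decoder-only models use a causal mask: position i can only attend to
# positions j <= i. Masked cells in an attention plot are future
# positions, for which no weight is defined.
def causal_mask(n: int) -> list[list[bool]]:
    # True = attendable, False = masked (future position)
    return [[j <= i for j in range(n)] for i in range(n)]

mask = causal_mask(4)
assert mask[0] == [True, False, False, False]  # first token sees only itself
assert mask[3] == [True, True, True, True]     # last token sees everything
```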
I get an error when trying to load an FSMTForConditionalGeneration model. It doesn't seem to have a get_decoder function; the decoder attribute should be used instead. In this thread, we can also collect other seq2seq and decoder-only models from HuggingFace that may not be integrated properly in Inseq.
Steps to reproduce the behavior:
I have only tested it on the fix-macos-issues
branch, but I suspect it is a more general problem.
Loading a FSMT model leads to an AttributeError: 'FSMTForConditionalGeneration' object has no attribute 'get_decoder'
model = inseq.load_model("facebook/wmt19-en-de", "integrated_gradients")
out = model.attribute(
"The developer argued with the designer because her idea cannot be implemented.",
n_steps=100
)
out.show()
Inseq should load the FSMT model without any problems.
I'm trying to generate multiple attributions with a large LM (10B+ params) on a dataset of 2000+ sentences and no constrained decoding.
Apparently, the generate step in the pipeline crashes with CUDA OOM no matter the batch_size I set (even with batch_size=1). The generation itself does not seem to be batched, since the attribution goes smoothly if I pass a smaller set of texts.
Steps to reproduce the behavior:
model = inseq.load_model(
args.model_name_or_path, # 10B+ model
"integrated_gradients",
load_in_8bit=True,
device_map="auto",
)
out = model.attribute(
input_texts=texts, # 2000+ sentences
n_steps=50,
return_convergence_delta=True,
step_scores=["probability"],
batch_size=1,
)
Whereas if I do
model = inseq.load_model(
args.model_name_or_path,
"integrated_gradients",
load_in_8bit=True,
device_map="auto",
)
# raise NotImplementedError()
n_batches = len(texts) // args.batch_size
print("Splitting texts into n batches", n_batches)
batches = np.array_split(texts, n_batches)
for batch in tqdm(batches, desc="Batch", total=len(batches)):
out = model.attribute(
input_texts=batch.tolist(),
n_steps=50,
return_convergence_delta=True,
step_scores=["probability"],
batch_size=len(batch),
internal_batch_size=len(batch),
generation_args=asdict(generation_args),
show_progress=True,
)
it all seems to work.
When computing attributions for a list of sentences, the tqdm iterator prints progress per token, which gives no insight into how far along you are in the corpus of sentences being attributed. I would suggest that, when attributing over a list of strings, tqdm iterates per sentence and drops the per-token iteration.
Happy to have a go at this if you agree this could be nice.
Next to tqdm, it could be helpful to get some runtime report.
After generating many explanations, this could give insight about the average computation time needed for one example.
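A minimal sketch of both suggestions together, in plain Python without the tqdm dependency (`attribute_one` is a purely hypothetical stand-in for a single-sentence attribution call):

```python
import time

# Drive progress per sentence rather than per token, and collect
# per-example runtimes so an average can be reported at the end.
def attribute_corpus(texts, attribute_one):
    runtimes = []
    for i, text in enumerate(texts, start=1):
        start = time.perf_counter()
        attribute_one(text)
        runtimes.append(time.perf_counter() - start)
        print(f"[{i}/{len(texts)}] done")
    return sum(runtimes) / len(runtimes)  # average seconds per example

avg = attribute_corpus(["a", "b", "c"], attribute_one=lambda t: t.upper())
assert avg >= 0.0
```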
Adding this new class to become the default output for the AttributionModel.attribute method. This will entail the following naming changes:
- FeatureAttributionStepOutput --> FeatureAttributionRawStepOutput
- FeatureAttributionOutput --> FeatureAttributionStepOutput
- A new FeatureAttributionOutput, replacing both OneOrMoreFeatureAttributionSequenceOutputs and OneOrMoreFeatureAttributionSequenceOutputsWithStepOutputs.
Advantages:
Initial formulation:
@dataclass
class FeatureAttributionOutput:
"""
Output produced by the `AttributionModel.attribute` method.
Attributes:
sequence_attributions (list of :class:`~inseq.data.FeatureAttributionSequenceOutput`): List
containing all attributions performed on input sentences (one per input sentence, including
source and optionally target-side attribution).
step_attributions (list of :class:`~inseq.data.FeatureAttributionStepOutput`, optional): List
containing all step attributions (one per generation step performed on the batch), returned if
`output_step_attributions=True`.
info (dict with str keys and str values): Dictionary including all available parameters used to
perform the attribution.
"""
- save_attributions and load_attributions inside the class, removing the global methods.
- show method calling the one in every FeatureAttributionSequenceOutput and concatenating outputs if return_html=True.
- join method allowing to extend the sequence_attributions and step_attributions lists if info matches, raising a ValueError: attributions produced under different settings cannot be combined error otherwise.

Hi Inseq team, thanks again for your contribution.
I just noticed a bug when using 'contrast_prob_diff' - output attributions seem to be reversed.
It outputs the attribution of false_answer but not true_answer.
What I did is:
use contrast_prob_diff, and the output attribute for 'morning' and 'evening' is:
use probability (where the variable target_false is useless), and the output attribute for 'morning' and 'evening' is:
The code for the attribute part is: (top: contrast_prob_diff; bottom: probability)
This issue addresses the high space requirements of large attribution scores tensors by adding a scores_precision
parameter to FeatureAttributionOutput.save
method.
Proposant: @g8a9
Currently, tensors in FeatureAttributionOutput
objects (attributions and step scores) are serialized in float32
precision as a default when using out.save()
. While it is possible to compress the representation of these values with ndarray_compact=True
, the resulting JSON files are usually quite large. Using more parsimonious data types could reduce the size of saved objects and facilitate systematic analyses leveraging large amounts of data.
float32
precision should probably remain the default behavior, as we do not want to cause any information loss by default.
float16
and float8
should also be considered, both in the signed and unsigned variants, since leveraging the strictly positive nature of some score types would allow supporting greater precision while halving space requirements. Unsigned values will be used as defaults if no negative scores are present in a tensor.
float16
can be easily used by casting tensors to the native torch.float16
data type, which would preserve precision up to 4 decimal values for scores normalized in the [-1;1] interval (8 for unsigned tensors). This corresponds to 2 or 4 decimal places for float8
. However, this data type is not supported natively in Pytorch, so tensors should be converted to torch.int8
and torch.uint8
instead and transformed in floats upon reloading the object.
Hello, thanks for your contribution! The tool is really helpful and beautiful.
However, when I try to use get_scores_dicts() to output the attributions, I get an 'index out of bounds' error.
I printed out the aggr and it seems to be a bug:
import inseq
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
import pandas as pd
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
data = [
{
'prompt': 'people usually take shower in the',
'target_true': ' morning',
'target_false': ' evening',
}]
for layer in range(24):
print('layer:', str(layer))
attrib_model = inseq.load_model(
model,
"layer_gradient_x_activation",
tokenizer="bigscience/bloom-560m",
target_layer=model.transformer.h[layer].mlp,
)
for i, ex in enumerate(data):
print(ex)
# e.g. "The capital of Spain is"
#prompt = ex["relation"].format(ex["subject"])
prompt = ex["prompt"]
# e.g. "The capital of Spain is Madrid"
true_answer = prompt + ex["target_true"]
# e.g. "The capital of Spain is Paris"
false_answer = prompt + ex["target_false"]
# Contrastive attribution of true vs false answer
out = attrib_model.attribute(
prompt,
true_answer,
attributed_fn="contrast_prob_diff",
contrast_targets=false_answer,
step_scores=["contrast_prob_diff"],
show_progress=False,
)
out.show()
out.get_scores_dicts()
Adding support for Fairseq models on top of the AttributionModel abstraction, similarly to what was done for 🤗 transformers models.
pytorch/fairseq is a core library for training seq2seq models in PyTorch. Adding support would allow for extended experimentation with state-of-the-art models, especially for NMT.
keyonvafa/sequential-rationales
uses different attribution methods on FairseqEncoderDecoderModel
models and can provide inspiration for an implementation aiming to access the internals of such models.
When attributing a larger dataset on a CUDA device with Inseq, an out-of-memory error occurs regardless of the defined batch_size. I believe this is caused by the call to self.encode in attribution_model.py, lines 345 and 347, which operates on the full inputs instead of a single batch and moves all inputs to the CUDA device after encoding.
Steps to reproduce the behavior:
Call the .attribute() method with any batch_size parameter.
OS: macOS
Python version: 3.10
Inseq version: 0.4.0
The input texts should ideally only be encoded or moved to the GPU once they are actually processed.
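The fix could look something like the lazy batching sketched below (`encode` and `to_device` are hypothetical stand-ins for the tokenizer call and the `.to(device)` move, not actual inseq APIs):

```python
# Encode and move each batch lazily, instead of encoding the full
# corpus up front and moving everything to the device at once.
def iter_encoded_batches(texts, batch_size, encode, to_device):
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Only this batch is encoded and resident on the device at a time.
        yield to_device(encode(batch))

batches = list(iter_encoded_batches(
    ["a", "b", "c"], batch_size=2,
    encode=lambda b: [len(t) for t in b],
    to_device=lambda x: x,
))
assert batches == [[1, 1], [1]]
```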
As long as PyTorch 1.12 is still used (basically until 1.13.1 comes out), the "mps" backend seems too unstable to use, failing several of the tests. Even setting PYTORCH_ENABLE_MPS_FALLBACK=1
in the environment does not fully remove this issue.
Steps to reproduce the behavior:
Run make fast-test (or any other command) on macOS with "mps" support.
Tests should run successfully.
A quickfix would be to set the default device in inseq.utils.torch_utils.py
to "cpu" for mps-environments as well for now, until pytorch 1.13.1 is released.
def get_default_device() -> str:
if is_cuda_available() and is_cuda_built():
return "cuda"
elif is_mps_available() and is_mps_built():
return "cpu"
else:
return "cpu"
I'm not sure if I'm looking in the wrong place, or if it is missing, but I cannot find an implementation of get_post_variable_assignment_hook, which is mentioned in the __all__ of inseq.utils and used in the value-zeroing implementation.
In order to facilitate the evaluation of different interpretability techniques, I propose to identify a set of commonly used datasets from the literature, create 🤗 Datasets loading scripts to have them in a shared format, and host them on the Inseq organization in the Hugging Face hub.
This would provide a shared interface for:
The following table summarizes some of the datasets used in the literature:
Name | Task | Data source | Paper | Description |
---|---|---|---|---|
SCAT | Translation | neulab/contextual-mt | Yin et al. '21 | Contextual coreference in translation, with disambiguating context highlights from translators |
Lambada + Rationales | Language Modeling | keyonvafa/sequential-rationales | Vafa et al. '21 | Next word prediction with human-annotated previous relevant context |
Europarl Gold Alignments | Translation | TBD | TBD | Gold alignments for various language pairs in the Europarl corpus |
The ExNLP Datasets website summarizes various sources available for NLP explainability; we should verify what is relevant to generation.
Inference on a Google Colab GPU is very slow. There is no significant difference whether the model runs on CUDA or CPU.
The following model.attribute(...) code runs for around 33 to 47 seconds on both a Colab CPU and GPU. I tried passing the device to the model, and model.device confirms it is running on CUDA, but it still takes very long to run only 2 sentences. (I don't know the underlying attribution computations well enough to know whether this is expected or whether it should be faster. If it is always this slow, it seems practically infeasible to analyse larger corpora.)
import inseq
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(inseq.list_feature_attribution_methods())
model = inseq.load_model("google/flan-t5-small", attribution_method="discretized_integrated_gradients", device=device)
model.to(device)
out = model.attribute(
input_texts=["We were attacked by hackers. Was there a cyber attack?", "We were not attacked by hackers. Was there a cyber attack?"],
)
model.device
Faster inference with a GPU/cuda
(Thanks btw, for the fix for returning the per-token scores in a dictionary, the new method works well :) )
The merge_attributions
method of the FeatureAttributionOutput
class can be used as a static method, but can also be called on an instance of that class.
In the latter case, the method's output is unintended: the merge is performed only on the outputs in the parameter list, not on the instance calling the method.
Therefore, it would be advisable to move the method out of the class and use it as a utility function.
See above.
Example code used as a demonstration of the method behaviour:
>>> import inseq
>>> seq_model = inseq.load_model('gpt2', attribution_method='input_x_gradient')
>>> out1 = seq_model.attribute(input_texts=['hello world!', 'How are you? '])
>>> out2 = seq_model.attribute(input_texts=['I am going to ', 'My name is '])
Correct usage:
>>> inseq.FeatureAttributionOutput.merge_attributions([out1, out2])
FeatureAttributionOutput({
sequence_attributions: list with 4 elements of type GranularFeatureAttributionSequenceOutput:[
...
Misleading behaviour, leading to the loss of attributions contained in out1:
>>> out1.merge_attributions([out2])
FeatureAttributionOutput({
sequence_attributions: list with 2 elements of type GranularFeatureAttributionSequenceOutput:[
...
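The proposed module-level utility can be sketched as follows, with a minimal stand-in class for FeatureAttributionOutput (names and fields are illustrative, not the actual inseq implementation):

```python
from dataclasses import dataclass, field

# Minimal stand-in for FeatureAttributionOutput.
@dataclass
class Output:
    sequence_attributions: list = field(default_factory=list)
    info: dict = field(default_factory=dict)

# A module-level merge has no implicit `self`, so no operand can be
# silently dropped the way out1 was in the instance-method call above.
def merge_attributions(outputs: list[Output]) -> Output:
    if any(o.info != outputs[0].info for o in outputs):
        raise ValueError("attributions produced under different settings cannot be combined")
    merged = Output(info=dict(outputs[0].info))
    for o in outputs:  # every operand contributes, none is lost
        merged.sequence_attributions.extend(o.sequence_attributions)
    return merged

out1, out2 = Output(["s1", "s2"]), Output(["s3", "s4"])
assert len(merge_attributions([out1, out2]).sequence_attributions) == 4
```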
Despite fixing batched attribution so that results are consistent with individual attribution (see #110), the method DiscretizedIntegratedGradients
still produces different results when applied to a batch of examples.
Steps to reproduce the behavior, using the discretized_integrated_gradients method:
import inseq
model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "discretized_integrated_gradients")
out_multi = model.attribute(
[
"This aspect is very important",
"Why does it work after the first?",
"This thing smells",
"Colorless green ideas sleep furiously"
],
n_steps=20,
return_convergence_delta=True,
)
out_single = model.attribute(
[ "Why does it work after the first?" ],
n_steps=20,
return_convergence_delta=True,
)
assert out_single.attributions == out_multi[1].attributions # raises AssertionError
Same as #110
The problem is most likely due to a faulty scaling of the gradients in the _attribute
method of the DiscretizedIntegratedGradients
class.
When leaving the input texts empty for GPT2 with integrated gradients, the saliency map seems to be incorrect and giving false results. The goal is to only give <|endoftext|>, the BOS token, as input (and let GPT-2 generate from nothing basically), which can be done by leaving the input empty.
The problem is here:
inseq/inseq/attr/feat/feature_attribution.py
Line 303 in b5d3610
inseq/inseq/models/decoder_only.py
Lines 177 to 182 in b5d3610
The call to TextSequences
in this method sets skip_special_tokens
to True, removing the <|endoftext|>
from the input. This also prevents a user from giving <|endoftext|>
as the only input (and at the start of the generated text), since it is removed in the input. In that case, when running, there will be an error that the generated text does not begin with the input text.
It can be resolved by temporarily changing the line to:
sequences = TextSequences(
sources=None,
targets=self.attribution_model.convert_tokens_to_string(batch.input_tokens, as_targets=True, skip_special_tokens=False),
)
However, the feature attribution is zero for every <|endoftext|> token in the input and the output. I'm not sure whether this is intended; the same process with the ecco package assigns attribution to this token. Also, the first token (in this case This) gets zero attribution, which is probably not supposed to be the case.
Summary:
- <|endoftext|> cannot be given as input because it is removed during processing.
- The attribution for <|endoftext|> is zero. This is probably not correct.
Steps to reproduce the behavior:
import inseq
model = inseq.load_model("gpt2", "integrated_gradients")
model.attribute(
"",
"This is a demo sentence."
).show()
See bug report. This is the integrated gradients result from the ecco package on the same sentence, also using integrated gradients:
I assume this would be correct; however, they leave the baseline at its default.
Limit the plots displayed through the .show() function
Currently, the .show()
function on the attribution output will display plots for all generated attributions by default. For larger batches, this can lead to huge outputs in a notebook/ large numbers of HTML files being generated. It might be preferable to provide a sensible default value to the number of plots that are displayed in a notebook and allow users to specify for themselves how many/which attributions they want to have visualized by referring to their index.
Similar functionality is already possible now by manually choosing the indices of out.sequence_attributions
and calling the .show()
methods of the individual attribution outputs, so this function would mainly be a convenience function for new users.
see above
Contrastive attributions are currently supported thanks to custom attributed targets (see #138). The current definition of the contrastive attribution custom function can be found here.
The current implementation is problematic for integrated_gradients
and similar methods using multiple approximation steps since the contrastive forward uses static ids instead of embeddings obtained as steps between the original contrastive input and a baseline. Moreover, the current implementation allows only for granular token-based comparisons proposed in the original work by Yin and Neubig (2022), but comparing spans of different lengths could also be desirable.
Given this will add further complexity to the custom attributed function, and given the interest in such an application, it would be ideal to include a pre-registered version of the contrastive attribution step function inside Inseq to enable easy and quick usage.
Step function name: contrast_probs_diff, since the contrastive comparison is done by taking the difference of output probabilities between a regular and a contrastive example.
Extra arguments:
- contrast_input: required, can be either an input text, a sequence of ids, or embeddings for the contrastive example. The function will handle the formatting to match the original input.
- input_start_span_ids, contrast_start_span_ids: two lists containing the initial ids of every span in the input and the contrast that we want to consider as single units for attribution purposes (e.g. [0, 2, 5] for input_start_span_ids). By default None, set to list(range(len(input_ids))) and list(range(len(contrast_ids))) respectively (i.e. every token is treated separately, as in normal feature attribution).
Notes for span ids: for positions not listed in input_start_span_ids, 0 is returned.
Aggregation:
The default abs_max
function used for span aggregation (span_aggregate
) of attributions will return the attributions for the first token of every span (see description in the previous section) if the same spans are used with a ContiguousSpanAggregator
. The aggregate_map
for the step score should also be set to abs_max
upon registration.
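A generic sketch of an abs_max aggregation over spans (keeping the highest-magnitude score per span); the library's actual span_aggregate behavior may differ in detail:

```python
# Given start indices for each contiguous span, keep the score with the
# largest magnitude (preserving its sign) within every span.
def abs_max_spans(scores: list[float], span_starts: list[int]) -> list[float]:
    bounds = span_starts + [len(scores)]
    return [
        max(scores[a:b], key=abs)
        for a, b in zip(bounds, bounds[1:])
    ]

# Spans [0,2), [2,5), [5,6) over six token scores:
assert abs_max_spans([0.1, -0.9, 0.2, 0.3, -0.1, 0.5], [0, 2, 5]) == [-0.9, 0.3, 0.5]
```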
Following our discussion, it might be valuable to add a test case including larger language models that work with prompting such as T0 and its variants. Since availability of these kinds of models is becoming more common (see "Motivation"), we should show an example of feature attribution for them.
The bitsandbytes integration and 8-bit precision (Dettmers et al., 2022) released in August enables the use of larger models on single GPU setups.
I am running IG attributions for a list of strings, and was trying to work with the resulting inseq.data.attribution.FeatureAttributionOutput
object. To inspect what is in there I was looking at the sequence_attributions
, but I can't print this object because the objects in there don't have a __repr__
method.
import inseq
model = inseq.load_model("gpt2", "integrated_gradients")
sens = ["this is the first sentence. followed by a second"]
prefix = [sen[:sen.index('.')+1] for sen in sens]
attributions = model.attribute(
prefix,
generated_texts=sens,
n_steps=500,
internal_batch_size=50
)
print(attributions.sequence_attributions)
Returns:
TypeError: __repr__ returned non-string (type dict)
Allow negative values for attr_pos_end
Allow negative values to be defined for the attr_pos_end parameter.
This would make it possible to e.g. define that the last token (which is often just the EOS token) should be removed from the generated attributions. It probably needs to be evaluated if this is possible for batched attributions, but I think at least for singular attributions, it could be a nice quality-of-life feature. Especially when attributing over multiple sentences where using a positive value is more complicated due to different sentence lengths.
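A negative attr_pos_end could be resolved against the generated sequence length, mirroring Python's negative indexing; the function below is a hypothetical sketch of that normalization, not inseq's actual handling:

```python
# Resolve attr_pos_end against the sequence length: None means the full
# sequence, and negative values count back from the end (so -1 drops
# the final, often EOS, position from the attribution).
def resolve_pos_end(attr_pos_end, seq_len: int) -> int:
    if attr_pos_end is None:
        return seq_len
    if attr_pos_end < 0:
        attr_pos_end += seq_len
    if not 0 < attr_pos_end <= seq_len:
        raise ValueError(f"attr_pos_end out of range for length {seq_len}")
    return attr_pos_end

assert resolve_pos_end(-1, seq_len=10) == 9   # skip the last token
assert resolve_pos_end(None, seq_len=10) == 10
```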
Add support for PEFT models
Currently, only models inheriting from PreTrainedModel are supported. It would be useful to add support for models using Parameter-Efficient Fine-Tuning (🤗 PEFT) methods.
Adding support for 🤗 PEFT models would allow the same analyses to be performed on models optimised and trained to be efficient on consumer hardware.
Mostly TBD, as PEFT adds a small number of trainable parameters on top of those in the original PreTrainedModel.
This issue requests the inclusion of a wrapper for the NoiseTunnel
method to make it available for all attribution classes.
Smoothing techniques like the one proposed by Smilkov et al. 2017 can provide a more robust estimation of feature attributions, but they are largely ignored for NLP applications. Including support for noise-injecting techniques in the library would encourage their adoption in the broader research community.
Saliency cards (Paper | Repository) introduce a structured framework to document feature attribution methods' strengths and applicability to different use-cases. Introducing saliency cards specific to sequential generation tasks would help Inseq users in selecting more principled approaches for their analysis.
Copying from the original paper's abstract:
Saliency methods are a common class of machine learning interpretability techniques that calculate how important each input feature is to a modelโs output. We find that, with the rapid pace of development, users struggle to stay informed of the strengths and limitations of new methods and, thus, choose methods for unprincipled reasons (e.g., popularity). Moreover, despite a corresponding rise in evaluation metrics, existing approaches assume universal desiderata for saliency methods (e.g., faithfulness) that do not account for diverse user needs. In response, we introduce saliency cards: structured documentation of how saliency methods operate and their performance across a battery of evaluative metrics.
Introducing ad-hoc cards in Inseq should be preferable to contributing to the original saliency cards repository, since 1) they will be more easily used and improved by the Inseq community and 2) the original authors focus solely on vision-centric applications.
The following sections are relevant for the integration of saliency cards into Inseq:
Determinism: Determinism measures if a saliency method will always produce the same saliency map given a particular input, label, and model.
Hyperparameter Dependence: Hyperparameter dependence measures a saliency methodโs sensitivity to user-specified parameters. By documenting a methodโs hyperparameter dependence, saliency cards inform users of consequential parameters and how to set them appropriately.
Model Agnosticism: Model agnosticism measures how much access to the model a saliency method requires. Since several future methods need access to specific modules (see #173 for example), this part could document which parameters will need to be defined in the ModelConfig class before usage.
Computational Efficiency: Computational efficiency measures how computationally intensive it is to produce the saliency map. Using the same models, we could report unified benchmarks across different methods (and different parameterizations, in some cases).
Semantic Directness: Saliency methods abstract different aspects of model behavior, and semantic directness represents the complexity of this abstraction (i.e. what the reported scores correspond to). For example, discussing the difference between salience and sensitivity for raw gradients vs. input x gradient (see Appendix B of Geva et al. 2023)
(Added) Granularity: Specifying the granularity of the scores returned by the attribution method (e.g. raw gradient attribution returns one score per hidden dimension of the model embeddings, corresponding to the gradient with respect to the attributed_fn propagated through the model).
(Added) Target dependence: Specifying whether the method relies on model final predictions to derive importance scores, or whether these are extracted from model internal processes (e.g. for raw attention weights).
The Sensitivity Testing and Perceptibility Testing sections describe empirical measurements of minimality/robustness rather than inherent properties of methods. As such, they should be added only in the presence of a reproducible study using Inseq to compare different methods.
The following is a non-exhaustive list of attention-based feature attribution methods that could be added to the library:
Add an optional baselines field to the attribute method of AttributionModel. If not specified, baselines takes a default value of None and preserves the default behavior of using UNK tokens as a "no-information" baseline for attribution methods requiring one (e.g. integrated gradients, deeplift). The argument can take one of the following values:
- str: The baseline is an alternative text. In this case, the text needs to be encoded and embedded inside FeatureAttribution.prepare to fill the baseline_ids and baseline_embeds fields of the Batch class. For now, only strings matching the original input length after tokenization are supported.
- sequence(int): The baseline is a list of input ids. In this case, we embed the ids as described above. Again, the length must match the original input ids length.
- torch.tensor: We would be interested in passing baseline embeddings explicitly, e.g. to allow for baselines not matching the original input shape that could be derived by averaging embeddings of different spans. In this case, the baseline embeddings field of Batch is populated directly (after checking that the shape is consistent with input embeddings) and the baseline ids field will be populated with some special id (e.g. -1) to mark that the ids were not provided. Important: This modality should raise a ValueError if used in combination with a layer method, since layer methods that require a baseline use baseline ids explicitly as inputs for the forward_func used for attribution instead of baseline embeddings.
- tuple of previous types: If we want to specify both source and target baselines when using attribute_target=True, the input will be a tuple of one of the previous types. The same procedure will be applied separately to define source and target baselines, except for the encoding, which will require the tokenizer.as_target_tokenizer() context manager to encode strings.
- list or tuple of lists of previous types: When multiple baselines are specified, we return the expected attribution score (i.e. average, assuming normality) by computing attributions for all available baselines and averaging the final results. See Section 2.2 of Erion et al. 2020 for more details.
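The multiple-baseline behavior can be sketched abstractly as follows (`attribute_fn` is a hypothetical single-baseline attribution call, and the toy attribution at the end exists only to exercise the averaging):

```python
# Attribute once per baseline and average the results to estimate the
# expected attribution, as in expected gradients (Erion et al. 2020).
def expected_attribution(inputs, baselines, attribute_fn):
    per_baseline = [attribute_fn(inputs, b) for b in baselines]
    n = len(per_baseline)
    return [sum(scores) / n for scores in zip(*per_baseline)]

# Toy attribution: difference between input and baseline token scores.
toy = lambda inp, base: [i - b for i, b in zip(inp, base)]
avg = expected_attribution([1.0, 2.0], baselines=[[0.0, 0.0], [1.0, 1.0]], attribute_fn=toy)
assert avg == [0.5, 1.5]
```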
When working on minimal pairs, we might be interested in defining the contribution of specific words in the source or the target prefix not only in absolute terms using a "no-information" baseline, but as the relative effect between the words composing the pair. Adding the possibility of using a custom baseline would enable this type of comparison.
It will be important to validate whether the hooked method makes use of a baseline via the `use_baseline` attribute, raising a warning that the value of the custom input baseline will be ignored otherwise.
Since baselines will support all input types (str, ids, embeds), it would be the right time to enable such support for the input of the attribute function. This could be achieved by an extra `attribution_input` field set to `None` by default that will substitute `input_texts` in the call to `prepare_and_attribute`, and be set to `input_texts` if not specified.
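The dispatch over baseline modalities described above could be sketched as follows. This is a minimal illustration under stated assumptions: `resolve_baseline` and its signature are hypothetical (not part of the inseq API), and the string modality is omitted since it requires the model tokenizer.

```python
from typing import Sequence, Union

import torch


def resolve_baseline(
    baseline: Union[Sequence[int], torch.Tensor, None],
    input_ids: torch.Tensor,
    embed_fn,
):
    """Hypothetical dispatch over the baseline modalities described above.

    Returns a (baseline_ids, baseline_embeds) pair.
    """
    if baseline is None:
        # Preserve the default UNK-based "no-information" baseline.
        return None, None
    if isinstance(baseline, torch.Tensor) and baseline.dim() == 3:
        # Embeddings passed directly: check shape consistency with the input
        # embeddings, then fill the ids field with the sentinel value -1.
        assert baseline.shape[-1] == embed_fn(input_ids).shape[-1]
        ids = torch.full(baseline.shape[:-1], -1, dtype=torch.long)
        return ids, baseline
    # A sequence of input ids: embed them, enforcing matching length.
    ids = torch.as_tensor(baseline).unsqueeze(0)
    assert ids.shape == input_ids.shape, "Baseline length must match inputs"
    return ids, embed_fn(ids)
```

A layer-method check raising `ValueError` for the tensor modality would slot naturally into the same function.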
Adding support for decoder-only models like GPT-2 on top of the AttributionModel abstraction. The change will involve a radical refactoring of the whole attribution pipeline to enable target-only attribution and passing `Batch` objects instead of `EncoderDecoderBatch` when decoder-only attribution is performed.
The output attribution classes would mostly stay the same, with the exception of source attributions becoming optional.
How do I programmatically extract the per-token scores to have them in a list or dictionary, mapped to each token?
I understand how to show the scores per token visually, but I don't know how to extract them from the `out` object for further downstream processing.
model = inseq.load_model("google/flan-t5-base", attribution_method="discretized_integrated_gradients")
out = model.attribute(
input_texts=["We were attacked by hackers. Was there a cyber attack?", "We were not attacked by hackers. Was there a cyber attack?"],
)
out.sequence_attributions[0]
out.show()
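One way to get a plain token-to-score mapping is to work with the 2D attribution matrix directly. The snippet below mocks the aggregated [source_len x target_len] matrix with random values to stay self-contained; with inseq, a comparable matrix can be obtained from the sequence attribution object (e.g. via its aggregation helpers, whose exact names may vary across versions):

```python
import torch

# Mocked aggregated attribution matrix [source_len x target_len]; with inseq
# this would come from the FeatureAttributionSequenceOutput object instead.
source_tokens = ["▁We", "▁were", "▁attacked", "▁by", "▁hackers"]
target_tokens = ["▁Yes", "</s>"]
attr = torch.rand(len(source_tokens), len(target_tokens))

# Nested dict: target token -> {source token -> attribution score}.
# For real text, key on (position, token) pairs to avoid collisions
# between repeated tokens.
per_token_scores = {
    tgt: {src: attr[i, j].item() for i, src in enumerate(source_tokens)}
    for j, tgt in enumerate(target_tokens)
}
```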
Track here minimal fixes needed for v0.3:
- Bump `transformers` to 4.22.0 in `pyproject.toml`, required for M1 support and the new target tokenization API
- Fix the `tqdm` progress bar finishing at N-1 in `attribute`.
The ALTI+ method is an extension of ALTI for encoder-decoder (and by extension, decoder-only) models.
Authors: @gegallego @javiferran
Implementation notes:
The current implementation extracts input features for key, query and value projections and computes intermediate steps using the Kobayashi refactoring to obtain the transformed vectors used in the final ALTI computation.
The computation of attention layer outputs is carried out up to the resultant (i.e. the actual output of the attention layer) in order to check that the result matches the original output of the attention layer forward pass. This is only done for sanity-checking purposes, but it is not especially heavy from a computational perspective, so it can be preserved (e.g. raise an error if the outputs don't match, signaling that the model may not be supported).
Focusing on GPT-2 as an example model, the per-head attention weights and outputs (i.e. matmul of weights and value vectors) are returned here so they can be extracted with a hook and used to compute the transformed vectors needed for ALTI.
Pre- and post-layer norm models are handled differently because the transformed vectors are the final outputs of the attention block, regardless of the position of the layer norm (it needs to be included in any case). In the Kobayashi decomposition of the attention layer, the bias component needs to be kept separate both for the layer norm and the output projections, so we need to check whether this is possible out of the box or whether it needs to be computed in an ad-hoc hook.
If we are interested in the output vectors before the bias is added, we can extract the bias vector alongside the output of the attention module and subtract the former from the latter.
For aggregating ALTI+ scores in order to obtain overall importance we will use the extended rollout implementation that is currently being developed in #173.
Reference implementation mt-upc/transformer-contributions-nmt
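As an illustration of the hook-based extraction mentioned above, per-module attention outputs can be captured with a standard PyTorch forward hook. The sketch below uses a generic `MultiheadAttention` module for self-containedness; for GPT-2 the hook would be registered on the model's attention blocks, whose outputs include the per-head weights when `output_attentions=True`.

```python
import torch

# Storage for tensors captured during the forward pass.
captured = {}


def make_hook(name):
    def hook(module, inputs, output):
        # Store whatever the attention module returns; for HF GPT-2 with
        # output_attentions=True this tuple includes the attention weights.
        captured[name] = output
    return hook


attn = torch.nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
handle = attn.register_forward_hook(make_hook("layer0"))

x = torch.rand(1, 4, 8)  # [batch, seq_len, embed_dim]
out, weights = attn(x, x, x, need_weights=True)
handle.remove()  # always detach hooks after extraction
```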
Even after updating to the newest pytorch version 1.13.1 several issues with the mps-backend still remain when it is enabled in the code. There still seems to be some inconsistency across the different devices depending on the operations that are run, as can be seen below.
The goal of this issue is primarily to collect and highlight these problems.
Steps to reproduce the behavior:
1. Open `inseq/utils/torch_utils` and change `cpu` to `mps` in line 229 to enable the MPS backend
2. Run `make fast-test` to run the tests
Running the tests this way generates the following error report:
========================================================================================== short test summary info ===========================================================================================
FAILED tests/attr/feat/test_feature_attribution.py::test_mcd_weighted_attribution - NotImplementedError: The operator 'aten::remainder.Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
FAILED tests/models/test_huggingface_model.py::test_attribute_slice_seq2seq - RuntimeError: shape '[2, 1]' is invalid for input of size 1
FAILED tests/models/test_huggingface_model.py::test_attribute_decoder - NotImplementedError: The operator 'aten::cumsum.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
==================================================================== 3 failed, 25 passed, 442 deselected, 6 warnings in 76.36s (0:01:16) =====================================================================
When run with the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1
set, the following errors still occur:
========================================================================================== short test summary info ===========================================================================================
FAILED tests/models/test_huggingface_model.py::test_attribute_slice_seq2seq - RuntimeError: shape '[2, 1]' is invalid for input of size 1
FAILED tests/models/test_huggingface_model.py::test_attribute_decoder - AssertionError: assert 26 == 27
==================================================================== 2 failed, 26 passed, 442 deselected, 6 warnings in 113.36s (0:01:53) ====================================================================
These errors do not occur when running the tests on other backends, implying that there is still some inconsistency between mps and the other torch backends.
All tests should run consistently across all torch backends.
As for TransformerLensOrg/TransformerLens#164, this project uses Patrick Kidger's `torchtyping` to annotate tensor types, including shapes. However, `jaxtyping` is recommended for newer projects (it is not JAX-specific, better maintained, and more compatible with type checkers).
The following is a non-exhaustive list of gradient-based feature attribution methods that could be added to the library:
| Method name | Source | In Captum | Code implementation | Status |
|---|---|---|---|---|
| DeepLiftSHAP | - | ✓ | pytorch/captum | |
| GradientSHAP¹ | Lundberg and Lee '17 | ✓ | pytorch/captum | |
| Guided Backprop | Springenberg et al. '15 | ✓ | pytorch/captum | |
| LRP² | Bach et al. '15 | ✓ | pytorch/captum | |
| Guided Integrated Gradients | Kapishnikov et al. '21 | | PAIR-code/saliency | |
| Projected Gradient Descent (PGD)³ | Madry et al. '18, Yin et al. '22 | | uclanlp/NLP-Interpretation-Faithfulness | |
| Sequential Integrated Gradients | Enguehard '23 | | josephenguehard/time_interpret | |
| Greedy PIG⁴ | Axiotis et al. '23 | | | |
| AttnLRP | Achtibat et al. '24 | | rachtibat/LRP-for-Transformers | |
Notes:
The `requirements.txt` pins `typeguard==3.0.1`, but `torchtyping` is incompatible with typeguard versions >3.0 (https://github.com/patrick-kidger/torchtyping#installation).
For me this led to issues with importing inseq, raising the following import error:
ImportError: cannot import name 'LiteralString' from 'typing_extensions'
However, it turned out the issue did not stem from the `typing_extensions` library, but from the `typeguard` version: once I had pinned it to version `2.13.3` the error disappeared.
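A minimal workaround, assuming the pin is applied in the user environment rather than in `pyproject.toml`:

```shell
# Pin typeguard below 3.0, as required by torchtyping
pip install "typeguard>=2.13,<3.0"
```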
🤗 Transformers v4.26.0 introduces the `compute_transition_scores` function to simplify the return of generation log probabilities. Example taken from the docs link above:
from transformers import GPT2Tokenizer, AutoModelForCausalLM
import numpy as np
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token_id = tokenizer.eos_token_id
inputs = tokenizer(["Today is"], return_tensors="pt")
# Example 1: Print the scores for each token generated with Greedy Search
outputs = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True)
transition_scores = model.compute_transition_scores(
outputs.sequences, outputs.scores, normalize_logits=True
)
input_length = inputs.input_ids.shape[1]
generated_tokens = outputs.sequences[:, input_length:]
for tok, score in zip(generated_tokens[0], transition_scores[0]):
# | token | token string | logits | probability
print(f"| {tok:5d} | {tokenizer.decode(tok):8s} | {score.numpy():.3f} | {np.exp(score.numpy()):.2%}")
# Output
| 262 | the | -1.414 | 24.33%
| 1110 | day | -2.609 | 7.36%
| 618 | when | -2.010 | 13.40%
| 356 | we | -1.859 | 15.58%
| 460 | can | -2.508 | 8.14%
We want to use `compute_transition_scores` to calculate the probability step score in Inseq, and all derived scores, to ensure continued compatibility with Transformers.
In principle, `inseq` should work with the `DistributedBloomModel` implemented in the `petals` package to perform feature attribution of the 176B BLOOM model in a distributed setup. However, some compatibility issues currently undermine the interoperability of the two libraries.
Refer to bigscience-workshop/petals#178 for additional details.
Allow users to extract the probabilities associated with each generated token at every attribution step. This behavior can be controlled by a parameter `output_probabilities: bool = False` passed to the `attribute` function.
Uncertainty features such as generation probabilities have proven useful in many cases to estimate the quality of the generated sentence (see Fomicheva et al., 2020 for an example of QE in NMT).
The Huggingface library does not support the extraction of token-by-token probabilities from the `generate` method at the moment.
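In the meantime, per-step probabilities can be recovered manually from the raw `scores` returned by `generate(..., output_scores=True, return_dict_in_generate=True)`. The snippet below mocks `scores` with random logits (vocabulary size 5, three greedy steps) to keep it self-contained:

```python
import torch

# Mocked generate() output: one logits tensor of shape [batch, vocab] per
# generated step, as found in outputs.scores.
scores = tuple(torch.randn(1, 5) for _ in range(3))
generated_ids = [int(s.argmax(dim=-1)) for s in scores]  # greedy choices

# Softmax each step's logits and pick the probability of the chosen token.
step_probs = [torch.softmax(s, dim=-1) for s in scores]
token_probs = [step_probs[i][0, tok].item() for i, tok in enumerate(generated_ids)]
```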
Attribution patterns differ when more than one example is attributed at the same time, even for deterministic methods (e.g. IG).
Attribute the same sentence twice using any attribution model and method, once as the only attributed text and once alongside other sentences. The resulting attributions differ, even though the model's predictions are the same.
import inseq
model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "integrated_gradients")
single_out = model.attribute("This is an example sentence")
multi_out = model.attribute(["This is an example sentence", "This is another example"])
assert single_out.attributions == multi_out[0].attributions # raises AssertionError
The attributions must be the same regardless of the number of examples that are attributed at once.
The question is whether this is a bug or a methodological error due to the architecture of the seq2seq models. A first step would be to start from the BERT tutorial in Captum and change the methods to accept multiple examples in input. If the attribution patterns are consistent, in principle the problem is somewhere in the cross-attention mechanism.
Accessing the visualization (via .show()
) currently requires jupyter. It would be nice to have an option to export it as an image from the console.
With `out` being a `FeatureAttributionOutput`, `html = out.show(return_html=True)` returns an error:
AttributeError: return_html=True can be used only inside an IPython environment.
If the HTML content is returned without error, one can use e.g. imgkit to create images.
Use the newly introduced PySvelte library and the Svelte framework to deduplicate HTML visualization from the main body of the library.
Update: Provided the new capabilities of Gradio v3.0 and the introduction of support for custom components with Gradio Blocks (using Svelte as frontend) and tabbed interfaces enabling multi-visualization widgets, Gradio becomes the most simple and interesting choice for Inseq visualizations.
The following is a non-exhaustive list of evaluation metrics for attribution methods that could be added to the library:
| Method name | Source | Code implementation | Status |
|---|---|---|---|
| Sensitivity | Yeh et al. '19 | pytorch/captum | |
| Infidelity | Yeh et al. '19 | pytorch/captum | |
| Log Odds | Shrikumar et al. '17 | INK-USC/DIG | |
| Sufficiency | DeYoung et al. '20 | INK-USC/DIG | |
| Comprehensiveness | DeYoung et al. '20 | INK-USC/DIG | |
| Human Agreement | Atanasova et al. '20 | copenlu/xai-benchmark | |
| Confidence Indication | Atanasova et al. '20 | copenlu/xai-benchmark | |
| Cross-Model Rationale Consistency | Atanasova et al. '20 | copenlu/xai-benchmark | |
| Cross-Example Rationale Consistency (Dataset Consistency) | Atanasova et al. '20 | copenlu/xai-benchmark | |
| Sensitivity | Yin et al. '22 | uclanlp/NLP-Interpretation-Faithfulness | |
| Stability | Yin et al. '22 | uclanlp/NLP-Interpretation-Faithfulness | |
Notes:
The Log Odds metric is just the negative logarithm of the Comprehensiveness metric. The application of - log can be controlled by a parameter do_log_odds: bool = False
in the same function. The reciprocal can be obtained for the Sufficiency metric.
All metrics that control masking/dropping a portion of the inputs via a `top_k` parameter can benefit from a recursive application to ensure the masking of the most salient tokens at all times, as described in Madsen et al. '21. This could be captured by a parameter `recursive_steps: Optional[int] = None`. If specified, a masking of size `top_k // recursive_steps + int(top_k % recursive_steps > 0)` is performed `recursive_steps` times, with the last step having size equal to `top_k % recursive_steps` if `top_k % recursive_steps > 0`.
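A hypothetical helper realizing such a schedule could look as follows. Note this is an illustrative variant, not the proposal verbatim: the remainder is spread over the first rounds so that the round sizes always sum exactly to `top_k`.

```python
def masking_sizes(top_k: int, recursive_steps: int) -> list:
    """Split a top_k masking budget into recursive_steps masking rounds."""
    base = top_k // recursive_steps
    rem = top_k % recursive_steps
    # The first `rem` rounds mask one extra token each, so the total is top_k.
    return [base + (1 if i < rem else 0) for i in range(recursive_steps)]
```

For example, `masking_sizes(10, 3)` yields `[4, 3, 3]`.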
The Sensitivity and Infidelity methods add noise to input embeddings, which could produce unrealistic input embeddings for the model (see discussion in Sanyal et al. '21). Both Sensitivity and Infidelity can include an additional parameter `discretize: bool = False` that, when turned on, replaces the top-k inputs with their nearest neighbors in the vocabulary embedding space instead of their noised versions; a parameter `sample_topk_neighbors: int = 1` can be used to control the nearest neighbors' pool size used for replacement. Using Stability is more principled in this context since fluency is preserved by the two-step procedure presented by Alzantot et al. '18, which includes a language modeling component.
Sensitivity by Yin et al. '22 is an adaptation to the NLP domain of Sensitivity-n by Yeh et al. '19. An important difference is that the norm of the noise vector causing the prediction to flip is used as a metric in Yin et al. '22, while the original Sensitivity in Captum uses the difference between original and noised prediction scores. The first should be prioritized for implementation.
Cross-Lingual Faithfulness by Zaman and Belinkov '22 (code) is a special case of the Dataset Consistency metric by Atanasova et al. 2020 in which the pair is constituted by an example and its translated variant.
A Comparative Study of Faithfulness Metrics for Model Interpretability Methods, Chan et al. '22
Allow users to perform feature attribution on the target prefix. The behavior is controlled by a new `attribute_target: bool = False` parameter passed to the `AttributionModel.attribute` method.
Attributing only the source is reductive, since the influence of the target prefix is often fundamental in determining the outcome of the next generation step (e.g. a prefix `Ladies and` will strongly bias the next token towards `Gentlemen`, regardless of the source sequence).
The output tensor `.source_attributions` in `FeatureAttributionSequenceOutput` is of type float64 when using the "integrated_gradients" method, rather than the expected float32.
Steps to reproduce the behavior:
import inseq
model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "integrated_gradients")
out = model.attribute(
"The developer argued with the designer because her idea cannot be implemented.",
n_steps=100
)
print(out.sequence_attributions[0].source_attributions.dtype)
Python 3.8.16
The dtype should be float32.
Other methods ("saliency", "input_x_gradient", "deeplift") return float32.
Interestingly, "discretized_integrated_gradients" also returns float32, but "layer_integrated_gradients" returns float64.
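One plausible source of the promotion (an illustration, not a confirmed diagnosis of the Captum internals): mixing a float64 tensor, such as interpolation steps built from double-precision Python floats, into float32 gradient tensors silently upcasts the result under torch's type promotion rules.

```python
import torch

alphas = torch.linspace(0, 1, steps=50, dtype=torch.float64)  # float64 steps
grads = torch.rand(50, 4)                                     # float32 grads

# The product is promoted to float64, and the upcast propagates to the result.
attr = (alphas.unsqueeze(-1) * grads).sum(dim=0)
downcast = attr.to(torch.float32)  # explicit downcast as a workaround
```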
`FeatureAttributionSequenceOutput.show()` raises an error in console mode due to `TokenWithId` having replaced `str` tokens.
Loading a `FeatureAttributionOutput` object with `load()` should instantiate default `AggregableMixin` attributes to guarantee that `show()` will work out of the box after loading.
`format_input_texts` in `AttributionModel` can be moved to `utils/misc`, since it does not require `self`.
The `__repr__` of the `TensorWrapper` and `FeatureAttributionOutput` classes should direct to the prettified `__str__` representation by default.
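The change amounts to delegating `__repr__` to `__str__`, sketched here on a stand-in class (not the actual inseq classes):

```python
class PrettyRepr:
    """Stand-in showing the proposed delegation for TensorWrapper-like classes."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        # Prettified, human-readable representation.
        return f"PrettyRepr with {len(self.data)} fields"

    # Make the console/debugger repr use the prettified form by default.
    __repr__ = __str__


obj = PrettyRepr({"a": 1, "b": 2})
```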
The CLI in `__main__` is not working; it should be adapted to the updated library and refactored so that `inseq attribute text [PARAMS]` calls the normal attribution function. A future `inseq attribute file [PARAMS]` command will be added to directly attribute sentences from a file. Create a separate `commands` folder in the library to group those.