jessevig / bertviz
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
License: Apache License 2.0
Hi,
Thanks for your great tool.
I was wondering whether it is possible to show visualizations for multiple sentences at once in a single display. If so, I would be grateful if you could guide me on how to do it.
Thank you very much in advance
Hi,
Thank you for writing this tool.
I was wondering if there is any way to compute word-level attention and also identify each word's position within its sentence in a given multi-sentence text.
Sticking to the same example below, how can I find the BERT word attention for "cat" in the two different sentences?
[the cat sat on the mat. the cat lay on the rug.]
Thanks!
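One possible starting point, as a minimal sketch: once the tokens and one head's attention matrix are available as plain Python lists, each occurrence of a word can be located by index and its attention row read off. The token list and the uniform weights below are made up for illustration, and `positions_of` and `attention_from` are hypothetical helper names, not part of bertviz.

```python
def positions_of(tokens, word):
    """Return every index at which `word` occurs in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def attention_from(attn, tokens, query_index):
    """Pair each token with the weight the query token assigns to it."""
    return list(zip(tokens, attn[query_index]))


# Hand-written tokens for the example text (assumed, not real tokenizer output).
tokens = ["[CLS]", "the", "cat", "sat", "on", "the", "mat", ".",
          "the", "cat", "lay", "on", "the", "rug", ".", "[SEP]"]

# Placeholder attention for one layer and head: uniform weights, rows sum to 1.
n = len(tokens)
attn = [[1.0 / n] * n for _ in range(n)]

cat_positions = positions_of(tokens, "cat")
first_period = tokens.index(".")
for pos in cat_positions:
    # Tokens before the first period belong to sentence 1, the rest to sentence 2.
    sentence = 1 if pos < first_period else 2
    print(f"'cat' at position {pos} (sentence {sentence})")
    print(attention_from(attn, tokens, pos)[:3])
```

With real model output, `attn` would be replaced by the matrix for the chosen layer and head; the indexing logic stays the same.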
I am very impressed by the features bertviz offers so far. I was wondering if there is any approach for visualizing the relation between a query and the results ranked by a BERT model fine-tuned on the passage reranking task. This fine-tuned model predicts how relevant a passage is as the right "response" to a query, and it would be nice to somehow visualize how the query is connected to the passages.
Thanks in advance!
Is there any way to see the attention visualization for a BERT question answering model? I couldn't find a BertForQuestionAnswering class in bertviz.pytorch_transformers_attn. I have fine-tuned a model on a QA dataset using Hugging Face transformers and wanted to see the visualization for it. Can you suggest a way of doing this?
BertModel finetuned for a sequence classification task does not give expected results on visualisation.
Ideally, the pretrained model should be loaded into BertForSequenceClassification, but that model does not return attention scores for visualisation.
When loaded into BertModel (0 to 11 layers), I assume the 11th layer (right before classification layer in BertForSequenceClassification) is the right layer to check attention distribution.
But every word is equally attentive to every other word.
I am wondering what can be the possible reasons and how I can fix it.
Thanks.
Thank you for your work on this repo - the visualizations are really useful and helpful towards my research. I'm wondering if I've run into some sort of bug or limitation of the visualizations, because I'm unable to show more than one visualization in a notebook - in two different cells that is. The second cell always shows an empty output when I try to run it (See attached picture). I see the same behavior with all the views.
Update: I've also noticed that sometimes the second visualization is displayed, but then the first visualization disappears.
After using bertviz to visualize attention in a notebook, how can I export vector graphics that meet the requirements for a paper?
Hello, thanks for the work.
Short bug report.
In Safari, when you hover over the visualisation in Google Colab, the visualisation window scrolls down automatically, making it impossible to work with model_view and neuron_view. It works in Chrome.
Thanks again
Hi,
thank you for the nice tool!
I would like to understand why you are not loading the models directly from the transformers package.
Which part of the transformers model do you need to adapt to make it compatible with bertviz? I would like to add other huggingface models such as DistilBERT.
Did you consider a way to load the model directly from transformers?
Thank you in advance for your kind response.
Hi,
Could you let me know how to use bertviz for an NER task? @jessevig
Thanks for the great tool!
It would be nice to be able to save the visualizations for specific layers/heads as images. I have not been able to find a spot in the model/head/neuron_view.js file to add a saving function.
Do you maybe have a suggestion on how to save the visualizations as images?
Thanks!
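As a fallback that does not touch the bertviz JavaScript at all, a single head's weights can be drawn into a static SVG file, which is a vector format. This is only a minimal sketch, not bertviz's own export path: `attention_to_svg` is a hypothetical helper, the layout constants are arbitrary, and the tokens and weights are made up.

```python
from xml.sax.saxutils import escape


def attention_to_svg(tokens, attn, row_height=24, width=260):
    """Draw one head's attention as lines between two token columns."""
    height = row_height * len(tokens) + 10
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" font-family="sans-serif" font-size="14">'
    ]
    # Left column: attending tokens. Right column: attended-to tokens.
    for i, tok in enumerate(tokens):
        y = row_height * (i + 1)
        label = escape(tok)
        parts.append(f'<text x="70" y="{y}" text-anchor="end">{label}</text>')
        parts.append(f'<text x="190" y="{y}">{label}</text>')
    # One line per (query, key) pair; opacity encodes the attention weight.
    for i, row in enumerate(attn):
        for j, weight in enumerate(row):
            y1 = row_height * (i + 1) - 5
            y2 = row_height * (j + 1) - 5
            parts.append(
                f'<line x1="75" y1="{y1}" x2="185" y2="{y2}" '
                f'stroke="blue" stroke-width="2" stroke-opacity="{weight:.3f}"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up tokens and weights, only to show the output format.
tokens = ["[CLS]", "the", "cat", "[SEP]"]
attn = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.3, 0.5, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]

svg_text = attention_to_svg(tokens, attn)
with open("attention_head.svg", "w", encoding="utf-8") as handle:
    handle.write(svg_text)
```

The resulting file opens in a browser or a vector editor and loses the interactivity of the notebook view.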
I want to ask whether the Chinese BERT model's attention can be shown at the word level instead of the character level.
When I use the BERT visualization for Chinese, I can only see attention from character to character.
I want to see attention from word to word. Can I change this?
Thanks a lot for your help
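One convention sometimes used for this is to merge token-level attention into word-level attention after the fact: sum the weights over the tokens of the attended-to word, then average over the tokens of the attending word, which keeps every row summing to 1. The sketch below assumes the character-to-word grouping comes from an external word segmenter; `merge_attention` is a hypothetical helper and the numbers are invented.

```python
def merge_attention(attn, groups):
    """Collapse a token-level attention matrix to word level.

    attn[i][j] is the weight token i assigns to token j.
    groups is a list of token-index lists, one list per word.
    """
    # Step 1: sum over the key tokens that belong to each word.
    summed = [[sum(row[j] for j in group) for group in groups] for row in attn]
    # Step 2: average over the query tokens that belong to each word.
    merged = []
    for group in groups:
        merged.append([
            sum(summed[i][w] for i in group) / len(group)
            for w in range(len(groups))
        ])
    return merged


# Toy 3-character example; characters 0 and 1 form one word (assumed segmentation).
char_attn = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
]
word_groups = [[0, 1], [2]]

word_attn = merge_attention(char_attn, word_groups)
for row in word_attn:
    print([round(v, 3) for v in row])
```

The merged matrix, together with one label per word, could then be passed to whatever plotting code is in use.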
I am using bertviz with BertModel for visualisation, specifically the neuron detail view. Every time I pass a sentence of more than 5 words, the runtime gets disconnected. Is there some other way to still visualise longer sentences? I am trying to understand long-term dependencies.
Hey, first, thank you for both creating + maintaining this repo!
I'm trying to get the basic BERT visualization working using Chrome on my local machine. The notebook is throwing an error when I execute the JS cell.
%%javascript
require.config({
paths: {
d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min',
jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',
}
});
returns an error: Javascript Error: require is not defined
I tried:
(1) doing a clean pip install from your requirements.txt to confirm I had everything
(2) I also found documentation from Jupyter indicating there's a specific jupyter-require package, which I installed; and also used the magic %load_ext jupyter_require; stopping / restarting / reloading the kernel, etc.
Any idea what's going on here?
update: i just tried opening in a notebook (vs the lab environment) and it worked fine! So that's a totally reasonable workaround for me. If there's an easy fix, would love to hear it, but otherwise feel free to just close this. thankya!
model_type = 'bert'
model_version_3 = './bertviz/tests/saved_model'
model_config = './bertviz/tests/saved_model/bert_config.json'
do_lower_case = False
config = BertConfig.from_json_file(model_config)
model = BertForSequenceClassification.from_pretrained(model_version_3, from_tf=True, config=config)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
show(model, model_type, tokenizer, sentence_a, sentence_b)
I have tried this code, but it shows an error:
AttributeError: 'BertForSequenceClassification' object has no attribute 'bias'
Also, I'm unable to pass the parameter num_labels to BertForSequenceClassification.from_pretrained().
It showed the error: __init__() got an unexpected keyword argument 'num_label'
Please help me fix it.
Thanks.
From the official documentation:
The main breaking change when migrating from pytorch-pretrained-bert to pytorch-transformers is that the models forward method always outputs a tuple with various elements depending on the model and the configuration parameters.
It seems like the structure of attentions also changed. The attention for each layer is just a tensor, not key-value pairs.
If I change the sentence, then the visualization appears. Why is that?
Thank you for the great work! Is it possible to make this work with the large uncased BERT model as well? Currently it loads the model but the Layer dropdown is not populated and the visualization is not shown. Simply changing the hardcoded layers from 12 to 24 in attention.js does not work.
Hi,
I am using huggingface's transformers BertForSequenceClassification to train a BERT model. Now, I want to load my saved model and use it in your head_view_bert notebook. It does not assume a local model... Can you tell me how to fix this? here's my code:
model_version = 'bert-base-cased'
do_lower_case = True
modelpath = '~/Documents/insight/projects/factCC/models/saved_models/'
model = BertForSequenceClassification.from_pretrained(modelpath, from_tf=True)
tokenizer = TFBertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
show_head_view(model, tokenizer, sentence_a, sentence_b)
This is the way I normally load my model for test data and prediction, yet I can't load it here. Here's the error I get:
OSError Traceback (most recent call last)
in
2 do_lower_case = True
3 modelpath = '~/Documents/insight/projects/factCC/models/saved_models/'
----> 4 model = BertForSequenceClassification.from_pretrained(modelpath, from_tf=True)
5
6 # model = BertModel.from_pretrained(model_version, output_attentions=True)
/usr/local/lib/python3.6/site-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
348 resume_download=resume_download,
349 proxies=proxies,
--> 350 **kwargs
351 )
352 else:
/usr/local/lib/python3.6/site-packages/transformers/configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
171 ', '.join(cls.pretrained_config_archive_map.keys()),
172 config_file, CONFIG_NAME)
--> 173 raise EnvironmentError(msg)
174
175 except json.JSONDecodeError:
OSError: Model name '~/Documents/insight/projects/factCC/models/saved_models/' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-japanese, bert-base-japanese-whole-word-masking, bert-base-japanese-char, bert-base-japanese-char-whole-word-masking, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1). We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/~/Documents/insight/projects/factCC/models/saved_models//config.json' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.
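The error message shows the literal string starting with `~/Documents/` being treated as a model name, which suggests the tilde was never expanded to the home directory before the path was checked. A minimal sketch of a pre-check that could be run before calling from_pretrained follows; `resolve_model_dir` is a hypothetical helper, and whether `from_tf=True` is appropriate depends on how the model was saved, which this sketch does not address.

```python
import os
import tempfile


def resolve_model_dir(path):
    """Expand '~' and check that the directory holds a config.json file."""
    expanded = os.path.abspath(os.path.expanduser(path))
    config_path = os.path.join(expanded, "config.json")
    if not os.path.isfile(config_path):
        raise FileNotFoundError(f"no config.json found in {expanded}")
    return expanded


# Demonstration with a temporary directory standing in for a saved model.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "config.json"), "w") as handle:
        handle.write("{}")
    resolved = resolve_model_dir(tmp)
    print(resolved)

# A literal '~' is not a directory name, so it has to be expanded first.
print(os.path.expanduser("~/Documents") != "~/Documents")
```

The returned absolute path is what would then be handed to from_pretrained in place of the `~/...` string.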
Hi, I have trained an XLM model that translates from English to Spanish. A model for this language pair is not available on huggingface's repo. Is there any way to load my saved model?
Hi,
I am using BertForSequenceClassification to do binary classification task. Do you know how to get word embedding and gradient output for each word in BertForSequenceClassification https://github.com/huggingface/transformers/blob/7d7fe4997f83d6d858849a659302b9fdc32c3337/src/transformers/modeling_bert.py#L1075?
My use case: I want model to automatically tag a certain number of indicative words in test sentence to interpret why this sentence is labeled as COVID or non COVID. So my solution is to interpret this using gradient. But i haven't figured out how to output word embedding and gradient
The function load_tf_weights_in_bert in modeling_bert.py is buggy and throws a lot of attribute errors, because of what seems to be improper parsing of the variable names, with the pointer ending up pointing to the entire model.
For instance, for the variable bert/encoder/layer_0/attention/output/dense/kernel it throws an attribute error along the lines of "Bert model has no attribute weight", because the pointer is the model bert itself, whereas the pointer should be bert.encoder.layer.0.attention.output.dense.
Hi @jessevig ,
Thank you so much for making this library. It's an incredibly effective and easy way to create beautiful attention visualizations. I'm currently trying to implement for a tutorial in the DeepChem library. However, I'm running into a multitude of issues. When I use the RoBERTa notebook code, it doesn't display anything and is blank:
When I try to re-use the BERT colab code on RoBERTa, it throws the following error:
I'm able to successfully run the example BERT code, it runs successfully, however:
I'd love to get your advice on how I can fix this. I'm really interested in visualizing the attention using the library for validating the robustness of the model, and for future technical presentations and papers.
The Colab notebook can be accessed here.
Thanks!
I've been looking for a tool which can give me some type of token-based extractive summarization to solve an especially interesting problem in the Competitive Debate community. I think that this tool will help me solve it.
I've wanted to create a neural network which summarizes texts by using a "highlighter", that is, by building the summary out of the words used in the original document (but NOT the sentences). I cannot seem to find a neural network based method that does exactly what I'm asking.... but the attention mechanism (and its visualizations) highlights the parts of a source document that matter most for transforming it into document b. This seems to be what I want.
Actually, just as I typed out the previous paragraph, I got the idea to do something like this: take a news article and an abstractively written short "summary" of that article, then take the most attended-to tokens in the transformation between article and summary and use those as the summary itself. Can I use bertviz to do what I am describing via BERT, and if I can't, what are my best options?
Hi,
Is it possible to calculate the attention weight for a full sentence?
For example, for sentence "This is test"
the attention matrix for the first layer and the first attention head is
[
[0.4,0.1,0.2],
[0.1,0.5,0.3],
[0.7,0.3,0.2]
]
If I take the average of that matrix, does it reflect the attention weight for the full sentence, or does that not make sense?
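A small worked check may help here. Assuming the weights come straight from a softmax, each row sums to 1, so for an n-token input the mean over the whole matrix is always 1/n and says nothing about the sentence. Column means (how much each token is attended to on average) do vary. The sketch below uses the matrix from the question, whose rows do not sum to 1, and rescales it first; the helper names are made up for illustration.

```python
def row_sums(matrix):
    return [sum(row) for row in matrix]


def column_means(matrix):
    """Average weight each token receives across all query tokens."""
    n_rows = len(matrix)
    return [sum(row[j] for row in matrix) / n_rows for j in range(len(matrix[0]))]


def overall_mean(matrix):
    values = [v for row in matrix for v in row]
    return sum(values) / len(values)


# The matrix from the question; its rows do not sum to 1.
attn = [
    [0.4, 0.1, 0.2],
    [0.1, 0.5, 0.3],
    [0.7, 0.3, 0.2],
]
print(row_sums(attn))

# Rescale each row so it sums to 1, as softmax output would.
normalised = [[v / sum(row) for v in row] for row in attn]

print(overall_mean(normalised))   # equals 1 / 3 up to rounding
print(column_means(normalised))   # differs per token
```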
Thanks for this repo and article. I also think it would be interesting to visualize attention for the special [MASK] token that is used in BERT pre-training, to see how it interacts with the other tokens in the sequence.
Hi, thanks for the great visualization tool!
I'm just wondering whether we can have a feature which renders head view in horizontal direction? The reason is that it's more suitable to show the sequence of tokens in the horizontal direction for language like Chinese, Japanese or Korean.
In the above example, typical sentences in Chinese take about 6,70 characters but it already uses a lot of space showing 10 of them in the current head view.
Thanks again for the great tool!
Hi,
I'm using Roberta, GPT2, BERT and Grover for various tasks, I was hoping to see if I could use Bertviz to help me work out what the previous word would have been if it was or not present.
Ah thinking about this I could just use Roberta for that. But regardless, my question still stands.
Is it possible to use this for previous-word prediction?
Many thanks.
Vince.
Thanks for making such great visualization!
I wonder whether it is possible to render the visualization in a normal HTML page for easier exploration, rather than showing it in a Jupyter notebook.
Any suggestion or example for achieving this would be appreciated!
Hi Jesse,
I find your work very interesting, thanks a lot for putting it out there!
I was digging into the attention values being visualized in the BERT map, specifically the return value of _get_attentions(), and found that the token-to-token attention weights are not symmetrical, as I would have expected. For instance, consider:
layer, head = 11, 0
att = _get_attentions(tokens_a, tokens_b, atts)
attmx = np.array(att['a']['att'][layer][head])
Here, the matrix attmx might look like this:
array([[0.10391058, 0.09832697, 0.09166335, 0.14575878, 0.08784127],
[0.09632228, 0.09650009, 0.09524056, 0.12355924, 0.09061429],
[0.12465193, 0.10896012, 0.11306546, 0.11939598, 0.10786319],
[0.09877665, 0.10982872, 0.08591022, 0.11621149, 0.1339225 ],
[0.11143579, 0.0954979 , 0.09444219, 0.1312461 , 0.07381313]])
How should the fact that it's asymmetrical be interpreted? If we consider the [CLS] token at the output layer (layer 11 in bert-base?), would the attention it receives from the second token in the previous layer be attmx[0][1] == 0.09832697 (or attmx[1][0] == 0.09632228)? Is either of these values incidental, so that it can be safely ignored?
Thanks in advance!
-Samuel
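Some background that may help: in scaled dot-product attention, each row comes from one token's query compared against every token's key, followed by a softmax over that row. Queries and keys are different vectors, so the weight for the pair (i, j) generally differs from the weight for (j, i). The sketch below illustrates this with made-up vectors; it assumes the convention that row i belongs to the attending token, and `attention_weights` is a hypothetical helper, not the bertviz function.

```python
import math


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Row i holds the weights query token i assigns to every key token."""
    scale = math.sqrt(len(keys[0]))
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    return weights


# Made-up 2-dimensional query and key vectors for three tokens.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[0.5, 2.0], [1.5, 0.0], [0.0, 0.5]]

attn = attention_weights(queries, keys)
for row in attn:
    print([round(w, 3) for w in row], "sum =", round(sum(row), 3))

print(attn[0][1], attn[1][0])  # not equal in general
```

Under that row-as-query convention, attn[0][1] is the weight token 0 places on token 1 when building its next representation, and attn[1][0] is the reverse direction; both are meaningful, they just answer different questions.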
Does bertviz support Chinese? (bert-base-chinese)
Hi, Thank you for this great work.
Can I use this code to plot my model? I am using BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2).
model_type = 'bert'
model_version = 'bert-base-uncased'
do_lower_case = True
model = model #(this my model)
#tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
sentence_a = sentences[0]
sentence_b = sentences[1]
call_html()
show(model, model_type, tokenizer, sentence_a, sentence_b)
I only replaced the model with my own model and changed the sentences, and I got the error below. Please help, or share any blog post that explains how to plot my model.
AttributeError: 'BertTokenizer' object has no attribute 'cls_token'
Thank you in advance
Hi,
I have a general question. If you feel like this is not relevant / don't feel like discussing this here, feel free to close this issue. Why do you associate the values of intermediate layers of the transformer with the input tokens? Is there a property of BERT/transformers that binds the representation to those specific tokens?
The way I see it, the representation of layer i + 1 is essentially a weighted sum over the input sequence. If attention chooses to give zero weight to the input token/hidden vector, the new representation will be absolutely unrelated to the initial input sequence. So what does it tell us that layer 5 is paying attention to the hidden representation of elements 4-5 in layer 4? Are they really still associated with the initial input tokens? Do you see what I'm getting at?
I tried to load XLNet with only three layers (it does work with the full XLNet), but with three layers the example model_view_xlnet.ipynb does not work.
config = XLNetConfig.from_pretrained('/transformers/')
config.n_layer = 3
config.num_labels = 3
model = XLNetModel.from_pretrained('/transformers/')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-11-7c9c3356caa4> in <module>
17 input_id_list = input_ids[0].tolist() # Batch index 0
18 tokens = tokenizer.convert_ids_to_tokens(input_id_list)
---> 19 model_view(attention, tokens)
~/projects/bertviz/bertviz/model_view.py in model_view(attention, tokens, sentence_b_start, prettify_tokens)
78 attn_seq_len = len(attn_data['all']['attn'][0][0])
79 if attn_seq_len != len(tokens):
---> 80 raise ValueError(f"Attention has {attn_seq_len} positions, while number of tokens is {len(tokens)}")
81 display(Javascript('window.params = %s' % json.dumps(params)))
82 display(Javascript(vis_js))
ValueError: Attention has 768 positions, while number of tokens is 14
It seems that some words not in the BERT vocab can be broken down into WordPiece tokens that are in the pretrained model and visualized, but others are not?
I tried this on some words from Norse mythology using the Colab notebook. "Ragnarok" can be broken down into rag ##nar ##ok but somehow "Valhalla" cannot. I tried it on sentences from tax documents and it doesn't quite work there either.
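For background, BERT-style WordPiece splitting is greedy and longest-match-first: it repeatedly takes the longest vocabulary entry that fits at the current position, and if any remainder cannot be matched, the whole word becomes [UNK]. The sketch below shows that mechanism with a toy vocabulary invented for this example (not the real BERT vocab), so it illustrates how one word can split while another fails, which may or may not be what happened with "Valhalla".

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single lowercase word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # One unmatched stretch makes the whole word unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, invented for this example (not the real BERT vocab).
toy_vocab = {"rag", "##nar", "##ok", "val", "##hal"}

print(wordpiece("ragnarok", toy_vocab))
print(wordpiece("valhalla", toy_vocab))
```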
Is there any way to use bertviz to visualise the importance of the different words with respect to a given prediction in a classification task (BertClassifier)?
Similar to this: https://docs.fast.ai/text.interpret.html#interpret
Thank you
I'm running the attention visualizations on a server without GUI.
Is there an easy way to run, e.g., head_view_bert.py and save the interactive visualizations to a local .html file which can then be viewed on another machine?
It seems you removed encode_plus; what is the successor? All the notebooks include inputs = tokenizer.encode_plus(text, return_tensors='pt', add_special_tokens=True), which is wrong and raises an error.
Hi there,
First of all, thank you for building this package. It's exactly what I'm looking for right now! :)
I'm trying to use model_view to visualize the attention. I can use head_view and it works perfectly fine, but when I try to use model_view with everything else the same, I just get a black rectangle. See the image below.
Any suggestions would be much appreciated.
I want to know why there is no [CLS] token in the visualization of XLNet. I used your XLNet code to visualize my fine-tuned XLNet, but I am not getting any [CLS] token in the input. Can you tell me how I can get it?
Hi,
Can I use bertViz with Transformers model implemented using pytorch nn.Transform??
Thanks,
Fatma
I've already used BERT from Hugging Face to fine-tune a model for my task, and I'm trying to use that model in this project so I can see it in the three views this project provides.
By the way, my task is text classification with pairs of sentences.
Finally, thanks a lot for this paper and project, they are excellent.
I'm using colab but it doesn't work. Help.
%%javascript
require.config({
paths: {
d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min',
jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',
}
});
def show_head_view(model, tokenizer, sentence_a, sentence_b=None):
    inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt',
                                   add_special_tokens=True)
    input_ids = inputs['input_ids']
    if sentence_b:
        token_type_ids = inputs['token_type_ids']
        attention = model(input_ids, token_type_ids=token_type_ids)[-1]
        sentence_b_start = token_type_ids[0].tolist().index(1)
    else:
        attention = model(input_ids)[-1]
        sentence_b_start = None
    input_id_list = input_ids[0].tolist() # Batch index 0
    tokens = tokenizer.convert_ids_to_tokens(input_id_list)
    head_view(attention, tokens, sentence_b_start)
model_version = 'bert-base-uncased'
do_lower_case = True
model = BertModel.from_pretrained(model_version, output_attentions=True)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
sentence_a = "the cat sat on the mat"
sentence_b = "the cat lay on the rug"
show_head_view(model, tokenizer, sentence_a, sentence_b)