jessevig / bertviz

BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

Home Page: https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1

License: Apache License 2.0

JavaScript 11.38% Python 86.19% Jupyter Notebook 2.43%
natural-language-processing machine-learning visualization neural-network pytorch nlp bert transformer gpt2 roberta

bertviz's Issues

A question about the output of neuron_view_roberta.ipynb

Hi, I have a question about the output of neuron_view_roberta.ipynb. My result does not include Q, K, and Q dot K as in the example. Below are the result I got and the result shown in the example.

The result I got:
[screenshot]

The result from the example:
[screenshot: neuron_thumbnail]

Many thanks!

Multiple visualizations for multiple sentences in one display

Hi,
Thanks for your great tool.
I was wondering whether it is possible to display visualizations for multiple sentences at once in a single display. If so, I would be grateful if you could show me how to do it.
Thank you very much in advance.

Neuron View Inconsistency

As you can see in the following example, the neuron view output differs from what you've provided in the readme of this repo. Please also check this one.

The current output:
[screenshot]

The expected output:
[screenshot]

word attention weights

Hi,

Thank you for writing this tool.

I was wondering if there is any way to compute word-level attention and also identify each word's position within its sentence in a given multi-sentence text.

Using the example below, how can I find the BERT word attention for "cat" in each of the two sentences:
[the cat sat on the mat. the cat lay on the rug.]

Thanks!
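
A minimal sketch of one way to approach this, assuming a single layer/head attention matrix is already available as a nested list `attn`, where `attn[i][j]` is the weight from token i to token j. The token list, the toy weights, and the helper names `positions_of` and `top_targets` are illustrative assumptions, not part of bertviz:

```python
def positions_of(tokens, word):
    """Return every index at which `word` occurs in `tokens`."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def top_targets(tokens, attn, pos, k=3):
    """Return the k tokens that position `pos` attends to most, as (index, token, weight)."""
    ranked = sorted(enumerate(attn[pos]), key=lambda pair: pair[1], reverse=True)
    return [(j, tokens[j], w) for j, w in ranked[:k]]


tokens = ["[CLS]", "the", "cat", "sat", "on", "the", "mat", ".",
          "the", "cat", "lay", "on", "the", "rug", ".", "[SEP]"]
n = len(tokens)

# Toy weights (made up): each token puts 0.5 on itself and spreads the rest evenly.
attn = [[0.5 if i == j else 0.5 / (n - 1) for j in range(n)] for i in range(n)]

# Each occurrence of "cat" has its own position and its own attention row.
for pos in positions_of(tokens, "cat"):
    print(pos, top_targets(tokens, attn, pos, k=1))
```

Words that the tokenizer splits into several WordPiece tokens would need their rows combined before this kind of lookup.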

Visualization for query-result pairs

I am very impressed by the features bertviz offers so far. I was wondering if there is any approach for visualizing the relation between a query and the results ranked by a BERT model fine-tuned on the passage reranking task. This fine-tuned model predicts how relevant a passage is as the right "response" to a query, and it would be nice to somehow visualize how the query is connected to the passages.

Thanks in advance!

How can I use bertviz for BERT question answering?

Is there any way to see the attention visualization for a BERT question answering model? I couldn't find a BertForQuestionAnswering class in bertviz.pytorch_transformers_attn. I have fine-tuned on a QA dataset using Hugging Face transformers and want to see the visualization for it. Can you suggest a way of doing it?

Issues in visualizing a fine tuned model

A BertModel fine-tuned for a sequence classification task does not give the expected results on visualisation.
Ideally, the pretrained model should be loaded into BertForSequenceClassification, but that model does not return attention scores for visualisation.
When it is loaded into BertModel (layers 0 to 11), I assume the 11th layer (right before the classification layer in BertForSequenceClassification) is the right layer for checking the attention distribution.
But every word attends equally to every other word.
I am wondering what the possible reasons are and how I can fix it.
Thanks.
[screenshot]

Unable to visualize more than once in the same notebook

Thank you for your work on this repo - the visualizations are really useful and helpful towards my research. I'm wondering if I've run into some sort of bug or limitation of the visualizations, because I'm unable to show more than one visualization in a notebook - in two different cells that is. The second cell always shows an empty output when I try to run it (See attached picture). I see the same behavior with all the views.

[screenshot]

Update: I've also noticed that sometimes the second visualization is displayed, but then the first visualization disappears.

How do I export vector graphics?

After using bertviz to visualize attention in a notebook, how can I export vector graphics suitable for a paper?

Hovering issue

Hello, thanks for the work.
Short bug report.
In Safari, when you hover over the visualisation in Google Colab, the visualisation window scrolls down automatically, making it impossible to work with model_view and neuron_view. It works in Chrome.

Thanks again

Load directly huggingface/transformers models

Hi,
thank you for the nice tool!
I would like to understand why you are not loading the models directly from the transformers package.

Which part of the transformers model do you need to adapt to make it compatible with bertviz? I would like to add other huggingface models such as DistilBERT.

Did you consider a way to load the model directly from transformers?

Thank you in advance for your kind response.

Saving visualizations

Thanks for the great tool!

It would be nice to be able to save the visualizations for specific layers/heads as images. I have not been able to find a spot in the model/head/neuron_view.js file to add a saving function.

Do you maybe have a suggestion on how to save the visualizations as images?

Thanks!

Can the Chinese BERT model be visualized by words instead of characters?

I want to ask whether the Chinese BERT model can be visualized by words instead of characters.
When I run the BERT visualization for Chinese, I can only see attention from character to character.
I want to see attention from word to word. Can I change this?

Thanks a lot for your help
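
One common way to get word-to-word weights, sketched here under assumptions: sum the attention going to the characters of a word, and average the attention coming from them, so that each row still sums to 1. The word boundaries in `spans` would have to come from an external word segmenter; the function name `merge_attention` and all numbers below are made up for illustration and are not part of bertviz:

```python
def merge_attention(attn, spans):
    """Collapse a token-level attention matrix to word level.

    `spans` lists, for each word, the token indices that belong to it.
    Attention *to* a word is the sum over its tokens; attention *from* a
    word is the mean over its tokens, so each row still sums to 1.
    """
    # Step 1: sum the columns that belong to the same word.
    col_merged = [[sum(row[j] for j in span) for span in spans] for row in attn]
    # Step 2: average the rows that belong to the same word.
    merged = []
    for span in spans:
        rows = [col_merged[i] for i in span]
        merged.append([sum(col) / len(rows) for col in zip(*rows)])
    return merged


# Hypothetical example: four characters segmented into two two-character words.
chars = ["北", "京", "大", "学"]
spans = [[0, 1], [2, 3]]
attn = [
    [0.40, 0.30, 0.20, 0.10],
    [0.10, 0.50, 0.20, 0.20],
    [0.25, 0.25, 0.25, 0.25],
    [0.00, 0.20, 0.30, 0.50],
]

word_attn = merge_attention(attn, spans)
for row in word_attn:
    print([round(w, 3) for w in row])
```

The merged matrix could then be passed to a plotting routine together with the word list in place of the character list.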

Runtime disconnected frequently

I am using bertviz with BertModel for visualisation, specifically the neuron detail view. Every time I pass a sentence of more than 5 words, the runtime gets disconnected. Is there some other way to visualise them? I am trying to understand long-term dependencies.

"require"

Hey, first, thank you for both creating + maintaining this repo!

I'm trying to get the basic BERT visualization working using Chrome on my local machine. The notebook is throwing an error when I execute the JS cell.

%%javascript
require.config({
  paths: {
      d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min',
      jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',
  }
});

returns an error: Javascript Error: require is not defined

I tried:
(1) doing a clean pip install from your requirements.txt to confirm I had everything
(2) I also found documentation from Jupyter indicating there's a specific jupyter-require package, which I installed; and also used the magic %load_ext jupyter_require; stopping / restarting / reloading the kernel, etc.

Any idea what's going on here?

update: i just tried opening in a notebook (vs the lab environment) and it worked fine! So that's a totally reasonable workaround for me. If there's an easy fix, would love to hear it, but otherwise feel free to just close this. thankya!

How to use the BertForSequenceClassification class for a TF checkpoint fine-tuned on a sentence classification task (e.g. a GLUE task)

model_type = 'bert'
model_version_3 = './bertviz/tests/saved_model'
model_config = './bertviz/tests/saved_model/bert_config.json'
do_lower_case = False
config = BertConfig.from_json_file(model_config)
model = BertForSequenceClassification.from_pretrained(model_version_3, from_tf=True, config=config)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
show(model, model_type, tokenizer, sentence_a, sentence_b)

I have tried this code, but it raises an error:
AttributeError: 'BertForSequenceClassification' object has no attribute 'bias'

Also, I'm unable to pass the parameter num_labels to BertForSequenceClassification.from_pretrained().
It raises the error: __init__() got an unexpected keyword argument 'num_label'

Please help me fix it.

Thanks.

it's not compatible with latest pytorch-transformers anymore

From the official documentation:

The main breaking change when migrating from pytorch-pretrained-bert to pytorch-transformers is that the models forward method always outputs a tuple with various elements depending on the model and the configuration parameters.

It seems like the structure of attentions also changed. The attention for each layer is just a tensor, not key-value pairs.

Visualize large BERT model

Thank you for the great work! Is it possible to make this work with the large uncased BERT model as well? Currently it loads the model but the Layer dropdown is not populated and the visualization is not shown. Simply changing the hardcoded layers from 12 to 24 in attention.js does not work.

How to use saved model for bertviz

Hi,

I am using huggingface's transformers BertForSequenceClassification to train a BERT model. Now, I want to load my saved model and use it in your head_view_bert notebook. It does not assume a local model... Can you tell me how to fix this? here's my code:

model_version = 'bert-base-cased'
do_lower_case = True
modelpath = '~/Documents/insight/projects/factCC/models/saved_models/'
model = BertForSequenceClassification.from_pretrained(modelpath, from_tf=True)

model = BertModel.from_pretrained(model_version, output_attentions=True)

tokenizer = TFBertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
show_head_view(model, tokenizer, sentence_a, sentence_b)

This is the way I normally load my model for test data and prediction, yet I can't load it here. Here's the error I get:

OSError Traceback (most recent call last)
in
2 do_lower_case = True
3 modelpath = '~/Documents/insight/projects/factCC/models/saved_models/'
----> 4 model = BertForSequenceClassification.from_pretrained(modelpath, from_tf=True)
5
6 # model = BertModel.from_pretrained(model_version, output_attentions=True)

/usr/local/lib/python3.6/site-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
348 resume_download=resume_download,
349 proxies=proxies,
--> 350 **kwargs
351 )
352 else:

/usr/local/lib/python3.6/site-packages/transformers/configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
171 ', '.join(cls.pretrained_config_archive_map.keys()),
172 config_file, CONFIG_NAME)
--> 173 raise EnvironmentError(msg)
174
175 except json.JSONDecodeError:

OSError: Model name '~/Documents/insight/projects/factCC/models/saved_models/' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-japanese, bert-base-japanese-whole-word-masking, bert-base-japanese-char, bert-base-japanese-char-whole-word-masking, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1). We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/~/Documents/insight/projects/factCC/models/saved_models//config.json' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.
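
One detail visible in the error message is that the `~` was never expanded: the path appears verbatim inside the attempted URL. Python does not expand `~` by itself, so a possible first check (an assumption about the cause, not a confirmed fix) is to expand the path and verify that the directory and its config.json exist before calling from_pretrained:

```python
import os

raw_path = "~/Documents/insight/projects/factCC/models/saved_models/"

# os.path.expanduser turns a leading "~" into the current user's home directory.
model_dir = os.path.expanduser(raw_path)

print("raw path is a directory:     ", os.path.isdir(raw_path))
print("expanded path:               ", model_dir)
print("expanded path is a directory:", os.path.isdir(model_dir))

# The error message says a file named config.json was being looked for.
print("config.json present:         ",
      os.path.isfile(os.path.join(model_dir, "config.json")))
```

If the expanded directory exists and contains config.json, passing `model_dir` instead of the raw string would be the next thing to try.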

How do I use this tool for my own model?

Hi, I have trained an XLM model that translates from English to Spanish. A model for this language pair is not available on huggingface's repo. Is there any way to load my saved model?

General Question about word embedding and gradient output

Hi,
I am using BertForSequenceClassification for a binary classification task. Do you know how to get the word embedding and gradient output for each word in BertForSequenceClassification (https://github.com/huggingface/transformers/blob/7d7fe4997f83d6d858849a659302b9fdc32c3337/src/transformers/modeling_bert.py#L1075)?

My use case: I want the model to automatically tag a certain number of indicative words in a test sentence, to interpret why the sentence is labeled as COVID or non-COVID. My idea is to interpret this using gradients, but I haven't figured out how to output the word embeddings and gradients.

Unable to load weights properly from tf checkpoint

The function load_tf_weights_in_bert in modeling_bert.py is buggy and throws a lot of attribute errors, apparently because the variable names are parsed improperly and the pointer ends up pointing to the entire model.

For instance for the variable bert/encoder/layer_0/attention/output/dense/kernel it throws an attribute error along the lines of Bert model has no attribute weight because the pointer is the model bert itself whereas the pointer should be bert.encoder.layer.0.attention.output.dense.
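
To make the expected mapping concrete, here is a simplified sketch of how a TF variable name could be turned into a dotted attribute path. It is an illustration only, not the transformers implementation; the function name `tf_name_to_attr_path` is hypothetical, and a real conversion also has to handle details such as transposing kernel matrices:

```python
import re


def tf_name_to_attr_path(tf_name):
    """Map a TF-checkpoint variable name to a dotted PyTorch-style attribute path."""
    parts = []
    for piece in tf_name.split("/"):
        match = re.fullmatch(r"([A-Za-z]+)_(\d+)", piece)
        if match:
            # "layer_0" becomes two steps: the module list "layer", then index "0".
            parts.extend([match.group(1), match.group(2)])
        elif piece in ("kernel", "gamma"):
            parts.append("weight")
        elif piece in ("beta", "output_bias"):
            parts.append("bias")
        else:
            parts.append(piece)
    return ".".join(parts)


print(tf_name_to_attr_path("bert/encoder/layer_0/attention/output/dense/kernel"))
```

If the pointer stays at the top-level model instead of walking down these steps, looking up `weight` on it fails in the way described above.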

Attention visualization for RoBERTa is blank, raises KeyError using call_html()

Hi @jessevig ,

Thank you so much for making this library. It's an incredibly effective and easy way to create beautiful attention visualizations. I'm currently trying to use it in a tutorial for the DeepChem library. However, I'm running into several issues. When I use the RoBERTa notebook code, the output is blank:
[screenshot]

When I try to reuse the BERT Colab code on RoBERTa, it throws the following error:
[screenshot]

The example BERT code, however, runs successfully:
[screenshot]

I'd love to get your advice on how I can fix this. I'm really interested in visualizing the attention using the library for validating the robustness of the model, and for future technical presentations and papers.

The Colab notebook can be accessed here.

Thanks!

not working on Safari?

Hey!
First of all, this is great work!
But some notebooks, such as "bertviz_detail.ipynb", do not work properly in Safari.
Should it work in Safari when used with Jupyter?
[screenshot]

Using the attention to summarize a document

I've been looking for a tool which can give me some type of token-based extractive summarization to solve an especially interesting problem in the Competitive Debate community. I think that this tool will help me solve it.

I've wanted to create a neural network which summarizes texts by using a "highlighter", that is, by building the summary out of words used in the original document (but NOT its sentences). I cannot seem to find a neural-network-based method that does exactly this. However, the attention mechanism (and its visualizations) highlights the parts of a source document that matter most for transforming it into a document b. This seems to be what I want.

Actually, as I typed out the previous paragraph, I got the idea to do something like this: take a news article and an abstractively written short "summary" of it, then take the most attended-to tokens in the transformation between article and summary and use those as the summary itself. Can I use bertviz to do what I am describing via BERT, and if I can't, what are my best options?

Attention weight for a sentence

Hi,
Is it possible to calculate the attention weight for a full sentence?

For example, for sentence "This is test"

the attention matrix for the first layer and the first attention head is
[
[0.4,0.1,0.2],
[0.1,0.5,0.3],
[0.7,0.3,0.2]
]

If I take the average of that matrix, does that reflect the attention weight for the full sentence, or does that not make sense?
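
A small sketch of why a plain average may not be informative, assuming each row of the matrix is a softmax output and therefore sums to 1 (the rows in the example above do not, so those numbers are presumably only illustrative). Under that assumption the overall mean of an n x n matrix is always 1/n, whatever the pattern; per-column means, by contrast, show how much attention each token receives. Both matrices below are made up:

```python
def mean_of_matrix(m):
    """Mean over all entries of a nested-list matrix."""
    return sum(sum(row) for row in m) / (len(m) * len(m[0]))


def column_means(m):
    """Average attention received by each token (one value per column)."""
    return [sum(col) / len(m) for col in zip(*m)]


# Every row sums to 1 in both matrices.
focused = [
    [0.8, 0.1, 0.1],
    [0.8, 0.1, 0.1],
    [0.8, 0.1, 0.1],
]
uniform = [[1 / 3] * 3 for _ in range(3)]

# The overall means coincide even though the patterns are very different.
print(mean_of_matrix(focused), mean_of_matrix(uniform))

# The column means do tell the two patterns apart.
print(column_means(focused))
print(column_means(uniform))
```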

[MASK] token attention patterns.

Thanks for this repo and article. I think it would also be interesting to visualize attention with the special [MASK] token that is used for BERT pre-training, to see how it interacts with the other tokens in the sequence.

Horizontal head view feature

Hi, thanks for the great visualization tool!

I'm just wondering whether we could have a feature that renders the head view horizontally. The reason is that showing the sequence of tokens horizontally is more suitable for languages like Chinese, Japanese or Korean.

[screenshot]

In the above example, typical sentences in Chinese take about 6,70 characters but it already uses a lot of space showing 10 of them in the current head view.

Thanks again for the great tool!

I have a problem loading my own model

Hi, I have a problem loading my own model.
I can successfully load my fine-tuned BERT model,
but I get the error message TypeError: object of type 'float' has no len() when I call head_view(attention, tokens).
I don't know how to solve this bug.

[screenshot]
Thanks a lot for your help

Can I use this to do previous-word prediction?

Hi,
I'm using Roberta, GPT2, BERT and Grover for various tasks, I was hoping to see if I could use Bertviz to help me work out what the previous word would have been if it was or not present.

Ah thinking about this I could just use Roberta for that. But regardless, my question still stands.

Is it possible to use this for previous-word prediction?

Many thanks.
Vince.

standalone web demo

Thanks for making such great visualization!

I wonder whether it is possible to render the visualization into a normal HTML page for easier exploration, rather than showing it in a Jupyter notebook?

Any suggestion or example for achieving this would be appreciated!

Attention matrix is asymmetric

Hi Jesse,
I find your work very interesting, thanks a lot for putting it out there!

I was digging into the attention values being visualized in the BERT map, specifically the return value of _get_attentions(), and found that the token-to-token attention weights are not symmetrical, as I would have expected. For instance, consider:

layer, head = 11, 0
att = _get_attentions(tokens_a, tokens_b, atts)
attmx = np.array(att['a']['att'][layer][head])

Here, the matrix attmx might look like this:

array([[0.10391058, 0.09832697, 0.09166335, 0.14575878, 0.08784127],
       [0.09632228, 0.09650009, 0.09524056, 0.12355924, 0.09061429],
       [0.12465193, 0.10896012, 0.11306546, 0.11939598, 0.10786319],
       [0.09877665, 0.10982872, 0.08591022, 0.11621149, 0.1339225 ],
       [0.11143579, 0.0954979 , 0.09444219, 0.1312461 , 0.07381313]])

How should the fact that it's asymmetrical be interpreted? If we consider the [CLS] token at the output layer (layer 11 in bert-base?), would the attention it receives from the second token in the previous layer be attmx[0][1] == 0.09832697 (or attmx[1][0] == 0.09632228)? Are either of these values incidental and can be safely ignored?

Thanks in advance!
-Samuel
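
A minimal sketch of why asymmetry is expected under standard scaled dot-product attention: queries and keys are different vectors, and the softmax is applied per row, so entry [i][j] (token i attending to token j, assuming the usual rows-are-queries convention) need not equal entry [j][i]. The vectors below are made up, and the functions are plain-Python stand-ins rather than bertviz code:

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Row i is softmax over j of (q_i . k_j) / sqrt(d)."""
    d = len(queries[0])
    return [
        softmax([sum(q * k for q, k in zip(q_i, k_j)) / math.sqrt(d) for k_j in keys])
        for q_i in queries
    ]


# Made-up 2-dimensional query and key vectors for three tokens.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[0.5, 2.0], [1.5, 0.0], [0.0, 0.5]]

att = attention_weights(queries, keys)
for row in att:
    print([round(w, 3) for w in row], "row sum:", round(sum(row), 3))

# Token 0 attending to token 1 versus token 1 attending to token 0.
print(round(att[0][1], 3), round(att[1][0], 3))
```

Under this convention each full row sums to 1. The rows of the matrix quoted above do not, which would be consistent with it being a slice of a longer sequence, although that is only a guess.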

BertForSequenceClassification.from_pretrained

Hi, thank you for this great work.
Can I use this code to plot my model? I am using BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2).

model_type = 'bert'
model_version = 'bert-base-uncased'
do_lower_case = True
model = model #(this my model)
#tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)
sentence_a = sentences[0]
sentence_b = sentences[1]
call_html()
show(model, model_type, tokenizer, sentence_a, sentence_b)
I changed only the model (to my own) and the sentences, and I got the error below. Please help, or share any blog post that explains how to plot my model.
AttributeError: 'BertTokenizer' object has no attribute 'cls_token'

Thank you in advance

Interpretability of a BERT model's intermediate layers

Hi,

I have a general question. If you feel like this is not relevant / don't feel like discussing this here, feel free to close this issue. Why do you associate the values of intermediate layers of the transformer with the input tokens? Is there a property of BERT/transformers that binds the representation to those specific tokens?

The way I see it, the representation of layer i + 1 is essentially a weighted sum over the input sequence. If attention chooses to give zero weight to the input token/hidden vector, the new representation will be absolutely unrelated to the initial input sequence. So what does it tell us that layer 5 is paying attention to the hidden representation of elements 4-5 in layer 4? Are they really still associated with the initial input tokens? Do you see what I'm getting at?

visualization of only 3 layers / example model_view_xlnet.ipynb

I tried loading XLNet with only three layers. The example model_view_xlnet.ipynb works with the full XLNet, but not with three layers.

config = XLNetConfig.from_pretrained('/transformers/')
config.n_layer = 3
config.num_labels = 3
model = XLNetModel.from_pretrained('/transformers/')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-7c9c3356caa4> in <module>
     17 input_id_list = input_ids[0].tolist() # Batch index 0
     18 tokens = tokenizer.convert_ids_to_tokens(input_id_list)
---> 19 model_view(attention, tokens)

~/projects/bertviz/bertviz/model_view.py in model_view(attention, tokens, sentence_b_start, prettify_tokens)
     78     attn_seq_len = len(attn_data['all']['attn'][0][0])
     79     if attn_seq_len != len(tokens):
---> 80         raise ValueError(f"Attention has {attn_seq_len} positions, while number of tokens is {len(tokens)}")
     81     display(Javascript('window.params = %s' % json.dumps(params)))
     82     display(Javascript(vis_js))

ValueError: Attention has 768 positions, while number of tokens is 14
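
A possible reading of this error, offered as an assumption: a length of 768 looks like a hidden size rather than a sequence length, so the object passed as `attention` may not be the attention tuple at all (note that the snippet builds `config` but never passes it to `from_pretrained`). A small shape check like the hypothetical helper below can show this before the visualization is called. It assumes one entry per layer, each with a `.shape` of (batch, num_heads, seq_len, seq_len); the `FakeTensor` class exists only so the sketch runs without torch:

```python
def check_attention(attention, tokens):
    """Return a list of human-readable problems; an empty list means the shapes look right."""
    problems = []
    for idx, layer in enumerate(attention):
        shape = tuple(getattr(layer, "shape", ()))
        if len(shape) != 4:
            problems.append(f"layer {idx}: expected 4 dimensions, got shape {shape}")
        elif shape[2] != shape[3]:
            problems.append(f"layer {idx}: last two dimensions differ in shape {shape}")
        elif shape[3] != len(tokens):
            problems.append(f"layer {idx}: {shape[3]} positions vs {len(tokens)} tokens")
    return problems


class FakeTensor:
    """Stand-in with only a `.shape` attribute."""

    def __init__(self, shape):
        self.shape = shape


tokens = ["tok"] * 14
good = [FakeTensor((1, 12, 14, 14))] * 3
suspect = [FakeTensor((1, 14, 768))]  # looks like hidden states, not attention

print(check_attention(good, tokens))
print(check_attention(suspect, tokens))
```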

Some words are unable to be visualized?

It seems that some words not in the BERT vocab can be broken down into WordPiece tokens that are in the pretrained model and visualized, but others are not?

I tried this on some words from Norse mythology using the Colab notebook. "Ragnarok" can be broken down into rag ##nar ##ok but somehow "Valhalla" cannot. I tried it on sentences from tax documents and it doesn't quite work there either.
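
For context, a sketch of the greedy longest-match-first splitting that WordPiece-style tokenizers use. With the toy vocabulary below (hypothetical, not BERT's actual vocabulary), one word splits cleanly while the other falls back to the unknown token because its tail cannot be matched. This only illustrates the mechanism and does not establish what happened with "Valhalla" in the notebook:

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword split in the style of WordPiece."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # one unmatched stretch makes the whole word unknown
        pieces.append(match)
        start = end
    return pieces


toy_vocab = {"rag", "##nar", "##ok", "val", "##hal"}

print(wordpiece("ragnarok", toy_vocab))
print(wordpiece("valhalla", toy_vocab))
```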

Save attention visualizations as local html file

I'm running the attention visualizations on a server without GUI.
Is there an easy way to run, e.g., head_view_bert.py and save the interactive visualizations to a local .html file which can then be viewed on another machine?
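
A minimal sketch of the saving step, assuming the visualization is available either as an HTML string or as an object with a `.data` string attribute (as IPython.display.HTML objects have). The helper name `save_html`, the file name, and the placeholder markup are illustrative assumptions:

```python
from pathlib import Path


def save_html(html_obj, path):
    """Write an HTML visualization to `path` and return the resulting Path."""
    markup = html_obj if isinstance(html_obj, str) else html_obj.data
    out = Path(path)
    out.write_text(markup, encoding="utf-8")
    return out


# Stand-in for a returned visualization; a real one would come from bertviz.
demo = "<html><body><p>attention view placeholder</p></body></html>"
saved = save_html(demo, "head_view_demo.html")
print(saved, saved.stat().st_size, "bytes")
```

Recent bertviz releases document an `html_action='return'` argument, as in `head_view(attention, tokens, html_action='return')`, which returns such an object instead of displaying it. Whether the installed version supports this is worth checking first.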

encode_plus is not in GPT2 Tokenizer

It seems you removed encode_plus; what is its successor? All the notebooks include inputs = tokenizer.encode_plus(text, return_tensors='pt', add_special_tokens=True), which is wrong and raises an error.

model_view shows black image?

Hi there,

First of all, thank you for building this package. It's exactly what I'm looking for right now! :)

I'm trying to use model_view to visualize the attention. I can use head_view and it works perfectly fine, but when I try to use model_view with everything else the same -- I just get a black rectangle. See the image below.

[screenshot]

Any suggestions would be much appreciated.

attention details

Hi,

I found a non-working colab notebook here while I was reading here.

I was wondering if there is any way to get the attention details including the keys and vectors.

Thanks!

Missing [CLS] token in XLNet

I want to know why there is no [CLS] token in the visualization of XLNet. I used your XLNet code to visualize my fine-tuned XLNet, but I am not getting any [CLS] token in the input. Can you tell me how I can get it?

How to use model i've trained in this project

I have already fine-tuned BERT from huggingface for my task, and I'm trying to use my model in this project so that I can see it in the three views this project provides.
By the way, my task is text classification on sentence pairs.
Finally, thanks a lot for this paper and project; it's excellent.

layer and attention are empty.

I'm using colab but it doesn't work. Help.

%%javascript
require.config({
  paths: {
      d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min',
      jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',
  }
});

def show_head_view(model, tokenizer, sentence_a, sentence_b=None):
    inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt',
                                   add_special_tokens=True)
    input_ids = inputs['input_ids']
    if sentence_b:
        token_type_ids = inputs['token_type_ids']
        attention = model(input_ids, token_type_ids=token_type_ids)[-1]
        sentence_b_start = token_type_ids[0].tolist().index(1)
    else:
        attention = model(input_ids)[-1]
        sentence_b_start = None
    input_id_list = input_ids[0].tolist() # Batch index 0
    tokens = tokenizer.convert_ids_to_tokens(input_id_list)
    head_view(attention, tokens, sentence_b_start)

model_version = 'bert-base-uncased'
do_lower_case = True

model = BertModel.from_pretrained(model_version, output_attentions=True)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)

sentence_a = "the cat sat on the mat"
sentence_b = "the cat lay on the rug"

show_head_view(model, tokenizer, sentence_a, sentence_b)

[screenshot]
