
Comments (15)

jameswex commented on August 25, 2024

To run LIT in a notebook, create your dataset and model classes, then create a LitWidget object with those objects and call render() on it. An example can be seen here: https://colab.sandbox.google.com/github/PAIR-code/lit/blob/main/lit_nlp/examples/notebooks/LIT_sentiment_classifier.ipynb (in Colab, but the code would be the same in Jupyter).
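
For reference, a minimal sketch of that flow, assuming MyDataset and MyModel are your own lit_nlp Dataset and Model subclasses (the names here are placeholders):

from lit_nlp import notebook

# Dicts map display names to instances; LIT can serve several of each.
datasets = {"my_data": MyDataset()}
models = {"my_model": MyModel()}

# LitWidget serves LIT inside the notebook instead of as a standalone server.
widget = notebook.LitWidget(models, datasets, height=800)
widget.render()  # renders the LIT UI inline in the output cell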

If you want to see gradient-based salience methods in the LIT UI, then your model will need to have the appropriate inputs and outputs to support them. See https://github.com/PAIR-code/lit/wiki/components.md#token-based-salience for details on having your model support the different salience methods.
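
As a rough sketch only (field names and exact types may differ by LIT version; check the components doc above), a classifier's output_spec would need to expose tokens plus per-token gradients, something like:

from lit_nlp.api import types as lit_types

def output_spec(self):
    return {
        "probas": lit_types.MulticlassPreds(parent="label", vocab=["0", "1"]),
        "tokens": lit_types.Tokens(),
        # Gradient-norm salience needs gradients aligned to the tokens field.
        "token_grads": lit_types.TokenGradients(align="tokens"),
        # grad-dot-input salience additionally needs the input embeddings.
        "input_embs": lit_types.TokenEmbeddings(align="tokens"),
    }

And predict_minibatch has to actually compute those gradients, so the forward pass can't run under torch.no_grad().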

jameswex commented on August 25, 2024

To return values from predict_minibatch, you need to convert that tensor([0.6403, 0.3597]) into a raw array of just [0.6403, 0.3597], as opposed to a tensor.
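
For example, a one-line sketch (assuming out is the usual transformers model output):

# .tolist() yields plain Python floats, which serialize to JSON cleanly.
batched_outputs["probas"] = torch.nn.functional.softmax(out.logits, dim=-1).tolist()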

pratikchhapolika commented on August 25, 2024

> To return values from predict_minibatch, you need to convert that tensor([0.6403, 0.3597]) into a raw array of just [0.6403, 0.3597], as opposed to a tensor.

In which line of code?

jameswex commented on August 25, 2024

Not sure, you should check all your entries in batched_output to be sure they are normal python lists and not tensors. It might be the 'probas' entry that is the issue here.

pratikchhapolika commented on August 25, 2024

> Not sure, you should check all your entries in batched_output to be sure they are normal python lists and not tensors. It might be the 'probas' entry that is the issue here.

Updated the code, but now I am getting this warning, not an error.

pratikchhapolika commented on August 25, 2024

@jameswex how can I launch the app inside the Jupyter notebook itself instead of as a web page? How can I modify the above code to do it?

pratikchhapolika commented on August 25, 2024

@jameswex my second question is: how do I get gradient visualization in the salience maps with the above code?

pratikchhapolika commented on August 25, 2024

When I switch to the PCA visualization, it gives TypeError: (-0.7481077572209469+0j) is not JSON serializable.

pratikchhapolika commented on August 25, 2024

> To run LIT in a notebook, create your dataset and model classes, then create a LitWidget object with those objects and call render() on it. An example can be seen here: https://colab.sandbox.google.com/github/PAIR-code/lit/blob/main/lit_nlp/examples/notebooks/LIT_sentiment_classifier.ipynb (in Colab, but the code would be the same in Jupyter).

> If you want to see gradient-based salience methods in the LIT UI, then your model will need to have the appropriate inputs and outputs to support them. See https://github.com/PAIR-code/lit/wiki/components.md#token-based-salience for details on having your model support the different salience methods.

OK. 

Also, how do I overcome TypeError: (-0.7481077572209469+0j) is not JSON serializable?

jameswex commented on August 25, 2024

The model and dataset code shouldn't change for notebooks. It's just that you create a LitWidget with the model and datasets, instead of a Server. Then you call render on the widget object.

I'm not sure about the root cause of that specific error. It's most likely that your predict_minibatch fn is returning some value for one of its fields for each example that isn't a basic, JSON-serializable type.
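
To make the contrast concrete, a sketch (assuming the same models/datasets dicts a dev_server script would use):

# Standalone web app:
#   server = lit_nlp.dev_server.Server(models, datasets, port=5432)
#   server.serve()
# In a notebook, the equivalent is:
from lit_nlp import notebook
widget = notebook.LitWidget(models, datasets)
widget.render()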

pratikchhapolika commented on August 25, 2024

> The model and dataset code shouldn't change for notebooks. It's just that you create a LitWidget with the model and datasets, instead of a Server. Then you call render on the widget object.
>
> I'm not sure about the root cause of that specific error. It's most likely that your predict_minibatch fn is returning some value for one of its fields for each example that isn't a basic, JSON-serializable type.

Converted everything to lists. Still the same error.

def predict_minibatch(self, inputs):
    # Preprocess to ids and masks, and make the input batch.
    encoded_input = self.tokenizer.batch_encode_plus(
        [ex["sentence"] for ex in inputs],
        return_tensors="pt",
        add_special_tokens=True,
        max_length=512,
        padding="longest",
        truncation="longest_first")

    # Check and send to CUDA (GPU) if available.
    if torch.cuda.is_available():
        self.model.cuda()
        for tensor in encoded_input:
            encoded_input[tensor] = encoded_input[tensor].cuda()

    # Run a forward pass. Assumes the model was loaded with
    # output_attentions=True and output_hidden_states=True.
    with torch.no_grad():  # remove this if you need gradients
        out: transformers.modeling_outputs.SequenceClassifierOutput = self.model(**encoded_input)
        unused_attentions = out.attentions

    # Post-process outputs: convert tensors to plain lists where possible.
    batched_outputs = {
        "probas": torch.nn.functional.softmax(out.logits, dim=-1).tolist(),
        "input_ids": encoded_input["input_ids"],  # NOTE: still a torch.Tensor here
        "ntok": torch.sum(encoded_input["attention_mask"], dim=1).tolist(),
        "cls_emb": out.hidden_states[-1][:, 0].tolist(),  # last layer, first ([CLS]) token
    }

    # Attention maps as NumPy arrays; .cpu() is needed if the model ran on GPU.
    for i in range(len(unused_attentions)):
        batched_outputs[f"layer_{i:d}_attention"] = unused_attentions[i].detach().cpu().numpy()

    # Plain shallow copy; any remaining tensors (input_ids) stay tensors.
    detached_outputs = {k: v for k, v in batched_outputs.items()}

    # Unbatch outputs so we get one record per input example.
    for output in utils.unbatch_preds(detached_outputs):
        ntok = output.pop("ntok")
        output["tokens"] = self.tokenizer.convert_ids_to_tokens(
            output.pop("input_ids")[1:ntok - 1])
        yield output

iftenney commented on August 25, 2024

Can you print the contents of batched_outputs, including types?

The error above:

TypeError: (-0.7481077572209469+0j) is not JSON serializable.

Looks like the value is a complex number a+bj, which is probably why it's not able to be serialized. NumPy arrays of floats should be fine, though; they'll be automatically converted to lists here: https://github.com/PAIR-code/lit/blob/main/lit_nlp/lib/serialize.py#L32
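
If one of your returned fields really is complex-valued, a minimal workaround sketch (assuming the offending field is, say, "cls_emb"; adapt the key to whatever your audit turns up):

import numpy as np

v = np.asarray(batched_outputs["cls_emb"])
if np.iscomplexobj(v):
    v = v.real  # keep only the real part so JSON serialization succeeds
batched_outputs["cls_emb"] = v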

pratikchhapolika commented on August 25, 2024

> Can you print the contents of batched_outputs, including types?
>
> The error above:
>
> TypeError: (-0.7481077572209469+0j) is not JSON serializable.
>
> Looks like the value is a complex number a+bj, which is probably why it's not able to be serialized. NumPy arrays of floats should be fine, though; they'll be automatically converted to lists here: https://github.com/PAIR-code/lit/blob/main/lit_nlp/lib/serialize.py#L32

@iftenney here are the outputs.

batched_outputs after the attention loop:

for i in range(len(unused_attentions)):
    batched_outputs[f"layer_{i:d}_attention"] = unused_attentions[i].detach().numpy()

{'probas': [[0.6018652319908142, 0.3981347680091858], [0.5785479545593262, 0.42145204544067383], [0.6183280348777771, 0.3816719651222229], [0.6127758026123047, 0.3872241675853729]],
 'input_ids': tensor([[  101,  2079,  2017,  2031,  1037, 19085,  2030,  1037, 12436, 20876,  1029,   102,     0,  ...,     0],
        [  101,  2026,  5980,  2097,  5091,  2022,  3407,  2085,   999,  ...,  3742,  1057,  1029,   102,     0,  ...,     0],
        [  101,  1045,  2031,  1037, 19085,  1012,  1045,  2215,  2000, 13988,  1012,   102,     0,  ...,     0],
        [  101,  8840,  2140,  8700,  3348, 12436, 20876,  3398,  ...,  2073,  2106,  1057,  2175,  1029,  7592,  1029,   102]]),
 'ntok': [12, 34, 12, 88],
 'cls_emb': [[-0.014076177030801773, -0.0728173702955246, -0.078043133020401, 0.0938369482755661, -0.17423537373542786, ...]],
 'layer_0_attention': array([[[[4.81520668e-02, 4.21391986e-02, 2.80070100e-02, ..., 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], ...]]], dtype=float32),
 'layer_1_attention': array([[[[3.56219172e-01, ...]]], dtype=float32),
 ...}



Per-entry print:

for k, v in batched_outputs.items():
    print("batched_output key and value")
    print(v)
    print(type(v))
    print("*******************************************")

batched_output key and value
[[0.6018652319908142, 0.3981347680091858], [0.5785479545593262, 0.42145204544067383], [0.6183280348777771, 0.3816719651222229], [0.6127758026123047, 0.3872241675853729]]
<class 'list'>
*******************************************
batched_output key and value
tensor([[  101,  2079,  2017,  2031,  1037, 19085,  2030,  1037, 12436, 20876,  1029,   102,     0,  ...,     0],
        [  101,  2026,  5980,  2097,  5091,  2022,  3407,  2085,   999,  ...,  3742,  1057,  1029,   102,     0,  ...,     0],
        [  101,  1045,  2031,  1037, 19085,  1012,  1045,  2215,  2000, 13988,  1012,   102,     0,  ...,     0],
        [  101,  8840,  2140,  8700,  3348, 12436, 20876,  3398,  ...,  2073,  2106,  1057,  2175,  1029,  7592,  1029,   102]])
<class 'torch.Tensor'>
*******************************************
batched_output key and value
[12, 34, 12, 88]
<class 'list'>

detached_outputs (identical to batched_outputs, since the dict comprehension is a plain copy):

detached_outputs = {k: v for k, v in batched_outputs.items()}
print("detached_outputs")

{'probas': [[0.6018652319908142, 0.3981347680091858], [0.5785479545593262, 0.42145204544067383], [0.6183280348777771, 0.3816719651222229], [0.6127758026123047, 0.3872241675853729]], 'input_ids': tensor([[  101,  2079,  2017,  ...,     0,     0,     0],
        [  101,  2026,  5980,  ...,     0,     0,     0],
        [  101,  1045,  2031,  ...,     0,     0,     0],
        [  101,  8840,  2140,  ...,  7592,  1029,   102]]), 'ntok': [12, 34, 12, 88], 'cls_emb': [[-0.014076177030801773, -0.0728173702955246, -0.078043133020401, 0.0938369482755661, ...]]}

iftenney commented on August 25, 2024

Thanks, all of those values look okay, although the indentation is very strange, so I could be missing something.
Can you post the error you're still seeing? You might try running under pdb and seeing which field it's coming from.
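
For instance, a quick type audit you could run (or step through under pdb) right before yielding; find_bad_fields is a hypothetical helper, not part of LIT:

import numpy as np
import torch

def find_bad_fields(batched_outputs):
    # Flag entries that a JSON serializer is likely to reject.
    for k, v in batched_outputs.items():
        if isinstance(v, torch.Tensor):
            print(f"{k}: torch.Tensor -> convert with .detach().cpu().numpy()")
        elif isinstance(v, np.ndarray) and np.iscomplexobj(v):
            print(f"{k}: complex ndarray -> keep only v.real")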

aryan1107 commented on August 25, 2024

@pratikchhapolika
To visualize Hugging Face models, you can start by adding any basic model directly to LIT. Here is one example I did using Hugging Face; the code might help: #691
