ArrasL / LRP_for_LSTM
Layer-wise Relevance Propagation (LRP) for LSTMs.
License: Other
I've trained a bidirectional LSTM model on clickstream data sequences, and I want to explain its predictions using LRP. Would it be possible to adapt this implementation so it can be used on such sequential data?
A sequence, for example, looks like: [5,4,3,4,4,3,3,4,5]
My model consists of an encoder LSTM, an attention layer, and a linear decoder layer for the task of binary classification. So far I have propagated LRP back to the hidden-state inputs of the attention layer, and I am not sure how to propagate each hidden state's relevance to the input layer through the encoder LSTM.
If I understand correctly, this repo assumes that the model is a simple encoder LSTM plus a linear decoder which takes the final hidden state as input to produce the output class.
How can I propagate these individual hidden-state scores through the LSTM using this approach? If I only propagate the last hidden state's scores through the LSTM using this code, it (1) doesn't take the other hidden states' scores into account and (2) assumes that the attention layer only takes the last hidden state as input.
I understand that this may be an open question, any help/advice on how to proceed will be greatly appreciated.
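For what it's worth, one common heuristic (not part of this repo; function and variable names below are my own) treats the attention-weighted sum c = Σ_t α_t·h_t as a linear combination and redistributes the context vector's relevance onto each hidden state with the epsilon-stabilized LRP rule. A minimal sketch:

```python
import numpy as np

def lrp_attention(alpha, h, R_c, eps=1e-3):
    """Redistribute the relevance R_c of the context vector
    c = sum_t alpha[t] * h[t] back onto the hidden states, using the
    epsilon-stabilized LRP rule for a weighted sum.
    alpha: (T,) attention weights; h: (T, d) hidden states;
    R_c: (d,) relevance of the context vector.
    Returns R_h: (T, d) relevance per hidden state and dimension."""
    z = alpha[:, None] * h                        # contributions z_t = alpha_t * h_t
    denom = z.sum(axis=0)                         # equals the context vector c
    stab = eps * np.where(denom >= 0, 1.0, -1.0)  # epsilon stabilizer
    return z * (R_c / (denom + stab))[None, :]    # proportional redistribution
```

Each row R_h[t] could then serve as the hidden-state relevance that is fed into an LSTM backward LRP pass for timestep t; the epsilon term slightly breaks exact relevance conservation in exchange for numerical stability.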
Hello @ArrasL,
Thank you for open sourcing your implementation of LRP.
Your work here and the comparative analysis in the paper are really interesting.
I tried to add LIME and LIMSSE to the list of methods and to conduct experiments similar to those suggested in the latter paper. I rechecked the scores of my implementation of the gradient-based methods against yours in run_example.ipynb.
I used the relevance scores as weights for creating sentence representations, in two settings:
a. as raw scores
b. as normalized values (softmax)
I have attached the plot for your reference. I wanted to check with you how you had used them, because my plot for Gradient × Input is not close to the one we see in the paper.
Any guidelines or directions would be extremely helpful, as I wish to reproduce the results.
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 from code.LSTM.LSTM_bidi import *
      2 from code.util.heatmap import html_heatmap
      3
      4 import codecs
      5 import numpy as np

ModuleNotFoundError: No module named 'code.LSTM'; 'code' is not a package
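Not from the repo's docs, but this error usually means Python resolved `code` to the standard-library `code` module (a plain module, hence "'code' is not a package") instead of the repository's `code/` directory, typically because the notebook was not started from the repo root. A hedged workaround (the path below is an assumption; adjust it to your clone) is to put the repo root first on sys.path before importing:

```python
import os
import sys

# Assumption: adjust this path to wherever you cloned the repository.
repo_root = os.path.abspath("LRP_for_LSTM")

# Putting the repo root first on sys.path lets its code/ directory shadow
# the standard-library 'code' module when the import is resolved.
sys.path.insert(0, repo_root)

# If the stdlib 'code' module was already imported in this session, drop the
# stale entry so the next 'from code.LSTM...' import re-resolves it.
sys.modules.pop("code", None)
```

Alternatively, simply launching Jupyter from the repository's root directory avoids the shadowing in the first place.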
Can this be adapted to 2 classes instead of 5 sentiment classes? Would it require changing the LSTM_bidi file?
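Not an authoritative answer, but in LRP the class count mainly affects the decoder weight shapes and how relevance is initialized at the output layer: only the target class's score is kept. A minimal sketch of that initialization for the binary case (function name is mine, not the repo's):

```python
import numpy as np

def init_output_relevance(scores, target_class):
    """Keep only the target class score as initial relevance; this is the
    usual LRP output initialization and is independent of the class count."""
    R = np.zeros_like(scores)
    R[target_class] = scores[target_class]
    return R

# Binary case: the score vector simply has length 2 instead of 5.
R_out = init_output_relevance(np.array([1.7, -0.4]), target_class=0)
```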
Hello! May I assume the trained LSTM in the example consists of one LSTM layer? Can it be adapted to a stacked-LSTM scenario? Thanks.
Is it possible to train the BiLSTM model, and does it require changing the model file with new weights?
The current class only supports loading pretrained models, and I am having a little trouble changing the __init__ and implementing a train() function to train this model from scratch.
Thanks,
Sharan
Is it possible to adapt this code to deal with a regression problem? Let's say that each word of a sentence is a timestep of a time series, and the number of words represents the history fed to the LSTM.
Will it work on a Windows computer?
Is Torch or TF needed?
I've been reviewing the bidirectional LSTM used in this example because I'd like to apply the LRP technique to my own model. However, I noticed that the weights and biases in a Keras LSTM are organized differently from those here, and I would like to verify that my observations are correct.
In this example, there are weights on the LSTM outputs from the left and right LSTMs. Is that the implementation of the dense layer?
In this example, there are separate biases for the h_Left/h_Right arrays and the x_Left/x_Right arrays. Keras provides only a single bias array for the two, and I was wondering whether different LSTM architectures are being followed. I also noticed that alewarne (at https://github.com/alewarne/Layerwise-Relevance-Propagation-for-LSTMs/blob/master/lstm_network.py) implemented this code with a single bias array.
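For what it's worth, since the gate pre-activation is W_x·x + W_h·h + b_x + b_h, a single Keras bias b is mathematically equivalent to any split satisfying b_x + b_h = b, so the two conventions describe the same architecture. A sketch of mapping Keras LSTM arrays into a two-bias convention (the output names are my assumptions, not this repo's):

```python
import numpy as np

def split_keras_lstm_weights(kernel, recurrent_kernel, bias):
    """Map Keras LSTM arrays into a two-bias convention.
    kernel: (input_dim, 4*units); recurrent_kernel: (units, 4*units);
    bias: (4*units,). Any split with b_x + b_h == bias is equivalent,
    since the gate pre-activation is W_x x + W_h h + b_x + b_h."""
    W_x = kernel.T                  # (4*units, input_dim)
    W_h = recurrent_kernel.T        # (4*units, units)
    b_x = bias.copy()               # assign the whole bias to the input path...
    b_h = np.zeros_like(bias)       # ...and zeros to the recurrent path
    return W_x, W_h, b_x, b_h
```

One caveat: Keras concatenates the four gates in the order i, f, c, o along the 4*units axis, so if the target implementation expects a different gate ordering, that axis must be permuted as well.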
Thanks for providing this reference code - I appreciate it.
My model consists of a multi-layer LSTM, an attention layer, and a linear layer for binary classification. I need the stacked LSTMs and the attention layer to achieve decent performance on a relatively complex task. I'm working through your detailed explanation in issue #8 (thank you!) for the attention layer.
How would I propagate back the scores through the stacked LSTMs? Any guidance and advice would be much appreciated.
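Not this repo's API, but conceptually a stack just chains the single-layer backward pass: run LRP through the top LSTM, take the relevance it assigns to its inputs (which are the lower layer's hidden states), and feed that into the next pass down. A hypothetical sketch, where each layer's lrp() stands in for a repo-style single-layer backward pass (my naming, not the repo's):

```python
def lrp_through_stack(layers, R_hidden_top):
    """Chain per-layer LRP backward passes: the relevance a layer assigns to
    its inputs is exactly the hidden-state relevance target for the layer
    below, because layer l's input sequence is layer l-1's hidden states.
    layers: bottom-to-top list of objects exposing lrp(R_hidden) -> R_input.
    R_hidden_top: per-timestep relevance of the top layer's hidden states."""
    R = R_hidden_top
    for layer in reversed(layers):
        R = layer.lrp(R)   # relevance of this layer's input sequence
    return R               # relevance of the original input sequence
```

The per-timestep relevance coming out of the attention pass would be the R_hidden_top here, and the final return value is the relevance of the original input sequence.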