Comments (12)
I see the same error when I try to run the model, with exactly the same numbers. Is this related to the TF version?
from hierarchical-attention-networks.
I just downgraded my tensorflow version to 1.1.
I found that the implementation of bidirectional_dynamic_rnn changed in 1.2.
Ok, I'll fix it next week (I stopped using tensorflow around 1.0).
I still encounter the same error on tensorflow 1.4.1. Is there a quick way to fix this?
Didn’t have any time to look at it, sorry! I would be happy to merge your PR; I suspect it’s something trivial.
Has there been a successful fix for this? I am also running into the same issue on tensorflow 1.5.0.
ValueError: Trying to share variable tcm/word/fw/multi_rnn_cell/cell_0/bn_lstm/W_xh, but specified shape (100, 320) and found shape (200, 320).
I encountered this ValueError too.
Environment:
- tensorflow(1.6.0),
- Python 3.6.4.
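One possible reading of the shapes in that ValueError, pieced together from the error message and the hyperparameters used later in this thread (BNLSTMCell(80, ...) and word_output_size=100), so treat it as an assumption rather than a confirmed diagnosis:

```python
# A possible reading of the ValueError shapes (an assumption, not from the
# repo's docs): W_xh is the input-to-hidden weight matrix of the BN-LSTM cell.

hidden_size = 80                       # BNLSTMCell(80, ...) used in this thread
lstm_gates = 4                         # input, forget, cell, output gates
assert lstm_gates * hidden_size == 320  # second dim of W_xh in the error

word_level_input = 100                 # one required input width for W_xh
sentence_level_input = 200             # the other; 2 * 100 matches a
                                       # bidirectional concat of word outputs
assert 2 * word_level_input == sentence_level_input

# The same variable name is asked for with two different input widths, so
# TensorFlow refuses to share it, which is exactly the reported ValueError.
```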
Exactly the same error here.
Environment:
- tensorflow(1.6.0),
- Python 3.6.2.
Based on my understanding of this issue on Stack Overflow, I modified the code so that the 'cell's passed to HANClassifierModel as sentence_cell and word_cell become functions returning a cell, and then called them when instantiating the bidirectional_rnn at the sentence and word levels. Each of the 2 cells needed by the 2 bidirectional RNNs has to be instantiated as a distinct cell object.
So the changes are:

# Defined the cell entries as functions
def cell_maker():
    cell = BNLSTMCell(80, is_training)  # h-h batchnorm LSTMCell
    # cell = GRUCell(30)
    # note: [cell] * 5 reuses one object across all 5 layers; if that also
    # triggers a variable-sharing error, build a fresh cell per layer instead:
    # MultiRNNCell([BNLSTMCell(80, is_training) for _ in range(5)])
    return MultiRNNCell([cell] * 5)
model = HANClassifierModel(
    vocab_size=vocab_size,
    embedding_size=200,
    classes=classes,
    word_cell=cell_maker,      # pass the function as the cell entry (without calling it)
    sentence_cell=cell_maker,  # pass the function as the cell entry (without calling it)
    word_output_size=100,
    sentence_output_size=100,
    device=args.device,
    learning_rate=args.lr,
    max_grad_norm=args.max_grad_norm,
    dropout_keep_proba=0.5,
    is_training=is_training,
)
then
word_encoder_output, _ = bidirectional_rnn(
    self.word_cell(), self.word_cell(),  # call the function twice here
    word_level_inputs, word_level_lengths,
    scope=scope)
and
sentence_encoder_output, _ = bidirectional_rnn(
    self.sentence_cell(), self.sentence_cell(),  # call the function twice here
    sentence_inputs, self.sentence_lengths, scope=scope)
It runs for me now, but I can't confirm the performance, since I don't have a GPU to run a complete train/test cycle. Can anyone try it?
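The factory idea in that patch can be sketched without TensorFlow. In the sketch below, FakeCell is an illustrative stand-in class (not from the repo): passing a zero-argument function instead of a pre-built cell lets every call site construct its own instance, so no two encoders end up trying to share one set of weights.

```python
# Minimal sketch of the cell-factory fix, with a stand-in class instead of
# a TensorFlow RNN cell (FakeCell and its fields are illustrative only).

class FakeCell:
    """Stand-in for an RNN cell; a real cell creates its weight variables
    (e.g. W_xh) the first time it is applied, sized to its input."""
    instances = 0

    def __init__(self):
        FakeCell.instances += 1
        self.scope = "cell_%d" % FakeCell.instances  # each instance owns its variables

def cell_maker():
    # A fresh cell per call: forward/backward directions and word/sentence
    # levels must each get their own instance, never a shared one.
    return FakeCell()

# Buggy pattern: both names point at the same object, so both encoders
# would try to reuse (and re-shape) the same weights.
shared = FakeCell()
word_cell = sentence_cell = shared
assert word_cell is sentence_cell

# The fix: two factory calls yield two distinct objects with distinct scopes.
fw, bw = cell_maker(), cell_maker()
assert fw is not bw
assert fw.scope != bw.scope
```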
@Sora77 - How much time did it take you to complete 1 epoch of training on CPU?
@Others - How much time did it take you to train on CPU/GPU, and what hardware resources were you using?
For me it currently takes around 20 sec per iteration on a GeForce GTX 970 4GB, which feels pretty slow. I just wanted some idea of what kind of speed gain I can expect with better hardware.
I don't have the logs anymore, but I remember it was more or less the same speed you are seeing. Some implementations do not properly leverage GPU power.
I advise you to replace your tensorflow-gpu library with plain tensorflow and see for yourself. If the speed is the same, look for the bottleneck in the code.
Thanks @Sora77 for the quick response and your suggestions :)