The performance reported in the Readme has not been computed on the same dataset used

Performance on the paper's dataset about hierarchical-attention-networks HOT 6 OPEN

ematvey commented on July 26, 2024

Performance on the paper's dataset

from hierarchical-attention-networks.

Comments (6)

superzhangxing commented on July 26, 2024

@gabrer Hi, have you re-implemented this on the dataset from the original paper?
In my implementation, I can't reproduce the performance reported in the paper.

MH23333 commented on July 26, 2024

@superzhangxing Hi, in my implementation I can only get 63.7% accuracy on the yelp2013 dev set using the same configuration as the paper. Accuracy on the test set is expected to be a bit lower, so it does not match the performance reported in the paper. Do you have any ideas? Could those tricks reduce the gap?

superzhangxing commented on July 26, 2024

@MH23333 Hi, do you use pre-trained word embeddings or randomly initialized ones? I train mine with word2vec on the train and dev sets, which I believe improves accuracy. I use the same configuration as the paper, for example SGD with momentum as the optimizer and a momentum parameter of 0.9, and I get around 67% accuracy on the dev and test sets. The only trick I can think of is aligning the sentence lengths within each batch to speed up training, which is also mentioned in the paper.
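
The length-alignment trick mentioned above could be sketched roughly as follows. This is only an illustration, not the code used by anyone in this thread: the function name `length_bucketed_batches`, the secondary sort key (longest sentence in the document), and the shuffling of whole batches are all assumptions.

```python
import random


def length_bucketed_batches(documents, batch_size, seed=0):
    """Group documents of similar length into the same batch.

    `documents` is a list of documents, each a list of sentences,
    each sentence a list of tokens. Returns a list of batches
    (lists of documents) in shuffled order.
    """
    def size_key(doc):
        # Primary key: number of sentences. Secondary key (an assumption):
        # the longest sentence, so neighbours need similar padding.
        return (len(doc), max((len(s) for s in doc), default=0))

    ordered = sorted(documents, key=size_key)
    batches = [ordered[i:i + batch_size]
               for i in range(0, len(ordered), batch_size)]
    # Shuffle whole batches so training does not always see short documents first.
    random.Random(seed).shuffle(batches)
    return batches
```

Because each batch only needs padding up to its own longest document, less computation is spent on padding than with randomly mixed batches.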

MH23333 commented on July 26, 2024

@superzhangxing Thanks for your quick reply! I use word embeddings the same way you do, and I set all the hyperparameters mentioned in the paper. Maybe other hyperparameters that the paper does not mention have an important influence on the results. Two that may matter are the sentence length (how many words in a sentence) and the document length (how many sentences in a document).
I will do more experiments. Many thanks!

superzhangxing commented on July 26, 2024

@MH23333 The max sentence length and max document length are both set to 40. Please note that I use a dynamic RNN, so I don't use a fixed sentence length or a fixed document length. I'm not sure whether that affects the results.
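
One way to read the setup above: the two caps of 40 only clip very long inputs, while the true lengths are kept and handed to the RNN. A minimal sketch under that assumption follows. The helper `truncate_and_pad` and the padding id `0` are hypothetical, and the thread does not say how the lengths are actually passed on (in TensorFlow 1.x, `tf.nn.dynamic_rnn` accepts them through its `sequence_length` argument).

```python
def truncate_and_pad(document, max_sentences=40, max_words=40, pad_token=0):
    """Clip a document to the caps and pad it into a rectangular grid.

    `document` is a list of sentences, each a list of token ids.
    Returns (padded, sentence_lengths, document_length).
    """
    # Apply the caps: at most max_sentences sentences of max_words words each.
    clipped = [sentence[:max_words] for sentence in document[:max_sentences]]
    document_length = len(clipped)
    sentence_lengths = [len(sentence) for sentence in clipped]
    # Pad only up to the longest remaining sentence, not up to the cap.
    width = max(sentence_lengths, default=0)
    padded = [sentence + [pad_token] * (width - len(sentence))
              for sentence in clipped]
    return padded, sentence_lengths, document_length
```

With this layout the padded positions exist only to make the grid rectangular; the recorded lengths tell the model where each real sentence and document ends.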

MH23333 commented on July 26, 2024

@superzhangxing Thanks a lot! I also use a dynamic RNN and masked attention. I will run more comparison tests and hope to get better results.
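
The exact form of masked attention used above is not stated. A common version, shown here as a plain-Python sketch with a hypothetical `masked_softmax` helper, normalizes the attention scores over the real positions only and gives padded positions a weight of zero.

```python
import math


def masked_softmax(scores, length):
    """Softmax over the first `length` entries; padded positions get weight 0."""
    if length <= 0:
        return [0.0] * len(scores)
    valid = scores[:length]
    peak = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in valid]
    total = sum(exps)
    return [e / total for e in exps] + [0.0] * (len(scores) - length)
```

Without the mask, padded words or sentences would receive a share of the attention weight and dilute the sentence and document vectors.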
