
Comments (6)

superzhangxing commented on July 26, 2024

@gabrer Hi, have you re-implemented this on the datasets from the original paper?
In my implementation, I can't reproduce the performance reported in the paper.

from hierarchical-attention-networks.

MH23333 commented on July 26, 2024

@superzhangxing Hi, in my implementation I only get 63.7% accuracy on the yelp2013 dev set under the same configuration as the paper. I expect accuracy on the test set to be a bit lower still, which falls short of the result reported in the paper. Do you have any ideas? Can those tricks reduce the gap?

superzhangxing commented on July 26, 2024

@MH23333 Hi, do you use pre-trained word embeddings or randomly initialized ones? I train mine with word2vec on the train and dev sets, and I believe this improves accuracy. I use the same configuration as the paper, such as SGD with momentum as the optimizer and a momentum of 0.9, and I get around 67% accuracy on the dev and test sets. The only trick I can think of is aligning the sentence lengths within each batch to speed up training, and that is also mentioned in the paper.
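
For readers following along, here is a minimal sketch of what aligning lengths within a batch could look like. It is not the commenter's code: the function name `length_bucketed_batches`, the sort key, and the toy documents are all assumptions made for illustration. The idea is to sort documents by size so that each batch holds documents of similar length and needs little padding.

```python
import random

def length_bucketed_batches(docs, batch_size, seed=0):
    """Return batches of document indices grouped by similar length.

    `docs` is a list of documents; each document is a list of sentences,
    and each sentence is a list of tokens. (Hypothetical layout.)
    """
    def size_key(i):
        doc = docs[i]
        # Primary key: number of sentences; secondary: longest sentence.
        return (len(doc), max((len(s) for s in doc), default=0))

    # Neighbours in the sorted order have similar sizes.
    order = sorted(range(len(docs)), key=size_key)
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    # Shuffle whole batches so the training order still varies.
    random.Random(seed).shuffle(batches)
    return batches

docs = [
    [["good", "food"]],
    [["bad"], ["slow", "service", "today"], ["never", "again"]],
    [["ok"]],
    [["great", "place"], ["will", "return"], ["five", "stars"]],
]
batches = length_bucketed_batches(docs, batch_size=2)
```

With these four toy documents, the two one-sentence reviews end up in one batch and the two three-sentence reviews in the other.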

MH23333 commented on July 26, 2024

@superzhangxing Thanks for your quick reply! I train the word embeddings the same way as you and set all the hyperparameters mentioned in the paper. Maybe other hyperparameters that the paper doesn't mention have an important influence on the results. Two that may matter are sentence length (how many words in a sentence) and document length (how many sentences in a document).
I will do more experiments. Many thanks!

superzhangxing commented on July 26, 2024

@MH23333 The max sentence length and max document length are both set to 40. Please note that I use a dynamic RNN, so I don't use a fixed sentence length or a fixed document length. I'm not sure whether that affects the results.
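
A rough sketch of how a cap of 40 can coexist with variable lengths, under assumptions: the helper names `truncate_and_measure` and `pad_batch` and the `<pad>` token are hypothetical, and this is not the commenter's code. Documents are clipped to the caps, their true lengths are recorded, and padding only goes up to the longest item in the current batch. If "dynamic RNN" here means TensorFlow 1.x's `tf.nn.dynamic_rnn`, the recorded lengths are the kind of values its `sequence_length` argument takes.

```python
def truncate_and_measure(doc, max_sents=40, max_words=40):
    """Clip a document to the caps and return it with its true lengths."""
    clipped = [sent[:max_words] for sent in doc[:max_sents]]
    doc_len = len(clipped)                   # number of sentences kept
    sent_lens = [len(s) for s in clipped]    # number of words per sentence
    return clipped, doc_len, sent_lens

def pad_batch(docs, pad_token="<pad>"):
    """Pad to the longest document and sentence in this batch only.

    Assumes `docs` is non-empty and contains at least one sentence.
    """
    max_sents = max(len(d) for d in docs)
    max_words = max(len(s) for d in docs for s in d)
    padded = []
    for d in docs:
        sents = [s + [pad_token] * (max_words - len(s)) for s in d]
        # Add all-padding sentences until every document has max_sents.
        sents += [[pad_token] * max_words for _ in range(max_sents - len(d))]
        padded.append(sents)
    return padded

long_doc = [["w"] * 50 for _ in range(45)]
clipped, doc_len, sent_lens = truncate_and_measure(long_doc)
padded = pad_batch([[["a", "b"]], [["c"], ["d", "e", "f"]]])
```

Here `padded` has shape 2 x 2 x 3, not 2 x 40 x 40, because padding follows the batch and not the global cap.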

MH23333 commented on July 26, 2024

@superzhangxing Thanks a lot! I also use a dynamic RNN and masked attention. I will run more comparison tests. Hope I can get better results.
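
For context, masked attention usually means that padded positions are excluded from the softmax so they receive zero weight. The sketch below shows that idea in plain Python. The function name `masked_softmax` and the example scores are assumptions, and this is not either commenter's implementation.

```python
import math

def masked_softmax(scores, length):
    """Softmax over the first `length` scores; padded positions get weight 0."""
    if length <= 0:
        return [0.0] * len(scores)
    valid = scores[:length]
    peak = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in valid]
    total = sum(exps)
    # Positions beyond `length` are padding and contribute nothing.
    return [e / total for e in exps] + [0.0] * (len(scores) - length)

# The last score belongs to a padded position and is ignored,
# even though it is the largest raw value.
weights = masked_softmax([2.0, 1.0, 0.5, 9.0], length=3)
```

Without the mask, the padded position with score 9.0 would dominate the attention weights, which is one way padding can quietly hurt accuracy.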
