Attempted implementation of a minimal R-NET (with a different answer layer). Trained and tested only on the induction tasks of bAbi 10k dataset.

License: MIT License


QA with R-NET

The model is mostly based on R-NET:

"R-NET: MACHINE READING COMPREHENSION WITH SELF-MATCHING NETWORKS" - Natural Language Computing Group, Microsoft Research Asia

As of 6th December 2017, a version of R-NET (with ensembling) holds the top position on the SQuAD leaderboard.

For this model I used pre-trained GloVe embeddings of 100 dimensions from:
https://nlp.stanford.edu/projects/glove/
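The GloVe files linked above are plain text, one word per line followed by its vector components. A minimal loader might look like the sketch below (the function name and vocabulary handling are illustrative, not the repository's actual code); words absent from the GloVe file keep a small random initialization.

```python
import numpy as np

def load_glove(path, vocab, dim=100):
    """Build a (len(vocab), dim) embedding matrix from a GloVe text file.

    Rows are indexed by each word's position in `vocab`; words missing
    from the file keep a small random initialization.
    """
    emb = np.random.uniform(-0.1, 0.1, (len(vocab), dim)).astype(np.float32)
    index = {w: i for i, w in enumerate(vocab)}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], parts[1:]
            if word in index and len(vec) == dim:
                emb[index[word]] = np.asarray(vec, dtype=np.float32)
    return emb
```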

I skipped character embeddings. For the sentence embedding of the facts, I used the positional encoding from "End-To-End Memory Networks" (Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus). That is, the network works with sentence-level representations of the facts instead of word-level representations.
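The positional encoding from that paper weights each word embedding by its position before summing, so word order within a sentence is not lost entirely. For a sentence of J words and embedding dimension d, the weight for word j and dimension k is l[j,k] = (1 - j/J) - (k/d)(1 - 2j/J) (1-based indices). A sketch:

```python
import numpy as np

def position_encoding(sentence_len, embed_dim):
    """Position-encoding matrix l from "End-To-End Memory Networks"
    (Sukhbaatar et al.): l[j, k] = (1 - j/J) - (k/d) * (1 - 2j/J),
    with 1-based word position j and embedding dimension k."""
    J, d = sentence_len, embed_dim
    j = np.arange(1, J + 1)[:, None]   # word positions, shape (J, 1)
    k = np.arange(1, d + 1)[None, :]   # embedding dims, shape (1, d)
    return (1 - j / J) - (k / d) * (1 - 2 * j / J)

def encode_fact(word_embeddings):
    """Collapse a (J, d) matrix of word embeddings into one
    d-dimensional sentence vector via the position-weighted sum."""
    J, d = word_embeddings.shape
    return (position_encoding(J, d) * word_embeddings).sum(axis=0)
```

Each fact sentence is thus reduced to a single vector before the encoder sees it, which is what lets the rest of the network operate over facts rather than words.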

Only 1-layered Bi-GRUs were used for encoding questions, passages, and elsewhere in the network. The original implementation used 3-layered Bi-GRUs for certain tasks.
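A 1-layer Bi-GRU simply runs one GRU forward and one backward over the sequence and concatenates the two hidden states at each step. The numpy sketch below (weight layout is my own choice, not the repository's) shows the idea:

```python
import numpy as np

def gru_step(x, h, W, U):
    """One step of a minimal GRU cell. W stacks the update, reset, and
    candidate input weights as (3, d_h, d_in); U the recurrent weights
    as (3, d_h, d_h)."""
    sig = lambda a: 1 / (1 + np.exp(-a))
    z = sig(W[0] @ x + U[0] @ h)                 # update gate
    r = sig(W[1] @ x + U[1] @ h)                 # reset gate
    h_tilde = np.tanh(W[2] @ x + U[2] @ (r * h)) # candidate state
    return (1 - z) * h + z * h_tilde

def bi_gru(X, Wf, Uf, Wb, Ub):
    """Encode a sequence X of shape (T, d_in) with a 1-layer Bi-GRU:
    run one GRU forward, one backward, and concatenate the states,
    giving a (T, 2 * d_h) output."""
    d_h = Uf.shape[1]
    hf, hb = np.zeros(d_h), np.zeros(d_h)
    fwd, bwd = [], []
    for x in X:
        hf = gru_step(x, hf, Wf, Uf)
        fwd.append(hf)
    for x in X[::-1]:
        hb = gru_step(x, hb, Wb, Ub)
        bwd.append(hb)
    return np.concatenate([np.stack(fwd), np.stack(bwd[::-1])], axis=1)
```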

Instead of a pointer network (which the original uses to predict an answer span in the passage for SQuAD), I ran a single GRU over the question-aware, self-attended document representation (the output of the self-matching attention layer) and linearly transformed its final hidden state into a probability distribution over single-word answers.

I used the transpose of the embedding matrix to linearly transform the final hidden state of the output layer into the probability distribution. Tying the output projection to the embedding matrix in this way seems to usually speed up training, sometimes quite substantially, without any apparent detrimental effect.
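Putting the last two points together, the answer layer can be sketched as below (a rough numpy illustration of the description above, with hypothetical weight names; it assumes the GRU hidden size equals the embedding dimension so the tied projection works):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def answer_distribution(H, Wz, Uz, Wr, Ur, Wh, Uh, E):
    """Single-GRU answer layer (a sketch of the approach described
    above, not the original pointer network). H is the (T, d)
    self-matched document representation; E is the (V, d) embedding
    matrix whose transpose maps the final state to vocab logits,
    i.e. tied output weights. Requires hidden size == d."""
    sig = lambda a: 1 / (1 + np.exp(-a))
    h = np.zeros(Uz.shape[0])
    for x in H:                       # run the GRU over time steps
        z = sig(Wz @ x + Uz @ h)      # update gate
        r = sig(Wr @ x + Ur @ h)      # reset gate
        c = np.tanh(Wh @ x + Uh @ (r * h))
        h = (1 - z) * h + z * c
    return softmax(E @ h)             # logits = E h, i.e. h^T E^T
```

The training loop would then take a cross-entropy loss between this distribution and the one-hot index of the single-word answer.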

I trained and tested the model on the induction tasks of the bAbi 10k dataset. The accuracy isn't too different from that of my DMN+ implementation on this particular task with these specific settings.

Some other hyperparameters are different.

This is a rough and quick implementation that I put together in an hour or two, so there may be issues I have overlooked.

