vishwajit123 / keras_snli

This project forked from smerity/keras_snli

Simple Keras model that tackles the Stanford Natural Language Inference (SNLI) corpus using summation and/or recurrent neural networks

License: MIT License

Python 100.00%

Keras SNLI baseline example

This repository contains a simple Keras baseline to train a variety of neural networks to tackle the Stanford Natural Language Inference (SNLI) corpus.

The aim is to determine whether a premise sentence entails, is neutral towards, or contradicts a hypothesis sentence - e.g. "A soccer game with multiple males playing" entails "Some men are playing a sport" while "A black race car starts up in front of a crowd of people" contradicts "A man is driving down a lonely road".
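
As a small illustration of the three-way labelling, the two sentence pairs above can be turned into one-hot targets for a classifier. This is only a sketch: the label strings are the ones used in the SNLI corpus, but the class order and the `one_hot` helper are assumptions made here, not taken from the repository.

```python
# Hypothetical class order; the repository may index the classes differently.
LABELS = ["contradiction", "neutral", "entailment"]

def one_hot(label):
    """Return a 3-element one-hot list for an SNLI label string."""
    vec = [0] * len(LABELS)
    vec[LABELS.index(label)] = 1
    return vec

# (premise, hypothesis, label) triples taken from the examples above.
examples = [
    ("A soccer game with multiple males playing",
     "Some men are playing a sport", "entailment"),
    ("A black race car starts up in front of a crowd of people",
     "A man is driving down a lonely road", "contradiction"),
]

targets = [one_hot(label) for _, _, label in examples]
```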

The model architecture is:

  • Extract a 300D word vector from the fixed GloVe vocabulary
  • Pass the 300D word vector through a ReLU "translation" layer
  • Encode the premise and hypothesis sentences using the same encoder (summation, GRU, LSTM, ...)
  • Concatenate the two 300D resulting sentence embeddings
  • Three 600D ReLU layers
  • 3-way softmax
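
The steps above can be sketched as a plain NumPy forward pass for the summation encoder. This is a minimal sketch under stated assumptions, not the repository's Keras code: the weights are random stand-ins for trained parameters, the word vectors are random stand-ins for GloVe lookups, the helper names (`dense`, `encode`, `predict`) are hypothetical, and any dropout, normalisation, or regularisation the real model may use is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED, HIDDEN, CLASSES = 300, 600, 3

def relu(x):
    return np.maximum(x, 0.0)

def dense(n_in, n_out):
    """Random weights and a zero bias standing in for trained parameters."""
    return rng.normal(scale=0.05, size=(n_in, n_out)), np.zeros(n_out)

W_t, b_t = dense(EMBED, EMBED)                      # "translation" layer
layers = [dense(HIDDEN, HIDDEN) for _ in range(3)]  # three 600D ReLU layers
W_out, b_out = dense(HIDDEN, CLASSES)               # 3-way softmax

def encode(word_vectors):
    """Sum encoder: translate every word vector, then sum over the sentence."""
    return relu(word_vectors @ W_t + b_t).sum(axis=0)

def predict(premise_vectors, hypothesis_vectors):
    # The same encoder is applied to both sentences; the two 300D results
    # are concatenated into a single 600D vector.
    h = np.concatenate([encode(premise_vectors), encode(hypothesis_vectors)])
    for W, b in layers:
        h = relu(h @ W + b)
    logits = h @ W_out + b_out
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

premise = rng.normal(size=(7, EMBED))     # 7 stand-in word vectors
hypothesis = rng.normal(size=(5, EMBED))  # 5 stand-in word vectors
probs = predict(premise, hypothesis)      # 3 class probabilities
```

Swapping `encode` for a GRU or LSTM that returns its final 300D state would give the recurrent variants while leaving the rest of the pipeline unchanged.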

[Figure: visual description of the model]

Training uses RMSProp and stops once N epochs have passed with no improvement in the validation loss. Following Liu et al. 2016, the GloVe embeddings are not updated during training. Following Munkhdalai & Yu 2016, the out-of-vocabulary embeddings remain zeroed out.
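
A fixed embedding matrix with zeroed out-of-vocabulary rows might be built roughly as follows. This is a sketch under assumptions: the `glove` dictionary and `word_index` mapping are hypothetical toy data (a real run would parse the GloVe file and use 300 dimensions), and marking the array read-only merely mimics the idea of embeddings that training never updates.

```python
import numpy as np

EMBED = 4  # 300 in the README; kept tiny here for readability

# Hypothetical stand-in for vectors parsed from a GloVe text file.
glove = {
    "men": np.array([0.1, 0.2, 0.3, 0.4]),
    "sport": np.array([0.5, 0.6, 0.7, 0.8]),
}
# Hypothetical word -> row mapping; row 0 is reserved for padding.
word_index = {"men": 1, "sport": 2, "zorbing": 3}

embedding_matrix = np.zeros((len(word_index) + 1, EMBED))
for word, row in word_index.items():
    vector = glove.get(word)
    if vector is not None:
        embedding_matrix[row] = vector
    # Words missing from GloVe ("zorbing" here) keep their all-zero row.

# Mark the matrix read-only to mirror embeddings that are never updated.
embedding_matrix.setflags(write=False)

def lookup(tokens):
    """Map tokens to their fixed vectors (zeros for out-of-vocabulary words)."""
    return embedding_matrix[[word_index[t] for t in tokens]]
```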

One of the most important aspects when using fixed GloVe embeddings with summation is the "translation" layer. Bowman et al. 2016 use such a layer when moving from 300D to the lower dimensional 100D hidden state. It is likely highly important for the summation method as it allows the GloVe space to be shifted before summation. Technically, once training is done, the "translated" GloVe embeddings could be precomputed and this layer removed, decreasing the number of parameters, but ¯\_(ツ)_/¯
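
The precomputation idea can be checked with a short NumPy sketch. It assumes the translation layer is a per-word dense layer followed by ReLU, uses random stand-ins for the GloVe matrix and the trained weights, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMBED = 50, 300

glove_matrix = rng.normal(size=(VOCAB, EMBED))   # stand-in for fixed GloVe rows
W = rng.normal(scale=0.05, size=(EMBED, EMBED))  # stand-in for trained translation weights
b = rng.normal(scale=0.05, size=EMBED)           # stand-in for the trained bias

def relu(x):
    return np.maximum(x, 0.0)

sentence = np.array([3, 17, 17, 42])  # word indices of one sentence

# During training: look up each word, translate it, then sum.
summed_online = relu(glove_matrix[sentence] @ W + b).sum(axis=0)

# After training: translate the whole vocabulary once and drop the layer.
translated_matrix = relu(glove_matrix @ W + b)
summed_precomputed = translated_matrix[sentence].sum(axis=0)
```

Both paths give the same sentence vector because the layer acts on each word independently. One detail worth noting under these assumptions: an all-zero out-of-vocabulary row maps to `relu(b)` rather than to zero, in either path.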

The model is relatively simple yet scores far higher than the other comparable baselines (specifically the summation, GRU, and LSTM models) listed on the SNLI page. The summary: don't dismiss well-tuned GloVe bag-of-words models - they can still be competitive and are far faster to train!

| Model | Parameters | Train | Validation | Test |
| --- | --- | --- | --- | --- |
| 300D sum(word vectors) + 3 x 600D ReLU (this code) | 1.2m | 0.831 | 0.823 | 0.825 |
| 300D GRU + 3 x 600D ReLU (this code) | 1.7m | 0.843 | 0.830 | 0.823 |
| 300D LSTM + 3 x 600D ReLU (this code) | 1.9m | 0.855 | 0.829 | 0.823 |
| -- | --- | --- | --- | --- |
| 300D LSTM encoders (Bowman et al. 2016) | 3.0m | 0.839 | - | 0.806 |
| 1024D GRU w/ unsupervised 'skip-thoughts' pre-training (Vendrov et al. 2015) | 15m | 0.988 | - | 0.814 |
| 300D Tree-based CNN encoders (Mou et al. 2015) | 3.5m | 0.833 | - | 0.821 |
| 300D SPINN-PI encoders (Bowman et al. 2016) | 3.7m | 0.892 | - | 0.832 |
| 600D (300+300) BiLSTM encoders (Liu et al. 2016) | 3.5m | 0.833 | - | 0.834 |
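
The parameter counts reported for this code can be roughly reproduced from the architecture alone. The arithmetic below is a back-of-the-envelope sketch under assumptions: every dense layer has a bias, the fixed GloVe embeddings are not counted, the GRU and LSTM use the classic three-gate and four-gate formulations with one bias vector per gate, and any other trainable parameters (for example normalisation layers) are ignored.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def recurrent_params(n_in, units, gates):
    """Input weights, recurrent weights, and one bias vector per gate."""
    return gates * (n_in * units + units * units + units)

translation = dense_params(300, 300)  # 300D -> 300D "translation" layer
hidden = 3 * dense_params(600, 600)   # three 600D ReLU layers
softmax = dense_params(600, 3)        # 3-way softmax

sum_total = translation + hidden + softmax
gru_total = sum_total + recurrent_params(300, 300, gates=3)
lstm_total = sum_total + recurrent_params(300, 300, gates=4)

print(sum_total, gru_total, lstm_total)
```

Under these assumptions the totals come to 1,173,903, 1,714,803 and 1,895,103, which round to the 1.2m, 1.7m and 1.9m listed for the summation, GRU and LSTM models.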

Only the numbers for pure sentential embedding models are shown here. The SNLI homepage shows the full list of models, where attentional models perform better. If I've missed any comparable models, please submit a pull request.

All models could benefit from a more thorough evaluation and/or grid search as the existing parameters are guesstimates inspired by various papers (Bowman et al. 2015, Bowman et al. 2016, Liu et al. 2016). That the summation of word embeddings (jokingly referred to as SumRNN) performs so well compared to GRUs or LSTMs is a surprise and warrants additional investigation. Further work should be done exploring the hyperparameters of the GRU and LSTM such that they beat the SumRNN baseline.
