hmm-rnn's People

Contributors

janmbuys, ybisk

hmm-rnn's Issues

Best Train/Val Numbers

| Model | Train | Val | MostCommon |
| --- | --- | --- | --- |
| lstm.sgd.drop0.dim650.lr10.trshdecay4.drop06 | 51.860 | 80.610 | 0.3757 |
| elman.ramsprop.drop0.4.dim850.lr0.002.trshdecay10.wdecay1e5 | 50.240 | 87.270 | 0.3858 |
| rrnn-r.ramsprop.drop0.6.dim800.lr0.002.trshdecay10.wdecay1e5 | 56.370 | 88.910 | 0.3525 |
| rnn-3.ramsprop.drop0.2.dim900.lr0.002.trshdecay10.wdecay1e5 | 77.130 | 107.450 | |
| rnn-2.ramsprop.drop0.5.dim850.lr0.002.trshdecay10.wdecay1e5 | 113.950 | 162.410 | |
| rnn-1.ramsprop.drop02.dim850.lr0.002.trshdecay10.wdecay1e7 | 201.350 | 207.950 | 0.2898 |
| hmm-g none h900 e900 lr0.001 drop0.0 ramsprop wd0.0 pat5 tieE | 195.910 | 243.510 | 0.3977 (max) |
| hmm-new none ramsprop.drop0.dim900.lr0.002.trshdecay10 | 233.220 | 284.590 | |
| hmm+1 none h900 e900 lr0.001 drop0.0 ramsprop wd0.0 pat5 tieE | 208.090 | 287.000 | |
| hmm word h200 e200 lr20.0 drop0.0 sgd wd0.0 pat5 tieE | 210.630 | 288.150 | 0.4354 (marg) |
| hmm-new-c word ramsprop.drop0.dim900.lr0.002.trshdecay10 | 245.420 | 288.620 | |
| hmm-new-rnn-emit none ramsprop.drop0.dim900.lr0.002.trshdecay10 | 202.570 | 299.580 | |
| hmm none h900 e900 lr0.001 drop0.0 ramsprop wd0.0 pat5 tieE | 246.080 | 304.090 | 0.5002 (marg) |
| hmm-new-elman-hmm-emit ramsprop.drop0.dim900.lr0.002.trshdecay10 | 325.140 | 343.040 | |
| hmm+1 word h200 e200 lr10.0 drop0.0 sgd wd0.0 pat5 tieE | 327.890 | 351.530 | |

Tagging

Accuracy -- does the most common tag of the predicted word match the gold tag?
Perplexity -- convert p(w) to p(t) and score it against the gold tags.

HMM numbers are 1-best cluster, not marginal

| Model | LM Perplexity | UPOS | PTB |
| --- | --- | --- | --- |
| hmm_none_h900_lr0.001_drop0.0_ramsprop_wd0.0 | 304.09 | 68.23 | 52.36 |
| hmm_word_h200_lr20.0_drop0.0_sgd_wd0.0 | 288.15 | 61.66 | 45.16 |
| hmm-g_none_h900_lr0.001_drop0.0_ramsprop_wd0.0 | 243.51 | 59.64 | 44.62 |
| rnn-1_word_h850_lr0.002_drop0.2_ramsprop | 207.95 | 48.54 | 36.68 |
| rrnn-r_word_h800_lr0.002_drop0.6_ramsprop | 88.91 | 52.63 | 43.06 |
| elman_word_h850_lr0.002_drop0.4_ramsprop | 87.27 | 54.59 | 44.97 |
| lstm_word_h650_lr10.0_drop0.6_sgd | 80.61 | 55.08 | 45.75 |
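
The accuracy definition in the Tagging notes can be read as: take the word the model predicts, look up the tag that word carries most often, and compare it with the gold tag at that position. A minimal sketch of that reading follows. The function names (`most_common_tag_map`, `most_common_tag_accuracy`) and the toy data are hypothetical and not taken from the repository; how the project actually builds the word-to-tag table is not stated here.

```python
from collections import Counter, defaultdict


def most_common_tag_map(tagged_corpus):
    """Map each word to the tag it carries most often in (word, tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def most_common_tag_accuracy(predicted_words, gold_tags, word_to_tag):
    """Fraction of positions where the predicted word's usual tag equals the gold tag."""
    if len(predicted_words) != len(gold_tags):
        raise ValueError("predictions and gold tags must align")
    if not gold_tags:
        return 0.0
    hits = sum(
        word_to_tag.get(word) == gold
        for word, gold in zip(predicted_words, gold_tags)
    )
    return hits / len(gold_tags)


# Toy data for illustration only (assumption, not from the experiments).
corpus = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
          ("runs", "NOUN"), ("runs", "VERB"), ("a", "DET")]
word_to_tag = most_common_tag_map(corpus)
predicted = ["a", "dog", "runs", "the"]
gold = ["DET", "NOUN", "VERB", "NOUN"]
accuracy = most_common_tag_accuracy(predicted, gold, word_to_tag)
print(accuracy)
```

For the HMM rows, the same scoring would presumably be applied to the 1-best cluster rather than to a predicted word, per the note above the table.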

LM experiments

  • Perform sanity checks that models behave roughly as before with the updated implementation.
  • Add dropout and do a minimal amount of hyperparameter tuning (although good LM performance will require better optimization techniques).
  • Run experiments to compare models on PTB setup (once available).

Full Work List

Models

  1. LSTM
  2. RAN -- Implemented simplified version (RRNN)
  3. Elman (sigmoid)
  4. Elman (softmax)
  5. Elman (early softmax) -- decomposed hidden+input
  6. HMM (delayed emission) -- Implemented, but because the tensor expands use a lot of GPU memory, only a small hidden state size (up to 150) can be used.
  7. HMM (delayed transition)
  8. HMM (still w/ word cond)
  9. HMM (vanilla)

TODO
Max dim always 1024

  1. LSTM -- @janmbuys tuning
    -- No Dropout
    -- SGD (two strategies)
    -- Dims
    -- LRs
  2. Implement the shit above
  3. LogSpace HMM -- made a tweak; it now seems to reach perplexities very close to the probability-space version.
  4. GridSearch -- Try and overfit
  • Elman (3) -- @ybisk attempting to tune
  • Elman (4)
  • Elman (5)
  • HMM (6)
  • HMM (7)
  • HMM (8)
  • HMM (9)
  5. Total parameter calculation -- implemented
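
The LogSpace HMM item compares a log-space implementation against a probability-space one. As a rough illustration of why the two should agree, here is a small sketch of the standard forward algorithm written both ways, using only the standard library. The function names and the toy parameters are assumptions made for this example; this is not the repository's implementation, which presumably runs on GPU tensors.

```python
import math


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_prob(init, trans, emit, obs):
    """Forward algorithm in probability space; returns p(obs)."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)


def forward_log(init, trans, emit, obs):
    """Same recursion on log-probabilities; returns log p(obs)."""
    n = len(init)
    alpha = [safe_log(init[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[r] + safe_log(trans[r][s]) for r in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return logsumexp(alpha)


# Toy 2-state, 3-symbol HMM (illustrative values only).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]

p = forward_prob(init, trans, emit, obs)
log_p = forward_log(init, trans, emit, obs)
print(p, math.exp(log_p))
```

On short sequences the two results match up to floating-point error. The log-space version mainly matters for long sequences, where the probability-space products underflow.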

PTB LM setup

  • Load PTB data.
  • Compute perplexity on training and validation data.
  • Train with truncated backpropagation through time.
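
The last two steps above can be sketched without committing to a framework. The helpers below are hypothetical (`perplexity`, `bptt_windows` are names chosen for this example): the first computes perplexity from per-token natural-log probabilities, and the second splits a token stream into fixed-length input/target windows of the kind used for truncated backpropagation through time. In an actual training loop the hidden state would typically be carried across windows while gradients stop at window boundaries; that part is omitted here.

```python
import math


def perplexity(token_log_probs):
    """exp of the average negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def bptt_windows(token_ids, bptt_len):
    """Yield (inputs, targets) chunks; targets are the inputs shifted by one token."""
    if bptt_len < 1:
        raise ValueError("bptt_len must be positive")
    last = len(token_ids) - 1  # the final token can only be a target
    for start in range(0, last, bptt_len):
        end = min(start + bptt_len, last)
        yield token_ids[start:end], token_ids[start + 1:end + 1]


# Toy token stream (assumption: tokens are already mapped to integer ids).
tokens = list(range(10))
windows = list(bptt_windows(tokens, bptt_len=4))
for inputs, targets in windows:
    print(inputs, targets)

# A model that assigns probability 0.25 to every token has perplexity 4.
uniform_ppl = perplexity([math.log(0.25)] * 8)
print(uniform_ppl)
```

The same `perplexity` function applies to both training and validation data; only the source of the per-token log-probabilities changes.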

Models to implement

Add here whatever ideas we have and want to implement:

  • RAN and other additive RNN variants
  • HMM with delayed softmax in emission distribution
