
harvardnlp / sent-conv-torch

Text classification using a convolutional neural network.

License: MIT License

Lua 56.10% Python 15.62% Jupyter Notebook 26.77% Shell 1.51%

sent-conv-torch's Introduction

Sentence Convolution Code in Torch

This code implements the sentence convolution model of Kim (2014) in Torch, with GPU support. It replicates the results on the existing datasets and allows training models on arbitrary other text datasets.

Quickstart

To make the data in hdf5 format, run the following (with the word2vec .bin path and your choice of dataset):

python preprocess.py MR /path/to/word2vec.bin

To run training with GPUs:

th main.lua -data MR.hdf5 -cudnn 1 -gpuid 1

Results are timestamped and saved to the results/ directory.

Dependencies

The training pipeline requires Python hdf5 (the h5py module) and the following Lua packages:

  • hdf5
  • cudnn

Training models that use word2vec embeddings requires downloading word2vec and unzipping it. Simply run the script

./get_word2vec.sh

Creating datasets

We provide the following datasets: MR, SST1, SST2, SUBJ, TREC, CR, MPQA. All raw training data is located in the data/ directory. The SST1 and SST2 datasets have both test and dev sets, and TREC has a test set.

The preprocessing script takes the word2vec embeddings, builds the vocabulary, and outputs a data matrix of vocabulary indices for each sentence.

To create the hdf5 file, run the following with DATASET as one of the described datasets:

python preprocess.py DATASET /path/to/word2vec.bin

The script outputs:

  • the DATASET.hdf5 file with the data matrix and word2vec embeddings
  • a DATASET.txt file with a word-index dictionary for the word embeddings
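
To sanity-check the generated file, you can list its contents with h5py. The exact dataset names inside the file depend on the preprocessing script, so this sketch just prints whatever is present:

import h5py

with h5py.File('MR.hdf5', 'r') as f:
    for name in f:
        print(name, f[name])  # each dataset's repr shows its shape and dtype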

Training on custom datasets

We allow training on arbitrary text datasets. They should be formatted in the same way as the sample data, with one sentence per line and the first word of each line the class label (0-indexed). Our code handles most parsing of punctuation, possessives, capitalization, etc.

Example line:

1 no movement , no yuks , not much of anything .

Then run:

python preprocess.py custom /path/to/word2vec.bin --train /path/to/train/data --test /path/to/test/data --dev /path/to/dev/data

The output file's name can be set with the flag --custom_name (the default is custom).
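
As a sketch, a small script like the following (with hypothetical example sentences and labels) produces a file in the expected format:

# Hypothetical labelled sentences; the only requirement is "label<space>sentence" per line,
# with 0-indexed labels.
examples = [
    (1, "no movement , no yuks , not much of anything ."),
    (0, "a gorgeous , witty , seductive movie ."),
]
with open("train.txt", "w") as out:
    for label, sentence in examples:
        out.write("{} {}\n".format(label, sentence))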

Running torch

Training is typically done with 10-fold cross-validation and 25 epochs. If the dataset comes with a test set, we do not do cross-validation but instead split the training data 90/10 into train and dev sets. If the dataset also comes with a dev set, we do not do the train/dev split.

There are four main model architectures we implemented, as described in Kim (2014): rand, static, nonstatic, multichannel.

  • rand initializes the word embeddings randomly and learns them.
  • static initializes the word embeddings to word2vec and keeps the weights static.
  • nonstatic also initializes the word embeddings to word2vec, but allows them to be learned.
  • multichannel has two word2vec embedding layers, one static and one nonstatic. The two layers' outputs are summed (a rough sketch follows this list).
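
As a rough illustration of the convolution-plus-pooling step and of the multichannel idea, here is a minimal NumPy sketch (not the project's Torch code). It assumes sentences are at least as long as the kernel width and that the two channels are combined by summing their pooled feature maps; the exact point of summation in the Lua code may differ.

import numpy as np

def conv_max_over_time(X, W, b):
    # X: (sent_len, emb_dim) matrix of word embeddings for one sentence
    # W: (k * emb_dim, num_feat_maps) convolution filters of width k; b: (num_feat_maps,)
    k = W.shape[0] // X.shape[1]
    windows = np.stack([X[i:i + k].reshape(-1) for i in range(X.shape[0] - k + 1)])
    feats = np.maximum(0.0, windows @ W + b)   # convolution followed by ReLU
    return feats.max(axis=0)                   # max-over-time pooling

def multichannel_features(idxs, emb_static, emb_nonstatic, W, b):
    # Two copies of the word2vec lookup table: one kept fixed, one fine-tuned.
    return (conv_max_over_time(emb_static[idxs], W, b)
            + conv_max_over_time(emb_nonstatic[idxs], W, b))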

It is highly recommended that GPUs are used during training if possible (see Results section for timing benchmarks).

Separating out training and testing is easy; use the parameters -train_only and -test_only. Also, pretrained models at any stage can be loaded from a .t7 file with -warm_start_model (see more parameters below).

Output

The code outputs a checkpoint .t7 file for every fold, named according to -savefile. The default name is TIMESTAMP_results.

The following are saved as a table:

  • dev_scores with dev scores,
  • test_scores with test scores,
  • opt with the model parameters,
  • model with the best model (as determined by dev score),
  • embeddings with the updated word embeddings.

Model augmentations

A few modifications were made to the model architecture as experiments.

  • we include an option to add highway layers at the final MLP step (which increases the depth of the model),
  • we also include an option to add highway layers at the convolutional step (which performs multiple convolutions on the resulting feature maps); a sketch of a highway layer follows this list,
  • we experimented with skip kernels of size 5 (added in parallel with the other kernel sizes).
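
For reference, a highway layer (Srivastava et al., 2015) mixes a nonlinear transform of its input with the input itself via a learned gate. A minimal NumPy sketch of the idea (not the project's highway_mlp.lua code; the negative gate bias is just a common initialization):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_h, b_h, W_t, gate_bias=-2.0):
    h = np.maximum(0.0, x @ W_h + b_h)   # candidate transform H(x), here with ReLU
    t = sigmoid(x @ W_t + gate_bias)     # transform gate T(x)
    return t * h + (1.0 - t) * x         # y = T(x) * H(x) + (1 - T(x)) * x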

Results from these experiments are described below in the Results section.

Parameters

The following is the complete list of parameters accepted by the Torch code.

  • model_type: Model architecture, as described above. Options: rand, static, nonstatic, multichannel
  • data: Training dataset to use, including word2vec data. This should be a .hdf5 file made with preprocess.py.
  • cudnn: Use GPUs if set to 1, otherwise set to 0
  • seed: Random seed, set to -1 for actual randomness
  • folds: Number of folds for cross-validation.
  • debug: Print debugging info including timing and confusions
  • savefile: Name of output .t7 file, which will hold the trained model. Default is TIMESTAMP_results
  • zero_indexing: Set to 1 if data is zero indexed
  • warm_start_model: Load a .t7 file with pretrained model. Should contain a table with key 'model'
  • train_only: Set to 1 to only train (no testing)
  • test_only: Given a .t7 file with model, test on testing data
  • dump_feature_maps_file: Filename for dumping feature maps of the convolution at test time. This will be a .hdf5 file with fields feature_maps for the features at each time step and word_idxs for the word indexes (aligned with the last word of the filter). This currently only works for models with a single filter size. This is saved for the best model on fold 1 (see the reading example after this list).
  • preds_file: Filename for writing predictions (with test_only set to 1). Output is zero indexed.
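
The dumped feature maps can be read back with h5py; a minimal sketch, assuming the file was written to feature_maps.hdf5 (i.e. whatever was passed to -dump_feature_maps_file):

import h5py

with h5py.File('feature_maps.hdf5', 'r') as f:
    feature_maps = f['feature_maps'][:]  # features at each time step
    word_idxs = f['word_idxs'][:]        # word indexes, aligned with the last word of the filter
    print(feature_maps.shape, word_idxs.shape)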

Training hyperparameters:

  • num_epochs: Number of training epochs.
  • optim_method: Gradient descent method. Options: adadelta, adam
  • L2s: Constrain the L2 norm of the final linear layer weights to this value.
  • batch_size: Batch size for training.

Model hyperparameters:

  • num_feat_maps: Number of convolution feature maps.
  • kernels: Kernel sizes of different convolutions.
  • dropout_p: Dropout probability.
  • highway_mlp: Number of highway MLP layers (0 for none)
  • highway_conv_layers: Number of highway convolutional layers (0 for none)
  • skip_kernel: Set 1 to use skip kernels
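
As an illustration, a full training invocation combining several of these flags might look like the following (the particular values here are only an example, not tuned settings):

th main.lua -data SST1.hdf5 -cudnn 1 -gpuid 1 -model_type nonstatic -num_epochs 25 -optim_method adadelta -batch_size 50 -dropout_p 0.5 -savefile sst1_nonstatic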

Results

The following results were collected with the same training setup as in Kim (2014) (same parameters, 10-fold cross-validation if the data has no test set, 25 epochs).

Scores

Dataset rand static nonstatic multichannel
MR 75.9 80.5 81.3 80.8
SST1 42.2 44.8 46.7 44.6
SST2 83.5 85.6 87.0 87.1
Subj 89.2 93.0 93.4 93.2
TREC 88.2 91.8 92.8 91.8
CR 78.3 83.3 84.4 83.7
MPQA 84.6 89.6 89.7 89.6

With 5 trials on SST1, we have a mean nonstatic score of 46.7 with standard deviation 1.69.

With 1 highway layer, SST1 achieves a mean score of 47.8 (stddev 0.857) over 5 trials, and with 2 highway layers, a mean of 47.1 (stddev 1.47) over 10 trials.

Timing

We ran timing benchmarks on SST1, which has train/dev/test data sizes of 156817/1101/2210. We used a batch size of 50.

            non-GPU   GPU
per epoch   2475 s    54.0 s
per batch   787 ms    15.6 ms

From these results, we see that using GPUs yields roughly a 45-50x speedup in training. This allows much faster tuning of parameters and model experimentation.

Relevant publications

This code is based on Kim (2014) and the corresponding Theano code.

Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, Doha, Qatar. Association for Computational Linguistics.

Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Training very deep networks. In Advances in Neural Information Processing Systems (pp. 2368-2376).

sent-conv-torch's People

Contributors

jeffreyling, srush


sent-conv-torch's Issues

No Harvard specific references

Instead of "Yoon", call him Kim (2014) . Instead of "/n/rushlab/" do "/path/to/" . Should be trivial for external groups to run.

training error

I get this error when I run training (main.lua).
I am sure the MR.hdf5 file exists.
Can someone give me a suggestion?
Thanks a lot!

Aspire-VN7-791:~/NLP/sent-conv-torch-master$ th main.lua -data MR.hdf5 -cudnn 1
loading data...
data loaded!
vocab size: 18766
vec size: 300
==> fold 1
/home/swpc/torch/install/bin/luajit: main.lua:106: attempt to call global 'get_layer' (a nil value)
stack traceback:
main.lua:106: in function 'build_model'
main.lua:179: in function 'train_loop'
main.lua:342: in function 'main'
main.lua:355: in main chunk
[C]: in function 'dofile'
...swpc/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405e90

Memory error on GPU during testing

Hi,

When I run this code on GPU, training runs fine, but testing gives a memory error.
Can you think of any reason for this?

out of memory at /home/shashank/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:41
stack traceback:
[C]: at 0x7f3894b51280
[C]: in function 'cat'
./trainer.lua:167: in function 'test'
main1.lua:198: in function 'train_loop'
main1.lua:340: in function 'main'
main1.lua:353: in main chunk
[C]: in function 'dofile'
...hank/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

HDF5 installation with luarocks error

Hello!
When typing sudo luarocks install hdf5 I get the following errors:

gcclua.c: In function ‘int gcclua_tree_get_type_anonymous(lua_State*)’:
gcclua.c:636:22: error: ‘TYPE_ANONYMOUS_P’ was not declared in this scope
lua_pushboolean(L, TYPE_ANONYMOUS_P(*t));
^~~~~~~~~~~~~~~~
gcclua.c:636:22: note: suggested alternative: ‘MAP_ANONYMOUS’
lua_pushboolean(L, TYPE_ANONYMOUS_P(*t));
^~~~~~~~~~~~~~~~
MAP_ANONYMOUS

How do I fix that?

invalid device ordinal at /tmp/luarocks_cutorch-scm-1-6130/cutorch/init.c:719

rzai@rzai00:/prj/sent-conv-torch$ th main.lua -data MR.hdf5 -cudnn 1
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-6130/cutorch/init.c line=719 error=10 : invalid device ordinal
/home/rzai/torch/install/bin/luajit: main.lua:282: cuda runtime error (10) : invalid device ordinal at /tmp/luarocks_cutorch-scm-1-6130/cutorch/init.c:719
stack traceback:
[C]: in function 'setDevice'
main.lua:282: in function 'main'
main.lua:354: in main chunk
[C]: in function 'dofile'
...rzai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
rzai@rzai00:/prj/sent-conv-torch$

I have two GTX 1080 GPUs.

ValueError: invalid literal for int() with base 10

Hi there,
the given links for Word2Vec are not working (get_word2vec.sh fails to download it), so I built a copy from this repo, which is the only resource available at the moment (the Google repo seems to be down).

As a standalone program, word2vec appears to be working correctly, but when combined with the preprocess.py in this project, I can't manage to make it work.
It generates the .txt file, but the process crashes before outputting the hdf5 one with:
ValueError: invalid literal for int() with base 10: '\x7fELF\x02\x01\x01'
when trying to preprocess any of the given datasets (e.g. MR).

I've tried various word2vec builds but the result is always the same.

module 'model/convNN.lua' not found:No LuaRocks module found for model/convNN.lua

rzai@rzai00:/prj/sent-conv-torch$ CUDA_VISIBLE_DEVICES=1 th main.lua -data MR.hdf5 -cudnn 0
loading data...
data loaded!
vocab size: 18766
vec size: 300
==> fold 1
epoch: 1 train perf: 70.476744186047 %, val perf 81.062124248497 %
epoch: 2 train perf: 81.337209302326 %, val perf 82.765531062124 %
epoch: 3 train perf: 87.093023255814 %, val perf 81.763527054108 %
epoch: 4 train perf: 91.337209302326 %, val perf 82.765531062124 %
epoch: 5 train perf: 94.709302325581 %, val perf 82.364729458918 %
epoch: 6 train perf: 96.546511627907 %, val perf 82.865731462926 %
epoch: 7 train perf: 98.186046511628 %, val perf 82.865731462926 %
epoch: 8 train perf: 98.872093023256 %, val perf 82.965931863727 %
epoch: 9 train perf: 99.267441860465 %, val perf 83.567134268537 %
epoch: 10 train perf: 99.558139534884 %, val perf 83.867735470942 %
epoch: 11 train perf: 99.779069767442 %, val perf 84.168336673347 %
epoch: 12 train perf: 99.755813953488 %, val perf 84.268537074148 %
epoch: 13 train perf: 99.779069767442 %, val perf 84.168336673347 %
epoch: 14 train perf: 99.872093023256 %, val perf 84.068136272545 %
epoch: 15 train perf: 99.883720930233 %, val perf 84.068136272545 %
epoch: 16 train perf: 99.883720930233 %, val perf 83.967935871743 %
epoch: 17 train perf: 99.953488372093 %, val perf 84.268537074148 %
epoch: 18 train perf: 99.953488372093 %, val perf 84.268537074148 %
epoch: 19 train perf: 99.93023255814 %, val perf 84.468937875752 %
epoch: 20 train perf: 99.96511627907 %, val perf 83.967935871743 %
epoch: 21 train perf: 99.941860465116 %, val perf 84.168336673347 %
epoch: 22 train perf: 99.96511627907 %, val perf 84.36873747495 %
epoch: 23 train perf: 99.96511627907 %, val perf 84.468937875752 %
epoch: 24 train perf: 99.96511627907 %, val perf 84.068136272545 %
epoch: 25 train perf: 99.976744186047 %, val perf 84.468937875752 %
best dev err: 84.468937875752 %, epoch 19
test perf 81.707317073171 %
saving checkpoint to results/20161106_2343_model_1.t7
==> fold 2
/home/rzai/torch/install/bin/luajit: /home/rzai/torch/install/share/lua/5.1/trepl/init.lua:384: module 'model/convNN.lua' not found:No LuaRocks module found for model/convNN.lua
no field package.preload['model/convNN.lua']
no file '/home/rzai/.luarocks/share/lua/5.1/model/convNN/lua.lua'
no file '/home/rzai/.luarocks/share/lua/5.1/model/convNN/lua/init.lua'
no file '/home/rzai/torch/install/share/lua/5.1/model/convNN/lua.lua'
no file '/home/rzai/torch/install/share/lua/5.1/model/convNN/lua/init.lua'
no file './model/convNN/lua.lua'
no file '/home/rzai/torch/install/share/luajit-2.1.0-beta1/model/convNN/lua.lua'
no file '/usr/local/share/lua/5.1/model/convNN/lua.lua'
no file '/usr/local/share/lua/5.1/model/convNN/lua/init.lua'
no file '/home/rzai/.luarocks/lib/lua/5.1/model/convNN/lua.so'
no file '/home/rzai/torch/install/lib/lua/5.1/model/convNN/lua.so'
no file '/home/rzai/torch/install/lib/model/convNN/lua.so'
no file './model/convNN/lua.so'
no file '/usr/local/lib/lua/5.1/model/convNN/lua.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
no file '/home/rzai/.luarocks/lib/lua/5.1/model/convNN.so'
no file '/home/rzai/torch/install/lib/lua/5.1/model/convNN.so'
no file '/home/rzai/torch/install/lib/model/convNN.so'
no file './model/convNN.so'
no file '/usr/local/lib/lua/5.1/model/convNN.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'error'
/home/rzai/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
main.lua:80: in function 'build_model'
main.lua:178: in function 'train_loop'
main.lua:341: in function 'main'
main.lua:354: in main chunk
[C]: in function 'dofile'
...rzai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
rzai@rzai00:/prj/sent-conv-torch$
rzai@rzai00:/prj/sent-conv-torch$ ll model/
total 24
drwxrwxr-x 2 rzai rzai 4096 10月 25 20:36 ./
drwxrwxr-x 6 rzai rzai 4096 11月 6 18:43 ../
-rw-rw-r-- 1 rzai rzai 4345 10月 25 20:36 convNN.lua
-rw-rw-r-- 1 rzai rzai 1392 10月 25 20:36 highway_conv.lua
-rw-rw-r-- 1 rzai rzai 957 10月 25 20:36 highway_mlp.lua
rzai@rzai00:/prj/sent-conv-torch$

Value Error when running preprocess

Hi, I'm currently facing the following issue with the given code.

[pfrcks:...jects/CBDet/sent-conv-torch]$ python preprocess.py MR ~/Downloads/GoogleNews-vectors-negative300.bin.gz
Traceback (most recent call last):
  File "preprocess.py", line 224, in <module>
    main()
  File "preprocess.py", line 195, in main
    w2v = load_bin_vec(args.w2v, word_to_idx)
  File "preprocess.py", line 15, in load_bin_vec
    vocab_size, layer1_size = map(int, header.split())
ValueError: invalid literal for int() with base 10: '\x1f\x8b\x08\x08\x80\xff\xa8R\x02\x03GoogleNews-vectors-negative300.bin'

Can you please shed some light on this? I've also tried using the word2vec from https://code.google.com/archive/p/word2vec/ but it gives the same error.

Too slow

Hello! I am running this code on an Amazon p2.xlarge instance, and it runs much slower on GPU than on CPU.
This is the most time-consuming part:

trainer.lua
-- renormalize linear row weights
local w = layers.linear.weight
for j = 1, w:size(1) do
  renorm(w[j])
end

The version of Lua I use is 5.2. CUDA is 7.5, and cuDNN is 5.1
Do you have any suggestions?

Bug with more than 10 classes

Hello!

The preprocessing code assumes that the class label is of one character. This fails (without notice) with more than 10 classes. It only takes the first character of the class.

The bug is on this line:
y = int(line[0]) + 1

I was able to fix it by changing the function line_to_words to:

def line_to_words(line, dataset):
  if dataset == 'SST1' or dataset == 'SST2':
    clean_line = clean_str_sst(line.strip())
  else:
    clean_line = clean_str(line.strip())
  words = clean_line.split(' ')
  label= words[0]
  words = words[1:]

  return words,label

Then I changed line 51 to:
words,_ = line_to_words(line, dataset)

and then changed lines 100,101 to:

words,y = line_to_words(line, dataset)
y=int(y)+1

utf-8 decoding of Google word2vec embeddings incorrect

In preprocess.py, function load_bin_vec: the line ch = f.read(1) is incorrect, at least for Python 3.
You need to add .decode("utf-8") to correctly decode characters from GoogleNews-vectors-negative300.bin.

Also, the code needs to be converted to Python 3 for 2020 use.
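
For reference, a minimal Python 3 sketch of such a loader (not the project's exact code; it assumes the standard word2vec binary layout used by GoogleNews-vectors-negative300.bin):

import numpy as np

def load_bin_vec_py3(fname, vocab):
    # Read word2vec vectors from a binary file, keeping only words present in `vocab`.
    word_vecs = {}
    with open(fname, 'rb') as f:
        header = f.readline()
        vocab_size, layer1_size = map(int, header.split())
        binary_len = np.dtype('float32').itemsize * layer1_size
        for _ in range(vocab_size):
            chars = []
            while True:
                ch = f.read(1)
                if ch == b' ' or ch == b'':
                    break
                if ch != b'\n':
                    chars.append(ch)
            # the decode step the issue refers to
            word = b''.join(chars).decode('utf-8', errors='replace')
            vec = np.frombuffer(f.read(binary_len), dtype='float32')
            if word in vocab:
                word_vecs[word] = vec
    return word_vecs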

the static performance is better than nonstatic performance

I want to ask: do all datasets use the same hyperparameters?
I didn't modify the hyperparameters and trained the model on TREC, but the scores I got look like this:
static:
dev score: 0.922, test score: 0.936
while nonstatic:
dev score: 0.938, test score: 0.928

Also, could you explain the scores in the scores table? Are they dev scores?

saving best model for validation

During validation you save the best model using:

if dev_err > best_err then
  best_model = model:clone()
  best_epoch = epoch
  best_err = dev_err
end

Why is the best model the one where the validation error is the maximum? Am I missing something here?

Results script

Let's add a bash script to generate all the results and timing numbers. This is helpful for people trying to improve on the current results.

Make dependencies clear

Note what it requires; for instance, it needs at least torch-hdf5 and Python hdf5.

Even better, we can do this with a clear error. For instance, in Elements RNN they do:

assert(not nn.AbstractRecurrent, "update nnx package : luarocks install nnx")

Error: " bad argument #1 to 'squeeze' "

Hi authors,

I'm trying to run the training with th main.lua -data SST1.hdf5 -cudnn 0. However, I get the error "bad argument #1 to 'squeeze'" with the recent version of torch (and the code). Here is the error log:

./trainer.lua:159: bad argument #1 to 'squeeze' (dimension out of range at ~/torch/pkg/torch/lib/TH/generic/THTensor.c:489)
stack traceback:
        [C]: in function 'squeeze'
        ./trainer.lua:159: in function 'test'
        main.lua:198: in function 'train_loop'
        main.lua:341: in function 'main'
        main.lua:354: in main chunk

Checking the dimension via print(conv_layer.output:size()), I get

  50
  57
 100
[torch.LongStorage of size 3]

Changing the squeeze argument to 3 instead of 4 allows the training to run, but I'm not sure whether this is the intended value for different conditions.

Effect on performance from class imbalance in a multiclass-classification setting?

Hi,

I am trying out this model on my custom dataset with the following frequency distribution of class labels :

7: 23849, 0: 15159, 1: 6445, 4: 5759, 5: 3969, 3: 3659, 2: 2845, 6: 492

I am getting ~65% accuracy on ~16K testing samples after training on the above-mentioned dataset. Can class imbalance be one of the reasons for this low accuracy?

I am using the model in its original setting (assuming the best settings as reported in the paper).

Please explain how HighwayMLP is imported

I am trying to use highway_mlp.lua in my code but can't understand where it is getting loaded in your code. It is used as a global in convNN.lua:

local highway = HighwayMLP.mlp((#layer1) * opt.num_feat_maps, opt.highway_layers)

but nowhere in your code is it actually instantiated. How is convNN.lua able to register HighwayMLP as a global?

How does this work?

Fixing RNG seed

Hi,
I am using this code as a benchmark in one of my papers, so I need to fix the random seed. I am trying the seed flag, '-seed 42', but even with this flag I am getting different results each time.

Can you guide me on this?

Thanks

Argument to dump word vectors

It would be cool if the make_hdf5.py script output a text file of dict/ids, so you could get back the mapping. Additionally, the non-static code could redump the new word embeddings.

nn.Linear(input, output) should be nn.Linear(input, output, false)?

Short question. In highway_mlp.lua line 18:

transform_gate = nn.Sigmoid()(nn.AddConstant(bias)(nn.Linear(size, size)(inputs[i])))

Shouldn't you have defined the linear layer as nn.Linear(size, size, false)?
If the third argument is not given, nn.Linear will add its internal bias by default.

Better file names

Let's move model extensions to their own directories, and give the python script a better name.

No global code

Let's move everything into functions. The only global code should be a call to main().

Problem with device assertion

Hi everyone, when I try to run main.lua with my dataset, which was created with the following command:

python preprocess_utf8.py custom w2v/ruscorpora_russe.model.bin --train Trainset.txt --test Testset.txt --dev Validset.txt --custom_name ParentClass

I always get the error:

/tmp/luarocks_cunn-scm-1-9034/cunn/lib/THCUNN/ClassNLLCriterion.cu:52: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-3543/cutorch/lib/THC/generic/THCStorage.c line=32 error=59 : device-side assert triggered
/home/ubuntu/torch/install/bin/luajit: cuda runtime error (59) : device-side assert triggered at /tmp/luarocks_cutorch-scm-1-3543/cutorch/lib/THC/generic/THCStorage.c:32
stack traceback:
        [C]: at 0x7fac9f291f60
        [C]: in function '__index'
        ...ntu/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:52: in function 'forward'
        ./trainer.lua:187: in function 'test'
        main.lua:197: in function 'train_loop'
        main.lua:340: in function 'main'
        main.lua:353: in main chunk
        [C]: in function 'dofile'
        ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00405e90

What could have caused this? My hypothesis is that there may be some discrepancy between the sizes of the dev, test, and train sets, but why would that matter? Does anyone know anything about this?

Asking about the max_time

Hi authors,

Could I ask for an explanation of the part

max_time = nn.Max(3)(cudnn.ReLU()(conv_layer))

Theoretically, is it different from using temporal max-pooling on conv_layer? I am quite confused by this part, because when I tried to use temporal max-pooling on the whole output sequence, it gave me different results.

Thank you very much,

Doesn't classify on 10+ classes

When training on 10+ classes, the network seems to reduce the number of classes to 10.

I found it by printing out targets[i] on line 175 in trainer.lua - it shows that the classes are always numbered from 1 to 10:
for i = 1, batch_size do
  confusion:add(outputs[i], targets[i])
  print('target '..targets[i])
end

You can also see it in debug mode: the confusion matrix is always 10x10.

I haven't located where exactly the issue is yet.

word embedding normalization

I didn't see any normalization for the word vectors (maybe I missed it). Should we normalize the word embeddings, or should we use the same vector representations as (Google's) word2vec?

Separate out training and testing

Ideally there would be separate entrypoints in the code for training and testing since a trained model could be tested on anything else.

How to deal with padding (0 vector) without maskZero in ConvolutionLayer?

I find there is no maskZero for the temporal convolution layer.
In this case, if we meet a window where all the embeddings are padding (i.e. 0 vectors),
I think the convolution layer will always give us the bias.

But wouldn't the output be 0? Or do we not need maskZero in the convolution layer?

Bug in Custom Dataset name

Hello,

In the current version, the 'custom_name' argument does not work in preprocess.py.
To fix it, you can move the following lines (174 and 175) down to line 184.

 if dataset == 'custom':
    dataset = args.custom_name

Currently, dataset is changed from 'custom' to the new name, and then the code checks again for dataset == 'custom' on line 179.

problem running with GPU

Hi,

First of all, great job!
I am trying your code with the GPU turned on, but I get the error below (I tried it on different machines and it persists); I hope you can help. I found a commit by Karpathy which looks related, in case it helps.

th main.lua -data MR.hdf5 -cudnn 1 -gpuid 1
loading data...
data loaded!
vocab size: 18766
vec size: 300
==> fold 1
/home/elia/torch/install/bin/luajit: invalid arguments: CudaTensor CudaTensor CudaTensor number
expected arguments: CudaTensor | [CudaTensor] [CudaLongTensor] CudaTensor index
stack traceback:
[C]: at 0x7f0e5efcdd30
[C]: in function 'max'
/home/elia/torch/install/share/lua/5.1/nn/Max.lua:30: in function 'func'
/home/elia/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
/home/elia/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
./trainer.lua:61: in function 'opfunc'
/home/elia/torch/install/share/lua/5.1/optim/adadelta.lua:31: in function 'optim_method'
./trainer.lua:86: in function 'train'
main.lua:195: in function 'train_loop'
main.lua:341: in function 'main'
main.lua:354: in main chunk
[C]: in function 'dofile'
...elia/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
