ttpro1995 / treelstmsentiment

PyTorch implementation of Sentiment Classification in Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

License: MIT License

Python 81.89% Shell 0.30% Java 17.81%

sentiment-classification pytorch python treelstm

treelstmsentiment's Introduction

Tree-Structured Long Short-Term Memory Networks

A PyTorch-based implementation of Tree-LSTM from Kai Sheng Tai's paper Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks.

Requirements

PyTorch: deep learning library
tqdm: displays a progress bar
meowlogtool: a logger that writes everything printed to the console to a file
Java >= 8 (for Stanford CoreNLP utilities)
Python >= 3

Usage

First run the script ./fetch_and_preprocess.sh

This downloads the following data:

Stanford Sentiment Treebank (sentiment classification task)
GloVe word vectors (Common Crawl 840B) -- Warning: this is a 2GB download!

and the following libraries:

Sentiment classification

python sentiment.py --name <name_of_log_file> --model_name <constituency|dependency> --epochs 10

We have not fully tested fine-grained classification yet. Binary classification accuracy for both models matches the accuracy reported in the original paper.

Acknowledgements

Kai Sheng Tai for the original LuaTorch implementation
PyTorch team for the Python library
Riddhiman Dasgupta for his implementation of semantic relatedness (https://github.com/dasguptar/treelstm.pytorch), which I used as starter code.

License

MIT

treelstmsentiment's Issues

Constituency trees with only left child

The BinaryTreeLSTM implementation (TreeLSTMSentiment/model.py, line 111 in c61f73e: class BinaryTreeLSTM(nn.Module):) does not work for constituency trees that have only a left child and no right child.

Index out of range error for very common case

Hey ttpro1995,

Thanks for producing this code. The dependency version runs fine, but when I run:

python sentiment.py --name constituency_log.txt --model_name constituency --epochs 10

I get the following stack trace:

  File "sentiment.py", line 228, in <module>
    main()
  File "sentiment.py", line 184, in main
    train_loss = trainer.train(train_dataset)
  File "/Users/tosiecki/Tom/TreeLSTMSentiment/trainer.py", line 37, in train
    output, err = self.model.forward(tree, emb, training = True)
  File "/Users/tosiecki/Tom/TreeLSTMSentiment/model.py", line 366, in forward
    tree_state, loss = self.tree_module(tree, inputs, training)
  File "/Users/tosiecki/Tom/env2.7/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/tosiecki/Tom/TreeLSTMSentiment/model.py", line 156, in forward
    _, child_loss = self.forward(tree.children[idx], embs, training)
  File "/Users/tosiecki/Tom/TreeLSTMSentiment/model.py", line 156, in forward
    _, child_loss = self.forward(tree.children[idx], embs, training)
  File "/Users/tosiecki/Tom/TreeLSTMSentiment/model.py", line 156, in forward
    _, child_loss = self.forward(tree.children[idx], embs, training)
  File "/Users/tosiecki/Tom/TreeLSTMSentiment/model.py", line 158, in forward
    lc, lh, rc, rh = self.get_child_state(tree)
  File "/Users/tosiecki/Tom/TreeLSTMSentiment/model.py", line 174, in get_child_state
    rc, rh = tree.children[1].state
IndexError: list index out of range

It is very common to have a tree with only one child. How was this handled when you were testing? I tried adding an elif tree.num_children == 1 case, but I am not sure what should go there, and it would be a hack. Do you have any suggestions for how to fix this?

Tom.
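One possible workaround, sketched under assumptions rather than taken from the repository: because the binary composer reads both tree.children[0] and tree.children[1], single-child nodes can be collapsed before training so that every internal node has exactly two children. The Tree class and collapse_unary function below are hypothetical stand-ins, and keeping the lower node's label is a choice that may not suit every labeling scheme.

```python
class Tree:
    """Hypothetical minimal tree node; the repository's own Tree class may differ."""

    def __init__(self, label=None, children=None):
        self.label = label
        self.children = list(children) if children else []


def collapse_unary(node):
    """Replace every single-child node by its child, recursively.

    The label of the collapsed (upper) node is dropped; this is an
    assumption of the sketch, not the repository's behavior.
    """
    # Walk down through any chain of single-child nodes.
    while len(node.children) == 1:
        node = node.children[0]
    # Apply the same rule to each remaining child.
    node.children = [collapse_unary(child) for child in node.children]
    return node


def has_unary_node(node):
    """Return True if any node in the tree has exactly one child."""
    if len(node.children) == 1:
        return True
    return any(has_unary_node(child) for child in node.children)
```

Running has_unary_node over the training trees before and after collapsing is a quick way to confirm whether the IndexError comes from unary nodes at all.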

bug in model dependency

Running python sentiment.py --name de --model_name dependency --epochs 10 fails with:
tree.state = self.node_forward(embs[tree.idx-1], child_c, child_h) IndexError: index 48 is out of bounds for dimension 0 with size 27
How can I solve this?
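The message says a node index points past the end of the sentence embeddings (index 48 into 27 rows), which would happen if a tree and its sentence are out of step. A small check like the one below can locate such trees before training. It is only a sketch: Node and out_of_range_indices are hypothetical names, and it assumes each node carries a 1-based idx, as the expression embs[tree.idx-1] suggests.

```python
class Node:
    """Hypothetical dependency-tree node with a 1-based word index."""

    def __init__(self, idx, children=None):
        self.idx = idx
        self.children = list(children) if children else []


def out_of_range_indices(root, sentence_length):
    """Return the sorted idx values for which embs[idx - 1] would be invalid.

    Valid values are 1..sentence_length. An idx of 0 is also reported,
    because idx - 1 == -1 would silently select the last row instead of
    raising an error.
    """
    bad = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not 1 <= node.idx <= sentence_length:
            bad.append(node.idx)
        stack.extend(node.children)
    return sorted(bad)
```

If this reports indices for some (tree, sentence) pairs, the parent-pointer files and the token files were probably produced from different inputs or are misaligned by line.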

"None" in training labels

When I run the code with the arguments "--model_name dependency --epochs 10", I get the following error:

Traceback (most recent call last):
  File "sentiment.py", line 228, in <module>
    main()
  File "sentiment.py", line 76, in main
    train_dataset = SSTDataset(train_dir, vocab, args.num_classes, args.fine_grain, args.model_name)
  File "/Users/runqi/Desktop/TreeLSTMSentiment/dataset.py", line 123, in __init__
    self.labels = torch.Tensor(self.labels) # let labels be tensor
TypeError: must be real number, not NoneType

Environment: python 3.6, Mac OS, pytorch 0.4.1

It seems like some root labels are None. Does anyone else have the same issue?
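To confirm that diagnosis, the labels can be scanned for None before they are converted to a tensor. The helper names below are hypothetical, and dropping unlabeled examples is just one option, shown under the assumption that trees and labels are parallel lists.

```python
def find_missing_labels(labels):
    """Return the positions whose label is None."""
    return [i for i, label in enumerate(labels) if label is None]


def drop_unlabeled(trees, labels):
    """Return (trees, labels) with every None-labeled example removed.

    Assumes trees and labels are parallel sequences of equal length.
    """
    if len(trees) != len(labels):
        raise ValueError("trees and labels must have the same length")
    kept = [(t, l) for t, l in zip(trees, labels) if l is not None]
    kept_trees = [t for t, _ in kept]
    kept_labels = [l for _, l in kept]
    return kept_trees, kept_labels
```

Printing the positions returned by find_missing_labels shows which sentences lack a root label, which helps tell a preprocessing problem from a parsing one.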

bug in testing for fine-grain classification

TreeLSTMSentiment/trainer.py

Line 73 in c61f73e

output[:,1] = -9999 # no need middle (neutral) value

This masking is needed only for binary classification. For fine-grained classification it should be removed.
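A minimal sketch of the proposed fix, making the masking conditional on the task. It uses plain nested lists so that it runs without PyTorch; with a tensor, the masked branch would be the quoted output[:,1] = -9999 line. The function names and the fine_grain flag are assumptions for illustration.

```python
NEUTRAL_INDEX = 1   # column masked by the quoted line in trainer.py
MASK_VALUE = -9999  # same constant as in the quoted line


def mask_neutral(scores, fine_grain):
    """Return a copy of scores with the neutral column masked in binary mode only."""
    masked = [list(row) for row in scores]
    if not fine_grain:
        for row in masked:
            row[NEUTRAL_INDEX] = MASK_VALUE
    return masked


def predict(scores, fine_grain):
    """Return the argmax class per row after optional neutral masking."""
    masked = mask_neutral(scores, fine_grain)
    return [max(range(len(row)), key=row.__getitem__) for row in masked]
```

With this guard, index 1 stays available as an ordinary class in fine-grained mode instead of always being excluded from the prediction.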

Problem with Binary classification

If fine_grain is set to False, the code performs binary classification. But in binary mode I can see three labels: 0, 1, and 2. So is it really a binary task?
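One reading that fits the neutral-masking line quoted in the previous issue is that the three labels are negative, neutral, and positive, with neutral ignored at prediction time. The sketch below illustrates that reading only. The five-point encoding (0..4 with 2 as neutral) and both function names are assumptions, not taken from the repository.

```python
def to_coarse(fine_label):
    """Map an assumed five-point label (0..4, 2 = neutral) to three classes.

    Returns 0 for negative, 1 for neutral, 2 for positive.
    """
    if fine_label < 2:
        return 0
    if fine_label == 2:
        return 1
    return 2


def binary_subset(examples):
    """Keep only (item, coarse_label) pairs whose label is not neutral."""
    return [(item, label) for item, label in examples if label != 1]
```

Under this reading the task is binary in the sense that only classes 0 and 2 are ever predicted or scored, even though the label space has three values.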

What is getParameters() used for?

It does not appear to be called anywhere.

javac cannot find lib/*.java

Hello
When I run "fetch_and_preprocess.sh", there is an error:
javac: can not find file: lib/*.java
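javac receives the literal text lib/*.java when the wildcard is not expanded, for example when no file matches or when the command is run from a directory that has no lib folder. The hypothetical helper below expands the pattern explicitly and fails with a clearer message; the default directory name lib comes from the error text, everything else is an assumption.

```python
import glob
import os


def java_sources(lib_dir="lib"):
    """Return the sorted .java files in lib_dir, or raise if there are none."""
    pattern = os.path.join(lib_dir, "*.java")
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(
            "no .java files match {!r} (cwd: {})".format(pattern, os.getcwd())
        )
    return files
```

If this raises, the script is probably being run from outside the repository root, or the lib directory was not checked out.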

Error: package edu.stanford.nlp.trees does not exist

lib\CollapseUnaryTransformer.java:3: error: package edu.stanford.nlp.ling does not exist
import edu.stanford.nlp.ling.Label;
^
lib\CollapseUnaryTransformer.java:4: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.Tree;
^
lib\CollapseUnaryTransformer.java:5: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.TreeTransformer;
^
lib\CollapseUnaryTransformer.java:6: error: package edu.stanford.nlp.util does not exist
import edu.stanford.nlp.util.Generics;
^
lib\CollapseUnaryTransformer.java:17: error: cannot find symbol
public class CollapseUnaryTransformer implements TreeTransformer {
^
symbol: class TreeTransformer
lib\CollapseUnaryTransformer.java:18: error: cannot find symbol
public Tree transformTree(Tree tree) {
^
symbol: class Tree
location: class CollapseUnaryTransformer
lib\CollapseUnaryTransformer.java:18: error: cannot find symbol
public Tree transformTree(Tree tree) {
^
symbol: class Tree
location: class CollapseUnaryTransformer
lib\ConstituencyParse.java:1: error: package edu.stanford.nlp.process does not exist
import edu.stanford.nlp.process.WordTokenFactory;
^
lib\ConstituencyParse.java:2: error: package edu.stanford.nlp.ling does not exist
import edu.stanford.nlp.ling.HasWord;
^
lib\ConstituencyParse.java:3: error: package edu.stanford.nlp.ling does not exist
import edu.stanford.nlp.ling.Word;
^
lib\ConstituencyParse.java:4: error: package edu.stanford.nlp.ling does not exist
import edu.stanford.nlp.ling.CoreLabel;
^
lib\ConstituencyParse.java:5: error: package edu.stanford.nlp.process does not exist
import edu.stanford.nlp.process.PTBTokenizer;
^
lib\ConstituencyParse.java:6: error: package edu.stanford.nlp.util does not exist
import edu.stanford.nlp.util.StringUtils;
^
lib\ConstituencyParse.java:7: error: package edu.stanford.nlp.parser.lexparser does not exist
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
^
lib\ConstituencyParse.java:8: error: package edu.stanford.nlp.parser.lexparser does not exist
import edu.stanford.nlp.parser.lexparser.TreeBinarizer;
^
lib\ConstituencyParse.java:9: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.GrammaticalStructure;
^
lib\ConstituencyParse.java:10: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
^
lib\ConstituencyParse.java:11: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.PennTreebankLanguagePack;
^
lib\ConstituencyParse.java:12: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.Tree;
^
lib\ConstituencyParse.java:13: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.Trees;
^
lib\ConstituencyParse.java:14: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.TreebankLanguagePack;
^
lib\ConstituencyParse.java:15: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.TypedDependency;

What can I do about all these packages that do not exist?
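The backslashes in these paths indicate a Windows machine. Java expects classpath entries to be separated by ";" on Windows and ":" on Unix-like systems, so a classpath written with ":" would not be split into separate jars there, and the Stanford packages would not be found. Whether that is the cause here is a guess. The sketch below builds the javac command with the separator for the current platform; the jar paths passed in are placeholders, not the repository's actual file names.

```python
import os


def build_javac_command(source_files, jars, lib_dir="lib"):
    """Build a javac argument list with a platform-correct classpath.

    os.pathsep is ';' on Windows and ':' on POSIX, matching what the
    Java launcher and compiler expect for -cp.
    """
    classpath = os.pathsep.join([lib_dir] + list(jars))
    return ["javac", "-cp", classpath] + list(source_files)
```

The returned list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting issues on either platform.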

Problem with fine-grain classification

In fine-grained mode, dev accuracy is above 80% in my test, but the result in the paper is about 50%. How should fine-grained classification be tested correctly? Thanks.

ClassNotFoundException when producing dependency parsers

When I tried to run the command in the 'dependency_parse' function in preprocess_sst.py, the following error occurred:

Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.tagger.maxent.ExtractorNonAlphanumeric

I assume ExtractorNonAlphanumeric should exist in the corresponding directory inside stanford-parser.jar, but it does not. Can you help solve this issue? Many thanks.
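A jar is a zip archive, so it is easy to check which jar on the classpath actually contains the class. The helper below is a hypothetical sketch using only the standard library. The class name includes the tagger package, so it may live in a different jar than stanford-parser.jar, such as the POS tagger's jar, though that is an assumption to verify with the check.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar contains the .class entry for class_name.

    class_name is a fully qualified name such as
    'edu.stanford.nlp.tagger.maxent.ExtractorNonAlphanumeric'.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Running this over every jar in the lib directory shows whether the class is missing everywhere (a version mismatch) or simply in a jar that is not on the classpath.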

Doubts about the BinaryTreeLSTM implementation

In your implementation of BinaryTreeLSTM:

Can you explain the leaf (base) case? self.ox just passes the input through a linear layer, which is understandable since leaves have no child hidden states, but why are there no weight parameters for the input, update, or forget gates, and why is the cell state just the input passed through a linear layer?
And for non-leaf nodes, none of the gating units passes the input through a linear layer. Is there an explanation for that? ChildSum has weight parameters for x_j, so why not the N-ary LSTM?

self.ix = nn.Linear(self.in_dim,self.mem_dim)
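For reference, the behavior described in the question can be written out as equations. The leaf case below restates what the question reports about the code. The internal-node case is the paper's N-ary Tree-LSTM with N = 2 and the x_j terms dropped, which fits a constituency tree where only leaves carry word vectors. The subscripts L and R for the left and right child are notation chosen here, and whether the repository matches these equations term for term is an assumption to check against model.py.

```latex
% Leaf node j with word vector x_j (as described in the question)
\begin{align*}
c_j &= W^{(c)} x_j + b^{(c)} \\
o_j &= \sigma\left(W^{(o)} x_j + b^{(o)}\right) \\
h_j &= o_j \odot \tanh(c_j)
\end{align*}

% Internal node j with children (h_{jL}, c_{jL}) and (h_{jR}, c_{jR}); no x_j term
\begin{align*}
i_j    &= \sigma\left(U^{(i)}_{L} h_{jL} + U^{(i)}_{R} h_{jR} + b^{(i)}\right) \\
f_{jL} &= \sigma\left(U^{(f)}_{LL} h_{jL} + U^{(f)}_{LR} h_{jR} + b^{(f)}\right) \\
f_{jR} &= \sigma\left(U^{(f)}_{RL} h_{jL} + U^{(f)}_{RR} h_{jR} + b^{(f)}\right) \\
o_j    &= \sigma\left(U^{(o)}_{L} h_{jL} + U^{(o)}_{R} h_{jR} + b^{(o)}\right) \\
u_j    &= \tanh\left(U^{(u)}_{L} h_{jL} + U^{(u)}_{R} h_{jR} + b^{(u)}\right) \\
c_j    &= i_j \odot u_j + f_{jL} \odot c_{jL} + f_{jR} \odot c_{jR} \\
h_j    &= o_j \odot \tanh(c_j)
\end{align*}
```

Read this way, a leaf has no children, so there is nothing for a forget gate to act on, and the leaf needs only a cell value and an output gate. An internal node has no word vector, so its gates depend only on the children's hidden states.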