Kaggle: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
- Records: 50,000
- Columns: 2
- review
- sentiment - "positive" and "negative" => binary classification problem
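Loading the CSV and encoding the two labels as 0/1 can be sketched as follows (the two-row DataFrame below stands in for the 50,000-row Kaggle file, and the `IMDB Dataset.csv` file name is assumed from the Kaggle page):

```python
import pandas as pd

# Tiny stand-in for pd.read_csv("IMDB Dataset.csv"); the real data has 50,000 rows
df = pd.DataFrame({
    "review": ["A wonderful little production.", "Boring from start to finish."],
    "sentiment": ["positive", "negative"],
})

# Binary target for classification: positive -> 1, negative -> 0
df["label"] = (df["sentiment"] == "positive").astype(int)
```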
- Pandas
- Seaborn
- Numpy
- Scikit-learn
- Tensorflow
- Keras
- Matplotlib
- Pickle
```
pip install -r requirements.txt
```
- Sequential model
- One Embedding layer
- Flattening layer
- Dense layer
- activation function
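The stack above can be sketched in Keras as follows (the vocabulary size, embedding dimension, and padded review length are illustrative assumptions, not the notebook's exact values):

```python
from tensorflow.keras import layers, models

# Illustrative hyperparameters (assumed, not taken from the notebook)
VOCAB_SIZE, EMBED_DIM, MAX_LEN = 5000, 100, 100

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),           # padded, integer-encoded reviews
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),  # one embedding layer
    layers.Flatten(),                         # flattening layer
    layers.Dense(1, activation="sigmoid"),    # dense layer + activation for binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```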
[Notebook](https://github.com/likarajo/movie_sentiment/blob/master/model_NN.ipynb)
Convolutional neural networks are primarily used for classifying 2D data such as images, but they also work well with 1D text data. The first layer tries to detect specific low-level features; subsequent layers combine the initially detected features into larger ones. Ref: https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
- Sequential model
- One Embedding layer
- 1D convolutional layer
  - filters (kernels)
- activation function
- Global max pooling layer
- reduce feature size
- Dense layer
- activation function
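A minimal Keras sketch of this architecture (the filter count, kernel size, and other hyperparameters below are illustrative assumptions):

```python
from tensorflow.keras import layers, models

# Illustrative hyperparameters (assumed, not taken from the notebook)
VOCAB_SIZE, EMBED_DIM, MAX_LEN = 5000, 100, 100

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Conv1D(filters=128, kernel_size=5, activation="relu"),  # 1D convolution
    layers.GlobalMaxPooling1D(),              # reduces each feature map to a single value
    layers.Dense(1, activation="sigmoid"),    # dense layer + activation for binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```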
[Notebook](https://github.com/likarajo/movie_sentiment/blob/master/model_CNN.ipynb)
Long Short-Term Memory (LSTM)
A variant of recurrent neural networks (RNNs).
- Sequential model
- One Embedding layer
- LSTM layer
- neurons
- Dense layer
- activation function
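The LSTM stack can be sketched as follows (the number of LSTM units and the other hyperparameters are illustrative assumptions):

```python
from tensorflow.keras import layers, models

# Illustrative hyperparameters (assumed, not taken from the notebook)
VOCAB_SIZE, EMBED_DIM, MAX_LEN = 5000, 100, 100

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.LSTM(128),                         # 128 neurons is an illustrative choice
    layers.Dense(1, activation="sigmoid"),    # dense layer + activation for binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```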
[Notebook](https://github.com/likarajo/movie_sentiment/blob/master/model_RNN.ipynb)
- Keras Embedding Layer
- Stanford NLP GloVe word embeddings
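One common way to use pretrained GloVe vectors is to build a weight matrix indexed by the tokenizer's word index and load it into a frozen Embedding layer. A sketch under that assumption (the helper function and the `glove.6B.100d.txt` file name are illustrative, not the notebook's code):

```python
import numpy as np

def build_embedding_matrix(word_index, glove_vectors, dim=100):
    """Rows are indexed by the tokenizer's word ids; unknown words stay zero."""
    matrix = np.zeros((len(word_index) + 1, dim))  # +1: index 0 is the padding id
    for word, i in word_index.items():
        vec = glove_vectors.get(word)
        if vec is not None:
            matrix[i] = vec
    return matrix

# glove_vectors would normally be parsed from e.g. glove.6B.100d.txt
# (one "word v1 v2 ... v100" line per word). The matrix is then loaded into
# a frozen Keras Embedding layer, e.g. via
# embeddings_initializer=Constant(matrix) together with trainable=False.
```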
- The difference between the training and test accuracy is much smaller for the Recurrent NN than for the Simple NN or the Convolutional NN.
- The difference between the training and test loss values is negligible for the Recurrent NN.
- The model is NOT overfitting.
So the RNN (LSTM) is the best algorithm for our text classification model.
The number of layers, the number of neurons, the hyperparameter values, the activation functions, etc. can be tuned to find the best NN model.