movieBERT - BERT Model for Predicting Movie Tags

In this project we fine-tune the pre-trained BERT base model to predict movie tags, using a dataset called "mspt_full_data_csv".

Directory Tree

data: contains the custom dataset loading methods;
- MovieDataset.py
- filtered_mt.csv
- mspt_full_data_csv
model: contains the BERT model class for the classifier;
- BERTClassifier.py
preprocessing: contains the Python file used to preprocess the dataset;
- Preprocessing.py
training: contains the training and validation flow of our BERT model;
- Training.py
utils: contains the methods used for training, validation and testing;
- Utils.py
Inference.py: tests the model on the three movie examples written in the "test_text" array;
movieBERT-Colab.ipynb: the Colab notebook where we ran all the tests. You can upload it to Google Colab to see all the outputs, or view it at this link;
LSTM_test.ipynb: the Colab notebook where we built an LSTM model to compare its performance with that of our BERT-based model. You can upload it to Google Colab (same link as in the previous point) or view it at this link;
movieGPT-Colab.ipynb: the Colab notebook where we tried to implement a GPT-based version of our model in order to compare the two. We lacked the computing power to train it, but you can try it on your own!
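
As an illustration of what the inference step produces, the sketch below turns raw classifier scores (logits) into readable tags. It assumes a multi-label setup with one sigmoid per tag and a 0.5 threshold; the tag names, the `logits_to_tags` helper and the example scores are hypothetical and are not taken from Inference.py.

```python
import math

# Hypothetical tag vocabulary; the real one comes from the preprocessed dataset.
TAGS = ["action", "comedy", "drama", "horror", "romance"]


def sigmoid(x: float) -> float:
    """Map a raw score to a probability in (0, 1), avoiding overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logits_to_tags(logits, tags=TAGS, threshold=0.5):
    """Return the tags whose sigmoid probability reaches the threshold."""
    if len(logits) != len(tags):
        raise ValueError("expected one logit per tag")
    return [tag for tag, score in zip(tags, logits) if sigmoid(score) >= threshold]


# Made-up scores for one movie description.
example_logits = [2.1, -1.3, 0.4, -3.0, -0.2]
predicted = logits_to_tags(example_logits)
print(predicted)
```

If the model were instead trained as a single-label classifier, the thresholding step would be replaced by picking the tag with the highest score.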

N.B.: to test the model, download the fine-tuned BERT .pth file from the following Google Drive link and put it in the "model" folder. To test the LSTM, download the trained model from this link; the "filtered_mt.csv" dataset is already in the "data" folder.
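
Before running the inference script, it can help to confirm that the downloaded weights are where the note above says they should be. The following is a small sketch under stated assumptions: `find_checkpoints` is a hypothetical helper, and it only looks for any .pth file in the "model" folder because the exact file name is not given here.

```python
from pathlib import Path


def find_checkpoints(model_dir="model"):
    """Return the .pth files found directly inside model_dir, sorted by name."""
    folder = Path(model_dir)
    if not folder.is_dir():
        return []
    return sorted(folder.glob("*.pth"))


checkpoints = find_checkpoints()
if checkpoints:
    print("Found weights:", ", ".join(p.name for p in checkpoints))
else:
    print('No .pth file in "model"; download the fine-tuned weights first.')
```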

Description of the model

Our BERT model aims to predict film tags from a carefully preprocessed dataset. The dataset is a CSV file containing each film's name, description and genre (we chose a maximum of 5 genre types). We preprocessed the dataset to produce the tokens passed to the model for learning. We then took the BERT base model (12 Transformer encoder layers), defined suitable hyperparameters and started the training. To improve accuracy we evaluated several variants of the model (3 versions in particular) and selected one of them.
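
The label side of the preprocessing can be sketched as follows. This is a minimal illustration, assuming each row stores its genres as a comma-separated string and that "a maximum of 5" means keeping at most five genres per film; the `encode_genres` helper and the genre vocabulary are hypothetical, not the code in Preprocessing.py.

```python
MAX_GENRES = 5  # assumption: at most five genres are kept per film

# Hypothetical vocabulary; in practice it would be built from the dataset.
GENRES = ["action", "comedy", "drama", "horror", "romance", "thriller"]
GENRE_INDEX = {name: i for i, name in enumerate(GENRES)}


def encode_genres(raw: str, max_genres: int = MAX_GENRES) -> list[float]:
    """Turn a string such as 'Drama, Romance' into a multi-hot vector over GENRES."""
    names = [part.strip().lower() for part in raw.split(",") if part.strip()]
    # Drop unknown genres, remove duplicates while keeping order, then truncate.
    known = list(dict.fromkeys(n for n in names if n in GENRE_INDEX))[:max_genres]
    vector = [0.0] * len(GENRES)
    for name in known:
        vector[GENRE_INDEX[name]] = 1.0
    return vector


# "Western" is not in the hypothetical vocabulary, so it is ignored.
label = encode_genres("Drama, Romance, Western")
print(label)
```

A vector like this is the usual target format for a classifier trained with a per-tag binary loss; a single-label setup would store one class index per film instead.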

