movieBERT - BERT Model for Predicting Movie Tags

In this project we fine-tune the pre-trained BERT base model to predict tags for movies from a dataset called "mspt_full_data_csv".

Directory Tree

  • data: contains the custom dataset loading methods and the datasets;
    • MovieDataset.py
    • filtered_mt.csv
    • mspt_full_data_csv
  • model: contains the BERT model class for the classifier;
    • BERTClassifier.py
  • preprocessing: contains the Python file used to preprocess the dataset;
    • Preprocessing.py
  • training: contains the training and validation flow of our BERT model;
    • Training.py
  • utils: contains the methods used for training, validation and testing;
    • Utils.py
  • Inference.py: runs the model on the three example movies defined in the "test_text" array;
  • movieBERT-Colab.ipynb: the Colab notebook in which we ran all our tests. You can upload it to Google Colab to view all the outputs, or open it at this link;
  • LSTM_test.ipynb: the Colab notebook in which we built an LSTM model to compare its performance with that of our BERT-based model. You can upload it to Google Colab (same link as in the previous point) or open it at this link;
  • movieGPT-Colab.ipynb: the Colab notebook in which we tried to implement a GPT-based version of our model, in order to compare the two. We lacked the computing power to train it, but you can try it on your own!

N.B.: to test the model, download the fine-tuned BERT .pth file from the following Google Drive link and put it in the "model" folder. To test the LSTM, download the trained model from this link; the "filtered_mt.csv" dataset is already in the "data" folder.

Description of the model

Our BERT model predicts movie tags from a carefully preprocessed dataset. The dataset is a CSV file containing each film's name, description and genre (we chose a maximum of 5 genre types). We preprocessed the dataset to produce the tokens passed to the model for learning. We then took the BERT base model (12 Transformer encoder layers), chose suitable hyperparameters and trained it. To improve accuracy we evaluated several variants of the model (3 in particular) and selected one of them.
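As a rough illustration of the label side of the preprocessing, the sketch below turns a comma-separated tag string into a fixed-length multi-hot vector, keeping at most five tags per movie. Everything specific here is an assumption made for illustration: the tag vocabulary, the comma separator, the reading of the five-genre limit as a per-movie cap, and the function name `encode_tags`. The actual logic lives in Preprocessing.py and may differ.

```python
MAX_TAGS = 5  # assumed: the "maximum 5" limit is applied per movie

# Hypothetical tag vocabulary; the real one would be derived from the dataset.
VOCAB = ["comedy", "drama", "horror", "romance", "thriller", "sci-fi"]
TAG_TO_INDEX = {tag: i for i, tag in enumerate(VOCAB)}


def encode_tags(raw_tags: str, max_tags: int = MAX_TAGS) -> list[float]:
    """Turn a string like "drama, romance" into a multi-hot vector over VOCAB."""
    # Normalise each tag, skip unknown tags, and keep first occurrences only.
    seen: list[str] = []
    for tag in raw_tags.split(","):
        tag = tag.strip().lower()
        if tag in TAG_TO_INDEX and tag not in seen:
            seen.append(tag)

    # Set a 1.0 at the position of each kept tag, up to max_tags of them.
    vector = [0.0] * len(VOCAB)
    for tag in seen[:max_tags]:
        vector[TAG_TO_INDEX[tag]] = 1.0
    return vector


if __name__ == "__main__":
    print(encode_tags("Drama, romance, unknown-tag"))
    # prints [0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

A vector of this shape is the usual training target when each movie may carry several tags at once; if the project instead assigns exactly one genre per movie, a single class index would replace it.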

Features of our BERT Model

  • Predicting movie tags from a given prompt. Example output: (image: output)
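For readers curious how raw classifier scores could become a list of tags, here is a minimal sketch of one common approach: apply a sigmoid to each score, keep the tags whose probability clears a threshold, and cap the result at five. The 0.5 threshold, the tag names, and the function `decode_tags` are hypothetical; the project's Inference.py may decode its outputs differently.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_tags(logits, vocab, threshold=0.5, max_tags=5):
    """Return the tags whose probability is >= threshold, most likely first."""
    scored = [(sigmoid(z), tag) for z, tag in zip(logits, vocab)]
    kept = [(p, tag) for p, tag in scored if p >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for _, tag in kept[:max_tags]]


if __name__ == "__main__":
    # Hypothetical scores for three hypothetical tags.
    print(decode_tags([2.0, -1.0, 0.3], ["drama", "horror", "romance"]))
    # prints ['drama', 'romance']
```

Raising the threshold trades recall for precision: fewer tags are returned, but each one is more likely to be correct.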

Canva presentation

You can view the project presentation at this link.

Authors

Contributors

vannisil, uzingr