benckx / dnn-movie-posters

Classify movie posters by genre

dnn-movie-posters's Introduction

About

A simple demo / tutorial / experiment / portfolio project for me to better understand the concepts of Machine Learning.

Classify Movies by Genre

Use a Convolutional Neural Network (CNN) to classify movie posters by genre. This is a multi-label classification problem: a movie can belong to several genres, so each instance (movie poster) gets an independent probability of belonging to each label (genre).
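
The "independent probability per genre" idea can be sketched without any framework. The snippet below is an illustration, not the project's code: it assumes the network ends in one sigmoid unit per genre and is trained with binary cross-entropy, a common setup for multi-label problems. The logits and the genre list are made up.

```python
import math

GENRES = ["Comedy", "Drama", "Action"]  # example label set


def sigmoid(x: float) -> float:
    """Squash one raw score (logit) into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def genre_probabilities(logits):
    """Score each genre on its own; unlike softmax, results need not sum to 1."""
    return [sigmoid(z) for z in logits]


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean per-label loss; each genre is treated as a separate yes/no question."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)


logits = [2.0, 1.0, -3.0]  # hypothetical raw outputs for one poster
targets = [1, 1, 0]        # poster labelled Comedy and Drama
probs = genre_probabilities(logits)
loss = binary_cross_entropy(targets, probs)
```

Because every genre is scored separately, two genres can both be above 90% for the same poster, which a softmax output could not express.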

The implementation is based on Keras and TensorFlow.

With 14,265 training samples and 2,826 validation samples (movies from 1977 to 2017), 106x161 images, and 50 epochs, the results look like this ([!] marks a predicted genre that is not among the movie's genres in the original dataset):

        

The Matrix (1999)                ['Action: 91%', 'Drama[!]: 25%', 'Adventure[!]: 13%']
The Others (2001)                ['Drama[!]: 76%', 'Horror: 65%', 'Action[!]: 41%']
Alien: Resurrection (1997)       ['Horror: 67%', 'Action: 64%', 'Drama[!]: 43%']
The Martian (2015)               ['Drama: 95%', 'Adventure: 81%', 'Comedy[!]: 23%']


        

The Truman Show (1998)           ['Comedy: 98%', 'Drama: 76%', 'Romance[!]: 7%']
Pretty Woman (1990)              ['Romance: 99%', 'Comedy: 99%', 'Drama[!]: 22%']
Whatever Works (2009)            ['Drama[!]: 86%', 'Comedy: 78%', 'Romance: 76%']
Bienvenue chez les C.. (2008)    ['Comedy: 98%', 'Romance: 98%', 'Drama[!]: 7%']


        

Paprika (2006)                   ['Animation: 66%', 'Comedy[!]: 58%', 'Adventure: 31%']
Spirited Away (2001)             ['Animation: 83%', 'Drama[!]: 57%', 'Adventure: 42%']
Castle in the Sky (1986)         ['Animation: 88%', 'Adventure: 78%', 'Comedy[!]: 30%']
Zootopia (2016)                  ['Animation: 62%', 'Adventure: 59%', 'Comedy: 49%']
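
The listings above show the top three genres per poster, with [!] on genres the dataset does not list for that movie. A minimal sketch of how such a line could be produced follows; the function name and the probability values are hypothetical, and the project's own script may do this differently.

```python
def format_predictions(probabilities, true_genres, top=3):
    """Return the `top` most likely genres as 'Genre: NN%' strings.

    A genre the dataset does not list for the movie gets a '[!]' suffix.
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    formatted = []
    for genre, prob in ranked[:top]:
        flag = "" if genre in true_genres else "[!]"
        formatted.append(f"{genre}{flag}: {round(prob * 100)}%")
    return formatted


# Hypothetical model output for one poster (values chosen for illustration).
probs = {"Action": 0.91, "Drama": 0.25, "Adventure": 0.13, "Comedy": 0.02}
line = format_predictions(probs, true_genres={"Action", "Sci-Fi"})
print(line)
```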


Overall accuracy is 45% (I'm not sure accuracy is the most suitable metric for this problem).
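
How the 45% figure was computed is not stated here. For multi-label output, a single "accuracy" can mean several things, so the sketch below contrasts three common choices on made-up label vectors. The function names are illustrative and none of this is taken from the project.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of posters whose full genre vector is predicted exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def hamming_accuracy(y_true, y_pred):
    """Fraction of individual (poster, genre) decisions that are correct."""
    correct = sum(ti == pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    return correct / (len(y_true) * len(y_true[0]))


def micro_f1(y_true, y_pred):
    """F1 over all (poster, genre) pairs pooled together."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        for ti, pi in zip(t, p):
            tp += ti == 1 and pi == 1
            fp += ti == 0 and pi == 1
            fn += ti == 1 and pi == 0
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Three posters, three genres each (1 = has genre); values are made up.
y_true = [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]
scores = (
    exact_match_accuracy(y_true, y_pred),
    hamming_accuracy(y_true, y_pred),
    micro_f1(y_true, y_pred),
)
```

Exact match is the strictest of the three, since one wrong genre makes the whole poster count as a miss; per-label scores such as F1 give partial credit.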

Dataset

The dataset was found on Kaggle and contains about 27,000 posters.

It is split as follows:

  • Training: 5/7
  • Validation: 1/7
  • Test: 1/7
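
The 5/7, 1/7, 1/7 proportions can be sketched as a positional split. This is an assumption about the mechanics: whether the project shuffles first or splits in file order is not stated, and `split_dataset` is a hypothetical helper.

```python
def split_dataset(items, parts=7, train_parts=5, validation_parts=1):
    """Split a list into train / validation / test slices by position."""
    n = len(items)
    train_end = n * train_parts // parts
    validation_end = n * (train_parts + validation_parts) // parts
    return items[:train_end], items[train_end:validation_end], items[validation_end:]


movies = list(range(14))  # stand-in for 14 movie records
train, validation, test = split_dataset(movies)
```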

Module movies_dataset.py provides functions to access the dataset easily (parse MovieGenre.csv, list movies, get movie genres, get poster, etc.).

Dataset Parameters

  • min_year and max_year: Movie release time range (e.g. from 1977 to 2017). Poster design depends heavily on release year, so a wider time range might add noise.
  • genres: Classes. In the current configuration, genres come in sets of 3 (Comedy, Drama, Action), 7 (the same plus Animation, Romance, Adventure, Horror), or 14 (the same plus Sci-Fi, Crime, Mystery, Thriller, War, Family, Western).
  • ratio: The original picture size is 182x268 (ratio 100). You can use smaller pictures (30, 40, 50, etc.) for quicker, but probably less accurate, model training.
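
To make the year and genre parameters concrete, here is a small sketch of filtering movies and turning their genres into a multi-hot target vector. The `Movie` record, both function names, and the rule "keep a movie if it has at least one configured genre" are assumptions for illustration; movies_dataset.py may work differently.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    """Hypothetical record; the real module may expose different fields."""
    title: str
    year: int
    genres: list


def select_movies(movies, min_year, max_year, genres):
    """Keep movies in the year range that have at least one configured genre."""
    return [
        m for m in movies
        if min_year <= m.year <= max_year and any(g in genres for g in m.genres)
    ]


def encode_genres(movie, genres):
    """Multi-hot vector: 1 where the movie has the genre, 0 elsewhere."""
    return [1 if g in movie.genres else 0 for g in genres]


GENRES = ["Comedy", "Drama", "Action"]
movies = [
    Movie("A", 1999, ["Action", "Sci-Fi"]),
    Movie("B", 1965, ["Drama"]),
    Movie("C", 2009, ["Comedy", "Drama"]),
]
selected = select_movies(movies, 1977, 2017, GENRES)
targets = [encode_genres(m, GENRES) for m in selected]
```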

Model Parameters

  • epochs: Number of epochs.
  • version: Version of the model. Different versions can have different parameters (e.g. kernel size, etc), so different configurations can be compared easily.
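
One way to picture the version parameter is as a key into a table of settings. The sketch below is purely hypothetical: the field names, values, and naming scheme are invented for illustration and are not the project's configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical settings bundle; field names and values are illustrative."""
    name: str
    kernel_size: int
    epochs: int


VERSIONS = {
    1: ModelVersion(name="v1", kernel_size=3, epochs=50),
    2: ModelVersion(name="v2", kernel_size=5, epochs=50),
}


def describe(version_id):
    """Build a label that could be used to tell saved models apart."""
    v = VERSIONS[version_id]
    return f"{v.name}_kernel{v.kernel_size}_epochs{v.epochs}"


labels = [describe(i) for i in sorted(VERSIONS)]
```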

How to

Linux prerequisites

  • imagemagick (to resize the original poster image files)
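
For reference, a percentage resize with ImageMagick can be driven from Python as sketched below. The file paths and helper names are hypothetical, and the sketch assumes the classic `convert` executable is on the PATH (ImageMagick 7 also ships it as `magick`); it is not necessarily how get_data.py invokes the tool.

```python
import subprocess
from pathlib import Path


def resize_command(source: Path, target: Path, ratio: int) -> list:
    """Build an ImageMagick call that scales an image to `ratio` percent."""
    return ["convert", str(source), "-resize", f"{ratio}%", str(target)]


def resize_poster(source: Path, target: Path, ratio: int) -> None:
    """Run the command; raises CalledProcessError if ImageMagick fails."""
    target.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(resize_command(source, target, ratio), check=True)


# Hypothetical paths, shown only to illustrate the resulting command.
cmd = resize_command(Path("posters/100/1234.jpg"), Path("posters/30/1234.jpg"), 30)
```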

Modules prerequisites

Get posters data

  • Use flag -download to download the posters from Amazon (based on the URLs provided in MovieGenre.csv)
  • Use flag -resize to create smaller posters (30%, 40%, etc)
  • Use parameter -min_year=1980 to filter out the oldest movies

python3 get_data.py -download -resize
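
The flags above use a single dash with a long name. As an illustration of how such flags can be parsed, here is a sketch with the standard argparse module; this is an assumption, since the actual script may read its arguments another way.

```python
import argparse


def build_parser():
    """Parser mirroring the flags listed above (single-dash long options)."""
    parser = argparse.ArgumentParser(description="Fetch and resize posters")
    parser.add_argument("-download", action="store_true", help="download posters")
    parser.add_argument("-resize", action="store_true", help="create smaller copies")
    parser.add_argument("-min_year", type=int, default=None, help="earliest release year")
    return parser


args = build_parser().parse_args(["-download", "-resize", "-min_year=1980"])
```

Leaving a flag out keeps its default, so running with only `-resize` would skip the download step in this sketch.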

Train the model

This script builds and trains models. Models are saved to 'saved_models'. One or multiple models (with different parameters) can be produced.

python3 __main__.py

Evaluate the model and test predictions

This script iterates through all the saved models in 'saved_models' and evaluates them on the test data.

python3 tests.py
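
The "iterate through all the saved models" step might look like the sketch below. The `.h5` suffix and the helper name are assumptions (Keras commonly saves models as HDF5 files), and loading and scoring are left as a comment because they depend on the project's own code.

```python
from pathlib import Path


def list_saved_models(directory="saved_models", suffix=".h5"):
    """Return model files in `directory`, sorted by name for a stable order."""
    folder = Path(directory)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.suffix == suffix)


for model_path in list_saved_models():
    # A real run would load the model here and score it on the test split.
    print(f"would evaluate {model_path.name}")
```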

Generate Movie Posters with DCGAN

Use Deep Convolutional Generative Adversarial Networks (DCGAN) to generate movie posters.

Watch training video

How to

1. Download the forked DCGAN-tensorflow.

git clone https://github.com/benckx/DCGAN-tensorflow.git

2. Prepare the dataset with the parameters you want (git clone this project and download the posters first if you haven't already):

python3 prepare_dcgan_dataset.py -min_year=1980 -exclude_genres=Animation,Comedy,Family -ratio=60

This will create a folder 'dcgan_movies_posters' containing all the posters selected by the parameter values.
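
The selection implied by -min_year and -exclude_genres can be sketched as a simple predicate. This assumes a poster is dropped as soon as any one of its genres is excluded, which is a guess at the script's behaviour; `keep_poster` is a hypothetical name.

```python
def keep_poster(year, genres, min_year, excluded_genres):
    """True if the movie is recent enough and has none of the excluded genres."""
    if year < min_year:
        return False
    return not any(g in excluded_genres for g in genres)


# The comma-separated value as passed on the command line above.
excluded = set("Animation,Comedy,Family".split(","))
kept = keep_poster(1999, ["Action", "Sci-Fi"], 1980, excluded)
```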

3. Move folder 'dcgan_movies_posters' to DCGAN-tensorflow/data/dcgan_movies_posters

4. In DCGAN-tensorflow, run the command with the parameters you need (the parameters I added or removed are documented here):

python3 main.py --dataset dcgan_movies_posters --grid_height=6 --grid_width=10 --sample_rate=2 --train

Run in the Cloud

AWS EC2:

  • AMI: ami-e07e779a. No package installation required.
  • Instance type: g2.2xlarge
  • Run source activate tensorflow_p36 to activate the correct Anaconda environment.

Going Further

A few things I'm currently working on or thinking about:

CNN

  • Predict movie release year / rating from the poster
  • Improve model versioning to compare different settings (kernel size, loss function, etc.)
  • Print neurons state for each genre

GAN

  • Run the dataset on this other GAN model
  • Migrate DCGAN-tensorflow to Keras
  • Find a way to query a GAN model with parameters, for example: generate a Sci-Fi movie poster made in the 80s
  • Explore how GAN can be applied to sound and video
