movie-posters-convnet

Overview

Unsupervised clustering of movie posters using features extracted from a convolutional neural network. The visualization uses Flask as the backend and D3.js for the frontend.

This project is divided into three main scripts:

  • get_posters.py
    • retrieves the posters from impawards.com.
    • creates a thumbnail of each poster for the visualization.
  • get_features_from_cnn.py
    • extracts features from the last convolutional layer of a pre-trained ConvNet (VGG-16 or ResNet50).
  • get_data_visu.py
    • reduces the feature dimensionality with UMAP for the visualization.
    • computes the cosine similarity between posters and extracts the 6 "closest" images for each poster.
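
The last step above, finding the 6 closest posters by cosine similarity, can be sketched with NumPy as follows. This is a minimal illustration and not the code in get_data_visu.py: the function name closest_posters and the toy feature vectors in the usage note are assumptions made for the example.

```python
import numpy as np


def closest_posters(features, k=6):
    """Return, for each row of `features`, the indices of its k most similar rows.

    `features` is an (n_posters, n_dims) array with one feature vector per
    poster. Hypothetical helper; k must be smaller than n_posters.
    """
    features = np.asarray(features, dtype=np.float64)
    # L2-normalise each row so that a dot product equals the cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    similarity = unit @ unit.T  # shape (n_posters, n_posters)
    # A poster is always its own best match, so exclude the diagonal.
    np.fill_diagonal(similarity, -np.inf)
    # Sort each row by decreasing similarity and keep the first k columns.
    order = np.argsort(-similarity, axis=1)
    return order[:, :k]
```

For instance, closest_posters([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], k=1) pairs poster 0 with poster 1 and poster 2 with poster 3. Note that this version builds the full similarity matrix in memory.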

To get a description of each script's parameters:

  • python src/get_XXX.py --help

Requirements

OS

  • Linux/Unix/OSX (wget is required)
  • Python 3.3+
  • ImageMagick
  • PostgreSQL

Python packages

  • BeautifulSoup 4.4
  • Tensorflow
  • Keras
  • Pandas
  • requests
  • sklearn
  • numpy
  • PIL
  • flask

Warnings

Extracting the features from the ConvNet is slow if you do not have a GPU. Computing the similarity between every pair of posters requires O(n^2) memory, which amounted to around 32 GB of RAM.
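
To make the memory warning concrete: a dense n x n matrix of 64-bit floats takes 8 n^2 bytes, so 65,536 posters would already need 32 GiB (the actual number of posters is not stated here). The sketch below is one possible way around this and not what the project does: it computes the nearest neighbours one block of rows at a time, so the full matrix is never stored. The names full_matrix_gib, closest_posters_blockwise and block_size are hypothetical.

```python
import numpy as np


def full_matrix_gib(n_posters, bytes_per_value=8):
    """Size in GiB of a dense n x n similarity matrix (8 bytes = float64)."""
    return n_posters ** 2 * bytes_per_value / 2 ** 30


def closest_posters_blockwise(features, k=6, block_size=1024):
    """Top-k cosine neighbours computed one block of rows at a time.

    Only a (block_size, n_posters) slice of the similarity matrix exists at
    any moment, so peak memory grows linearly with n_posters.
    k must be smaller than the number of posters.
    """
    features = np.asarray(features, dtype=np.float32)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    n = unit.shape[0]
    neighbours = np.empty((n, k), dtype=np.int64)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        sims = unit[start:stop] @ unit.T  # shape (stop - start, n)
        # Mask each poster's similarity with itself.
        rows = np.arange(stop - start)
        sims[rows, start + rows] = -np.inf
        # argpartition finds the k largest per row without a full sort...
        top = np.argpartition(-sims, k - 1, axis=1)[:, :k]
        # ...then only those k candidates are sorted by decreasing similarity.
        top_sims = np.take_along_axis(sims, top, axis=1)
        order = np.argsort(-top_sims, axis=1)
        neighbours[start:stop] = np.take_along_axis(top, order, axis=1)
    return neighbours
```

The trade-off is a Python-level loop over blocks; a larger block_size means fewer iterations but a bigger temporary slice.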

Installation

Clone the repository:

$ git clone https://github.com/adrz/movie-posters-convnet.git
$ cd movie-posters-convnet/
$ virtualenv -p python3 env
$ source env/bin/activate
$ pip install -r requirements-gpu.txt

Create the PostgreSQL database (assuming PostgreSQL is already installed):

$ psql -U postgres -c "create user movieposters;"
$ psql -U postgres -c "create database movieposters;"
$ psql -U postgres -c "alter user movieposters with encrypted password 'yourpassword';"
$ psql -U postgres -c "grant all privileges on database movieposters to movieposters;"

Usage

Computation

After cloning, run the following commands, which will:

  • download posters from 1920 to 2016
  • compute features
  • compute the data-visualization features

$ python src/get_posters.py -c config/development.conf
$ python src/get_features_from_cnn.py -c config/development.conf
$ python src/get_data_visu.py -c config/development.conf

Then grab a coffee...
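
If you prefer a single entry point, the three steps can be chained from Python. This is only a sketch: the helper names pipeline_commands and run_pipeline are hypothetical and not part of the repository, and the script names follow the list in the Overview.

```python
import subprocess
import sys

# Script names as listed in the Overview; the -c flag is the one used above.
PIPELINE = (
    "src/get_posters.py",
    "src/get_features_from_cnn.py",
    "src/get_data_visu.py",
)


def pipeline_commands(config="config/development.conf", python=sys.executable):
    """Build one argv list per step, in the order the steps must run."""
    return [[python, script, "-c", config] for script in PIPELINE]


def run_pipeline(config="config/development.conf"):
    """Run each step in turn; check=True stops at the first failing step."""
    for command in pipeline_commands(config):
        print("running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling run_pipeline() from the repository root runs the steps with the development configuration. Because each step depends on the output of the previous one, stopping at the first failure avoids computing features on an incomplete download.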

Visualization

$ source env/bin/activate
$ export configapi=./config/development.conf
$ python app.py

Then open index.html in your favorite browser:

$ chromium 127.0.0.1:5000/index.html

or

$ chromium 127.0.0.1:5000/index_complete.html

Results

Cherry-picked from the 200 closest pairs of posters (by cosine distance):

[Images of the poster pairs are not reproduced here.]

License

This project is licensed under the MIT License. See the LICENSE.md file for details.
