This project forked from ethanjperez/film

FiLM: Visual Reasoning with a General Conditioning Layer

License: Other

Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

This code implements a Feature-wise Linear Modulation (FiLM) approach to visual reasoning: answering multi-step questions about images. This codebase reproduces results from the AAAI 2018 paper "FiLM: Visual Reasoning with a General Conditioning Layer," which extends prior work "Learning Visual Reasoning Without Strong Priors" presented at ICML's MLSLP workshop.

Code Outline

This code is a fork from the code for "Inferring and Executing Programs for Visual Reasoning" available here.

Our FiLM Generator is located in vr/models/film_gen.py, and our FiLMed Network and FiLM layer implementation is located in vr/models/filmed_net.py.

We added a new model mode, "FiLM", which integrates into the forked code for the CLEVR baselines and the Program Generator + Execution Engine model. Throughout the code, our FiLM Generator takes the place of the "program generator": it generates the FiLM parameters for the FiLMed Network, which takes the place of the "execution engine." In some sense, FiLM parameters can loosely be thought of as a "soft program," but we use this naming in the code mainly to integrate better with the forked models.
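As a rough illustration of what "FiLM parameters" modulate, here is a minimal sketch of feature-wise linear modulation in plain Python. It assumes the per-channel form from the paper (a scale gamma and a shift beta applied to each feature map); the function name `film` and the toy values are hypothetical, and the actual implementation in vr/models/filmed_net.py may differ in its details.

```python
from typing import List

FeatureMap = List[List[float]]  # one 2D map (rows x cols) per channel


def film(features: List[FeatureMap], gamma: List[float], beta: List[float]) -> List[FeatureMap]:
    """Per-channel affine transform: out[c] = gamma[c] * features[c] + beta[c]."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need exactly one (gamma, beta) pair per channel")
    return [
        [[g * value + b for value in row] for row in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]


# Toy input: two channels, each a 2x2 feature map.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]

# In FiLM these values would be produced by the FiLM Generator from the
# question; here they are made-up numbers for illustration only.
gamma = [2.0, 0.0]
beta = [0.5, -1.0]

modulated = film(features, gamma, beta)
print(modulated)
# [[[2.5, 4.5], [6.5, 8.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
```

Note how a gamma of 0.0 switches the second channel off entirely, leaving only the shift: this is the sense in which the generator's output conditions what the network computes.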

Setup and Training

Because of this integration, setup instructions for the FiLM model are nearly the same as for "Inferring and Executing Programs for Visual Reasoning." We will post more detailed instructions on how to use our code in particular soon for more step-by-step guidance. For now, the guidelines below should give substantial direction to those interested.

First, follow the virtual environment setup instructions.

Second, follow the CLEVR data preprocessing instructions.

Lastly, model training is similar at a high level (though adapted for FiLM and our repo) to that of the Program Generator + Execution Engine model, except that our model uses a single training step rather than a 3-step training procedure.

The below script has the hyperparameters and settings to reproduce FiLM CLEVR results:

sh scripts/train/film.sh

For CLEVR-Humans, data preprocessing instructions are here. The below script has the hyperparameters and settings to reproduce FiLM CLEVR-Humans results:

sh scripts/train/film_humans.sh

Training a CLEVR-CoGenT model is very similar to training a normal CLEVR model. Training a model from pixels requires changing the preprocessing step; the repo includes scripts for preprocessing pixels. The scripts to reproduce our results are also located in the scripts/train/ folder.

We tried not to break existing models from the CLEVR codebase with our modifications, but we haven't tested their code after our changes. To run those models, we recommend using the CLEVR and "Inferring and Executing Programs for Visual Reasoning" code directly.

Training a solid FiLM CLEVR model should only take ~12 hours on a good GPU (see the training curves in the paper's appendix).

Running models

We added an interactive command line tool for use with the below command/script. It's actually super enjoyable to play around with trained models. It's great for gaining intuition around what various trained models have or have not learned and how they tackle reasoning questions.

python run_model.py --program_generator <FiLM Generator filepath> --execution_engine <FiLMed Network filepath>

By default, the command runs on this CLEVR image in our repo, but you may modify which image to use via command line flag to test on any CLEVR image.

CLEVR vocab is enforced by default, but for CLEVR-Humans models, for example, you may append the command line flag option '--enforce_clevr_vocab 0' to ask any string of characters you please.
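If you launch the tool from a script, the command above can be assembled programmatically. The sketch below only builds the argument list from the flags mentioned in this README; the helper name `build_run_command` and the checkpoint file names are hypothetical placeholders, not part of the repo.

```python
import shlex
from typing import List, Optional


def build_run_command(
    film_generator_path: str,
    filmed_network_path: str,
    enforce_clevr_vocab: Optional[bool] = None,
) -> List[str]:
    """Build the argv list for run_model.py from the flags described above."""
    cmd = [
        "python", "run_model.py",
        "--program_generator", film_generator_path,
        "--execution_engine", filmed_network_path,
    ]
    # Only pass the flag when the caller overrides the default behaviour.
    if enforce_clevr_vocab is not None:
        cmd += ["--enforce_clevr_vocab", "1" if enforce_clevr_vocab else "0"]
    return cmd


# Placeholder checkpoint names; substitute the paths of your trained models.
cmd = build_run_command("film_gen.pt", "filmed_net.pt", enforce_clevr_vocab=False)
print(shlex.join(cmd))
# python run_model.py --program_generator film_gen.pt --execution_engine filmed_net.pt --enforce_clevr_vocab 0
```

The resulting list can be handed to `subprocess.run(cmd)` from the repo root. Because run_model.py is interactive, it should be run attached to a terminal.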

In addition, one easy way to try out zero-shot with FiLM is to run a trained model with run_model.py with the debug command line flag enabled, so that you can manipulate the FiLM parameters modulating the FiLMed network during the forward computation. For example, '--debug_every -1' will stop the program after the model generates FiLM parameters but before the FiLMed network carries out its forward pass using FiLM layers.
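What you do with the FiLM parameters once the program has stopped is up to you. As one hypothetical manipulation (an illustration, not something taken from the repo), you could blend the parameters generated for two different questions before letting the forward pass continue. The sketch below shows the arithmetic on plain Python lists; the names `blend`, `gamma_q1`, and `gamma_q2` and the values are made up, and in the real code the parameters would be tensors.

```python
from typing import List


def blend(params_a: List[float], params_b: List[float], weight: float) -> List[float]:
    """Linear interpolation: weight=0 returns params_a, weight=1 returns params_b."""
    if len(params_a) != len(params_b):
        raise ValueError("parameter vectors must have the same length")
    return [(1.0 - weight) * a + weight * b for a, b in zip(params_a, params_b)]


# Made-up gamma vectors standing in for the FiLM parameters of two questions.
gamma_q1 = [1.0, 0.0, 2.0]
gamma_q2 = [3.0, 4.0, 0.0]

blended = blend(gamma_q1, gamma_q2, 0.5)
print(blended)
# [2.0, 2.0, 1.0]
```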

Thanks for stopping by, and we hope you enjoy playing around with FiLM!

Bibtex

@InProceedings{perez2018film,
  title={FiLM: Visual Reasoning with a General Conditioning Layer},
  author={Ethan Perez and Florian Strub and Harm de Vries and Vincent Dumoulin and Aaron C. Courville},
  booktitle={AAAI},
  year={2018}
}
