
License: MIT License


VRDP (NeurIPS 2021)

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum, and Chuang Gan


More details can be found at the Project Page.

If you find our work useful in your research, please consider citing our paper:

@inproceedings{ding2021dynamic,
  author = {Ding, Mingyu and Chen, Zhenfang and Du, Tao and Luo, Ping and Tenenbaum, Joshua B and Gan, Chuang},
  title = {Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language},
  booktitle = {Advances In Neural Information Processing Systems},
  year = {2021}
}

Prerequisites

  • Python 3
  • PyTorch 1.3 or higher
  • All required packages are covered by Miniconda
  • Both CPUs and GPUs are supported

Dataset preparation

  • Download the videos, video annotations, questions and answers, and object proposals from the official website

  • Transform videos into ".png" frames with ffmpeg.

  • Organize the data as shown below.

    clevrer
    ├── annotation_00000-01000
    │   ├── annotation_00000.json
    │   ├── annotation_00001.json
    │   └── ...
    ├── ...
    ├── image_00000-01000
    │   │   ├── 1.png
    │   │   ├── 2.png
    │   │   └── ...
    │   └── ...
    ├── ...
    ├── questions
    │   ├── train.json
    │   ├── validation.json
    │   └── test.json
    ├── proposals
    │   ├── proposal_00000.json
    │   ├── proposal_00001.json
    │   └── ...
    
  • We also provide data for physics learning and program execution on Google Drive. You can optionally download them and put them in the ./data/ folder.

  • Download the processed data executor_data.zip for the executor and unzip it to ./executor/data/.
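The frame-extraction step above can be scripted. The sketch below builds the ffmpeg command for a single video; the file names and output directory are illustrative assumptions, so adjust them to your local layout.

```python
# Hypothetical helper (paths are assumptions, adjust to your setup): build the
# ffmpeg command that splits one CLEVRER video into 1.png, 2.png, ...
import shlex

def frame_extraction_cmd(video_path, out_dir):
    # ffmpeg's image2 muxer numbers %d-patterned frames starting at 1 by default
    return ["ffmpeg", "-i", video_path, f"{out_dir}/%d.png"]

cmd = frame_extraction_cmd("video_00000.mp4", "frames")
print(shlex.join(cmd))  # -> ffmpeg -i video_00000.mp4 frames/%d.png
```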

Get Object Dictionaries (Concepts and Trajectories)

Download the object proposals from the region proposal network and follow the Step-by-step Training in DCL to get object concepts and trajectories.

The above process includes:

  • trajectory extraction
  • concept learning
  • trajectory refinement

Or you can download our extracted object dictionaries object_dicts.zip directly from Google Drive.

Learning

1. Differentiable Physics Learning

After we get the above object dictionaries, we learn physical parameters from object properties and trajectories.

cd dynamics/
python3 learn_dynamics.py 10000 15000
# argv[1] and argv[2] are the start and end video indices to process, respectively.

The output object physical parameters object_dicts_with_physics.zip can be downloaded from Google Drive.
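As a toy illustration of the idea (NOT the repository's implementation), physical parameters can be fit by gradient descent so that a simulated trajectory matches an observed one. Here a 1-D friction deceleration `mu` is recovered with a numerical gradient; all numbers are made up.

```python
# Toy illustration of differentiable physics learning (not the repository's
# code): recover a friction deceleration `mu` from an observed 1-D trajectory
# by gradient descent on the trajectory loss.
dt, steps, v0 = 0.1, 30, 5.0

def simulate(mu):
    """Positions of a point decelerating at rate `mu` from initial speed v0."""
    xs, x, v = [], 0.0, v0
    for _ in range(steps):
        v = max(v - mu * dt, 0.0)   # friction slows the object, never reverses it
        x += v * dt
        xs.append(x)
    return xs

def loss(m, observed):
    return sum((a - b) ** 2 for a, b in zip(simulate(m), observed))

observed = simulate(0.5)            # stand-in for a trajectory tracked from video
mu, lr, eps = 0.1, 1e-3, 1e-5
for _ in range(500):
    # central-difference gradient of the trajectory loss w.r.t. mu
    grad = (loss(mu + eps, observed) - loss(mu - eps, observed)) / (2 * eps)
    mu -= lr * grad
print(round(mu, 3))                 # converges to the true value 0.5
```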

2. Physics Simulation (counterfactual)

Physical simulation using learned physical parameters.

cd dynamics/
python3 physics_simulation.py 10000 15000
# argv[1] and argv[2] are the start and end video indices to process, respectively.

The output simulated trajectories/events object_simulated.zip can be downloaded from Google Drive.
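Schematically, this step is a forward rollout with the learned parameters, and a counterfactual re-runs the same rollout under a modified scene (e.g., an object removed or its mass changed). A toy 2-D Euler rollout with friction, purely illustrative:

```python
# Toy 2-D Euler rollout using a learned friction parameter (illustrative only,
# not the repository's simulator). Counterfactual simulation re-runs this with
# the scene modified.
def rollout(pos, vel, mu, dt=0.1, steps=50):
    traj = []
    for _ in range(steps):
        speed = (vel[0] ** 2 + vel[1] ** 2) ** 0.5
        if speed > 0.0:
            # friction reduces speed by mu*dt but never reverses the motion
            scale = max(speed - mu * dt, 0.0) / speed
            vel = (vel[0] * scale, vel[1] * scale)
        pos = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)
        traj.append(pos)
    return traj

traj = rollout(pos=(0.0, 0.0), vel=(3.0, 4.0), mu=0.5)
print(traj[-1])  # final position after 5 simulated seconds
```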

3. Physics Simulation (predictive)

Correction of long-range prediction according to video observations.

cd dynamics/
python3 refine_prediction.py 10000 15000
# argv[1] and argv[2] are the start and end video indices to process, respectively.

The output refined trajectories/events object_updated_results.zip can be downloaded from Google Drive.
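One simple correction scheme (an assumption for illustration, not necessarily what refine_prediction.py does) is to re-anchor the simulated rollout at the last observed position, so the long-range prediction continues from where the video evidence ends:

```python
# Assumed correction scheme for illustration (not necessarily the repository's
# method): shift the simulated rollout so it passes through the last observed
# position, then let the prediction continue from there.
def refine(simulated, observed):
    """simulated/observed: lists of 1-D positions; observed covers a prefix."""
    offset = observed[-1] - simulated[len(observed) - 1]
    # keep the observations where available, shift the remaining prediction
    return observed + [x + offset for x in simulated[len(observed):]]

sim = [0.0, 1.0, 2.0, 3.0, 4.0]   # long-range simulation
obs = [0.0, 1.0, 2.5]             # video observations for the first 3 frames
print(refine(sim, obs))           # -> [0.0, 1.0, 2.5, 3.5, 4.5]
```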

Evaluation

After we get the final trajectories/events, we perform the neuro-symbolic execution and evaluate the performance on the validation set.

cd executor/
python3 evaluation.py
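CLEVRER's multiple-choice questions are scored both per option and per question, where a question counts as correct only if every one of its options is answered correctly; this is why the evaluation prints two numbers for each question type. A minimal illustration with made-up data:

```python
# Illustration of the two reported metrics (made-up data, not the CLEVRER
# evaluation code): per-option vs per-question accuracy for multiple-choice
# questions.
def accuracies(questions):
    """Each question is a list of (predicted, gold) option labels."""
    options = [p == g for q in questions for p, g in q]
    per_option = sum(options) / len(options)
    per_question = sum(all(p == g for p, g in q) for q in questions) / len(questions)
    return per_option, per_question

qs = [[(True, True), (False, False)],   # both options right
      [(True, False), (True, True)]]    # one option wrong
print(accuracies(qs))  # -> (0.75, 0.5)
```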

The test json file for evaluation on evalAI can be generated by

cd executor/
python3 get_results.py

The Generalized CLEVRER Dataset (counterfactual_mass)

Examples

  • Predictive question (example figure)
  • Counterfactual question (example figure)

Acknowledgements

For questions regarding VRDP, feel free to post here or directly contact the author ([email protected]).


vrdp's Issues

Inquire about the results of explanatory queries

Hello. Thank you for sharing the code!

I am reproducing the experiments, and I get similar results on descriptive, predictive, and counterfactual queries. However, my result on explanatory queries is quite low. (In the paper, the reported numbers are 96.3% per option and 91.9% per question.)

Can you give me some advice?

I used object_updated_results.zip as input and followed the Evaluation steps.

Here are my results on the validation set.

============ results ============
overall accuracy per option: 93.764898 %
overall accuracy per question: 91.686308 %
descriptive accuracy per question: 93.376978 %
explanatory accuracy per option: 92.976512 %
explanatory accuracy per question: 88.972667 %
predictive accuracy per option: 95.881361 %
predictive accuracy per question: 91.903289 %
counterfactual accuracy per option: 94.686999 %
counterfactual accuracy per question: 84.110147 %
============ results ============

Thanks!

Hi, thanks very much for sharing this wonderful work. Here is a question about forming the CLEVRER dataset.

According to step 2 of the dataset preparation, all downloaded videos should be transformed into images. However, since there are 1000 videos in the original video_00000-01000 folder, it seems that all frames from these videos end up together in the image_00000-01000 folder. Should the images be put under separate folders such as video_00000? If not, how is the ordering of frames from different videos decided?
