This project is a fork of neuroailab/counterfactualworldmodels.


An approach to building pure vision foundation models by prompting masked predictors with "counterfactual" visual inputs.

License: MIT License


Counterfactual World Models

This is the official implementation of Unifying (Machine) Vision via Counterfactual World Modeling.

See Setup below to install. Please reference our work as Bear, D.M. et al. (2023).


Demos of using CWMs to generate "counterfactual" simulations and analyze scenes

Counterfactual World Models (CWMs) can be prompted with "counterfactual" visual inputs: "What if?" questions about slightly perturbed versions of real scenes.

Beyond generating new, simulated scenes, properly prompting CWMs can reveal the underlying physical structure of a scene. For instance, asking which points would also move along with a selected point is a way of segmenting a scene into independently movable "Spelke" objects.

The provided notebook demos are a subset of the use cases described in our paper.

Making factual and counterfactual predictions with a pretrained CWM

Run the Jupyter notebook CounterfactualWorldModels/demo/FactualAndCounterfactual.ipynb.

Factual predictions

Given all of one frame and a few patches of a subsequent frame from a real video, a CWM predicts the rest of the second frame. Prompting the CWM with so few tokens is possible because it is trained with only a very small number of patches revealed in the second frame.

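The temporally-factored masking policy can be sketched with a toy token mask (NumPy only; the patch grid and the number of revealed patches are illustrative assumptions, not the repo's actual configuration):

```python
import numpy as np

# Illustrative settings, not the repo's actual configuration.
patches_per_frame = 14 * 14   # e.g. a 224x224 frame with 16x16 patches
n_revealed_frame2 = 4         # only a handful of frame-2 patches are visible

rng = np.random.default_rng(0)

# Frame 1 is fully visible; frame 2 is almost entirely masked.
mask_frame1 = np.zeros(patches_per_frame, dtype=bool)  # False = visible
mask_frame2 = np.ones(patches_per_frame, dtype=bool)   # True  = masked
revealed = rng.choice(patches_per_frame, n_revealed_frame2, replace=False)
mask_frame2[revealed] = False

# The predictor sees all of frame 1 plus a few frame-2 patches,
# and must reconstruct the rest of frame 2.
token_mask = np.concatenate([mask_frame1, mask_frame2])
print(token_mask.sum(), "of", token_mask.size, "tokens are masked")
```

Because frame 2 is almost fully masked at training time, a handful of frame-2 tokens is enough to steer the prediction at inference time.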

Counterfactual simulations

A small number of patches (colored) in a single image can be selected to counterfactually move in a chosen direction, while other patches (black) are static. This produces object movement in the intended directions.

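Constructing such a motion counterfactual can be sketched as follows: copy a chosen patch of the input image to a shifted location in an otherwise-masked "second frame," and pin another patch in place as static. The image size, patch size, and patch coordinates below are illustrative, and the `put` helper is hypothetical, not part of the repo's API:

```python
import numpy as np

# Toy image and patch grid (illustrative sizes, not the repo's defaults).
patch = 16
img = np.random.default_rng(0).random((128, 128, 3))
H, W = img.shape[0] // patch, img.shape[1] // patch

# Counterfactual prompt: move patch (2, 3) two patch-widths to the right,
# keep patch (5, 5) static, mask everything else in the "second frame".
frame2 = np.zeros_like(img)
visible = np.zeros((H, W), dtype=bool)

def put(dst_r, dst_c, src_r, src_c):
    """Reveal one patch of frame2, filled with content copied from img."""
    frame2[dst_r*patch:(dst_r+1)*patch, dst_c*patch:(dst_c+1)*patch] = \
        img[src_r*patch:(src_r+1)*patch, src_c*patch:(src_c+1)*patch]
    visible[dst_r, dst_c] = True

put(2, 5, 2, 3)   # moved patch: content of (2, 3) reappears at (2, 5)
put(5, 5, 5, 5)   # static patch: stays where it was

# A CWM would be prompted with frame1 = img plus this sparse frame2,
# then asked to complete the rest of the counterfactual scene.
print(visible.sum(), "visible patches in the counterfactual prompt")
```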

Segmenting Spelke objects by applying motion-counterfactuals

Run the Jupyter notebook CounterfactualWorldModels/demo/SpelkeObjectSegmentation.ipynb.

Users can upload their own images on which to run counterfactuals.

Example Spelke objects from interactive motion counterfactuals

In each row, one patch is selected to move "upward" (green square) and in the last two rows, one patch is selected to remain static (red square). The optical flow resulting from the simulation represents the CWM's implicit segmentation of the moved object. In the last row, the implied segment includes both the robot arm and the object it is grasping, as the CWM predicts they will move as a unit.

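The flow-based readout described above amounts to thresholding the simulated motion. A minimal sketch with a fabricated flow field (not produced by an actual CWM; sizes and thresholds are illustrative):

```python
import numpy as np

# Fabricated "predicted" optical flow for a 64x64 scene: a 20x20 object
# moves upward in the counterfactual simulation; the background is static.
flow = np.zeros((64, 64, 2))
flow[10:30, 10:30, 1] = -5.0   # vertical flow inside the moved object

# Implicit segmentation: pixels with appreciable counterfactual motion
# belong to the same Spelke object as the moved patch.
speed = np.linalg.norm(flow, axis=-1)
segment = speed > 1.0

print("segment covers", segment.sum(), "pixels")
```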

Estimating the movability of elements of a scene

Run the Jupyter notebook CounterfactualWorldModels/demo/MovabilityAndMotionCovariance.ipynb.

Example estimate of movability

A number of motion counterfactuals were randomly sampled (i.e., patches were placed throughout the input image and moved). This produces a "movability" heatmap showing which parts of a scene tend to move and which tend to remain static. Spelke objects are inferred to be the most movable, while the background rarely moves.

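The aggregation step can be sketched by averaging flow magnitude over many sampled counterfactuals. The flow maps below are fabricated stand-ins for CWM outputs, with the object region, sizes, and response rate chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated stand-in for many sampled motion counterfactuals: each sample
# is a flow-magnitude map. A movable object region (rows/cols 20:40)
# responds to most perturbations; the background responds to none.
n_samples, H, W = 50, 64, 64
flows = np.zeros((n_samples, H, W))
for i in range(n_samples):
    if rng.random() < 0.8:                 # most counterfactuals move the object
        flows[i, 20:40, 20:40] = rng.random() * 5.0

# Movability heatmap: average counterfactual motion per pixel.
movability = flows.mean(axis=0)
print("object:", movability[30, 30].round(2), "background:", movability[0, 0])
```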

Example estimate of counterfactual motion covariance at selected (cyan) points

By inspecting the pixel-pixel covariance across many motion counterfactuals, we can estimate which parts of a scene move together on average. Shown are maps of what tends to move along with a selected point (cyan). Objects adjacent to one another tend to move together, since some motion counterfactuals include collisions between them; however, motion counterfactuals in the appropriate direction can isolate single Spelke objects (see above).

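The covariance computation can be sketched for three pixels across many counterfactuals. The per-counterfactual motion values here are fabricated (two pixels share an object's motion, one is background), so only the shape of the computation is meaningful:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated flow magnitudes over many motion counterfactuals: two pixels
# on the same Spelke object move together; a background pixel never moves.
n_samples = 200
obj = rng.random(n_samples) * 5.0            # shared object motion
pix_a = obj + rng.normal(0, 0.1, n_samples)  # selected (cyan) point
pix_b = obj + rng.normal(0, 0.1, n_samples)  # another point on the object
pix_bg = rng.normal(0, 0.1, n_samples)       # background point

# Pixel-pixel covariance of counterfactual motion with the selected point:
# high for points on the same object, near zero for the background.
cov_same_object = np.cov(pix_a, pix_b)[0, 1]
cov_background = np.cov(pix_a, pix_bg)[0, 1]
print(f"same object: {cov_same_object:.2f}, background: {cov_background:.2f}")
```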

Setup

We recommend installing required packages in a virtual environment, e.g. with venv or conda.

  1. Clone the repo: git clone https://github.com/neuroailab/CounterfactualWorldModels.git
  2. Install the requirements and the cwm package: cd CounterfactualWorldModels && pip install -e .

Note: If you want to run models on a CUDA backend with Flash Attention (recommended), it needs to be installed separately via these instructions.

Pretrained Models

Weights are currently available for three VMAEs trained with the temporally-factored masking policy.

See the demo Jupyter notebooks for URLs to download these weights and load them into VMAEs.

These notebooks also download weights for other models required for some computations.

Coming Soon!

  • Fine control over counterfactuals (multiple patches moving in different directions)
  • Iterative algorithms for segmenting Spelke objects
  • Using counterfactuals to estimate other scene properties
  • Model training code

Citation

If you found this work interesting or useful in your own research, please cite the following:

@misc{bear2023unifying,
      title={Unifying (Machine) Vision via Counterfactual World Modeling}, 
      author={Daniel M. Bear and Kevin Feigelis and Honglin Chen and Wanhee Lee and Rahul Venkatesh and Klemen Kotar and Alex Durango and Daniel L. K. Yamins},
      year={2023},
      eprint={2306.01828},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
