alexxchen / eai-vc

This project forked from facebookresearch/eai-vc

The repository for the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI).

Home Page: https://eai-vc.github.io/

License: Other

Shell 0.26% Python 99.33% Jupyter Notebook 0.41%

eai-vc's Introduction

Visual Cortex and CortexBench

Website | Blog post | Paper

We're releasing CortexBench and our first Visual Cortex model: VC-1. CortexBench is a collection of 17 different EAI tasks spanning locomotion, navigation, and dexterous and mobile manipulation. We performed the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) for Embodied AI (EAI) and found that none of the existing PVRs performs well across all tasks. Next, we trained VC-1 on a combination of over 4,000 hours of egocentric videos from 7 different sources and ImageNet, totaling over 5.6 million images. We show that when VC-1 is adapted (through task-specific losses or a small amount of in-domain data), it is competitive with or outperforms the state of the art on all benchmark tasks.

Open-Sourced Models

We're open-sourcing two visual cortex models (model cards):

  • VC-1 (ViT-L): our best model, built on a ViT-L backbone and also referred to simply as VC-1 | Download
  • VC-1-base (ViT-B): pre-trained on the same data as VC-1 but with a smaller backbone (ViT-B) | Download

Installation

To install our visual cortex models and CortexBench, please follow the instructions in INSTALLATION.md.

Directory structure

  • vc_models: contains config files for visual cortex models, the model loading code, and some project utilities.
    • See README for more details.
  • cortexbench: embodied AI downstream tasks to evaluate pre-trained representations.
  • third_party: Third party submodules which aren't expected to change often.
  • data: Gitignored directory that must be created by the user. Some downstream tasks use it to find (symlinks to) datasets, models, etc.
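
Since the data directory is not part of the repository, it has to be set up by hand. The following is a minimal sketch of one way to do that with the Python standard library. It assumes a dataset stored elsewhere on disk; the helper link_into_data and the name my_dataset are hypothetical and are not part of this repository. The exact layout each benchmark expects is described in its own README.

```python
import tempfile
from pathlib import Path


def link_into_data(repo_root, source, name):
    """Create <repo_root>/data if needed and symlink `source` into it as `name`.

    Hypothetical helper for illustration only; not part of eai-vc.
    """
    data_dir = Path(repo_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / name
    # Only create the link if nothing (file, directory, or dangling link) is there yet.
    if not link.exists() and not link.is_symlink():
        link.symlink_to(Path(source).resolve(), target_is_directory=True)
    return link


# Demonstration inside a temporary directory so that nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    repo_root = Path(tmp) / "eai-vc"               # stand-in for the repository checkout
    dataset = Path(tmp) / "storage" / "my_dataset"  # stand-in for a downloaded dataset
    repo_root.mkdir()
    dataset.mkdir(parents=True)

    link = link_into_data(repo_root, dataset, "my_dataset")
    link_ok = link.is_symlink() and link.resolve() == dataset.resolve()
    print("symlink created:", link_ok)
```

Symlinking keeps large datasets in a single location while still making them visible under data.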

Load VC-1

To use the VC-1 model, you can install the vc_models module with pip. Then, you can load the model with code such as the following or follow our tutorial:

import vc_models
from vc_models.models.vit import model_utils

# load_model returns the encoder, its embedding size, the matching
# preprocessing transforms, and model metadata.
model, embd_size, model_transforms, model_info = model_utils.load_model(model_utils.VC1_LARGE_NAME)
# To use the smaller VC-1-base model, pass model_utils.VC1_BASE_NAME instead.

# The loaded img should be of size Bx3x250x250
img = your_function_here ...

# Output will be of size Bx3x224x224
transformed_img = model_transforms(img)
# Embedding will be of size B x embd_size (1024 for ViT-L, 768 for ViT-B)
embedding = model(transformed_img)

Reproducing Results with VC-1 Model

To reproduce the results with the VC-1 model, please follow the README instructions for each of the benchmarks in cortexbench.

Load Your Own Encoder Model and Run Across All Benchmarks

To load your own encoder model and run it across all benchmarks, follow these steps:

  1. Create a configuration file for your model, <your_model>.yaml, in the model configs folder of the vc_models module.
  2. In the config, use the _target_ field to specify the custom methods that load your encoder model.
  3. Then, you can load the model as follows:
    import vc_models
    from vc_models.models.vit import model_utils
    
    model, embd_size, model_transforms, model_info = model_utils.load_model(<your_model>)
  4. To run the CortexBench evaluation for your model, specify your model config as a parameter (embedding=<your_model>) for each of the benchmarks in cortexbench.
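
The _target_ field in step 2 follows the convention used by Hydra-style configs: it names a callable by its dotted import path, and the remaining keys are passed to it as keyword arguments. The sketch below imitates that idea using only the standard library, to show what such a field means. It is an illustration under that assumption, not the loading code used by vc_models; resolve_target and the example config are hypothetical, and types.SimpleNamespace merely stands in for a real encoder-loading function.

```python
import importlib


def resolve_target(config):
    """Import the callable named by config["_target_"] and call it with the other keys.

    Hypothetical stand-in for Hydra-style instantiation; illustration only.
    """
    kwargs = dict(config)  # copy so the caller's config is left untouched
    module_path, _, attr_name = kwargs.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), attr_name)
    return target(**kwargs)


# Hypothetical config: in a real <your_model>.yaml, _target_ would point at
# your own encoder-loading function rather than at SimpleNamespace.
example_config = {
    "_target_": "types.SimpleNamespace",
    "name": "my_encoder",
    "embd_size": 768,
}

loaded = resolve_target(example_config)
print(loaded.name, loaded.embd_size)
```

Because the target is resolved by import path, a custom loading function has to live in a module that is importable from the environment in which the benchmarks run.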

Contributing

If you would like to contribute to Visual Cortex and CortexBench, please see CONTRIBUTING.md.

Citing Visual Cortex

If you use Visual Cortex in your research, please cite the following paper:

@inproceedings{vc2023,
      title={Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?}, 
      author={Arjun Majumdar and Karmesh Yadav and Sergio Arnaud and Yecheng Jason Ma and Claire Chen and Sneha Silwal and Aryan Jain and Vincent-Pierre Berges and Pieter Abbeel and Jitendra Malik and Dhruv Batra and Yixin Lin and Oleksandr Maksymets and Aravind Rajeswaran and Franziska Meier},
      year={2023},
      eprint={2303.18240},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

The majority of Visual Cortex and CortexBench code is licensed under CC-BY-NC (see the LICENSE file for details); however, portions of the project are available under separate license terms: trifinger_simulation is licensed under the BSD 3-Clause license; mj_envs and mjrl are licensed under the Apache 2.0 license; Habitat Lab, dmc2gym, and mujoco-py are licensed under the MIT license.

The trained policy models and the task datasets are considered data derived from the corresponding scene datasets.
