alesuglia / emma-policy

This project forked from emma-heriot-watt/policy

Model code for Embodied MultiModal Agent (EMMA)

Home Page: https://arxiv.org/abs/2311.04067

License: MIT License

JavaScript 0.23% Python 98.60% Jupyter Notebook 1.00% Dockerfile 0.16%

EMMA: Policy

Python 3.9 PyTorch Lightning Poetry Config: Hydra
pre-commit style: black wemake-python-styleguide

Continuous Integration Tests Build and push images


Quick start

Assuming you have pyenv and Poetry, clone the repository and run:

# Use Python 3.9.13 in the project
pyenv local 3.9.13

# Tell Poetry to use pyenv
poetry env use $(pyenv which python)

# Install dependencies
poetry install

# Activate the virtual environment
poetry shell

# Install pre-commit hooks
pre-commit install

Check out the CONTRIBUTING.md for more detailed information on getting started.

Installing optional dependencies

We've separated specific groups of dependencies so that you only need to install what you need.

  • To install the dependencies for the Gradio demo, run poetry install --with demo

Project structure

This project is organised very similarly to the Lightning-Hydra-Template to facilitate reproducible research code.

  • scripts: sh scripts to run experiments
  • configs: configuration files using the Hydra framework
  • docker: Dockerfiles to ease deployment
  • notebooks: Jupyter notebooks for analysis and exploration
  • storage: data for training/inference (symlinks can be used to point to other parts of the filesystem)
  • tests: pytest scripts to verify the code
  • src: where the main code lives
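The storage folder can point at data kept elsewhere on the filesystem. The following is a minimal sketch of one way to set that up with the standard library; the helper name link_into_storage and the example paths are hypothetical and not part of the repository.

```python
from pathlib import Path


def link_into_storage(
    external_dir: Path, name: str, storage_root: Path = Path("storage")
) -> Path:
    """Create storage_root/name as a symlink to external_dir and return the link.

    Hypothetical helper for illustration; the repository may expect a
    different layout under storage.
    """
    external_dir = external_dir.resolve()
    if not external_dir.is_dir():
        raise FileNotFoundError(f"{external_dir} is not an existing directory")

    storage_root.mkdir(parents=True, exist_ok=True)
    link = storage_root / name

    if link.is_symlink():
        # Replace a stale link rather than failing.
        link.unlink()
    elif link.exists():
        # Never overwrite real data that already lives in storage.
        raise FileExistsError(f"{link} already exists and is not a symlink")

    link.symlink_to(external_dir, target_is_directory=True)
    return link
```

For example, link_into_storage(Path("/data/emma/db"), "db") would make storage/db point at /data/emma/db, where /data/emma/db is only a placeholder path.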

Downloading data

Checkpoints

All checkpoints are available on Hugging Face.

These checkpoints include:

| Model name | Description |
| --- | --- |
| emma_base_pretrain.ckpt | The EMMA base pretrained checkpoint |
| unified_emma_base_finetune_arena.ckpt | The EMMA-unified variant fine-tuned on the DTC task |
| modular_action_emma_base_finetune_arena.ckpt | The EMMA-modular variant fine-tuned on the DTC task that performs action execution and visual grounding |
| vinvl_finetune_arena.ckpt | The fine-tuned VinVL checkpoint |

DBs

The DBs are required for pretraining and fine-tuning and are available on Hugging Face.

We provide DBs for:

  1. Pretraining on image-based tasks (one DB per task)
  2. Fine-tuning on image-based tasks (one DB per task)
  3. Fine-tuning on the DTC tasks (one DB for action execution / visual grounding and one DB for the contextual routing task)

Make sure that these are placed under the storage/db folder, or alternatively set the path to the DBs within each experiment config.
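Before launching an experiment, it can help to confirm that the DBs are where the configs expect them. This is a minimal sketch assuming the default storage/db location mentioned above; the function name list_dbs is hypothetical, and it only lists what is present without checking any particular file names.

```python
from pathlib import Path


def list_dbs(db_dir: Path = Path("storage/db")) -> list[str]:
    """Return the sorted names of the entries found directly under db_dir.

    Hypothetical helper for illustration; it makes no assumption about
    how the downloaded DBs are named.
    """
    if not db_dir.is_dir():
        raise FileNotFoundError(
            f"{db_dir} does not exist; download the DBs or update the experiment config"
        )
    return sorted(entry.name for entry in db_dir.iterdir())
```

Running list_dbs() from the repository root raises FileNotFoundError when storage/db is missing, which is easier to diagnose than a failure partway through training.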

Features

The image features for all image-based tasks and the DTC benchmark are available on Hugging Face.

The image features were extracted using the pretrained VinVL checkpoint. For the DTC benchmark, we fine-tuned the checkpoint on the Alexa Arena data.

Pretraining

First, make sure that you have downloaded the pretraining db and the corresponding features.

python run.py experiment=pretrain.yaml

Downstream

COCO

python run.py experiment=coco_downstream.yaml

VQAv2

python run.py experiment=vqa_v2_downstream.yaml

RefCOCOg

python run.py experiment=refcoco_downstream.yaml

NLVR^2

python run.py experiment=nlvr2_downstream.yaml
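
The downstream commands above all follow the same pattern, so they can be generated from a single mapping. This is a small sketch that only builds the argument lists; the experiment file names come from the commands above, while DOWNSTREAM_EXPERIMENTS and build_command are hypothetical names introduced for illustration.

```python
# Task name -> experiment config, as listed in the commands above.
DOWNSTREAM_EXPERIMENTS = {
    "COCO": "coco_downstream.yaml",
    "VQAv2": "vqa_v2_downstream.yaml",
    "RefCOCOg": "refcoco_downstream.yaml",
    "NLVR^2": "nlvr2_downstream.yaml",
}


def build_command(task: str) -> list[str]:
    """Return the argument list for fine-tuning on the given downstream task."""
    if task not in DOWNSTREAM_EXPERIMENTS:
        known = ", ".join(DOWNSTREAM_EXPERIMENTS)
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}")
    return ["python", "run.py", f"experiment={DOWNSTREAM_EXPERIMENTS[task]}"]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root, inside the Poetry environment.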

DTC - Unified model

The pretrained model does not include the special tokens for the downstream contextual routing (CR) and action prediction tasks, so when initializing from it you will need to manually edit the vocabulary size in the model config. When initializing from the pretrained emma-base, set vocab_size to 10252.

python run.py experiment=simbot_combined.yaml
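
The vocabulary-size change can be pictured as a single key update. This sketch uses a plain dictionary as a stand-in for the model config, because where that config lives and what other keys it holds are not specified here; only the vocab_size key and the value 10252 come from the text above, and with_vocab_size is a hypothetical helper.

```python
import copy

# Value stated above for initialization from the pretrained emma-base.
PRETRAINED_EMMA_BASE_VOCAB_SIZE = 10252


def with_vocab_size(
    model_config: dict, vocab_size: int = PRETRAINED_EMMA_BASE_VOCAB_SIZE
) -> dict:
    """Return a copy of model_config with vocab_size overridden.

    The input dictionary is a stand-in for the real model config and is
    left unmodified.
    """
    updated = copy.deepcopy(model_config)
    updated["vocab_size"] = vocab_size
    return updated
```

Returning a copy keeps the original config untouched, so the same base config can be reused for runs that do not start from the pretrained checkpoint.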
