EMMA: Policy

Quick start

Assuming you have pyenv and Poetry, clone the repository and run:

# Use Python 3.9.13 in the project
pyenv local 3.9.13

# Tell Poetry to use pyenv
poetry env use $(pyenv which python)

# Install dependencies
poetry install

# Activate the virtual environment
poetry shell

# Install pre-commit hooks
pre-commit install

See CONTRIBUTING.md for more detailed information on getting started.

Installing optional dependencies

Optional dependencies are split into groups so that you install only what you need.

To run the Gradio demo, install the demo group with poetry install --with demo

Project structure

The project is organised very similarly to the Lightning-Hydra-Template to facilitate reproducible research code.

scripts: shell scripts to run experiments
configs: configuration files using the Hydra framework
docker: Dockerfiles to ease deployment
notebooks: Jupyter notebooks for analysis and exploration
storage: data for training/inference (symlinks can be used to point to other parts of the filesystem)
tests: pytest scripts to verify the code
src: where the main code lives
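The note about symlinks in the storage folder can be sketched as follows. This is a minimal illustration, not part of the repository: the helper name link_storage and the example paths are hypothetical, and it only assumes that data kept elsewhere on the filesystem should appear under storage.

```python
from pathlib import Path


def link_storage(project_root: Path, external_data: Path, name: str = "db") -> Path:
    """Expose an external data folder as <project_root>/storage/<name> via a symlink.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    storage = project_root / "storage"
    storage.mkdir(parents=True, exist_ok=True)

    link = storage / name
    if link.is_symlink():
        # Replace a stale link so the helper can be re-run safely.
        link.unlink()
    elif link.exists():
        # Refuse to overwrite a real folder or file.
        raise FileExistsError(f"{link} exists and is not a symlink")

    link.symlink_to(external_data.resolve(), target_is_directory=True)
    return link
```

For example, link_storage(Path("."), Path("/data/emma/db")) would make a folder on another disk visible as storage/db; the /data/emma/db path is only a placeholder.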

Downloading data

Checkpoints

All checkpoints are available on Hugging Face.

These checkpoints include:

Model name	Description
emma_base_pretrain.ckpt	The EMMA base pretrained checkpoint
unified_emma_base_finetune_arena.ckpt	The EMMA-unified variant fine-tuned on the DTC task
modular_action_emma_base_finetune_arena.ckpt	The EMMA-modular variant fine-tuned on the DTC task that performs action execution and visual grounding
vinvl_finetune_arena.ckpt	The fine-tuned VinVL checkpoint

DBs

The DBs are required for pretraining and fine-tuning and are available on Hugging Face.

We provide DBs for:

Pretraining on image-based tasks (one DB per task)
Fine-tuning on image-based tasks (one DB per task)
Fine-tuning on the DTC tasks (one DB for action execution / visual grounding and one DB for the contextual routing task)

Make sure that these are placed under the storage/db folder, or alternatively set the path to the DBs within each experiment config.
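Before launching an experiment, it can help to confirm that the DB folder is where the configs expect it. The following is a minimal sketch under stated assumptions: find_db_dir and list_dbs are hypothetical helpers, storage/db is the default location named above, and the override argument stands in for a custom path set in an experiment config.

```python
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


def find_db_dir(project_root: PathLike, override: Optional[PathLike] = None) -> Path:
    """Return the DB folder: the override if given, else <project_root>/storage/db."""
    if override is not None:
        db_dir = Path(override)
    else:
        db_dir = Path(project_root) / "storage" / "db"
    if not db_dir.is_dir():
        raise FileNotFoundError(f"DB folder not found: {db_dir}")
    return db_dir


def list_dbs(db_dir: PathLike) -> List[str]:
    """List the entries found in the DB folder, sorted by name."""
    return sorted(entry.name for entry in Path(db_dir).iterdir())
```

Calling list_dbs(find_db_dir(".")) from the repository root fails early with a clear error if the DBs have not been downloaded yet, instead of failing partway through an experiment.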

Features

The image features for all image-based tasks and the DTC benchmark are available on Hugging Face.

The image features were extracted using the pretrained VinVL checkpoint. For the DTC benchmark, we fine-tuned the checkpoint on the Alexa Arena data.

Pretraining

First, make sure that you have downloaded the pretraining DB and the corresponding features. Then run:

python run.py experiment=pretrain.yaml

Downstream

COCO

python run.py experiment=coco_downstream.yaml

VQAv2

python run.py experiment=vqa_v2_downstream.yaml

RefCOCOg

python run.py experiment=refcoco_downstream.yaml

NLVR^2

python run.py experiment=nlvr2_downstream.yaml
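The four image-based downstream commands above share the same shape, so they can be launched in sequence from a small script. This is a sketch rather than a tool shipped with the repository: the names DOWNSTREAM_EXPERIMENTS, build_command, and run_all are hypothetical, and it assumes it is run from the repository root with the Poetry environment active.

```python
import shlex
import subprocess
from typing import List

# Experiment configs exactly as named in the commands above.
DOWNSTREAM_EXPERIMENTS = {
    "COCO": "coco_downstream.yaml",
    "VQAv2": "vqa_v2_downstream.yaml",
    "RefCOCOg": "refcoco_downstream.yaml",
    "NLVR^2": "nlvr2_downstream.yaml",
}


def build_command(experiment: str) -> List[str]:
    """Build the run.py invocation for one experiment config."""
    return ["python", "run.py", f"experiment={experiment}"]


def run_all(dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute each downstream experiment in turn."""
    commands = [build_command(cfg) for cfg in DOWNSTREAM_EXPERIMENTS.values()]
    for command in commands:
        if dry_run:
            print(shlex.join(command))
        else:
            # check=True stops the loop as soon as one experiment fails.
            subprocess.run(command, check=True)
    return commands
```

With the default dry_run=True the script only prints the commands, which is a cheap way to confirm the experiment names before committing GPU time.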

DTC - Unified model

The pretrained model does not include the special tokens for the downstream CR (contextual routing) and action prediction tasks, so when initializing from it you need to manually edit the vocabulary size in the model config. When initializing from the pretrained emma-base, set vocab_size to 10252.

python run.py experiment=simbot_combined.yaml
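The manual vocabulary edit described above can also be scripted. The sketch below assumes that the model config is a JSON file with a top-level vocab_size key; the source does not state the file format or its location, so both are assumptions and set_vocab_size is a hypothetical helper. Only the value 10252 comes from this README.

```python
import json
from pathlib import Path
from typing import Optional, Union

# Value stated in this README for initialization from the pretrained emma-base.
EMMA_BASE_DTC_VOCAB_SIZE = 10252


def set_vocab_size(
    config_path: Union[str, Path],
    vocab_size: int = EMMA_BASE_DTC_VOCAB_SIZE,
) -> Optional[int]:
    """Set vocab_size in a JSON model config and return the previous value.

    Assumes a JSON file with a top-level "vocab_size" key (an assumption,
    not confirmed by the README).
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    previous = config.get("vocab_size")
    config["vocab_size"] = vocab_size
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return previous
```

Returning the previous value makes it easy to restore the original setting when switching back to a checkpoint that already includes the special tokens.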
