
mutex's Introduction

MUTEX: Learning Unified Policies from Multimodal Task Specifications

Rutav Shah, Roberto Martín-Martín¹, Yuke Zhu¹
7th Annual Conference on Robot Learning
[Paper] [Project Website] [Dataset] [Pretrained Weights] [Real Robot Controller]
¹ Equal advising

Setup

Installation

git clone --recursive https://github.com/UT-Austin-RPL/MUTEX.git
cd MUTEX && git submodule update --init --recursive
conda create -n mutex python=3.8
conda activate mutex
pip install -r requirements.txt
pip install -e LIBERO/.
pip install -e .

Set the folder argument to the dataset directory, either in the configs or as a command-line override (folder=dataset-path in the commands below).

To use the pretrained weights, follow the Evaluation instructions below.

Usage

Training

MUTEX is trained in two stages: a) Masked Modeling and b) Cross-Modal Matching.

To run Masked Modeling:

CUDA_VISIBLE_DEVICES=0 python3 mutex/main_masked_modeling.py \
        benchmark_name=LIBERO_100 \
        policy.task_spec_modalities=gl_inst_img_vid_ai_ag \
        policy.add_mim=True policy.add_mgm=True policy.add_mrm=True \
        policy.add_mfm=True policy.add_maim=True policy.add_magm=True \
        folder=dataset-path \
        hydra.run.dir=experiments/mutex

To run Cross-Modal Matching:

CUDA_VISIBLE_DEVICES=0 python3 mutex/main_cmm.py \
        benchmark_name=LIBERO_100 \
        folder=dataset-path \
        experiment_dir=experiments/mutex
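
The two stages can also be chained from a small driver script. The following is a minimal sketch, not part of the repository: the helper names stage_command, training_commands, and run_training are hypothetical, and dataset-path is the same placeholder used in the commands above. It assumes that the second stage reads the first stage's outputs from the shared experiments/mutex directory, which is inferred from the two commands using the same path rather than stated explicitly.

```python
import os
import shlex
import subprocess

DATASET_DIR = "dataset-path"          # placeholder, as in the commands above
EXPERIMENT_DIR = "experiments/mutex"  # used by both stages in the commands above


def stage_command(script, overrides):
    """Build the argv list for one stage from key=value overrides."""
    return ["python3", script] + [f"{key}={value}" for key, value in overrides.items()]


def training_commands(dataset_dir=DATASET_DIR, experiment_dir=EXPERIMENT_DIR):
    """Return the two stage commands in the order they are run."""
    masked_modeling = stage_command(
        "mutex/main_masked_modeling.py",
        {
            "benchmark_name": "LIBERO_100",
            "policy.task_spec_modalities": "gl_inst_img_vid_ai_ag",
            "policy.add_mim": True,
            "policy.add_mgm": True,
            "policy.add_mrm": True,
            "policy.add_mfm": True,
            "policy.add_maim": True,
            "policy.add_magm": True,
            "folder": dataset_dir,
            "hydra.run.dir": experiment_dir,
        },
    )
    cross_modal_matching = stage_command(
        "mutex/main_cmm.py",
        {
            "benchmark_name": "LIBERO_100",
            "folder": dataset_dir,
            "experiment_dir": experiment_dir,
        },
    )
    return [masked_modeling, cross_modal_matching]


def run_training(dry_run=True, gpu="0"):
    """Print each stage command; execute it only when dry_run is False."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for argv in training_commands():
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline if a stage exits with an error.
            subprocess.run(argv, env=env, check=True)


if __name__ == "__main__":
    run_training(dry_run=True)
```

With dry_run=True the script only prints the two commands, which makes it easy to compare them against the ones listed above before launching a long run.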

Evaluation

MUTEX is a unified policy that can execute tasks specified in any of six modalities: video demonstration (vid), image goal (img), text goal (gl), text instructions (inst), speech goal (ag), and speech instructions (ai). To run the model after cross-modal matching at epoch 20 (the checkpoint used in the paper), set model_name=cmm_LIBERO_100_multitask_model_ep020.pth.
An example using the text goal modality (here with experiment_dir and model_name pointing at the pretrained weights):

MUJOCO_EGL_DEVICE_ID=0 CUDA_VISIBLE_DEVICES=0 python mutex/eval.py \
        benchmark_name=LIBERO_100 \
        folder=dataset-path \
        eval_spec_modalities=gl \
        experiment_dir=mutex_pretrained \
        model_name=mutex_weights.pth
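
To evaluate every modality in turn, the command above can be generated once per modality code. This is a minimal sketch under stated assumptions: parse_modalities and eval_command are hypothetical helpers, the dataset path and weight names are the placeholders from the example above, and each run passes a single code to eval_spec_modalities. Whether eval_spec_modalities also accepts underscore-joined combinations, as policy.task_spec_modalities does in training, is not stated here, so the sketch does not rely on it.

```python
import shlex

# Modality codes and descriptions as listed in the Evaluation section.
MODALITIES = {
    "vid": "video demonstration",
    "img": "image goal",
    "gl": "text goal",
    "inst": "text instructions",
    "ag": "speech goal",
    "ai": "speech instructions",
}


def parse_modalities(spec):
    """Split an underscore-joined spec such as 'gl_inst_img_vid_ai_ag' into codes."""
    codes = spec.split("_")
    unknown = [code for code in codes if code not in MODALITIES]
    if unknown:
        raise ValueError(f"unknown modality codes: {unknown}")
    return codes


def eval_command(modality, dataset_dir="dataset-path",
                 experiment_dir="mutex_pretrained", model_name="mutex_weights.pth"):
    """Return the shell line that evaluates one modality."""
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality code: {modality}")
    argv = [
        "python", "mutex/eval.py",
        "benchmark_name=LIBERO_100",
        f"folder={dataset_dir}",
        f"eval_spec_modalities={modality}",
        f"experiment_dir={experiment_dir}",
        f"model_name={model_name}",
    ]
    return "MUJOCO_EGL_DEVICE_ID=0 CUDA_VISIBLE_DEVICES=0 " + shlex.join(argv)


if __name__ == "__main__":
    for code in parse_modalities("gl_inst_img_vid_ai_ag"):
        print(f"# {MODALITIES[code]}")
        print(eval_command(code))
```

The printed lines can be pasted into a shell or a job script; nothing is executed by the sketch itself.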

Citation

@inproceedings{
    shah2023mutex,
    title={{MUTEX}: Learning Unified Policies from Multimodal Task Specifications},
    author={Rutav Shah and Roberto Mart{\'\i}n-Mart{\'\i}n and Yuke Zhu},
    booktitle={7th Annual Conference on Robot Learning},
    year={2023}
}

Acknowledgements: Mentioned here


mutex's Issues

Question about the dataset format

Hi, thanks for your great work.
I want to deploy your work on my robotic arm, but I am not clear about the data format. Specifically, I have checked the real-world dataset you provided, and it has the following format:

{
    "data": {
        "demo_0": {
            "actions": shape (N, 7),
            "obs": {
                "agentview_rgb": shape (N, 128, 128, 3),
                "ee_states": shape (N, 16),
                "eye_in_hand_rgb": shape (N, 128, 128, 3),
                "gripper_states": shape (N, 1),
                "joint_states": shape (N, 7),
            }
        },
        ...
    }
}

I understand the meanings of agentview_rgb, eye_in_hand_rgb, gripper_states, and joint_states, but I am not clear about what each value in actions and ee_states represents.
Can you explain their meanings?
Thanks!
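
For anyone preparing their own data in this layout, the shapes listed in this issue can be checked with a short script. This is a minimal sketch based only on the format quoted above: flatten_shapes and check_demo are hypothetical helpers, and the input is a nested dictionary of shape tuples (for example, collected from the .shape attribute of each array). It checks shapes only and does not settle what the individual entries of actions or ee_states mean.

```python
# Per-step dimensions taken from the format quoted in this issue.
EXPECTED_TRAILING_DIMS = {
    "actions": (7,),
    "obs/agentview_rgb": (128, 128, 3),
    "obs/ee_states": (16,),
    "obs/eye_in_hand_rgb": (128, 128, 3),
    "obs/gripper_states": (1,),
    "obs/joint_states": (7,),
}


def flatten_shapes(node, prefix=""):
    """Flatten a nested dict of shapes into {'obs/ee_states': (N, 16), ...}."""
    flat = {}
    for name, value in node.items():
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            flat.update(flatten_shapes(value, prefix=f"{path}/"))
        else:
            flat[path] = tuple(value)
    return flat


def check_demo(demo_shapes):
    """Validate one demo and return its number of steps N."""
    flat = flatten_shapes(demo_shapes)
    lengths = set()
    for path, trailing in EXPECTED_TRAILING_DIMS.items():
        if path not in flat:
            raise KeyError(f"missing entry: {path}")
        shape = flat[path]
        if shape[1:] != trailing:
            raise ValueError(f"{path}: expected (N, {trailing}), got {shape}")
        lengths.add(shape[0])
    if len(lengths) != 1:
        raise ValueError(f"entries disagree on the number of steps: {sorted(lengths)}")
    return lengths.pop()


# Example with a made-up demo of 50 steps.
example_demo = {
    "actions": (50, 7),
    "obs": {
        "agentview_rgb": (50, 128, 128, 3),
        "ee_states": (50, 16),
        "eye_in_hand_rgb": (50, 128, 128, 3),
        "gripper_states": (50, 1),
        "joint_states": (50, 7),
    },
}

if __name__ == "__main__":
    print(check_demo(example_demo))
```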

Evaluation visualization

Thanks for open-sourcing MUTEX & for the great codebase! A quick question -- in mutex/eval.py, it looks like summary2video is not defined anywhere, do you mind updating this?

Thanks a lot!
