GithubHelp home page GithubHelp logo

thuml / ivideogpt Goto Github PK

View Code? Open in Web Editor NEW
52.0 4.0 2.0 20.79 MB

Official repo for "iVideoGPT: Interactive VideoGPTs are Scalable World Models", https://arxiv.org/abs/2405.15223

Home Page: https://thuml.github.io/iVideoGPT/

License: MIT License

Python 100.00%
model-based-reinforcement-learning video-prediction visual-planning world-model

ivideogpt's Introduction

iVideoGPT: Interactive VideoGPTs are Scalable World Models

[Website] [Paper] [Model]

This repo provides official code and checkpoints for iVideoGPT, a generic and efficient world model architecture that has been pre-trained on millions of human and robotic manipulation trajectories.

architecture

News

  • ๐Ÿšฉ 2024.05.31: Project website with video samples is released.
  • ๐Ÿšฉ 2024.05.30: Model pre-trained on Open X-Embodiment and inference code are released.
  • ๐Ÿšฉ 2024.05.28: The pre-trained model, inference code, project website, and a demo are coming soon in about one week!
  • ๐Ÿšฉ 2024.05.27: Our paper is released on arXiv.

Installation

conda create -n ivideogpt python==3.9
conda activate ivideogpt
pip install -r requirements.txt

Models

At the moment we provide the following models:

Model Resolution Action Tokenizer Size Transformer Size
ivideogpt-oxe-64-act-free 64x64 No 114M 138M

If no network connection to Hugging Face, you can manually download from Tsinghua Cloud.

Inference Examples

Action-free Video Prediction on Open X-Embodiment

python predict.py --pretrained_model_name_or_path "thuml/ivideogpt-oxe-64-act-free" --input_path samples/fractal_sample.npz --dataset_name fractal20220817_data

To try more samples, download the dataset from the Open X-Embodiment Dataset and extract single episodes as follows:

python oxe_data_converter.py --dataset_name {dataset_name, e.g. bridge} --input_path {path to OXE} --output_path samples --max_num_episodes 10

Showcases

showcase

Citation

If you find this project useful, please cite our paper as:

@article{wu2024ivideogpt,
    title={iVideoGPT: Interactive VideoGPTs are Scalable World Models}, 
    author={Jialong Wu and Shaofeng Yin and Ningya Feng and Xu He and Dong Li and Jianye Hao and Mingsheng Long},
    journal={arXiv preprint arXiv:2405.15223},
    year={2024},
}

Contact

If you have any question, please contact [email protected].

ivideogpt's People

Contributors

manchery avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.