Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

Official Repository of Panacea.

[Paper] Panacea: Panoramic and Controllable Video Generation for Autonomous Driving,
Yuqing Wen1*†, Yucheng Zhao2*, Yingfei Liu2*, Fan Jia2, Yanhui Wang1, Chong Luo1, Chi Zhang3, Tiancai Wang2‡, Xiaoyan Sun1‡, Xiangyu Zhang2
1University of Science and Technology of China, 2MEGVII Technology, 3Mach Drive
*Equal Contribution, †This work was done during an internship at MEGVII, ‡Corresponding Author.

[WebPage] https://panacea-ad.github.io/

Generating Multi-View and Controllable Videos for Autonomous Driving

Overview of Panacea. (a) The diffusion training process of Panacea, enabled by a diffusion encoder and decoder with the decomposed 4D attention module. (b) The decomposed 4D attention module comprises three components: intra-view attention for spatial processing within individual views, cross-view attention to engage with adjacent views, and cross-frame attention for temporal processing. (c) The controllable module for integrating diverse signals. The image conditions are derived from a frozen VAE encoder and combined with the diffused noise. The text prompts are processed through a frozen CLIP encoder, while BEV sequences are handled via ControlNet. (d) The details of the BEV layout sequences, including projected bounding boxes, object depths, road maps, and camera pose.
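To make the decomposition in (b) concrete, the following is a minimal sketch, not code from this repository. It lists which tokens a single query token can see under each of the three attention types. The function name `attention_keys`, the ring arrangement of cameras, and the choice to let cross-frame attention see every other frame are all assumptions made for illustration. The two spatial dimensions are flattened into a single `position` index.

```python
def attention_keys(query, num_views, num_frames, tokens_per_image):
    """List the (view, frame, position) keys one query token attends to
    under each of the three decomposed attention types (illustrative only)."""
    view, frame, _ = query
    positions = range(tokens_per_image)

    # Intra-view: all spatial tokens of the same view at the same frame.
    intra_view = [(view, frame, p) for p in positions]

    # Cross-view: tokens of the adjacent views at the same frame.
    # Assumption: cameras form a ring, so view 0 neighbours the last view.
    neighbours = sorted({(view - 1) % num_views, (view + 1) % num_views} - {view})
    cross_view = [(n, frame, p) for n in neighbours for p in positions]

    # Cross-frame: tokens of the same view at the other frames.
    # Assumption: every other frame is visible; a real model may restrict this.
    cross_frame = [
        (view, f, p) for f in range(num_frames) if f != frame for p in positions
    ]

    return {
        "intra_view": intra_view,
        "cross_view": cross_view,
        "cross_frame": cross_frame,
    }


# Hypothetical sizes: 6 cameras, 8 frames, 4 tokens per image.
keys = attention_keys(query=(0, 2, 1), num_views=6, num_frames=8, tokens_per_image=4)
decomposed_total = sum(len(group) for group in keys.values())
full_total = 6 * 8 * 4  # keys seen by one query under joint attention over all tokens
print(decomposed_total, full_total)  # prints "40 192"
```

With these toy sizes each query looks at 40 keys across the three attention types instead of the 192 it would see under joint attention over every view, frame, and position, which illustrates why a decomposed design is cheaper.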

The two-stage inference pipeline of Panacea. The first stage creates multi-view images from BEV layouts. The second stage uses these images, along with the subsequent BEV layouts, to generate the following frames.
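The control flow of the two-stage pipeline can be sketched as follows. This is a hedged illustration, not the released inference code: `two_stage_generate`, `generate_first_frame`, and `generate_following_frames` are hypothetical names, and the sketch assumes that stage one consumes only the first BEV layout. The two generator arguments stand in for the actual diffusion models, and the stubs below return strings so the example runs without any model.

```python
def two_stage_generate(bev_layouts, generate_first_frame, generate_following_frames):
    """Run a two-stage generation loop over a sequence of BEV layouts.

    Both generator arguments are hypothetical callables standing in for models.
    """
    if not bev_layouts:
        raise ValueError("at least one BEV layout is required")
    first_layout, later_layouts = bev_layouts[0], bev_layouts[1:]

    # Stage 1: create multi-view images from the BEV layout alone.
    first_views = generate_first_frame(first_layout)

    # Stage 2: generate the following frames, conditioned on the stage-1
    # images together with the subsequent BEV layouts.
    following = generate_following_frames(first_views, later_layouts)

    return [first_views] + list(following)


# Stub generators (assumptions for illustration; they only build labels).
def fake_first_frame(layout):
    return [f"view{v}:{layout}" for v in range(2)]


def fake_following(first_views, layouts):
    return [[f"{img}->{layout}" for img in first_views] for layout in layouts]


video = two_stage_generate(["bev0", "bev1", "bev2"], fake_first_frame, fake_following)
print(len(video))  # prints 3: one multi-view frame per BEV layout
```

Keeping the two stages as separate callables mirrors the description above: the image-conditioned second stage can also be fed real images instead of generated ones.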

🎬   BEV-guided Video Generation   🎬

Controllable multi-view video generation. Panacea is able to generate realistic, controllable videos with good temporal and view consistency.

🎞   Attribute Controllable Video Generation   🎞

Video generation with variable attribute controls, such as weather, time, and scene. These controls allow Panacea to simulate a variety of rare driving scenarios, including extreme weather conditions such as rain and snow, thereby greatly enhancing the diversity of the data.

🔥   Benefiting Autonomous Driving   🔥

(a) Panoramic video generation based on BEV (Bird’s-Eye-View) layout sequence facilitates the establishment of a synthetic video dataset, which enhances perceptual tasks. (b) Producing panoramic videos with conditional images and BEV layouts can effectively elevate image-only datasets to video datasets, thus enabling the advancement of video-based perception techniques.

BibTeX

@misc{wen2023panacea,
    title={Panacea: Panoramic and Controllable Video Generation for Autonomous Driving},
    author={Yuqing Wen and Yucheng Zhao and Yingfei Liu and Fan Jia and Yanhui Wang and Chong Luo and Chi Zhang and Tiancai Wang and Xiaoyan Sun and Xiangyu Zhang},
    year={2023},
    eprint={2311.16813},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Contact

Feel free to contact us at wenyuqing AT mail.ustc.edu.cn or wangtiancai AT megvii.com
