
gz475 / layoutgpt


This project was forked from weixi-feng/layoutgpt.


Official repo for LayoutGPT

License: MIT License

C++ 1.05% Python 93.29% C 0.40% Cuda 5.24% Dockerfile 0.02%


[NeurIPS 2023] LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

Weixi Feng¹*, Wanrong Zhu¹*, Tsu-Jui Fu¹, Varun Jampani³, Arjun Akula³, Xuehai He², Sugato Basu³, Xin Eric Wang², William Yang Wang¹
¹UC Santa Barbara, ²UC Santa Cruz, ³Google
*Equal Contribution

Project Page | arXiv

Teaser figure

Example 1 Example 2 Example 3

Updates

2023.10.28 Added support for Llama-2; updated the camera-ready version.

2023.10.10 We released our preprocessed 3D-FRONT and 3D-FUTURE data (see below) and simplified the installation and preparation process.

2023.09.22 LayoutGPT is accepted to NeurIPS 2023!

Installation & Dependencies

LayoutGPT and the downstream generation models require different libraries. You can install everything at once with

conda env create -f environment.yml

and then run the following additional steps

# for GLIGEN
wget https://huggingface.co/gligen/gligen-generation-text-box/resolve/main/diffusion_pytorch_model.bin -O gligen/gligen_checkpoints/checkpoint_generation_text.pth

# for image evaluation using GLIP
cd eval_models/GLIP
python setup.py build develop --user
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth -O MODEL/swin_large_patch4_window12_384_22k.pth
wget https://huggingface.co/GLIPModel/GLIP/resolve/main/glip_large_model.pth -O MODEL/glip_large_model.pth

# for scene synthesis (go back to the repository root first)
cd ../../ATISS
python setup.py build_ext --inplace
pip install -e .
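A failed or mistyped `wget` can silently leave an empty file, or an HTML page instead of model weights, at the target path. The sketch below is a hypothetical sanity check (not part of the repo) that looks for the three checkpoint paths used by the commands above, relative to the repository root, and flags files that are missing, empty, or look like HTML.

```python
from pathlib import Path

# Checkpoint locations implied by the download commands above (relative to the repo root).
CHECKPOINTS = [
    "gligen/gligen_checkpoints/checkpoint_generation_text.pth",
    "eval_models/GLIP/MODEL/swin_large_patch4_window12_384_22k.pth",
    "eval_models/GLIP/MODEL/glip_large_model.pth",
]


def check_checkpoint(path: Path) -> str:
    """Return 'missing', 'empty', 'html', or 'ok' for one downloaded file."""
    if not path.is_file():
        return "missing"
    if path.stat().st_size == 0:
        return "empty"
    with path.open("rb") as f:
        head = f.read(512).lstrip().lower()
    # A file that begins like an HTML document is almost certainly a web page, not weights.
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "ok"


def check_all(repo_root: str = ".") -> dict:
    """Check every expected checkpoint under repo_root."""
    root = Path(repo_root)
    return {rel: check_checkpoint(root / rel) for rel in CHECKPOINTS}


if __name__ == "__main__":
    for rel, status in check_all().items():
        print(f"{status:8s} {rel}")
```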

You may also refer to the official repos of GLIGEN, GLIP and ATISS for detailed guidance.

Data Preparation

Our image layout benchmark NSR-1K and the 3D scene data split are provided under ./dataset.

2D image layouts

NSR-1K contains ground-truth image layouts for each prompt, extracted from the MSCOCO dataset. The extracted CLIP image features are provided under ./dataset/NSR-1K/. The JSON files contain ground-truth layouts, captions, and other metadata.
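Since the exact schema of the NSR-1K JSON files is not spelled out here, a schema-agnostic way to start is to inspect their top-level structure. The helper below is a hypothetical sketch: it reports whether a file holds a list or a dict, how many entries it has, and the keys of the first record, without assuming any field names.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Describe a JSON file's top-level structure without assuming its schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    length = len(data) if isinstance(data, (list, dict)) else None
    # Pick a representative first record from a list or from a dict's values.
    first = None
    if isinstance(data, list) and data:
        first = data[0]
    elif isinstance(data, dict) and data:
        first = next(iter(data.values()))
    keys = sorted(first.keys()) if isinstance(first, dict) else None
    return {"type": type(data).__name__, "length": length, "first_record_keys": keys}


if __name__ == "__main__":
    for p in sorted(Path("./dataset/NSR-1K").rglob("*.json")):
        print(p, summarize_json(p))
```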

3D scene layouts

After checking the licenses of 3D-FRONT and 3D-FUTURE, we are able to provide our preprocessed dataset for indoor scene synthesis. Unzip the downloaded file into ./ATISS/; you should then have ./ATISS/data_output and ./ATISS/data_output_future.
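To confirm the archive was unzipped to the right place, a small hypothetical check (not part of the repo) can look for the two directories named above, relative to the repository root.

```python
from pathlib import Path

# Directories the preprocessed 3D data should produce, per the instructions above.
EXPECTED_DIRS = ["ATISS/data_output", "ATISS/data_output_future"]


def missing_scene_dirs(repo_root="."):
    """Return the expected data directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_scene_dirs()
    print("all data directories found" if not missing else f"missing: {missing}")
```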

Alternatively, refer to ATISS if you prefer to run the preprocessing steps yourself.

2D Image Layout Generation

We provide a script to generate layouts for the NSR-1K benchmark. First, set up your OpenAI authentication in the script, then run

python run_layoutgpt_2d.py --icl_type k-similar --K 8 --setting counting --llm_type gpt4 --n_iter 5
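The flags `--icl_type k-similar --K 8` suggest that the K = 8 exemplars most similar to the test prompt are used as in-context demonstrations. The sketch below illustrates that idea under an assumption: similarity is cosine similarity between feature vectors of some kind. Which embedding `run_layoutgpt_2d.py` actually uses, and how it breaks ties, is not stated here, so `k_similar` and `cosine` are hypothetical helpers, not the repo's code.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def k_similar(query: Sequence[float], exemplars: List[Sequence[float]], k: int = 8) -> List[int]:
    """Indices of the k exemplars most similar to the query, most similar first.

    sorted() is stable, so exemplars with equal similarity keep their original order.
    """
    order = sorted(range(len(exemplars)), key=lambda i: cosine(query, exemplars[i]), reverse=True)
    return order[:k]
```

For example, with a query of `[1.0, 0.0]` and exemplars `[[0, 1], [1, 0.1], [1, 1], [-1, 0]]`, `k_similar(query, exemplars, k=2)` returns `[1, 2]`.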

The generated layouts are saved to ./llm_output/counting by default. To generate images from the layouts, run

cd gligen
python gligen_layout_counting.py --file ../llm_output/counting/gpt4.counting.k-similar.k_8.px_64.json --batch_size 5

Note that the script saves a clean image and an image with bounding boxes for each prompt into two separate folders. In the experiments in our preprint, we generated 5 different layouts for each prompt to reduce variance.
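When several layouts are generated per prompt, one way to report a single number is to average each prompt's scores first and then average across prompts, keeping the per-prompt spread for reference. The sketch below assumes you already have `(prompt_id, score)` pairs from some evaluation; the function names are hypothetical and this is not necessarily how the preprint aggregated its results.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_runs(scores):
    """Group (prompt_id, score) pairs by prompt; return {prompt_id: (mean, population std)}."""
    by_prompt = defaultdict(list)
    for prompt_id, score in scores:
        by_prompt[prompt_id].append(score)
    return {pid: (mean(v), pstdev(v)) for pid, v in by_prompt.items()}


def overall_mean(scores):
    """Average the per-prompt means, so every prompt carries equal weight."""
    per_prompt = aggregate_runs(scores)
    return mean(m for m, _ in per_prompt.values())
```

Averaging per prompt first means a prompt with more samples does not dominate the overall score.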

Layout & Image Evaluation

To evaluate the raw layouts, run

# for numerical prompts
python eval_counting_layout.py --file ../llm_output/counting/gpt4.counting.k-similar.k_8.px_64.json
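For numerical prompts, a layout can be scored by comparing the number of boxes per object category with the counts the prompt asks for. The sketch below shows one common count-based precision/recall/exact-match scheme; it is an illustrative assumption, and `eval_counting_layout.py` may define its metrics differently.

```python
from collections import Counter


def count_metrics(pred_labels, gt_counts):
    """Score a layout's object counts against the counts required by a prompt.

    pred_labels: category name of every box in the generated layout.
    gt_counts:   {category: required count}.
    Returns (precision, recall, exact_match).
    """
    pred = Counter(pred_labels)
    gt = Counter(gt_counts)
    # A predicted box is "matched" only up to the number the prompt asks for.
    matched = sum(min(pred[c], gt[c]) for c in gt)
    n_pred = sum(pred.values())
    n_gt = sum(gt.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gt if n_gt else 0.0
    # Exact match requires identical counts, including no extra categories.
    exact = all(pred[c] == gt[c] for c in set(pred) | set(gt))
    return precision, recall, exact
```

For instance, two cats and one dog against a prompt asking for two cats and two dogs gives precision 1.0, recall 0.75, and no exact match.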

To evaluate the generated images using GLIP, run

cd eval_models/GLIP
python eval_counting.py --dir path_to_generated_clean_images
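Image-level evaluation with a detector such as GLIP generally reduces to keeping confident detections and counting them per category. The sketch below assumes detections are already available as `(label, score)` pairs and uses a hypothetical threshold of 0.5; the actual output format and threshold in `eval_counting.py` may differ.

```python
from collections import Counter


def counts_from_detections(detections, score_threshold=0.5):
    """Count detections per label, keeping only those at or above the threshold."""
    return Counter(label for label, score in detections if score >= score_threshold)


def matches_prompt(detections, gt_counts, score_threshold=0.5):
    """True if every category named in the prompt is detected exactly the required number of times.

    Categories the prompt does not mention are ignored here.
    """
    counts = counts_from_detections(detections, score_threshold)
    return all(counts[c] == n for c, n in gt_counts.items())
```

The threshold matters: lowering it admits weak detections and can turn a correct count into an over-count.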

3D Indoor Scene Synthesis

First, set up your OpenAI authentication in the script, then run it to generate scenes

python run_layoutgpt_3d.py --dataset_dir ./ATISS/data_output --icl_type k-similar --K 8 --room bedroom --llm_type gpt4 --unit px --normalize --regular_floor_plan
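The LayoutGPT paper describes prompting the LLM with layouts written in a style-sheet (CSS-like) language. As a purely hypothetical illustration of how `--unit px` and `--normalize` might interact, the sketch below normalizes box sizes and positions by the room extent, scales them to integer "px" values, and writes one CSS-like line per object. The field names, the scale of 256, and the normalization scheme are all assumptions, not the format `run_layoutgpt_3d.py` emits.

```python
def to_px(value, extent, scale=256):
    """Normalize a length or coordinate by the room extent and round to an integer 'px' value."""
    return round(value / extent * scale)


def box_to_css(label, size, position, room_extent, scale=256):
    """Serialize one 3D box as a CSS-like line.

    size:     (length, width, height) in metres (hypothetical convention).
    position: (left, top, depth) in metres (hypothetical convention).
    """
    l, w, h = (to_px(v, room_extent, scale) for v in size)
    x, y, z = (to_px(v, room_extent, scale) for v in position)
    return (f"{label} {{length: {l}px; width: {w}px; height: {h}px; "
            f"left: {x}px; top: {y}px; depth: {z}px;}}")
```

Expressing everything as integers on a fixed scale keeps the numbers short and comparable across rooms of different sizes, which is one plausible motivation for normalizing.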

To evaluate the out-of-bounds rate (OOB) and KL divergence (KL-div.) of the generated layouts, run

python eval_scene_layout.py --dataset_dir ./ATISS/data_output --file ./llm_output/3D/gpt4.bedroom.k-similar.k_8.px_regular.json --room bedroom
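To make the two metrics concrete, the sketch below computes simplified versions of both. It assumes a rectangular floor plan spanning `[0, room_w] x [0, room_d]` in a top-down view and axis-aligned object footprints (rotation ignored); a scene counts as out of bounds if any object extends past the floor plan. The KL term compares object-category frequencies of ground-truth and generated scenes with small epsilon smoothing; its direction, KL(P_gt || P_gen), is an assumption. `eval_scene_layout.py` may handle rotations, irregular floor plans, and smoothing differently.

```python
import math
from collections import Counter


def box_out_of_bounds(box, room_w, room_d, tol=0.0):
    """box = (cx, cz, size_x, size_z): axis-aligned top-down footprint, rotation ignored."""
    cx, cz, sx, sz = box
    return (cx - sx / 2 < -tol or cx + sx / 2 > room_w + tol
            or cz - sz / 2 < -tol or cz + sz / 2 > room_d + tol)


def oob_rate(scenes):
    """scenes: list of (room_w, room_d, boxes). Fraction of scenes with any out-of-bounds box."""
    if not scenes:
        return 0.0
    bad = sum(any(box_out_of_bounds(b, w, d) for b in boxes) for w, d, boxes in scenes)
    return bad / len(scenes)


def category_kl(gt_labels, gen_labels, eps=1e-6):
    """KL(P_gt || P_gen) over object-category frequencies, with eps smoothing."""
    cats = set(gt_labels) | set(gen_labels)
    p_cnt, q_cnt = Counter(gt_labels), Counter(gen_labels)
    p_tot = sum(p_cnt.values()) + eps * len(cats)
    q_tot = sum(q_cnt.values()) + eps * len(cats)
    kl = 0.0
    for c in cats:
        p = (p_cnt[c] + eps) / p_tot
        q = (q_cnt[c] + eps) / q_tot
        kl += p * math.log(p / q)
    return kl
```

Smoothing keeps the divergence finite when a category appears in one set of scenes but not the other.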

Visualization

Following ATISS, you can visualize the generated layouts by rendering scene images with simple-3dviz

cd ATISS/scripts
python render_from_files.py ../config/bedrooms_eval_config.yaml visualization_output_dir ../data_output_future ../demo/floor_plan_texture_images ../../llm_output/3D/gpt4.bedroom.k-similar.k_8.px_regular.json --up_vector 0,1,0 --camera_position 2,2,2 --split test_regular --export_scene

To render only particular scene(s), add --scene_id id1 id2. All visualizations shown in the preprint were rendered manually in Blender. With --export_scene, you will find a folder for each scene under visualization_output_dir containing *.obj and *.mtl files, which you can import into Blender to render the scenes. While this can be scripted in Python, we do not yet provide such a script.
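As a starting point for scripting the Blender step, the hypothetical sketch below (not provided by the authors) has two parts. `find_scene_objs` uses only the standard library to list each exported scene folder and its `.obj` files. `import_scene_into_blender` must run inside Blender's bundled Python: it uses `bpy.ops.wm.obj_import` on Blender 3.3 and later and falls back to the legacy `bpy.ops.import_scene.obj` on older versions. Camera, lighting, and render settings are left to you.

```python
from pathlib import Path


def find_scene_objs(export_dir):
    """Map each per-scene folder under export_dir to the sorted .obj files it contains."""
    root = Path(export_dir)
    return {
        d.name: sorted(d.glob("*.obj"))
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


def import_scene_into_blender(obj_paths):
    """Import .obj files (and their .mtl materials) into the open Blender scene.

    Only works inside Blender, where the bpy module is available.
    """
    import bpy  # provided by Blender's bundled Python, not installable with pip

    for p in obj_paths:
        if bpy.app.version >= (3, 3, 0):
            bpy.ops.wm.obj_import(filepath=str(p))  # C++ importer, Blender 3.3+
        else:
            bpy.ops.import_scene.obj(filepath=str(p))  # legacy importer, removed in 4.0
```

For example, in Blender's Scripting tab you could call `import_scene_into_blender(find_scene_objs("visualization_output_dir")["<scene_id>"])` for one scene, then position a camera and render.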

Citation

Please consider citing our work if you find it relevant or helpful:

@article{feng2023layoutgpt,
  title={LayoutGPT: Compositional Visual Planning and Generation with Large Language Models},
  author={Feng, Weixi and Zhu, Wanrong and Fu, Tsu-jui and Jampani, Varun and Akula, Arjun and He, Xuehai and Basu, Sugato and Wang, Xin Eric and Wang, William Yang},
  journal={arXiv preprint arXiv:2305.15393},
  year={2023}
}

Disclaimer

We thank the authors of GLIGEN, GLIP and ATISS for making their code available. It is important to note that the code present here is not the official or original code of the respective individual or organization who initially created it. Part of the code may be subject to retraction upon official request. Any use of downstream generation code should be governed by the official terms and conditions set by the original authors or organizations. It is your responsibility to comply with these terms and conditions and ensure that your usage adheres to the appropriate guidelines.
