Code for "HEAT: Holistic Edge Attention Transformer for Structured Reconstruction", CVPR 2022

Home Page: https://heat-structured-reconstruction.github.io/


HEAT: Holistic Edge Attention Transformer for Structured Reconstruction

License: GPL v3

Official implementation of the paper HEAT: Holistic Edge Attention Transformer for Structured Reconstruction (CVPR 2022).

[Project page], [Arxiv]

Please use the following bib entry to cite the paper if you are using resources from this repo.

@inproceedings{chen2022heat,
     title={HEAT: Holistic Edge Attention Transformer for Structured Reconstruction},
     author={Chen, Jiacheng and Qian, Yiming and Furukawa, Yasutaka},
     booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
     year={2022}
} 

Introduction

This paper focuses on a typical family of structured reconstruction tasks: planar graph reconstruction. Two different tasks are included: 1) outdoor architecture reconstruction from a satellite image; and 2) floorplan reconstruction from a point density image. The figure above shows examples. The key contributions of the paper are:

  • a transformer-based architecture w/ state-of-the-art performance and efficiency on two different tasks, w/o domain specific heuristics
  • a geometry-only decoder branch w/ a masked training strategy to enhance the geometry learning

As shown by the above figure, the overall pipeline of our method consists of three key steps: 1) edge node initialization; 2) edge image feature fusion and edge filtering; and 3) holistic structural reasoning with two weight-sharing transformer decoders. Please refer to the paper for more details.

This repo provides the code, data, and pre-trained checkpoints of HEAT for the two tasks covered in the paper.

Preparation

Note: The code, data, and pre-trained models in this repo are for non-commercial research purposes only. Please check the LICENSE file for details.

Environment

This repo was developed and tested with Python 3.7.

Install the required packages and compile the deformable-attention modules (from Deformable DETR):

pip install -r requirements.txt
cd models/ops/
sh make.sh
cd ../..

Data

Please download the data for the two tasks from the link here. Extract the data into the ./data directory.

The file structure should be like the following:

data
├── outdoor
│   ├── cities_dataset  # the outdoor architecture dataset from previous works
│   │      ├── annot    # the G.T. planar graphs
│   │      ├── rgb      # the input images
│   │      ├── ......   # dataset splits, miscs
│   │
│   └── det_finals   # corner detection results from previous works (not used by our full method, but used for ablation studies) 
│
└── s3d_floorplan       # the Structured3D floorplan dataset, produced with the scripts from MonteFloor
    ├── annot           # the G.T. planar graphs 
    │
    ├── density         # the point density images
    │
    └── ......          # dataset splits, miscs

Note that the Structured3D floorplan data is generated with the scripts provided by MonteFloor[1]. We thank the authors for kindly sharing the processing scripts. Please cite their paper if you use the corresponding resources.
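Before running inference or training, it can help to confirm that the extracted data matches the layout above. The following is a minimal sketch, not part of the repo: the helper name missing_data_dirs and the EXPECTED_DIRS list are hypothetical, and only the directories named in the tree above are checked.

```python
from pathlib import Path

# Directories named in the file tree above (relative to the data root).
EXPECTED_DIRS = [
    "outdoor/cities_dataset/annot",
    "outdoor/cities_dataset/rgb",
    "outdoor/det_finals",
    "s3d_floorplan/annot",
    "s3d_floorplan/density",
]


def missing_data_dirs(root="./data"):
    """Return the expected sub-directories that do not exist under root.

    Hypothetical helper for illustration; it only checks for directories
    and does not validate the files inside them.
    """
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs("./data")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

An empty list means every directory from the tree was found; the dataset split files are not covered by this check.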

Data preprocessing for floorplan reconstruction (Optional)

All the data used in our paper are provided in the download links above. However, if you are interested in the data preparation process for the floorplan reconstruction task, please refer to the s3d_preprocess directory, in which we provide the scripts and a brief doc.

Checkpoints

We provide the checkpoints for our full method under this link. Please download and extract them.

Inference, evaluation, and visualization

We provide the instructions to run the inference, quantitative evaluation, and qualitative visualization in this section.

Outdoor architecture reconstruction

  • Inference. Run the inference with the pre-trained checkpoints, with image size 256:

    python infer.py --checkpoint_path ./checkpoints/ckpts_heat_outdoor_256/checkpoint.pth  --dataset outdoor --image_size 256 --viz_base ./results/viz_heat_outdoor_256 --save_base ./results/npy_heat_outdoor_256
    

    or with image size 512:

    python infer.py --checkpoint_path ./checkpoints/ckpts_heat_outdoor_512/checkpoint.pth  --dataset outdoor --image_size 512 --viz_base ./results/viz_heat_outdoor_512 --save_base ./results/npy_heat_outdoor_512
    
  • Quantitative evaluation. The quantitative evaluation for this dataset is included in the inference script. The metric implementations (in ./metrics) are borrowed from Explore-classify[2].

  • Qualitative evaluation. To get the qualitative visualizations used in our paper, set the paths properly in ./qualitative_outdoor/visualize_npy.py, and then run:

    cd qualitative_outdoor
    python visualize_npy.py
    cd ..
    

Floorplan reconstruction

  • Inference. Run the inference with the pre-trained checkpoints:

    python infer.py --checkpoint_path ./checkpoints/ckpts_heat_s3d_256/checkpoint.pth  --dataset s3d_floorplan --image_size 256 --viz_base ./results/viz_heat_s3d_256 --save_base ./results/npy_heat_s3d_256 
    
  • Quantitative evaluation. The quantitative evaluation is again adapted from the code of MonteFloor[1]. We thank the authors for sharing the evaluation code. Please first download the data used by MonteFloor with this link (required by the evaluation code) and extract it as ./s3d_floorplan_eval/montefloor_data. Then run the evaluation by:

    cd s3d_floorplan_eval
    python evaluate_solution.py --dataset_path ./montefloor_data --dataset_type s3d --scene_id val
    cd ..
    

    Note that we augment the original evaluation code with an algorithm for extracting a valid planar graph from our outputs (implemented in ./s3d_floorplan_eval/planar_graph_utils.py). Invalid structures, including crossing edges and unclosed loops, are discarded. The same algorithm is also applied to all our baseline approaches.

  • Qualitative evaluation. To generate the qualitative visualization results used in the paper, set the paths properly in ./s3d_floorplan_eval/visualize_npy.py, and then run:

    cd s3d_floorplan_eval
    python visualize_npy.py
    cd ..
    

    Note that the incomplete regions are discarded before the quantitative evaluation. The quantitative metrics from MonteFloor[1] are room-based, and incomplete regions are simply treated as missing rooms. For qualitative visualization, we plot all predicted corners and edges, but only complete (i.e., valid) regions are colored.
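To make the notion of "crossing edges" in the evaluation note above concrete, here is a minimal sketch of a pairwise crossing test on a predicted graph. It is an illustration only and is not the implementation in planar_graph_utils.py: the function names are hypothetical, corners are assumed to be (x, y) tuples, edges are assumed to be pairs of corner indices, and only proper crossings are detected (edges that share a corner or overlap collinearly are ignored).

```python
def orient(a, b, c):
    """Signed area test: >0 if c is left of a->b, <0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 properly crosses q1-q2 (interiors intersect)."""
    d1 = orient(q1, q2, p1)
    d2 = orient(q1, q2, p2)
    d3 = orient(p1, p2, q1)
    d4 = orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def crossing_edge_pairs(corners, edges):
    """Return all pairs of edges that cross each other.

    Hypothetical helper: O(E^2) brute force, fine for small graphs.
    """
    pairs = []
    for a in range(len(edges)):
        for b in range(a + 1, len(edges)):
            e, f = edges[a], edges[b]
            if set(e) & set(f):
                continue  # edges meeting at a shared corner are not a crossing
            if segments_cross(corners[e[0]], corners[e[1]],
                              corners[f[0]], corners[f[1]]):
                pairs.append((e, f))
    return pairs


if __name__ == "__main__":
    # A unit square with both diagonals: only the diagonals cross.
    corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
    print(crossing_edge_pairs(corners, edges))
```

How a crossing is resolved (for example, which of the two edges is dropped) is a separate design choice that this sketch does not address.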

Training

Set up the training arguments in arguments.py, and then run the training by:

CUDA_VISIBLE_DEVICES={gpu_ids} python train.py

Or specify the key arguments in the command line and run the outdoor experiment by:

CUDA_VISIBLE_DEVICES={gpu_ids} python train.py  --exp_dataset outdoor  --epochs 800 --lr_drop 600  --batch_size 16  --output_dir ./checkpoints/ckpts_heat_outdoor_256  --image_size 256  --max_corner_num 150  --lambda_corner 0.05  --run_validation

or run the s3d floorplan experiment by:

CUDA_VISIBLE_DEVICES={gpu_ids} python train.py  --exp_dataset s3d_floorplan  --epochs 400 --lr_drop 300  --batch_size 16  --output_dir ./checkpoints/ckpts_heat_s3d_256  --image_size 256  --max_corner_num 200  --lambda_corner 0.10  --run_validation

With the default setting (e.g., model setup, batch size, etc.), training the full HEAT (i.e., the end-to-end corner and edge modules) needs at least 2 GPUs with ~16GB memory each.
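When launching both experiments from a script, the two command lines above can be generated from one table of settings. The sketch below is a hypothetical convenience wrapper, not part of the repo; it only reuses the flags and values shown in the commands above and does not set CUDA_VISIBLE_DEVICES.

```python
# Settings copied from the two training commands above.
EXPERIMENTS = {
    "outdoor": {
        "epochs": 800,
        "lr_drop": 600,
        "batch_size": 16,
        "output_dir": "./checkpoints/ckpts_heat_outdoor_256",
        "image_size": 256,
        "max_corner_num": 150,
        "lambda_corner": 0.05,
    },
    "s3d_floorplan": {
        "epochs": 400,
        "lr_drop": 300,
        "batch_size": 16,
        "output_dir": "./checkpoints/ckpts_heat_s3d_256",
        "image_size": 256,
        "max_corner_num": 200,
        "lambda_corner": 0.10,
    },
}


def train_command(exp_dataset):
    """Build the argument list for train.py (hypothetical helper)."""
    args = ["python", "train.py", "--exp_dataset", exp_dataset]
    for flag, value in EXPERIMENTS[exp_dataset].items():
        args += ["--" + flag, str(value)]
    args.append("--run_validation")
    return args


if __name__ == "__main__":
    for name in EXPERIMENTS:
        print(" ".join(train_command(name)))
```

The returned list can be passed to subprocess.run together with an env mapping that sets CUDA_VISIBLE_DEVICES.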

References

[1]. Stekovic, Sinisa, Mahdi Rad, Friedrich Fraundorfer and Vincent Lepetit. “MonteFloor: Extending MCTS for Reconstructing Accurate Large-Scale Floor Plans.” 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021): 16014-16023.

[2]. Zhang, Fuyang, Xiangyu Xu, Nelson Nauata and Yasutaka Furukawa. “Structured Outdoor Architecture Reconstruction by Exploration and Classification.” 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021): 12407-12415.


heat's Issues

Usage of Normals

Hi @woodfrog, thank you for the great work! One thing I find missing is the intuition behind the usage of normals, which is not discussed in the paper or in this codebase. What is the idea behind incorporating normals into the density maps as the input to HEAT? Is there any reference?

I'd also like to have you clarify some details in your codebase. Especially, for the below two snippets:

normals = np.array(pcd.normals)
normals_map = np.zeros((density.shape[0], density.shape[1], 3))
import time
start_time = time.time()
for i, unique_coord in enumerate(unique_coordinates):
    # print(normals[unique_ind])
    normals_indcs = np.argwhere(np.all(coordinates[::10] == unique_coord, axis=1))[:,0]
    normals_map[unique_coordinates[i, 1], unique_coordinates[i, 0], :] = np.mean(normals[::10][normals_indcs, :], axis=0)
print("Time for normals: ", time.time() - start_time)
normals_map = (np.clip(normals_map,0,1) * 255).astype(np.uint8)

  • Why normals[::10] here? I guess it is a kind of sampling operation on the dense point cloud. How did you decide on the value 10 for this hyperparameter? I personally found that a lower number such as 5 works better for my data.
  • On L175, the normal values are clipped to [0, 1] and then scaled to [0, 255]. However, the original normal values lie in [-1, 1]. Why are the negative normal values ignored?

rgb = np.maximum(density, normal)

  • Density input is a 3-channel copy of a single 256x256x1 array. I wonder what's the purpose of taking a np.maximum with the 3-dimensional (x, y, z) normals map?
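The clipping question above can be illustrated on a few sample values. This is a minimal sketch with made-up numbers: it contrasts the clip-then-scale step quoted in the snippet with a linear remap of [-1, 1] to [0, 255]. The remap is a common convention for storing normals as 8-bit images and is shown only for comparison; it is an assumption here, not what the repo does.

```python
import numpy as np

# Made-up normal components covering the full [-1, 1] range.
normals = np.array([[-1.0, -0.5, 0.0],
                    [0.5, 1.0, 0.0]])

# Clip-then-scale, as in the quoted snippet: all negative components become 0.
clipped = (np.clip(normals, 0, 1) * 255).astype(np.uint8)

# Alternative for comparison (an assumption, not the repo's code):
# linearly remap [-1, 1] to [0, 255] so the sign information survives.
remapped = ((normals + 1.0) / 2.0 * 255).astype(np.uint8)

print(clipped)
print(remapped)
```

With clipping, -1.0, -0.5, and 0.0 all map to the same value, whereas the remap keeps them distinct.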

about 'extract_regions'

Hi, I found that there is a postprocessing step after getting the planar graph, which extracts rooms from the edges. But I cannot find a detailed description of this part in the paper. Can you explain how it works?
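For readers with the same question, one generic way to extract regions from a planar graph is face tracing: sort the neighbors of every corner by angle, then walk each directed edge, always turning to the adjacent neighbor, until the walk closes. The sketch below illustrates that textbook technique only. It is not taken from the repo's extract_regions code, the function names are hypothetical, and it assumes a connected planar graph without crossing or dangling edges.

```python
import math
from collections import defaultdict


def signed_area(corners, face):
    """Shoelace formula over the corner indices in face."""
    total = 0.0
    for k, i in enumerate(face):
        j = face[(k + 1) % len(face)]
        total += corners[i][0] * corners[j][1] - corners[j][0] * corners[i][1]
    return total / 2.0


def extract_faces(corners, edges):
    """Return bounded faces (candidate rooms) as lists of corner indices."""
    # Adjacency lists, sorted counter-clockwise by outgoing edge angle.
    neighbors = defaultdict(list)
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    for v, nbrs in neighbors.items():
        vx, vy = corners[v]
        nbrs.sort(key=lambda w: math.atan2(corners[w][1] - vy,
                                           corners[w][0] - vx))

    visited = set()  # directed edges already assigned to a face
    faces = []
    for i, j in edges:
        for start in ((i, j), (j, i)):
            if start in visited:
                continue
            face = []
            u, v = start
            while (u, v) not in visited:
                visited.add((u, v))
                face.append(u)
                nbrs = neighbors[v]
                # Continue along the neighbor just before u in CCW order,
                # which keeps the traced face on the left of each edge.
                w = nbrs[nbrs.index(u) - 1]
                u, v = v, w
            # With this turning rule, bounded faces have positive area and
            # the single unbounded outer face has negative area.
            if signed_area(corners, face) > 0:
                faces.append(face)
    return faces


if __name__ == "__main__":
    # Two unit rooms sharing one wall (edge 1-4).
    corners = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 4)]
    print(extract_faces(corners, edges))
```

Each directed edge belongs to exactly one face, so every edge is visited twice in total and the tracing itself is linear in the number of edges.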

Ground truth of Structured3D

Hi, thanks for your great work! I have a question regarding the ground truth of the Structured3D floorplans. For the quantitative evaluation of floorplan reconstruction, it is mentioned that you also used the data used by MonteFloor (downloaded from here). I downloaded the data and found that the ground truth corners are actually preprocessed. Below is a comparison (scene_00030) between the GT label you used and the GT I generated from the Structured3D annotations. It shows that in the GT label you used, corners and edges of adjacent rooms are merged. Could you please explain how the adjacent rooms are merged? Another thing is that the GT label seems to have been re-annotated (see bottom right). Could you please also explain this? It would be great if you could provide the preprocessing script. Thanks in advance!

[Image: GT_stru3d]

Threshold for input normals

Hi there,
Thank you very much for sharing your work. It is amazing!

I was going through the inference script and I noticed that you use normals along with the density maps, and that these normals take only the values 0 or 255. However, the normals calculated during data generation range between 0 and 255. May I know what threshold (between 0 and 255) you used to get the normal maps for your inputs? It would be really nice to get an idea here. Looking forward to your response!

PIL trouble

I want confirmation about the warning on having Pillow and PIL in the same environment as I'm trying to perform a transfer training, and I'm receiving this message:

ImportError: cannot import name '_imaging' from 'PIL' (/home/jovyan/.local/lib/python3.8/site-packages/PIL/__init__.py)

after trying to run this line:

python train.py --exp_dataset outdoor --epochs 800 --lr_drop 600 --batch_size 16 --output_dir ./checkpoints/ckpts_heat_outdoor_256_ENS --image_size 256 --max_corner_num 150 --lambda_corner 0.05 --run_validation --resume ./checkpoints/ckpts_heat_outdoor_256
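One commonly reported cause of this kind of '_imaging' import error is more than one PIL-providing package (or a partially overwritten install) in the same environment, though that may not be the cause here. The following diagnostic sketch, which assumes Python 3.8+ for importlib.metadata, lists the installed distributions that provide PIL. The helper name and the set of package names checked are assumptions for illustration.

```python
from importlib import metadata

# Distribution names that install a top-level "PIL" package
# (an illustrative, possibly incomplete list).
PIL_PROVIDERS = {"pil", "pillow", "pillow-simd"}


def pil_related_distributions():
    """Return sorted (name, version) pairs of installed PIL providers."""
    found = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and name.lower() in PIL_PROVIDERS:
            found.append((name, dist.version))
    return sorted(found)


if __name__ == "__main__":
    dists = pil_related_distributions()
    for name, version in dists:
        print(name, version)
    if len(dists) > 1:
        print("More than one PIL provider found; they may overwrite each other.")
```

If several entries show up, uninstalling all of them and reinstalling a single Pillow version is a reasonable first thing to try.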

If the dataset has large average number of corners

Thank you very much for your open source code. I built floorplans on Structured3D with very good results! Now I am running the algorithm on my own, more complex dataset, whose average/maximum numbers of corners are 300/800 (Structured3D is 22/52). The algorithm uses a huge amount of GPU memory and fails, maybe because the initial number of edges is set to the square of the number of corners (O(N^2))? Do you have any good solutions? Looking forward to your reply!
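To put the O(N^2) concern in numbers, the sketch below counts edge candidates for the corner counts quoted above. It assumes that every unordered pair of corners starts out as one edge candidate, which is this issue's reading of the method rather than something verified against the code.

```python
def num_edge_candidates(num_corners):
    """Number of unordered corner pairs, N * (N - 1) / 2.

    Assumes one initial edge candidate per corner pair (see the note above).
    """
    return num_corners * (num_corners - 1) // 2


if __name__ == "__main__":
    for n in (22, 52, 300, 800):
        print(n, "corners ->", num_edge_candidates(n), "edge candidates")
    # 22 corners -> 231 edge candidates
    # 52 corners -> 1326 edge candidates
    # 300 corners -> 44850 edge candidates
    # 800 corners -> 319600 edge candidates
```

Under that assumption, going from a maximum of 52 corners to 800 multiplies the number of candidates by roughly 240, before any attention cost between candidates is considered.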

How to use the model on a new dataset

I hope this email finds you well. My name is Carlos Campoverde, and I am a student at ITC-UT. I am reaching out to express my excitement about your model and the exceptional work you did on "HEAT: Holistic Edge Attention Transformer for Structured Reconstruction."

Currently, I am conducting research for my master's thesis "AUTOMATIC BUILDING ROOF PLANE STRUCTURE EXTRACTION FROM REMOTE SENSING DATA" and would like to explore the application of your model for plane roof extraction. I have conducted experiments on your dataset by following the instructions outlined in the README file. However, I am unsure about how to apply your model to new images and would appreciate your guidance.

I am writing to kindly ask for your assistance in using your model to make inferences on a new dataset of building images, each of size 256x256. Your valuable help would be greatly appreciated.
Thank you for your time and consideration.

HEAT performance on other datasets

Thank you for releasing your code!
I notice that you only compared its performance on two tasks: outdoor architecture reconstruction and floorplan reconstruction.
Did you try HEAT on other datasets, like YorkUrban and Wireframe?

error in ms_deformable_im2col_cuda: invalid device function

Thanks for sharing the code!
But I'm having some problems trying to use this code. You mentioned in readme.txt that the Python version used is 3.7, but after I run

$ conda create -n try python=3.7
$ conda activate try
$ pip install -r requirements.txt

I met an error:

[Image: screenshot of the error]

So I changed to version 3.8. After this I compiled without problems, but when trying to run:

$ python infer.py --checkpoint_path ./checkpoints/ckpts_heat_s3d_256/checkpoint.pth --dataset s3d_floorplan --image_size 256 --viz_base ./results/viz_heat_s3d_256 --save_base ./results/npy_heat_s3d_256

I met a new error:

[Image: screenshot of the error]

This looks like a version mismatch issue.
So I would like to ask about the version of CUDA, cuDNN, pytorch and cudatoolkit in the environment where you run the code, so that it can run successfully, thank you very much!
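When reporting or debugging a mismatch like this, it helps to collect the relevant versions in one place. The sketch below is a hypothetical helper that gathers the Python version and, if PyTorch is installed, the PyTorch version, the CUDA version PyTorch was built with, the cuDNN version, and the GPU compute capability. An "invalid device function" error often indicates that a CUDA extension was compiled for a different GPU architecture than the one it runs on, but that is a general observation and not a confirmed diagnosis of this issue.

```python
import platform


def environment_report():
    """Collect version info relevant to building the CUDA ops."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda        # CUDA version PyTorch was built with
    info["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(key + ":", value)
```

Comparing cuda_build with the toolkit version reported by nvcc --version, and recompiling models/ops after any change, is a reasonable first check.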
