forknet's Introduction

ForkNet: Multi-branch Volumetric Semantic Completion from a Single Depth Image

The implementation of our paper accepted at ICCV 2019 (IEEE International Conference on Computer Vision).

Authors: Yida Wang, David Tan, Nassir Navab and Federico Tombari

ForkNet

ForkNet is based on a single encoder and three separate generators that reconstruct different geometric and semantic representations of the original and completed scene, all sharing the same latent space.
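
The branching layout can be sketched roughly as follows (a minimal illustration, not the authors' released code; the layer widths, kernel sizes, and latent size are assumptions, while the input resolution and the 12 semantic classes come from this README):

import tensorflow as tf
from tensorflow.keras import layers

def build_forknet_sketch(vox_shape=(80, 48, 80, 1), n_classes=12, latent_channels=16):
    # shared encoder: strided 3D convolutions compress the TSDF input
    # into a single latent volume
    tsdf_in = tf.keras.Input(shape=vox_shape)
    x = layers.Conv3D(32, 4, strides=2, padding='same', activation='relu')(tsdf_in)
    x = layers.Conv3D(64, 4, strides=2, padding='same', activation='relu')(x)
    latent = layers.Conv3D(latent_channels, 4, strides=2, padding='same')(x)

    def branch(channels, name):
        # each generator upsamples the shared latent volume independently
        y = layers.Conv3DTranspose(64, 4, strides=2, padding='same', activation='relu')(latent)
        y = layers.Conv3DTranspose(32, 4, strides=2, padding='same', activation='relu')(y)
        return layers.Conv3DTranspose(channels, 4, strides=2, padding='same', name=name)(y)

    outputs = [branch(1, 'completed_tsdf'),
               branch(n_classes, 'surface_semantics'),
               branch(n_classes, 'volume_semantics')]
    return tf.keras.Model(tsdf_in, outputs)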

[Animated results on NYU and ShapeNet]

Results of registration

[Animated registration results, slow and fast]

Data preprocessing

Depth image to TSDF volumes

First, go to the depth-tsdf folder to compile our depth converter; cmake and make are the suggested tools for building the code.

cmake . # configure
make # compiles demo executable

Once the back-project executable is compiled, depth images from the NYU or SUNCG datasets can be converted into TSDF volumes in parallel.

CUDA_VISIBLE_DEVICES=0 python2 data/depth_backproject.py -s /media/wangyida/SSD2T/database/SUNCG_Yida/train/depth_real_png -tv /media/wangyida/HDD/database/SUNCG_Yida/train/depth_tsdf_camera_npy -tp /media/wangyida/HDD/database/SUNCG_Yida/train/depth_tsdf_pcd
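
A quick way to verify one converted volume (a sketch, not part of the repo; the filename and the expected [80, 48, 80] resolution are assumptions):

import numpy as np

# load one converted TSDF volume and inspect it
tsdf = np.load('depth_tsdf_camera_npy/sample.npy')  # hypothetical filename
print(tsdf.shape)              # expected: (80, 48, 80)
print(tsdf.min(), tsdf.max())  # TSDF values are truncated signed distances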

Semantic volumes used for training

We further convert the binary files from the SUNCG and NYU datasets into numpy arrays of dimension [80, 48, 80] with 12 semantic channels. These voxel data are used as the training ground truth. Note that our data is provided in numpy array format, converted from the original binary data.

python2 data/depthbin2npy.py -s /media/wangyida/HDD/database/SUNCGtrain_1001_2000  -tv /media/wangyida/HDD/database/SUNCG_Yida/train/voxel_semantic_npy &
python2 data/depthbin2npy.py -s /media/wangyida/HDD/database/SUNCGtrain_501_1000  -tv /media/wangyida/HDD/database/SUNCG_Yida/train/voxel_semantic_npy &
python2 data/depthbin2npy.py -s /media/wangyida/HDD/database/SUNCGtrain_1_1000  -tv /media/wangyida/HDD/database/SUNCG_Yida/train/voxel_semantic_npy &
python2 data/depthbin2npy.py -s /media/wangyida/HDD/database/SUNCGtrain_1001_3000  -tv /media/wangyida/HDD/database/SUNCG_Yida/train/voxel_semantic_npy &
python2 data/depthbin2npy.py -s /media/wangyida/HDD/database/SUNCGtrain_3001_5000  -tv /media/wangyida/HDD/database/SUNCG_Yida/train/voxel_semantic_npy &
python2 data/depthbin2npy.py -s /media/wangyida/HDD/database/SUNCGtrain_1_500  -tv /media/wangyida/HDD/database/SUNCG_Yida/train/voxel_semantic_npy &
python2 data/depthbin2npy.py -s /media/wangyida/HDD/database/SUNCGtrain_5001_7000  -tv /media/wangyida/HDD/database/SUNCG_Yida/train/voxel_semantic_npy &
python2 data/depthbin2npy.py -s /media/wangyida/HDD/database/depthbin_NYU_SUNCG/SUNCGtest_49700_49884 -tv /media/wangyida/HDD/database/SUNCG_Yida/test/voxel_semantic_npy &
python2 data/depthbin2npy.py -s /media/wangyida/HDD/database/depthbin_NYU_SUNCG/NYUtrain -tv /media/wangyida/HDD/database/NYU_Yida/train/voxel_semantic_npy &
python2 data/depthbin2npy.py -s /media/wangyida/HDD/database/depthbin_NYU_SUNCG/NYUtest -tv /media/wangyida/HDD/database/NYU_Yida/test/voxel_semantic_npy &
wait
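
After the conversion finishes, a similar sanity check can be run on one ground-truth volume (a sketch; the filename is hypothetical, and we assume the arrays store per-voxel class labels rather than one-hot channels):

import numpy as np

vox = np.load('voxel_semantic_npy/sample.npy')  # hypothetical filename
print(vox.shape)       # expected: (80, 48, 80)
print(np.unique(vox))  # expected: labels covering the 12 semantic classes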

Train and Test

Training

CUDA_VISIBLE_DEVICES=0 python3 main.py --mode train --discriminative True

Testing

Listing samples

First, a list of the sample names is needed. You can generate it easily on Linux using find. Assuming that all the testing samples are located in /media/wangyida/HDD/database/050_200/test/train, a test_fusion.list is generated by

find /media/wangyida/HDD/database/050_200/test/train -name '*.npy' > test_fusion.list

Then the path prefix /media/wangyida/HDD/database/050_200/test/train should be removed from the .list file. This can easily be done in Vim using

:%s/\/media\/wangyida\/HDD\/database\/050_200\/test\/train\///gc
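
Equivalently, the prefix can be stripped with a short Python snippet (a sketch, not part of the repo):

# remove the absolute path prefix from every entry in test_fusion.list
prefix = '/media/wangyida/HDD/database/050_200/test/train/'
with open('test_fusion.list') as f:
    names = [line.replace(prefix, '') for line in f]
with open('test_fusion.list', 'w') as f:
    f.writelines(names)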

Inference

We provide a compact version of ForkNet, which is only 25 MB, in the pretrained_model folder. For the non-discriminative model (note that this model is sparser):

CUDA_VISIBLE_DEVICES=1 python main.py --mode evaluate_recons --conf_epoch 59

Otherwise, for the discriminative model:

CUDA_VISIBLE_DEVICES=1 python main.py --mode evaluate_recons --conf_epoch 37 --discriminative True

where '--conf_epoch' indicates the index of the pretrained model.

Architecture

The overall architecture combines one encoder, which takes a TSDF volume as input, with three decoders.

Qualitatives

The NYU dataset is composed of 1,449 indoor depth images captured with a Kinect depth sensor.

Generated synthetic samples

We build the new dataset by sampling features directly from the latent space, which generates a pair consisting of a partial volumetric surface and a completed volumetric semantic surface. Given one latent sample, we can use two decoders to generate a TSDF volume and a semantic scene separately.
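
As a rough illustration of this pairing (a self-contained sketch, not the released code; the latent volume size and layer widths are assumptions), one latent sample can be pushed through two decoder branches to obtain a paired output:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

latent_in = tf.keras.Input(shape=(10, 6, 10, 16))  # assumed latent volume size

def branch(channels, name):
    # each decoder upsamples the same latent sample back to [80, 48, 80]
    y = layers.Conv3DTranspose(32, 4, strides=2, padding='same', activation='relu')(latent_in)
    y = layers.Conv3DTranspose(16, 4, strides=2, padding='same', activation='relu')(y)
    return layers.Conv3DTranspose(channels, 4, strides=2, padding='same', name=name)(y)

generator = tf.keras.Model(latent_in, [branch(1, 'tsdf'), branch(12, 'semantics')])
z = np.random.normal(size=(1, 10, 6, 10, 16)).astype('float32')
tsdf_sample, semantic_sample = generator.predict(z)  # one paired synthetic sample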

If you find this work useful in your research, please cite:

@inproceedings{wang2019forknet,
  title={ForkNet: Multi-branch Volumetric Semantic Completion from a Single Depth Image},
  author={Wang, Yida and Tan, David Joseph and Navab, Nassir and Tombari, Federico},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={8608--8617},
  year={2019}
}


forknet's Issues

How to get "surface_semantic_npy" ?

Hello!

I want to retrain the ForkNet network for scene reconstruction, but I have some problems. Could you help me solve them?

In config.py, you use surface_semantic_npy for training. I tried to use data/depthbin2surface.py to generate the data, but the script ran into errors (out of memory).

Could you give some instructions on how to get "surface_semantic_npy"?

When do you plan to release the data?

Hi, thank you for sharing. There are many commands like
python2 data/depthbin2npy.py -s /media/wangyida/HDD/database/SUNCGtrain_1001_2000 -tv /media/wangyida/HDD/database/SUNCG_Yida/train/voxel_semantic_npy &
but there is no data or download link in the repo. Could you provide a download link for the train/test data? Thank you!

Some problems in the test phase using the pretrained model

Hello,
When I tested five images from SUNCG, I found that the results were mostly empty.
I have successfully generated the related files in depth_tsdf_camera_npy, surface_semantic_npy, and voxel_semantic_npy.
What do you think is the reason the output is mostly empty? Could this be a problem with the pretrained model? And what is the difference between the small model and the large one?
Looking forward to your reply
Best wishes

The results are as follows:
Completion
IoU of empty:0.919
IoU of objec:0.01
IoU average: 0.464

Geometric segmentation
IoU of empty:0.0
/home/lt/Documents/ForkNet/forknet-master/evaluate.py:45: RuntimeWarning: invalid value encountered in double_scalars
IoU_calc = np.round(child / mother, 3)
IoU of ceili:nan
IoU of floor:nan
IoU of wall:nan
IoU of windo:nan
IoU of chair:nan
IoU of bed:nan
IoU of sofa:nan
IoU of table:nan
IoU of tvs:nan
IoU of furni:nan
IoU of objec:nan
IoU average: nan

Generative segmentation
IoU of empty:0.0
IoU of ceili:nan
IoU of floor:nan
IoU of wall:nan
IoU of windo:nan
IoU of chair:nan
IoU of bed:nan
IoU of sofa:nan
IoU of table:nan
IoU of tvs:nan
IoU of furni:nan
IoU of objec:nan
IoU average: nan

Geometric semantic completion
IoU of empty:0.956
IoU of ceili:0.0
IoU of floor:0.0
IoU of wall:0.0
IoU of windo:0.0
IoU of chair:0.0
IoU of bed:0.0
IoU of sofa:nan
IoU of table:0.0
IoU of tvs:0.0
IoU of furni:0.0
IoU of objec:0.0
IoU average: nan

Generative semantic completion
IoU of empty:0.95
IoU of ceili:0.033
IoU of floor:0.0
IoU of wall:0.0
IoU of windo:0.0
IoU of chair:0.0
IoU of bed:0.0
IoU of sofa:nan
IoU of table:0.0
IoU of tvs:0.0
IoU of furni:0.0
IoU of objec:0.0
IoU average: nan

Solid generative semantic completion
IoU of empty:0.952
IoU of ceili:0.028
IoU of floor:0.0
IoU of wall:0.0
IoU of windo:0.0
IoU of chair:0.0
IoU of bed:0.0
IoU of sofa:nan
IoU of table:0.0
IoU of tvs:0.0
IoU of furni:0.0
IoU of objec:0.0
IoU average: nan

A question about data preprocessing

Hi Yida, I am new to 3D computer vision and want to learn from your project. When preparing the data, I found that I can no longer obtain the SUNCG data, so I prepared the NYU-V2 data (labeled dataset: 2.5 GB). I found that I can extract the depth images and RGB images from it, and then use the open3d library to produce TSDF data in PLY format. Reading your data-processing scripts, I see that you handle everything in a unified way very conveniently, but I don't quite understand how it is done. I hope you can explain; thank you!

The version of tensorflow

Dear Yida,
I'm glad to learn about your work and I'm studying your scripts, but I'm having trouble with the experimental environment. I have tried several versions of TensorFlow but still found a mismatch.
Could you tell me which version of TensorFlow you used?
Thank you very much.

How to get "depth_tsdf_bin" ?

Hello,
I have some problems when using depth_backproject.py. I found that it depends on depth_tsdf_bin. In your demo, there is a folder named depth_tsdf_bin under "/forknet-master/depth-tsdf/data". How can I get these bin files if I use other depth images?

What's more, I ran into some reshape errors at "vox_max = np.reshape(checkVox, (80, 48, 80))" in depthbin2surface.py, and I don't know what I did wrong. To generate the file normally, I made the following modifications by referring to the relevant code of VVNet. I'm still curious how you did it. Can you help me?

Looking forward to your reply. Thanks a lot.

The changes I made:

import numpy as np

# checkVoxValIter, seg_class_map and semantic_down_sample_voxel come
# from the surrounding script (adapted from VVNet)
vox_size = np.array([240, 144, 240])
scaled_vox_size = np.array([80, 48, 80])
# decode the run-length-encoded (label, count) pairs from the .bin file
labels, vox_nums = [np.squeeze(x) for x in np.split(
    np.array(checkVoxValIter).reshape([-1, 2]), 2, axis=1)]
full_voxel = np.full(vox_size, 37, np.uint8).reshape([-1])
offset = 0
for label, vox_num in zip(labels, vox_nums):
    if label != 255:
        full_voxel[offset:offset + vox_num] = label
    offset += vox_num
full_voxel = np.take(seg_class_map, full_voxel)
full_voxel = np.reshape(full_voxel, vox_size)
final_voxel = semantic_down_sample_voxel(full_voxel, scaled_vox_size)
vox_max = final_voxel

Padding convolution inputs with tf.pad in SYMMETRIC mode

Modify the 12 Conv3D operators in the network whose padding is non-zero as follows:

Pad the input of each Conv3D with the tf.pad(*, "SYMMETRIC") operator, and change the padding attribute of the corresponding conv3d from same to valid.

After the modification, the network must be retrained or fine-tuned.
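
A sketch of the suggested change (hypothetical layer sizes; the pattern, not the exact layers, is what matters):

import tensorflow as tf

x = tf.random.normal([1, 80, 48, 80, 1])
# for a 3x3x3 kernel with stride 1, 'same' padding adds one voxel per side,
# so the symmetric pad below reproduces the output shape of padding='same'
paddings = [[0, 0], [1, 1], [1, 1], [1, 1], [0, 0]]
x_padded = tf.pad(x, paddings, mode='SYMMETRIC')
conv = tf.keras.layers.Conv3D(16, 3, strides=1, padding='valid')
y = conv(x_padded)  # same spatial shape as padding='same' on x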

convert binary data to pcd

I wonder how to convert the binary data to PCD data. There are lots of conversion Python scripts in your project, and I don't know how to use them.
Thanks!
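
For reference, a minimal sketch of one possible conversion (not from the repo; the input path and the occupancy threshold are assumptions) that writes the non-empty voxels of a converted numpy volume to an ASCII PCD file:

import numpy as np

vox = np.load('voxel_semantic_npy/sample.npy')  # hypothetical path
points = np.argwhere(vox > 0)  # coordinates of non-empty voxels (assumed encoding)

with open('sample.pcd', 'w') as f:
    f.write('VERSION 0.7\nFIELDS x y z\nSIZE 4 4 4\nTYPE F F F\nCOUNT 1 1 1\n')
    f.write('WIDTH {}\nHEIGHT 1\nVIEWPOINT 0 0 0 1 0 0 0\n'.format(len(points)))
    f.write('POINTS {}\nDATA ascii\n'.format(len(points)))
    for x, y, z in points:
        f.write('{} {} {}\n'.format(x, y, z))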
