tui-nicr / esanet


ESANet: Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis

License: Other

Python 89.85% Shell 10.15%
semantic-segmentation rgbd deep-learning deep-neural-networks machine-learning mobile-robotics real-time pytorch

esanet's Introduction

ESANet: Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis


You may also want to have a look at our follow-up work EMSANet (multi-task approach, better results for semantic segmentation, and a cleaner, more extensible code base).

This repository contains the code to our paper "Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis" (IEEE Xplore, arXiv).

Our carefully designed network architecture enables real-time semantic segmentation on an NVIDIA Jetson AGX Xavier and is thus well suited as a common initial processing step in a complex system for real-time scene analysis on mobile robots:


(Click on the image to open the YouTube video)

Our approach can also be applied to outdoor scenarios such as Cityscapes:


(Click on the image to open the YouTube video)

This repository contains the code for training and evaluating our networks. Furthermore, we provide code for converting the models to ONNX and TensorRT, as well as for measuring the inference time.

License and Citations

The source code is published under the BSD 3-Clause license, see the license file for details.

If you use the source code or the network weights, please cite the following paper:

Seichter, D., Köhler, M., Lewandowski, B., Wengefeld, T., Gross, H.-M.: Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 13525-13531, 2021.

@inproceedings{esanet2021icra,
  title={Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis},
  author={Seichter, Daniel and K{\"o}hler, Mona and Lewandowski, Benjamin and Wengefeld, Tim and Gross, Horst-Michael},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2021},
  volume={},
  number={},
  pages={13525-13531}
}

@article{esanet2020arXiv,
  title={Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis},
  author={Seichter, Daniel and K{\"o}hler, Mona and Lewandowski, Benjamin and Wengefeld, Tim and Gross, Horst-Michael},
  journal={arXiv preprint arXiv:2011.06961},
  year={2020}
}

Note that the preprint was accepted for publication at the IEEE International Conference on Robotics and Automation (ICRA) 2021.

Setup

  1. Clone repository:

    git clone https://github.com/TUI-NICR/ESANet.git
    
    cd /path/to/this/repository
  2. Set up anaconda environment including all dependencies:

    # create conda environment from YAML file
    conda env create -f rgbd_segmentation.yaml
    # activate environment
    conda activate rgbd_segmentation
  3. Data preparation (training / evaluation / dataset inference):
    We trained our networks on NYUv2, SUNRGB-D, and Cityscapes. The encoders were pretrained on ImageNet. Furthermore, we also pretrained our best model on the synthetic dataset SceneNet RGB-D.

    The folder src/datasets contains the code to prepare NYUv2, SunRGB-D, Cityscapes, SceneNet RGB-D for training and evaluation. Please follow the instructions given for the respective dataset and store the created datasets in ./datasets. For ImageNet, we used TensorFlowDatasets (see imagenet_pretraining.py).

  4. Pretrained models (evaluation):
    We provide the weights for our selected ESANet-R34-NBt1D (with ResNet34 NBt1D backbones) on NYUv2, SUNRGB-D, and Cityscapes:

    Dataset                 | Model                            | mIoU  | FPS*   | URL
    ------------------------|----------------------------------|-------|--------|---------
    NYUv2 (test)            | ESANet-R34-NBt1D                 | 50.30 | 29.7   | Download
    NYUv2 (test)            | ESANet-R34-NBt1D (pre. SceneNet) | 51.58 | 29.7   | Download
    SUNRGB-D (test)         | ESANet-R34-NBt1D                 | 48.17 | 29.7** | Download
    SUNRGB-D (test)         | ESANet-R34-NBt1D (pre. SceneNet) | 48.04 | 29.7** | Download
    Cityscapes (valid half) | ESANet-R34-NBt1D                 | 75.22 | 23.4   | Download
    Cityscapes (valid full) | ESANet-R34-NBt1D                 | 80.09 | 6.2    | Download

    Download and extract the models to ./trained_models.

    *We report the FPS for NVIDIA Jetson AGX Xavier (Jetpack 4.4, TensorRT 7.1, Float16).
    **Note that we only reported the inference time for NYUv2 in our paper as it has more classes than SUNRGB-D. Thus, the FPS for SUNRGB-D can be slightly higher (37 vs. 40 classes).

Content

There are subsections for the different things you can do:

Evaluation

To reproduce the mIoUs reported in our paper, use eval.py.

Note that building the model correctly depends on the respective dataset the model was trained on. Passing no additional model arguments to eval.py defaults to evaluating our ESANet-R34-NBt1D on either NYUv2 or SUNRGB-D. For Cityscapes, the parameters differ; you will find an argsv_*.txt file next to the network weights listing the required arguments.

Examples:

  • To evaluate our ESANet-R34-NBt1D trained on NYUv2, run:

    python eval.py \
        --dataset nyuv2 \
        --dataset_dir ./datasets/nyuv2 \
        --ckpt_path ./trained_models/nyuv2/r34_NBt1D.pth
     
    # Camera: kv1 mIoU: 50.30
    # All Cameras, mIoU: 50.30
  • To evaluate our ESANet-R34-NBt1D trained on SUNRGB-D, run:

    python eval.py \
        --dataset sunrgbd \
        --dataset_dir ./datasets/sunrgbd \
        --ckpt_path ./trained_models/sunrgbd/r34_NBt1D.pth
    
    # Camera: realsense mIoU: 32.42
    # Camera: kv2 mIoU: 46.28
    # Camera: kv1 mIoU: 53.39
    # Camera: xtion mIoU: 41.93
    # All Cameras, mIoU: 48.17
  • To evaluate our ESANet-R34-NBt1D trained on Cityscapes, run:

    # half resolution (1024x512)
    python eval.py \
        --dataset cityscapes-with-depth \
        --dataset_dir ./datasets/cityscapes \
        --ckpt_path ./trained_models/cityscapes/r34_NBt1D_half.pth \
        --height 512 \
        --width 1024 \
        --raw_depth \
        --context_module appm-1-2-4-8
    
    # Camera: camera1 mIoU: 75.22
    # All Cameras, mIoU: 75.22  
    
    
    # full resolution (2048x1024)
    # note that the model is created and was trained on half resolution; only
    # the evaluation is done at full resolution
    python eval.py \
        --dataset cityscapes-with-depth \
        --dataset_dir ./datasets/cityscapes \
        --ckpt_path ./trained_models/cityscapes/r34_NBt1D_full.pth \
        --height 512 \
        --width 1024 \
        --raw_depth \
        --context_module appm-1-2-4-8 \
        --valid_full_res
    
    # Camera: camera1 mIoU: 80.09
    # All Cameras, mIoU: 80.09
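
For reference, the mIoU values above are derived from a per-class confusion matrix accumulated over the test images. The following minimal sketch shows the standard computation of the metric; it is only an illustration, not the repository's own implementation in eval.py or src/confusion_matrix.py.

    import numpy as np

    def miou_from_confusion_matrix(cm):
        # cm[i, j]: number of pixels with ground-truth class i predicted as class j
        tp = np.diag(cm).astype(np.float64)
        fp = cm.sum(axis=0) - tp
        fn = cm.sum(axis=1) - tp
        denom = tp + fp + fn
        present = denom > 0                 # skip classes that never occur
        iou = tp[present] / denom[present]
        return iou.mean()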

Inference

We provide scripts for inference on both sample input images (inference_samples.py) and samples drawn from one of our used datasets (inference_dataset.py).

Note that building the model correctly depends on the respective dataset the model was trained on. As with eval.py, passing no additional model arguments defaults to our ESANet-R34-NBt1D on NYUv2 or SUNRGB-D; for Cityscapes the parameters differ, and you will find an argsv_*.txt file next to the network weights listing the required arguments.

Dataset Inference

Use inference_dataset.py to apply a trained model to samples drawn from one of our used datasets:

Example: To apply ESANet-R34-NBt1D trained on SUNRGB-D to samples from SUNRGB-D, run:

# note that the entire first batch is visualized, so larger batch sizes result
# in smaller images in the plot
python inference_dataset.py \
    --dataset sunrgbd \
    --dataset_dir ./datasets/sunrgbd \
    --ckpt_path ./trained_models/sunrgbd/r34_NBt1D_scenenet.pth \
    --batch_size 4

Sample Inference

Use inference_samples.py to apply a trained model to the samples given in ./samples.

Note that the dataset argument is required to determine the correct preprocessing and the class colors. However, you do not need to prepare the respective dataset. Furthermore, depending on the given depth images and the used dataset for training, an additional depth scaling might be necessary.

Examples:

  • To apply our ESANet-R34-NBt1D trained on SUNRGB-D to the samples, run:

    python inference_samples.py \
        --dataset sunrgbd \
        --ckpt_path ./trained_models/sunrgbd/r34_NBt1D.pth \
        --depth_scale 1 \
        --raw_depth

    img

  • To apply our ESANet-R34-NBt1D trained on NYUv2 to the samples, run:

    python inference_samples.py \
        --dataset nyuv2 \
        --ckpt_path ./trained_models/nyuv2/r34_NBt1D.pth \
        --depth_scale 0.1 \
        --raw_depth

    img

Time Inference

We timed the inference on an NVIDIA Jetson AGX Xavier with Jetpack 4.4 (TensorRT 7.1.3, PyTorch 1.4.0).

Reproducing the timings on an NVIDIA Jetson AGX Xavier with Jetpack 4.4 requires some additional setup on the device.

Subsequently, you can run inference_time.sh to reproduce the reported timings for ESANet.

The inference time of a single model can be computed with inference_time_whole_model.py.

Example: To reproduce the timings of our ESANet-R34-NBt1D trained on NYUv2, run:

python3 inference_time_whole_model.py \
    --dataset nyuv2 \
    --no_time_pytorch \
    --no_time_onnxruntime \
    --trt_floatx 16

Note that Jetpack versions earlier than 4.4 either fail completely or produce deviating outputs due to differently handled upsampling.

To reproduce the timings of the other models we compared to in our paper, follow the instructions given in src/models/external_code.

Training

Use train.py to train ESANet on NYUv2, SUNRGB-D, Cityscapes, or SceneNet RGB-D (or implement your own dataset by following the implementation of the provided datasets). The arguments default to training ESANet-R34-NBt1D on NYUv2 with the hyper-parameters from our paper. Thus, they could be omitted but are presented here for clarity.

Note that training ESANet-R34-NBt1D requires the pretrained weights for the encoder backbone ResNet-34 NBt1D. You can download our pretrained weights on ImageNet from Link. Otherwise, you can use imagenet_pretraining.py to create your own pretrained weights.

Examples:

  • Train our ESANet-R34-NBt1D on NYUv2 (except for the dataset arguments, also valid for SUNRGB-D):

    # either specify all arguments yourself
    python train.py \
        --dataset nyuv2 \
        --dataset_dir ./datasets/nyuv2 \
        --pretrained_dir ./trained_models/imagenet \
        --results_dir ./results \
        --height 480 \
        --width 640 \
        --batch_size 8 \
        --batch_size_valid 24 \
        --lr 0.01 \
        --optimizer SGD \
        --class_weighting median_frequency \
        --encoder resnet34 \
        --encoder_block NonBottleneck1D \
        --nr_decoder_blocks 3 \
        --modality rgbd \
        --encoder_decoder_fusion add \
        --context_module ppm \
        --decoder_channels_mode decreasing \
        --fuse_depth_in_rgb_encoder SE-add \
        --upsampling learned-3x3-zeropad
    
    # or use the default arguments
    python train.py \
        --dataset nyuv2 \
        --dataset_dir ./datasets/nyuv2 \
        --pretrained_dir ./trained_models/imagenet \
        --results_dir ./results
  • Train our ESANet-R34-NBt1D on Cityscapes:

    # note that some parameters are different
    python train.py \
        --dataset cityscapes-with-depth \
        --dataset_dir ./datasets/cityscapes \
        --pretrained_dir ./trained_models/imagenet \
        --results_dir ./results \
        --raw_depth \
        --he_init \
        --aug_scale_min 0.5 \
        --aug_scale_max 2.0 \
        --valid_full_res \
        --height 512 \
        --width 1024 \
        --batch_size 8 \
        --batch_size_valid 16 \
        --lr 1e-4 \
        --optimizer Adam \
        --class_weighting None \
        --encoder resnet34 \
        --encoder_block NonBottleneck1D \
        --nr_decoder_blocks 3 \
        --modality rgbd \
        --encoder_decoder_fusion add \
        --context_module appm-1-2-4-8 \
        --decoder_channels_mode decreasing \
        --fuse_depth_in_rgb_encoder SE-add \
        --upsampling learned-3x3-zeropad

For further information, use python train.py --help or take a look at src/args.py.

To analyze the model structure, use model_to_onnx.py with the same arguments to export an ONNX model file, which can be nicely visualized using Netron.
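
For orientation, the following minimal sketch shows what such an export boils down to in plain PyTorch. It is only an illustration under assumptions (input names 'rgb' and 'depth', 480x640 inputs, opset 11), not the contents of model_to_onnx.py itself:

    import torch

    def export_rgbd_model_to_onnx(model, onnx_path="model.onnx", height=480, width=640):
        # dummy inputs: one RGB image and one single-channel depth image
        model.eval()
        dummy_rgb = torch.randn(1, 3, height, width)
        dummy_depth = torch.randn(1, 1, height, width)
        torch.onnx.export(
            model,
            (dummy_rgb, dummy_depth),
            onnx_path,
            input_names=["rgb", "depth"],
            output_names=["output"],
            opset_version=11,
        )

The resulting .onnx file can then be opened directly in Netron to inspect the graph.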

esanet's People

Contributors

daniels91, moko3016, mona0809


esanet's Issues

No module named 'cityscapesscripts'

Hello, I have run the evaluation process:
"python eval.py
--dataset sunrgbd
--dataset_dir ./datasets/sunrgbd
--ckpt_path ./trained_models/sunrgbd/r34_NBt1D.pth",
and
python inference_dataset.py
--dataset sunrgbd
--dataset_dir ./datasets/sunrgbd
--ckpt_path ./trained_models/sunrgbd/r34_NBt1D_scenenet.pth
--batch_size 4

It always reports "No module named 'cityscapesscripts'" because of this line of code: from cityscapesscripts.helpers.labels import labels.
But I didn't use the Cityscapes dataset. How can I solve it?

Thank you!

load_weight problem

Thank you for your great work!
I am a beginner, and I have met some problems while analyzing your code.
1. When preparing to train your code, I downloaded the weights from the link, named train_r34_NBt1D.pth. When I open the file, the keys look like this:
encoder.conv1.weight <class 'str'>
encoder.bn1.weight <class 'str'>
encoder.bn1.bias <class 'str'>
encoder.bn1.running_mean <class 'str'>
encoder.bn1.running_var <class 'str'>
encoder.bn1.num_batches_tracked <class 'str'>
encoder.layer1.0.conv3x1_1.weight <class 'str'>

Meanwhile, I printed the state_dict of the model, which looks like this:
encoder_rgb.conv1.weight
encoder_rgb.bn1.weight
encoder_rgb.bn1.bias
encoder_rgb.bn1.running_mean
encoder_rgb.bn1.running_var
encoder_rgb.bn1.num_batches_tracked
encoder_rgb.layer1.0.conv1.weight
encoder_rgb.layer1.0.bn1.weight
encoder_rgb.layer1.0.bn1.bias

I find that the keys in the weight file and the keys in model.state_dict() are not the same, so I think the keys do not match. How does model.load_state_dict work in build_model.py?

2. There are two branches in the structure, so there are two ResNet34 branches, but I only provide one set of weights. How do the two branches use a single weight file?
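
For illustration only: one common way to deal with such a key mismatch is to remap the single-branch encoder keys onto both encoder branches before calling load_state_dict, skipping tensors whose shapes differ (e.g. the first convolution of the depth branch, which expects a single input channel). This is a hedged sketch of that general pattern, not the actual logic in build_model.py:

    import torch

    def load_single_encoder_weights_into_rgbd_model(model, ckpt_path):
        checkpoint = torch.load(ckpt_path, map_location='cpu')
        # some checkpoints store the weights under a 'state_dict' key
        state_dict = checkpoint.get('state_dict', checkpoint)

        # duplicate every 'encoder.*' tensor for the RGB and the depth branch
        remapped = {}
        for key, value in state_dict.items():
            if key.startswith('encoder.'):
                suffix = key[len('encoder.'):]
                remapped['encoder_rgb.' + suffix] = value
                remapped['encoder_depth.' + suffix] = value

        # drop tensors whose shapes do not fit (e.g. 3-channel vs. 1-channel conv1)
        model_state = model.state_dict()
        filtered = {k: v for k, v in remapped.items()
                    if k in model_state and v.shape == model_state[k].shape}
        model.load_state_dict(filtered, strict=False)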

Preparing of Dataset

There is also "instances" in nyu_depth_v2_labeled.mat. Why do you load the labels but not the instance masks in nyuv2/prepare_dataset.py or in the NYU class?

Pretrained models

First, thank you for sharing this excellent work with us. I cannot open the links to download the pretrained models; can you send them all to my email? Thank you very much, the email is [email protected].

The low mIoU of validation dataset

I have trained the model for 100 epochs on SUNRGB-D with a ResNet34 pretrained on ImageNet, and the best mIoU is still 10%. Is this normal?

My batch size is 16, and the other hyper-parameters are set to their defaults.

logs.csv doesn't exist

I'm having trouble with training ESANet.

The pretrained weights link (https://drive.google.com/uc?id=1neUb6SJ87dIY1VvrSGxurVBQlH8Pd_Bi) provided here only contains the r34_NBt1D.pth file, not the logs.csv file. So when I run train.py, it shows the following.

FileNotFoundError: [Errno 2] No such file or directory: './trained_models/imagenet/logs.csv'
Where can I get the logs.csv file?
Is there any way other than running imagenet_pretraining.py myself?

Training Error

Hi Team, First of all thank you for providing open source of such good work.

I have the pretrained model in trained_models/sunrgbd/r34_NBt1D.pth and I got eval.py to run successfully, with the mIoU replicating what's in the paper.

My goal is to mess with the parameters in args.py to compare the results to see what I can improve.

I have been trying to train the model only using --modality rgb.
I get the following error upon using train.py:

(rgbd_segmentation) dlee640@dlee640-lenovo:~/ESANet$ python train.py \
>     --dataset sunrgbd \
>     --dataset_dir ./datasets/sunrgbd \
>     --pretrained_dir ./trained_models/sunrgbd \
>     --results_dir ./results \
>     --modality rgb \
> 
Compute class weights
5285/5285
Saved class weights under /home/dlee640/ESANet/src/datasets/sunrgbd/weighting_median_frequency_1+37_train.pickle.
/home/dlee640/ESANet/src/build_model.py:29: UserWarning: Argument --channels_decoder is ignored when --decoder_chanels_mode decreasing is set.
  warnings.warn('Argument --channels_decoder is ignored when '
/home/dlee640/ESANet/src/models/resnet.py:101: UserWarning: parameters groups, base_width and norm_layer are ignored in NonBottleneck1D
  warnings.warn('parameters groups, base_width and norm_layer are '
Loaded r34 with encoder block NonBottleneck1D pretrained on ImageNet
./trained_models/sunrgbd/r34_NBt1D.pth
/home/dlee640/ESANet/src/models/model_one_modality.py:139: UserWarning: for the context module the learned upsampling is not possible as the feature maps are not upscaled by the factor 2. We will use nearest neighbor instead.
  warnings.warn('for the context module the learned upsampling is '
Traceback (most recent call last):
  File "train.py", line 552, in <module>
    train_main()
  File "train.py", line 92, in train_main
    model, device = build_model(args, n_classes=n_classes_without_void)
  File "/home/dlee640/ESANet/src/build_model.py", line 87, in build_model
    upsampling=args.upsampling
  File "/home/dlee640/ESANet/src/models/model_one_modality.py", line 164, in __init__
    num_classes=num_classes
TypeError: __init__() got an unexpected keyword argument 'height'

Why is it giving this error? How can I address this issue?

UPDATE: I fixed this issue by commenting out height and width parameters in model_one_modality.py. The part I commented out is shown below. But I have a new issue.

        # decoder
        self.decoder = Decoder(
            channels_in=channels_after_context_module,
            channels_decoder=channels_decoder,
            activation=self.activation,
            nr_decoder_blocks=nr_decoder_blocks,
            encoder_decoder_fusion=encoder_decoder_fusion,
            # height=height,
            # width=width,
            upsampling_mode=upsampling,
            num_classes=num_classes
        )

Upon running train.py, the process randomly gets terminated with a 'Killed' message.

(rgbd_segmentation) dlee640@dlee640-lenovo:~/ESANet$ python train.py \
>     --dataset sunrgbd \
>     --dataset_dir ./datasets/sunrgbd \
>     --pretrained_dir ./trained_models/sunrgbd \
>     --results_dir ./results \
>     --height 480 \
>     --width 640 \
>     --batch_size 8 \
>     --batch_size_valid 24 \
>     --lr 0.01 \
>     --optimizer SGD \
>     --class_weighting median_frequency \
>     --encoder resnet34 \
>     --encoder_block NonBottleneck1D \
>     --nr_decoder_blocks 3 \
>     --modality rgb \
>     --encoder_decoder_fusion add \
>     --context_module ppm \
>     --decoder_channels_mode decreasing \
>     --fuse_depth_in_rgb_encoder SE-add \
>     --upsampling learned-3x3-zeropad 

Compute class weights
5285/5285
Saved class weights under /home/dlee640/ESANet/src/datasets/sunrgbd/weighting_median_frequency_1+37_train.pickle.
/home/dlee640/ESANet/src/build_model.py:29: UserWarning: Argument --channels_decoder is ignored when --decoder_chanels_mode decreasing is set.
  warnings.warn('Argument --channels_decoder is ignored when '
/home/dlee640/ESANet/src/models/resnet.py:101: UserWarning: parameters groups, base_width and norm_layer are ignored in NonBottleneck1D
  warnings.warn('parameters groups, base_width and norm_layer are '
Loaded r34 with encoder block NonBottleneck1D pretrained on ImageNet
./trained_models/sunrgbd/r34_NBt1D.pth
/home/dlee640/ESANet/src/models/model_one_modality.py:139: UserWarning: for the context module the learned upsampling is not possible as the feature maps are not upscaled by the factor 2. We will use nearest neighbor instead.
  warnings.warn('for the context module the learned upsampling is '
Device: cpu
ESANetOneModality(
  (activation): ReLU(inplace=True)
  (encoder): ResNet(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): NonBottleneck1D(
        (conv3x1_1): Conv2d(64, 64, kernel_size=(3, 1), stride=(1, 1), padding=(1, 0))
        (conv1x3_1): Conv2d(64, 64, kernel_size=(1, 3), stride=(1, 1), padding=(0, 1))
        (bn1): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
...
...
...
...
...
        (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128)
      )
      (side_output): Conv2d(128, 37, kernel_size=(1, 1), stride=(1, 1))
    )
    (conv_out): Conv2d(128, 37, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (upsample1): Upsample(
      (pad): Identity()
      (conv): Conv2d(37, 37, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=37)
    )
    (upsample2): Upsample(
      (pad): Identity()
      (conv): Conv2d(37, 37, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=37)
    )
  )
)
Compute class weights
5050/5050
Saved class weights under /home/dlee640/ESANet/src/datasets/sunrgbd/weighting_linear_1+37_test.pickle.
Using SGD as optimizer
Unfreezing
/home/dlee640/anaconda3/envs/rgbd_segmentation/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:100: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Killed

I get that "Killed" argument at the end and the training abruptly terminates! I cannot find any constructive logs on what actually happened, so my progress is halted. I tried both default training settings and the rgb only settings for training, and the same error happens. What is happening?

Make ESANet work with arbitrary input size/shape

Hi ESANet team!

Fantastic job!

We are trying to change the model to work with input dimension of 256x256.

We have successfully done that and trained the model with training samples with size 256x256.

But when we try to convert the model to ONNX using the provided model_to_onnx.py script, it keeps raising the "ONNX export failed on adaptive_avg_pool2d because output size that are not factor of input size not supported" error.

We have no issue converting the model with the default dimensions (1,3,480,640) and (1,1,480,640).
480x480 and 640x640 appear to work too.

Since the script has no problem converting the model with the default dimensions, we don't think it is an ONNX support issue. And since the model can be trained with samples of size 256x256, we don't think it's the model's issue either. The fact that 480x480 and 640x640 work means the model and the ONNX export can deal with square inputs, so it's not an input size ratio problem.

We wonder why the conversion can only deal with these two magic numbers and how to make it work with other input sizes.
(We are working on a FPS benchmark, so it is important that all the tested models share the same input size)

Thanks! : )

Can I use ESANet with a Realsense-camera in real time with pre-trained models?

First of all, thank you very much for releasing an excellent research result.
I'm trying to do an indoor segmentation project using ESANet, but I can't find a way to use the camera directly in the source code.
By any chance, can you tell me how to connect the camera and do segmentation in real time with ESANet?
I would really appreciate your reply.

compute_class_weights issue for 894 labels

Hello, thank you so much for making the code publicly available. It is very helpful and easy to read and follow. I'm facing an issue with the compute_class_weights function for 894 labels. For 13 and 40 labels it works fine, but for 894 labels it reports that the class weighting contains NaN. I'm not sure what the reason might be. Is it because of the classMapping.mat file? Can you explain this file's role?
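
A hedged guess rather than a confirmed diagnosis: median-frequency class weighting divides the median class frequency by each per-class frequency, so any of the 894 classes that never occurs in the training split yields a division by zero and thus NaN weights. The simplified sketch below illustrates the weighting scheme in general with a guard for zero-count classes; it is not the repository's compute_class_weights implementation:

    import numpy as np

    def median_frequency_weights(pixel_counts):
        # pixel_counts[c]: number of training pixels labelled with class c
        freq = pixel_counts / max(pixel_counts.sum(), 1)
        median = np.median(freq[freq > 0])
        weights = np.zeros_like(freq, dtype=np.float64)
        nonzero = freq > 0
        weights[nonzero] = median / freq[nonzero]   # zero-count classes keep weight 0
        return weights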

The following arguments are required:--ckpt_path

Hi Dear,
Great work!
I installed all required dependencies and datasets, but I am running into the error/issue below:
"eval.py: error: the following arguments are required: --ckpt_path". Please see the details after running "python eval.py --dataset...".

I get this type of error for all commands that include --ckpt_path. Any hint or recommendation to resolve this issue would be highly appreciated.
Thank you very much


(py39) PS C:\ESANet> python eval.py --dataset nyuv2 --dataset_dir ./src/datasets/nyuv2 --ckpt_path ./trained_models/nyuv2/r34_NBt1D.pth
2021-06-24 12:45:50.804733: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
usage: eval.py [-h] [--results_dir RESULTS_DIR] [--last_ckpt PATH] [--pretrained_dir PRETRAINED_DIR] [--pretrained_scenenet PRETRAINED_SCENENET]
[--no_imagenet_pretraining] [--finetune FINETUNE] [--freeze FREEZE] [--batch_size BATCH_SIZE]
[--batch_size_valid BATCH_SIZE_VALID] [--height HEIGHT] [--width WIDTH] [--epochs N] [--lr LR] [--weight_decay WEIGHT_DECAY]
[--momentum M] [--optimizer {SGD,Adam}] [--class_weighting {median_frequency,logarithmic,None}]
[--c_for_logarithmic_weighting C_FOR_LOGARITHMIC_WEIGHTING] [--he_init] [--valid_full_res] [--activation {relu,swish,hswish}]
[--encoder {resnet18,resnet34,resnet50}] [--encoder_block {BasicBlock,NonBottleneck1D}]
[--nr_decoder_blocks NR_DECODER_BLOCKS [NR_DECODER_BLOCKS ...]] [--encoder_depth {resnet18,resnet34,resnet50,None}]
[--modality {rgbd,rgb,depth}] [--encoder_decoder_fusion {add,None}] [--context_module {ppm,None,ppm-1-2-4-8,appm,appm-1-2-4-8}]
[--channels_decoder CHANNELS_DECODER] [--decoder_channels_mode {constant,decreasing}]
[--fuse_depth_in_rgb_encoder {SE-add,add,None}] [--upsampling {nearest,bilinear,learned-3x3,learned-3x3-zeropad}]
[--dataset {sunrgbd,nyuv2,cityscapes,cityscapes-with-depth,scenenetrgbd}] [--dataset_dir DATASET_DIR] [--raw_depth]
[--aug_scale_min AUG_SCALE_MIN] [--aug_scale_max AUG_SCALE_MAX] [-j N] [--debug] --ckpt_path CKPT_PATH
eval.py: error: the following arguments are required: --ckpt_path

the issue about confusion matrix

I'm sorry to bother you again. I don't understand some code in confusion_matrix.py.
1. Where do the numbers [0, 0, 1, 2, 3] and [1, 1, 0, 2, 3] come from? The dataset has more than 4 categories.
2. Why do you provide a prediction before training?
3. Are those numbers just for testing the confusion matrix function?
Thank you!

if __name__ == '__main__':
    # test if pytorch confusion matrix and tensorflow confusion matrix
    # compute the same
    label = np.array([0, 0, 1, 2, 3])
    prediction = np.array([1, 1, 0, 2, 3])

    cm_tf = ConfusionMatrixTensorflow(4)
    cm_pytorch = ConfusionMatrixPytorch(4)
    miou = miou_pytorch(cm_pytorch)

What is the difference between PPM and APPM?

Sorry for asking a lot of questions...
Your work is fantastic! It works great on my own dataset.
I'm now reading the code and trying to understand how some essential parts work.

I have a question about the APPM.
I'm a little confused about the difference between the PPM and the APPM, and why appm-1-2-4-8 gives better results on Cityscapes.
In the code, there is

bin_multiplier_h = int((h / h_inp) + 0.5)
bin_multiplier_w = int((w / w_inp) + 0.5)

I think that h and h_inp are the same, so bin_multiplier_h is 1 and h_pool = bin.
Then it does an adaptive pooling, and the result size is (bin, bin).
That seems the same as PPM.

Could you explain more about the difference? Thank you very much.
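
Not an official answer, but one reading of that snippet: during training h equals h_inp, so the multiplier is 1 and APPM behaves exactly like PPM; when the network is later fed a larger input (e.g. full-resolution Cityscapes although the model was built for half resolution), the multiplier grows and the number of pooling bins scales with it, so each pooled cell still covers roughly the same image region. The sketch below illustrates this interpretation; h_build/w_build stand for the feature-map size the model was constructed for and are assumed names, not the repository's variables:

    import torch.nn.functional as F

    def ppm_pool(x, bin_size):
        # plain PPM branch: pooled output is always (bin_size, bin_size)
        return F.adaptive_avg_pool2d(x, bin_size)

    def appm_pool(x, bin_size, h_build, w_build):
        # adaptive variant: scale the bin count with the ratio between the
        # actual feature size and the size the model was built for
        h, w = x.shape[-2], x.shape[-1]
        mult_h = max(1, int(h / h_build + 0.5))
        mult_w = max(1, int(w / w_build + 0.5))
        return F.adaptive_avg_pool2d(x, (bin_size * mult_h, bin_size * mult_w))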

Training loss, validation loss and mIOU curves

I had a short question. Do you have loss curves for training and validation, as well as mIoU curves with respect to epochs? I am using a custom dataset with an equal distribution of classes in the training and validation sets. I have just 4 classes, but certain classes appear more often than others; the distribution is close to 74%, 11%, 12%, 3%. I've noticed that my validation loss decreases and reaches its lowest around 15% of the total epochs. However, after that it starts increasing slightly and stays around the same range, but the mIoU on the whole shows an upward trend. I just wanted to check if you have loss and mIoU curves for comparison. Thank you. Great job with ESANet! This shouldn't be included in the issues section, but I couldn't find a discussion section to post this.
Best,
Bryan

The results of evaluation

Hello, I have run the evaluation process:
"python eval.py
--dataset sunrgbd
--dataset_dir ./datasets/sunrgbd
--ckpt_path ./trained_models/sunrgbd/r34_NBt1D.pth",
where are the results saved?

onnx inference problem

I convert the model to onnx using the model_to_onnx.py, with the args:--dataset nyuv2 --last_ckpt ./trained_models/nyuv2/r34_NBt1D.pth
the model can be converted to model.onnx,

the input shape is :
NodeArg(name='rgb', type='tensor(float)', shape=[1, 3, 480, 640]) NodeArg(name='depth', type='tensor(float)', shape=[1, 1, 480, 640])

but when I use the ONNX model to do inference, with device = "cpu" (not using CUDA),
an error occurred:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'' Status Message: Input channels C is not equal to kernel channels * group. C: 514 kernel channels: 1024 group: 1

I cannot understand why the channels do not match; can you give me a hint?
here is my inference code:

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

if __name__ == '__main__':
    img_path = "~/dataset/nyuv2/test/rgb/0028.png"
    depth_path = "~/dataset/nyuv2/test/depth_raw/0028.png"
    img = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
    depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)
    if img.ndim == 3:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    depth = depth.astype('float32') * 0.1

    #preprocess
    #not write for  short

    session = onnxruntime.InferenceSession("./onnx_models/model.onnx")
    inputs = {session.get_inputs()[0].name: to_numpy(img), session.get_inputs()[1].name: to_numpy(depth)}
    outs = session.run(None, inputs)[0]

inputs shape:
'rgb': ndarray:(1, 3, 480, 640)
'depth': ndarray: (1, 1, 480, 640)

Is it able to segment unseen object?

Hi,
Is your work able to segment unseen objects, which are not labelled?
For example, if there is a big box in the center of a room and it is not labelled, does ESANet segment it without a class label, or just ignore it?

Train my own dataset without void class

Hello,
I want to train the model on my own dataset, which has no void class.
I changed the code in dataset_base.py and train.py as following.

dataset_base.py:
image

train.py:
image

The training code starts to run, but the mIoU for each epoch is very small, even after 50 epochs.
It seems to go wrong after my changes.
Could you give me some advice about how to make it right?

Thank you.

The mIoU of the TensorRT model on Xavier is only 6.4

I converted it into an engine file according to the described process and evaluated it on the NYUv2 data; I get an mIoU of 6.45. I have tried multiple ways of converting ONNX to TensorRT, but none of them resolved the issue.

About 3D target detection

Sorry to bother you again. I saw that the video you posted seems to include content about 3D object detection. Have you published any related papers, or what method do you use to achieve that?

Cannot dlopen some GPU libraries

no_lib_error

Hi all,
I followed the README for installation, but I get library errors when I train models via train.py. I have checked the TensorFlow-GPU, CUDA, and cuDNN versions. Could you help me figure it out?

Using Multiple GPUs

Hi, I found that only GPU 0 is used in training, which leads to low training efficiency.

tensor is not a torch image

Hello, I have a new problem. I want to test this model on my own samples. I have the RGB images and depth images, but I cannot run inference_samples.py normally; it reports 'tensor is not a torch image'. Can you help me? Thank you!

abc
abc_depth

The factors of 4 loss

I'm studying your network; it is very good and interesting. I spent a lot of time looking for the four factors of the four training losses.
In other words, I cannot find a, b, c, d in: total_training_loss = a * loss_from_the_last_output + b * loss_from_first_decoder_module + c * loss_from_second_decoder_module + d * loss_from_third_decoder_module. What are they, and how can I change them? I'd appreciate it if you could give me a clue. Thank you very much.

can not train on SUNRGBD

I changed the SE module, and I want to train the network on SUNRGB-D.

I ran prepare_datasets.py and extracted the images, depths, and labels.

When I run train.py, it reports:

Compute class weights
Traceback (most recent call last):
File "train.py", line 552, in
train_main()
File "train.py", line 87, in train_main
c=args.c_for_logarithmic_weighting)
File "/data/run01/scv2391/ESANet-main/src/datasets/dataset_base.py", line 169, in compute_class_weights
label = self.load_label(i)
File "/data/run01/scv2391/ESANet-main/src/datasets/sunrgbd/pytorch_dataset.py", line 134, in load_label
label_dir[idx])).astype(np.uint8)
File "/data/home/scv2391/.conda/envs/rgbd_segmentation/lib/python3.7/site-packages/numpy/lib/npyio.py", line 416, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'output_path-sun/SUNRGBD/kv2/kinect2data/000065_2014-05-16_20-14-38_260595134347_rgbf000121-resize\label/label.npy'

But label.npy exists. Is the '\' the cause of the error? How can I correct it? Thank you!
image

Evaluate with SUNRGBD Dataset with the weight file which is trained on Scenenet Dataset

I have some questions about your ESANet model.

I trained your ESANet model on the scenenetrgbd dataset, and I want to evaluate the resulting weight file on the SUNRGB-D dataset. But the SceneNet dataset has only 13 classes, while the SUNRGB-D dataset has 37 classes. Because the numbers of classes do not match, an error occurred. Is it possible to evaluate the SceneNet-pretrained weight file on the SUNRGB-D dataset?

Thank you for reading my issue.

Acceptance of varying size of images

Hello,
I am trying to use this architecture to train on my own dataset for segmentation. I wrote a dataloader for my dataset, which contains RGB images, depth images, and the corresponding segmentation annotations. Training runs OK, but when it comes to validate(), it gives an error saying something like: tensor sizes do not match, 480 is expected but 476 was given.

I know the images in my dataset vary in height and width in both the test and train sets. I did not change the width and height parameters.

I do not understand why it doesn't accept an image with a width/height of 476. How can I make the architecture accept varying image sizes? It does not give an error while training but does during validation, which is also weird to me.

Thanks in advance.

A question about the unit of depth value(m or mm)

Thanks a lot for your work!

I have a question about the unit of the depth values: should it be meters or millimeters?
In the NYUv2 files it seems you change the unit to millimeters, but I did not find a similar step for the Cityscapes data.

cityscapes weights not working

While using the Cityscapes weights, I face a runtime error:
RuntimeError: Error(s) in loading state_dict for ESANet:

RGB inference only

Dear team,
Thank you for this awesome project. My question is can I run segmentation inference with RGB input images only?

I have a question about your ESANet model inference time.

Firstly, I really appreciate your clean and well-organized code.

I have trained the ESANet model with the SUNRGB-D dataset and the SceneNet dataset. For SUNRGB-D, when I ran prepare_dataset.py in the src folder, refined depth data was generated, and when I set the depth mode to 'refined', training used the refined depth dataset. Also, when I downloaded the SceneNet dataset, the depth files were already refined. My question: does the inference time reported in Fig. 4 and Fig. 5 of the paper include the data pre-processing time (turning the raw depth into refined depth)? Without refining the depth data, is it possible to use the model in a real-time application?

Thank you for reading my issue.

lower mIoU than the result of the paper

Hi,
I used the .yml file to create the conda environment and the default settings to train on NYUv2, and the result of 47.48 is much lower than the 50.30 reported in the paper.

Does anyone get a similar result as follows?
mIoU test kv1: 0.4748908361165817
mIoU test: 0.4748908361165817
0.4748908361165817

What do you think is the reason for this?

Do you report the maximum or the average in the paper? Do you use different hyperparameters?

# or use the default arguments
python train.py \
    --dataset nyuv2 \
    --dataset_dir ./datasets/nyuv2 \
    --pretrained_dir ./trained_models/imagenet \
    --results_dir ./results

How to run dataset inference on Cityscapes images

I ran the command: python inference_samples.py --dataset cityscapes --ckpt_path ./trained_models/cityscapes/r34_NBt1D_half.pth --depth_scale 1 --raw_depth
and then replaced the sample RGB and depth images in ./samples, but the segmentation is terrible. I do not know why; how can I solve this?
Thanks, I look forward to your reply!

Void pixels in ground truth labels

Hello,

I'm trying to figure out how the void pixels affect the mIoU and the loss computation.
The labels are decremented by 1 in the code so that the class numbers in the prediction and the ground-truth labels match.
So, to my understanding, there are pixels with the value -1 in the label image, but the prediction doesn't include void pixels.
Are those pixels being ignored, or are they counted as true/false predictions?

Thanks,
Shany

Where to find Dataset Inference results

Hi,thanks for your great work!
I would like to follow your work and reproduce the paper's segmentation results on NYUv2 with the pre-trained model you provided. When I run the program, where can I find the segmented results? Currently it only shows that the pre-trained model is being loaded, and then it's over!
image
image

Loss is None while training

I tried training with the SUNRGB-D dataset and got the error "Loss is None". Inspecting the code, it seems it can only be caused by the loss function in ESANet/src/utils.py, specifically here:

number_of_pixels_per_class = torch.bincount(targets.flatten().type(self.dtype),
                                             minlength=self.num_classes)
divisor_weighted_pixel_sum = torch.sum(number_of_pixels_per_class[1:] * self.weight)  # without void
losses.append(torch.sum(loss_all) / divisor_weighted_pixel_sum)

My assumption is that divisor_weighted_pixel_sum can be 0 with some very 'unlucky' random cropping.

The following modification seems to solve the problem:
divisor_weighted_pixel_sum = torch.sum(number_of_pixels_per_class[1:] * self.weight).clamp(min=1e-5) # without void

Let me know if you ever experienced something similar, or if you have a better fix.

Using the NYUv2 pre-trained model to verify semantic segmentation on the Replica dataset

I appreciate your excellent work!
In order to perform semantic segmentation on the Replica dataset using your pre-trained NYUv2 model, I have run into a few issues. I would appreciate your assistance and response.
The Replica dataset contains 900 images and 88 class labels. I replaced the files under the nyuv2 data with the RGB, labels_88, and labels_88_colored files from Replica. The program displays an error when I run it.
How can I change the program so that it adapts to the new dataset, even if some classes won't be split, since the number of classes specified in your program is 40 and mine is 88?

image
image

add and SE-add

Hi, dear authors:
I see two fusion modules in the model, add and SE-add, and I used SE-add. During inference, I use the same RGB image and a different depth image as input, but I get the same result. Why does the depth have so little effect? Did I feed the inputs to the model in the wrong way?

Batch size = 1 leads to error

I tried to train with default parameters and sunrgbd dataset with batch_size set to 1. The following error arises:

  File "train.py", line 577, in <module>
    train_main()
  File "train.py", line 206, in train_main
    label_downsampling_rates, args.epochs, args.results_dir, monitoring_images, debug_mode=args.debug)
  File "train.py", line 310, in train_one_epoch
    pred_scales = model(image, depth)
  File "/home/matteo/anaconda3/envs/rgbd_segmentation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/matteo/Code/Semantic_Segmentation/ESANet/src/models/model.py", line 238, in forward
    out = self.context_module(fuse)
  File "/home/matteo/anaconda3/envs/rgbd_segmentation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/matteo/Code/Semantic_Segmentation/ESANet/src/models/context_modules.py", line 72, in forward
    y = f(x)
  File "/home/matteo/anaconda3/envs/rgbd_segmentation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/matteo/anaconda3/envs/rgbd_segmentation/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/matteo/anaconda3/envs/rgbd_segmentation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/matteo/anaconda3/envs/rgbd_segmentation/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/matteo/anaconda3/envs/rgbd_segmentation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/matteo/anaconda3/envs/rgbd_segmentation/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/home/matteo/anaconda3/envs/rgbd_segmentation/lib/python3.7/site-packages/torch/nn/functional.py", line 1666, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

Any hint on how to fix it?
