
speedinghzl / pytorch-deeplab


DeepLab-ResNet rebuilt in Pytorch

License: MIT License

pascal-voc pytorch deeplab-resnet semantic-segmentation

pytorch-deeplab's Introduction

DeepLab-ResNet-Pytorch

New! We have released Pytorch-Segmentation-Toolbox, which contains PyTorch implementations of DeepLabv3 and PSPNet with better reproduced performance on Cityscapes.

This is a (re-)implementation of DeepLab-ResNet in PyTorch for semantic image segmentation on the PASCAL VOC dataset.

Updates

9 July, 2017:

  • The training script train.py has been re-written following the original optimisation setup: SGD with momentum, weight decay, a polynomially decaying learning rate, different learning rates for different layers, and ignoring the 'void' label (255). A minimal sketch of this schedule follows this list.
  • The training script with multi-scale inputs, train_msc.py, has been added: the input is resized to 0.5 and 0.75 of the original resolution, and 4 losses are aggregated: the loss on the original resolution, on the 0.75 resolution, on the 0.5 resolution, and the loss on the fused output.
  • Evaluation of the single-scale model ('VOC12_scenes_20000.pth') on the PASCAL VOC validation dataset (using 'SegmentationClassAug') gives 74.0% mIoU without CRF post-processing. Evaluation of the multi-scale model is in progress.
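
For reference, here is a minimal sketch of this optimisation setup; BASE_LR, the dummy backbone/classifier modules, and the helper names are illustrative assumptions, not code taken from train.py.

    import torch.nn as nn
    import torch.optim as optim

    BASE_LR = 2.5e-4  # same value as the LEARNING_RATE listed in the issues below

    def lr_poly(base_lr, i_iter, max_iter, power=0.9):
        # Polynomial decay: lr = base_lr * (1 - iter / max_iter) ** power
        return base_lr * ((1 - float(i_iter) / max_iter) ** power)

    # Dummy stand-ins for the ResNet body and the ASPP classification head;
    # the classifier is commonly trained with a 10x higher learning rate.
    backbone = nn.Conv2d(3, 64, 3, padding=1)
    classifier = nn.Conv2d(64, 21, 1)

    optimizer = optim.SGD(
        [{'params': backbone.parameters(), 'lr': BASE_LR},
         {'params': classifier.parameters(), 'lr': 10 * BASE_LR}],
        momentum=0.9, weight_decay=0.0005)

    def adjust_learning_rate(optimizer, i_iter, num_steps):
        lr = lr_poly(BASE_LR, i_iter, num_steps)
        optimizer.param_groups[0]['lr'] = lr
        optimizer.param_groups[1]['lr'] = lr * 10

    # The 'void' label (255) is excluded from the loss.
    criterion = nn.CrossEntropyLoss(ignore_index=255)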

Model Description

DeepLab-ResNet is built on a fully convolutional variant of ResNet-101 with atrous (dilated) convolutions, atrous spatial pyramid pooling, and multi-scale inputs (not implemented here).

The model is trained on a mini-batch of images and corresponding ground-truth masks with a softmax classifier on top. During training, the masks are downsampled to match the size of the network output; during inference, bilinear upsampling is applied to produce an output of the same size as the input. The final segmentation mask is computed using argmax over the logits. Optionally, a fully connected probabilistic graphical model, namely a CRF, can be applied to refine the final predictions. On the PASCAL VOC test set, the model achieves 79.7% mean intersection-over-union with CRF post-processing and 76.4% without.
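
As a rough sketch of the inference step just described (the names segment and net are placeholders, not this repository's API):

    import torch
    import torch.nn.functional as F

    def segment(net, image):
        # image: float tensor of shape (1, 3, H, W)
        net.eval()
        with torch.no_grad():
            logits = net(image)                      # (1, num_classes, h, w), h < H
            logits = F.interpolate(logits, size=image.shape[2:],
                                   mode='bilinear', align_corners=True)  # upsample to input size
            return logits.argmax(dim=1)              # (1, H, W) per-pixel class indices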

For more details on the underlying model please refer to the following paper:

@article{CP2016Deeplab,
  title={DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs},
  author={Liang-Chieh Chen and George Papandreou and Iasonas Kokkinos and Kevin Murphy and Alan L Yuille},
  journal={arXiv:1606.00915},
  year={2016}
}

Dataset and Training

To train the network, one can use the augmented PASCAL VOC 2012 dataset, with 10582 images for training and 1449 images for validation. PyTorch >= 0.4.0 is required.

You can download the converted init.caffemodel (saved with the .pth extension) here. Besides that, one can also exploit random scaling and mirroring of the inputs during training as a means of data augmentation; a rough sketch of these augmentations is given after the command below. For example, to train the model from scratch with random scaling and mirroring turned on, simply run:

python train.py --random-mirror --random-scale --gpu 0
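
As a rough illustration of what the two flags do (a simplified sketch, not the exact code in the data loader; the scale range and flip probability are assumptions):

    import random
    import cv2

    def random_scale_and_mirror(image, label):
        # Random scale factor in roughly [0.5, 1.5]; the label map is resized with
        # nearest-neighbour interpolation so class indices are preserved.
        scale = 0.5 + random.random()
        image = cv2.resize(image, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
        label = cv2.resize(label, None, fx=scale, fy=scale, interpolation=cv2.INTER_NEAREST)
        # Random horizontal flip with probability 0.5.
        if random.random() < 0.5:
            image = image[:, ::-1]
            label = label[:, ::-1]
        return image, label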

Evaluation

The single-scale model shows 74.0% mIoU on the Pascal VOC 2012 validation dataset ('SegmentationClassAug'). No post-processing step with CRF is applied.

The following command prints a description of each of the evaluation settings:

python evaluate.py --help

Acknowledgment

This code is heavily borrowed from pytorch-deeplab-resnet.

Other implementations

pytorch-deeplab's People

Contributors

speedinghzl, speedingwisp


pytorch-deeplab's Issues

*** KeyError: 'unexpected key "weight" in state_dict'

I am trying to load VOC12_scenes_20000.pth into Res_Deeplab.
Running this:

    saved_state_dict = torch.load(args.restore_from)
    new_params = model.state_dict().copy()
    for i in saved_state_dict:
        #Scale.layer5.conv2d_list.3.weight
        i_parts = i.split('.')
        # print i_parts
        if not args.num_classes == 21 or not i_parts[1]=='layer5':
            print(i)
            new_params['.'.join(i_parts[1:])] = saved_state_dict[i]
    model.load_state_dict(new_params)

gives me
*** KeyError: 'unexpected key "weight" in state_dict'
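
A likely cause (an assumption, not confirmed here) is that the loop above expects Caffe-converted keys of the form 'Scale.layer5....' and strips the first component, whereas a trained snapshot stores plain state_dict keys, so e.g. 'conv1.weight' becomes just 'weight'. For a trained snapshot the keys already match the model, so a minimal sketch is to load it directly (the module path and constructor below are taken from the tracebacks and issues on this page and may differ):

    import torch
    from deeplab.model import Res_Deeplab   # module path as it appears in the tracebacks below

    model = Res_Deeplab(num_classes=21)
    saved_state_dict = torch.load('VOC12_scenes_20000.pth', map_location='cpu')
    model.load_state_dict(saved_state_dict)  # no prefix stripping needed for a trained snapshot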

This is a fork of another repo: please cite original repo

Hi, I am the author of this repository. Your repository was brought to my notice recently. It seems that you have used a lot of my code and have not cited my repository. My repository has an MIT license. I would really appreciate if you cite my repository in your readme and mention that code in repo has been borrowed from my repo.
Thanks.

There are some errors when I changed 'NUM_CLASSES'.

I changed NUM_CLASSES=21 to NUM_CLASSES=2, but I ran into a problem:
RuntimeError: inconsistent tensor size, expected tensor [2 x 2048 x 3 x 3] and src [21 x 2048 x 3 x 3] to have the same number of elements, but got 18432 and 387072 elements respectively at /pytorch/torch/lib/TH/generic/THTensorCopy.c:86.
It seems that the pre-trained model weights don't match the 'NUM_CLASSES=2' network at this line:

model.load_state_dict(new_params)

How do I solve this problem?
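
One common workaround (a hedged sketch, not an official fix) is to copy only the backbone weights from the 21-class checkpoint and leave the freshly initialised 2-class classifier ('layer5') untouched; the module path and key format are assumed from the code quoted in the issue above:

    import torch
    from deeplab.model import Res_Deeplab

    model = Res_Deeplab(num_classes=2)
    saved_state_dict = torch.load('MS_DeepLab_resnet_pretrained_COCO_init.pth', map_location='cpu')
    new_params = model.state_dict().copy()
    for name, param in saved_state_dict.items():
        parts = name.split('.')          # e.g. 'Scale.layer5.conv2d_list.3.weight'
        if parts[1] != 'layer5':         # skip the 21-class classification head
            new_params['.'.join(parts[1:])] = param
    model.load_state_dict(new_params)    # 'layer5' keeps its random 2-class initialisation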

Why 'cudnn.enabled = False'?

Hi, @speedinghzl

Thanks for sharing. I am a little curious why you set 'cudnn.enabled = False'. I set it to True and found it is faster and uses less GPU memory. Will it have a bad influence on accuracy? Thank you.

Crf model

Hi, sorry, but where can I find the CRF implementation?

multiscale model has lower IOU performance

Hi @speedinghzl ,

I used your PyTorch DeepLab v2 implementation with the same settings on torch 0.4.0. I trained the multi-scale model with a batch size of 16 and evaluated it with evaluate.py. However, checking the snapshots at iterations 10000, 20000, 30000..., the mean IoU stays at 0.55~0.59.
Could you give me any comments on what I did wrong?
Here are the parameters I used:

BATCH_SIZE = 16
ITER_SIZE = 10
IGNORE_LABEL = 255
INPUT_SIZE = '321,321'
LEARNING_RATE = 2.5e-4
NUM_STEPS = 500000
POWER = 0.9
RANDOM_SEED = 1234
RESTORE_FROM = './model/MS_DeepLab_resnet_pretrained_COCO_init.pth'
SAVE_NUM_IMAGES = 2
SAVE_PRED_EVERY = 50
SNAPSHOT_DIR = './snapshots_msc/'
WEIGHT_DECAY = 0.0005

Thank you for your help!

mean iou?

if not self.M[i, i] == 0:

If self.M[i, i] == 0, then nothing is appended to jaccard_perclass and class i is dropped, so len(jaccard_perclass) becomes num_classes - 1 and the result is inaccurate. I think that when self.M[i, i] == 0, the IoU of class i should be counted as 0 rather than dropped.
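
For reference, a hedged sketch of per-class IoU from a confusion matrix M (here assumed to have rows = ground truth, columns = prediction) that keeps zero-IoU classes in the average, as the comment above suggests:

    import numpy as np

    def mean_iou(M):
        # M[i, j]: number of pixels of ground-truth class i predicted as class j
        ious = []
        for i in range(M.shape[0]):
            tp = M[i, i]
            denom = M[i, :].sum() + M[:, i].sum() - tp     # TP + FN + FP
            ious.append(tp / denom if denom > 0 else 0.0)  # count empty classes as 0 instead of dropping them
        return float(np.mean(ious)), ious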

Some questions about image transposition

Hello, I am a master student in Beijing. I am reading your codes and I am confused about some questions.

In datasets.py, your 'Class VOCDataSet' reads images using 'cv2.imread', so the image should be in 'BGR' channel order. But then you use 'image = image.transpose((2, 0, 1))', so the image should be in 'RBG' channel order? This really confuses me, because I think it should be 'RGB'.

Also, in datasets.py, when you show an image you use 'img = np.transpose(img, (1, 2, 0))' and 'img = img[:, :, ::-1]', and I think this may also generate an image that is not in 'RGB' channel order.

It may just be my misunderstanding. But I am really looking forward to your reply. Thanks!!!
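
For reference, a small sketch of what these operations do in NumPy/OpenCV terms (this only illustrates array semantics; the file name is a placeholder):

    import cv2
    import numpy as np

    image = cv2.imread('example.jpg')     # (H, W, 3), BGR channel order
    chw = image.transpose((2, 0, 1))      # (3, H, W): axes are reordered, but the
                                          # channels themselves remain B, G, R
    hwc = np.transpose(chw, (1, 2, 0))    # back to (H, W, 3), still BGR
    rgb = hwc[:, :, ::-1]                 # reverse the channel axis: BGR -> RGB for display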

Run on non-CUDA device

Hi, can I run the code on my laptop, which doesn't have a CUDA-capable GPU? Can you guide me on how to make it work? Cheers.

CuDNN error: CUDNN_STATUS_INTERNAL_ERROR

Hi, I used this to train on my own dataset, but it fails with an error. Can you give me some suggestions? (My segmentation class images are grayscale.)

/opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [3,0,0], thread: [190,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [3,0,0], thread: [191,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
File "train.py", line 234, in
main()
File "train.py", line 215, in main
loss.backward()
File "/home/zhangjunyi/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/zhangjunyi/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/init.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CuDNN error: CUDNN_STATUS_INTERNAL_ERROR
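
The assertion 't >= 0 && t < n_classes' in the traceback usually means a label value outside [0, NUM_CLASSES) reached the loss, which is common with custom grayscale masks (e.g. saved as 0/255). A hedged diagnostic sketch, with a placeholder file name:

    import cv2
    import numpy as np

    label = cv2.imread('my_label.png', cv2.IMREAD_GRAYSCALE)
    # Should only contain values in [0, NUM_CLASSES) plus the ignore value 255;
    # binary masks stored as 0/255 need to be remapped to 0/1 before training.
    print(np.unique(label))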

bug report

Hi, thanks for your code.

In train.py, the function 'loss_calc' computes log_softmax on 'pred', but CrossEntropy2d then uses F.cross_entropy, which computes log_softmax again, so I think this may be a bug. Maybe we should delete lines 103 and 105 of train.py?
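
For reference, F.cross_entropy combines log_softmax and nll_loss internally, so it expects raw logits; the two equivalent formulations are sketched below (the tensor shapes are illustrative):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(2, 21, 41, 41)              # (N, C, h, w) raw network output
    target = torch.randint(0, 21, (2, 41, 41))       # (N, h, w) class indices

    # Option 1: raw logits straight into cross_entropy (it applies log_softmax itself).
    loss_a = F.cross_entropy(logits, target, ignore_index=255)

    # Option 2: explicit log_softmax followed by nll_loss.
    loss_b = F.nll_loss(F.log_softmax(logits, dim=1), target, ignore_index=255)

    print(torch.allclose(loss_a, loss_b))            # True; doing both would apply softmax twice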

I have little question about gpu

Thanks for your code, I have learned a lot from it. But there is a question that troubles me:
when I run the code I set args.gpu = 1, but I find the program always occupies some memory on GPU 0.

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 6721 C python 399MiB |
| 1 6721 C python 7215MiB |
+-----------------------------------------------------------------------------+

I don't know where the problem is; can you help? Thanks a lot.
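
A few hundred MiB on GPU 0 is a common symptom of the CUDA context being created on the default device before '--gpu 1' takes effect. A common workaround (an assumption about this script, not a confirmed fix) is to launch with CUDA_VISIBLE_DEVICES=1, or to select the device early in Python:

    import torch

    # Select GPU 1 before any CUDA tensor or context is created,
    # so no memory is allocated on GPU 0.
    torch.cuda.set_device(1)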

miou result can not achieve 74%

Hi @speedinghzl, thanks for your code!
I used the VOC12_scenes_20000.pth model you provided to run the test code. I only changed the data-loading path; nothing else was changed. The mIoU is 0.693386318233 with the single-scale model, and with the multi-scale model you provided the result is also about 69%. Can you help me or give me some idea about what is going on?
Thank you very much! Good luck to you.

CUDA error: out of memory

Hi,
Without changing anything in the code, I ran train.py on the augmented PASCAL VOC dataset, but I'm getting a CUDA out-of-memory error. Do you have a suggestion on how I can solve this?

A detailed error message is as below.

Thanks in advance.
sinem.

 File "train.py", line 234, in <module>
   main()
 File "train.py", line 213, in main
   pred = interp(model(images))
 File "/home/sinem/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
   result = self.forward(*input, **kwargs)
 File "/media/sinem/LENOVO/Pytorch-Deeplab-master/deeplab/model.py", line 261, in forward
   x = self.layer5(x)
 File "/home/sinem/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
   result = self.forward(*input, **kwargs)
 File "/media/sinem/LENOVO/Pytorch-Deeplab-master/deeplab/model.py", line 115, in forward
   out += self.conv2d_list[i+1](x)
 File "/home/sinem/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
   result = self.forward(*input, **kwargs)
 File "/home/sinem/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
   self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory

only use boundary information?

Hi, I have a question about the label maps used in the code.
The folder "SegmentationClassAug" seems to contain only boundary information rather than class information. Is boundary information alone enough, or do I have a false understanding of the code and the original paper?

Lower validation accuracy

First of all, thanks for providing your code.
When I use your uploaded VOC12_scenes_20000.pth model to evaluate the single-scale model on the 1449 validation images, I get 71.8%, which is much lower than the reported 74.0%.
Moreover, when I train the single-scale model by running python train.py --random-mirror --random-scale --gpu 0 and then evaluate the trained model, I get 73.1%, which is still lower than 74%.
Do you know where this discrepancy might come from?
Everything is on default settings and I also use MS_DeepLab_resnet_pretrained_COCO_init.pth as the pre-trained model.

Zero_grad only once?

I am not quite sure about this, but shouldn't optimizer.zero_grad() be called inside the for loop?
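
For reference, the conventional pattern clears the gradients once per iteration, since backward() accumulates into .grad. This is a generic sketch using variable names that appear elsewhere in these issues (model, interp, loss_calc), not a claim about what train.py currently does:

    for i_iter, (images, labels) in enumerate(trainloader):
        optimizer.zero_grad()                               # clear gradients from the previous step
        adjust_learning_rate(optimizer, i_iter, num_steps)  # e.g. the poly schedule sketched earlier
        pred = interp(model(images.cuda()))
        loss = loss_calc(pred, labels)
        loss.backward()
        optimizer.step()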

questions about data pre-processing

Hi,
thanks for sharing the code. I have a few questions about the implementation:

  1. Why load a network pre-trained on COCO instead of using the ImageNet-pretrained network built into PyTorch?
  2. Why swap the 'R' and 'B' dimensions?

Slightly lower validation accuracy in Pytorch 1.0.0

Hi @speedinghzl ,

I used your PyTorch DeepLab v2 implementation with the same settings on PyTorch 1.0.0. The validation mIoU of VOC_scenes_20000.pth is 71.1%. Is this degradation due to randomness? Could you give me any comments?
Here are the parameters I used:

BATCH_SIZE = 10
DATA_DIRECTORY = './dataset/voc12'
DATA_LIST_PATH = './dataset/list/train_aug.txt'
IGNORE_LABEL = 255
INPUT_SIZE = '321,321'
LEARNING_RATE = 2.5e-4
MOMENTUM = 0.9
NUM_CLASSES = 21
NUM_STEPS = 20000
POWER = 0.9
RANDOM_SEED = 1234
RESTORE_FROM = './dataset/MS_DeepLab_resnet_pretrained_COCO_init.pth'
SAVE_NUM_IMAGES = 2
SAVE_PRED_EVERY = 1000
SNAPSHOT_DIR = './snapshots/'
WEIGHT_DECAY = 0.0005

Thank you for your help!

The normalization in data pre-processing

I notice that the normalization of images in your code uses "image - mean" rather than the usual "(image - mean) / std". Is there a specific reason for using this kind of normalization?

For the scale of pixel values, your code works roughly in the range [-128, 128], rather than the [-0.5, 0.5] used in other works. Which one is better?

Also, if I use only the ImageNet pre-trained model, which kind of image normalization and pixel scale should I use?
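
For comparison, the two conventions side by side (a hedged sketch; the BGR mean values and the torchvision ImageNet statistics below are the widely used defaults, stated here as assumptions rather than values read from this repository):

    import numpy as np

    # Caffe-style, as described above: pixels stay in [0, 255] (BGR) and only the mean is subtracted.
    IMG_MEAN_BGR = np.array((104.00699, 116.66877, 122.67892), dtype=np.float32)

    def normalize_caffe(image_bgr):           # image_bgr: float32, HWC, values in [0, 255]
        return image_bgr - IMG_MEAN_BGR

    # torchvision-style, as expected by the ImageNet-pretrained torchvision models:
    # scale to [0, 1], subtract the RGB mean, divide by the std.
    MEAN_RGB = np.array((0.485, 0.456, 0.406), dtype=np.float32)
    STD_RGB = np.array((0.229, 0.224, 0.225), dtype=np.float32)

    def normalize_torchvision(image_rgb):     # image_rgb: float32, HWC, values in [0, 255]
        return (image_rgb / 255.0 - MEAN_RGB) / STD_RGB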
