isht7 / pytorch-deeplab-resnet

DeepLab resnet v2 model in pytorch

License: MIT License

Python 100.00%

deep-learning deeplab pytorch semantic-segmentation deeplab-resnet pascal-voc computer-vision

pytorch-deeplab-resnet's Issues

How do I get the car mask only?

I used your code for car segmentation and loaded the pretrained model, but VOC has 21 labels. I only want the mask for cars, without displaying the other labels on the result image. What should I do?
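
One possible approach, sketched under the assumption that the network output is available as a (num_classes, H, W) score array and that the standard PASCAL VOC class ordering applies, in which index 7 is "car". The names `car_mask` and `CAR_CLASS` are illustrative and not part of the repository.

```python
import numpy as np

CAR_CLASS = 7  # "car" in the standard 21-class PASCAL VOC ordering (assumption)

def car_mask(scores):
    """scores: (num_classes, H, W) array of per-class scores.

    Returns a uint8 (H, W) mask that is 255 where "car" is the
    highest-scoring class and 0 everywhere else.
    """
    pred = np.argmax(scores, axis=0)  # per-pixel class index
    return np.where(pred == CAR_CLASS, 255, 0).astype(np.uint8)

# Toy 2x2 example: one pixel predicted as car, one as another class.
scores = np.zeros((21, 2, 2), dtype=np.float32)
scores[CAR_CLASS, 0, 0] = 5.0
scores[15, 1, 1] = 5.0
mask = car_mask(scores)
```

The resulting mask can be saved directly or multiplied into the input image to hide every non-car pixel.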

How can I train my own dataset?

Thank you for sharing this code. I can train on the VOC dataset with it.

Issue 1: I can train on VOC, but training stopped at iteration 17636 even though I set the maximum number of iterations to 40000. I would like to know the reason.

Issue 2: I want to train on my own dataset, which has 7 classes, but the state dict of the init model is sized for 21 classes. How can I train this network on my dataset without the init model?
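
A common alternative to dropping the init model entirely is to keep the pretrained weights whose shapes still match and leave only the class-dependent layers at their fresh initialisation. The sketch below illustrates that filtering step with NumPy arrays standing in for tensors; the key names are hypothetical, and with PyTorch the merged dictionary would typically be passed to `load_state_dict`.

```python
import numpy as np

def filter_matching(pretrained, target):
    """Keep only pretrained entries whose name and shape match the target model."""
    return {
        name: weight
        for name, weight in pretrained.items()
        if name in target and target[name].shape == weight.shape
    }

# Hypothetical key names; real ones come from the model's state dict.
pretrained = {
    "backbone.conv1.weight": np.ones((64, 3, 7, 7)),
    "classifier.weight": np.ones((21, 16, 3, 3)),  # 21 VOC classes
}
target = {
    "backbone.conv1.weight": np.zeros((64, 3, 7, 7)),
    "classifier.weight": np.zeros((7, 16, 3, 3)),  # 7 custom classes
}

kept = filter_matching(pretrained, target)
merged = dict(target)
merged.update(kept)  # backbone from the init model, classifier left as initialised
```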

Thank you very much! I look forward to your reply! @isht7

Fine tuning on a smaller GPU

Hi,

I have a dataset with 2 classes in VOC format. As far as I can tell, you have set up fine-tuning through some flags. Correct me if I'm wrong.

Also, my GPU is a GTX 1060 with 6 GB of memory. Does the stated memory consumption refer to the original VOC with all 21 classes? In other words, can I train the model on this small dataset?

Dataset?

Hi, great code!

Could you specify exactly which dataset train_aug.txt and val.txt correspond to, and where to get it? It is the augmented VOC2012, right?

segmentation results and iteration number setting

Why do you use 20000 iterations with only batch_size=1? That way, each training image is seen only about twice before the final model is produced.
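
The "about twice" estimate can be checked with a line of arithmetic. The sketch assumes the training list is the augmented VOC2012 set with 10,582 images; that figure is an assumption about train_aug.txt, not something stated in this thread.

```python
def epochs_seen(iterations, batch_size, dataset_size):
    """Average number of times each training image is visited."""
    return iterations * batch_size / dataset_size

TRAIN_AUG_SIZE = 10582  # assumed size of the augmented VOC2012 training list

passes = epochs_seen(20000, 1, TRAIN_AUG_SIZE)
print(round(passes, 2))
```

Under that assumption the result is just under two passes over the data.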

My other question: I ran evalpyt2.py with your trained VOC12_scenes_16000.pth model, but I only got a validation performance of 66.7%. Do you know why it is so low? Thanks!

how to solve this error?

runfile('C:/Users/Mubashir/Desktop/pytorch-deeplab-resnet-master/train.py', wdir='C:/Users/Mubashir/Desktop/pytorch-deeplab-resnet-master')
Reloaded modules: torch._utils, torch.version, torch._C, torch.serialization, torch._tensor_str, torch.storage, torch.tensor, torch.functional, torch.cuda, torch.cuda.random, torch.cuda.sparse, torch.sparse, torch.cuda.nvtx, torch.cuda.streams, torch.autograd, torch.autograd.variable, torch.utils, torch.utils.hooks, torch.autograd._functions, torch.autograd._functions.basic_ops, torch.autograd.function, torch._six, torch.autograd._functions.utils, torch.autograd._functions.tensor, torch.autograd._functions.pointwise, torch._thnn, torch._thnn.utils, torch.autograd._functions.reduce, torch.autograd._functions.linalg, torch.autograd._functions.blas, torch.autograd._functions.stochastic, torch.autograd.stochastic_function, torch.autograd._functions.compare, torch.autograd.gradcheck, torch.nn, torch.nn.modules, torch.nn.modules.module, torch.nn.backends, torch.nn.backends.thnn, torch.nn.backends.backend, torch.nn._functions, torch.nn._functions.thnn, torch.nn._functions.thnn.auto, torch.nn._functions.thnn.auto_double_backwards, torch.nn._functions.thnn.normalization, torch.nn._functions.thnn.activation, torch.nn._functions.thnn.pooling, torch.nn.modules.utils, torch.nn._functions.thnn.sparse, torch.nn._functions.thnn.upsampling, torch.nn._functions.thnn.rnnFusedPointwise, torch.nn._functions.thnn.batchnorm_double_backwards, torch.nn._functions.rnn, torch.backends, torch.backends.cudnn, torch.nn.functional, torch.nn._functions.linear, torch.nn._functions.padding, torch.nn._functions.vision, torch.backends.cudnn.rnn, torch.nn._functions.dropout, torch.nn._functions.activation, torch.nn._functions.loss, torch.nn.parameter, torch.nn.modules.linear, torch.nn.modules.conv, torch.nn.modules.activation, torch.nn.modules.loss, torch.nn.modules.container, torch.nn.modules.pooling, torch.nn.modules.batchnorm, torch.nn.modules.instancenorm, torch.nn.modules.dropout, torch.nn.modules.padding, torch.nn.modules.normalization, torch.nn.modules.sparse, torch.nn.modules.rnn, 
torch.nn.utils, torch.nn.utils.rnn, torch.nn.utils.clip_grad, torch.nn.utils.weight_norm, torch.nn.modules.pixelshuffle, torch.nn.modules.upsampling, torch.nn.modules.distance, torch.nn.parallel, torch.nn.parallel.parallel_apply, torch.nn.parallel.replicate, torch.cuda.comm, torch.cuda.nccl, torch.nn.parallel.data_parallel, torch.nn.parallel.scatter_gather, torch.nn.parallel._functions, torch.nn.parallel.distributed, torch.distributed, torch.nn.init, torch.optim, torch.optim.adadelta, torch.optim.optimizer, torch.optim.adagrad, torch.optim.adam, torch.optim.adamax, torch.optim.asgd, torch.optim.sgd, torch.optim.rprop, torch.optim.rmsprop, torch.optim.lbfgs, torch.multiprocessing, torch.multiprocessing.reductions, torch.utils.backcompat
Traceback (most recent call last):

File "", line 1, in
runfile('C:/Users/Mubashir/Desktop/pytorch-deeplab-resnet-master/train.py', wdir='C:/Users/Mubashir/Desktop/pytorch-deeplab-resnet-master')

File "D:/Anaconda3/lib/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
execfile(filename, namespace)

File "D:/Anaconda3/lib/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/Mubashir/Desktop/pytorch-deeplab-resnet-master/train.py", line 1, in
import torch

File "D:\Anaconda3\lib\site-packages\torch\__init__.py", line 358, in
from . import _torch_docs, _tensor_docs, _storage_docs

File "D:\Anaconda3\lib\site-packages\torch\_torch_docs.py", line 15, in
""")

RuntimeError: function 'abs' already has a docstring

the result?

Hi, good job.
I used your trained model, VOC12_scenes_16000.pth, for testing, but I got a lower mIoU of 66.35% rather than your reported result.

read the ground truth of the pascal voc dataset

Hello,
the code you use to read the ground truth of the dataset is

    gt_temp = cv2.imread(os.path.join(gt_path,piece+'.png'))[:,:,0]
    gt_temp[gt_temp == 255] = 0

Why is there a "[:,:,0]" for the ground truth?
Thank you very much!
Also, when I use cv2 to read the ground truth of the PASCAL VOC dataset,

import cv2
import numpy as np
img_cv= cv2.imread('/Users/zhangying/Desktop/2007_000129.png')
img_array_cv = img_cv[:,:,0]
print('use the method CV\n ',np.unique(img_array_cv))

the output is

use the method CV
  [  0 128 192]

However, the VOC dataset has 21 categories in total (128 > 20 and 192 > 20; is that reasonable?). What do you think about this problem? Thanks a lot!
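
A likely explanation is that the original VOC annotation PNGs are palette-indexed: cv2.imread with default flags decodes them to BGR colours, so the values printed above are colour components rather than class indices. The sketch below generates the standard VOC colour map in pure Python so that the mapping can be inspected; the function name `voc_colormap` is illustrative. Opening the file with PIL (`Image.open`) and converting it to an array keeps the palette indices instead.

```python
def voc_colormap(n=256):
    """Return a list of (R, G, B) tuples following the PASCAL VOC palette scheme."""
    cmap = []
    for i in range(n):
        c, r, g, b = i, 0, 0, 0
        for j in range(8):
            # Spread the bits of the class index over the three channels,
            # most significant colour bit first.
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap.append((r, g, b))
    return cmap

cmap = voc_colormap()
# Inverse lookup: RGB colour -> class index (255 is the "void" boundary label).
color_to_index = {color: idx for idx, color in enumerate(cmap)}
```

With this map, the 192 seen in the blue channel is consistent with the void boundary colour (224, 224, 192), and 128 is a blue component shared by several class colours, so neither value is a class index.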

model.eval() will change mean and var

When model.eval() is set, the BN layers' weights and biases are fixed, but var and mean still change during fine-tuning. Does it matter if var and mean change, or should they be fixed by setting momentum=0?

Why is training from scratch worse than the Caffe implementation?

As stated in your README, training from scratch with your code is 3 points worse than the Caffe training. Can you point out the reason?

Best Wishes

Training produces model generating blank segmentations

Hi, thanks for the work implementing the model and training script.

I'm attempting to train on optical flow RGB image data with binary segmentation masks (where 0=background and 1=foreground). However, no matter my choice of hyperparameters or number of iterations (20k/40k/80k), the loss generally steadily decreases but the resulting model predicts all background pixels for input images at test time.

I've confirmed that the pretrained model segments correctly, so there's something wrong in the training process. The weights are non-zero but argmax always seems to choose class 0. Do you have any idea what might be wrong?

I'm using a GeForce GTX 1080 and Python 2.7 and am not encountering any memory or other such errors during training.

Can I train gray images using this code?

I previously trained on RGB images, which have 3 channels.
Now I want to train on grayscale images, which have only one channel.
How should I modify this code?
Thank you very much!
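
One low-effort option, offered as a sketch rather than as the repository's method, is to replicate the single channel three times so that the existing 3-channel input layer and any pretrained weights can be used unchanged. The alternative is to change the first convolution to accept one input channel, which means its pretrained weights no longer fit. The helper name `gray_to_3ch` is hypothetical.

```python
import numpy as np

def gray_to_3ch(gray):
    """Turn an (H, W) grayscale image into an (H, W, 3) image with equal channels."""
    if gray.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    return np.repeat(gray[:, :, np.newaxis], 3, axis=2)

gray = np.arange(20, dtype=np.uint8).reshape(4, 5)
rgb_like = gray_to_3ch(gray)
```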

RuntimeError: cuda runtime error (59) : device-side assert triggered

Hello! Thanks for sharing!
When I run train.py, I encounter the following problem:

warnings.warn("nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.")
/opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THCUNN/SpatialClassNLLCriterion.cu:40: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [129,0,0] Assertion t >= 0 && t < n_classes failed.
[the same assertion is repeated for 63 further threads]
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THC/generic/THCTensorCopy.c line=18 error=59 : device-side assert triggered
Traceback (most recent call last):
File "/home/user/cltdevelop/Code/Competition/Cancer/Code/Pytorch_Version/pytorch-deeplab-resnet/train.py", line 229, in
loss = loss + loss_calc(out[i+1], label[i+1],gpu0)
File "/home/user/cltdevelop/Code/Competition/Cancer/Code/Pytorch_Version/pytorch-deeplab-resnet/train.py", line 131, in loss_calc
label = Variable(label).cuda(gpu0)
File "/home/user/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 279, in cuda
return CudaTransfer.apply(self, device_id, async)
File "/home/user/anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 149, in forward
return i.cuda(device_id, async=async)
File "/home/user/anaconda2/lib/python2.7/site-packages/torch/_utils.py", line 66, in cuda
return new_type(self.size()).copy(self, async)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THC/generic/THCTensorCopy.c:18

Can you give me some advice? Thank you!
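
The assertion `t >= 0 && t < n_classes` in the log indicates that at least one label value lies outside the range the loss expects. A minimal sketch for checking a label map on the CPU before it reaches the loss is shown below; the function name and the `ignore_value` parameter are illustrative and not part of train.py.

```python
import numpy as np

def find_invalid_labels(label, n_classes, ignore_value=None):
    """Return the sorted label values that fall outside [0, n_classes)."""
    bad = []
    for v in np.unique(label):
        out_of_range = v < 0 or v >= n_classes
        ignored = ignore_value is not None and v == ignore_value
        if out_of_range and not ignored:
            bad.append(int(v))
    return bad

# Toy label map for a 2-class problem containing a stray 2 and a 255 border value.
label = np.array([[0, 1], [255, 2]])
invalid = find_invalid_labels(label, n_classes=2)
invalid_with_ignore = find_invalid_labels(label, n_classes=2, ignore_value=255)
```

Any value reported here has to be remapped to a valid class or excluded from the loss before training.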

Why is loss for multiple scales calculated in a strange way?

I would like to know why you scale the output for scale 0.75 to the same size as the output for scale 1 (unscaled, 1/8 of the input size), while the output for scale 0.5 is not rescaled. The relevant piece of code is in deeplab_resnet.py, in class MS_Deeplab:

def forward(self,x):
    input_size = x.size()[2]
    self.interp1 = nn.UpsamplingBilinear2d(size = (  int(input_size*0.75)+1,  int(input_size*0.75)+1  ))
    self.interp2 = nn.UpsamplingBilinear2d(size = (  int(input_size*0.5)+1,   int(input_size*0.5)+1   ))
    self.interp3 = nn.UpsamplingBilinear2d(size = (  outS(input_size),   outS(input_size)   ))
    out = []
    x2 = self.interp1(x)
    x3 = self.interp2(x)
    out.append(self.Scale(x))	# for original scale
    ####################################################
    out.append(self.interp3(self.Scale(x2)))	# for 0.75x scale
    out.append(self.Scale(x3))	# for 0.5x scale
    ####################################################

    x2Out_interp = out[1]
    x3Out_interp = self.interp3(out[2])
    temp1 = torch.max(out[0],x2Out_interp)
    out.append(torch.max(temp1,x3Out_interp))
    return out

Thank you!

problem of train.py

On line 223 of train.py, the input and target should be out[0] and label[0], respectively, if we want to train on the output of the multi-scale fusion.

Difference from the original version

Note: in the original version, the weights of the three ResNet scale branches are shared.

performance

Why does the result on the val set differ by more than three points from the one published by the author? If I use this model in my WSSS work, will the gap have a big impact? Thanks!

How to train pytorch_deeplab_resnet by using my own data?

Hello,
I have my own data, which can be converted to a VOC2012-like format. But my data has only two classes: the target and the background.
How do I set the number of classes?
Thanks

ASPP or LargeFOV? Should be 76.35%.

I have a question about the performance. "This is in comparison to 75.54% that is achieved by using train_iter_20000.caffemodel released by authors, which can be replicated by running this file. The .pth model converted from .caffemodel using the first section also gives 75.54% mean IOU." But when I checked the model file and the trained .pth file, I found that the model uses ASPP instead of LargeFOV, which means it should reach the 76.35% reported in the paper. Why is the performance only 75.54%?

Wrong Evaluation script

Nice work porting the model. I found that your evaluation code is wrong. You are evaluating on an image by image basis and summing up IoUs across the val. set. That is not how mean IoU is computed. It is accumulated over pixels. Refer to the FCN code here to see what I mean: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/score.py

If I change your evaluation to the correct eval. script, your trained model gets a mIoU of only 72.1%. Also, the deeplab resnet-101 model you had ported into torch only gives 75.4% as opposed to the original 76.3% from CAFFE. This might be due to the different preprocessing you do compared to the deeplab guys or might be due to small errors in porting the model.

It will be great if you can confirm this and put out a log in the README saying you are fixing your eval. script.
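To make the difference concrete, here is a minimal sketch of dataset-level mean IoU: one confusion matrix is accumulated over all pixels of all images, and IoU is computed per class only at the end. The function names, the flat label lists, and the ignore label 255 are assumptions for illustration, not code from this repository or from the FCN script linked above. It is pure Python for readability; a real evaluation would normally vectorize this.

```python
def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Count (true class, predicted class) pairs over flat label sequences."""
    hist = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(gt, pred):
        if t == ignore_label:
            continue  # void pixels do not count towards any class
        hist[t][p] += 1
    return hist


def accumulate(total, hist):
    """Add one image's confusion matrix into the running dataset total."""
    for row_total, row in zip(total, hist):
        for c, count in enumerate(row):
            row_total[c] += count


def mean_iou(hist):
    """Mean of per-class IoU, skipping classes that never occur."""
    ious = []
    for c in range(len(hist)):
        true_pos = hist[c][c]
        gt_count = sum(hist[c])                     # pixels labelled c
        pred_count = sum(row[c] for row in hist)    # pixels predicted c
        union = gt_count + pred_count - true_pos
        if union > 0:
            ious.append(float(true_pos) / union)
    return sum(ious) / len(ious)


# Two tiny "images" with two classes (made-up data).
images = [
    ([0, 0, 1, 1], [0, 0, 1, 0]),
    ([0, 0, 0, 0], [0, 0, 0, 0]),
]
total = [[0, 0], [0, 0]]
for gt, pred in images:
    accumulate(total, confusion_matrix(gt, pred, num_classes=2))
dataset_miou = mean_iou(total)
print(dataset_miou)
```

Averaging per-image IoU instead would score the second image as a perfect 1.0 even though it contains only one class, which is one way the two procedures can drift apart.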

TypeError: 'float' object cannot be interpreted as an integer

Thanks for the amazing work. I am trying to use your model description for another segmentation problem.

But when I run

python train.py

this is the error log that I get.

Traceback (most recent call last):
File "train.py", line 219, in
images, label = get_data_from_chunk_v2(chunk)
File "train.py", line 113, in get_data_from_chunk_v2
labels = [resize_label_batch(gt,i) for i in [a,a,b,a]]
File "train.py", line 113, in
labels = [resize_label_batch(gt,i) for i in [a,a,b,a]]
File "train.py", line 65, in resize_label_batch
label_resized = np.zeros((size,size,1,label.shape[3]))
TypeError: 'float' object cannot be interpreted as an integer

Additional Info: I am running python 3.5
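One likely cause, given Python 3.5: the `/` operator always returns a float in Python 3, so a size computed with `/` can no longer be used as an array dimension. The sketch below shows the difference and a small guard; `as_size` is a hypothetical helper, not something in train.py, and the value 321 is only an example input size.

```python
def as_size(value):
    """Convert a computed size to int, refusing values that are not whole numbers."""
    size = int(value)
    if size != value:
        raise ValueError("size %r is not a whole number" % (value,))
    return size


side = (321 + 1) / 2         # true division: a float on Python 3
side_floor = (321 + 1) // 2  # floor division: an int on Python 2 and 3

print(type(as_size(side)).__name__, as_size(side), side_floor)
```

Replacing `/` with `//` where a size is computed, or wrapping the result in `int(...)` before passing it to `np.zeros`, would be the usual fix.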

No Relu in the ClassifierModule

spatial sizes mismatch

The spatial sizes of the output of the model and the target labels aren't matching for some of the inputs/targets. Does anyone know how to fix this? Thanks a lot. Error message below

C:\Users\Aidan\Documents\DL_Practice\pytorch-deeplab-resnet-master>python train.py --lr 0.00025 --wtDecay 0.0005 --maxIter 20000 --GTpath data/gt --IMpath data/img --LISTpath data/list/train_aug.txt
{'--GTpath': 'data/gt',
'--IMpath': 'data/img',
'--LISTpath': 'data/list/train_aug.txt',
'--NoLabels': '21',
'--gpu0': '0',
'--help': False,
'--iterSize': '10',
'--lr': '0.00025',
'--maxIter': '20000',
'--wtDecay': '0.0005'}
C:\Users\Aidan\Anaconda3\lib\site-packages\torch\nn\modules\upsampling.py:221: UserWarning: nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.
warnings.warn("nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.")
C:\Users\Aidan\Anaconda3\lib\site-packages\torch\nn\modules\loss.py:198: UserWarning: NLLLoss2d has been deprecated. Please use NLLLoss instead as a drop-in replacement and see http://pytorch.org/docs/master/nn.html#torch.nn.NLLLoss for more details.
warnings.warn("NLLLoss2d has been deprecated. "
train.py:132: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
out = m(out)
Traceback (most recent call last):
File "train.py", line 227, in
loss = loss + loss_calc(out[i+1],label[i+1],gpu0)
File "train.py", line 134, in loss_calc
return criterion(out,label)
File "C:\Users\Aidan\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "C:\Users\Aidan\Anaconda3\lib\site-packages\torch\nn\modules\loss.py", line 193, in forward
self.ignore_index, self.reduce)
File "C:\Users\Aidan\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1334, in nll_loss
return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
RuntimeError: input and target batch or spatial sizes don't match: target [1 x 27 x 27], input [1 x 21 x 26 x 26] at c:\users\administrator\downloads\new-builder\win-wheel\pytorch\aten\src\thcunn\generic/SpatialClassNLLCriterion.cu:24

Bad mIOU tested with provided model

Hello! I followed the README to test the mIoU of the provided pretrained Caffe model on VOC. However, I get an unexpectedly low result: Mean iou = 0.046470389101241724.
Do you have any suggestions about this problem?
By the way, why does the code only subtract the per-channel mean from the image, without normalizing it to a range such as [0,1] or [-1,1]?

fine tune on custom image

Hi @isht7 ,
Thanks for your great work, I really appreciate it. When I run train.py on my custom images, how should I modify the outS function? I don't quite understand why we need outS, even after reading the comments above it and the original DeepLab v2 paper. Could you please give me some advice?

Unable to train on multiple gpus

Hi! I have two 1080ti nvidia gpus. I followed this tutorial : http://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html. However, I am getting the following error:

Traceback (most recent call last):
File "/home/omkar/pycharm-community-2017.3.4/helpers/pydev/pydev_run_in_console.py", line 53, in run_file
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/omkar/Documents/Omkar/PycharmProjects/Masktrack1/pytorch-deeplab-resnet-master/train_online_multiple_objs&gpus.py", line 325, in
lr=base_lr, momentum=0.9, weight_decay=weight_decay)
File "/home/omkar/anaconda3/envs/deeplab_resnet/lib/python2.7/site-packages/torch/optim/sgd.py", line 57, in init
super(SGD, self).init(params, defaults)
File "/home/omkar/anaconda3/envs/deeplab_resnet/lib/python2.7/site-packages/torch/optim/optimizer.py", line 39, in init
self.add_param_group(param_group)
File "/home/omkar/anaconda3/envs/deeplab_resnet/lib/python2.7/site-packages/torch/optim/optimizer.py", line 146, in add_param_group
param_group['params'] = list(params)
File "/home/omkar/Documents/Omkar/PycharmProjects/Masktrack1/pytorch-deeplab-resnet-master/train_online_multiple_objs&gpus.py", line 106, in get_1x_lr_params_NOscale
b.append(model.Scale.conv1)
File "/home/omkar/anaconda3/envs/deeplab_resnet/lib/python2.7/site-packages/torch/nn/modules/module.py", line 398, in getattr
type(self).name, name))
AttributeError: 'DataParallel' object has no attribute 'Scale'

The error is present because the scale attribute is being used in the 'get_1x_lr_params_NOscale' method. However, the network has been wrapped by the DataParallel class and hence the error. Could you suggest a solution for the problem? Thank you!
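A common workaround is to reach through the wrapper: `nn.DataParallel` keeps the wrapped network in its `module` attribute, so `model.module.Scale` exists even though `model.Scale` does not. The sketch below uses stand-in classes so it runs without PyTorch; `unwrap`, `_Net`, and `_Wrapper` are hypothetical names for illustration.

```python
def unwrap(model):
    """Return the wrapped network if `model` looks like a DataParallel wrapper."""
    # DataParallel stores the original network as `.module`; a plain
    # network has no such attribute and is returned unchanged.
    return getattr(model, "module", model)


class _Net(object):
    """Stand-in for the real network; only the attribute name matters here."""
    def __init__(self):
        self.Scale = "scale-branch"


class _Wrapper(object):
    """Stand-in for nn.DataParallel, which exposes the network as `.module`."""
    def __init__(self, module):
        self.module = module


plain = _Net()
wrapped = _Wrapper(plain)
print(unwrap(plain).Scale, unwrap(wrapped).Scale)
```

Inside a function such as get_1x_lr_params_NOscale, calling `unwrap(model)` first would let the same code serve both the single-GPU and the wrapped case. If the network itself has a submodule named `module`, an explicit `isinstance(model, torch.nn.DataParallel)` check is the safer test.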

Network structure issue

I just went through the MS_Deeplab model. In the forward pass, the input image x is passed through ResNet 4 times at different scales. But according to the original paper, the multi-scale processing should happen after layer5 of ResNet. May I know why it is implemented this way?

About image preprocessing

If I want to use DeepLab v2 in my own task, should the image preprocessing be the same, i.e. no padding, just resizing?

Why isn't the input to model normalized?

Hi! I observed that your input to your model is not normalized between 0 and 1 and even has values in the 140s. Is there any reason why you did this?

The size of the prediction

Thanks for sharing your code.

I am not sure it is an issue. I did a sanity check on the deeplab model:
input = Variable(torch.randn(1, 3, 512, 512)).cuda()
print(model(input).size())

The prediction has this size: torch.Size([1, 21, 65, 65]). So how is a dense prediction of the same size as the input generated? I understand that atrous convolution can capture features at different scales, but it does not do any upsampling, am I right?
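The 65 x 65 size is consistent with three stride-2 reductions followed by dilated (stride-1) stages. The sketch below reproduces it with the standard output-size formula for convolution and pooling layers. The layer hyperparameters (7x7 stride-2 stem with padding 3, 3x3 stride-2 max pooling with padding 1 and ceil mode, one stride-2 1x1 convolution) are assumptions about a Caffe-style ResNet, not values read from deeplab_resnet.py, and `conv_out` and `output_size` are hypothetical helpers.

```python
import math


def conv_out(size, kernel, stride, padding, ceil_mode=False):
    """Standard output-size formula for a convolution or pooling layer."""
    span = size + 2 * padding - kernel
    if ceil_mode:
        steps = int(math.ceil(span / float(stride)))
    else:
        steps = span // stride
    return steps + 1


def output_size(size):
    """Spatial size after the three stride-2 layers assumed above."""
    size = conv_out(size, kernel=7, stride=2, padding=3)                  # stem convolution
    size = conv_out(size, kernel=3, stride=2, padding=1, ceil_mode=True)  # max pooling
    size = conv_out(size, kernel=1, stride=2, padding=0)                  # strided 1x1 convolution
    return size


for side in (512, 321):
    print(side, "->", output_size(side))
```

Under these assumptions a 512-pixel side maps to 65, matching the size reported above. The score map is therefore at roughly 1/8 resolution, and a full-resolution prediction is obtained afterwards by bilinear upsampling of the scores to the input size, which is the approach the DeepLab paper describes.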

Inconsistency in memory consumption of Resnet-101 libraries

Hi,
Thank you @isht7 for writing this code. I am having problems in the memory used by the code. If I use a batch size of 1, the memory consumed is around 7-8 GB. I have only 1 GPU and hence I cannot increase the batch size further. However, when I used this library - https://github.com/speedinghzl/Pytorch-Deeplab which implements Deeplabv2 Resnet 101, the batch size can be increased to 10. Isn't this unusual? Could you tell me of any changes I need to make to your code so that I can increase batch size? My GPU has 11.1 GB of memory. Thanking you in anticipation.
Regards,
Omkar.

cuda runtime error

Hi, I'm new and I have a very basic question.

I run python train.py but I get an error like this:

Traceback (most recent call last):
  File "train.py", line 232, in <module>
    loss = loss + loss_calc(out[i+1],label[i+1],gpu0)
  File "train.py", line 134, in loss_calc
    label = Variable(label).cuda(gpu0)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 279, in cuda
    return CudaTransfer.apply(self, device_id, async)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/tensor.py", line 149, in forward
    return i.cuda(device_id, async=async)
  File "/usr/local/lib/python2.7/dist-packages/torch/_utils.py", line 66, in _cuda
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:18

I find this,
pytorch/pytorch#1010
but it doesn't help.
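A frequent cause of a device-side assert in a segmentation loss is a label value outside the range the loss can index, for example 255 or 21 when the model has 21 outputs. Whether that is the cause here is an assumption. CUDA reports such errors asynchronously, so the traceback can point at an unrelated line such as the `.cuda()` copy. The sketch below checks a label map before it reaches the GPU; `invalid_labels` is a hypothetical helper and the sample values are made up.

```python
def invalid_labels(labels, num_classes, ignore_label=None):
    """Return the sorted label values that a num_classes-way loss cannot index."""
    bad = set()
    for value in labels:
        if ignore_label is not None and value == ignore_label:
            continue  # explicitly ignored by the loss, so it is allowed
        if not 0 <= value < num_classes:
            bad.add(value)
    return sorted(bad)


flat_label_map = [0, 1, 20, 255, 21]  # flattened ground-truth pixels
print(invalid_labels(flat_label_map, num_classes=21))
print(invalid_labels(flat_label_map, num_classes=21, ignore_label=255))
```

Running one iteration on the CPU, or setting the environment variable CUDA_LAUNCH_BLOCKING=1, usually yields a clearer error message that names the failing operation.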

BatchNorm usage

Hi, the parameters of the BatchNorm layers in resnet101 are fixed here

But running_mean and running_var also need to be fixed, so I think we need to set the BatchNorm layers to eval mode, not just freeze their parameters (weight and bias).

Label Conversion, Training and inference

Hello,

The original VOC PNG labels are in RGB format, but it seems that we have to convert them to single-channel palette PNGs with index values (0, 1, 2, ..., 255). May I ask which tool you used for this conversion?
In the training parameters you mention:
--LISTpath=<str> Input image number list file [default: data/list/train_aug.txt]
Should this file contain training image names, training label names, or both?
Is there any code for visualizing inference results?
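For the conversion question, a minimal sketch of one approach: regenerate the standard PASCAL VOC colour map with its bit-shifting rule and look each RGB pixel up in it. This is not the tool the author used (the issue does not say which one that was). `voc_colormap`, `build_lookup`, and `rgb_to_index` are hypothetical names, and pixels are plain tuples so the example runs without an image library.

```python
def voc_colormap(num_entries=256):
    """Generate the PASCAL VOC palette: entry i is the RGB colour of label i."""
    cmap = []
    for label in range(num_entries):
        r = g = b = 0
        c = label
        for j in range(8):
            # Spread the bits of the label over the three channels,
            # filling each channel from its most significant bit down.
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap.append((r, g, b))
    return cmap


def build_lookup(num_classes=21, void_label=255):
    """Map RGB colours to class indices, keeping the void colour as 255."""
    cmap = voc_colormap()
    lookup = dict((cmap[i], i) for i in range(num_classes))
    lookup[cmap[void_label]] = void_label
    return lookup


def rgb_to_index(rgb_rows, lookup):
    """Convert rows of (r, g, b) pixels to rows of class indices."""
    return [[lookup[tuple(px)] for px in row] for row in rgb_rows]


tiny_label = [[(0, 0, 0), (128, 0, 0)], [(128, 128, 128), (224, 224, 192)]]
print(rgb_to_index(tiny_label, build_lookup()))
```

If the label files are palette-mode PNGs, as the VOC originals usually are, reading them without converting to RGB may already yield the class indices, in which case no conversion is needed.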

Where is the CRF implementation?

I am sorry, but where can I find the CRF implementation?

docopt

Hi to all!
I have never used the docopt package before. To be able to execute the train.py file, which argument should I pass in the docstr variable? I'm getting an error at "args = docopt(docstr, version='v0.1')"

The error I get is as follows:

  File "/media/sinem/LENOVO/deeplab-resnet-pytorch/train.py", line 41, in <module>
    args = docopt(docstr, version='v0.1')
  File "/usr/local/lib/python2.7/dist-packages/docopt.py", line 558, in docopt
    DocoptExit.usage = printable_usage(doc)
  File "/usr/local/lib/python2.7/dist-packages/docopt.py", line 468, in printable_usage
    raise DocoptLanguageError('"usage:" (case-insensitive) not found.')
docopt.DocoptLanguageError: "usage:" (case-insensitive) not found.

Do you have a suggestion on how i can solve this?

I installed the latest version: docopt-0.6.2.tar.gz

Cheers,
sinem.
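docopt builds its parser from the text it is given, and that text must contain a "Usage:" section; the error above is what it raises when the section is missing, for example if docstr is empty or was edited. The sketch below shows the expected shape. The docstring is illustrative and is not the one in train.py, and the import is guarded so the snippet also runs where docopt is not installed.

```python
DOCSTR = """Train a segmentation model (illustrative docstring, not the repository's).

Usage:
    train.py [options]

Options:
    -h, --help          Print this message
    --lr=<float>        Learning rate [default: 0.00025]
    --maxIter=<int>     Maximum number of iterations [default: 20000]
"""

try:
    from docopt import docopt
except ImportError:  # docopt is a third-party package
    docopt = None

args = None
if docopt is not None:
    # argv is passed explicitly here; omit it to parse the real command line.
    args = docopt(DOCSTR, argv=["--lr", "0.001"], version="v0.1")
    print(args["--lr"], args["--maxIter"])
```

Note that docopt returns every value as a string, so numeric options still need an explicit `float(...)` or `int(...)` conversion.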

UpsamplingBilinear2d throwing error

Hello
I am using anaconda environment to execute your train code.
But I am getting below error:
/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:180: UserWarning: nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.
warnings.warn("nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.")
Traceback (most recent call last):
File "train.py", line 222, in
out = model(images)
File "/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in call
result = self.forward(*input, **kwargs)
File "/home/abhash/Documents/Abhash/MG/deeplab-resnet/pytorch-deeplab-resnet/deeplab_resnet.py", line 198, in forward
out.append(self.interp3(self.Scale(x2))) # for 0.75x scale
File "/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in call
result = self.forward(*input, **kwargs)
File "/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/modules/upsampling.py", line 181, in forward
return super(UpsamplingBilinear2d, self).forward(input)
File "/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/modules/upsampling.py", line 79, in forward
return F.upsample(input, self.size, self.scale_factor, self.mode)
File "/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1375, in upsample
return _functions.thnn.UpsamplingBilinear2d.apply(input, _pair(size), scale_factor)
File "/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/thnn/upsampling.py", line 277, in forward
ctx.output_size[1],
TypeError: CudaSpatialUpSamplingBilinear_updateOutput received an invalid combination of arguments - got (int, torch.cuda.FloatTensor, torch.cuda.FloatTensor, float, float), but expected (int state, torch.cuda.FloatTensor input, torch.cuda.FloatTensor output, int outputHeight, int outputWidth)

Also, I am new to PyTorch, so please pardon me if I am missing something obvious here.
P.S.: anaconda3, python3.6 with GPU
CUDA:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

Random cropping during preprocessing in train.py

Thanks a lot for your repository. While reading through train.py I noticed that PASCAL VOC images are being resized to 321x321 in this line.
In the original paper they perform a random cropping on img and gt mask. Here is an example from the corresponding Tensorflow code. There mask and image are concatenated and then cropped simultaneously.
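The key point of a joint crop is that the image and its mask share one random offset, so every pixel keeps its label. A minimal sketch of that idea follows; `random_crop_pair` is a hypothetical helper that works on nested lists so it runs without NumPy or TensorFlow, and it is not the code from either repository.

```python
import random


def random_crop_pair(image, mask, crop_h, crop_w, rng=random):
    """Crop `image` and `mask` with the same randomly chosen window."""
    height, width = len(mask), len(mask[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop window is larger than the input")
    top = rng.randint(0, height - crop_h)   # randint is inclusive on both ends
    left = rng.randint(0, width - crop_w)

    def crop(rows):
        return [row[left:left + crop_w] for row in rows[top:top + crop_h]]

    return crop(image), crop(mask)


# Made-up 5 x 6 data: each image pixel stores its own coordinates and
# the mask stores a value derived from them, so alignment can be checked.
image = [[(r, c) for c in range(6)] for r in range(5)]
mask = [[r * 6 + c for c in range(6)] for r in range(5)]
img_crop, mask_crop = random_crop_pair(image, mask, 3, 4, random.Random(0))
print(len(img_crop), len(img_crop[0]))
```

With array libraries the same effect comes from slicing both arrays with identical indices, or from concatenating them along the channel axis before cropping, as the TensorFlow example mentioned above does.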

error: out of memory

Dear all,
I tried to train the model on VOC2012, but got an out-of-memory error. The output is as follows:

train.py --lr 0.00025 --wtDecay 0.0005 --gpu0 0 --maxIter 20000 --GTpath /home/hlc/Data/VOCdevkit/VOC2012/SegmentationClassAug --IMpath /home/hlc/Data/VOCdevkit/VOC2012/JPEGImages --LISTpath data/list/train_aug.txt
{'--GTpath': '/home/hlc/Data/VOCdevkit/VOC2012/SegmentationClassAug',
 '--IMpath': '/home/hlc/Data/VOCdevkit/VOC2012/JPEGImages',
 '--LISTpath': 'data/list/train_aug.txt',
 '--gpu0': '0',
 '--help': False,
 '--iterSize': '10',
 '--lr': '0.00025',
 '--maxIter': '20000',
 '--wtDecay': '0.0005'}
('iter = ', 0, 'of', 20000, 'completed, loss = ', array([ 2.40648198], dtype=float32))
('(poly lr policy) learning rate', 0.00025)
('iter = ', 1, 'of', 20000, 'completed, loss = ', array([ 1.26656163], dtype=float32))
('iter = ', 2, 'of', 20000, 'completed, loss = ', array([ 0.74460578], dtype=float32))
THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "/home/hlc/codes/PycharmProjects/pytorch-deeplab-resnet/train.py", line 229, in <module>
    out = model(images)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hlc/codes/PycharmProjects/pytorch-deeplab-resnet/deeplab_resnet.py", line 201, in forward
    out.append(self.Scale3(x3))	# for 0.5x scale
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hlc/codes/PycharmProjects/pytorch-deeplab-resnet/deeplab_resnet.py", line 178, in forward
    x = self.layer3(x)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/container.py", line 64, in forward
    input = module(input)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hlc/codes/PycharmProjects/pytorch-deeplab-resnet/deeplab_resnet.py", line 89, in forward
    out = self.conv2(out)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/conv.py", line 237, in forward
    self.padding, self.dilation, self.groups)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 40, in conv2d
    return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu:66

Process finished with exit code 1

My GPU has 8 GB of memory, and the batch size used in the code is 1.
Any idea what causes this error? Thanks for your help.

Training memory

@isht7 Thanks for sharing the code. I am wondering whether there is a simple way to reduce the memory required for training, as it needs just a bit more than my card (1080 Ti, 11 GB) has.

model.load_state_dict(saved_state_dict) error

Excuse me, when I run 'python train.py', the following error occurs:

File "train.py", line 222, in
model.load_state_dict(saved_state_dict)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 331, in load_state_dict
.format(name))
KeyError: 'unexpected key "Scale.conv1.weight" in state_dict'

I am using the COCO-pretrained model 'MS_DeepLab_resnet_pretrained_COCO_init.pth' to fine-tune on VOC. I hope for a response. Thank you!

Issue while evaluating trained model

RuntimeError: sizes do not match at /b/wheel/pytorch-src/torch/lib/THC/THCTensorCopy.cu:31

I have fine-tuned this model on my custom dataset of images. The ground truth has only two labels (0 and 255). However, when I test my images using the evalpyt2.py script, I get the following error:

{'--gpu0': '0',
'--help': False,
'--snapPrefix': 'VOC12_scenes_',
'--testGTpath': '/mnt/VTSRAID01/SAMI_EXPERIMENTS/Segmentation/DataForAnalytics/GIS_ALL_IMAGES/BinaryResizedGroundtruthPng/',
'--testIMpath': '/mnt/VTSRAID01/SAMI_EXPERIMENTS/Segmentation/DataForAnalytics/GIS_ALL_IMAGES/ResizedOriginalImages/',
'--visualize': True}
VOC12_scenes_
Traceback (most recent call last):
File "evalpyt2.py", line 87, in
model.load_state_dict(saved_state_dict)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 335, in load_state_dict
own_state[name].copy_(param)
RuntimeError: sizes do not match at /b/wheel/pytorch-src/torch/lib/THC/THCTensorCopy.cu:31

I have crosschecked and the sizes of my input image and the groundtruth do match. I am not sure what is causing this error.

Any help would be much appreciated.

error in convert_deeplab_resnet.py

Dear all,
I used convert_deeplab_resnet.py to convert the caffemodel, but got the following error. I don't know what is going on.
[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 24:16: Message type "caffe.LayerParameter" has no field named "interp_param". WARNING: Logging before InitGoogleLogging() is written to STDERR F0514 16:39:28.816993 17207 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: data/test.prototxt *** Check failure stack trace: ***
I have caffe installed, but I'm not sure whether it's a correct version.
Thank you for your attention!

I'm not familiar with Caffe. Would anyone be willing to share the PyTorch model file (.pth) converted from the caffemodel? Thank you!

Could you give more details? thanks
