
xialipku / emanet

679 stars, 130 forks, 186 KB

The code for Expectation-Maximization Attention Networks for Semantic Segmentation (ICCV'2019 Oral)

Home Page: https://xialipku.github.io/publication/expectation-maximization-attention-networks-for-semantic-segmentation/

License: GNU General Public License v3.0

Python 99.35% Shell 0.65%

emanet's People

Contributors

xialipku

emanet's Issues

Why does EMANet suffer from vanishing/exploding gradients even though T_train (=3) is small?

Hello,

I would like to ask the authors why EMANet suffers from the vanishing/exploding gradients inherent in RNNs even though the EM iterations are unrolled for only a small number of steps (in this case 3). Vanilla RNNs with tanh non-linearities can typically work on sequences on the order of 100 time steps, and LSTMs can work on sequences on the order of 1000 time steps.

Since the mIoU peaks at a very small value of T_train, are vanishing/exploding gradients really the reason the mIoU deteriorates for higher values of T_train (> 3)? Have the authors by any chance printed the gradient norms of every layer to check for vanishing or exploding gradients?
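For what it's worth, checking this is straightforward: below is a minimal sketch (my own, not from this repo) that logs per-parameter gradient norms right after the backward pass.

def log_grad_norms(model):
    # Print the L2 norm of the gradient of every parameter tensor.
    for name, param in model.named_parameters():
        if param.grad is not None:
            print(f'{name}: {param.grad.norm(2).item():.3e}')

# Inside the training loop:
#   loss.backward()
#   log_grad_norms(net)   # vanishing/exploding shows up as tiny/huge norms
#   optimizer.step()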

Thank you in advance.

How long does it take to train EMANet with a ResNet-101 backbone?

Hello,

Thank you for publishing the code to your excellent work.
I was wondering how long it takes to train EMANet with a ResNet-101 backbone, for both the 256- and 512-channel variants. How many GPUs did you use to achieve this training time?

Thank you in advance :)

Fail to reproduce your result: EMANet(512) 80.05%?

Hi @XiaLiPKU ! This work is wonderful and thanks so much for releasing the code.

May I ask a question? I used your pretrained model to evaluate on the val set and got 80.50% mIoU with single-scale testing, but when I trained the model from scratch I could only get 79.44%, whereas it is supposed to be 80.05%.

I just followed your default settings (pretrained ResNet weights, batch size 16, 4 GPUs, 30k iterations, and so on...).

Are there any other special techniques you adopted to get this final model?

Looking forward to your reply!

RuntimeError: self.net.module.load_state_dict(obj['net'])

Using the pretrained model throws this error:

RuntimeError: Error(s) in loading state_dict for EMANet:
Missing key(s) in state_dict: "extractor.4.0.conv1.weight", "extractor.4.0.bn1.weight", "extractor.4.0.bn1.bias", "extractor.4.0.bn1.running_mean", "extractor.4.0.bn1.running_var", "extractor.4.0.conv2.weight", ......
Unexpected key(s) in state_dict: "layer1.0.0.conv1.weight", "layer1.0.0.bn1.weight", "layer1.0.0.b

Can someone help?
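One possible workaround, sketched below under the assumption that only the key names differ between the checkpoint and the model definition. The prefix pair is a guess from the error message ('layer1.0' vs. 'extractor.4') and must be verified against the model's own state_dict keys.

import torch

def load_with_remap(net, ckpt_path):
    # net: the EMANet model wrapped in DataParallel; ckpt_path: the checkpoint.
    obj = torch.load(ckpt_path, map_location='cpu')
    state = obj['net']
    # Hypothetical prefix remap; adjust after inspecting both key sets.
    remapped = {k.replace('layer1.0', 'extractor.4', 1): v
                for k, v in state.items()}
    # strict=False reports leftover mismatches instead of raising.
    result = net.module.load_state_dict(remapped, strict=False)
    print('still missing:', result.missing_keys)
    print('still unexpected:', result.unexpected_keys)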

Fail to reproduce your results on COCO-STUFF

Hi @XiaLiPKU! Your work is amazing, and I appreciate that you have released your code.
May I ask a question? Based on your code, I modified it to suit COCO-STUFF training, but I can only get 34.55% mIoU. I just followed your default settings, but on a single GPU (pretrained ResNet-101, batch size 3, 30k iterations, and so on...).

Looking forward to your reply!
Best wishes

EMA Normalization

In 6.2.1, it says that 'From the right part of Fig. 3, it is clear to see that LN is better than no normalization'. But in the right part of Fig. 3, it seems that no normalization is better than LN. Is the figure wrong?
[screenshot of the right part of Fig. 3]

The difference between the code and the original paper.

Hi, thank you for releasing the code for EMANet. I found a difference between the code and the paper, in the formulation of Equation 13. In the paper, the M step (basis reconstruction) is formulated as follows:
[equation image: Eq. 13 gives the M step as μ_k = (Σ_n z_nk x_n) / (Σ_n z_nk)]
However, in the code, the M step is formulated as:
mu = torch.bmm(x, z_)
Actually, mu = torch.bmm(x, z_) is a weighted summation of X, while Equation 13 in the paper is not a plain weighted summation of X. Is anything wrong in the paper?
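For context, here is a condensed sketch of the EM loop as described in the paper (my paraphrase, not a verbatim copy of this repo's code). The key point is the normalization of z over the pixel dimension before the second bmm.

import torch
import torch.nn.functional as F

B, C, N, K, stage_num = 2, 512, 4096, 64, 3
x = torch.randn(B, C, N)                       # flattened features (N = H*W)
mu = F.normalize(torch.randn(B, C, K), dim=1)  # K bases

for _ in range(stage_num):
    z = torch.bmm(x.transpose(1, 2), mu)           # (B, N, K)
    z = F.softmax(z, dim=2)                        # E step: responsibilities
    z_ = z / (1e-6 + z.sum(dim=1, keepdim=True))   # normalize over the N pixels
    mu = torch.bmm(x, z_)                          # M step
    mu = F.normalize(mu, dim=1)                    # l2-normalize each basis

Because each column of z_ sums to 1 over the pixel dimension, torch.bmm(x, z_) computes Σ_n z_nk x_n / Σ_n z_nk, which is exactly the weighted average of Equation 13. So if the repo normalizes z into z_ before the bmm, the code and the paper agree.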

Can't reproduce the ablation study results in Figs. 3 and 4

Hello,

I can't seem to reproduce the ablation study results in Figs. 3 and 4 of the ICCV paper. When training and evaluating with an iteration number of 3 (T_train = T_eval = 3), my final mIoU is 76.04%, which is 2.48% less than the result shown in Fig. 4 (78.52%).

I used the default settings in settings.py except the following:

  • N_LAYERS = 50 (experiment done on Resnet-50)
  • STRIDE = 16 (for training) and 8 (for evaluation) as stated in sec. 6.1
  • BATCH_SIZE = 12
  • DEVICE = 0
  • DEVICES = list(range(0, 1))
  • NUM_WORKERS = 12

Furthermore, my Pillow version is 6.1.0 and my cv2 version is 3.4.2, unlike the version used by the authors.

Is it possible that using a single GPU to train EMANet results in such a significant decrease in mIoU (possibly due to the use of synchronized batch norm?), or could using a different version of Pillow / cv2 be the root cause of this problem?

Thanks in advance :)

Segmentation edges are jagged

When running EMANet on the VOC dataset and on my own dataset, the final segmentation edges show obvious jagged artifacts. Are your results like this too?

The training process was stopped

Dear XiaLiPKU,
I cloned your code and followed the steps to train on my server, but there were just these three lines:

2020-03-13 21:29:17,166 - INFO - set log dir as ./logdir
2020-03-13 21:29:17,166 - INFO - set model dir as ./models
2020-03-13 21:29:19,127 - ERROR - No checkpoint ./models/latest.pth!

I know that error will not affect the training process, but no models were saved in ./models, and when I ran "sh tensorboard.sh" there was nothing. It seems the training process stopped. I just replaced obj.cuda(async=True) with obj.cuda(non_blocking=True); other than that I didn't change any code. Could you help me?

Thanks!

Questions about parameters and FLOPs in Tab. 1.

You note "All results are achieved with the backbone ResNet-101 with output stride 8". Why, then, are the parameters and FLOPs of EMANet substantially smaller than those of the backbone (ResNet-101)? Taking EMANet512 as an example, it contains 10M parameters and 43.1G FLOPs, while the backbone (ResNet-101) alone contains 42.6M parameters and 190.6G FLOPs. Is there an error here?

Question about the BN layer

I am puzzled by the BN layer. In your code, you did not use torch.nn.BatchNorm2d. What is the difference between torch.nn.BatchNorm2d and SynchronizedBatchNorm2d?

Can't train the model?

(base) davis@davis-MS-7B17:~/Network/EMANet-master$ python train.py
2019-08-31 13:50:14,703 - INFO - set log dir as ./logdir
2019-08-31 13:50:14,703 - INFO - set model dir as ./models
2019-08-31 13:50:17,131 - ERROR - No checkpoint ./models/latest.pth!

The training stops at this point, so I have to interrupt it with a KeyboardInterrupt...
Does anybody know how to solve it?

About the BN!!!

I added another block to replace the EMAU, but I get some warnings. I guess the bn_lib you used is not suitable for my block.

2020-07-30 20:05:26,727 - INFO - step: 1 loss: 2.429 lr: 0.009

WARNING batched routines are designed for small sizes. It might be better to use the
Native/Hybrid classical routines if you want good performance.

[the warning above is printed four times between the two steps]

2020-07-30 20:05:29,586 - INFO - step: 2 loss: 2.398 lr: 0.009

ResNet18 pretrained model

Hi,

Thanks for providing the pre-trained ResNet50 and ResNet101 models.
Do you have a pretrained ResNet-18 model in which the first 7x7 conv is replaced with three 3x3 convs?
I have searched for it for a long time but unfortunately couldn't find it. If you have saved this model, could you please share it with me?
Many thanks in advance.

FLOPs

How do you compute the FLOPs and parameters of your EMA module?
Could you please share the computation details? Thanks!
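For a rough back-of-the-envelope count (my own estimate, not the authors' method), the EMA module's cost is dominated by the two batched matmuls per EM iteration:

def ema_flops(n, c, k, t):
    # n: number of pixels (H*W), c: channels, k: bases, t: EM iterations.
    e_step = n * c * k        # Z = X^T mu
    m_step = n * c * k        # mu = X Z_
    macs = t * (e_step + m_step)
    return 2 * macs           # 1 MAC = 1 multiply + 1 add

# Example: 1/8-resolution map of a 513x513 input, 512 channels, 64 bases, T=3.
print(ema_flops(65 * 65, 512, 64, 3) / 1e9, 'GFLOPs')

This ignores the 1x1 convolutions around the EMAU and the softmax/normalization, so it is a lower bound on the module's real cost.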

No latest.pth!

Hi, thanks for the great repo. I cannot get latest.pth during training. What should I do?
error:
ERROR - No checkpoint ./models/latest.pth!
Thank you.

selection of K

In my opinion, besides T, the selection of K is also important (as in GMM or k-means). I didn't see any ablation study on the effect of different values of K; did you run any such experiments?

Intuitively, I have the impression that mu represents different features for different classes, so the first K I would try is the number of classes (e.g. 19 for Cityscapes). Can you explain how you decide to use K=64?

As the visualization of the responsibilities shows, different z's tend to represent different classes, so won't having K > the number of classes make some z's actually close to each other, eventually making them redundant?

Thanks.

Why use padding for ValDataset?

Hey, I found that ValDataset uses padding for both the image and the label.

#image, label = pad_inf(image, label)

def pad_inf(image, label=None):
    # Pad on the bottom/right so that the padded height and width satisfy
    # size % stride == 1, i.e. (size - 1) is a multiple of the output stride.
    h, w = image.size()[-2:]
    stride = settings.STRIDE
    pad_h = (stride + 1 - h % stride) % stride
    pad_w = (stride + 1 - w % stride) % stride
    if pad_h > 0 or pad_w > 0:
        # Images are padded with zeros; padded label pixels are marked
        # IGNORE_LABEL so they are excluded from the loss and the metric.
        image = F.pad(image, (0, pad_w, 0, pad_h), mode='constant', value=0.)
        if label is not None:
            label = F.pad(label, (0, pad_w, 0, pad_h), mode='constant',
                          value=settings.IGNORE_LABEL)
    return image, label

Can you explain the reason for doing so?
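For reference (my own arithmetic, not from the repo): the formula pads so that the padded size satisfies size % STRIDE == 1. With STRIDE = 16 and a 375x500 VOC image, pad_h = (17 - 375 % 16) % 16 = 10 and pad_w = (17 - 500 % 16) % 16 = 13, giving a 385x513 input; both 385 % 16 and 513 % 16 equal 1, which matches the usual (size - 1) / stride + 1 output-size convention.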

About VOC12

Hi, I submitted the results on the val set and on the test set to the official evaluation server, and the two results differ by four points. How can I reduce this gap?

Could it run on one GPU?

(base) pf@pf-System-Product-Name:~/EMANet$ python train.py
2019-12-06 21:37:49,527 - INFO - set log dir as ./logdir
2019-12-06 21:37:49,528 - INFO - set model dir as ./models
Traceback (most recent call last):
  File "train.py", line 181, in <module>
    main()
  File "train.py", line 146, in main
    sess = Session(dt_split='trainaug')
  File "train.py", line 93, in __init__
    self.net = DataParallel(self.net, device_ids=settings.DEVICES)
  File "/home/pf/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 131, in __init__
    _check_balance(self.device_ids)
  File "/home/pf/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 18, in _check_balance
    dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
  File "/home/pf/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 18, in <listcomp>
    dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
  File "/home/pf/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 301, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id

Is this a bug?

In train.py, lines 134 and 135 read:

self.net.module.ema.mu *= momentum
self.net.module.ema.mu += mu * (1 - momentum)

Maybe they should be:

self.net.module.emau.mu *= momentum
self.net.module.emau.mu += mu * (1 - momentum)

I had some trouble, could you help me?

Thanks for your reply!!!
Following the format of your ground truth, I made the ground truth for my dataset. But during training there was a problem, which I've pasted below. Emmmm, can you help me? Maybe my dataset is too messy and the object boundaries are not obvious. What advice would you offer?

RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered (insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:564)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f5345247441 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f5345246d7a in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0x13652 (0x7f534261a652 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x50 (0x7f5345237ce0 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #4: + 0x30facb (0x7f52f071aacb in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #5: + 0x376d60 (0x7f52f0781d60 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #6: + 0x3128ea (0x7f52f071d8ea in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #7: torch::autograd::deleteFunction(torch::autograd::Function*) + 0xa2 (0x7f52f071d9a2 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #8: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0xa2 (0x7f5330b81bb2 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #9: + 0x14216b (0x7f5330ba516b in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #10: + 0x1421d9 (0x7f5330ba51d9 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #11: torch::autograd::Variable::Impl::release_resources() + 0x1b (0x7f52f0d5708b in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #12: + 0x1420bb (0x7f5330ba50bb in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #13: + 0x3c30f4 (0x7f5330e260f4 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #14: + 0x3c3141 (0x7f5330e26141 in /home/r/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #15: + 0x19aa5e (0x55791a64ba5e in /home/r/.conda/envs/pytorch/bin/python3)
frame #16: + 0xf1b77 (0x55791a5a2b77 in /home/r/.conda/envs/pytorch/bin/python3)
frame #17: + 0xf1a07 (0x55791a5a2a07 in /home/r/.conda/envs/pytorch/bin/python3)
frame #18: + 0xf1a1d (0x55791a5a2a1d in /home/r/.conda/envs/pytorch/bin/python3)
frame #19: + 0xf1a1d (0x55791a5a2a1d in /home/r/.conda/envs/pytorch/bin/python3)
frame #20: PyDict_SetItem + 0x3da (0x55791a5e963a in /home/r/.conda/envs/pytorch/bin/python3)
frame #21: PyDict_SetItemString + 0x4f (0x55791a5f065f in /home/r/.conda/envs/pytorch/bin/python3)
frame #22: PyImport_Cleanup + 0x99 (0x55791a655d89 in /home/r/.conda/envs/pytorch/bin/python3)
frame #23: Py_FinalizeEx + 0x61 (0x55791a6c0231 in /home/r/.conda/envs/pytorch/bin/python3)
frame #24: Py_Main + 0x35e (0x55791a6ca57e in /home/r/.conda/envs/pytorch/bin/python3)
frame #25: main + 0xee (0x55791a59488e in /home/r/.conda/envs/pytorch/bin/python3)
frame #26: __libc_start_main + 0xf0 (0x7f5348fdd830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #27: + 0x1c3160 (0x55791a674160 in /home/r/.conda/envs/pytorch/bin/python3)

Output stride

Hi, first of all thanks for your paper.
You mention that for some nets the stride is 16 while for others it is 8. However, there is nothing on how you recover the output back to the original size. Do you use bilinear upsampling? If so, don't you have problems with borders and fine structures when using such a steep upsampling factor?
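For reference, the usual way to recover the full resolution in PyTorch (an assumption about common practice, not a claim about this repo's exact code) is bilinear interpolation of the logits:

import torch
import torch.nn.functional as F

logits = torch.randn(1, 21, 65, 65)   # e.g. stride-8 output for a 513x513 crop
up = F.interpolate(logits, size=(513, 513), mode='bilinear', align_corners=True)
pred = up.argmax(dim=1)               # (1, 513, 513) per-pixel class indices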
