hrnet / hrnet-semantic-segmentation

The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation for HRNet: https://arxiv.org/abs/1908.07919

License: Other

Python 89.26% C++ 5.18% Cuda 5.22% Shell 0.33%

segmentation semantic-segmentation cityscapes pascal-context lip high-resolution high-resolution-net hrnets transformer segmentation-transformer

hrnet-semantic-segmentation's People


tu-8tu hust-wayne kleinxin github2016hdu william-zhan sunke123 zxr1314 yangsenwxy lightningsoon leo-xxx powder21 980380446 freeah peterzhousz xuecaihu deisler134 jtpils lphxx6222712 jimmytyt sunshinezhihuo xingjici sybil12 playai hankkung buryang mqchen1993 k383556 belye ieee820 xiaohuihui52309 peakge fireae jarygrace yuquant eugene-twj ruofeidu fighterzzzh lonestar686 mahlermozart zhulei2016 lonely-geese shannongxn githubfragments artechstark wangqinglhc weihuang527 louisnust xxyqsy xiaoketongxue robot000 macos fendaq m-pasek thekeviv liuwenhaha pzw520125 ytzhao wamiq-reyaz wanglixilinx yyfyan irfanicmll peterzs egrass ml-lab zhoumaomin liushuaicare wwwzhangshenzecn jhuxiang happog xiaolaodi deppmeng boyuezhong helloimrobert presageboat windson87 mathpopo zymale sujinjang jy1023408440 zealoe itsme3d gcy190811 monteyang wohaiyo addflex wuhuikai michaeltu1 dontlovebugs bearcatt weiliansong fishlikeapple xiaoyufenfei morawi dltkhacene caoliangjie justin0111 tuananh1007 hushunbo chenmingthu natumeyuzuru

hrnet-semantic-segmentation's Issues

Training gets stuck with no log output in the command line

The training job gets stuck once it passes 150 epochs (epochs > 150). The config follows the default settings in the yaml file, and the model is trained on 4 V100 GPUs (branch pytorch1.1). I have run it about 4 times, and the problem appears every time (python -m torch.distributed.launch --nproc_per_node=4 tools/train.py --cfg experiments/cityscapes/seg_hrnet_w48_train_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml).

Training jobs 1, 2, 3 and 4 stopped at epochs 154, 187, 167 and 167 (no error message; they just got stuck).

Graph vs code differences

I think the graph and the code are not in sync.
In the graph, the yellow branch seems to keep the original resolution (there is no stride symbol on the yellow branch), but in the code the first two yellow blocks have a stride, so the yellow output is at 1/4 resolution and the orange branch is at 1/8.
This means the output has to be upsampled at inference.

https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/master/lib/models/seg_hrnet.py#L267-L271
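The 1/4 factor can be checked with the standard convolution output-size formula. The sketch below assumes the stem consists of two 3x3 convolutions with stride 2 and padding 1, which matches the strides described above (verify against the linked lines of seg_hrnet.py); the helper names are hypothetical.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_output(height: int, width: int, num_stride2_convs: int = 2) -> tuple:
    """Apply the assumed stem (stride-2 3x3 convs) to an input size."""
    for _ in range(num_stride2_convs):
        height, width = conv_out(height), conv_out(width)
    return height, width


if __name__ == "__main__":
    # A 512x1024 Cityscapes crop ends up at 1/4 resolution after the stem.
    print(stem_output(512, 1024))  # (128, 256)
    print(stem_output(512, 512))   # (128, 128)
```

The second call reproduces the 512x512 to 128x128 shrinkage reported in the next issue.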

The download link for the Cityscapes model cannot be opened. Can you check it for me? Thank you.

Inconsistent size between output and target

Hi, congrats on your great work.

I was experimenting with your code on a dataset other than Cityscapes, with the input image shape set to 512x512. But with your default network settings, the output has a shape of 128x128. Do I have to add upsampling code manually based on your implementation?
I might be missing something, but I don't see where to adjust the output shape.

Regards.
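A common way to handle the 1/4-resolution output is to upsample the logits (not the labels) to the label size before computing the loss or taking the argmax; in PyTorch this is torch.nn.functional.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False). To show what that resampling computes, here is an illustrative single-channel sketch in plain Python using the same half-pixel (align_corners=False) convention. It is not the repository's code, and the function name is hypothetical.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Bilinearly resize a 2-D list of floats (half-pixel / align_corners=False mapping)."""
    in_h, in_w = len(grid), len(grid[0])
    scale_y, scale_x = in_h / out_h, in_w / out_w

    def source_coord(out_idx, scale, in_size):
        # Map an output pixel centre back to input coordinates, clamped at the borders.
        pos = max((out_idx + 0.5) * scale - 0.5, 0.0)
        lo = min(int(pos), in_size - 1)
        hi = min(lo + 1, in_size - 1)
        return lo, hi, pos - lo

    out = []
    for oy in range(out_h):
        y0, y1, wy = source_coord(oy, scale_y, in_h)
        row = []
        for ox in range(out_w):
            x0, x1, wx = source_coord(ox, scale_x, in_w)
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


if __name__ == "__main__":
    for row in bilinear_upsample([[0.0, 1.0], [2.0, 3.0]], 4, 4):
        print(row)
```

For the 128x128 case above, upsampling each class map by a factor of 4 gives 512x512 scores that line up with the labels.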

subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

Thanks for your appealing work. I encountered a problem when trying to train with your code. Here is the error information:
Frame skipped from debugging during step-in.
Note: may have been skipped because of "justMyCode" option (default == true).
F:\anaconda3\lib\site-packages\torch\utils\cpp_extension.py:184: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
  warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
Traceback (most recent call last):
  File "c:\Users\msi-pc\.vscode\extensions\ms-python.python-2019.8.29288\pythonFiles\ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "c:\Users\msi-pc\.vscode\extensions\ms-python.python-2019.8.29288\pythonFiles\lib\python\ptvsd\__main__.py", line 432, in main
    run()
  File "c:\Users\msi-pc\.vscode\extensions\ms-python.python-2019.8.29288\pythonFiles\lib\python\ptvsd\__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "F:\anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "F:\anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "F:\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\tools\train.py", line 27, in <module>
    import models
  File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\tools\..\lib\models\__init__.py", line 11, in <module>
    import models.seg_hrnet
  File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\tools\..\lib\models\seg_hrnet.py", line 22, in <module>
    from .sync_bn.inplace_abn.bn import InPlaceABNSync
  File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\tools\..\lib\models\sync_bn\__init__.py", line 1, in <module>
    from .inplace_abn import bn
  File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\tools\..\lib\models\sync_bn\inplace_abn\__init__.py", line 1, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\tools\..\lib\models\sync_bn\inplace_abn\bn.py", line 14, in <module>
    from functions import *
  File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\lib\models\sync_bn\inplace_abn\functions.py", line 16, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "F:\anaconda3\lib\site-packages\torch\utils\cpp_extension.py", line 644, in load
    is_python_module)
  File "F:\anaconda3\lib\site-packages\torch\utils\cpp_extension.py", line 813, in _jit_compile
    with_cuda=with_cuda)
  File "F:\anaconda3\lib\site-packages\torch\utils\cpp_extension.py", line 862, in _write_ninja_file_and_build
    with_cuda=with_cuda)
  File "F:\anaconda3\lib\site-packages\torch\utils\cpp_extension.py", line 1072, in _write_ninja_file
    'cl']).decode().split('\r\n')
  File "F:\anaconda3\lib\subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "F:\anaconda3\lib\subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.
How can I fix this problem? (My PyTorch version is 1.1.0 and my CUDA version is 9.0.) Looking forward to your reply.
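The traceback shows torch.utils.cpp_extension running `where cl` to locate the MSVC compiler, so the failure means cl.exe is not on PATH for that process. Launching from a Visual Studio developer command prompt, which puts cl.exe on PATH, is one commonly suggested remedy. The standard-library check below reports which build tools are missing; the tool list (cl on Windows, c++ elsewhere, plus ninja) is an assumption, and the function name is hypothetical.

```python
import os
import shutil


def missing_build_tools(tools=None):
    """Return the tools that cannot be found on PATH.

    Assumption: the JIT build needs a C++ compiler (MSVC's cl on Windows,
    c++ elsewhere) and the ninja build system.
    """
    if tools is None:
        compiler = "cl" if os.name == "nt" else "c++"
        tools = [compiler, "ninja"]
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All build tools found on PATH.")
```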

Is the order of IMAGE_SIZE incorrect?

HRNet-Semantic-Segmentation/lib/config/default.py

Line 58 in c65ebfb

_C.TRAIN.IMAGE_SIZE = [1024, 512] # width * height

Judging from the code in lib/datasets/base_dataset.py, it seems that the order should be height * width.
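Part of the confusion is that libraries disagree on ordering: NumPy image arrays are indexed (height, width), while OpenCV's cv2.resize takes dsize=(width, height). Assuming the `# width * height` comment in default.py is authoritative, two small helpers (hypothetical names) make each conversion explicit:

```python
def image_size_to_hw(image_size):
    """Convert a [width, height] config entry into a (height, width) tuple.

    Assumes the '# width * height' comment in default.py is authoritative.
    """
    width, height = image_size
    return height, width


def hw_to_dsize(height, width):
    """cv2.resize expects its dsize argument as (width, height)."""
    return width, height


if __name__ == "__main__":
    h, w = image_size_to_hw([1024, 512])
    print(h, w)               # 512 1024
    print(hw_to_dsize(h, w))  # (1024, 512)
```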

Questions about LIP dataset labels

Hi, I couldn't find the 'train_segmentations_reversed' labels for the LIP dataset.
Are they something you pre-processed?
How can I get them?
Thanks
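The issue does not say what "reversed" means. One plausible reading, which is an assumption and not confirmed by the repository, is labels for horizontally flipped images, where left/right body parts must swap class ids after mirroring. A sketch of that transform for LIP's left/right pairs (hypothetical helper name):

```python
# Assumed LIP label ids for left/right pairs:
# 14 Left-arm / 15 Right-arm, 16 Left-leg / 17 Right-leg,
# 18 Left-shoe / 19 Right-shoe.
LR_PAIRS = [(14, 15), (16, 17), (18, 19)]


def flip_label_map(label, pairs=LR_PAIRS):
    """Mirror a 2-D label map horizontally and swap left/right class ids."""
    swap = {}
    for left, right in pairs:
        swap[left] = right
        swap[right] = left
    return [[swap.get(v, v) for v in reversed(row)] for row in label]


if __name__ == "__main__":
    print(flip_label_map([[0, 14, 15], [16, 2, 19]]))  # [[14, 15, 0], [18, 2, 17]]
```

Swapping ids matters because a mirrored left arm now sits where a right arm would be.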

Converting the model into a ScriptModule

When I convert the HRNet model into a ScriptModule using "traced_script_module=torch.jit.trace(kp_model,example)
traced_script_module.save("hrnet_model.pt")", the error "assert(isinstance(orig, torch.nn.Module)) AssertionError" occurs. I found that it is caused by the
Any suggestions?

May I ask how you computed the 'class_weight' of the Cityscapes dataset class?

Hi, I noticed there is a member variable in class Cityscapes. I wonder how you arrived at its values?
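The issue does not reveal how the shipped values were derived. For reference, one widely used recipe (from the ENet paper, and not necessarily the one used here) weights class k by 1 / ln(c + p_k), where p_k is the fraction of labeled pixels belonging to class k and c = 1.02. A sketch with a hypothetical function name:

```python
import math


def enet_class_weights(pixel_counts, c=1.02):
    """Class weights w_k = 1 / ln(c + p_k), where p_k is class k's share of pixels."""
    total = sum(pixel_counts)
    return [1.0 / math.log(c + count / total) for count in pixel_counts]


if __name__ == "__main__":
    # Hypothetical pixel counts for a two-class toy dataset.
    print(enet_class_weights([90, 10]))  # roughly [1.53, 8.82]
```

The constant c bounds every weight by 1 / ln(1.02), about 50.5, so very rare classes do not get unbounded weights.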

Have you trained on the PASCAL VOC 2012 dataset?

I trained on the PASCAL VOC 2012 dataset, but the result was only 67.23. The training loss keeps decreasing, but the validation loss does not change. Can you tell me your result? Thank you.

The 2nd, 3rd, and 4th stages contain 1, 4, and 3 multi-resolution blocks, respectively

How should I understand this?
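One reading: each stage after the first repeats a multi-resolution block (a HighResolutionModule in seg_hrnet.py) the stated number of times, and every module holds a few residual units per branch. Assuming the W48 configuration uses 4 BasicBlocks per branch and 2, 3 and 4 branches in stages 2 to 4, the counts below reproduce the layer summary printed in the "stuck during training" issue further down (8 HighResolutionModules, 104 BasicBlocks). The stage layout is an assumption to check against the yaml files.

```python
# Assumed HRNetV2-W48 layout: stage -> (num_modules, num_branches, blocks_per_branch)
STAGES = {
    "stage2": (1, 2, 4),
    "stage3": (4, 3, 4),
    "stage4": (3, 4, 4),
}


def count_modules_and_blocks(stages=STAGES):
    """Count HighResolutionModules and BasicBlocks across stages 2-4."""
    modules = sum(m for m, _, _ in stages.values())
    blocks = sum(m * b * n for m, b, n in stages.values())
    return modules, blocks


if __name__ == "__main__":
    print(count_modules_and_blocks())  # (8, 104)
```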

How to use Mapillary dataset for Cityscapes Benchmark?

Hi. I noticed that HRNetV2 + OCR achieves high performance on the Cityscapes leaderboard using the external Mapillary dataset. Could you share your advice on how to use Mapillary?

Did you pretrain your model on Mapillary and finetune on Cityscapes? Or just mix Mapillary and Cityscapes? How do you handle the inconsistent number of categories?

It would be great if you can share your ideas! Thanks!

Inference time

Could you share the inference time of this model?
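Latency depends heavily on the GPU, input size and settings, so it is usually worth measuring locally. A generic timing sketch (hypothetical helper): it warms up first and reports the median, and it accepts an optional `sync` callable such as torch.cuda.synchronize, because CUDA kernels launch asynchronously and would otherwise be under-timed.

```python
import statistics
import time


def benchmark(fn, warmup=5, repeats=20, sync=None):
    """Median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain queued GPU work before timing starts
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous work launched by fn()
        times.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(times)


if __name__ == "__main__":
    # With a model this might be: benchmark(lambda: model(x), sync=torch.cuda.synchronize)
    print(f"{benchmark(lambda: sum(range(100_000))):.3f} ms")
```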

Wrong loss value logged in the PyTorch 1.1 version

In lib/core/function.py:

HRNet-Semantic-Segmentation/lib/core/function.py

Line 35 in 06142dc

dist.reduce(reduced_inp, dst=0)

you use the default 'sum' op to reduce the tensors across processes,

    reduced_loss = reduce_tensor(loss)

    ave_loss.update(reduced_loss.item())

so the loss value logged here is the sum over all processes, but it should be the average.
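The arithmetic is easy to check by simulating the reduction: dist.reduce with the default op leaves the sum on rank 0, so dividing by the world size (dist.get_world_size()) gives the mean that should be logged. The per-rank loss values below are made up for illustration, and the function name is hypothetical.

```python
def reduce_on_rank0(per_rank_losses, average=True):
    """Simulate dist.reduce(..., op=SUM) onto rank 0, optionally averaged."""
    total = sum(per_rank_losses)  # what dist.reduce leaves on rank 0
    return total / len(per_rank_losses) if average else total


if __name__ == "__main__":
    losses = [0.5, 0.7, 0.6, 0.8]  # hypothetical losses on 4 GPUs
    print(reduce_on_rank0(losses, average=False))  # about 2.6, the value logged now
    print(reduce_on_rank0(losses))                 # about 0.65, the intended value
```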

ImportError: No module named 'inplace_abn'

Even though I downloaded inplace_abn, I cannot import it.
Is there an additional step I need to take? Is this because inplace_abn switched to PyTorch 1.0?
Thanks

Ninja-related error during training

Hello, I am very happy to see your code. I tried to run training as you described, by executing python tools/train.py --cfg experiments/cityscapes/seg_hrnet_w48_train_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml
I installed ninja with pip install ninja.

Traceback (most recent call last):
  File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 700, in verify_ninja_availability
    subprocess.check_call('ninja --version'.split(), stdout=devnull)
  File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ninja': 'ninja'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train.py", line 27, in <module>
    import models
  File "/scratch2/hzhou/HRNet-Semantic-Segmentation/tools/../lib/models/__init__.py", line 11, in <module>
    import models.seg_hrnet
  File "/scratch2/hzhou/HRNet-Semantic-Segmentation/tools/../lib/models/seg_hrnet.py", line 22, in <module>
    from .sync_bn.inplace_abn.bn import InPlaceABNSync
  File "/scratch2/hzhou/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/__init__.py", line 1, in <module>
    from .inplace_abn import bn
  File "/scratch2/hzhou/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/inplace_abn/__init__.py", line 1, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "/scratch2/hzhou/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/inplace_abn/bn.py", line 14, in <module>
    from functions import *
  File "/scratch2/hzhou/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/functions.py", line 16, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 514, in load
    with_cuda=with_cuda)
  File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 656, in _jit_compile
    verify_ninja_availability()
  File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 702, in verify_ninja_availability
    raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions

Is this an installation issue or something else? Can you tell me more? Thank you.
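The first exception shows that subprocess could not find a `ninja` executable on PATH, even though the Python package was installed. One possibility, an assumption based on the ~/.local paths in the traceback, is that a user-level pip install put the ninja script in a directory that is not on PATH. A standard-library check (hypothetical helper name) that also looks in the environment's and the user's script directories:

```python
import os
import shutil
import sysconfig


def locate_ninja():
    """Return (path_or_None, on_path) for the ninja executable."""
    found = shutil.which("ninja")
    if found:
        return found, True
    # Script directories pip commonly installs into: the environment's and the user's.
    user_scheme = "nt_user" if os.name == "nt" else "posix_user"
    for directory in (sysconfig.get_path("scripts"), sysconfig.get_path("scripts", user_scheme)):
        found = shutil.which("ninja", path=directory)
        if found:
            return found, False
    return None, False


if __name__ == "__main__":
    path, on_path = locate_ninja()
    if path and not on_path:
        print("ninja exists at", path, "but its directory is not on PATH")
    elif path is None:
        print("ninja not found in PATH or the usual pip script directories")
```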

Did you pretrain on the COCO dataset?

I looked at DeepLabv3+, which was pretrained on COCO. Did you use this dataset for pretraining? How much does it improve the results?

May I ask how I should run training on PASCAL? I didn't change anything, but I got an error when running it. Thank you.

Run on my own data

Hi, I love the work you have done.
How can we run your LIP pretrained models on my own set of videos or images to get human segmentation output?
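At a high level, running a pretrained segmentation model on your own frames means preprocessing each frame the same way as the training data, running the network, upsampling the scores to the frame size, and taking the per-pixel argmax over classes. The last step is sketched below in plain Python, with scores[k][y][x] as a hypothetical class-score layout; on a PyTorch tensor of shape (N, K, H, W) the same step is logits.argmax(dim=1).

```python
def argmax_label_map(scores):
    """Per-pixel argmax over class score maps: scores[k][y][x] -> label[y][x]."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda k: scores[k][y][x]) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Two classes on a 1x2 image: class 1 wins the left pixel, class 0 the right.
    print(argmax_label_map([[[0.1, 0.9]], [[0.8, 0.2]]]))  # [[1, 0]]
```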

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh:317 terminate called after throwing an instance of 'at::Error'

I ran into an error and I really don't know how to solve it! Help!!!!! Someone said, "Maybe your labels are out of range (>= n)." But my labels are from 0 to n-1! I need your help! Thanks!

/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [3,0,0], thread: [574,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh line=317 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "/home/cartur/HRNet-Semantic-Segmentation/tools/train.py", line 251, in <module>
    main()
  File "/home/cartur/HRNet-Semantic-Segmentation/tools/train.py", line 220, in main
    trainloader, optimizer, model, writer_dict)
  File "/home/cartur/HRNet-Semantic-Segmentation/tools/../lib/core/function.py", line 46, in train
    loss = losses.mean()
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh:317
terminate called after throwing an instance of 'at::Error'
what(): CUDA error: invalid device pointer (CudaCachingDeleter at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/THCCachingAllocator.cpp:498)
frame #0: THStorage_free + 0x44 (0x7fd7638cf314 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #1: THTensor_free + 0x2f (0x7fd76396ea1f in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #2: at::CUDAFloatTensor::~CUDAFloatTensor() + 0x9 (0x7fd7404d2a59 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: torch::autograd::generated::CudnnConvolutionBackward::~CudnnConvolutionBackward() + 0x5d (0x7fd7656d1e7d in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #4: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #5: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #6: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #7: + 0x7674a2 (0x7fd7654d44a2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #8: + 0x19aa5e (0x55e733ac1a5e in /home/cartur/.conda/envs/CenterNet_last/bin/python)
frame #9: std::_Sp_counted_deleter<torch::autograd::PyFunction*, Decref, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x2e (0x7fd7654d64fe in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #10: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #11: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #12: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #13: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #14: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #15: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #16: torch::autograd::generated::CudnnConvolutionBackward::~CudnnConvolutionBackward() + 0x73 (0x7fd7656d1e93 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #17: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #18: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #19: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #20: + 0x7674a2 (0x7fd7654d44a2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #21: + 0x19aa5e (0x55e733ac1a5e in /home/cartur/.conda/envs/CenterNet_last/bin/python)
frame #22: std::_Sp_counted_deleter<torch::autograd::PyFunction*, Decref, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x2e (0x7fd7654d64fe in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #23: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #24: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #25: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #26: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #27: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #28: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #29: torch::autograd::generated::CudnnConvolutionBackward::~CudnnConvolutionBackward() + 0x73 (0x7fd7656d1e93 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #30: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #31: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #32: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #33: + 0x7674a2 (0x7fd7654d44a2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #34: + 0x19aa5e (0x55e733ac1a5e in /home/cartur/.conda/envs/CenterNet_last/bin/python)
frame #35: std::_Sp_counted_deleter<torch::autograd::PyFunction*, Decref, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x2e (0x7fd7654d64fe in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #36: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #37: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #38: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #39: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #40: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #41: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #42: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #43: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #44: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #45: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #46: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #47: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #48: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #49: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #50: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #51: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #52: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #53: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #54: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #55: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #56: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #57: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #58: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #59: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #60: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #61: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #62: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #63: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
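The assertion `t >= 0 && t < n_classes` fires when a target pixel lies outside [0, n_classes) and is not equal to the loss's ignore_index. A frequent culprit is a "void" value such as 255 in the label images while the configured ignore label is something else. The right ignore value depends on your config (check IGNORE_LABEL in the yaml); the default of 255 below is only an assumption, and the helper name is hypothetical.

```python
def invalid_label_values(label_values, num_classes, ignore_label=255):
    """Return sorted label values that would trip the NLL-loss assertion."""
    return sorted(
        {v for v in label_values if v != ignore_label and not 0 <= v < num_classes}
    )


if __name__ == "__main__":
    # Hypothetical flattened label values for a 19-class dataset.
    labels = [0, 3, 18, 255, 19, -1]
    print(invalid_label_values(labels, num_classes=19))                   # [-1, 19]
    print(invalid_label_values(labels, num_classes=19, ignore_label=-1))  # [19, 255]
```

Running it over every label file (after the dataset's own label mapping) narrows down which images contain offending values.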

Error when exporting the model to ONNX

I get the following error when trying to export the model to ONNX format after loading the pretrained model, by adding a torch.onnx.export(...) call. Do you know what might be the cause?

Thanks, Nikola

Message=Failed to export an ONNX attribute, since it's not constant, please try to make things (e.g., kernel size) static if possible
Source=
StackTrace:
  File "E:\git\hrnet-image-classification\tools\valid.py", line 100, in main
    torch.onnx.export(model, dump_input, r'e:\hrnetv2_w18_imagenet_pretrained.onnx', verbose=True)
  File "E:\git\hrnet-image-classification\tools\valid.py", line 136, in <module>
    main()

Not necessary to save a checkpoint every epoch

Saving a checkpoint every epoch is really wasteful.
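If the per-epoch checkpoint exists mainly so that training can be resumed, one alternative is to save every N epochs plus the final epoch. A minimal policy function; the name and the default interval of 10 are hypothetical:

```python
def should_save_checkpoint(epoch, total_epochs, interval=10):
    """Save every `interval` epochs (0-based epoch index) and always at the last epoch."""
    is_last = epoch == total_epochs - 1
    return is_last or (epoch + 1) % interval == 0


if __name__ == "__main__":
    saves = [e for e in range(484) if should_save_checkpoint(e, 484)]
    print(len(saves), saves[:3], saves[-1])  # 49 [9, 19, 29] 483
```

The trade-off is that a crash can lose up to interval - 1 epochs of progress.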

There seems to be no exchange/fusion in the transition layer, according to the code

I drew the architecture of HRNet according to the code.
Architecture of High Resolution Net (HRNet).pdf
I'm not 100% confident that I am right, but there seems to be no exchange/fusion across stages in the transition layers. The new branch is generated only from the closest branch, not fused from all previous branches. Can you take a look at that?
https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/master/lib/models/seg_hrnet.py#L332-L345
In addition, for segmentation tasks, as another developer mentioned in #2, I was wondering how you match the output resolution to the original image, since the output resolution is 1/4 in both width and height. Do you upsample the output, or upsample all the final feature maps?
Thank you! Impressive work!
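For reference, the HRNetV2 representation described in the paper upsamples the lower-resolution branches to the highest resolution (1/4) and concatenates them before the segmentation head, so the prediction itself is at 1/4 resolution and has to be upsampled again to the image size. The shape arithmetic, assuming branch widths C, 2C, 4C, 8C at strides 4, 8, 16, 32 with C = 48 and an input size divisible by 32 (hypothetical helper):

```python
def hrnetv2_head_shape(height, width, base_width=48, num_branches=4):
    """Per-branch (channels, h, w) and the concatenated shape at 1/4 resolution."""
    branches = []
    for i in range(num_branches):
        stride = 4 * 2 ** i
        branches.append((base_width * 2 ** i, height // stride, width // stride))
    # Every branch is upsampled to the first branch's resolution, then concatenated.
    concat_channels = sum(c for c, _, _ in branches)
    _, out_h, out_w = branches[0]
    return branches, (concat_channels, out_h, out_w)


if __name__ == "__main__":
    branches, head = hrnetv2_head_shape(512, 1024)
    print(branches)  # [(48, 128, 256), (96, 64, 128), (192, 32, 64), (384, 16, 32)]
    print(head)      # (720, 128, 256)
```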

Regarding the problem related to ninja...

Dear guys,

I also ran into some issues with ninja... Here is my understanding:

This project compiles its C++/CUDA extension just in time (JIT), which requires the ninja build system.
Solution 1. Install the build system. There are two ways:
- apt-get install ninja-build. The CUDA version on the system has to match the one used in the conda env.
- conda install ninja or pip install ninja: did not work for me.
Solution 2, which I am using. To avoid ninja, build the extension "ahead of time" instead:
- Create a new file setup.py under models/hrnet/sync_bn/inplace_abn
- Install the inplace_abn module with python setup.py install
- Modify models/hrnet/sync_bn/inplace_abn/functions.py to import the module as _backend
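A minimal setup.py for the ahead-of-time route might look like the sketch below, using PyTorch's CUDAExtension and BuildExtension. The source file names, the src/ layout and the extension name inplace_abn_ext are assumptions: copy the actual source list and flags from the load(...) call in functions.py. A name distinct from the inplace_abn package directory avoids an import clash.

```python
# setup.py: place next to functions.py and run `python setup.py install` from that directory.
from os import path

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# Assumed source list; mirror the `sources` passed to load() in functions.py.
SOURCES = ["inplace_abn.cpp", "inplace_abn_cpu.cpp", "inplace_abn_cuda.cu"]

setup(
    name="inplace_abn_ext",  # hypothetical name, distinct from the package directory
    ext_modules=[
        CUDAExtension(
            name="inplace_abn_ext",
            sources=[path.join("src", f) for f in SOURCES],
            extra_compile_args={"cxx": ["-O3"], "nvcc": ["--expt-extended-lambda"]},
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

After installation, the `load(...)` call in functions.py would presumably be replaced with `import inplace_abn_ext as _backend`.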

About training epochs on custom data

I am training on my own data, with 7k training images. After about 40 epochs, my val mIoU is only about 0.37, and some classes have an IoU of 0. In your paper, the training set has 2975 images and training runs for about 484 epochs. I wonder whether I need the same number of epochs or whether there is a problem with my data.
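Comparing optimizer steps rather than epochs is more informative when dataset sizes differ. Assuming a total batch size of 12 (from the bs_12 config name) and that incomplete final batches are dropped, the 484-epoch Cityscapes schedule is about 120K iterations, while 40 epochs on 7k images is about 23K. Helper names are hypothetical:

```python
import math


def iters_per_epoch(num_images, batch_size, drop_last=True):
    """Optimizer steps per epoch for a given dataset size and batch size."""
    if drop_last:
        return num_images // batch_size
    return math.ceil(num_images / batch_size)


def epochs_for_iterations(target_iters, num_images, batch_size, drop_last=True):
    """Epochs needed to reach at least `target_iters` optimizer steps."""
    return math.ceil(target_iters / iters_per_epoch(num_images, batch_size, drop_last))


if __name__ == "__main__":
    cityscapes_iters = 484 * iters_per_epoch(2975, 12)
    print(cityscapes_iters)                                   # 119548
    print(40 * iters_per_epoch(7000, 12))                     # 23320
    print(epochs_for_iterations(cityscapes_iters, 7000, 12))  # 206
```

By this measure, 40 epochs on 7k images is roughly a fifth of the reference schedule, so low mIoU at that point is not necessarily a data problem.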

Stuck during training

I downloaded the pretrained models and changed the GPU setting from (0,1,2,3) to (0,),
but the training process gets stuck here:

Total Parameters: 65,773,843

Total Multiply Adds (For Convolution and Linear Layers only): 174.0439453125 GFLOPs

Number of Layers
Conv2d : 307 layers
InPlaceABNSync : 306 layers
ReLU : 269 layers
Bottleneck : 4 layers
BasicBlock : 104 layers
HighResolutionModule : 8 layers

Any idea how this happened?

InPlaceABNSync error

Hi, InPlaceABNSync does not seem to work: the training process gets stuck at https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/master/lib/models/seg_hrnet.py#L269

Only if I change InPlaceABNSync to nn.BatchNorm2d can I train without problems.

Cannot run the code, ninja error

I followed your instructions, and when running tools/test.py the following error is thrown:

  File "/home/travail/jules/anaconda3/envs/HRNet/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 759, in _build_extension_module
    ['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory)
  File "/home/travail/jules/anaconda3/envs/HRNet/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/home/travail/jules/anaconda3/envs/HRNet/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

Do you have more details on how to run your code?
Thanks

Training on Custom Data

Hi, thanks for sharing your work. Could you please post some guidelines/steps for training on custom data?

Thanks
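One part of adapting the code to a new dataset is describing the image/label pairs. The repository's datasets appear to read list files with one "image_path label_path" pair per line; that format, the directory layout and the function name below are assumptions, so check how the dataset classes in lib/datasets parse their lists. The sketch pairs files by name stem:

```python
import os


def write_list_file(image_dir, label_dir, out_path, image_ext=".jpg", label_ext=".png"):
    """Write 'image label' path pairs for every image that has a matching label."""
    label_stems = {
        os.path.splitext(name)[0]
        for name in os.listdir(label_dir)
        if name.endswith(label_ext)
    }
    pairs = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext == image_ext and stem in label_stems:
            pairs.append((os.path.join(image_dir, name), os.path.join(label_dir, stem + label_ext)))
    with open(out_path, "w") as f:
        for image_path, label_path in pairs:
            # Space-separated, so paths must not contain spaces.
            f.write(f"{image_path} {label_path}\n")
    return len(pairs)
```

Images without a matching label are skipped silently, so comparing the returned count with the number of images is a quick sanity check.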

Why do you have to make the network so complicated and so difficult to run?

Why do you have to make the network so complicated and so difficult to run? This network is difficult to run on its own, and there are various problems.

RuntimeError: Ninja is required to load C++ extensions

Hello. First, I got this error:
RuntimeError: Ninja is required to load C++ extensions
Then, after successfully running pip install ninja,
I got this error instead:
/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py:118: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 4.9 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
Traceback (most recent call last):
  File "tools/train.py", line 27, in <module>
    import models
  File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/__init__.py", line 11, in <module>
    import models.seg_hrnet
  File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/seg_hrnet.py", line 22, in <module>
    from .sync_bn.inplace_abn.bn import InPlaceABNSync
  File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/sync_bn/__init__.py", line 1, in <module>
    from .inplace_abn import bn
  File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/sync_bn/inplace_abn/__init__.py", line 1, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/sync_bn/inplace_abn/bn.py", line 14, in <module>
    from functions import *
  File "/data/HRNet-Semantic-Segmentation-master/lib/models/sync_bn/inplace_abn/functions.py", line 16, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 514, in load
    with_cuda=with_cuda)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 690, in _jit_compile
    return _import_module_from_library(name, build_directory)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 773, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: /tmp/torch_extensions/inplace_abn/inplace_abn.so: undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationESs

Do this BN module and PyTorch have to be compiled together (against the same version)? My pytorch==0.4.1

I have some doubts about the training results.

Which yaml file was used to train the model and obtain the mIoU of 81.1?

model	Train Set	Test Set	#Params	GFLOPs	OHEM	Multi-scale	Flip	mIoU
HRNetV2-W48	Train	Val	65.8M	696.2	No	No	No	81.1

Mistaken reference

Hi,
I've noticed that the DeepLab result on the LIP dataset comes from CE2P, and CE2P cites the original value from JPPNet:
https://arxiv.org/pdf/1804.01984.pdf
According to that paper's references, it actually used DeepLabV2 rather than DeepLabV3+, so you may need to correct this.

What's the meaning of each predicted label?

Thanks for your excellent work.
I am not familiar with semantic segmentation; I just want to use the semantic labels as input for our research. We used your pretrained model hrnet_w48_pascal_context_cls59_480x480 to predict our results, but I can't figure out the meaning of each predicted label. Take the following picture as an example.

The predicted label of the bicycle is 17. However, according to the label-to-name mapping declared on the PASCAL website, 17 represents sheep, which is totally wrong.
So what's wrong with my results?

P.S.: The predicted labels are generated using the following code:
preds = np.asarray(np.argmax(preds, axis=1), dtype=np.uint8)
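A predicted index only has meaning relative to the class list the checkpoint was trained with. The PASCAL VOC table (where 17 is sheep) is a 20-class list, while this checkpoint predicts the 59 PASCAL-Context classes, so the two numberings need not agree. The sketch below repeats the argmax step and then looks names up in a `class_names` list that you would have to take from the same PASCAL-Context label definition the model was trained on; both helper names and the list itself are placeholders, and no actual mapping is assumed here.

```python
import numpy as np


def logits_to_labels(logits):
    """(N, C, H, W) class scores -> (N, H, W) predicted class indices."""
    if logits.ndim != 4:
        raise ValueError(f"expected (N, C, H, W), got shape {logits.shape}")
    # Same reduction as the snippet above: argmax over the class axis.
    # uint8 is safe here because C = 59 < 256.
    return np.argmax(logits, axis=1).astype(np.uint8)


def summarize_labels(labels, class_names):
    """Count pixels per predicted index and attach a name from `class_names`."""
    ids, counts = np.unique(labels, return_counts=True)
    summary = {}
    for idx, count in zip(ids.tolist(), counts.tolist()):
        # Indices beyond the supplied list are flagged rather than guessed.
        name = class_names[idx] if idx < len(class_names) else f"<index {idx}>"
        summary[name] = count
    return summary
```

Printing `summarize_labels(preds, class_names)` for the bicycle image is a quick way to check whether index 17 maps to a sensible name once the correct 59-class list is used.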

InPlaceABNSync error in torch=0.4.0, and inplace_abn.so error in torch=0.4.1

Hi, InPlaceABNSync does not seem to work. When I use pytorch=0.4.0, the testing process gets stuck in the BN layer (functools.partial(InPlaceABNSync, activation='none')):
https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/master/lib/models/seg_hrnet.py#L269
When I use pytorch=0.4.1, I get a different error:

  File "tools/test.py", line 25, in <module>
    import models
  File "/home/fuyi02/vos/HRNet-Semantic-Segmentation/tools/../lib/models/__init__.py", line 11, in <module>
    import models.seg_hrnet
  File "/home/fuyi02/vos/HRNet-Semantic-Segmentation/tools/../lib/models/seg_hrnet.py", line 22, in <module>
    from .sync_bn.inplace_abn.bn import InPlaceABNSync
  File "/home/fuyi02/vos/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/__init__.py", line 1, in <module>
    from .inplace_abn import bn
  File "/home/fuyi02/vos/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/inplace_abn/__init__.py", line 1, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "/home/fuyi02/vos/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/inplace_abn/bn.py", line 14, in <module>
    from functions import *
  File "/home/fuyi02/vos/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/functions.py", line 16, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "/home/fuyi02/anaconda3/envs/HRNet/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 514, in load
    with_cuda=with_cuda)
  File "/home/fuyi02/anaconda3/envs/HRNet/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 690, in _jit_compile
    return _import_module_from_library(name, build_directory)
  File "/home/fuyi02/anaconda3/envs/HRNet/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 773, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/home/fuyi02/anaconda3/envs/HRNet/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/fuyi02/anaconda3/envs/HRNet/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: /tmp/torch_extensions/inplace_abn/inplace_abn.so: undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationESs

How to fix this bug?

subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

Ninja is already installed; however, the error still occurs.
/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py:166: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

platform=sys.platform))
Traceback (most recent call last):
  File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 949, in _build_extension_module
    check=True)
  File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train.py", line 27, in <module>
    import models
  File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/__init__.py", line 11, in <module>
    import models.seg_hrnet
  File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/seg_hrnet.py", line 22, in <module>
    from .sync_bn.inplace_abn.bn import InPlaceABNSync
  File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/sync_bn/__init__.py", line 1, in <module>
    from .inplace_abn import bn
  File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/sync_bn/inplace_abn/__init__.py", line 1, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/sync_bn/inplace_abn/bn.py", line 14, in <module>
    from functions import *
  File "/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/functions.py", line 16, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 644, in load
    is_python_module)
  File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 813, in _jit_compile
    with_cuda=with_cuda)
  File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 866, in _write_ninja_file_and_build
    _build_extension_module(name, build_directory, verbose)
  File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 962, in _build_extension_module
    raise RuntimeError(message)
RuntimeError: Error building extension 'inplace_abn': b'[1/4] c++ -MMD -MF inplace_abn_cpu.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp -o inplace_abn_cpu.o\nFAILED: inplace_abn_cpu.o \nc++ -MMD -MF inplace_abn_cpu.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp -o inplace_abn_cpu.o\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp: In function \xe2\x80\x98std::vectorat::Tensor backward_cpu(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, float)\xe2\x80\x99:\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp:82:41: error: 
could not convert \xe2\x80\x98z.at::Tensor::type()\xe2\x80\x99 from \xe2\x80\x98at::DeprecatedTypeProperties\xe2\x80\x99 to \xe2\x80\x98c10::IntArrayRef {aka c10::ArrayRef}\xe2\x80\x99\n auto dweight = at::empty(z.type(), {0});\n ^\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp:83:39: error: could not convert \xe2\x80\x98z.at::Tensor::type()\xe2\x80\x99 from \xe2\x80\x98at::DeprecatedTypeProperties\xe2\x80\x99 to \xe2\x80\x98c10::IntArrayRef {aka c10::ArrayRef}\xe2\x80\x99\n auto dbias = at::empty(z.type(), {0});\n ^\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp:89:29: error: could not convert \xe2\x80\x98{dx, dweight, dbias}\xe2\x80\x99 from \xe2\x80\x98\xe2\x80\x99 to \xe2\x80\x98std::vectorat::Tensor\xe2\x80\x99\n return {dx, dweight, dbias};\n ^\n[2/4] /cluster/apps/cuda/10.0/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o\nFAILED: inplace_abn_cuda.cuda.o \n/cluster/apps/cuda/10.0/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem 
/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(99): error: no suitable user-defined conversion from "at::DeprecatedTypeProperties" to "c10::IntArrayRef" exists\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(99): error: no instance of constructor "c10::TensorOptions::TensorOptions" matches the argument list\n argument types are: (int64_t)\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(100): error: no suitable user-defined conversion from "at::DeprecatedTypeProperties" to "c10::IntArrayRef" exists\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(100): error: no instance of constructor "c10::TensorOptions::TensorOptions" matches the argument list\n argument types are: (int64_t)\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(202): error: no suitable user-defined conversion from "at::DeprecatedTypeProperties" to "c10::IntArrayRef" exists\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(202): error: no instance of constructor "c10::TensorOptions::TensorOptions" matches 
the argument list\n argument types are: (int64_t)\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(203): error: no suitable user-defined conversion from "at::DeprecatedTypeProperties" to "c10::IntArrayRef" exists\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(203): error: no instance of constructor "c10::TensorOptions::TensorOptions" matches the argument list\n argument types are: (int64_t)\n\n8 errors detected in the compilation of "/tmp/tmpxft_0002e7bc_00000000-6_inplace_abn_cuda.cpp1.ii".\n[3/4] c++ -MMD -MF inplace_abn.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn.cpp -o inplace_abn.o\nIn file included from /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn.cpp:1:0:\n/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:7:2: warning: #warning "Including torch/torch.h for C++ extensions is deprecated. Please include torch/extension.h" [-Wcpp]\n #warning \\n ^\nninja: build stopped: subcommand failed.\n'
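The compiler errors point at calls of the form `at::empty(z.type(), {0})`, an older ATen overload that the installed PyTorch no longer accepts (hence the `DeprecatedTypeProperties` to `IntArrayRef` conversion failure); the `{dx, dweight, dbias}` error appears to follow from those failed declarations. Assuming the other failing lines in `inplace_abn_cuda.cu` follow the same pattern (only the CPU ones are printed in full), one way to port them is to rewrite each call into the `at::empty(sizes, tensor.options())` form. The helper below is a hypothetical text-rewriting sketch, not part of the repo; review the resulting diff before rebuilding. The other option is to run the code with the PyTorch version it was written for.

```python
import re

# Matches calls like `at::empty(z.type(), {0})`, capturing the tensor
# expression (`z`) and the brace-enclosed size list (`0`).
_EMPTY_FROM_TYPE = re.compile(
    r"at::empty\(\s*([\w.]+)\.type\(\)\s*,\s*\{([^}]*)\}\s*\)"
)


def port_empty_calls(source):
    """Rewrite at::empty(t.type(), {sizes}) as at::empty({sizes}, t.options())."""
    return _EMPTY_FROM_TYPE.sub(r"at::empty({\2}, \1.options())", source)


def patch_file(path):
    """Patch one source file in place, keeping a .bak copy if anything changed."""
    with open(path) as f:
        original = f.read()
    patched = port_empty_calls(original)
    if patched != original:
        with open(path + ".bak", "w") as f:
            f.write(original)
        with open(path, "w") as f:
            f.write(patched)
    return patched != original
```

Usage would be `patch_file(...)` on `lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp` and `inplace_abn_cuda.cu`, after which the cached build directory should be cleared so the extension recompiles.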

Unable to reproduce `seg_hrnet_w18_small_v1`

Thanks for 27488d4; the configuration file is very helpful. That said, training on 4 GPUs as prescribed, I'm unable to reproduce the Cityscapes validation mIoU of 70.3% listed at https://github.com/HRNet/HRNet-Semantic-Segmentation#small-models (I attained 65.21%).

Is https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/master/experiments/cityscapes/seg_hrnet_w18_small_v1_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml exactly the file used to produce 70.3%, or does it need further hyperparameter tuning? (I'm on the pytorch-v1.1 branch.)

In case it's helpful (although I suspect it isn't very informative), here are the per-class IoUs for the retrained w18-v1 model:

Loss: 0.179, MeanIU:  0.6509, Best_mIoU:  0.6521
[0.97245895 0.79921705 0.8969752  0.43651182 0.47062117 0.56336364
 0.57983322 0.68906234 0.91533262 0.60986547 0.93415257 0.74804671
 0.46804914 0.91671634 0.4241423  0.58802203 0.24108752 0.41514963
 0.69802723]
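For reference, per-class IoU is usually computed from a confusion matrix as TP / (TP + FP + FN), and mIoU as the unweighted mean over classes. The sketch below assumes rows are ground truth and columns are predictions; it is a generic implementation, not the repo's evaluation code.

```python
import numpy as np


def per_class_iou(confusion):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c) from a (C, C) confusion matrix.

    Rows are assumed to be ground truth and columns predictions. Classes with
    an empty union get IoU 0 here; some tools exclude them from the mean.
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    tp = np.diag(confusion)
    gt = confusion.sum(axis=1)    # TP + FN
    pred = confusion.sum(axis=0)  # TP + FP
    union = gt + pred - tp
    return np.divide(tp, union, out=np.zeros_like(tp), where=union > 0)


def mean_iou(confusion):
    """Unweighted mean of the per-class IoUs."""
    return float(per_class_iou(confusion).mean())
```

As a consistency check, the 19 values posted above average to about 0.6509, matching the logged MeanIU, so that figure is consistent with a plain unweighted mean over the classes.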

Why use abn instead of bn?

Is it because ABN helps a lot with saving memory? Does it also improve performance?
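For context on the memory question: the InPlace-ABN technique (implemented in mapillary's inplace_abn) fuses BN with an invertible activation such as leaky ReLU, so the backward pass can recover the normalized input from the stored output instead of keeping the BN output in memory. The NumPy sketch below only demonstrates that invertibility; it is a simplified illustration (per-channel BN over axis 0, and it assumes every `gamma` is nonzero), not the library's kernel.

```python
import numpy as np


def leaky_relu(y, slope=0.01):
    return np.where(y >= 0, y, slope * y)


def leaky_relu_inverse(z, slope=0.01):
    # Invertible because slope > 0: negative outputs map back by dividing.
    return np.where(z >= 0, z, z / slope)


def forward_abn(x, gamma, beta, slope=0.01, eps=1e-5):
    """BN over axis 0 followed by leaky ReLU; only the output is kept."""
    x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    return leaky_relu(gamma * x_hat + beta, slope)


def recover_x_hat(z, gamma, beta, slope=0.01):
    """Reconstruct the normalized input from the stored output alone."""
    y = leaky_relu_inverse(z, slope)
    return (y - beta) / gamma  # requires every gamma != 0
```

With `activation='none'`, as in the `functools.partial(InPlaceABNSync, activation='none')` used in seg_hrnet.py, the activation step is the identity and only the affine inversion remains.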

A question about multi_scale_output

In lib/models/seg_hrnet.py, line 389 says that multi_scale_output is only used by the last module. However, lines 390 and 391 mean that when multi_scale_output=False and i is the index of the last module, multi_scale_output is set to False. So I am confused about the condition for setting reset_multi_scale_output to True.
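Reading the condition as described: reset_multi_scale_output is True for every module in the stage except one case, the last module when the caller passes multi_scale_output=False. Earlier modules always keep all resolution branches because the following module consumes them; the caller's flag only decides what the final module emits. The hypothetical helper below re-creates just that branching in isolation (not the repo's code) so the per-module flags can be listed.

```python
def reset_flags(num_modules, multi_scale_output):
    """Return reset_multi_scale_output for each module of a stage (sketch)."""
    flags = []
    for i in range(num_modules):
        # Only the last module honours multi_scale_output=False;
        # every earlier module keeps all resolution branches.
        if not multi_scale_output and i == num_modules - 1:
            flags.append(False)
        else:
            flags.append(True)
    return flags


if __name__ == "__main__":
    print(reset_flags(4, multi_scale_output=False))  # [True, True, True, False]
    print(reset_flags(4, multi_scale_output=True))   # [True, True, True, True]
```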

How was this model pretrained on ImageNet?

LIP Dataset Performance

Hi Ke,
Really good work and a great idea with HRNet. I was trying to reproduce the performance on the LIP dataset using your experiment yaml file, but I only achieve 50.59% best mIoU.

saving checkpoint to output/lip/seg_hrnet_w48_473x473_sgd_lr7e-3_wd5e-4_bs_40_epoch150checkpoint.pth.tar
Loss: 0.543, MeanIU: 0.5059, Best_mIoU: 0.5059
[0.86811489 0.63133359 0.6837422 0.38433764 0.29877409 0.66269771
0.1957649 0.54022205 0.45206418 0.74160168 0.26058858 0.21910118
0.20592365 0.7124526 0.57165922 0.61438857 0.55989074 0.55301222
0.47372399 0.4895043 ]

The only thing I changed is that I reinstalled the sync-bn (https://github.com/mapillary/inplace_abn) using PyTorch 1.0. Are there any possible reasons for the gap?

RuntimeError: weight tensor should be defined either for all or no classes

I ran into this error and I really don't know why. Help!
return torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: weight tensor should be defined either for all or no classes at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:27
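This check fires when a per-class `weight` tensor is passed to the NLL/cross-entropy loss and its length does not equal the number of classes (the channel dimension of the logits). A common way to hit it is reusing a fixed weight list written for one dataset with a model configured for a different number of classes; that is an assumption about this report, not something the traceback shows. The sketch below derives weights from the label maps themselves, so the length always equals `num_classes`. It uses simple inverse-frequency weighting, a swapped-in scheme rather than necessarily how any weights in the repo were computed, and `ignore_label=255` is also an assumption.

```python
import numpy as np


def inverse_frequency_weights(label_maps, num_classes, ignore_label=255):
    """Per-class weights of length num_classes from pixel frequencies.

    Classes absent from the data get weight 0; present classes are
    normalized so their weights average to 1.
    """
    counts = np.zeros(num_classes, dtype=np.int64)
    for label in label_maps:
        label = np.asarray(label)
        valid = label[label != ignore_label]
        if valid.size and (valid.min() < 0 or valid.max() >= num_classes):
            raise ValueError("labels must lie in [0, num_classes) or equal ignore_label")
        counts += np.bincount(valid.astype(np.int64), minlength=num_classes)
    freq = counts / max(int(counts.sum()), 1)
    weights = np.zeros(num_classes, dtype=np.float64)
    present = counts > 0
    weights[present] = 1.0 / freq[present]
    if present.any():
        weights[present] /= weights[present].mean()
    return weights
```

The result can be passed to the loss with `torch.from_numpy(weights).float()` as the `weight` argument of `nn.CrossEntropyLoss`; the `ValueError` also catches label values that exceed the configured number of classes, another frequent mismatch when adapting a config.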

Could you give some reference numbers for the training time? For me, training on Cityscapes takes about 50 h on 8 GPUs with a batch size of 16, using the 'Pytorch-v1.1' version of the code.
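To compare training times across setups, it can help to convert wall-clock time into seconds per iteration. The sketch assumes the 2,975 images of the Cityscapes fine train split, the 484 epochs in the config name, the batch size of 16 from this report, and a loader that drops the last incomplete batch (the drop_last behavior is an assumption).

```python
import math


def seconds_per_iteration(hours, num_images, batch_size, epochs, drop_last=True):
    """Return (seconds per iteration, total iterations) for a finished run."""
    if drop_last:
        iters_per_epoch = num_images // batch_size
    else:
        iters_per_epoch = math.ceil(num_images / batch_size)
    total_iters = iters_per_epoch * epochs
    return hours * 3600 / total_iters, total_iters


if __name__ == "__main__":
    secs, iters = seconds_per_iteration(50, 2975, 16, 484)
    print(f"{iters} iterations, {secs:.2f} s/iter")
```

Under those assumptions, 50 h works out to 89,540 iterations at roughly 2.0 s each.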

Why is `nn.ReLU(inplace=False)` set for most activations?

Hi, thanks for the work. I notice that in the backbone code, the nn.ReLU layers use inplace=False, which differs from the implementation in deep-high-resolution-pose-estimation and other HRNet code, where inplace is set to True.
Is there a specific reason for this? Thanks.
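The thread does not say why inplace=False was chosen. One general consideration with in-place activations is aliasing: an in-place op overwrites its input buffer, so any other reference to that tensor (a skip connection, or a value an autograd function saved for backward) sees the modified data; in PyTorch the latter case surfaces as a "modified by an inplace operation" error during backward. The NumPy sketch below illustrates only that aliasing effect with hypothetical helpers; it is not a claim about HRNet's actual reason.

```python
import numpy as np


def relu(x):
    """Out-of-place ReLU: returns a new array and leaves x untouched."""
    return np.maximum(x, 0.0)


def relu_(x):
    """In-place ReLU: overwrites x and returns the same buffer."""
    np.maximum(x, 0.0, out=x)
    return x


def residual_block(x, inplace):
    identity = x  # a second reference to the same buffer, like a skip connection
    out = relu_(x) if inplace else relu(x)
    return out + identity


if __name__ == "__main__":
    x = np.array([-1.0, 2.0])
    print(residual_block(x.copy(), inplace=False))  # [-1.  4.]
    print(residual_block(x.copy(), inplace=True))   # [0. 4.]
```

With the in-place variant the skip branch silently receives the activated values, which is the kind of bug that setting inplace=False everywhere rules out at a small memory cost.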
