hrnet / hrnet-semantic-segmentation

The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation for HRNet. https://arxiv.org/abs/1908.07919

License: Other

Python 89.26% C++ 5.18% Cuda 5.22% Shell 0.33%
segmentation semantic-segmentation cityscapes pascal-context lip high-resolution high-resolution-net hrnets transformer segmentation-transformer

hrnet-semantic-segmentation's People

Contributors

hsfzxjy, pkurainbow, sunke123, welleast

hrnet-semantic-segmentation's Issues

Training gets stuck with no log output on the command line

The training job gets stuck once the epoch count exceeds 150. The config follows the default settings in the YAML file, and the model is trained on 4 V100 GPUs (branch pytorch1.1). I ran it about 4 times and the problem appears every time (python -m torch.distributed.launch --nproc_per_node=4 tools/train.py --cfg experiments/cityscapes/seg_hrnet_w48_train_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml).

Training jobs 1, 2, 3, and 4 stopped at epochs 154, 187, 167, and 167 respectively (no error message, they just got stuck).

Inconsistent size between output and target

Hi, congrats on your great work.

I was experimenting with your code on a dataset other than Cityscapes, with the input image shape set to 512x512. With your default network settings, however, the output has a shape of 128x128. Do I have to add upsampling code manually on top of your implementation?
I may be missing something, but I don't see where to adjust the output shape.

Regards.
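
The size mismatch described above (a 128x128 output for a 512x512 input, i.e. 1/4 resolution) means the prediction has to be resized to the label size before the two can be compared. The following is a minimal, dependency-free sketch of that resizing step using nearest-neighbour upsampling of a label map. The function name `upsample_nearest` is made up for this illustration. In PyTorch one would more commonly resize the logits (for example with `torch.nn.functional.interpolate` in bilinear mode) before the loss or the argmax; where this repository does that is not stated in this thread.

```python
def upsample_nearest(label_map, out_h, out_w):
    """Resize a 2-D list of class indices to (out_h, out_w) by nearest neighbour."""
    in_h, in_w = len(label_map), len(label_map[0])
    return [
        [label_map[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# A 2x2 prediction stands in for the 128x128 output; 8x8 stands in for 512x512.
small = [[0, 1],
         [2, 3]]
big = upsample_nearest(small, 8, 8)
print(len(big), len(big[0]))
```

Each source pixel is repeated over a 4x4 block here, which mirrors the 4x scale factor between 128 and 512.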

subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

Thanks for your appealing work. I ran into a problem when trying to train with your code. Here is the error output:
`
Frame skipped from debugging during step-in.
Note: may have been skipped because of "justMyCode" option (default == true).
F:\anaconda3\lib\site-packages\torch\utils\cpp_extension.py:184: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
Traceback (most recent call last):
File "c:\Users\msi-pc.vscode\extensions\ms-python.python-2019.8.29288\pythonFiles\ptvsd_launcher.py", line 43, in
main(ptvsdArgs)
File "c:\Users\msi-pc.vscode\extensions\ms-python.python-2019.8.29288\pythonFiles\lib\python\ptvsd_main_.py", line 432, in main
run()
File "c:\Users\msi-pc.vscode\extensions\ms-python.python-2019.8.29288\pythonFiles\lib\python\ptvsd_main_.py", line 316, in run_file
runpy.run_path(target, run_name='main')
File "F:\anaconda3\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "F:\anaconda3\lib\runpy.py", line 96, in run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "F:\anaconda3\lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\tools\train.py", line 27, in
import models
File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\tools..\lib\models\__init__.py", line 11, in
import models.seg_hrnet
File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\tools..\lib\models\seg_hrnet.py", line 22, in
from .sync_bn.inplace_abn.bn import InPlaceABNSync
File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\tools..\lib\models\sync_bn\__init__.py", line 1, in
from .inplace_abn import bn
File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\tools..\lib\models\sync_bn\inplace_abn\__init__.py", line 1, in
from .bn import ABN, InPlaceABN, InPlaceABNSync
File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\tools..\lib\models\sync_bn\inplace_abn\bn.py", line 14, in
from functions import *
File "f:\缩小版备份\研究生\19年暑假\HRNet-Semantic-Segmentation-master\lib\models\sync_bn\inplace_abn\functions.py", line 16, in
extra_cuda_cflags=["--expt-extended-lambda"])
File "F:\anaconda3\lib\site-packages\torch\utils\cpp_extension.py", line 644, in load
is_python_module)
File "F:\anaconda3\lib\site-packages\torch\utils\cpp_extension.py", line 813, in _jit_compile
with_cuda=with_cuda)
File "F:\anaconda3\lib\site-packages\torch\utils\cpp_extension.py", line 862, in _write_ninja_file_and_build
with_cuda=with_cuda)
File "F:\anaconda3\lib\site-packages\torch\utils\cpp_extension.py", line 1072, in _write_ninja_file
'cl']).decode().split('\r\n')
File "F:\anaconda3\lib\subprocess.py", line 336, in check_output
**kwargs).stdout
File "F:\anaconda3\lib\subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

`
How can I fix this problem? (My PyTorch version is 1.1.0 and CUDA is 9.0.) Looking forward to your reply.
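
The traceback above ends in `where cl` failing, which indicates that the MSVC compiler driver `cl` is not on `PATH` in the process that launched training. A quick way to confirm this from Python is sketched below. The helper name `find_compiler` is made up for this illustration, and the suggestion to start Python from a Visual Studio developer command prompt is a common remedy rather than something confirmed in this thread.

```python
import shutil


def find_compiler(name="cl"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


path = find_compiler("cl")
if path is None:
    # Assumption: launching from a Visual Studio developer prompt puts cl on PATH.
    print("cl not found on PATH; the JIT build of the extension cannot start")
else:
    print("cl found at", path)
```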

Questions about LIP dataset labels

Hi, I couldn't find the 'train_segmentations_reversed' labels in the LIP dataset.
Is this something you pre-processed yourselves?
How can I get it?
Thanks

Transforming the model into ScriptModules

When I convert the HRNet model into a ScriptModule using the commands "traced_script_module=torch.jit.trace(kp_model,example)
traced_script_module.save("hrnet_model.pt")", the error "assert(isinstance(orig, torch.nn.Module)) AssertionError" occurs. I find it is caused by the
84. Any suggestions?

Have you trained on the PASCAL VOC 2012 dataset?

I trained on the PASCAL VOC 2012 dataset, but the result is 67.23. The training loss is decreasing, but the validation loss does not change. Can you tell me your result? Thank you.

How to use Mapillary dataset for Cityscapes Benchmark?

Hi. I have noticed that HRNetV2 + OCR achieves high performance on the Cityscapes leaderboard with the external Mapillary dataset. Can you share your advice on how to use Mapillary?

Did you pretrain your model on Mapillary and finetune on Cityscapes? Or just mix Mapillary and Cityscapes? How do you handle the inconsistent number of categories?

It would be great if you can share your ideas! Thanks!

Inference time

Could you help share the inference time of this model?

Ninja related error during training

Hello, I am very happy to see your code. I tried to run the training as you described, by executing python tools/train.py --cfg experiments/cityscapes/seg_hrnet_w48_train_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml
I installed ninja with pip install ninja.

Traceback (most recent call last):
  File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 700, in verify_ninja_availability
    subprocess.check_call('ninja --version'.split(), stdout=devnull)
  File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ninja': 'ninja'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train.py", line 27, in <module>
    import models
  File "/scratch2/hzhou/HRNet-Semantic-Segmentation/tools/../lib/models/__init__.py", line 11, in <module>
    import models.seg_hrnet
  File "/scratch2/hzhou/HRNet-Semantic-Segmentation/tools/../lib/models/seg_hrnet.py", line 22, in <module>
    from .sync_bn.inplace_abn.bn import InPlaceABNSync
  File "/scratch2/hzhou/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/__init__.py", line 1, in <module>
    from .inplace_abn import bn
  File "/scratch2/hzhou/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/inplace_abn/__init__.py", line 1, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "/scratch2/hzhou/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/inplace_abn/bn.py", line 14, in <module>
    from functions import *
  File "/scratch2/hzhou/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/functions.py", line 16, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 514, in load
    with_cuda=with_cuda)
  File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 656, in _jit_compile
    verify_ninja_availability()
  File "/atlas/home/hzhou/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 702, in verify_ninja_availability
    raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions

Is this an installation issue or something else? Can you tell me more? Thank you.
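
The first traceback above shows PyTorch running `ninja --version` and failing with "No such file or directory", which means the `ninja` executable was not found on `PATH` even though the pip package was installed. The sketch below repeats that check from Python. The helper name `ninja_version` is made up for this illustration. One possibility to rule out, assuming a `pip install --user` was used (the torch path under `~/.local` hints at this), is that the executable landed in `~/.local/bin` and that directory is not on `PATH`.

```python
import shutil
import subprocess


def ninja_version():
    """Return the version string reported by `ninja --version`, or None if unavailable."""
    exe = shutil.which("ninja")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


version = ninja_version()
print("ninja version:", version if version else "not found on PATH")
```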

Running on my own data

Hi, I love the work you have done.
How can I run your LIP pre-trained models on my own set of videos or images to get human segmentation output?

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh:317 terminate called after throwing an instance of 'at::Error'

I hit an error and I really don't know how to solve it. Help! Someone said, "Maybe your labels are out of n". But my labels run from 0 to n-1. I need your help. Thanks!

/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T , T , T , long , T , int, int, int, int, int, long) [with T = float, AccumT = float]: block: [3,0,0], thread: [574,0,0] Assertion t >= 0 && t < n_classes failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh line=317 error=59 : device-side assert triggered
Traceback (most recent call last):
File "/home/cartur/HRNet-Semantic-Segmentation/tools/train.py", line 251, in
main()
File "/home/cartur/HRNet-Semantic-Segmentation/tools/train.py", line 220, in main
trainloader, optimizer, model, writer_dict)
File "/home/cartur/HRNet-Semantic-Segmentation/tools/../lib/core/function.py", line 46, in train
loss = ### losses.mean()#

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh:317
terminate called after throwing an instance of 'at::Error'
what(): CUDA error: invalid device pointer (CudaCachingDeleter at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/THCCachingAllocator.cpp:498)
frame #0: THStorage_free + 0x44 (0x7fd7638cf314 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #1: THTensor_free + 0x2f (0x7fd76396ea1f in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #2: at::CUDAFloatTensor::~CUDAFloatTensor() + 0x9 (0x7fd7404d2a59 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: torch::autograd::generated::CudnnConvolutionBackward::~CudnnConvolutionBackward() + 0x5d (0x7fd7656d1e7d in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #4: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #5: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #6: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #7: + 0x7674a2 (0x7fd7654d44a2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #8: + 0x19aa5e (0x55e733ac1a5e in /home/cartur/.conda/envs/CenterNet_last/bin/python)
frame #9: std::_Sp_counted_deleter<torch::autograd::PyFunction*, Decref, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x2e (0x7fd7654d64fe in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #10: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #11: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #12: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #13: torch::autograd::deleteFunction(torch::autograd::Function) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #14: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #15: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #16: torch::autograd::generated::CudnnConvolutionBackward::~CudnnConvolutionBackward() + 0x73 (0x7fd7656d1e93 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #17: torch::autograd::deleteFunction(torch::autograd::Function) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #18: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #19: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #20: + 0x7674a2 (0x7fd7654d44a2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #21: + 0x19aa5e (0x55e733ac1a5e in /home/cartur/.conda/envs/CenterNet_last/bin/python)
frame #22: std::_Sp_counted_deleter<torch::autograd::PyFunction, Decref, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x2e (0x7fd7654d64fe in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #23: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #24: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #25: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #26: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #27: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #28: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #29: torch::autograd::generated::CudnnConvolutionBackward::~CudnnConvolutionBackward() + 0x73 (0x7fd7656d1e93 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #30: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #31: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #32: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #33: + 0x7674a2 (0x7fd7654d44a2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #34: + 0x19aa5e (0x55e733ac1a5e in /home/cartur/.conda/envs/CenterNet_last/bin/python)
frame #35: std::_Sp_counted_deleter<torch::autograd::PyFunction*, Decref, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x2e (0x7fd7654d64fe in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #36: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #37: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #38: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #39: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #40: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #41: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #42: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #43: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #44: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #45: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #46: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #47: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #48: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #49: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #50: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #51: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #52: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #53: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #54: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #55: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #56: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #57: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #58: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #59: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #60: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #61: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #62: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #63: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
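
The assertion `t >= 0 && t < n_classes` in the log above fires when at least one target value falls outside the valid class range. A minimal sketch for scanning a label map on the CPU before training is shown below. The helper name `find_invalid_labels` and the default `ignore_label=255` are assumptions made for this illustration; use whatever ignore value the loss is actually configured with.

```python
def find_invalid_labels(label_map, n_classes, ignore_label=255):
    """Return sorted label values outside [0, n_classes) that are not ignore_label."""
    bad = set()
    for row in label_map:
        for t in row:
            if t != ignore_label and not (0 <= t < n_classes):
                bad.add(t)
    return sorted(bad)


labels = [[0, 1, 18],
          [255, 19, -1]]
print(find_invalid_labels(labels, n_classes=19))  # [-1, 19]
```

Passing `ignore_label=None` reports the ignore value as well, which helps when the loss was not told to skip it.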

Error when exporting the model to ONNX

I get the following error when trying to export the model to ONNX format after the pretrained model is loaded, by adding a torch.onnx.export(...) call. Do you know what might be the cause of this?

Thanks, Nikola

Message=Failed to export an ONNX attribute, since it's not constant, please try to make things (e.g., kernel size) static if possible
Source=
StackTrace:
File "E:\git\hrnet-image-classification\tools\valid.py", line 100, in main
torch.onnx.export(model, dump_input, r'e:\hrnetv2_w18_imagenet_pretrained.onnx', verbose=True)
File "E:\git\hrnet-image-classification\tools\valid.py", line 136, in
main()

There seems to be no exchange/fuse in the transition layer according to the code

I drew the architecture of HRNet according to the code.
Architecture of High Resolution Net (HRNet).pdf
Not 100% confident that I am right but there seems to be no exchange/fuse across stages in the transition layers. The new branch is only generated from the closest branch, not fused with all previous branches. Can you take a look at that?
https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/master/lib/models/seg_hrnet.py#L332-L345
In addition, for segmentation tasks, as mentioned in #2 by another developer, I was wondering how you match the output resolution to the original image, since the output resolution is 1/4 in both width and height. Do you upsample the output or upsample all the final feature maps?
Thank you! Impressive Work!
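
The wiring described in this issue, where a new branch is generated only from the closest (lowest-resolution) existing branch, can be written down as a small table. The sketch below encodes the issue author's reading and is not a statement of what seg_hrnet.py does. The function name `transition_plan` and the tuple layout are made up for this illustration.

```python
def transition_plan(num_pre, num_cur):
    """For each branch after a transition, return (source_branch, num_stride2_convs)."""
    plan = []
    for i in range(num_cur):
        if i < num_pre:
            # Existing branch: carried over at the same resolution.
            plan.append((i, 0))
        else:
            # New branch: derived only from the last existing branch,
            # downsampled once per extra resolution level.
            plan.append((num_pre - 1, i + 1 - num_pre))
    return plan


print(transition_plan(2, 3))  # [(0, 0), (1, 0), (1, 1)]
```

Under a fully fused transition, the new branch would instead list every previous branch as a source.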

Regarding the ninja-related problem

Dear all,

I also ran into issues with ninja. Here is my understanding:

  • This project compiles its C++/CUDA extension just in time (JIT), which requires the ninja build system.

  • Solution 1. Install ninja on the system. There are two ways:

    • apt-get install ninja-build. The CUDA version on the system has to match the one used in the conda env.
    • conda install ninja or pip install ninja: this does not work for me.
  • Solution 2, which I am using. To avoid ninja, build the extension ahead of time:

    • Create a new file setup.py under models/hrnet/sync_bn/inplace_abn
    • Install the inplace_abn module with python setup.py install
    • Modify models/hrnet/sync_bn/inplace_abn/functions.py to import the module as _backend
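
Solution 2 above can be sketched as a setup.py along the following lines. This is an illustration under stated assumptions, not the poster's actual file: the module name `inplace_abn_backend` and the source directory `src` are hypothetical, and only the `--expt-extended-lambda` flag is taken from the tracebacks on this page. `CUDAExtension` and `BuildExtension` come from `torch.utils.cpp_extension`.

```python
from pathlib import Path


def collect_sources(src_dir):
    """Return the C++/CUDA source files found directly under src_dir, sorted."""
    return sorted(
        str(p) for p in Path(src_dir).iterdir()
        if p.suffix in {".cpp", ".cu"}
    )


def main():
    # Imported lazily so collect_sources() can be used without PyTorch installed.
    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension

    setup(
        name="inplace_abn_backend",  # hypothetical package name
        ext_modules=[
            CUDAExtension(
                name="inplace_abn_backend",  # hypothetical module name
                sources=collect_sources("src"),  # assumed source directory
                extra_compile_args={"cxx": [], "nvcc": ["--expt-extended-lambda"]},
            )
        ],
        cmdclass={"build_ext": BuildExtension},
    )
```

In a real setup.py the file would end with a call to `main()`, and functions.py would then import the built module (here `inplace_abn_backend`) as `_backend` instead of compiling it at import time.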

About training epochs on custom data

I am training on my own data; the training set has 7k images. After about 40 epochs, my val mIoU is only about 0.37 and some class IoUs are 0. In your paper, the training set has 2975 images and training runs for about 484 epochs. I wonder whether I need the same number of epochs or whether there is a problem with my data.
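
One rough way to compare the two schedules mentioned above is to count how many training samples each run has processed. The sketch below does only that arithmetic; it treats "7k" as exactly 7000 images (an assumption) and says nothing about whether the data itself is at fault. The function name `samples_seen` is made up for this illustration.

```python
def samples_seen(num_images, epochs):
    """Total number of training samples processed over the whole schedule."""
    return num_images * epochs


paper = samples_seen(2975, 484)   # figures quoted in the issue
custom = samples_seen(7000, 40)   # "7k" taken as exactly 7000 (assumption)
print(paper, custom, round(custom / paper, 2))  # 1439900 280000 0.19
```

By this measure the custom run at 40 epochs has seen roughly a fifth of the samples of the quoted 484-epoch schedule.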

stuck during training

I downloaded the pretrained_models and changed the GPU setting from (0,1,2,3) to (0,),
but the training process gets stuck here:

Total Parameters: 65,773,843

Total Multiply Adds (For Convolution and Linear Layers only): 174.0439453125 GFLOPs

Number of Layers
Conv2d : 307 layers
InPlaceABNSync : 306 layers
ReLU : 269 layers
Bottleneck : 4 layers
BasicBlock : 104 layers
HighResolutionModule : 8 layers

Any idea how this happened?

Cannot run the code, ninja error

I followed your instructions, and when running tools/test.py the following error is thrown:

File "/home/travail/jules/anaconda3/envs/HRNet/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 759, in _build_extension_module
    ['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory)
  File "/home/travail/jules/anaconda3/envs/HRNet/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/home/travail/jules/anaconda3/envs/HRNet/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

Do you have more details on how to run your code?
Thanks

Training on Custom Data

Hi, thanks for sharing your work. Could you please post some guidelines/steps for training on custom data?

Thanks

RuntimeError: Ninja is required to load C++ extensions

Hello, first I ran into this problem:
RuntimeError: Ninja is required to load C++ extensions
Then, after pip install ninja succeeded,
I ran into this problem:
/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py:118: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 4.9 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
Traceback (most recent call last):
File "tools/train.py", line 27, in
import models
File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/init.py", line 11, in
import models.seg_hrnet
File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/seg_hrnet.py", line 22, in
from .sync_bn.inplace_abn.bn import InPlaceABNSync
File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/sync_bn/init.py", line 1, in
from .inplace_abn import bn
File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/sync_bn/inplace_abn/init.py", line 1, in
from .bn import ABN, InPlaceABN, InPlaceABNSync
File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/sync_bn/inplace_abn/bn.py", line 14, in
from functions import *
File "/data/HRNet-Semantic-Segmentation-master/lib/models/sync_bn/inplace_abn/functions.py", line 16, in
extra_cuda_cflags=["--expt-extended-lambda"])
File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 514, in load
with_cuda=with_cuda)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 690, in _jit_compile
return _import_module_from_library(name, build_directory)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 773, in _import_module_from_library
return imp.load_module(module_name, file, path, description)
File "/usr/lib/python3.5/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: /tmp/torch_extensions/inplace_abn/inplace_abn.so: undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationESs

Does this BN need to be compiled in sync with PyTorch? My pytorch==0.4.1
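
An "undefined symbol" error when importing `/tmp/torch_extensions/inplace_abn/inplace_abn.so` can mean the cached build does not match the installed PyTorch or compiler, as the ABI warning above hints. One low-cost thing to rule out is a stale cached build left over from an earlier environment. The sketch below removes a cached build directory so the next import recompiles it. The helper name `clear_extension_cache` is made up for this illustration, and whether this resolves the error here is not confirmed.

```python
import shutil
from pathlib import Path


def clear_extension_cache(build_dir):
    """Delete a cached JIT build directory; return True if something was removed."""
    path = Path(build_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True
```

For the log above, the call would be `clear_extension_cache("/tmp/torch_extensions/inplace_abn")`.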

Mistaken reference

Hi,
I've noticed that the DeepLab result on the LIP dataset comes from CE2P, and CE2P cites the original value from JPPNet.
https://arxiv.org/pdf/1804.01984.pdf
According to the references of this paper, it actually used DeepLabV2 instead of DeepLabV3+, so you may need to correct this mistake.

What's the meaning of each predicted label?

Thanks for your excellent work.
I am not familiar with semantic segmentation. I just want to use the semantic labels as input for our research. We use your pretrained model hrnet_w48_pascal_context_cls59_480x480 to predict our results, but I can't figure out the meaning of each predicted label. Take the following picture as an example:
[image: 00000]
The predicted label of the bicycle is 17. However, according to the label-to-name mapping declared on the PASCAL website, 17 represents sheep, which is clearly wrong.
So what's wrong with my results?

P.S.: The predicted labels are generated with the following call:
preds = np.asarray(np.argmax(preds, axis=1), dtype=np.uint8)
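One possible explanation, not confirmed in this thread, is that the argmax index refers to the output channels of the 59-class model and therefore needs the class list the checkpoint was trained with, not the 20-class PASCAL VOC mapping. The sketch below only shows how to read a label map against a given class list. `summarize_labels` and the placeholder `class_names` are hypothetical; the real names must come from the dataset's own label file.

```python
from collections import Counter


def summarize_labels(label_map, class_names):
    """Count pixels per predicted index and attach the matching class name."""
    counts = Counter(value for row in label_map for value in row)
    summary = {}
    for idx, n_pixels in sorted(counts.items()):
        # Indices outside the class list are reported instead of silently mapped.
        name = class_names[idx] if 0 <= idx < len(class_names) else "<unknown>"
        summary[idx] = (name, n_pixels)
    return summary


# Placeholder names (assumption): replace with the 59-class list used for training.
class_names = ["class_%d" % i for i in range(59)]

# Tiny stand-in for one argmax output (rows of per-pixel label indices).
label_map = [[17, 17, 0], [0, 17, 3]]
for idx, (name, n_pixels) in summarize_labels(label_map, class_names).items():
    print(idx, name, n_pixels)
```

Checking which indices actually occur, and how often, makes it easier to spot an off-by-one shift or a mismatch between label lists.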

InPlaceABNSync error in torch=0.4.0, and inplace_abn.so error in torch=0.4.1

Hi, InPlaceABNSync does not seem to work. With pytorch=0.4.0, testing gets stuck in the BN layer (functools.partial(InPlaceABNSync, activation='none')):
https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/master/lib/models/seg_hrnet.py#L269
With pytorch=0.4.1, the error is different:

File "tools/test.py", line 25, in <module>
import models
File "/home/fuyi02/vos/HRNet-Semantic-Segmentation/tools/../lib/models/__init__.py", line 11, in <module>
import models.seg_hrnet
File "/home/fuyi02/vos/HRNet-Semantic-Segmentation/tools/../lib/models/seg_hrnet.py", line 22, in <module>
from .sync_bn.inplace_abn.bn import InPlaceABNSync
File "/home/fuyi02/vos/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/__init__.py", line 1, in <module>
from .inplace_abn import bn
File "/home/fuyi02/vos/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/inplace_abn/__init__.py", line 1, in <module>
from .bn import ABN, InPlaceABN, InPlaceABNSync
File "/home/fuyi02/vos/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/inplace_abn/bn.py", line 14, in <module>
from functions import *
File "/home/fuyi02/vos/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/functions.py", line 16, in <module>
extra_cuda_cflags=["--expt-extended-lambda"])
File "/home/fuyi02/anaconda3/envs/HRNet/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 514, in load
with_cuda=with_cuda)
File "/home/fuyi02/anaconda3/envs/HRNet/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 690, in _jit_compile
return _import_module_from_library(name, build_directory)
File "/home/fuyi02/anaconda3/envs/HRNet/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 773, in _import_module_from_library
return imp.load_module(module_name, file, path, description)
File "/home/fuyi02/anaconda3/envs/HRNet/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/home/fuyi02/anaconda3/envs/HRNet/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: /tmp/torch_extensions/inplace_abn/inplace_abn.so: undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationESs

How to fix this bug?
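The traceback loads a cached shared object from /tmp/torch_extensions/inplace_abn. One possible cause, which is an assumption and not confirmed in this thread, is that this cache was built against a different PyTorch version than the one now importing it. A minimal sketch for deleting the cached build so that the next import recompiles the extension (the helper name `clear_extension_cache` is hypothetical):

```python
import os
import shutil


def clear_extension_cache(name, root="/tmp/torch_extensions"):
    """Delete the cached JIT build directory for one extension.

    Returns True if a directory was removed, False if nothing was cached.
    """
    build_dir = os.path.join(root, name)
    if os.path.isdir(build_dir):
        shutil.rmtree(build_dir)
        return True
    return False


if __name__ == "__main__":
    # The extension name and root are taken from the path in the ImportError above.
    removed = clear_extension_cache("inplace_abn")
    print("removed cached build" if removed else "no cached build found")
```

If the error persists after a clean rebuild, the extension source itself may not match the installed PyTorch API.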

subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

ninja is already installed, but the error still occurs.
/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py:166: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

platform=sys.platform))
Traceback (most recent call last):
File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 949, in _build_extension_module
check=True)
File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "tools/train.py", line 27, in <module>
import models
File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/__init__.py", line 11, in <module>
import models.seg_hrnet
File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/seg_hrnet.py", line 22, in <module>
from .sync_bn.inplace_abn.bn import InPlaceABNSync
File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/sync_bn/__init__.py", line 1, in <module>
from .inplace_abn import bn
File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/sync_bn/inplace_abn/__init__.py", line 1, in <module>
from .bn import ABN, InPlaceABN, InPlaceABNSync
File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/sync_bn/inplace_abn/bn.py", line 14, in <module>
from functions import *
File "/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/functions.py", line 16, in <module>
extra_cuda_cflags=["--expt-extended-lambda"])
File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 644, in load
is_python_module)
File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 813, in _jit_compile
with_cuda=with_cuda)
File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 866, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 962, in _build_extension_module
raise RuntimeError(message)
RuntimeError: Error building extension 'inplace_abn': b'[1/4] c++ -MMD -MF inplace_abn_cpu.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp -o inplace_abn_cpu.o\nFAILED: inplace_abn_cpu.o \nc++ -MMD -MF inplace_abn_cpu.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp -o inplace_abn_cpu.o\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp: In function \xe2\x80\x98std::vectorat::Tensor backward_cpu(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, float)\xe2\x80\x99:\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp:82:41: error: 
could not convert \xe2\x80\x98z.at::Tensor::type()\xe2\x80\x99 from \xe2\x80\x98at::DeprecatedTypeProperties\xe2\x80\x99 to \xe2\x80\x98c10::IntArrayRef {aka c10::ArrayRef}\xe2\x80\x99\n auto dweight = at::empty(z.type(), {0});\n ^\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp:83:39: error: could not convert \xe2\x80\x98z.at::Tensor::type()\xe2\x80\x99 from \xe2\x80\x98at::DeprecatedTypeProperties\xe2\x80\x99 to \xe2\x80\x98c10::IntArrayRef {aka c10::ArrayRef}\xe2\x80\x99\n auto dbias = at::empty(z.type(), {0});\n ^\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp:89:29: error: could not convert \xe2\x80\x98{dx, dweight, dbias}\xe2\x80\x99 from \xe2\x80\x98\xe2\x80\x99 to \xe2\x80\x98std::vectorat::Tensor\xe2\x80\x99\n return {dx, dweight, dbias};\n ^\n[2/4] /cluster/apps/cuda/10.0/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS
-D__CUDA_NO_HALF_CONVERSIONS
-D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o\nFAILED: inplace_abn_cuda.cuda.o \n/cluster/apps/cuda/10.0/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(99): error: no suitable user-defined conversion from "at::DeprecatedTypeProperties" to "c10::IntArrayRef" exists\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(99): error: no instance of constructor "c10::TensorOptions::TensorOptions" matches the argument list\n argument types are: (int64_t)\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(100): error: no suitable user-defined conversion from "at::DeprecatedTypeProperties" to "c10::IntArrayRef" exists\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(100): error: no instance of constructor "c10::TensorOptions::TensorOptions" matches 
the argument list\n argument types are: (int64_t)\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(202): error: no suitable user-defined conversion from "at::DeprecatedTypeProperties" to "c10::IntArrayRef" exists\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(202): error: no instance of constructor "c10::TensorOptions::TensorOptions" matches the argument list\n argument types are: (int64_t)\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(203): error: no suitable user-defined conversion from "at::DeprecatedTypeProperties" to "c10::IntArrayRef" exists\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(203): error: no instance of constructor "c10::TensorOptions::TensorOptions" matches the argument list\n argument types are: (int64_t)\n\n8 errors detected in the compilation of "/tmp/tmpxft_0002e7bc_00000000-6_inplace_abn_cuda.cpp1.ii".\n[3/4] c++ -MMD -MF inplace_abn.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn.cpp -o inplace_abn.o\nIn file included from 
/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn.cpp:1:0:\n/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:7:2: warning: #warning "Including torch/torch.h for C++ extensions is deprecated. Please include torch/extension.h" [-Wcpp]\n #warning \\n ^\nninja: build stopped: subcommand failed.\n'
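The first compiler errors in the log point at calls of the form `at::empty(z.type(), {0})`, which the installed ATen headers reject. In newer ATen the size argument comes first and is followed by tensor options, so `at::empty({0}, z.options())` is one form that should be accepted. The sketch below rewrites that pattern in source text. The helper `modernize_empty_calls` is hypothetical, and it is only an assumption that the CUDA errors at lines 99, 100, 202 and 203 come from the same pattern, since that source is not shown here.

```python
import re

# Matches at::empty(<tensor>.type(), {<sizes>}) as quoted in the compiler output.
_OLD_EMPTY = re.compile(r"at::empty\(\s*(\w+)\.type\(\)\s*,\s*(\{[^}]*\})\s*\)")


def modernize_empty_calls(source):
    """Rewrite at::empty(t.type(), {n}) to at::empty({n}, t.options())."""
    return _OLD_EMPTY.sub(r"at::empty(\2, \1.options())", source)


if __name__ == "__main__":
    # Two lines copied from the error message above.
    old = "auto dweight = at::empty(z.type(), {0});\nauto dbias = at::empty(z.type(), {0});"
    print(modernize_empty_calls(old))
```

Review the resulting diff by hand before rebuilding; pinning PyTorch to the version the repository was written for is the lower-risk alternative.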

Unable to reproduce `seg_hrnet_w18_small_v1`

Thanks for 27488d4, the configuration file is very helpful. That said, when training on 4 GPUs as prescribed, I'm unable to reproduce the Cityscapes validation accuracy of 70.3% (I attained 65.21%): https://github.com/HRNet/HRNet-Semantic-Segmentation#small-models.

Is https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/master/experiments/cityscapes/seg_hrnet_w18_small_v1_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml verbatim the file used to produce 70.3% or does it need further hyperparameter tuning? (I'm on the pytorch-v1.1 branch.)

In case it's helpful (although I suspect it isn't very informative), here are the per-class IoUs for the retrained w18-v1 model:

Loss: 0.179, MeanIU:  0.6509, Best_mIoU:  0.6521
[0.97245895 0.79921705 0.8969752  0.43651182 0.47062117 0.56336364
 0.57983322 0.68906234 0.91533262 0.60986547 0.93415257 0.74804671
 0.46804914 0.91671634 0.4241423  0.58802203 0.24108752 0.41514963
 0.69802723]
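To make the per-class numbers easier to read, the sketch below pairs them with the 19 standard Cityscapes training classes and lists the weakest ones. It assumes the array follows the usual Cityscapes train-ID order, which this thread does not confirm.

```python
# Standard Cityscapes train-ID order (assumed to match the array above).
CITYSCAPES_CLASSES = [
    "road", "sidewalk", "building", "wall", "fence", "pole",
    "traffic light", "traffic sign", "vegetation", "terrain", "sky",
    "person", "rider", "car", "truck", "bus", "train", "motorcycle",
    "bicycle",
]

# Per-class IoUs copied from the log above.
ious = [
    0.97245895, 0.79921705, 0.8969752, 0.43651182, 0.47062117, 0.56336364,
    0.57983322, 0.68906234, 0.91533262, 0.60986547, 0.93415257, 0.74804671,
    0.46804914, 0.91671634, 0.4241423, 0.58802203, 0.24108752, 0.41514963,
    0.69802723,
]

# The mean over classes should agree with the MeanIU value in the log.
mean_iou = sum(ious) / len(ious)
print("mean IoU: %.4f" % mean_iou)

# Sort ascending so the classes that drag the mean down come first.
ranked = sorted(zip(CITYSCAPES_CLASSES, ious), key=lambda pair: pair[1])
for name, iou in ranked[:5]:
    print("%-14s %.4f" % (name, iou))
```

Under that ordering assumption, the lowest scores fall on classes that are rare in Cityscapes, which is where a gap from the reported number would be expected to concentrate.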

A question about multi_scale_output

In lib/models/seg_hrnet.py, the comment on line 389 says that multi_scale_output is only used in the last module. However, lines 390 and 391 set reset_multi_scale_output to False only when multi_scale_output=False and i is the index of the last module. So I am confused about the condition under which reset_multi_scale_output is set to True.
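The condition described above can be written out as a small function to see which modules get which flag. This is a reconstruction from the description in this question, not a copy of the repository code, and `reset_flags` is a hypothetical name.

```python
def reset_flags(num_modules, multi_scale_output):
    """Return the per-module reset_multi_scale_output values for one stage."""
    flags = []
    for i in range(num_modules):
        # Only the last module can drop its multi-scale outputs, and only
        # when the caller asked for single-scale output.
        if not multi_scale_output and i == num_modules - 1:
            flags.append(False)
        else:
            flags.append(True)
    return flags


print(reset_flags(3, multi_scale_output=True))
print(reset_flags(3, multi_scale_output=False))
```

Read this way, every module except possibly the last keeps all resolution branches, so the flag is True by default and False only in the single case named in the condition.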

LIP Dataset Performance

Hi Ke,
Really good work, and a great idea with HRNet. I tried to reproduce the performance on the LIP dataset using your experiment yaml file, but only reached a best mIoU of 50.59%.

saving checkpoint to output/lip/seg_hrnet_w48_473x473_sgd_lr7e-3_wd5e-4_bs_40_epoch150checkpoint.pth.tar
Loss: 0.543, MeanIU: 0.5059, Best_mIoU: 0.5059
[0.86811489 0.63133359 0.6837422 0.38433764 0.29877409 0.66269771
0.1957649 0.54022205 0.45206418 0.74160168 0.26058858 0.21910118
0.20592365 0.7124526 0.57165922 0.61438857 0.55989074 0.55301222
0.47372399 0.4895043 ]

The only thing I changed is that I reinstalled sync-bn (https://github.com/mapillary/inplace_abn) for PyTorch 1.0. What could explain the gap?

RuntimeError: weight tensor should be defined either for all or no classes

I hit the following error and don't know why. Any help is appreciated.
return torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: weight tensor should be defined either for all or no classes at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:27
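This message is raised when the class-weight tensor passed to the loss does not have exactly one entry per class in the network output. A minimal pre-flight check in plain Python is sketched below. `check_class_weights` is a hypothetical helper, and the numbers in the usage lines are illustrative only.

```python
def check_class_weights(num_classes, class_weights):
    """Raise early if the weight list does not cover every class exactly once."""
    if class_weights is None:
        # No weights at all is valid: every class is weighted equally.
        return
    if len(class_weights) != num_classes:
        raise ValueError(
            "got %d class weights for %d classes; the loss needs one weight "
            "per class or none at all" % (len(class_weights), num_classes)
        )


# Illustrative values: a 19-entry weight list used with a 20-class output.
try:
    check_class_weights(20, [1.0] * 19)
except ValueError as err:
    print(err)
```

A common way to end up here is reusing a weight list defined for one dataset while the config sets the number of classes for another.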

Why is `nn.ReLU(inplace=False)` set for most activations?

Hi, thanks for the work. I noticed that in the backbone code, the nn.ReLU layers use inplace=False, which differs from deep-high-resolution-pose-estimation and other HRNet codebases, where inplace is set to True.
Is there a specific reason for this? Thanks.

some problem

Good job. I have some questions after reading the code:
1. Why is loss = losses.mean used?
2. The BN momentum is set to 0.01, but the PyTorch default is 0.1.
3. ReLU is created with inplace=False, but most networks, such as ResNet, use inplace=True.
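For context on the second question, PyTorch batch norm updates its running statistics as `running = (1 - momentum) * running + momentum * batch_stat`, so a smaller momentum makes the running estimate change more slowly. The sketch below compares the two values in plain Python. It only illustrates the update rule and says nothing about why the authors chose 0.01.

```python
def update_running(running, batch_stat, momentum):
    """One batch-norm running-statistic update, following PyTorch's convention."""
    return (1.0 - momentum) * running + momentum * batch_stat


def steps_to_reach(target_fraction, momentum):
    """Count updates until the running value covers target_fraction of a step change."""
    running, steps = 0.0, 0
    while running < target_fraction:
        running = update_running(running, 1.0, momentum)
        steps += 1
    return steps


for m in (0.1, 0.01):
    print("momentum=%.2f: %d updates to cover 90%% of a shift" % (m, steps_to_reach(0.9, m)))
```

With momentum 0.01 the running mean and variance average over roughly ten times as many batches as with 0.1.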

need your help

@sunke123
Thank you very much for your work.
I see that your code performs very well on Cityscapes. Could you please share the result files that you submitted to Cityscapes?
