repoptimizers's People

Contributors

dingxiaoh

repoptimizers's Issues

Large PTQ quantization loss after porting RepOpt to u2netp

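I compared the segmentation accuracy of u2netp, u2netp-repconv, and u2netp-repopt on my own training dataset; the mIoU values are 0.9159, 0.9169, and 0.9170, respectively. After PTQ uint8 quantization, however, the original architecture loses little accuracy, while both the repconv and repopt variants suffer large quantization losses. Is such a large quantization loss normal for repopt?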
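
When comparing variants like these, it can help to look at how a single weight tensor behaves under uint8 quantization before blaming the whole network. The sketch below is a generic per-tensor asymmetric fake quantization written in pure Python. It is not the PTQ pipeline used in this issue or in the paper, and the helper names `fake_quantize_uint8` and `relative_error`, as well as the synthetic weights, are assumptions made purely for illustration.

```python
import random


def fake_quantize_uint8(values):
    """Quantize to uint8 with one scale/zero-point per tensor, then dequantize."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # the range must contain zero
    scale = (hi - lo) / 255.0 or 1.0      # fall back to 1.0 for an all-zero tensor
    zero_point = round(-lo / scale)
    out = []
    for v in values:
        q = round(v / scale) + zero_point
        q = max(0, min(255, q))           # clamp to the uint8 range
        out.append((q - zero_point) * scale)
    return out


def relative_error(values):
    """Relative L2 error introduced by fake quantization."""
    deq = fake_quantize_uint8(values)
    num = sum((a - b) ** 2 for a, b in zip(values, deq))
    den = sum(a * a for a in values)
    return (num / den) ** 0.5


random.seed(0)
# Hypothetical "well-behaved" weights: small values, no outliers.
narrow = [random.gauss(0.0, 0.05) for _ in range(4096)]
# The same weights with two large outliers, which stretch the quantization range.
wide = list(narrow)
wide[0], wide[1] = 2.0, -2.0

print("narrow:", relative_error(narrow))
print("wide:  ", relative_error(wide))
```

With a single scale per tensor, a few outliers widen the quantization step for every other weight, so the second tensor shows a noticeably larger relative error. Running the same kind of measurement on each layer of the three models would show whether a few layers dominate the loss; whether that is what happens here is not something the issue establishes.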

How were the PTQ results in the paper measured?

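@DingXiaoH Hello, the RepOptimizer paper reports a PTQ accuracy of about 54.5 for RepVGG-B1. Could you share the settings under which this was measured? In my local PTQ test, the accuracy of B1 drops straight to below 10. I am working on something that needs to match your results, so I would appreciate the specific PTQ configuration 🙏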

RepOpt-VGG-A0-target accuracy falls short of RepVGG-A0?

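A question for the author: hyper-search training with RepOpt-VGG-A0-hs on CIFAR-100 reaches an accuracy of only about 56, and using the searched hyper-parameters on ImageNet, RepOpt-VGG-A0-target reaches only 70-something, about 2 points lower than RepVGG-A0. Why is that?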

RepOpt in Yolov6: training in hyper-search (hs) mode produces a sine-like mAP curve

Thank you for reading.

In this experiment, I trained yolov6s using the RepOpt method on the DOTA dataset. Following the official documentation, I first trained the model in hs mode to search for the optimal optimizer hyper-parameters, but the val/mAP curves look strangely like a sine function. In the figure, the orange curve is the yolov6s model trained for 400 epochs, the blue one is yolov6s in hs mode trained for 250 epochs, and the red one is yolov6s in hs mode trained for 400 epochs.

As far as I know, in hs mode the Scales (hyper-parameters) are trained like normal parameters together with the other model parameters, so everything should behave like the orange curve. What makes the hs-mode mAP curves oscillate like a sine function?
[Image: val/mAP curves (248711483-c7e245d8-9803-4aa8-a8c4-3b7320c3f150)]

Is the method equally effective for networks smaller than B1?

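Hello, I see that your work only demonstrates accuracy alignment for large models of B1 and above. Can small models below B1 maintain the same accuracy alignment? Models this large are rarely used as backbones in downstream tasks.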

Training error

I load the data through the torchvision.dataset.imagenet interface, but training fails with the following error:
Traceback (most recent call last):
File "main_repopt.py", line 461, in <module>
main(config)
File "main_repopt.py", line 199, in main
train_one_epoch(config, model, criterion, data_loader_train, optimizer, epoch, mixup_fn, lr_scheduler, model_ema=model_ema)
File "main_repopt.py", line 298, in train_one_epoch
loss.backward()
File "/opt/conda/lib/python3.8/site-packages/torch/_tensor.py", line 264, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 153, in backward
Variable._execution_engine.run_backward(
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
terminate called after throwing an instance of 'c10::Error'
what(): NCCL error in: /opt/pytorch/pytorch/torch/csrc/distributed/c10d/NCCLUtils.hpp:161, unhandled cuda error, NCCL version 21.1.4
ncclUnhandledCudaError: Call to CUDA function failed.
Exception raised from ncclCommAbort at /opt/pytorch/pytorch/torch/csrc/distributed/c10d/NCCLUtils.hpp:161 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f37de87663c in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7f37de841a28 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: + 0x3c1e92e (0x7f361ae5892e in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0xac (0x7f361ae393fc in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0xd (0x7f361ae395cd in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0x10f3211 (0x7f366cc99211 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x1105810 (0x7f366ccab810 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: + 0xa71082 (0x7f366c617082 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: + 0xa72043 (0x7f366c618043 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #9: + 0xf8b98 (0x56188c6e6b98 in /opt/conda/bin/python3)
frame #10: + 0xfa78b (0x56188c6e878b in /opt/conda/bin/python3)
frame #11: + 0xf8b4f (0x56188c6e6b4f in /opt/conda/bin/python3)
frame #12: + 0x1ef516 (0x56188c7dd516 in /opt/conda/bin/python3)
frame #13: + 0x11c574 (0x56188c70a574 in /opt/conda/bin/python3)
frame #14: _PyGC_CollectNoFail + 0x2b (0x56188c8435db in /opt/conda/bin/python3)
frame #15: PyImport_Cleanup + 0x371 (0x56188c85d7b1 in /opt/conda/bin/python3)
frame #16: Py_FinalizeEx + 0x7a (0x56188c85da9a in /opt/conda/bin/python3)
frame #17: Py_RunMain + 0x1b8 (0x56188c8625c8 in /opt/conda/bin/python3)
frame #18: Py_BytesMain + 0x39 (0x56188c862939 in /opt/conda/bin/python3)
frame #19: __libc_start_main + 0xf3 (0x7f37f39470b3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #20: + 0x1e8f39 (0x56188c7d6f39 in /opt/conda/bin/python3)

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 8744) of binary: /opt/conda/bin/python3
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py", line 191, in <module>
main()
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py", line 187, in main
launch(args)
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py", line 173, in launch
run(args)
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 688, in run
elastic_launch(
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:


         main_repopt.py FAILED

================================================
Root Cause:
[0]:
time: 2022-07-20_08:16:08
rank: 0 (local_rank: 0)
exitcode: -6 (pid: 8744)
error_file: <N/A>
msg: "Signal 6 (SIGABRT) received by PID 8744"

Other Failures:
<NO_OTHER_FAILURES>


I don't know what causes this error. I hope someone can help me analyze it.
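
The traceback alone does not identify the root cause, because CUDA errors are reported asynchronously and may surface at a later call such as `loss.backward()`. As a tentative first step rather than a diagnosis, the sketch below sets the real `CUDA_LAUNCH_BLOCKING` environment variable so that errors are reported at the failing operation, and checks for class labels outside the range the classifier expects, which is one common (but by no means the only) trigger for failures of this kind. The helper `find_bad_labels` and the sample labels are hypothetical and are not part of `main_repopt.py`.

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback points at the
# operation that actually failed. This must be set before CUDA is initialized.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def find_bad_labels(labels, num_classes):
    """Return (index, label) pairs whose label lies outside [0, num_classes)."""
    return [(i, y) for i, y in enumerate(labels) if not 0 <= y < num_classes]


# Hypothetical labels for a 1000-class problem; 1000 and -1 are invalid.
sample_labels = [0, 17, 999, 1000, -1]
bad = find_bad_labels(sample_labels, num_classes=1000)
print(bad)  # [(3, 1000), (4, -1)]
```

If the labels turn out to be valid, other things worth ruling out include running out of GPU memory and a mismatch between the installed CUDA, driver, and PyTorch versions.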
