open-mmlab / mmengine

OpenMMLab Foundational Library for Training Deep Learning Models

Home Page: https://mmengine.readthedocs.io/

License: Apache License 2.0

Python 99.90% Dockerfile 0.10%
ai computer-vision deep-learning machine-learning python pytorch

mmengine's People

Contributors

c1rn09, dai-wenxun, enkilee, fanqino1, gt9505, haochenye, harold-lkk, hhaandroid, hit-cwh, ice-tong, imabackstabber, jbwang1997, ly015, lzhgrla, mambawong, mzr1996, okotaku, plyfager, rangeking, rangilyu, sanbuphy, sjiang95, teamwong111, vansin, xiangxu-0103, xin-li-67, youkaichao, yuanliuuuuuu, zhouzaida, zwwwayne


mmengine's Issues

CI codecov uses `--source mmdet`

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug
The CI codecov step runs coverage with `--source mmdet`, which appears to be left over from mmdet; for this repo it should presumably point at mmengine instead:

coverage run --branch --source mmdet -m pytest tests/

Reproduction

  1. What command or script did you run?
     A placeholder for the command.
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
  3. What dataset did you use?

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback
If applicable, paste the error traceback here.

A placeholder for the traceback.

Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

After users finish setting paramwise_cfg, how can they know whether it matches their expectations? Should we also provide a corresponding script, so that after running it users can easily see which parameters are frozen and how the hyper-parameters differ across parameter groups? If there is no time to develop this for now, it can be treated as a future requirement.

Originally posted by @hhaAndroid in #25 (comment)
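A minimal sketch (not an mmengine API; all names here are illustrative) of the kind of inspection script the comment asks for: after building the model and optimizer, report which parameters are frozen and what learning rate each parameter group ends up with.

```python
import torch
import torch.nn as nn

# Toy model standing in for a real detector backbone + head.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
# Simulate a paramwise setting: freeze the first layer.
for p in model[0].parameters():
    p.requires_grad = False

# Only the second layer is optimized, with its own group lr.
optimizer = torch.optim.SGD(
    [{'params': model[1].parameters(), 'lr': 0.01}], lr=0.1)

def summarize(model, optimizer):
    """Return (frozen parameter names, per-group lrs) for verification."""
    frozen = [n for n, p in model.named_parameters() if not p.requires_grad]
    lrs = [g['lr'] for g in optimizer.param_groups]
    return frozen, lrs

frozen, lrs = summarize(model, optimizer)
print('frozen:', frozen)     # parameters excluded from training
print('group lrs:', lrs)     # per-group learning rates
```

Printing this summary right after optimizer construction would let users confirm a paramwise_cfg before any training starts.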

Support auto-scaling LR in param_scheduler

Describe the feature

Motivation
It is quite common that users need to update the LR based on the number of GPUs they use. A brief solution might be:
add an argument like default_batchsize somewhere; when the param_scheduler starts to initialize, calculate the real batch size and then scale the LR by their ratio. This enables different repos to set different default_batchsize values for their own needs.
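The proposed logic can be sketched as below (the name `default_batchsize` comes from this proposal, not an existing mmengine API), following the common linear scaling rule:

```python
def auto_scale_lr(base_lr, default_batchsize, batch_size_per_gpu, num_gpus):
    """Scale the LR linearly with the real total batch size."""
    real_batchsize = batch_size_per_gpu * num_gpus
    return base_lr * real_batchsize / default_batchsize

# Config tuned for a total batch size of 16, now run on 16 GPUs x 2 images:
print(auto_scale_lr(0.01, default_batchsize=16,
                    batch_size_per_gpu=2, num_gpus=16))  # → 0.02
```

The param_scheduler would apply this scaled value before building its schedule, so all subsequent LR updates start from the adjusted base.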

Related resources
See auto_scale_lr in mmdet

Additional context
Add any other context or screenshots about the feature request here.
If you would like to implement the feature and create a PR, please leave a comment here and that would be much appreciated.

Full support of different file clients in `BaseDataset`.

Describe the feature
We have already supported multiple file clients in BaseDataset, but some arguments still do not support them.
In particular, usage of the os.path package may cause incompatibilities across different file clients; please check it.

Clear some `type: ignore` flags

Describe the feature

Motivation
A clear and concise description of the motivation of the feature.
Ex1. It is inconvenient when [....].
Ex2. There is a recent paper [....], which is very helpful for [....].

Related resources
If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.

Additional context
Add any other context or screenshots about the feature request here.
If you would like to implement the feature and create a PR, please leave a comment here and that would be much appreciated.

Explain the priority design of meta info in code.

No, that won't work. If we do it that way, then with lazy_init=True the content of meta is the user-passed meta (high priority) merged with the class attribute BaseDataset.META dict (low priority). When full_init is called later and reads the meta from the annotation file (medium priority), there is no way to know how the keys in the medium-priority meta should override the keys from the high- and low-priority metas.

Originally posted by @GT9505 in #7 (comment)
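One illustrative way around the problem described above is to keep the three sources separate and merge them lazily on access, so the annotation-file meta can arrive late without clobbering user-passed keys (a sketch only, not the BaseDataset implementation):

```python
class MetaResolver:
    """Keep meta sources separate; merge on access with low < mid < high."""

    def __init__(self, cls_meta, user_meta):
        self.low = dict(cls_meta)    # class attribute META (low priority)
        self.high = dict(user_meta)  # user-passed meta (high priority)
        self.mid = {}                # annotation-file meta, set by full_init

    def set_file_meta(self, file_meta):
        self.mid = dict(file_meta)

    @property
    def meta(self):
        merged = dict(self.low)   # later updates win
        merged.update(self.mid)
        merged.update(self.high)
        return merged

r = MetaResolver({'classes': ('cat',), 'version': 1}, {'classes': ('dog',)})
r.set_file_meta({'version': 2, 'source': 'ann.json'})
print(r.meta)  # {'classes': ('dog',), 'version': 2, 'source': 'ann.json'}
```

Because the merge happens at read time, a full_init that runs long after construction can still slot its keys between the class defaults and the user overrides.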

Document the functional differences between config formats in the Config docs

The range of config content supported by the YAML/JSON/PY formats differs. For example, JSON does not support tuples, so a tuple in a PY config becomes a list after being dumped to JSON.
Besides that, some functional interfaces are only supported for Python configs; in such cases the interfaces and documentation should say so.
The API docs or the config tutorial should clearly list the level of support for each config format, as well as the limitations and differences of each format.
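The tuple/list discrepancy can be demonstrated directly with the standard json module:

```python
import json

# A value that would be a tuple in a .py config:
py_cfg = {'img_scale': (1333, 800)}

# JSON has no tuple type, so a dump/load round trip silently
# converts the tuple into a list.
loaded = json.loads(json.dumps(py_cfg))
print(loaded['img_scale'])        # [1333, 800]
print(type(loaded['img_scale']))  # <class 'list'>
```

Any code that later checks `isinstance(cfg['img_scale'], tuple)` would therefore behave differently depending on which config format the value passed through.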

Enable automatically loading latest checkpoint from ceph

Describe the feature

Motivation
Since storage is limited, more and more users save their checkpoints in ceph and leave no checkpoints in the local working directory. However, when resuming a job, the auto-resume function is only able to find checkpoints in the local path and cannot automatically load checkpoints saved in ceph.

To solve this issue, a naive solution can be described as below:
When saving checkpoints during training, no matter where a checkpoint is saved, also save last_checkpoint.txt in both the local and the ceph working directory, indicating the real path of the latest checkpoint (which can be either local storage or ceph). When auto-resuming in training, read the file and load the checkpoint from the path it contains. Thus, users can safely auto-resume with a command like below:

sh ./tools/slurm_train.sh $PARTITION $CONFIG $WORK_DIR --auto-resume

Or users can manually resume the model in a unified way no matter where the latest checkpoint is saved like below:

sh ./tools/slurm_train.sh $PARTITION $CONFIG $WORK_DIR --load-from $WORK_DIR/last_checkpoint --resume

The last_checkpoint.txt file serves as a soft link to the latest checkpoint across platforms and works for any kind of storage backend.
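The pointer-file mechanism can be sketched in a few lines (the helper names and the s3:// path are illustrative, not mmengine API):

```python
import os
import tempfile

def save_last_checkpoint_pointer(work_dir, ckpt_path):
    """Record the real path of the latest checkpoint, wherever it lives."""
    with open(os.path.join(work_dir, 'last_checkpoint'), 'w') as f:
        f.write(ckpt_path)

def resolve_last_checkpoint(work_dir):
    """Read the pointer file; return None if nothing was saved yet."""
    pointer = os.path.join(work_dir, 'last_checkpoint')
    if not os.path.exists(pointer):
        return None
    with open(pointer) as f:
        return f.read().strip()

with tempfile.TemporaryDirectory() as d:
    # The checkpoint itself may live on remote storage; only the
    # small pointer file has to exist locally.
    save_last_checkpoint_pointer(d, 's3://bucket/work_dir/epoch_12.pth')
    print(resolve_last_checkpoint(d))  # s3://bucket/work_dir/epoch_12.pth
```

The resume path returned here would then be handed to whatever file client matches its prefix, which is what makes the scheme storage-agnostic.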

Related resources
If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.

Additional context
Detectron2 has a similar design.

'Runner' object has no attribute 'log_buffer'

When I run fcos_r50_caffe_fpn_gn-head_1x_coco.py, which has the setting default_hooks = dict(optimizer=dict(type='OptimizerHook', grad_clip=dict(max_norm=35, norm_type=2))), the program reports the error below.

  File "/mnt/cache/wangjiabao1.vendor/workspace/refactor/mmengine/mmengine/runner/runner.py", line 1304, in call_hook
    getattr(hook, fn_name)(self, **kwargs)
  File "/mnt/cache/wangjiabao1.vendor/workspace/refactor/mmengine/mmengine/hooks/optimizer_hook.py", line 98, in after_train_iter
    runner.log_buffer.update({'grad_norm': float(grad_norm)},
AttributeError: 'Runner' object has no attribute 'log_buffer'
phoenix-srun: error: SH-IDC1-10-140-0-252: tasks 0-7: Exited with exit code 1
phoenix-srun: Terminating job step 1084231.0

I wonder whether log_buffer needs to be replaced with message_hub or logger.

Support to visualize learning rate status before training

Describe the feature

Motivation
A clear and concise description of the motivation of the feature.
Ex1. It is inconvenient when [....].
Ex2. There is a recent paper [....], which is very helpful for [....].

Related resources
If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.

Additional context
Add any other context or screenshots about the feature request here.
If you would like to implement the feature and create a PR, please leave a comment here and that would be much appreciated.

`AverageModel` has a bug in its update condition

if self.steps % self.interval == 0:
    avg_param = (
        itertools.chain(self.module.parameters(),
                        self.module.buffers())
        if self.update_buffers else self.parameters())
    src_param = (
        itertools.chain(model.parameters(), model.buffers())
        if self.update_buffers else model.parameters())
    for p_avg, p_src in zip(avg_param, src_param):
        device = p_avg.device
        p_src_ = p_src.detach().to(device)
        if self.steps == 0:
            p_avg.detach().copy_(p_src_)
        else:
            p_avg.detach().copy_(
                self.avg_func(p_avg.detach(), p_src_,
                              self.steps.to(device)))
self.steps += 1

self.steps starts from 0. Should we change this condition to (self.steps + 1) % self.interval == 0?
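The off-by-one can be illustrated without any model at all: with steps starting at 0, `steps % interval == 0` fires on the very first call and then on a shifted cadence, while `(steps + 1) % interval == 0` fires on every interval-th call as intended (a pure-Python sketch of the condition only, not the AveragedModel code):

```python
def update_calls(n_calls, interval, shifted):
    """Return the call indices on which the average would be updated."""
    updates = []
    steps = 0
    for call in range(n_calls):
        if shifted:
            cond = (steps + 1) % interval == 0  # proposed condition
        else:
            cond = steps % interval == 0        # current condition
        if cond:
            updates.append(call)
        steps += 1
    return updates

print(update_calls(8, 4, shifted=False))  # [0, 4] -- fires on the 1st call
print(update_calls(8, 4, shifted=True))   # [3, 7] -- fires every 4th call
```

Which cadence is "correct" depends on whether the first call is meant to initialize the average immediately, which is presumably why the `self.steps == 0` branch exists.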

Add documentation for evaluation on multiple datasets with multiple metrics.

Describe the feature
Add documentation to show how to evaluate multiple datasets with multiple metrics and use one of the metrics of a dataset as the best indicator.

Motivation
Users might need to evaluate different metrics on multiple datasets.
In such a case, only one metric on one dataset needs to be selected to indicate whether the model is the best one and should be saved.
It is unnecessary to officially support this feature in MMEngine, but MMEngine allows users to create a new Loop class to support it. Therefore, we should update the documentation to show such an example.

Related resources
See a previous PR in MMSeg open-mmlab/mmsegmentation#1461

Additional context
Add any other context or screenshots about the feature request here.
If you would like to implement the feature and create a PR, please leave a comment here and that would be much appreciated.
