
Comments (23)

pppppM commented on August 27, 2024

Sounds great!
Are you interested in making a PR? We can discuss further.

twmht commented on August 27, 2024

@HIT-cwh

In fact, I have implemented my own AutoSlim. It's quite different from mmrazor, and its memory usage is much more efficient than mmrazor's.

I use grad clipping on the gradients in object detection. Without distillation the results are satisfactory, but when applying distillation such as CWD, the results are bad. You may try grad clipping if you have NaN at the beginning of training.

By the way, most anytime networks (like BigNAS) do not explain how they use distillation in object detection. I am exploring this and I am looking forward to your experiments on this.
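
For reference, a minimal sketch of the gradient-clipping setting mentioned above, as it would appear in a plain mmdet-style config; the max_norm and norm_type values are illustrative placeholders, not the commenter's actual settings.

# Illustrative only: enable gradient clipping via mmcv's OptimizerHook
# in a standard mmdet config (values are placeholders).
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))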

HIT-cwh commented on August 27, 2024

@HIT-cwh

In fact, I have implemented my own AutoSlim. It's quite different from mmrazor, and its memory usage is much more efficient than mmrazor's.

I use grad clipping on the gradients in object detection. Without distillation the results are satisfactory, but when applying distillation such as CWD, the results are bad. You may try grad clipping if you have NaN at the beginning of training.

By the way, most anytime networks (like BigNAS) do not explain how they use distillation in object detection. I am exploring this and I am looking forward to your experiments on this.

I would appreciate it if you could share how you save memory in your implementation, and we will improve our code based on that.

pppppM commented on August 27, 2024

Could you upload pruner config?

twmht commented on August 27, 2024

It's the same as https://github.com/open-mmlab/mmrazor/blob/master/configs/pruning/autoslim/autoslim_mbv2_supernet_8xb256_in1k.py#L41, except that the model is changed to https://github.com/open-mmlab/mmdetection/blob/master/configs/atss/atss_r50_fpn_1x_coco.py

pppppM commented on August 27, 2024

I'm very sorry for the inconvenience.
There is a bug in the pruner's trace mechanism: the shareable head only traces its first parent module (FPN 0), and the other parent modules (FPN 1, FPN 2, ...) are not traced.
I will fix it as soon as possible.
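
A minimal sketch (illustrative only, not mmrazor code) of why the head has several parent modules: the same shared conv consumes every FPN level, so a channel tracer has to link its input channels to FPN 0, FPN 1, FPN 2, ..., not just the first one.

import torch
import torch.nn as nn

class TinySharedHead(nn.Module):
    """Toy stand-in for a detection head shared across pyramid levels."""

    def __init__(self, in_channels=8, num_classes=4):
        super().__init__()
        self.cls_conv = nn.Conv2d(in_channels, num_classes, 3, padding=1)

    def forward(self, feats):
        # The single cls_conv is applied to each pyramid level in turn,
        # so every FPN output is a parent of this conv's input channels.
        return [self.cls_conv(f) for f in feats]

feats = [torch.randn(1, 8, s, s) for s in (32, 16, 8)]  # FPN 0 / 1 / 2 outputs
outs = TinySharedHead()(feats)
print([o.shape for o in outs])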

twmht commented on August 27, 2024

@pppppM

This is what I was concerned about.

By the way, I think it's better to let users configure a whole block (such as the neck and bbox_head) as a group sharing one mask, since these blocks are often complicated and the parsers are hard to modify to handle such cases.

I have done this by passing a prebuilt channel space (in txt format) to my reimplemented AutoSlim.

It is hard for a parser to handle every network architecture; the same limitation exists in nni (https://nni.readthedocs.io/en/stable/Compression/ModelSpeedup.html#limitations).

The channel space can be generated by nni or mmrazor and saved to a text file, which users can then edit if the channel dependencies were not built correctly.
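
A rough sketch of what such a user-editable channel-space file and loader could look like; the file format and the helper name below are hypothetical, made up for this discussion, and are not part of mmrazor or nni.

# channel_space.txt (hypothetical format): one group per line;
# all modules in a group share a single channel mask, e.g.
#   backbone.layer1.0.conv1 backbone.layer1.0.bn1
#   neck.fpn_convs.0.conv neck.fpn_convs.1.conv bbox_head.cls_convs.0.conv

def load_channel_space(path):
    """Parse the text file into a list of module-name groups."""
    groups = []
    with open(path) as f:
        for line in f:
            names = line.split()
            if names:
                groups.append(names)
    return groups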

What is your opinion?

twmht commented on August 27, 2024

@pppppM

Sure.

pppppM commented on August 27, 2024

Before our open-source release, most popular models could be handled correctly, such as ResNet, MobileNet, RetinaNet, YOLOX, etc. Probably something went wrong when we refactored the code.
We do need some configurable mechanism to handle models that cannot be traced correctly.
I'm very excited to develop this feature with you. Looking forward to your PR.

HIT-cwh commented on August 27, 2024

Hi! This bug has been fixed in pr#126.

Bing1002 commented on August 27, 2024

It's the same as https://github.com/open-mmlab/mmrazor/blob/master/configs/pruning/autoslim/autoslim_mbv2_supernet_8xb256_in1k.py#L41, except that the model is changed to https://github.com/open-mmlab/mmdetection/blob/master/configs/atss/atss_r50_fpn_1x_coco.py

Hi, can you please upload the prune config file? I used the approach you referred to but still got errors. Did you successfully run AutoSlim on an object detection task? Thanks.

twmht commented on August 27, 2024

@Bing1002

I have not tried the latest mmrazor. Did you try it?

Bing1002 commented on August 27, 2024

HIT-cwh commented on August 27, 2024

I'm very sorry for the inconvenience.
Pruning models with GroupNorm is not supported at present, and GroupNorm is the default normalization in ATSSHead. We will fix it as soon as possible.
Models such as RetinaNet and YOLOX can be pruned with our code. The following config can be used:

# Imports assumed for mmrazor 0.x with mmcv 1.x; adjust to the installed versions.
from mmcv import ConfigDict
from mmrazor.models import build_algorithm

model = dict(
    type='mmdet.RetinaNet',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_input',
        num_outs=5),
    bbox_head=dict(
        type='RetinaHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=4,
            scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0],
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    # model training and testing settings
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0,
            ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100))

algorithm_cfg = ConfigDict(
    type='AutoSlim',
    architecture=dict(type='MMDetArchitecture', model=model),
    pruner=dict(
        type='RatioPruner',
        ratios=(2 / 12, 3 / 12, 4 / 12, 5 / 12, 6 / 12, 7 / 12, 8 / 12, 9 / 12,
                10 / 12, 11 / 12, 1.0)),
    retraining=False,
    bn_training_mode=True,
    input_shape=None)

algorithm = build_algorithm(algorithm_cfg)
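
As a quick sanity check after building, one might sample and apply a random subnet; this assumes the mmrazor 0.x pruner interface (sample_subnet / set_subnet), so verify the names against the installed version.

subnet = algorithm.pruner.sample_subnet()  # randomly sample channel ratios
algorithm.pruner.set_subnet(subnet)        # apply the corresponding channel masks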

Bing1002 commented on August 27, 2024

Hi, thanks for your reply. I tried this config but it still failed.

Here is the config:

model = dict(
    type='mmdet.RetinaNet',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_input',
        num_outs=5),
    bbox_head=dict(
        type='RetinaHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=4,
            scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0],
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    # model training and testing settings
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0,
            ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100))

dataset_type = 'CocoDataset'
data_root = '/mnt/data/coco_demo/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=2,
    train=dict(
        type='CocoDataset',
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    val=dict(
        type='CocoDataset',
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CocoDataset',
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
work_dir = './work_dirs/retinanet_r50_fpn_1x_coco'
auto_resume = False
gpu_ids = range(0, 1)


algorithm = dict(
    type='AutoSlim',
    architecture=dict(type='MMDetArchitecture', model=model),
    pruner=dict(
        type='RatioPruner',
        ratios=(2 / 12, 3 / 12, 4 / 12, 5 / 12, 6 / 12, 7 / 12, 8 / 12, 9 / 12,
                10 / 12, 11 / 12, 1.0)),
    retraining=False,
    bn_training_mode=True,
    input_shape=None)

And the error is the same as before:

2022-04-16 09:49:55,736 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
2022-04-16 09:49:55,736 - mmdet - INFO - Checkpoints will be saved to /home/local/york.lan/bing.zha/code/mmrazor/work_dirs/retinanet_r50_fpn_1x_coco by HardDiskBackend.
Traceback (most recent call last):
  File "/home/local/york.lan/bing.zha/code/mmrazor/tools/mmdet/train_mmdet.py", line 210, in <module>
    main()
  File "/home/local/york.lan/bing.zha/code/mmrazor/tools/mmdet/train_mmdet.py", line 199, in main
    train_mmdet_model(
  File "/home/local/york.lan/bing.zha/code/mmrazor/mmrazor/apis/mmdet/train.py", line 206, in train_mmdet_model
    runner.run(data_loader, cfg.workflow)
  File "/home/local/york.lan/bing.zha/code/mmcv_1.4.6/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/local/york.lan/bing.zha/code/mmcv_1.4.6/mmcv/runner/epoch_based_runner.py", line 51, in train
    self.call_hook('after_train_iter')
  File "/home/local/york.lan/bing.zha/code/mmcv_1.4.6/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/local/york.lan/bing.zha/code/mmcv_1.4.6/mmcv/runner/hooks/optimizer.py", line 56, in after_train_iter
    runner.outputs['loss'].backward()
  File "/home/local/york.lan/bing.zha/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/local/york.lan/bing.zha/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

Looking forward to your response. Thanks.

twmht commented on August 27, 2024

You may try to set optimizer_config to None.

Bing1002 commented on August 27, 2024

You may try to set optimizer_config to None.

After changing that, I can now run the pruning. Could you please explain why that setting matters?

twmht commented on August 27, 2024

AutoSlim calls optimizer.step() itself in its train step, not through the mmcv hook. Setting optimizer_config to None means the mmcv OptimizerHook is not registered, so backward and step are not triggered a second time.
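
As a minimal, mmrazor-independent illustration of that failure mode: calling backward() twice on the same graph reproduces the exact RuntimeError from the traceback above.

import torch

x = torch.randn(3, requires_grad=True)
loss = (x ** 2).sum()
loss.backward()      # first backward, as done inside the algorithm's train step
try:
    loss.backward()  # second backward, as a registered OptimizerHook would trigger
except RuntimeError as err:
    print(err)       # "Trying to backward through the graph a second time ..."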

Bing1002 commented on August 27, 2024

Thanks. But now it seems the returned loss becomes NaN.

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 15.2 task/s, elapsed: 329s, ETA:     0s2022-04-16 14:02:58,786 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
2022-04-16 14:02:58,787 - mmdet - ERROR - The testing results of the whole dataset is empty.
2022-04-16 14:02:58,816 - mmdet - INFO - Exp name: autoslim_retinanet.py
2022-04-16 14:02:58,841 - mmdet - INFO - Epoch(val) [4][5000]
2022-04-16 14:04:45,274 - mmdet - INFO - Epoch [5][50/1239]     lr: 1.000e-02, eta: 5:31:29, time: 2.128, data_time: 0.051, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:06:28,567 - mmdet - INFO - Epoch [5][100/1239]    lr: 1.000e-02, eta: 5:29:53, time: 2.066, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:08:13,967 - mmdet - INFO - Epoch [5][150/1239]    lr: 1.000e-02, eta: 5:28:21, time: 2.108, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:09:58,547 - mmdet - INFO - Epoch [5][200/1239]    lr: 1.000e-02, eta: 5:26:47, time: 2.092, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:11:42,629 - mmdet - INFO - Epoch [5][250/1239]    lr: 1.000e-02, eta: 5:25:11, time: 2.082, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:13:25,492 - mmdet - INFO - Epoch [5][300/1239]    lr: 1.000e-02, eta: 5:23:34, time: 2.057, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:15:09,866 - mmdet - INFO - Epoch [5][350/1239]    lr: 1.000e-02, eta: 5:21:59, time: 2.087, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan

Do you have any idea about it? Thanks a lot!

HIT-cwh commented on August 27, 2024

We have not verified whether AutoSlim works on object detection. Maybe you can try pruning MobileNet V2 first to check whether the problem is in the code or in AutoSlim itself.

HIT-cwh commented on August 27, 2024

Thanks. But now it seems the returned loss becomes NaN.

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 15.2 task/s, elapsed: 329s, ETA:     0s2022-04-16 14:02:58,786 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
2022-04-16 14:02:58,787 - mmdet - ERROR - The testing results of the whole dataset is empty.
2022-04-16 14:02:58,816 - mmdet - INFO - Exp name: autoslim_retinanet.py
2022-04-16 14:02:58,841 - mmdet - INFO - Epoch(val) [4][5000]
2022-04-16 14:04:45,274 - mmdet - INFO - Epoch [5][50/1239]     lr: 1.000e-02, eta: 5:31:29, time: 2.128, data_time: 0.051, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:06:28,567 - mmdet - INFO - Epoch [5][100/1239]    lr: 1.000e-02, eta: 5:29:53, time: 2.066, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:08:13,967 - mmdet - INFO - Epoch [5][150/1239]    lr: 1.000e-02, eta: 5:28:21, time: 2.108, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:09:58,547 - mmdet - INFO - Epoch [5][200/1239]    lr: 1.000e-02, eta: 5:26:47, time: 2.092, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:11:42,629 - mmdet - INFO - Epoch [5][250/1239]    lr: 1.000e-02, eta: 5:25:11, time: 2.082, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:13:25,492 - mmdet - INFO - Epoch [5][300/1239]    lr: 1.000e-02, eta: 5:23:34, time: 2.057, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:15:09,866 - mmdet - INFO - Epoch [5][350/1239]    lr: 1.000e-02, eta: 5:21:59, time: 2.087, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan

Do you have any idea about it? Thanks a lot!

Do you detach the teacher's output in the loss function, such as here?
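
For readers following the thread, a generic sketch of what detaching the teacher's output means in a distillation loss; this is only an illustration, not mmrazor's actual CWD implementation.

import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, tau=1.0):
    # Detach so no gradient flows back through the teacher branch.
    teacher = teacher_logits.detach()
    return F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher / tau, dim=1),
        reduction='batchmean') * tau * tau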

twmht commented on August 27, 2024

@HIT-cwh

He did not use distillation.

HIT-cwh commented on August 27, 2024

@HIT-cwh

He did not use distillation.

My bad.
Due to a lack of manpower, progress on transferring AutoSlim to other tasks has not been very satisfactory, and I'm very sorry for the inconvenience.
We are reproducing BigNAS; if it goes well, we will release a BigNAS example on semantic segmentation.
