Hello, thanks for such a wonderful work. After reading this paper, I have a question r


A question about model training (ld issue, CLOSED)

AerysNan commented on August 24, 2024

A question about model training

from ld.

Comments (5)

Zzh-tju commented on August 24, 2024

Removing classification loss is training now.
Removing bbox regression loss causes a very slight performance drop when main LD is used (I have not tried adding VLR LD yet, but it would probably help).
Removing DFL will improve AP slightly when LD is used (this phenomenon was observed many times).


AerysNan commented on August 24, 2024

Removing classification loss is training now.

Thanks for your reply. But I can't quite understand this sentence. Please let me express my questions more clearly.

Currently the implementation of LDHead.loss_single indicates that the overall loss is composed of 4 parts: loss_cls, loss_bbox, loss_dfl and loss_ld. The first 3 parts require ground truth annotations to be computed, while loss_ld only requires soft_targets, which are computed by the teacher model. So:

Does the current implementation of the loss computation rely on ground truth annotations? As I understand it, there are knowledge distillation settings where ground truth annotations are unavailable.
If not, how does the current implementation work without ground truth? Can I use only loss_ld as the overall loss and drop loss_cls, loss_bbox and loss_dfl? According to your reply, loss_bbox and loss_dfl only affect mAP slightly, so what about loss_cls?
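
To make the distinction in the question concrete, here is a minimal sketch of a distillation term that needs only teacher outputs. It assumes loss_ld is a temperature-softened KL divergence between the teacher's and the student's discretized box-edge distributions, as in standard knowledge distillation. The function names, the temperature value and the toy logits are illustrative and are not taken from the ld repository.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_kl_loss(student_logits, teacher_logits, temperature=10.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Only the teacher logits (the soft targets) are needed; no ground-truth
    box enters this term. The T^2 scaling is an assumption borrowed from
    common KD practice.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    return kl * temperature ** 2


# Hypothetical logits over 5 bins for a single box edge.
teacher = [0.1, 2.0, 4.0, 1.0, 0.2]
student = [0.3, 1.5, 2.5, 1.2, 0.1]

print(kd_kl_loss(student, teacher))  # positive while the student differs
print(kd_kl_loss(teacher, teacher))  # 0.0 when the student matches the teacher
```

In this sketch the three other terms (loss_cls, loss_bbox, loss_dfl) would each need a ground-truth label or box as an extra argument, which is the dependence the question is about.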


Zzh-tju commented on August 24, 2024

I'm trying an experiment without cls_loss.

BTW, why would a KD method remove GT annotations? Is there any literature on this?

You can of course disable cls_loss, bbox_loss and DFL. However, the label assignment still leverages the GT information (i.e., it decides where to distill). If you remove these three losses and distill on the full map locations, then no GT information is used. But note that even then, the teacher detector was trained with GT annotations.
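
The difference between distilling only where the label assignment says so and distilling on the full map can be sketched as follows. This is an illustration only: `distill_loss`, `per_location_kd` and `positive_mask` are hypothetical names, and the mask stands in for whatever the GT-based label assignment produces.

```python
def distill_loss(per_location_kd, positive_mask=None):
    """Average a per-location distillation loss.

    positive_mask: 0/1 flags from a GT-based label assignment (hypothetical).
    If None, every feature-map location is distilled and no GT is needed.
    """
    if positive_mask is None:
        selected = list(per_location_kd)
    else:
        selected = [l for l, m in zip(per_location_kd, positive_mask) if m]
    if not selected:
        return 0.0  # nothing to distill on
    return sum(selected) / len(selected)


per_location_kd = [0.8, 0.1, 0.4, 0.3]  # toy KD loss at 4 locations
positive_mask = [1, 0, 1, 0]            # would come from GT-based assignment

print(distill_loss(per_location_kd, positive_mask))  # positives only
print(distill_loss(per_location_kd))                 # full map, GT-free
```

Only the first call depends on GT, and only through the mask; the per-location loss values themselves come from the teacher and the student.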


Zzh-tju commented on August 24, 2024

2022-02-28 00:19:13,754 - mmdet - INFO - Epoch [12][7050/7330]	lr: 1.000e-04, eta: 0:02:22, time: 0.507, data_time: 0.009, memory: 3824, loss_cls: 0.0000, loss_bbox: 0.3418, loss_dfl: 0.0000, loss_ld: 0.0000, loss_ld_neg: 0.0000, loss_cls_kd: 0.1564, loss_cls_kd_neg: 0.0000, loss_gibox_im: 0.0000, loss_im: 0.0000, loss_im_neg: 0.0000, loss: 0.4982
2022-02-28 00:19:39,186 - mmdet - INFO - Epoch [12][7100/7330]	lr: 1.000e-04, eta: 0:01:57, time: 0.509, data_time: 0.009, memory: 3824, loss_cls: 0.0000, loss_bbox: 0.3358, loss_dfl: 0.0000, loss_ld: 0.0000, loss_ld_neg: 0.0000, loss_cls_kd: 0.1495, loss_cls_kd_neg: 0.0000, loss_gibox_im: 0.0000, loss_im: 0.0000, loss_im_neg: 0.0000, loss: 0.4852
2022-02-28 00:20:04,561 - mmdet - INFO - Epoch [12][7150/7330]	lr: 1.000e-04, eta: 0:01:31, time: 0.508, data_time: 0.009, memory: 3824, loss_cls: 0.0000, loss_bbox: 0.3399, loss_dfl: 0.0000, loss_ld: 0.0000, loss_ld_neg: 0.0000, loss_cls_kd: 0.1526, loss_cls_kd_neg: 0.0000, loss_gibox_im: 0.0000, loss_im: 0.0000, loss_im_neg: 0.0000, loss: 0.4925
2022-02-28 00:20:29,878 - mmdet - INFO - Epoch [12][7200/7330]	lr: 1.000e-04, eta: 0:01:06, time: 0.507, data_time: 0.009, memory: 3824, loss_cls: 0.0000, loss_bbox: 0.3240, loss_dfl: 0.0000, loss_ld: 0.0000, loss_ld_neg: 0.0000, loss_cls_kd: 0.1539, loss_cls_kd_neg: 0.0000, loss_gibox_im: 0.0000, loss_im: 0.0000, loss_im_neg: 0.0000, loss: 0.4779
2022-02-28 00:20:55,356 - mmdet - INFO - Epoch [12][7250/7330]	lr: 1.000e-04, eta: 0:00:40, time: 0.510, data_time: 0.008, memory: 3824, loss_cls: 0.0000, loss_bbox: 0.3488, loss_dfl: 0.0000, loss_ld: 0.0000, loss_ld_neg: 0.0000, loss_cls_kd: 0.1390, loss_cls_kd_neg: 0.0000, loss_gibox_im: 0.0000, loss_im: 0.0000, loss_im_neg: 0.0000, loss: 0.4879
2022-02-28 00:21:20,694 - mmdet - INFO - Epoch [12][7300/7330]	lr: 1.000e-04, eta: 0:00:15, time: 0.507, data_time: 0.008, memory: 3824, loss_cls: 0.0000, loss_bbox: 0.3350, loss_dfl: 0.0000, loss_ld: 0.0000, loss_ld_neg: 0.0000, loss_cls_kd: 0.1496, loss_cls_kd_neg: 0.0000, loss_gibox_im: 0.0000, loss_im: 0.0000, loss_im_neg: 0.0000, loss: 0.4845
2022-02-28 00:21:45,821 - mmdet - INFO - Saving checkpoint at 12 epochs
2022-02-28 00:24:14,925 - mmdet - INFO - Evaluating bbox...
2022-02-28 00:25:27,692 - mmdet - INFO - Exp name: im_r101_r50_coco_1x.py
2022-02-28 00:25:27,693 - mmdet - INFO - Epoch(val) [12][7330]	bbox_mAP: 0.3270, bbox_mAP_50: 0.4820, bbox_mAP_75: 0.3520, bbox_mAP_s: 0.1940, bbox_mAP_m: 0.3680, bbox_mAP_l: 0.3980, bbox_mAP_copypaste: 0.327 0.482 0.352 0.194 0.368 0.398

The above training settings are bbox_loss on positive locations, and cls KD on full map locations.
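
As a sanity check on the log above, the reported total should equal the sum of the individual terms, and only the terms named in these settings should be non-zero. The sketch below parses the loss fields of the first iteration shown (values copied from that line). The helper name `parse_losses` is made up for illustration.

```python
import re

# Loss fields copied from the first log line above (Epoch [12][7050/7330]).
LOG_LINE = (
    "loss_cls: 0.0000, loss_bbox: 0.3418, loss_dfl: 0.0000, "
    "loss_ld: 0.0000, loss_ld_neg: 0.0000, loss_cls_kd: 0.1564, "
    "loss_cls_kd_neg: 0.0000, loss_gibox_im: 0.0000, loss_im: 0.0000, "
    "loss_im_neg: 0.0000, loss: 0.4982"
)


def parse_losses(line):
    """Return ({component: value}, total) parsed from a log line."""
    pairs = re.findall(r"\b(loss(?:_\w+)?): (\d+\.\d+)", line)
    values = {name: float(v) for name, v in pairs}
    total = values.pop("loss")  # the bare "loss" field is the total
    return values, total


components, total = parse_losses(LOG_LINE)
active = {k: v for k, v in components.items() if v > 0}

print(active)                                  # the non-zero terms
print(abs(sum(components.values()) - total))   # gap between sum and total
```

Here the non-zero terms are loss_bbox and loss_cls_kd, matching the description of bbox_loss on positives plus cls KD on the full map.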

Removing cls_loss causes a significant AP drop (7.4 points).


AerysNan commented on August 24, 2024

All my doubts are cleared. Thanks a lot!

