Comments (7)
It was indeed an issue with the libraries you mentioned; I completely overlooked that they affect the optimization.
Thank you for your prompt reply and your time, @DonkeyShot21.
from uno.
As a side comment, I noticed that to evaluate your algorithm in the "task-agnostic" setup, you generate predictions and solve the assignment problem separately for samples of known and novel classes.
Correct me if I'm wrong, but this may be the wrong way to perform such an evaluation, because one should have no knowledge of whether a group of samples belongs to known or novel classes, and this evaluation assumes exactly that. What I expected instead is a joint dataloader that mixes all the samples, with the Hungarian algorithm applied to all the resulting predictions at once.
It probably has negligible implications for this work, because the classifier is trained well and the assignment solver assigns only one logit index to each class. But, as far as I can see, if some novel class were confused for a known class the majority of the time (i.e., its highest logit usually came from the first head), such an evaluation would not catch that as an error.
It might be, however, that such an evaluation is standard and I just didn't get that from the paper, so I'm just sharing my thoughts here.
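The joint evaluation described above can be sketched with SciPy's `linear_sum_assignment` (a minimal sketch; the function and variable names are my own, not the repo's):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_accuracy(y_true, y_pred, num_classes):
    """Cluster accuracy: find the one-to-one logit-index-to-label mapping
    that maximizes agreement, then score all samples under that mapping."""
    # w[i, j] counts samples predicted as index i whose true label is j.
    w = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, t in zip(y_pred, y_true):
        w[p, t] += 1
    # Hungarian matching that maximizes the total number of matched samples.
    row, col = linear_sum_assignment(w, maximize=True)
    return w[row, col].sum() / len(y_true)

# Known and novel samples mixed and matched in a single call:
y_true = [0, 0, 1, 1, 2, 2, 3, 3]            # 0-1 known, 2-3 novel
y_pred = [0, 0, 1, 1, 3, 3, 2, 2]            # novel clusters permuted
print(hungarian_accuracy(y_true, y_pred, 4))  # 1.0 (permutation is resolved)
```

Because all samples enter one matching problem, a logit index claimed by a known class cannot also be credited to a novel class.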
Happy to help! I added a note in the README that warns about package versions.
Regarding the evaluation, I think the procedure I am following is correct, because I first concatenate the logits (preds_inc) and then take the max of those concatenated logits. By doing this I lose the information about the task. Then, in the compute() method of the metric class, I compute the best mapping on all classes (not separately).
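Concretely, the concatenate-then-argmax step looks like this (a hypothetical shape sketch; only the name preds_inc comes from the discussion above, the head sizes and other names are assumed):

```python
import torch

# Hypothetical per-head logits for a batch of 8 samples.
logits_lab = torch.randn(8, 80)    # labeled head (e.g. 80 known classes)
logits_unlab = torch.randn(8, 20)  # unlabeled head (e.g. 20 novel classes)

# Concatenate along the class dimension: after this, a prediction is just
# an index in [0, 100) and carries no information about which head it came from.
preds_inc = torch.cat([logits_lab, logits_unlab], dim=-1)
pred_classes = preds_inc.argmax(dim=-1)
print(preds_inc.shape)  # torch.Size([8, 100])
```

The task identity is only recoverable afterwards by checking whether the argmax index falls below or above the known-class boundary.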
Ok, now I understand. You are right, this is a potential problem. However, the assignments are quite stable (they are computed on the whole validation set) and, as you said, the potential issue never happens in practice. I remember once trying to remove the unwanted assignments (the ones that contradicted the labeled head), but the results were exactly the same while the code was more complicated, so I removed it. Also, if I remember correctly, Ranking Statistics uses the same evaluation procedure, so I just stuck to that.
I have just noticed that the batch size mentioned in the paper is 512, while the one in the README is 256. I suspect this could be the issue; I will test it soon.
Hi! I have just rerun an experiment with batch size 256 on CIFAR100-80_20 and I got 71.7 for incremental/unlab/test/acc/avg, which is very close to the result published in the paper with batch size 512.
Also, if you check the shape of the curve, it looks very different from your curves, so I suspect something is wrong on your side. It is very likely due to the versions of pytorch-lightning and/or lightning-bolts. Try using the exact versions I specified in the README.
Regarding Ranking Statistics, I remember I had some problems with hyperparameters too, but in the end it was quite easy to get running. Going through my logs now, it seems I used this command for RS+:
auto_novel.py --dataset_root ./data/datasets/CIFAR/ --exp_root ./data/experiments/ --warmup_model_dir ./data/experiments/supervised_learning/resnet_rotnet_cifar100-50.pth --lr 0.1 --gamma 0.1 --weight_decay 1e-4 --step_size 340 --batch_size 256 --epochs 400 --rampup_length 300 --rampup_coefficient 25 --num_labeled_classes 50 --num_unlabeled_classes 50 --dataset_name cifar100 --IL --increment_coefficient 0.05 --seed 0 --model_name resnet_IL_cifar100 --mode train --comment RS+-50_50
I did not modify their code much so it should just work.
The (potential) problem, I believe, is that linear_sum_assignment is not computed on all the data at once.
Consider testing on data with novel classes only, and assume the highest logits (predictions) for all the samples come from the first head. Then the accuracy should be zero, because you explicitly train the first head to predict known classes. However, because you test only on novel classes, linear_sum_assignment can "distribute"/"match" logits from the first head to the novel classes, and the accuracy ends up above zero. Nothing prevents assigning those first-head logits to novel classes, because no "known" images are fed to linear_sum_assignment, so those first-head logits are "free" to be assigned to something else.
With a mixed dataset, however, linear_sum_assignment would most likely assign those first-head logits to known classes, because they would dominate. The novel classes would then have to be assigned to something else, marking all first-head predictions on novel samples as errors, and the final accuracy would be much lower.
But, once again, I think it doesn't affect evaluation for balanced and moderately large datasets like CIFAR/ImageNet.
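A toy numeric sketch of this failure mode (all numbers are assumed for illustration, not taken from the repo): known classes 0-1 are classified correctly, while every novel sample (classes 2-3) fires hardest on logit 0 from the first head. Matching the novel split separately inflates the accuracy; matching jointly catches the errors:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_mapping(y_true, y_pred, n):
    """Return the logit-index-to-label map that maximizes matched samples."""
    w = np.zeros((n, n), dtype=np.int64)
    for p, t in zip(y_pred, y_true):
        w[p, t] += 1
    row, col = linear_sum_assignment(w, maximize=True)
    return dict(zip(row, col))

n = 4  # classes 0-1 known, 2-3 novel
# Known samples: classified correctly; class 0 is frequent enough to dominate.
yt_known = [0] * 30 + [1] * 10
yp_known = [0] * 30 + [1] * 10
# Novel samples: the classifier always fires hardest on logit 0 (first head).
yt_novel = [2] * 10 + [3] * 10
yp_novel = [0] * 20

# Separate (novel-only) matching: logit 0 is "free", so it gets mapped to a
# novel class and half the novel samples are scored as correct.
m_sep = best_mapping(yt_novel, yp_novel, n)
acc_sep = np.mean([m_sep[p] == t for p, t in zip(yp_novel, yt_novel)])

# Joint matching: the known samples claim logit 0, so every novel prediction
# becomes an error, as it should.
m_joint = best_mapping(yt_known + yt_novel, yp_known + yp_novel, n)
acc_joint = np.mean([m_joint[p] == t for p, t in zip(yp_novel, yt_novel)])

print(acc_sep, acc_joint)  # 0.5 0.0
```

The gap only appears when a head systematically "wins" on the wrong split, which is why well-trained models on balanced datasets rarely show it.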