zhmiao / OpenLongTailRecognition-OLTR
PyTorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 Oral)
License: BSD 3-Clause "New" or "Revised" License
Hi, when I read the code I noticed that nn.Linear, rather than nn.Conv2d, is used to compute the spatial attention. Could you tell me the reason, or the advantages of nn.Linear over nn.Conv2d? Thanks.
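For anyone else wondering, here is a small self-contained check (my own illustration, not repo code) showing that nn.Linear applied per spatial position computes exactly the same map as a 1x1 nn.Conv2d with shared weights, so the two choices are mathematically interchangeable here:

    import torch
    import torch.nn as nn

    x = torch.randn(2, 64, 7, 7)                  # (batch, channels, H, W)

    conv = nn.Conv2d(64, 32, kernel_size=1, bias=False)
    linear = nn.Linear(64, 32, bias=False)
    linear.weight.data = conv.weight.data.view(32, 64)   # share the same weights

    out_conv = conv(x)                                        # (2, 32, 7, 7)
    out_lin = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
    print(torch.allclose(out_conv, out_lin, atol=1e-5))       # True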
I think the v^meta in Equation (7) is wrong; otherwise, the networks of the concept selector and the hallucinator would not be trained.
I am not very sure about the center loss implementation. Why do we need to implement the attracting loss with a custom DiscCentroidsLossFunc (a torch.autograd.Function)? Can we implement it the same way as the repelling loss?
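For reference, the attracting term alone can be written directly with autograd, without a custom Function; a minimal sketch (my illustration, not the repo's DiscCentroidsLossFunc):

    import torch

    def attracting_loss(feat, centroids, labels):
        # mean squared distance between each feature and its class centroid
        return ((feat - centroids[labels]) ** 2).sum(dim=1).mean()

    feat = torch.randn(8, 512, requires_grad=True)
    centroids = torch.randn(10, 512)
    labels = torch.randint(0, 10, (8,))
    attracting_loss(feat, centroids, labels).backward()   # autograd handles the gradient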
Hi, in both the Places and ImageNet configs, batch_size and number_samples_cls are set to 256 and 4, respectively. Does this mean each batch will contain only 256/4 = 64 categories?
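If it helps, here is my reconstruction of that arithmetic as a class-aware batch builder (an illustration of my reading, not the repo's actual sampler):

    import random

    def class_aware_batch(class_to_indices, batch_size=256, num_samples_cls=4):
        # draw batch_size // num_samples_cls distinct classes, then
        # num_samples_cls examples from each: 64 classes x 4 samples = 256
        classes = random.sample(list(class_to_indices), batch_size // num_samples_cls)
        batch = []
        for c in classes:
            batch += random.choices(class_to_indices[c], k=num_samples_cls)
        return batch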
Could you please show us how you determine "Positive" and "Negative" for the F-measure?
I think some others are also concerned about that :)
Thank you for releasing the code for this awesome work.
I have a question about the centroids update. Reading the code, I find that the centroids are computed only once, during model initialization at stage 2. Could you help me find where the centroids are updated? I also wonder whether those initial centroids are meaningful, given that the attention parameters are freshly initialized and the features used to compute the centroids have not yet been learned.
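For concreteness, my reading of the initialization step is that each centroid is the per-class mean of the direct features over the training set; a rough sketch (mine, not repo code):

    import torch

    def compute_centroids(features, labels, num_classes):
        # per-class mean of the direct features
        centroids = torch.zeros(num_classes, features.size(1))
        for c in range(num_classes):
            centroids[c] = features[labels == c].mean(dim=0)
        return centroids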
Why is the final feature called a meta-embedding? What is the specific meaning of "meta" here?
@zhmiao As for the accuracy calculation: the correct equation should be accuracy = TP/(TP+FP); however, in this repository the implementation of accuracy is effectively recall (TP/(TP+FN)).
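To make the distinction concrete, a tiny illustration (mine, not the repo's utils) of overall accuracy versus per-class recall:

    import numpy as np

    preds = np.array([0, 0, 1, 1, 2])
    labels = np.array([0, 1, 1, 2, 2])
    accuracy = (preds == labels).mean()                             # 0.6
    recalls = [(preds[labels == c] == c).mean() for c in (0, 1, 2)]
    print(accuracy, recalls)                                        # 0.6 [1.0, 0.5, 0.5]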
Hi, I wonder whether the true positives, false positives, and false negatives are counted correctly.
OpenLongTailRecognition-OLTR/utils.py
Lines 86 to 89 in 4a1f400
I'm confused about these lines.
Hello,
When I run python main.py --config ./config/Places_LT/stage_2_meta_embedding.py, there is an error.
File "./models/MetaEmbeddingClassifier.py", line 33, in forward
dist_cur = torch.norm(x_expand - centroids_expand, 2, 2)
RuntimeError: The size of tensor a (365) must match the size of tensor b (122) at non-singleton dimension 1
Here, I print the shapes of x_expand and centroids_expand:
torch.Size([86, 365, 512])
torch.Size([86, 122, 512])
Could you give some advice on how to solve this problem?
Hi zhmiao,
Thanks for your code.
I am wondering if you are planning to update the code (configs) for reproducing the results on the MS1M-LT dataset. Thanks.
Hi,
Thank you again for your code release. I am puzzled by the following issues, which I'm hoping you can help me with:
-> Places-LT has 62.5K examples, unlike the 184.5K images reported in the paper. Is the mistake in the paper or in the released dataset?
-> I am unable to reproduce the dataset statistics for ImageNet-LT and Places-LT using Zipf's law (discrete Pareto distribution: https://en.wikipedia.org/wiki/Pareto_distribution, https://en.wikipedia.org/wiki/Zipf%27s_law) with alpha = 6 (which seems rather high). Moreover, the log-log plot is not completely linear, in my opinion; a sketch of my check follows.
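To make the check reproducible, this is roughly what I ran (the 1000 classes, the 1280-image cap, and the size-proportional-to-rank^(-alpha) parameterization are my assumptions and may differ from the authors'):

    import numpy as np

    alpha = 6.0
    ranks = np.arange(1, 1001).astype(float)        # classes ordered by frequency rank
    sizes = np.maximum(1280 * ranks ** -alpha, 1).round()
    print(sizes[:5], int(sizes.sum()))              # alpha = 6 collapses after a few ranks
    # np.log(ranks) vs. np.log(sizes) should be perfectly linear under a pure power law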
Hi,
Thank you for releasing the code for your paper. Can you please clarify how to reproduce the accuracies for the plain model baseline on ImageNet-LT in Table 3 of your paper? I'm running the following commands:
-> python main.py --config ./config/ImageNet_LT/stage_1.py
-> python main.py --config ./config/ImageNet_LT/stage_1.py --test
which gives me the following output :
Evaluation_accuracy_micro_top1: 0.119
Averaged F-measure: 0.108
Many_shot_accuracy_top1: 0.148 Median_shot_accuracy_top1: 0.112 Low_shot_accuracy_top1: 0.062
From the paper, the numbers should be :
Evaluation_accuracy_micro_top1: 0.209
Many_shot_accuracy_top1: 0.409 Median_shot_accuracy_top1: 0.107 Low_shot_accuracy_top1: 0.004
Stage 1 is simply the baseline ResNet-10 training on the entire dataset, right? Or am I missing something?
Thanks for sharing the code. I have a question about the squashing function + cross-entropy loss: do you have any experiments using softmax + cross-entropy, or other normalization methods?
Hi, thanks for your work.
I am confused about the implementation of the DiscCentroidsLoss. Could you share the exact mathematical formulation of this loss function? I think the implementation differs from the details in the paper (Equation 9).
Thanks.
Thanks for the inspiring work and code :)
I'm having trouble reproducing the results (the plain model as well as the final model, on both datasets). I used the default settings without any alterations. Can you shed some light on the results (perhaps the hyper-parameters are the cause), and would it be possible for you to provide the trained models for both stage 1 and stage 2?
The results I reproduced are as follows:
Stage1(close-setting):
Evaluation_accuracy_micro_top1: 0.204
Averaged F-measure: 0.160
Many_shot_top1: 0.405; Median_shot_top1: 0.099; Low_shot_top1: 0.006
Stage1(open-setting):
Open-set Accuracy: 0.178
Evaluation_accuracy_micro_top1: 0.199
Averaged F-measure: 0.291
Many_shot_top1: 0.396; Median_shot_top1: 0.096; Low_shot_top1: 0.006
Stage2(close-setting):
Evaluation_accuracy_micro_top1: 0.339
Averaged F-measure: 0.322
Many_shot_top1: 0.411; Median_shot_top1: 0.330; Low_shot_top1: 0.167
Stage2(open-setting):
Open-set Accuracy: 0.245
Evaluation_accuracy_micro_top1: 0.327
Averaged F-measure: 0.455
Many_shot_top1: 0.398; Median_shot_top1: 0.318; Low_shot_top1: 0.159
Stage1(close-setting):
Evaluation_accuracy_micro_top1: 0.268
Averaged F-measure: 0.248
Many_shot_top1: 0.442; Median_shot_top1: 0.221; Low_shot_top1: 0.058
Stage1(open-setting):
Open-set Accuracy: 0.018
Evaluation_accuracy_micro_top1: 0.267
Averaged F-measure: 0.373
Many_shot_top1: 0.441; Median_shot_top1: 0.219; Low_shot_top1: 0.057
Stage2(close-setting):
Evaluation_accuracy_micro_top1: 0.349
Averaged F-measure: 0.338
Many_shot_top1: 0.387; Median_shot_top1: 0.355; Low_shot_top1: 0.263
Stage2(open-setting):
Open-set Accuracy: 0.120
Evaluation_accuracy_micro_top1: 0.342
Averaged F-measure: 0.477
Many_shot_top1: 0.382; Median_shot_top1: 0.349; Low_shot_top1: 0.254
@zhmiao Could you please provide the link for downloading the ImageNet dataset?
Hi, it was great to read your paper recently.
What confuses me is this: at the beginning of stage 2, you use the average representation of each class as the memory. How about just using the fc parameters trained in stage 1 instead (we could also use a cosine classifier in stage 1)? Maybe that would be more conclusive? Is there any difference between the two approaches?
Thanks~
@zhmiao After carefully reading the code, I found that the known-type data of both train and test is used to compute the many-, medium-, and few-shot numbers in both the closed-set and open-set settings, yet the results differ slightly; e.g., the many-shot accuracy is 44.7 in the closed set but 44.6 in the open set. I am really confused about this. Could you help explain it? Thanks.
I found some differences between the e_x of Equation (6) in the paper and the implementation in the code; could you help check?
def forward(self, input, *args):
    # squashed embedding: (||x|| / (1 + ||x||)) * (x / ||x||) bounds the norm below 1
    norm_x = torch.norm(input, 2, 1, keepdim=True)
    ex = (norm_x / (1 + norm_x)) * (input / norm_x)
    # l2-normalized classifier weights
    ew = self.weight / torch.norm(self.weight, 2, 1, keepdim=True)
    return torch.mm(self.scale * ex, ew.t())
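As a quick sanity check of the snippet above (my own illustration), the squashing factor keeps the direction of each row of x while bounding its norm below 1:

    import torch

    x = torch.randn(4, 512)
    norm_x = torch.norm(x, 2, 1, keepdim=True)
    ex = (norm_x / (1 + norm_x)) * (x / norm_x)
    print(torch.norm(ex, 2, 1))   # every entry < 1; direction of x preserved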
Hello,
When I run python main.py --config ./config/ImageNet_LT/stage_1.py, there is an error.
Loading Dot Product Classifier.
Traceback (most recent call last):
File "/media/Elements/OLTR/OpenLongTailRecognition-OLTR/main.py", line 55, in
training_model = model(config, data, test=False)
File "/media/Elements/OLTR/OpenLongTailRecognition-OLTR/run_networks.py", line 26, in init
self.init_models()
File "/media/Elements/OLTR/OpenLongTailRecognition-OLTR/run_networks.py", line 69, in init_models
self.networks[key] = source_import(def_file).create_model(*model_args)
File "./models/DotProductClassifier.py", line 16, in create_model
clf = DotProduct_Classifier(num_classes, feat_dim)
File "./models/DotProductClassifier.py", line 8, in init
self.fc = nn.Linear(feat_dim, num_classes)
File "/home/.local/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 81, in init
self.reset_parameters()
File "/home/.local/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 84, in reset_parameters
init.kaiming_uniform_(self.weight, a=math.sqrt(5))
File "/home/.local/lib/python3.5/site-packages/torch/nn/init.py", line 325, in kaiming_uniform_
std = gain / math.sqrt(fan)
ZeroDivisionError: float division by zero
Could you give some advice on how to solve this problem?
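A guess at the cause (my assumption, not a confirmed diagnosis): this is exactly what older PyTorch raises when nn.Linear is constructed with in_features == 0, so feat_dim may be arriving as 0 through the config/model_args:

    import torch.nn as nn

    # reproduces the traceback above on older PyTorch: Kaiming init divides by
    # sqrt(fan_in), and fan_in == 0 when in_features == 0
    fc = nn.Linear(0, 365)   # ZeroDivisionError: float division by zero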
Hi! Thanks for sharing the work! I'm wondering if you have compared against the most straightforward baseline of class-balanced sampling of the training data when training the model?
In my understanding, none of the modulated attention, dynamic meta-embedding, or cosine classifier is used in stage 1, so what is the main purpose of training with stage_1.py? Is it just to fine-tune the ResNet-152?
When I use the Places dataset and run the command "python3 main.py --config ./config/Places_LT/stage_1.py", an error occurs. Can anyone help me? Thanks.
Traceback (most recent call last):
File "main.py", line 55, in
training_model = model(config, data, test=False)
File "/home/huang/OpenLongTailRecognition-OLTR-master2/run_networks.py", line 26, in init
self.init_models()
File "/home/huang/OpenLongTailRecognition-OLTR-master2/run_networks.py", line 69, in init_models
self.networks[key] = source_import(def_file).create_model(*model_args)
File "./models/DotProductClassifier.py", line 17, in create_model
clf = DotProduct_Classifier(num_classes, feat_dim)
File "./models/DotProductClassifier.py", line 9, in init
self.fc = nn.Linear(feat_dim, num_classes)
File "/home/huang/.virtualenvs/OLTR/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 76, in init
self.weight = Parameter(torch.Tensor(out_features, in_features))
TypeError: new() received an invalid combination of arguments - got (str, bool), but expected one of:
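A hypothesis (mine, unconfirmed): the positional model_args from the config arrive in the wrong order, so nn.Linear receives a str and a bool instead of two ints and fails while constructing its weight, exactly as in the traceback:

    import torch.nn as nn

    # hypothetical illustration of the failure and the expected call
    feat_dim, num_classes = 512, 365
    fc = nn.Linear(feat_dim, num_classes)   # works: two ints
    # fc = nn.Linear('Places_LT', False)    # TypeError: new() received an invalid combination of arguments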
Hello,
How long does the model need to train?
Hi! Thanks for sharing your work. I'm wondering, have you tried the baselines of class-balanced sampling and inverse-frequency sampling when training the model?
In ImageNet_LT, I tested stage 1 with 'use_selfatt': True or False.
The training results are:
#True
Evaluation_accuracy_micro_top1: 0.219
Averaged F-measure: 0.175
Many_shot_accuracy_top1: 0.422 Median_shot_accuracy_top1: 0.114 Low_shot_accuracy_top1: 0.006
Training Complete.
Best validation accuracy is 0.219 at epoch 30
Done
ALL COMPLETED.
#False
Evaluation_accuracy_micro_top1: 0.220
Averaged F-measure: 0.175
Many_shot_accuracy_top1: 0.427 Median_shot_accuracy_top1: 0.113 Low_shot_accuracy_top1: 0.007
Training Complete.
Best validation accuracy is 0.220 at epoch 30
Done
ALL COMPLETED.
It is wonderful work! I am curious about the training time on ImageNet. Would it be very long, since the model is trained from scratch?
Besides the "train" and "val" phases, there is also a "train_plain" phase. What is the purpose of "train_plain", and what happens during it?
Hi, I'm curious whether you could give some insight into how the value of the hyper-parameter scale (currently set to 16) in the CosNormClassifier is decided: is it set empirically, or is there a theoretical way to choose it? The same question applies to the scale parameter (currently set to 10) used for reachability in the meta-embedding.
It seems there may be problems running the code on Windows, as discussed in pytorch/pytorch#5858 (comment). So far we do not have a solution, and it appears to be a PyTorch issue. We will test it out when we have a Windows machine.
Loading dataset from: /mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/data/train_256_places365standard
{'criterions': {'PerformanceLoss': {'def_file': 'loss/SoftmaxLoss.py',
'loss_params': {},
'optim_params': None,
'weight': 1.0}},
'memory': {'centroids': False, 'init_centroids': False},
'networks': {'classifier': {'def_file': 'models/DotProductClassifier.py',
'optim_params': {'lr': 0.1,
'momentum': 0.9,
'weight_decay': 0.0005},
'params': {'dataset': 'Places_LT',
'in_dim': 512,
'num_classes': 365,
'stage1_weights': False}},
'feat_model': {'def_file': 'models/ResNet152Feature.py',
'fix': True,
'optim_params': {'lr': 0.01,
'momentum': 0.9,
'weight_decay': 0.0005},
'params': {'caffe': True,
'dataset': 'Places_LT',
'dropout': None,
'stage1_weights': False,
'use_fc': True,
'use_modulatedatt': False}}},
'training_opt': {'batch_size': 256,
'dataset': 'Places_LT',
'display_step': 10,
'feature_dim': 512,
'log_dir': './logs/Places_LT/stage1',
'num_classes': 365,
'num_epochs': 30,
'num_workers': 4,
'open_threshold': 0.1,
'sampler': None,
'scheduler_params': {'gamma': 0.1, 'step_size': 10}}}
Loading data from ./data/Places_LT/Places_LT_train.txt
Use data transformation: Compose(
RandomResizedCrop(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 1.3333), interpolation=PIL.Image.BILINEAR)
RandomHorizontalFlip(p=0.5)
ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0)
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
No sampler.
Shuffle is True.
Loading data from ./data/Places_LT/Places_LT_val.txt
Use data transformation: Compose(
Resize(size=256, interpolation=PIL.Image.BILINEAR)
CenterCrop(size=(224, 224))
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
No sampler.
Shuffle is True.
Using 1 GPUs.
Loading Scratch ResNet 152 Feature Model.
Loading Caffe Pretrained ResNet 152 Weights.
Pretrained feature model weights path: ./logs/caffe_resnet152.pth
Freezing feature weights except for self attention weights (if exist).
Loading Dot Product Classifier.
Random initialized classifier weights.
Using steps for training.
Initializing model optimizer.
Loading Softmax Loss.
Phase: train
Traceback (most recent call last):
File "", line 1, in
runfile('/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/main.py', wdir='/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master')
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/spyder_kernels/customize/spydercustomize.py", line 668, in runfile
execfile(filename, namespace)
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/spyder_kernels/customize/spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/main.py", line 62, in
training_model.train()
File "/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/run_networks.py", line 212, in train
phase='train')
File "/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/run_networks.py", line 137, in batch_forward
self.logits, self.direct_memory_feature = self.networks['classifier'](self.features, self.centroids)
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "models/DotProductClassifier.py", line 11, in forward
x = self.fc(x)
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 55, in forward
return F.linear(input, self.weight, self.bias)
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/functional.py", line 1024, in linear
return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [256 x 2048], m2: [512 x 365] at /opt/conda/conda-bld/pytorch_1533672544752/work/aten/src/THC/generic/THCTensorMathBlas.cu:249
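For what it's worth, the last line says the classifier was built for 512-d features while it receives 2048-d ones, which is the raw ResNet-152 output width. Reduced to its shapes (my illustration; whether use_fc failing to take effect is the root cause here is a guess on my part):

    import torch

    features = torch.randn(256, 2048)   # raw pooled ResNet-152 features, no 512-d fc
    weight = torch.randn(365, 512)      # classifier built for feature_dim = 512
    torch.mm(features, weight.t())      # RuntimeError: size mismatch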
For the ImageNet-LT and Places-LT datasets, why is only the overall accuracy reported for the closed-set setting and only the F-measure for the open-set setting, rather than both metrics in both settings? Such an incomplete comparison can easily raise doubts about whether the proposed algorithm works well.
Looking at this line, the stage-1 config, and the stage-2 config, I noticed that the embedded Gaussian non-local filtering is not trained at all, since "selfatt" is not present in ModulatedAttLayer. In stage 1 we don't use the modulated attention.
In stage 2 we use it, subject to the if condition above.
Should we change "selfatt" to "modulatedatt"?
Hello, reading your code carefully (for ImageNet-LT), it seems the plain model was trained for 30 epochs, while your own model was trained for 90 epochs (30 in stage 1 + 60 in stage 2). Could you please confirm this? Moreover, were all the comparisons (focal loss, etc.) performed with 30 or 90 epochs?
Thanks!
For ImageNet_LT, I just used the default config in the code, but I cannot reproduce the results in Table 3(a) of the paper.
1. For stage 1, my few/low-shot accuracy of 0.7% is actually better than the plain model's 0.4% in Table 3(a).
2. However, for stage 2 (this is the important part), my results are (the last few logs when training completed):
Epoch: [60/60] Step: 440 Minibatch_loss_feature: 0.569 Minibatch_loss_performance: 2.938 Minibatch_accuracy_micro: 0.566
Epoch: [60/60] Step: 450 Minibatch_loss_feature: 0.567 Minibatch_loss_performance: 2.845 Minibatch_accuracy_micro: 0.539
Phase: val
100%|██████████| 79/79 [01:34<00:00, 1.02it/s]
Phase: val
Evaluation_accuracy_micro_top1: 0.340
Averaged F-measure: 0.324
Many_shot_accuracy_top1: 0.401 Median_shot_accuracy_top1: 0.334 Low_shot_accuracy_top1: 0.197
Training Complete.
Best validation accuracy is 0.341 at epoch 48
However, the many-, median-, and few/low-shot accuracies are 40.1%, 33.4%, and 19.7%, which differ a little from the 43.2%, 35.1%, and 18.5% of the "Ours" model in Table 3(a).
I retrained several times, and the many-shot accuracy always comes out somewhat lower than 43.2%.
Are there any tricks that were not released?
I'm wondering if you have tried a stronger model on ImageNet, since ResNet-10's performance is comparatively low next to larger models like ResNet-50?
Hi,
Thank you for releasing the code for your paper. I'm a little confused when I look at this line of code.
https://github.com/zhmiao/OpenLongTailRecognition-OLTR/models/CosNormClassifier.py#L24.
It seems to be different from what is described in the paper. Could you help me with this problem?
Thank you very much.
Thanks for the awesome work.
But I cannot find the code that updates the centroids.
In line 45 of run_networks.py:
    if self.memory['init_centroids']:
        self.criterions['FeatureLoss'].centroids.data = self.centroids_cal(self.data['train_plain'])
this code is used for centroid initialization.
In the paper, Section 3.1, paragraph "Learning Visual Memory M", the centroids are updated in two steps.
Could you kindly give me more hints about how to realize the second step, the propagation step that alternately updates the direct features and the centroids? (A sketch of my own reading is below.)
Thanks a lot.
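Here is my own toy reading of the alternating scheme (assumptions mine; a sketch, not the authors' method): hold the centroids fixed while the features take a gradient step toward them, then recompute the centroids from the updated features:

    import torch

    num_classes, feat_dim = 5, 8
    features = torch.randn(100, feat_dim, requires_grad=True)   # stand-in direct features
    labels = torch.arange(100) % num_classes
    opt = torch.optim.SGD([features], lr=0.1)

    for step in range(10):
        with torch.no_grad():   # step (b): recompute centroids from current features
            centroids = torch.stack([features[labels == c].mean(0)
                                     for c in range(num_classes)])
        # step (a): attract each feature to its (fixed) class centroid
        loss = ((features - centroids[labels]) ** 2).sum(1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()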
Here, I am curious whether the weights of the backbone net used by the compared methods were also frozen during training. If so, I think the comparison is not fair, since those methods cannot reach their full performance with frozen weights.
How should I understand the metric learning mentioned in the paper? Specifically, what metric is learned in this paper?
https://github.com/zhmiao/OpenLongTailRecognition-OLTR/blob/master/utils.py#L88
It seems this should be changed from
    false_pos += 1 if preds[i] != labels[i] and labels[i] != -1 and preds[i] != -1 else 0
to:
    false_pos += 1 if preds[i] != labels[i] and ((labels[i] != -1 and preds[i] != -1) or labels[i] == -1) else 0
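A worked example of the difference (my illustration): take an open-set sample, labels[i] == -1, that the model wrongly assigns to a known class:

    labels, preds = [-1], [3]
    i = 0
    original = 1 if preds[i] != labels[i] and labels[i] != -1 and preds[i] != -1 else 0
    proposed = 1 if preds[i] != labels[i] and ((labels[i] != -1 and preds[i] != -1) or labels[i] == -1) else 0
    print(original, proposed)   # 0 1 -- only the proposed condition counts this false positive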
Hi, where did you get the Caffe-pretrained ResNet-152, as well as the ResNet-10 and ResNet-50 mentioned in the paper?
Are they from Kaiming He's repository, and how did you convert them to PyTorch format?
The benchmark and model zoo page cannot be opened; please check it.
@liuziwei7 @zhmiao Thanks for your amazing work. From the CAUTION:
"The current code was prepared using single GPU. The use of multi-GPU can cause problems."
When I run on two GPUs, the error is:
File "./models/MetaEmbeddingClassifier.py", line 48, in forward
memory_feature = torch.matmul(values_memory, keys_memory)
RuntimeError: size mismatch, m1: [16 x 7], m2: [4 x 512] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:268
# for 2 GPUs:
torch.Size([16, 7])
torch.Size([3, 512])
torch.Size([16, 7])
torch.Size([4, 512])
# for 1 GPU:
torch.Size([32, 7])
torch.Size([7, 512])
Is there any way to support multiple GPUs?
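What the shapes suggest (an inference on my part, not a confirmed diagnosis): nn.DataParallel scatters every positional forward() argument along dim 0, so the (7, 512) centroid matrix is split into (3, 512) and (4, 512) chunks across the two GPUs. A minimal sketch of the usual workaround, keeping the centroids on the module so each replica gets a full copy:

    import torch
    import torch.nn as nn

    class CentroidClassifier(nn.Module):   # hypothetical name, for illustration
        def __init__(self, centroids):
            super().__init__()
            # buffers are replicated whole to every GPU by nn.DataParallel
            self.register_buffer('centroids', centroids)

        def forward(self, values_memory):
            return torch.matmul(values_memory, self.centroids)

    model = CentroidClassifier(torch.randn(7, 512))
    out = model(torch.randn(32, 7))   # (32, 512); still valid when wrapped in DataParallel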
Thanks for sharing your work. May I ask how the open-set accuracy is calculated? Before it is computed, the code runs
probs, preds = F.softmax(self.total_logits.detach(), dim=1).max(dim=1)
so the open_threshold doesn't seem to take effect.
OpenLongTailRecognition-OLTR/run_networks.py
Line 296 in fb6203c
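For context, my understanding of the intended behavior (an assumption, not confirmed repo logic) is that low-confidence predictions should be remapped to the open-set label -1 after the max is taken:

    import torch
    import torch.nn.functional as F

    logits = torch.randn(8, 365)
    probs, preds = F.softmax(logits, dim=1).max(dim=1)
    preds[probs < 0.1] = -1   # open_threshold = 0.1: below it, predict "open set"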
Since both spatial attention and self-attention are used in the modulated attention, and in my understanding both parts capture spatial information, could you tell me the difference and the relationship between the spatial attention and the self-attention?