zhmiao / OpenLongTailRecognition-OLTR
PyTorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 Oral)
License: BSD 3-Clause "New" or "Revised" License
Hi, when I read the code I noticed that nn.Linear, rather than nn.Conv2d, is used to compute the spatial attention. Could you tell me the reason, or the advantages of nn.Linear over nn.Conv2d? Thanks.
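For anyone else wondering, here is a small self-contained check (my own illustration, not repo code) showing that nn.Linear applied per spatial position computes exactly the same map as a 1x1 nn.Conv2d with shared weights, so the two choices are mathematically interchangeable here:

    import torch
    import torch.nn as nn

    x = torch.randn(2, 64, 7, 7)                  # (batch, channels, H, W)

    conv = nn.Conv2d(64, 32, kernel_size=1, bias=False)
    linear = nn.Linear(64, 32, bias=False)
    linear.weight.data = conv.weight.data.view(32, 64)   # share the same weights

    out_conv = conv(x)                                        # (2, 32, 7, 7)
    out_lin = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
    print(torch.allclose(out_conv, out_lin, atol=1e-5))       # True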
I think the v^meta in Equation (7) is wrong; otherwise, the networks of the concept selector and the hallucinator would not be trained.
I am not very sure about the center loss implementation. Why do we need to implement the attracting loss with a custom DiscCentroidsLossFunc (a torch.autograd.Function)? Can we implement it the same way as the repelling loss?
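For reference, the attracting term alone can be written directly with autograd, without a custom Function; a minimal sketch (my illustration, not the repo's DiscCentroidsLossFunc):

    import torch

    def attracting_loss(feat, centroids, labels):
        # mean squared distance between each feature and its class centroid
        return ((feat - centroids[labels]) ** 2).sum(dim=1).mean()

    feat = torch.randn(8, 512, requires_grad=True)
    centroids = torch.randn(10, 512)
    labels = torch.randint(0, 10, (8,))
    attracting_loss(feat, centroids, labels).backward()   # autograd handles the gradient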
Hi, in both the Places and ImageNet configs, batch_size and number_samples_cls are set to 256 and 4, respectively. Does this mean each batch will contain only 256/4 = 64 categories?
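If it helps, here is my reconstruction of that arithmetic as a class-aware batch builder (an illustration of my reading, not the repo's actual sampler):

    import random

    def class_aware_batch(class_to_indices, batch_size=256, num_samples_cls=4):
        # draw batch_size // num_samples_cls distinct classes, then
        # num_samples_cls examples from each: 64 classes x 4 samples = 256
        classes = random.sample(list(class_to_indices), batch_size // num_samples_cls)
        batch = []
        for c in classes:
            batch += random.choices(class_to_indices[c], k=num_samples_cls)
        return batch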
Could you please show us how you determine "Positive" and "Negative" for the F-measure?
I think some others are also concerned about that :)
Thank you for releasing the code for this awesome work.
I have a question about the centroids update. Reading the code, I find that the centroids are computed only once, during model initialization at stage 2. Could you help me find where the centroids are updated? I also wonder whether those initial centroids are meaningful, given that the attention parameters are freshly initialized and the features used to compute the centroids have not yet been learned.
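For concreteness, my reading of the initialization step is that each centroid is the per-class mean of the direct features over the training set; a rough sketch (mine, not repo code):

    import torch

    def compute_centroids(features, labels, num_classes):
        # per-class mean of the direct features
        centroids = torch.zeros(num_classes, features.size(1))
        for c in range(num_classes):
            centroids[c] = features[labels == c].mean(dim=0)
        return centroids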
Why is the final feature called a meta-embedding? What is the specific meaning of "meta" here?
@zhmiao As for the accuracy calculation: the correct equation should be accuracy = TP/(TP+FP); however, in this repository the implementation of accuracy is effectively recall (TP/(TP+FN)).
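To make the distinction concrete, a tiny illustration (mine, not the repo's utils) of overall accuracy versus per-class recall:

    import numpy as np

    preds = np.array([0, 0, 1, 1, 2])
    labels = np.array([0, 1, 1, 2, 2])
    accuracy = (preds == labels).mean()                             # 0.6
    recalls = [(preds[labels == c] == c).mean() for c in (0, 1, 2)]
    print(accuracy, recalls)                                        # 0.6 [1.0, 0.5, 0.5]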
Hi, I wonder whether the true positives, false positives, and false negatives are counted correctly.
OpenLongTailRecognition-OLTR/utils.py
Lines 86 to 89 in 4a1f400
I'm confused about these lines.
Hello,
When I run python main.py --config ./config/Places_LT/stage_2_meta_embedding.py, there is an error.
File "./models/MetaEmbeddingClassifier.py", line 33, in forward
dist_cur = torch.norm(x_expand - centroids_expand, 2, 2)
RuntimeError: The size of tensor a (365) must match the size of tensor b (122) at non-singleton dimension 1
Here, I print the shapes of x_expand and centroids_expand:
torch.Size([86, 365, 512])
torch.Size([86, 122, 512])
Could you give some advice on how to solve this problem?
Hi zhmiao,
Thanks for your code.
I am wondering if you are planning to update the code (configs) for reproducing the results on the MS1M-LT dataset. Thanks.
Hi,
Thank you again for your code release. I am puzzled by the following issues, which I'm hoping you can help me with:
-> Places-LT has 62.5K examples, unlike the 184.5K images reported in the paper. Is the mistake in the paper or in the released dataset?
-> I am unable to reproduce the dataset statistics for ImageNet-LT and Places-LT using Zipf's law (discrete Pareto distribution: https://en.wikipedia.org/wiki/Pareto_distribution, https://en.wikipedia.org/wiki/Zipf%27s_law) with alpha = 6 (which seems rather high). Moreover, the log-log plot is not completely linear, in my opinion; a sketch of my check follows.
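To make the check reproducible, this is roughly what I ran (the 1000 classes, the 1280-image cap, and the size-proportional-to-rank^(-alpha) parameterization are my assumptions and may differ from the authors'):

    import numpy as np

    alpha = 6.0
    ranks = np.arange(1, 1001).astype(float)        # classes ordered by frequency rank
    sizes = np.maximum(1280 * ranks ** -alpha, 1).round()
    print(sizes[:5], int(sizes.sum()))              # alpha = 6 collapses after a few ranks
    # np.log(ranks) vs. np.log(sizes) should be perfectly linear under a pure power law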
Hi,
Thank you for releasing the code for your paper. Can you please clarify how to reproduce the accuracies for the plain model baseline on ImageNet-LT in Table 3 of your paper? I'm running the following commands:
-> python main.py --config ./config/ImageNet_LT/stage_1.py
-> python main.py --config ./config/ImageNet_LT/stage_1.py --test
which gives me the following output :
Evaluation_accuracy_micro_top1: 0.119
Averaged F-measure: 0.108
Many_shot_accuracy_top1: 0.148 Median_shot_accuracy_top1: 0.112 Low_shot_accuracy_top1: 0.062
From the paper, the numbers should be :
Evaluation_accuracy_micro_top1: 0.209
Many_shot_accuracy_top1: 0.409 Median_shot_accuracy_top1: 0.107 Low_shot_accuracy_top1: 0.004
Stage 1 is simply the baseline ResNet-10 training on the entire dataset, right? Or am I missing something?
Thanks for sharing the code. I have a question about the squashing function + cross-entropy loss: do you have any experiments using softmax + cross-entropy, or other normalization methods?
Hi, thanks for your work.
I am confused about the implementation of the DiscCentroidsLoss. Could you share the exact mathematical formulation of this loss function? I think the implementation differs from the details in the paper (Equation 9).
Thanks.
Thanks for the inspiring work and code :)
I'm having trouble reproducing the results (the plain model as well as the final model, on both datasets). I used the default settings without any alterations. Can you shed some light on the results (perhaps the hyper-parameters are the cause), and would it be possible for you to provide the trained models for both stage 1 and stage 2?
The results I reproduced are as follows:
Stage1(close-setting):
Evaluation_accuracy_micro_top1: 0.204
Averaged F-measure: 0.160
Many_shot_top1: 0.405; Median_shot_top1: 0.099; Low_shot_top1: 0.006
Stage1(open-setting):
Open-set Accuracy: 0.178
Evaluation_accuracy_micro_top1: 0.199
Averaged F-measure: 0.291
Many_shot_top1: 0.396; Median_shot_top1: 0.096; Low_shot_top1: 0.006
Stage2(close-setting):
Evaluation_accuracy_micro_top1: 0.339
Averaged F-measure: 0.322
Many_shot_top1: 0.411; Median_shot_top1: 0.330; Low_shot_top1: 0.167
Stage2(open-setting):
Open-set Accuracy: 0.245
Evaluation_accuracy_micro_top1: 0.327
Averaged F-measure: 0.455
Many_shot_top1: 0.398; Median_shot_top1: 0.318; Low_shot_top1: 0.159
Stage1(close-setting):
Evaluation_accuracy_micro_top1: 0.268
Averaged F-measure: 0.248
Many_shot_top1: 0.442; Median_shot_top1: 0.221; Low_shot_top1: 0.058
Stage1(open-setting):
Open-set Accuracy: 0.018
Evaluation_accuracy_micro_top1: 0.267
Averaged F-measure: 0.373
Many_shot_top1: 0.441; Median_shot_top1: 0.219; Low_shot_top1: 0.057
Stage2(close-setting):
Evaluation_accuracy_micro_top1: 0.349
Averaged F-measure: 0.338
Many_shot_top1: 0.387; Median_shot_top1: 0.355; Low_shot_top1: 0.263
Stage2(open-setting):
Open-set Accuracy: 0.120
Evaluation_accuracy_micro_top1: 0.342
Averaged F-measure: 0.477
Many_shot_top1: 0.382; Median_shot_top1: 0.349; Low_shot_top1: 0.254
@zhmiao Could you please provide the link for downloading the ImageNet dataset?
Hi, it was great to read your paper recently.
What confuses me is this: at the beginning of stage 2, you use the average representation of each class as the memory. How about just using the fc parameters trained in stage 1 instead (we could also use a cosine classifier in stage 1)? Maybe that would be more conclusive? Is there any difference between the two approaches?
Thanks~
@zhmiao After carefully reading the code, I found that the known-type data of both train and test is used to compute the many-, medium-, and few-shot numbers in both the closed-set and open-set settings, yet the results differ slightly; e.g., the many-shot accuracy is 44.7 in the closed set but 44.6 in the open set. I am really confused about this. Could you help explain it? Thanks.
I found some differences between the e_x of Equation (6) in the paper and the implementation in the code; could you help check?
def forward(self, input, *args):
    # squashed embedding: (||x|| / (1 + ||x||)) * (x / ||x||) bounds the norm below 1
    norm_x = torch.norm(input, 2, 1, keepdim=True)
    ex = (norm_x / (1 + norm_x)) * (input / norm_x)
    # l2-normalized classifier weights
    ew = self.weight / torch.norm(self.weight, 2, 1, keepdim=True)
    return torch.mm(self.scale * ex, ew.t())
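As a quick sanity check of the snippet above (my own illustration), the squashing factor keeps the direction of each row of x while bounding its norm below 1:

    import torch

    x = torch.randn(4, 512)
    norm_x = torch.norm(x, 2, 1, keepdim=True)
    ex = (norm_x / (1 + norm_x)) * (x / norm_x)
    print(torch.norm(ex, 2, 1))   # every entry < 1; direction of x preserved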
Hello,
When I run python main.py --config ./config/ImageNet_LT/stage_1.py, there is an error.
Loading Dot Product Classifier.
Traceback (most recent call last):
File "/media/Elements/OLTR/OpenLongTailRecognition-OLTR/main.py", line 55, in
training_model = model(config, data, test=False)
File "/media/Elements/OLTR/OpenLongTailRecognition-OLTR/run_networks.py", line 26, in init
self.init_models()
File "/media/Elements/OLTR/OpenLongTailRecognition-OLTR/run_networks.py", line 69, in init_models
self.networks[key] = source_import(def_file).create_model(*model_args)
File "./models/DotProductClassifier.py", line 16, in create_model
clf = DotProduct_Classifier(num_classes, feat_dim)
File "./models/DotProductClassifier.py", line 8, in init
self.fc = nn.Linear(feat_dim, num_classes)
File "/home/.local/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 81, in init
self.reset_parameters()
File "/home/.local/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 84, in reset_parameters
init.kaiming_uniform_(self.weight, a=math.sqrt(5))
File "/home/.local/lib/python3.5/site-packages/torch/nn/init.py", line 325, in kaiming_uniform_
std = gain / math.sqrt(fan)
ZeroDivisionError: float division by zero
Could you give some advice on how to solve this problem?
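A guess at the cause (my assumption, not a confirmed diagnosis): this is exactly what older PyTorch raises when nn.Linear is constructed with in_features == 0, so feat_dim may be arriving as 0 through the config/model_args:

    import torch.nn as nn

    # reproduces the traceback above on older PyTorch: Kaiming init divides by
    # sqrt(fan_in), and fan_in == 0 when in_features == 0
    fc = nn.Linear(0, 365)   # ZeroDivisionError: float division by zero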
Hi! Thanks for sharing the work! I'm wondering if you have compared against the most straightforward baseline of class-balanced sampling of the training data when training the model?
In my understanding, none of the modulated attention, dynamic meta-embedding, or cosine classifier is used in stage 1, so what is the main purpose of training with stage_1.py? Is it just to fine-tune the ResNet-152?
When I use the Places dataset and run the command "python3 main.py --config ./config/Places_LT/stage_1.py", an error occurs. Can anyone help me? Thanks.
Traceback (most recent call last):
File "main.py", line 55, in
training_model = model(config, data, test=False)
File "/home/huang/OpenLongTailRecognition-OLTR-master2/run_networks.py", line 26, in init
self.init_models()
File "/home/huang/OpenLongTailRecognition-OLTR-master2/run_networks.py", line 69, in init_models
self.networks[key] = source_import(def_file).create_model(*model_args)
File "./models/DotProductClassifier.py", line 17, in create_model
clf = DotProduct_Classifier(num_classes, feat_dim)
File "./models/DotProductClassifier.py", line 9, in init
self.fc = nn.Linear(feat_dim, num_classes)
File "/home/huang/.virtualenvs/OLTR/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 76, in init
self.weight = Parameter(torch.Tensor(out_features, in_features))
TypeError: new() received an invalid combination of arguments - got (str, bool), but expected one of:
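A hypothesis (mine, unconfirmed): the positional model_args from the config arrive in the wrong order, so nn.Linear receives a str and a bool instead of two ints and fails while constructing its weight, exactly as in the traceback:

    import torch.nn as nn

    # hypothetical illustration of the failure and the expected call
    feat_dim, num_classes = 512, 365
    fc = nn.Linear(feat_dim, num_classes)   # works: two ints
    # fc = nn.Linear('Places_LT', False)    # TypeError: new() received an invalid combination of arguments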
Hello,
How long does the model need to train?
Hi! Thanks for sharing your work. I'm wondering, have you tried the baselines of class-balanced sampling and inverse-frequency sampling when training the model?
In ImageNet_LT, I tested stage 1 with 'use_selfatt': True or False.
The training results are:
#True
Evaluation_accuracy_micro_top1: 0.219
Averaged F-measure: 0.175
Many_shot_accuracy_top1: 0.422 Median_shot_accuracy_top1: 0.114 Low_shot_accuracy_top1: 0.006
Training Complete.
Best validation accuracy is 0.219 at epoch 30
Done
ALL COMPLETED.
#False
Evaluation_accuracy_micro_top1: 0.220
Averaged F-measure: 0.175
Many_shot_accuracy_top1: 0.427 Median_shot_accuracy_top1: 0.113 Low_shot_accuracy_top1: 0.007
Training Complete.
Best validation accuracy is 0.220 at epoch 30
Done
ALL COMPLETED.
It is wonderful work! I am curious about the training time on ImageNet. Would it be very long, since the model is trained from scratch?
Besides the "train" and "val" phases, there is also a "train_plain" phase. What is the purpose of "train_plain", and what happens during it?
Hi, I'm curious whether you could give some insight into how the value of the hyper-parameter scale (currently set to 16) in the CosNormClassifier is decided: is it set empirically, or is there a theoretical way to choose it? The same question applies to the scale parameter (currently set to 10) used for reachability in the meta-embedding.
It seems there may be problems running the code on Windows, as discussed in pytorch/pytorch#5858 (comment). So far we do not have a solution, and it appears to be a PyTorch issue. We will test it out when we have a Windows machine.
Loading dataset from: /mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/data/train_256_places365standard
{'criterions': {'PerformanceLoss': {'def_file': 'loss/SoftmaxLoss.py',
'loss_params': {},
'optim_params': None,
'weight': 1.0}},
'memory': {'centroids': False, 'init_centroids': False},
'networks': {'classifier': {'def_file': 'models/DotProductClassifier.py',
'optim_params': {'lr': 0.1,
'momentum': 0.9,
'weight_decay': 0.0005},
'params': {'dataset': 'Places_LT',
'in_dim': 512,
'num_classes': 365,
'stage1_weights': False}},
'feat_model': {'def_file': 'models/ResNet152Feature.py',
'fix': True,
'optim_params': {'lr': 0.01,
'momentum': 0.9,
'weight_decay': 0.0005},
'params': {'caffe': True,
'dataset': 'Places_LT',
'dropout': None,
'stage1_weights': False,
'use_fc': True,
'use_modulatedatt': False}}},
'training_opt': {'batch_size': 256,
'dataset': 'Places_LT',
'display_step': 10,
'feature_dim': 512,
'log_dir': './logs/Places_LT/stage1',
'num_classes': 365,
'num_epochs': 30,
'num_workers': 4,
'open_threshold': 0.1,
'sampler': None,
'scheduler_params': {'gamma': 0.1, 'step_size': 10}}}
Loading data from ./data/Places_LT/Places_LT_train.txt
Use data transformation: Compose(
RandomResizedCrop(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 1.3333), interpolation=PIL.Image.BILINEAR)
RandomHorizontalFlip(p=0.5)
ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0)
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
No sampler.
Shuffle is True.
Loading data from ./data/Places_LT/Places_LT_val.txt
Use data transformation: Compose(
Resize(size=256, interpolation=PIL.Image.BILINEAR)
CenterCrop(size=(224, 224))
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
No sampler.
Shuffle is True.
Using 1 GPUs.
Loading Scratch ResNet 152 Feature Model.
Loading Caffe Pretrained ResNet 152 Weights.
Pretrained feature model weights path: ./logs/caffe_resnet152.pth
Freezing feature weights except for self attention weights (if exist).
Loading Dot Product Classifier.
Random initialized classifier weights.
Using steps for training.
Initializing model optimizer.
Loading Softmax Loss.
Phase: train
Traceback (most recent call last):
File "", line 1, in
runfile('/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/main.py', wdir='/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master')
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/spyder_kernels/customize/spydercustomize.py", line 668, in runfile
execfile(filename, namespace)
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/spyder_kernels/customize/spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/main.py", line 62, in
training_model.train()
File "/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/run_networks.py", line 212, in train
phase='train')
File "/mnt/ImageryAnalysis/t0l/OpenLongTailRecognition-OLTR-master/OpenLongTailRecognition-OLTR-master/run_networks.py", line 137, in batch_forward
self.logits, self.direct_memory_feature = self.networks['classifier'](self.features, self.centroids)
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "models/DotProductClassifier.py", line 11, in forward
x = self.fc(x)
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 55, in forward
return F.linear(input, self.weight, self.bias)
File "/home/t0l/anaconda3/envs/dl/lib/python3.5/site-packages/torch/nn/functional.py", line 1024, in linear
return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [256 x 2048], m2: [512 x 365] at /opt/conda/conda-bld/pytorch_1533672544752/work/aten/src/THC/generic/THCTensorMathBlas.cu:249
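For what it's worth, the last line says the classifier was built for 512-d features while it receives 2048-d ones, which is the raw ResNet-152 output width. Reduced to its shapes (my illustration; whether use_fc failing to take effect is the root cause here is a guess on my part):

    import torch

    features = torch.randn(256, 2048)   # raw pooled ResNet-152 features, no 512-d fc
    weight = torch.randn(365, 512)      # classifier built for feature_dim = 512
    torch.mm(features, weight.t())      # RuntimeError: size mismatch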
For the ImageNet-LT and Places-LT datasets, why is only the overall accuracy reported for the closed-set setting and only the F-measure for the open-set setting, rather than both metrics in both settings? Such an incomplete comparison can easily raise doubts about whether the proposed algorithm works well.
Looking at this line, the stage-1 config, and the stage-2 config, I noticed that the embedded Gaussian non-local filtering is not trained at all, since "selfatt" is not present in ModulatedAttLayer. In stage 1 we don't use the modulated attention.
In stage 2 we use it, subject to the if condition above.
Should we change "selfatt" to "modulatedatt"?
Hello, reading your code carefully (for ImageNet-LT), it seems the plain model was trained for 30 epochs, while your own model was trained for 90 epochs (30 in stage 1 + 60 in stage 2). Could you please confirm this? Moreover, were all the comparisons (focal loss, etc.) performed with 30 or 90 epochs?
Thanks!
For ImageNet_LT, I just used the default config in the code, but I cannot reproduce the results in Table 3(a) of the paper.
1. For stage 1, my few/low-shot accuracy of 0.7% is actually better than the plain model's 0.4% in Table 3(a).
2. However, for stage 2 (this is the important part), my results are (the last few logs when training completed):
Epoch: [60/60] Step: 440 Minibatch_loss_feature: 0.569 Minibatch_loss_performance: 2.938 Minibatch_accuracy_micro: 0.566
Epoch: [60/60] Step: 450 Minibatch_loss_feature: 0.567 Minibatch_loss_performance: 2.845 Minibatch_accuracy_micro: 0.539
Phase: val
100%|██████████| 79/79 [01:34<00:00, 1.02it/s]
Phase: val
Evaluation_accuracy_micro_top1: 0.340
Averaged F-measure: 0.324
Many_shot_accuracy_top1: 0.401 Median_shot_accuracy_top1: 0.334 Low_shot_accuracy_top1: 0.197
Training Complete.
Best validation accuracy is 0.341 at epoch 48
However, the many-, median-, and few/low-shot accuracies are 40.1%, 33.4%, and 19.7%, which differ a little from the 43.2%, 35.1%, and 18.5% of the "Ours" model in Table 3(a).
I retrained several times, and the many-shot accuracy always comes out somewhat lower than 43.2%.
Are there any tricks that were not released?
I'm wondering if you have tried a stronger model on ImageNet, since ResNet-10's performance is comparatively low next to larger models like ResNet-50?
Hi,
Thank you for releasing the code for your paper. I'm a little confused when I look at this line of code.
https://github.com/zhmiao/OpenLongTailRecognition-OLTR/models/CosNormClassifier.py#L24.
It seems to be different from what is described in the paper. Could you help me with this problem?
Thank you very much.
Thanks for the awesome work.
But I cannot find the code that updates the centroids.
In line 45 of run_networks.py:
    if self.memory['init_centroids']:
        self.criterions['FeatureLoss'].centroids.data = self.centroids_cal(self.data['train_plain'])
this code is used for centroid initialization.
In the paper, Section 3.1, paragraph "Learning Visual Memory M", the centroids are updated in two steps.
Could you kindly give me more hints about how to realize the second step, the propagation step that alternately updates the direct features and the centroids? (A sketch of my own reading is below.)
Thanks a lot.
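Here is my own toy reading of the alternating scheme (assumptions mine; a sketch, not the authors' method): hold the centroids fixed while the features take a gradient step toward them, then recompute the centroids from the updated features:

    import torch

    num_classes, feat_dim = 5, 8
    features = torch.randn(100, feat_dim, requires_grad=True)   # stand-in direct features
    labels = torch.arange(100) % num_classes
    opt = torch.optim.SGD([features], lr=0.1)

    for step in range(10):
        with torch.no_grad():   # step (b): recompute centroids from current features
            centroids = torch.stack([features[labels == c].mean(0)
                                     for c in range(num_classes)])
        # step (a): attract each feature to its (fixed) class centroid
        loss = ((features - centroids[labels]) ** 2).sum(1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()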
Here, I am curious whether the weights of the backbone net used by the compared methods were also frozen during training. If so, I think the comparison is not fair, since those methods cannot reach their full performance with frozen weights.
How should I understand the metric learning mentioned in the paper? Specifically, what metric is learned in this paper?
https://github.com/zhmiao/OpenLongTailRecognition-OLTR/blob/master/utils.py#L88
It seems this should be changed from
    false_pos += 1 if preds[i] != labels[i] and labels[i] != -1 and preds[i] != -1 else 0
to:
    false_pos += 1 if preds[i] != labels[i] and ((labels[i] != -1 and preds[i] != -1) or labels[i] == -1) else 0
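A worked example of the difference (my illustration): take an open-set sample, labels[i] == -1, that the model wrongly assigns to a known class:

    labels, preds = [-1], [3]
    i = 0
    original = 1 if preds[i] != labels[i] and labels[i] != -1 and preds[i] != -1 else 0
    proposed = 1 if preds[i] != labels[i] and ((labels[i] != -1 and preds[i] != -1) or labels[i] == -1) else 0
    print(original, proposed)   # 0 1 -- only the proposed condition counts this false positive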
Hi, where did you get the Caffe-pretrained ResNet-152, as well as the ResNet-10 and ResNet-50 mentioned in the paper?
Are they from Kaiming He's repository, and how did you convert them to PyTorch format?
The benchmark and model zoo page cannot be opened; please check it.
@liuziwei7 @zhmiao Thanks for your amazing work. From the CAUTION:
"The current code was prepared using single GPU. The use of multi-GPU can cause problems."
When I run on two GPUs, the error is:
File "./models/MetaEmbeddingClassifier.py", line 48, in forward
memory_feature = torch.matmul(values_memory, keys_memory)
RuntimeError: size mismatch, m1: [16 x 7], m2: [4 x 512] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:268
# for 2 GPUs:
torch.Size([16, 7])
torch.Size([3, 512])
torch.Size([16, 7])
torch.Size([4, 512])
# for 1 GPU:
torch.Size([32, 7])
torch.Size([7, 512])
Is there any way to support multiple GPUs?
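What the shapes suggest (an inference on my part, not a confirmed diagnosis): nn.DataParallel scatters every positional forward() argument along dim 0, so the (7, 512) centroid matrix is split into (3, 512) and (4, 512) chunks across the two GPUs. A minimal sketch of the usual workaround, keeping the centroids on the module so each replica gets a full copy:

    import torch
    import torch.nn as nn

    class CentroidClassifier(nn.Module):   # hypothetical name, for illustration
        def __init__(self, centroids):
            super().__init__()
            # buffers are replicated whole to every GPU by nn.DataParallel
            self.register_buffer('centroids', centroids)

        def forward(self, values_memory):
            return torch.matmul(values_memory, self.centroids)

    model = CentroidClassifier(torch.randn(7, 512))
    out = model(torch.randn(32, 7))   # (32, 512); still valid when wrapped in DataParallel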
Thanks for sharing your work. May I ask how the open-set accuracy is calculated? Before it is computed, the code runs
probs, preds = F.softmax(self.total_logits.detach(), dim=1).max(dim=1)
so the open_threshold doesn't seem to take effect.
OpenLongTailRecognition-OLTR/run_networks.py
Line 296 in fb6203c
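For context, my understanding of the intended behavior (an assumption, not confirmed repo logic) is that low-confidence predictions should be remapped to the open-set label -1 after the max is taken:

    import torch
    import torch.nn.functional as F

    logits = torch.randn(8, 365)
    probs, preds = F.softmax(logits, dim=1).max(dim=1)
    preds[probs < 0.1] = -1   # open_threshold = 0.1: below it, predict "open set"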
Since both spatial attention and self-attention are used in the modulated attention, and in my understanding both parts capture spatial information, could you tell me the difference and the relationship between the spatial attention and the self-attention?