
yxgeee / mmt

467 stars · 72 forks · 622 KB

[ICLR-2020] Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification.

Home Page: https://yxgeee.github.io/projects/mmt

License: MIT License

Python 96.87% Shell 3.13%
cross-domain domain-adaptation iclr2020 image-retrieval open-set-domain-adaptation person-re-identification person-reid person-retrieval pseudo-labels unsupervised-domain-adaptation unsupervised-learning

mmt's Introduction

Hi there 👋

  • 🌱 I’m currently a Principal Researcher at Tencent ARC Lab.
  • 🔭 I’m currently working on vision and multimodal foundation models.
  • 👯 I’m looking for self-motivated interns to collaborate on related research topics.
  • 📫 Reach me at my homepage.

mmt's People

Contributors

chrizandr · yxgeee



mmt's Issues

Question about seed

sh scripts/pretrain.sh dukemtmc msmt17 resnet50 1
sh scripts/pretrain.sh dukemtmc msmt17 resnet50 2
Do 1 and 2 stand for the seed? Does that mean two different models are already generated during source-domain pre-training? My understanding from the paper was that the two different networks only appear in the second MMT stage, for co-teaching.
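For context, a minimal sketch (my illustration, not code from the repo; the Args dump further down this page does show a seed argument) of how a seed makes two pre-training runs produce two diverged models:

import random
import numpy as np
import torch

def set_seed(seed):
    # Seeding every RNG changes weight initialization and data order,
    # so seed=1 and seed=2 yield two different pre-trained networks
    # that can later disagree usefully during MMT co-teaching.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)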

Pre-train Stage 1

Thanks for the amazing implementation, really inspiring.

One question regarding pre-training the model in stage 1: I have tested the released models and they work amazingly well. However, I have trouble reproducing the performance using the example scripts (Duke->Market 24.3 mAP / 52.3 top-1), though I trained the model on a single GPU. Can you shed some light on how much influence multi-GPU training has? Thanks! :)

Why use jaccard_dist to cluster?

Both MMT and SSG use the re-ranked distance for clustering, and both use source-domain features to compute the Jaccard distance.
I would like to know the performance when using the original distance, and when leaving out the source-feature distance, if possible. Thanks!

Features for clustering

cf = (cf_1+cf_2)/2

Hi, is there a reason why you perform clustering on averaged features? Did you find empirically that this works better than other options like concatenation or simply taking either cf_1 or cf_2?
Thanks for your time and again thanks for sharing the code.
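For reference, a small self-contained sketch (my addition) of the two options being compared; the variable names mirror the snippet above:

import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 2048
cf_1 = rng.normal(size=(N, D)).astype(np.float32)  # features from net 1
cf_2 = rng.normal(size=(N, D)).astype(np.float32)  # features from net 2

cf_avg = (cf_1 + cf_2) / 2                      # averaging, as in the repo
cf_cat = np.concatenate([cf_1, cf_2], axis=1)   # alternative: concatenation
print(cf_avg.shape, cf_cat.shape)               # (1000, 2048) (1000, 4096)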

One question about Euclidean computation

There are these two lines of code in the function compute_jaccard_dist:
original_dist = torch.pow(target_features, 2).sum(dim=1, keepdim=True) * 2
original_dist = original_dist.expand(N, N) - 2 * torch.mm(target_features, target_features.t())
I think it should be:
original_dist = torch.pow(target_features, 2).sum(dim=1, keepdim=True)
original_dist = original_dist.expand(N, N) + original_dist.t().expand(N,N) - 2 * torch.mm(target_features, target_features.t())
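A quick self-contained check of the two formulas (my addition): the repo's version computes 2*||a||^2 - 2*a.b, which equals the true squared distance ||a||^2 + ||b||^2 - 2*a.b only when all rows have equal norm (e.g. L2-normalized features), while the proposed version is exact:

import torch

torch.manual_seed(0)
N, D = 8, 16
x = torch.randn(N, D)

sq = x.pow(2).sum(dim=1, keepdim=True)                    # (N, 1) squared norms
d_repo = (2 * sq).expand(N, N) - 2 * torch.mm(x, x.t())   # repo version
d_fix = sq.expand(N, N) + sq.t().expand(N, N) - 2 * torch.mm(x, x.t())

print(torch.allclose(d_fix, torch.cdist(x, x).pow(2), atol=1e-4))   # True
print(torch.allclose(d_repo, torch.cdist(x, x).pow(2), atol=1e-4))  # False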

Error in DBSCAN

Hello there. I was running DBSCAN on the SYSU dataset and on some results I obtained from manipulating SYSU, and I am encountering an error.

The full traceback is as follows:
sh scripts/train_mmt_dbscan.sh Sysu Sysuresults resnet50

Args:Namespace(alpha=0.999, arch='resnet50', batch_size=64, data_dir='/home/sagar18174/Thesis/person_RE-identification/MMT/examples/data', dataset_source='Sysu', dataset_target='Sysuresults', dropout=0.0, epochs=40, eval_step=1, features=0, height=256, init_1='logs/SysuTOSysuresults/resnet50-pretrain-1/model_best.pth.tar', init_2='logs/SysuTOSysuresults/resnet50-pretrain-2/model_best.pth.tar', iters=400, lambda_value=0.0, logs_dir='logs/SysuTOSysuresults/resnet50-MMT-DBSCAN', lr=0.00035, momentum=0.9, num_instances=4, print_freq=1, rr_gpu=False, seed=1, soft_ce_weight=0.5, soft_tri_weight=0.8, weight_decay=0.0005, width=128, workers=4)

=> Market1501 loaded
Dataset statistics:

subset | # ids | # images | # cameras

train | 500 | 14870 | 2
query | 500 | 745 | 2
gallery | 486 | 486 | 1

=> Market1501 loaded
Dataset statistics:

subset | # ids | # images | # cameras

train | 259 | 11801 | 2
query | 259 | 518 | 2
gallery | 259 | 259 | 1

/home/sagar18174/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py:26: UserWarning:
There is an imbalance between your GPUs. You may want to exclude GPU 0 which
has less than 75% of the memory or cores of GPU 1. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
=> Loaded checkpoint 'logs/SysuTOSysuresults/resnet50-pretrain-1/model_best.pth.tar'
mismatch: module.classifier.weight torch.Size([500, 2048]) torch.Size([11801, 2048])
missing keys in state_dict: set(['module.classifier.weight'])
mismatch: module.classifier.weight torch.Size([500, 2048]) torch.Size([11801, 2048])
missing keys in state_dict: set(['module.classifier.weight'])
=> Loaded checkpoint 'logs/SysuTOSysuresults/resnet50-pretrain-2/model_best.pth.tar'
mismatch: module.classifier.weight torch.Size([500, 2048]) torch.Size([11801, 2048])
missing keys in state_dict: set(['module.classifier.weight'])
mismatch: module.classifier.weight torch.Size([500, 2048]) torch.Size([11801, 2048])
missing keys in state_dict: set(['module.classifier.weight'])
Extract Features: [50/185] Time 0.105 (0.196) Data 0.000 (0.011)
Extract Features: [100/185] Time 0.106 (0.148) Data 0.000 (0.006)
Extract Features: [150/185] Time 0.106 (0.133) Data 0.000 (0.004)
Extract Features: [50/185] Time 0.102 (0.115) Data 0.000 (0.011)
Extract Features: [100/185] Time 0.123 (0.111) Data 0.015 (0.006)
Extract Features: [150/185] Time 0.106 (0.109) Data 0.000 (0.005)
Computing original distance...
Computing Jaccard distance...
Time cost: 154.446565151
examples/mmt_train_dbscan.py:182: RuntimeWarning: Mean of empty slice.
eps = tri_mat[:top_num].mean()
/home/sagar18174/.local/lib/python2.7/site-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
eps for cluster: nan
Clustering and labeling...
Traceback (most recent call last):
File "examples/mmt_train_dbscan.py", line 304, in
main()
File "examples/mmt_train_dbscan.py", line 130, in main
main_worker(args)
File "examples/mmt_train_dbscan.py", line 187, in main_worker
labels = cluster.fit_predict(rerank_dist)
File "/home/sagar18174/.local/lib/python2.7/site-packages/sklearn/cluster/dbscan
.py", line 354, in fit_predict
self.fit(X, sample_weight=sample_weight)
File "/home/sagar18174/.local/lib/python2.7/site-packages/sklearn/cluster/dbscan_.py", line 322, in fit
**self.get_params())
File "/home/sagar18174/.local/lib/python2.7/site-packages/sklearn/cluster/dbscan_.py", line 127, in dbscan
raise ValueError("eps must be positive.")
ValueError: eps must be positive.

Any help is highly appreciated.
Sagar
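The root cause is visible in the log above: "eps for cluster: nan", i.e. the mean was taken over an empty slice before DBSCAN ever ran. A defensive sketch (my addition; the rho default is an assumption, not verified against the repo) that fails early with a clearer message:

import numpy as np

def safe_eps(dists, rho=1.6e-3):
    # Mean of the smallest rho-fraction of pairwise distances, with guards.
    dists = np.sort(np.asarray(dists), axis=None)
    top_num = int(np.round(rho * dists.size))
    if top_num < 1:
        raise ValueError(
            "rho * num_distances < 1, so the mean is over an empty slice "
            "and eps becomes nan; use a larger rho or a larger dataset.")
    eps = float(dists[:top_num].mean())
    if not (np.isfinite(eps) and eps > 0):
        raise ValueError("eps=%r is not a positive finite number" % eps)
    return eps

print(safe_eps(np.random.rand(10000)))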

About classifier weight initialization

Hi, thanks for your great work. I noticed this in base_train_dbscan.py, line 171:
model.module.classifier.weight.data[:args.num_clusters].copy_(F.normalize(cluster_centers, dim=1).float().cuda())
What does this line mean, and why not directly set the classifier weight to 0?
Looking for your reply.
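For intuition, a self-contained sketch (my addition, not the repo's code) of what that line does: each classifier weight row is set to the L2-normalized centroid of its pseudo-label cluster, so the initial logits are similarity scores to the cluster centers rather than all zeros:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_clusters, dim = 10, 2048
features = F.normalize(torch.randn(100, dim), dim=1)
labels = torch.arange(100) % num_clusters          # toy pseudo-labels

# Centroid of each pseudo-label cluster, as a (num_clusters, dim) matrix.
centers = torch.stack([features[labels == c].mean(0) for c in range(num_clusters)])

classifier = torch.nn.Linear(dim, num_clusters, bias=False)
with torch.no_grad():
    classifier.weight.copy_(F.normalize(centers, dim=1))

logits = classifier(features)   # starts as similarity to each centroid
print(logits.shape)             # torch.Size([100, 10])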

Some questions

If I only have 2 GPUs, for example two 32 GB V100s, can I reproduce your experiments by enlarging the batch size?

Meaning of seed

Hi,

As you have mentioned in your documentation

sh scripts/pretrain.sh dukemtmc market1501 resnet50 1
sh scripts/pretrain.sh dukemtmc market1501 resnet50 2

1 and 2 are the seeds. What is the significance of the seed in the code?

Waiting for reply

Thanks

About GPUs and batch size

Hello there, you mentioned that 16 images per GPU works best. When I only use one GPU for training, how should I set the batch size and the number of instances?

Something Wrong

MMT/examples/data
├── dukemtmc
│ └── DukeMTMC-reID
├── market1501
│ └── Market-1501-v15.09.15
└── msmt17
└── MSMT17_V1

should be:

MMT/data
├── dukemtmc
│ └── DukeMTMC-reID
├── market1501
│ └── Market-1501-v15.09.15
└── msmt17
└── MSMT17_V1

RuntimeError: bool value of Tensor with more than one value is ambiguous

Hello, I never ran into this problem on other machines before, but when running on my school's Titan V machine I hit this error. I searched for a lot of information online but could not solve it. Could you help me with this problem? Thank you very much, looking forward to your reply!

Traceback (most recent call last):
File "examples/mmt_train_kmeans.py", line 294, in
main()
File "examples/mmt_train_kmeans.py", line 138, in main
main_worker(args)
File "examples/mmt_train_kmeans.py", line 234, in main_worker
is_best = (mAP_1 > best_mAP) or (mAP_2 > best_mAP)
RuntimeError: bool value of Tensor with more than one value is ambiguous
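A plausible cause (my guess, not confirmed in this thread): on this machine the evaluator returned mAP as a multi-element tensor instead of a Python float, so mAP_1 > best_mAP produced a boolean tensor that cannot be used in an or-expression. A hedged sketch of a guard:

import torch

def as_scalar(x):
    # Reduce a metric to a plain float; a multi-element tensor here usually
    # means a vector (e.g. a CMC curve) was returned where mAP was expected.
    if torch.is_tensor(x):
        if x.numel() != 1:
            raise ValueError("expected scalar mAP, got shape %s" % (tuple(x.shape),))
        return x.item()
    return float(x)

# is_best = (as_scalar(mAP_1) > best_mAP) or (as_scalar(mAP_2) > best_mAP)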

MSMT17_V1

Hi, I can't download the MSMT17_V1 dataset. Can you send it to me? If necessary, I can sign some agreements. Thank you!

Stage-1 pre-train with target feature

Hi,
Thanks for releasing the code.
I have a question about the pre-training stage.
In PreTrainer, I cannot make sense of why there is a forward pass for the target input, since I did not find any usage of target_feature in training.

Thanks.

The performance on MSMT17

Hello, I have a question about the performance on MSMT17. I find that the performance depends on the version of your code; call the previous code V1 and the current code V2. The performance of V1 matches your paper, but the performance of V2 is worse than V1. I carefully compared V1 and V2 but found no difference except for iters changing from 800 to 400. The above is based on the k-means method. Even when iters is set to 800 in V2, the performance is still worse than in your paper.

Besides, I have another question: the time cost of V1 is much higher than that of V2 for the k-means method. I find this comes from the clustering time, beyond the effect of iters and the Jaccard distance. I would like to know why this is.

Thank you!

Question For MSMT17

I want to use the MSMT17 dataset; however, I cannot find 'list_train.txt', 'list_val.txt', 'list_query.txt', or 'list_gallery.txt'. Maybe something went wrong with my download. How can I fix this?

Question For Complementarity

I understand that you use two seeds to enhance complementarity. However, I would like to know the performance when the two networks are initialized identically. Thanks.

Does the classifier C^t need to be re-initialized after each clustering?

Both the paper you referenced, "Unsupervised Person Re-identification: Clustering and Fine-tuning", and another unsupervised paper, DeepCluster, mention that the classifier head's parameters need to be re-initialized because there is no mapping between two consecutive pseudo-label assignments.

But I cannot find such an operation in Algorithm 1 of your paper. Is it omitted, or does the proposed method not need to re-initialize the classifier head after each clustering?

Thanks!

How to include more datasets?

I am currently working on the SYSU dataset and already have results on it (through another algorithm). What changes do I need to make in the code to run MMT on SYSU? I made Sysu.py just like dukemtmc.py and placed the dataset in the same location as the other datasets. I have also registered it in the __init__.py (__factory) of the datasets folder. Whenever I try running sh scripts/pretrain.sh Sysu market1501 resnet50 1, it shows an error saying that only the default datasets "dukemtmc", "msmt17", "market1501" can be used.
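For what it's worth, a toy sketch (my addition; the factory structure is assumed from the description above, not copied from the repo) showing one way the lookup can still fail after registering the class: the dataset key is case-sensitive and must match the name passed to the script exactly:

# Toy stand-in for the __factory registry in the datasets package.
__factory = {
    'market1501': object,
    'dukemtmc': object,
    'msmt17': object,
    'sysu': object,   # newly registered entry
}

name = 'Sysu'   # what "sh scripts/pretrain.sh Sysu ..." would pass
if name not in __factory:
    # This mismatch ('Sysu' vs 'sysu') reproduces the "only default
    # datasets can be used" style of failure.
    print('Unknown dataset: %s; known: %s' % (name, sorted(__factory)))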

Pre-training on the source domain

Hi, I have a question. In pre-training on the source domain, there are these lines of code in the PreTrainer:

s_features, s_cls_out = self.model(s_inputs)
# target samples: only forward
t_features, _ = self.model(t_inputs)
# backward main #
loss_ce, loss_tr, prec1 = self._forward(s_features, s_cls_out, targets)
loss = loss_ce + loss_tr

I think the first line is necessary for the overall optimization, while the third line is not, since it is unrelated to the overall loss. However, the third line boosts the performance on the target dataset. As you pointed out, it is only a forward pass, and I do not understand why it helps. Can you help me? Thank you.
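One commonly offered explanation (my reading, not an authoritative answer from the author): a forward pass in train mode updates BatchNorm running statistics even without any backward pass, so forwarding target images adapts the BN statistics toward the target domain. A minimal demonstration:

import torch

bn = torch.nn.BatchNorm1d(4)
bn.train()
print(bn.running_mean)             # tensor([0., 0., 0., 0.])

with torch.no_grad():              # no loss, no backward
    bn(torch.randn(32, 4) + 5.0)   # a "target-domain" batch with shifted mean

print(bn.running_mean)             # moved toward 5.0 (scaled by BN momentum 0.1, so ~0.5)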

An error in the evaluator

Hello author, in cnn.py under the mmt feature_extraction folder, the function extract_cnn_feature calls outputs = model(inputs) and then accesses outputs.data and outputs.values. When I replace the ResNet backbone with another backbone, evaluation fails with an error saying my outputs have neither of these attributes.
Could you tell me where the outputs.data and outputs.values attributes are defined? I could not find them.
Thanks a lot for taking the time to answer.

AttributeError: 'ResNetIBN' object has no attribute 'module'

=> Loaded checkpoint 'logs/dukemtmcTOmarket1501/resnet_ibn50a-pretrain-1/model_best.pth.tar'
mismatch: classifier.weight torch.Size([702, 2048]) torch.Size([500, 2048])
missing keys in state_dict: {'classifier.weight'}
mismatch: classifier.weight torch.Size([702, 2048]) torch.Size([500, 2048])
missing keys in state_dict: {'classifier.weight'}

Traceback (most recent call last):
File "examples/mmt_train_kmeans.py", line 272, in
main()
File "examples/mmt_train_kmeans.py", line 131, in main
main_worker(args)
File "examples/mmt_train_kmeans.py", line 149, in main_worker
model_1, model_2, model_1_ema, model_2_ema = create_model(args)
File "examples/mmt_train_kmeans.py", line 107, in create_model
model_1_ema.module.classifier.weight.data.copy_(model_1.module.classifier.weight.data)
File "/root/miniconda/envs/berttext/lib/python3.6/site-packages/torch/nn/modules/module.py", line 591, in getattr
type(self).name, name))
AttributeError: 'ResNetIBN' object has no attribute 'module'
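A plausible cause (an assumption on my part): the model was built without DataParallel wrapping, e.g. in a single-GPU run, so there is no .module attribute to dereference. A hedged guard:

import torch.nn as nn

def unwrap(model):
    # DataParallel keeps the real network under .module; plain models do not.
    return model.module if isinstance(model, nn.DataParallel) else model

# unwrap(model_1_ema).classifier.weight.data.copy_(
#     unwrap(model_1).classifier.weight.data)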

Questions on the clustering algorithm

Hi, I recently read your paper and it is excellent work. However, I have some confusion about the clustering part. To my understanding, at every epoch you re-run the clustering algorithm to obtain a cluster ID for each sample. My question is: how do you ensure each cluster ID is consistently assigned to the same subset of samples, so that the model can be trained with hard pseudo-labels? Suppose three samples (x1, x2, x3) are grouped into two clusters (C1, C2): at epoch t, (x1, x2) belong to C1 and x3 to C2, but at epoch t+1 the clustering may put (x1, x2) in C2 and x3 in C1. This assignment ambiguity of cluster IDs would seem to make the model hard to train. Can you give me some hints? Thanks.

Low accuracy in pretrained model on Market-1501

Hello there,

I ran the "Prepare the Pretrained Models" with backbone ResNet50 on 4GPUs with batch_size 64. All settings are unchanged and the command is "sh scripts/pretrain.sh dukemtmc market1501 resnet50 1". However, I only get 40.7%/59.5% (mAP/Rank-1) on source dataset (DukeMTMC-reID), and 19.1%/42.6% on target dataset (Market-1501). Any idea why it might be happening, since the reported performance on target data is 31.8%/61.9%.

Some phenomena I observed:

  1. the triplet loss averages about 0.5 when training finishes
  2. the precision is about 100% when training finishes

Besides, I have some questions about settings

  1. the "margin" is set 0.0 in scripts/pretrain.sh, while the previous UDA methods use a "margin" of 0.5 (Sec 3.1 in paper)
  2. "iters" is set 100 in scripts/pretrain.sh.

Thanks a lot!

No module named 'mmt'

sh scripts/pretrain.sh dukemtmc market1501 resnet50 1
info: No module named 'mmt'
Thanks for your code; hoping for your reply.

A mismatch between the loss function and the corresponding code

There are several lines of code in the SoftTripletLoss function:
triple_dist = torch.stack((dist_ap, dist_an), dim=1)
triple_dist = F.log_softmax(triple_dist, dim=1)
mat_dist_ref = euclidean_dist(emb2, emb2)
dist_ap_ref = torch.gather(mat_dist_ref, 1, ap_idx.view(N,1).expand(N,N))[:,0]
dist_an_ref = torch.gather(mat_dist_ref, 1, an_idx.view(N,1).expand(N,N))[:,0]
triple_dist_ref = torch.stack((dist_ap_ref, dist_an_ref), dim=1)
triple_dist_ref = F.softmax(triple_dist_ref, dim=1).detach()
loss = (- triple_dist_ref * triple_dist).mean(0).sum()
return loss
I think it should be:
triple_dist = torch.stack((dist_ap, dist_an), dim=1)
triple_dist = F.log_softmax(triple_dist, dim=1)
mat_dist_ref = euclidean_dist(emb2, emb2)
dist_ap_ref = torch.gather(mat_dist_ref, 1, ap_idx.view(N,1).expand(N,N))[:,0]
dist_an_ref = torch.gather(mat_dist_ref, 1, an_idx.view(N,1).expand(N,N))[:,0]
triple_dist_ref = torch.stack((dist_ap_ref, dist_an_ref), dim=1)
triple_dist_ref = F.softmax(triple_dist_ref, dim=1).detach()
# loss = (- triple_dist_ref * triple_dist).mean(0).sum()
loss = (- triple_dist_ref[:,0] * triple_dist[:,0]).mean()
return loss

Your code computes: -log{exp(s_p)/[exp(s_p)+exp(s_n)]} - log{exp(s_n)/[exp(s_p)+exp(s_n)]}, with each term weighted by the detached reference softmax, where s_p = F(x_i)·F(x_{i,p}) and s_n = F(x_i)·F(x_{i,n}). This is not consistent with the loss in your paper.
My modified code computes only: -log{exp(s_p)/[exp(s_p)+exp(s_n)]}, which is consistent with your paper. However, the performance of my modified code is worse than your original code.
I cannot understand why.
I'm looking forward to your reply!

model.eval

Hi, I have been reading your code recently and have a question. I plan to use it for a classification problem. In the pre-training stage, when running evaluation, why does the model output only a [batch_size, 2048] tensor, instead of [batch_size, num_class] and [batch_size, 2048] as during training? I do not see this difference covered by the usual distinction between model.train() and model.eval(), and I have debugged it many times. I hope you can clarify.
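For context, a toy sketch (my own illustration of a common re-ID model pattern, not the repo's ResNet) of why eval mode can return only the 2048-d features while train mode also returns logits: the branch lives in the model's forward, keyed on self.training, rather than in anything model.eval() does to individual layers:

import torch
import torch.nn as nn

class ToyReID(nn.Module):
    def __init__(self, dim=2048, num_classes=500):
        super(ToyReID, self).__init__()
        self.backbone = nn.Linear(8, dim)              # stand-in for ResNet
        self.classifier = nn.Linear(dim, num_classes, bias=False)

    def forward(self, x):
        feat = self.backbone(x)
        if not self.training:
            return feat                                # eval: features only
        return self.classifier(feat), feat             # train: logits + features

net = ToyReID()
net.eval()
print(net(torch.randn(2, 8)).shape)                    # torch.Size([2, 2048])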

About experimental results

Hello author, I reproduced your code. With k-means, resnet50, and Duke→Market, I reached 68.7% mAP and 85.8% top-1, slightly below the 71.2% mAP / 87.7% top-1 reported in your paper. I used one GPU with batch size 16. Could the gap be due to the batch size?

GPU number

Does the number of GPUs, or the type of GPU, affect the accuracy achieved in the paper?
If I have 1 GPU instead of 4, what would happen when running the code?
Is it possible that we cannot reproduce the results of the paper?

Regarding the dataset

I have a doubt: are the images used for training only the bounding_box_train images, or do they include the bounding_box_test images too?

question about "delete anything about model_2 and model_2_ema"

when I tried to delete anything about model_2 and model_2_ema,errors occurred:
Traceback (most recent call last):
File "examples/mmt_train_kmeans.py", line 287, in
main()
File "examples/mmt_train_kmeans.py", line 132, in main
main_worker(args)
File "examples/mmt_train_kmeans.py", line 201, in main_worker
trainer = MMTTrainer(model_1, model_1_ema, num_cluster=args.num_clusters, alpha=args.alpha)
TypeError: __init__() missing 2 required positional arguments: 'model_1_ema' and 'model_2_ema'

The following is MMTTrainer:

class MMTTrainer(object):
    def __init__(self, model_1, model_1_ema, num_cluster=500, alpha=0.999):
        super(MMTTrainer, self).__init__()
        self.model_1 = model_1
        #self.model_2 = model_2
        self.num_cluster = num_cluster

        self.model_1_ema = model_1_ema
        #self.model_2_ema = model_2_ema

BUT, very strangely, this MMTTrainer no longer takes the parameters model_2 and model_2_ema at all.
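One possible explanation (purely a guess, not confirmed in this thread): if the package was installed, for example via python setup.py install, the script may be importing a stale installed copy of mmt with the old 4-argument signature rather than the edited source tree. A quick way to check which file actually defines the class:

import inspect
from mmt.trainers import MMTTrainer   # assumes MMTTrainer lives in mmt/trainers.py

# If this path points into site-packages instead of your working copy,
# your edits are not the code being run; reinstall with pip install -e .
print(inspect.getsourcefile(MMTTrainer))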

Parameters for resnet50IBNa

Hi,

Thank you for the great work!
I tried your code and things worked well; for ResNet-50 in particular, the result looked good. But my mAP evaluation curve for ResNet-50IBNa (700) saturates at around mAP=55, which is lower than the ResNet-50 result and surprising, since the reported results are better with ResNet-50IBNa.
For this reason, I would like to ask whether the training hyperparameters for ResNet-50IBNa are the same as in train.sh; if not, would you mind sharing them, please?

PS: I used 1 GPU for the moment, so num_instances=1.
Thank you again, and it is indeed a great work!
Congrats!

Low accuracy in Sysu

Hello there.

This is not an issue; I am asking for suggestions. I ran MMT on Market-1501 and SYSU (a mix of RGB and IR images). I am getting very low mAP and CMC scores. The individual accuracy of k-means on Market is in the high 80s-90s, whereas on SYSU it is around 0.7% at rank 5. Any idea why this might be happening?

A question about clustering

Thank you for sharing this work!

Actually, I have a question about the clustering process in the mmt_train_kmeans.py file.

dict_f, _ = extract_features(model_1_ema, cluster_loader, print_freq=50)
cf_1 = torch.stack(list(dict_f.values())).numpy()
dict_f, _ = extract_features(model_2_ema, cluster_loader, print_freq=50)
cf_2 = torch.stack(list(dict_f.values())).numpy()
cf = (cf_1+cf_2)/2

Here I find that the mean-nets of model 1 and model 2 are used to generate features for clustering and for initializing the classifiers. But in the MMT paper, it seems that the current models (model 1 and model 2) at every epoch, rather than the mean-nets, are used to compute features for clustering. Does using the mean-nets here give better performance? Could you give some explanation? Thanks!

SoftTripletLoss margin

Hi,
first of all thanks for sharing such great code.
Can I ask the meaning of setting margin=0.0 versus margin=None in the MMTTrainer?
Inspecting the SoftTripletLoss class, it looks like margin=None implements equation 8 in the paper, while setting the margin to any float boils it down to equation 6. Am I correct?
Thank you very much.

self.criterion_tri = SoftTripletLoss(margin=0.0).cuda()

self.criterion_tri_soft = SoftTripletLoss(margin=None).cuda()

A question about precision

I ran your source code (DBSCAN) directly; all hyperparameters follow the paper and the GitHub repository. When using a Tesla V100, the highest accuracy on Duke→Market is 68.5% mAP. When I use two 2080Ti GPUs, the accuracy reaches 73% mAP. Why is this? Maybe BN? Maybe the triplet loss? My lab's resources do not allow me to use multiple GPUs all the time. I look forward to your reply.
