
yxgeee / mmt

467 stars · 72 forks · 622 KB

[ICLR-2020] Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification.

Home Page: https://yxgeee.github.io/projects/mmt

License: MIT License

Python 96.87% Shell 3.13%
cross-domain domain-adaptation iclr2020 image-retrieval open-set-domain-adaptation person-re-identification person-reid person-retrieval pseudo-labels unsupervised-domain-adaptation unsupervised-learning

mmt's Introduction

Hi there 👋

  • 🌱 I’m currently a Principal Researcher at Tencent ARC Lab.
  • 🔭 I’m currently working on vision and multimodal foundation models.
  • 👯 I’m looking for self-motivated interns to collaborate on related research topics.
  • 📫 Reach me at my homepage.

mmt's People

Contributors

chrizandr · yxgeee



mmt's Issues

Question about seed

sh scripts/pretrain.sh dukemtmc msmt17 resnet50 1
sh scripts/pretrain.sh dukemtmc msmt17 resnet50 2
Do 1 and 2 stand for the seed? Does that mean two different models are already generated during source-domain pre-training? My understanding from the paper was that the two different networks only appear in the second MMT stage, for co-teaching.
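For context, a minimal sketch (my illustration, not code from the repo; the Args dump further down this page does show a seed argument) of how a seed makes two pre-training runs produce two diverged models:

import random
import numpy as np
import torch

def set_seed(seed):
    # Seeding every RNG changes weight initialization and data order,
    # so seed=1 and seed=2 yield two different pre-trained networks
    # that can later disagree usefully during MMT co-teaching.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)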

Pre-train Stage 1

Thanks for the amazing implementation, really inspiring.

One question regarding pre-training the model in stage 1: I have tested the released models and they work amazingly well. However, I have trouble reproducing the performance using the example scripts (Duke->Market 24.3 mAP / 52.3 top-1), though I trained the model on a single GPU. Can you shed some light on how much influence multi-GPU training has? Thanks! :)

Why use jaccard_dist to cluster?

Both MMT and SSG use the re-ranked distance for clustering, and both use source-domain features to compute the Jaccard distance.
I would like to know the performance when using the original distance, and when leaving out the source-feature distance, if possible. Thanks!

Features for clustering

cf = (cf_1+cf_2)/2

Hi, is there a reason why you perform clustering on averaged features? Did you find empirically that this works better than other options like concatenation or simply taking either cf_1 or cf_2?
Thanks for your time and again thanks for sharing the code.
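For reference, a small self-contained sketch (my addition) of the two options being compared; the variable names mirror the snippet above:

import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 2048
cf_1 = rng.normal(size=(N, D)).astype(np.float32)  # features from net 1
cf_2 = rng.normal(size=(N, D)).astype(np.float32)  # features from net 2

cf_avg = (cf_1 + cf_2) / 2                      # averaging, as in the repo
cf_cat = np.concatenate([cf_1, cf_2], axis=1)   # alternative: concatenation
print(cf_avg.shape, cf_cat.shape)               # (1000, 2048) (1000, 4096)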

One question about Euclidean computation

There are these two lines of code in the function compute_jaccard_dist:
original_dist = torch.pow(target_features, 2).sum(dim=1, keepdim=True) * 2
original_dist = original_dist.expand(N, N) - 2 * torch.mm(target_features, target_features.t())
I think it should be:
original_dist = torch.pow(target_features, 2).sum(dim=1, keepdim=True)
original_dist = original_dist.expand(N, N) + original_dist.t().expand(N,N) - 2 * torch.mm(target_features, target_features.t())
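A quick self-contained check of the two formulas (my addition): the repo's version computes 2*||a||^2 - 2*a.b, which equals the true squared distance ||a||^2 + ||b||^2 - 2*a.b only when all rows have equal norm (e.g. L2-normalized features), while the proposed version is exact:

import torch

torch.manual_seed(0)
N, D = 8, 16
x = torch.randn(N, D)

sq = x.pow(2).sum(dim=1, keepdim=True)                    # (N, 1) squared norms
d_repo = (2 * sq).expand(N, N) - 2 * torch.mm(x, x.t())   # repo version
d_fix = sq.expand(N, N) + sq.t().expand(N, N) - 2 * torch.mm(x, x.t())

print(torch.allclose(d_fix, torch.cdist(x, x).pow(2), atol=1e-4))   # True
print(torch.allclose(d_repo, torch.cdist(x, x).pow(2), atol=1e-4))  # False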

Error in DBSCAN

Hello there. I was running DBSCAN on the SYSU dataset and on some results I obtained from manipulating SYSU, and I am encountering an error.

The full traceback is as follows:
sh scripts/train_mmt_dbscan.sh Sysu Sysuresults resnet50

Args:Namespace(alpha=0.999, arch='resnet50', batch_size=64, data_dir='/home/sagar18174/Thesis/person_RE-identification/MMT/examples/data', dataset_source='Sysu', dataset_target='Sysuresults', dropout=0.0, epochs=40, eval_step=1, features=0, height=256, init_1='logs/SysuTOSysuresults/resnet50-pretrain-1/model_best.pth.tar', init_2='logs/SysuTOSysuresults/resnet50-pretrain-2/model_best.pth.tar', iters=400, lambda_value=0.0, logs_dir='logs/SysuTOSysuresults/resnet50-MMT-DBSCAN', lr=0.00035, momentum=0.9, num_instances=4, print_freq=1, rr_gpu=False, seed=1, soft_ce_weight=0.5, soft_tri_weight=0.8, weight_decay=0.0005, width=128, workers=4)

=> Market1501 loaded
Dataset statistics:

subset | # ids | # images | # cameras

train | 500 | 14870 | 2
query | 500 | 745 | 2
gallery | 486 | 486 | 1

=> Market1501 loaded
Dataset statistics:

subset | # ids | # images | # cameras

train | 259 | 11801 | 2
query | 259 | 518 | 2
gallery | 259 | 259 | 1

/home/sagar18174/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py:26: UserWarning:
There is an imbalance between your GPUs. You may want to exclude GPU 0 which
has less than 75% of the memory or cores of GPU 1. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
=> Loaded checkpoint 'logs/SysuTOSysuresults/resnet50-pretrain-1/model_best.pth.tar'
mismatch: module.classifier.weight torch.Size([500, 2048]) torch.Size([11801, 2048])
missing keys in state_dict: set(['module.classifier.weight'])
mismatch: module.classifier.weight torch.Size([500, 2048]) torch.Size([11801, 2048])
missing keys in state_dict: set(['module.classifier.weight'])
=> Loaded checkpoint 'logs/SysuTOSysuresults/resnet50-pretrain-2/model_best.pth.tar'
mismatch: module.classifier.weight torch.Size([500, 2048]) torch.Size([11801, 2048])
missing keys in state_dict: set(['module.classifier.weight'])
mismatch: module.classifier.weight torch.Size([500, 2048]) torch.Size([11801, 2048])
missing keys in state_dict: set(['module.classifier.weight'])
Extract Features: [50/185] Time 0.105 (0.196) Data 0.000 (0.011)
Extract Features: [100/185] Time 0.106 (0.148) Data 0.000 (0.006)
Extract Features: [150/185] Time 0.106 (0.133) Data 0.000 (0.004)
Extract Features: [50/185] Time 0.102 (0.115) Data 0.000 (0.011)
Extract Features: [100/185] Time 0.123 (0.111) Data 0.015 (0.006)
Extract Features: [150/185] Time 0.106 (0.109) Data 0.000 (0.005)
Computing original distance...
Computing Jaccard distance...
Time cost: 154.446565151
examples/mmt_train_dbscan.py:182: RuntimeWarning: Mean of empty slice.
eps = tri_mat[:top_num].mean()
/home/sagar18174/.local/lib/python2.7/site-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
eps for cluster: nan
Clustering and labeling...
Traceback (most recent call last):
File "examples/mmt_train_dbscan.py", line 304, in
main()
File "examples/mmt_train_dbscan.py", line 130, in main
main_worker(args)
File "examples/mmt_train_dbscan.py", line 187, in main_worker
labels = cluster.fit_predict(rerank_dist)
File "/home/sagar18174/.local/lib/python2.7/site-packages/sklearn/cluster/dbscan
.py", line 354, in fit_predict
self.fit(X, sample_weight=sample_weight)
File "/home/sagar18174/.local/lib/python2.7/site-packages/sklearn/cluster/dbscan_.py", line 322, in fit
**self.get_params())
File "/home/sagar18174/.local/lib/python2.7/site-packages/sklearn/cluster/dbscan_.py", line 127, in dbscan
raise ValueError("eps must be positive.")
ValueError: eps must be positive.

Any help is highly appreciated.
Sagar
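The root cause is visible in the log above: "eps for cluster: nan", i.e. the mean was taken over an empty slice before DBSCAN ever ran. A defensive sketch (my addition; the rho default is an assumption, not verified against the repo) that fails early with a clearer message:

import numpy as np

def safe_eps(dists, rho=1.6e-3):
    # Mean of the smallest rho-fraction of pairwise distances, with guards.
    dists = np.sort(np.asarray(dists), axis=None)
    top_num = int(np.round(rho * dists.size))
    if top_num < 1:
        raise ValueError(
            "rho * num_distances < 1, so the mean is over an empty slice "
            "and eps becomes nan; use a larger rho or a larger dataset.")
    eps = float(dists[:top_num].mean())
    if not (np.isfinite(eps) and eps > 0):
        raise ValueError("eps=%r is not a positive finite number" % eps)
    return eps

print(safe_eps(np.random.rand(10000)))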

About classifier weight initialization

Hi, thanks for your great work. I noticed this in base_train_dbscan.py, line 171:
model.module.classifier.weight.data[:args.num_clusters].copy_(F.normalize(cluster_centers, dim=1).float().cuda())
What does this line mean, and why not directly set the classifier weight to 0?
Looking for your reply.
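For intuition, a self-contained sketch (my addition, not the repo's code) of what that line does: each classifier weight row is set to the L2-normalized centroid of its pseudo-label cluster, so the initial logits are similarity scores to the cluster centers rather than all zeros:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_clusters, dim = 10, 2048
features = F.normalize(torch.randn(100, dim), dim=1)
labels = torch.arange(100) % num_clusters          # toy pseudo-labels

# Centroid of each pseudo-label cluster, as a (num_clusters, dim) matrix.
centers = torch.stack([features[labels == c].mean(0) for c in range(num_clusters)])

classifier = torch.nn.Linear(dim, num_clusters, bias=False)
with torch.no_grad():
    classifier.weight.copy_(F.normalize(centers, dim=1))

logits = classifier(features)   # starts as similarity to each centroid
print(logits.shape)             # torch.Size([100, 10])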

Some questions

If I only have 2 GPUs, for example two 32 GB V100s, can I reproduce your experiments by enlarging the batch size?

Meaning of seed

Hi,

As you have mentioned in your documentation

sh scripts/pretrain.sh dukemtmc market1501 resnet50 1
sh scripts/pretrain.sh dukemtmc market1501 resnet50 2

1 and 2 are the seeds. What is the significance of the seed in the code?

Waiting for reply

Thanks

About GPUs and batch size

Hello there, you mentioned that 16 images per GPU works best. When I only use one GPU for training, how should I set the batch size and the number of instances?

Something Wrong

MMT/examples/data
├── dukemtmc
│ └── DukeMTMC-reID
├── market1501
│ └── Market-1501-v15.09.15
└── msmt17
└── MSMT17_V1

should be:

MMT/data
├── dukemtmc
│ └── DukeMTMC-reID
├── market1501
│ └── Market-1501-v15.09.15
└── msmt17
└── MSMT17_V1

RuntimeError: bool value of Tensor with more than one value is ambiguous

Hello, I never ran into this problem on other machines before, but when running on my school's Titan V machine I hit this error. I searched for a lot of information online but could not solve it. Could you help me with this problem? Thank you very much, looking forward to your reply!

Traceback (most recent call last):
File "examples/mmt_train_kmeans.py", line 294, in
main()
File "examples/mmt_train_kmeans.py", line 138, in main
main_worker(args)
File "examples/mmt_train_kmeans.py", line 234, in main_worker
is_best = (mAP_1 > best_mAP) or (mAP_2 > best_mAP)
RuntimeError: bool value of Tensor with more than one value is ambiguous
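A plausible cause (my guess, not confirmed in this thread): on this machine the evaluator returned mAP as a multi-element tensor instead of a Python float, so mAP_1 > best_mAP produced a boolean tensor that cannot be used in an or-expression. A hedged sketch of a guard:

import torch

def as_scalar(x):
    # Reduce a metric to a plain float; a multi-element tensor here usually
    # means a vector (e.g. a CMC curve) was returned where mAP was expected.
    if torch.is_tensor(x):
        if x.numel() != 1:
            raise ValueError("expected scalar mAP, got shape %s" % (tuple(x.shape),))
        return x.item()
    return float(x)

# is_best = (as_scalar(mAP_1) > best_mAP) or (as_scalar(mAP_2) > best_mAP)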

MSMT17_V1

Hi, I can't download the MSMT17_V1 dataset. Can you send it to me? If necessary, I can sign some agreements. Thank you!

Stage-1 pre-train with target feature

Hi,
Thanks for releasing the code.
I have a question about the pre-training stage.
In PreTrainer, I cannot make sense of why there is a forward pass for the target input, since I did not find any usage of target_feature in training.

Thanks.

The performance on MSMT17

Hello, I have a question about the performance on MSMT17. I find that the performance depends on the version of your code; call the previous code V1 and the current code V2. The performance of V1 matches your paper, but the performance of V2 is worse than V1. I carefully compared V1 and V2 but found no difference except for iters changing from 800 to 400. The above is based on the k-means method. Even when iters is set to 800 in V2, the performance is still worse than in your paper.

Besides, I have another question: the time cost of V1 is much higher than that of V2 for the k-means method. I find this comes from the clustering time, beyond the effect of iters and the Jaccard distance. I would like to know why this is.

Thank you!

Question For MSMT17

I want to use the MSMT17 dataset; however, I cannot find 'list_train.txt', 'list_val.txt', 'list_query.txt', or 'list_gallery.txt'. Maybe something went wrong with my download. How can I fix this?

Question For Complementarity

I understand that you use two seeds to enhance complementarity. However, I would like to know the performance when the two networks are initialized identically. Thanks.

Does the classifier C^t need to be re-initialized after each clustering?

Both the paper you referenced, "Unsupervised Person Re-identification: Clustering and Fine-tuning", and another unsupervised paper, DeepCluster, mention that the classifier head's parameters need to be re-initialized because there is no mapping between two consecutive pseudo-label assignments.

But I cannot find such an operation in Algorithm 1 of your paper. Is it omitted, or does the proposed method not need to re-initialize the classifier head after each clustering?

Thanks!

How to include more datasets?

I am currently working on the SYSU dataset and already have results on it (through another algorithm). What changes do I need to make in the code to run MMT on SYSU? I made Sysu.py just like dukemtmc.py and placed the dataset in the same location as the other datasets. I have also registered it in the __init__.py (__factory) of the datasets folder. Whenever I try running sh scripts/pretrain.sh Sysu market1501 resnet50 1, it shows an error saying that only the default datasets "dukemtmc", "msmt17", "market1501" can be used.
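For what it's worth, a toy sketch (my addition; the factory structure is assumed from the description above, not copied from the repo) showing one way the lookup can still fail after registering the class: the dataset key is case-sensitive and must match the name passed to the script exactly:

# Toy stand-in for the __factory registry in the datasets package.
__factory = {
    'market1501': object,
    'dukemtmc': object,
    'msmt17': object,
    'sysu': object,   # newly registered entry
}

name = 'Sysu'   # what "sh scripts/pretrain.sh Sysu ..." would pass
if name not in __factory:
    # This mismatch ('Sysu' vs 'sysu') reproduces the "only default
    # datasets can be used" style of failure.
    print('Unknown dataset: %s; known: %s' % (name, sorted(__factory)))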

Pre-training on the source domain

Hi, I have a question. In pre-training on the source domain, there are these lines of code in the PreTrainer:

s_features, s_cls_out = self.model(s_inputs)
# target samples: only forward
t_features, _ = self.model(t_inputs)
# backward main #
loss_ce, loss_tr, prec1 = self._forward(s_features, s_cls_out, targets)
loss = loss_ce + loss_tr

I think the first line is necessary for the overall optimization, while the third line is not, since it is unrelated to the overall loss. However, the third line boosts the performance on the target dataset. As you pointed out, it is only a forward pass, and I do not understand why it helps. Can you help me? Thank you.
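One commonly offered explanation (my reading, not an authoritative answer from the author): a forward pass in train mode updates BatchNorm running statistics even without any backward pass, so forwarding target images adapts the BN statistics toward the target domain. A minimal demonstration:

import torch

bn = torch.nn.BatchNorm1d(4)
bn.train()
print(bn.running_mean)             # tensor([0., 0., 0., 0.])

with torch.no_grad():              # no loss, no backward
    bn(torch.randn(32, 4) + 5.0)   # a "target-domain" batch with shifted mean

print(bn.running_mean)             # moved toward 5.0 (scaled by BN momentum 0.1, so ~0.5)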

An error in the evaluator

Hello author, in cnn.py under the mmt feature_extraction folder, the function extract_cnn_feature calls outputs = model(inputs) and then accesses outputs.data and outputs.values. When I replace the ResNet backbone with another backbone, evaluation fails with an error saying my outputs have neither of these attributes.
Could you tell me where the outputs.data and outputs.values attributes are defined? I could not find them.
Thanks a lot for taking the time to answer.

AttributeError: 'ResNetIBN' object has no attribute 'module'

=> Loaded checkpoint 'logs/dukemtmcTOmarket1501/resnet_ibn50a-pretrain-1/model_best.pth.tar'
mismatch: classifier.weight torch.Size([702, 2048]) torch.Size([500, 2048])
missing keys in state_dict: {'classifier.weight'}
mismatch: classifier.weight torch.Size([702, 2048]) torch.Size([500, 2048])
missing keys in state_dict: {'classifier.weight'}

Traceback (most recent call last):
File "examples/mmt_train_kmeans.py", line 272, in
main()
File "examples/mmt_train_kmeans.py", line 131, in main
main_worker(args)
File "examples/mmt_train_kmeans.py", line 149, in main_worker
model_1, model_2, model_1_ema, model_2_ema = create_model(args)
File "examples/mmt_train_kmeans.py", line 107, in create_model
model_1_ema.module.classifier.weight.data.copy_(model_1.module.classifier.weight.data)
File "/root/miniconda/envs/berttext/lib/python3.6/site-packages/torch/nn/modules/module.py", line 591, in getattr
type(self).name, name))
AttributeError: 'ResNetIBN' object has no attribute 'module'
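A plausible cause (an assumption on my part): the model was built without DataParallel wrapping, e.g. in a single-GPU run, so there is no .module attribute to dereference. A hedged guard:

import torch.nn as nn

def unwrap(model):
    # DataParallel keeps the real network under .module; plain models do not.
    return model.module if isinstance(model, nn.DataParallel) else model

# unwrap(model_1_ema).classifier.weight.data.copy_(
#     unwrap(model_1).classifier.weight.data)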

Questions on the clustering algorithm

Hi, I recently read your paper and it is excellent work. However, I have some confusion about the clustering part. To my understanding, at every epoch you re-run the clustering algorithm to obtain a cluster ID for each sample. My question is: how do you ensure each cluster ID is consistently assigned to the same subset of samples, so that the model can be trained with hard pseudo-labels? Suppose three samples (x1, x2, x3) are grouped into two clusters (C1, C2): at epoch t, (x1, x2) belong to C1 and x3 to C2, but at epoch t+1 the clustering may put (x1, x2) in C2 and x3 in C1. This assignment ambiguity of cluster IDs would seem to make the model hard to train. Can you give me some hints? Thanks.

Low accuracy in pretrained model on Market-1501

Hello there,

I ran the "Prepare the Pretrained Models" with backbone ResNet50 on 4GPUs with batch_size 64. All settings are unchanged and the command is "sh scripts/pretrain.sh dukemtmc market1501 resnet50 1". However, I only get 40.7%/59.5% (mAP/Rank-1) on source dataset (DukeMTMC-reID), and 19.1%/42.6% on target dataset (Market-1501). Any idea why it might be happening, since the reported performance on target data is 31.8%/61.9%.

Some phenomena I observed:

  1. the triplet loss averages about 0.5 when training finishes
  2. the precision is about 100% when training finishes

Besides, I have some questions about settings

  1. the "margin" is set 0.0 in scripts/pretrain.sh, while the previous UDA methods use a "margin" of 0.5 (Sec 3.1 in paper)
  2. "iters" is set 100 in scripts/pretrain.sh.

Thanks a lot!

No module named 'mmt'

sh scripts/pretrain.sh dukemtmc market1501 resnet50 1
info: No module named 'mmt'
Thanks for your code; hoping for your reply.

A mismatch between the loss function and the corresponding code

There are several lines of code in the SoftTripletLoss function:
triple_dist = torch.stack((dist_ap, dist_an), dim=1)
triple_dist = F.log_softmax(triple_dist, dim=1)
mat_dist_ref = euclidean_dist(emb2, emb2)
dist_ap_ref = torch.gather(mat_dist_ref, 1, ap_idx.view(N,1).expand(N,N))[:,0]
dist_an_ref = torch.gather(mat_dist_ref, 1, an_idx.view(N,1).expand(N,N))[:,0]
triple_dist_ref = torch.stack((dist_ap_ref, dist_an_ref), dim=1)
triple_dist_ref = F.softmax(triple_dist_ref, dim=1).detach()
loss = (- triple_dist_ref * triple_dist).mean(0).sum()
return loss
I think it should be:
triple_dist = torch.stack((dist_ap, dist_an), dim=1)
triple_dist = F.log_softmax(triple_dist, dim=1)
mat_dist_ref = euclidean_dist(emb2, emb2)
dist_ap_ref = torch.gather(mat_dist_ref, 1, ap_idx.view(N,1).expand(N,N))[:,0]
dist_an_ref = torch.gather(mat_dist_ref, 1, an_idx.view(N,1).expand(N,N))[:,0]
triple_dist_ref = torch.stack((dist_ap_ref, dist_an_ref), dim=1)
triple_dist_ref = F.softmax(triple_dist_ref, dim=1).detach()
# loss = (- triple_dist_ref * triple_dist).mean(0).sum()
loss = (- triple_dist_ref[:,0] * triple_dist[:,0]).mean()
return loss

Your code computes: -log{exp(s_p)/[exp(s_p)+exp(s_n)]} - log{exp(s_n)/[exp(s_p)+exp(s_n)]}, with each term weighted by the detached reference softmax, where s_p = F(x_i)·F(x_{i,p}) and s_n = F(x_i)·F(x_{i,n}). This is not consistent with the loss in your paper.
My modified code computes only: -log{exp(s_p)/[exp(s_p)+exp(s_n)]}, which is consistent with your paper. However, the performance of my modified code is worse than your original code.
I cannot understand why.
I'm looking forward to your reply!

model.eval

Hi, I have been reading your code recently and have a question. I plan to use it for a classification problem. In the pre-training stage, when running evaluation, why does the model output only a [batch_size, 2048] tensor, instead of [batch_size, num_class] and [batch_size, 2048] as during training? I do not see this difference covered by the usual distinction between model.train() and model.eval(), and I have debugged it many times. I hope you can clarify.
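For context, a toy sketch (my own illustration of a common re-ID model pattern, not the repo's ResNet) of why eval mode can return only the 2048-d features while train mode also returns logits: the branch lives in the model's forward, keyed on self.training, rather than in anything model.eval() does to individual layers:

import torch
import torch.nn as nn

class ToyReID(nn.Module):
    def __init__(self, dim=2048, num_classes=500):
        super(ToyReID, self).__init__()
        self.backbone = nn.Linear(8, dim)              # stand-in for ResNet
        self.classifier = nn.Linear(dim, num_classes, bias=False)

    def forward(self, x):
        feat = self.backbone(x)
        if not self.training:
            return feat                                # eval: features only
        return self.classifier(feat), feat             # train: logits + features

net = ToyReID()
net.eval()
print(net(torch.randn(2, 8)).shape)                    # torch.Size([2, 2048])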

About experimental results

Hello author, I reproduced your code. With k-means, resnet50, and Duke→Market, I reached 68.7% mAP and 85.8% top-1, slightly below the 71.2% mAP / 87.7% top-1 reported in your paper. I used one GPU with batch size 16. Could the gap be due to the batch size?

GPU number

Does the number of GPUs, or the type of GPU, affect the accuracy achieved in the paper?
If I have 1 GPU instead of 4, what would happen when running the code?
Is it possible that we cannot reproduce the results of the paper?

Regarding the dataset

I have a doubt: are the images used for training only the bounding_box_train images, or do they include the bounding_box_test images too?

question about "delete anything about model_2 and model_2_ema"

when I tried to delete anything about model_2 and model_2_ema,errors occurred:
Traceback (most recent call last):
File "examples/mmt_train_kmeans.py", line 287, in
main()
File "examples/mmt_train_kmeans.py", line 132, in main
main_worker(args)
File "examples/mmt_train_kmeans.py", line 201, in main_worker
trainer = MMTTrainer(model_1, model_1_ema, num_cluster=args.num_clusters, alpha=args.alpha)
TypeError: __init__() missing 2 required positional arguments: 'model_1_ema' and 'model_2_ema'

The following is MMTTrainer:

class MMTTrainer(object):
    def __init__(self, model_1, model_1_ema, num_cluster=500, alpha=0.999):
        super(MMTTrainer, self).__init__()
        self.model_1 = model_1
        #self.model_2 = model_2
        self.num_cluster = num_cluster

        self.model_1_ema = model_1_ema
        #self.model_2_ema = model_2_ema

BUT, very strangely, this MMTTrainer no longer takes the parameters model_2 and model_2_ema at all.
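One possible explanation (purely a guess, not confirmed in this thread): if the package was installed, for example via python setup.py install, the script may be importing a stale installed copy of mmt with the old 4-argument signature rather than the edited source tree. A quick way to check which file actually defines the class:

import inspect
from mmt.trainers import MMTTrainer   # assumes MMTTrainer lives in mmt/trainers.py

# If this path points into site-packages instead of your working copy,
# your edits are not the code being run; reinstall with pip install -e .
print(inspect.getsourcefile(MMTTrainer))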

Parameters for resnet50IBNa

Hi,

Thank you for the great work!
I tried your code and things worked well; for ResNet-50 in particular, the result looked good. But my mAP evaluation curve for ResNet-50IBNa (700) saturates at around mAP=55, which is lower than the ResNet-50 result and surprising, since the reported results are better with ResNet-50IBNa.
For this reason, I would like to ask whether the training hyperparameters for ResNet-50IBNa are the same as in train.sh; if not, would you mind sharing them, please?

PS: I used 1 GPU for the moment, so num_instances=1.
Thank you again, and it is indeed a great work!
Congrats!

Low accuracy in Sysu

Hello there.

This is not an issue; I am asking for suggestions. I ran MMT on Market-1501 and SYSU (a mix of RGB and IR images). I am getting very low mAP and CMC scores. The individual accuracy of k-means on Market is in the high 80s-90s, whereas on SYSU it is around 0.7% at rank 5. Any idea why this might be happening?

A question about clustering

Thank you for sharing this work!

Actually, I have a question about the clustering process in the mmt_train_kmeans.py file.

dict_f, _ = extract_features(model_1_ema, cluster_loader, print_freq=50)
cf_1 = torch.stack(list(dict_f.values())).numpy()
dict_f, _ = extract_features(model_2_ema, cluster_loader, print_freq=50)
cf_2 = torch.stack(list(dict_f.values())).numpy()
cf = (cf_1+cf_2)/2

Here I find that the mean-nets of model 1 and model 2 are used to generate features for clustering and for initializing the classifiers. But in the MMT paper, it seems that the current models (model 1 and model 2) at every epoch, rather than the mean-nets, are used to compute features for clustering. Does using the mean-nets here give better performance? Could you give some explanation? Thanks!

SoftTripletLoss margin

Hi,
first of all thanks for sharing such great code.
Can I ask the meaning of setting margin=0.0 versus margin=None in the MMTTrainer?
Inspecting the SoftTripletLoss class, it looks like margin=None implements equation 8 in the paper, while setting the margin to any float boils it down to equation 6. Am I correct?
Thank you very much.

self.criterion_tri = SoftTripletLoss(margin=0.0).cuda()

self.criterion_tri_soft = SoftTripletLoss(margin=None).cuda()

A question about precision

I ran your source code (DBSCAN) directly; all hyperparameters follow the paper and the GitHub repository. When using a Tesla V100, the highest accuracy on Duke→Market is 68.5% mAP. When I use two 2080Ti GPUs, the accuracy reaches 73% mAP. Why is this? Maybe BN? Maybe the triplet loss? My lab's resources do not allow me to use multiple GPUs all the time. I look forward to your reply.
