zyk100 / llcm

[CVPR 2023] Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification

Home Page: https://github.com/ZYK100/LLCM/blob/main/Agreement/LLCM%20DATASET%20RELEASE%20AGREEMENT.pdf

Languages: Python 99.77%, Shell 0.23%
Topics: cross-modality, cvpr2023, dataset, low-light, llcm, vireid, person-re-identification, t-sne, visible-infrared

llcm's Introduction

Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification

Authors: Yukang Zhang, Hanzi Wang*

Paper (CVPR 2023).

Abstract

For the visible-infrared person re-identification (VIReID) task, one of the major challenges is the modality gap between visible (VIS) and infrared (IR) images. However, the training samples are usually limited while the modality gap is large, so existing methods cannot effectively mine diverse cross-modality clues. To handle this limitation, we propose a novel augmentation network in the embedding space, called the diverse embedding expansion network (DEEN). The proposed DEEN effectively generates diverse embeddings to learn informative feature representations and reduce the modality discrepancy between VIS and IR images. Moreover, a VIReID model may be seriously affected by drastic illumination changes, while all existing VIReID datasets are captured under sufficient illumination without significant light changes. Thus, we provide a low-light cross-modality (LLCM) dataset, which contains 46,767 bounding boxes of 1,064 identities captured by 9 RGB/IR cameras. Extensive experiments on the SYSU-MM01, RegDB and LLCM datasets show the superiority of the proposed DEEN over several other state-of-the-art methods.

Dataset download

Please send a signed copy of the dataset release agreement to [email protected]. If your application is approved, we will send you the download link for the dataset.


Results

We have made some updates to the results reported in our paper on the LLCM dataset. Please cite the results in the table below. Each method is evaluated under both LLCM test modes, shown as two groups of Rank@1/Rank@10/Rank@20/mAP columns.

| Methods | Rank@1 | Rank@10 | Rank@20 | mAP | Rank@1 | Rank@10 | Rank@20 | mAP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DDAG | 42.36% | 72.69% | 80.63% | 48.97% | 51.42% | 81.45% | 88.26% | 38.77% |
| CMAlign | 42.76% | 77.40% | 86.11% | 50.95% | 54.78% | 85.12% | 91.63% | 40.81% |
| AGW | 49.13% | 79.06% | 85.89% | 55.80% | 63.72% | 88.66% | 92.83% | 47.21% |
| CAJ | 49.86% | 78.91% | 85.83% | 56.40% | 63.73% | 87.95% | 92.41% | 47.71% |
| MMN | 50.14% | 79.81% | 87.27% | 56.66% | 63.97% | 88.66% | 93.05% | 48.47% |
| MRCN | 51.32% | 80.10% | 87.17% | 57.74% | 65.27% | 88.11% | 93.13% | 49.45% |
| DART | 52.97% | 80.82% | 87.05% | 59.28% | 65.33% | 89.42% | 93.33% | 51.13% |
| DEEN (ours) | 55.52% | 83.88% | 89.98% | 62.07% | 69.21% | 90.95% | 95.07% | 55.52% |

The results may fluctuate slightly due to random splitting, and they might improve with further hyper-parameter tuning.

Visualization

(t-SNE visualization of the learned features)

Citation

If you use the dataset, please cite the following paper:

  @InProceedings{Zhang_2023_CVPR,
    author    = {Zhang, Yukang and Wang, Hanzi},
    title     = {Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-Identification},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {2153-2162}
}

Contact

If you have any questions, please feel free to contact us. E-mail: [email protected]

llcm's People

Contributors: zyk100

llcm's Issues

LLCM dataset

Hello, when I ran the test for the other mode of the LLCM dataset with your DEEN network, the result differed quite a lot from training. The training accuracy is normal, but the test results are somewhat different.

Question about testing

Hi, thanks for your great work.
I would like to know which of the following is the correct setting in VIReID:

  1. Train a SINGLE model to handle both test modes, VIS->IR and IR->VIS; or
  2. Train a separate model for each mode, i.e., one model only handles IR->VIS and another only handles VIS->IR.

Thanks for your response.
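For context, here is a minimal sketch of how a single trained model is commonly evaluated in both directions in VIReID (the function and variable names below are illustrative, not this repo's API):

import torch
import torch.nn.functional as F

def rank1(query_feat, query_ids, gallery_feat, gallery_ids):
    # Cosine-similarity ranking: for each query, check whether the top-ranked
    # gallery sample has the same identity.
    q = F.normalize(query_feat, dim=1)
    g = F.normalize(gallery_feat, dim=1)
    sim = q @ g.t()                                   # (num_query, num_gallery)
    top1 = sim.argmax(dim=1)
    return (gallery_ids[top1] == query_ids).float().mean().item()

# One model, two test modes: extract vis_feat/ir_feat once, then swap the roles
# of query and gallery.
# r1_ir2vis = rank1(ir_feat, ir_ids, vis_feat, vis_ids)
# r1_vis2ir = rank1(vis_feat, vis_ids, ir_feat, ir_ids)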

softmax.yml?

FileNotFoundError: [Errno 2] No such file or directory: 'configs/softmax.yml'
Where is the softmax.yml file?

Exp on RegDB

Hi, thanks for your work and codes.

I'm conducting experiments on RegDB, and I have some questions:

  • On RegDB, do I need to train the model 10 times, once for each of trials 1~10?
  • To reproduce the results reported in your paper, do I need to tune any hyper-parameters for RegDB?

Thank you~

dependencies required by the project

Would it be possible for you to provide a requirements.txt file so that I can easily install the dependencies required by the project? Your assistance in this matter would be greatly appreciated.
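For reference, a hedged sketch of a requirements.txt inferred only from the code and tracebacks visible on this page plus their typical companions (versions are left unpinned because they are not stated anywhere here; the maintainer's actual dependency list may differ):

torch
torchvision
numpy
scikit-learn
matplotlib
Pillow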

DEE_module code

import torch
import torch.nn as nn
import torch.nn.functional as F

# weights_init_kaiming is the initialization helper defined elsewhere in the repo.

class DEE_module(nn.Module):
    def __init__(self, channel, reduction=16):
        super(DEE_module, self).__init__()

        # First generation branch: three dilated 3x3 convolutions averaged, then a 1x1 projection.
        self.FC11 = nn.Conv2d(channel, channel // 4, kernel_size=3, stride=1, padding=1, bias=False, dilation=1)
        self.FC11.apply(weights_init_kaiming)
        self.FC12 = nn.Conv2d(channel, channel // 4, kernel_size=3, stride=1, padding=2, bias=False, dilation=2)
        self.FC12.apply(weights_init_kaiming)
        self.FC13 = nn.Conv2d(channel, channel // 4, kernel_size=3, stride=1, padding=3, bias=False, dilation=3)
        self.FC13.apply(weights_init_kaiming)
        self.FC1 = nn.Conv2d(channel // 4, channel, kernel_size=1)
        self.FC1.apply(weights_init_kaiming)

        # Second generation branch with the same structure.
        self.FC21 = nn.Conv2d(channel, channel // 4, kernel_size=3, stride=1, padding=1, bias=False, dilation=1)
        self.FC21.apply(weights_init_kaiming)
        self.FC22 = nn.Conv2d(channel, channel // 4, kernel_size=3, stride=1, padding=2, bias=False, dilation=2)
        self.FC22.apply(weights_init_kaiming)
        self.FC23 = nn.Conv2d(channel, channel // 4, kernel_size=3, stride=1, padding=3, bias=False, dilation=3)
        self.FC23.apply(weights_init_kaiming)
        self.FC2 = nn.Conv2d(channel // 4, channel, kernel_size=1)
        self.FC2.apply(weights_init_kaiming)

        self.dropout = nn.Dropout(p=0.01)

    def forward(self, x):
        x1 = (self.FC11(x) + self.FC12(x) + self.FC13(x)) / 3
        x1 = self.FC1(F.relu(x1))
        x2 = (self.FC21(x) + self.FC22(x) + self.FC23(x)) / 3
        x2 = self.FC2(F.relu(x2))
        # The input and the two generated embeddings are concatenated along the batch dimension.
        out = torch.cat((x, x1, x2), 0)
        out = self.dropout(out)
        return out

In this code, are only two branches computed? The paper describes three branches.
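A quick shape check (a minimal sketch, assuming the DEE_module definition above and the repo's weights_init_kaiming helper are importable) shows that the output stacks the original embedding with the two generated ones along the batch dimension, i.e., three parts in total:

import torch

dee = DEE_module(channel=64)
x = torch.randn(8, 64, 24, 9)   # (batch, channel, height, width)
out = dee(x)
print(out.shape)                # torch.Size([24, 64, 24, 9]): x, x1 and x2 concatenated on dim 0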

About the experimental results

Hello, why do your code's results on the SYSU-MM01 dataset come out to 71.08% on my machine, about 3 points lower than in your paper?
I am not sure whether some extra parameter settings are needed.
Also, in your code img_w and img_h are set to 144 and 384; could you explain why? (On my machine I set them to 192 and 384, and I wonder whether that is why my results differ so much.)


A minor mistake in the code

Hi, thanks for your great contribution~
There may be a minor mistake in the training code.

Perhaps we should change:

LLCM/DEEN/train.py

Lines 274 to 275 in cd72993

labs = torch.cat((label1, label1, label2, label2), 0)
labels = torch.cat((label1, label1, label1, label2, label2, label2), 0)

to

labs = torch.cat((label1, label2, label1, label2), 0)
labels = torch.cat((label1, label2, label1, label2, label1, label2), 0)

This doesn't influence the training process, because label1 == label2.
However, the original code is logically inconsistent.
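A minimal check of the claim above (assuming, as stated, that label1 == label2 within a batch):

import torch

label1 = torch.tensor([3, 7, 12, 3])
label2 = label1.clone()   # per the issue, label1 == label2 in each training batch

original = torch.cat((label1, label1, label2, label2), 0)
proposed = torch.cat((label1, label2, label1, label2), 0)
print(torch.equal(original, proposed))   # True, so training is unaffected either way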

Hello, about the t-SNE visualization: isn't this part of the provided code wrong?

# Pasted from the provided visualization code; trainloader, plot_embedding,
# data_time and end come from the surrounding script.
for batch_idx, (input1, input2, label1, label2) in enumerate(trainloader):
    labels = torch.cat((label1, label2), 0)
    z1 = torch.ones(label1.shape)
    z2 = torch.zeros(label2.shape)
    z = torch.cat((z1, z2), 0)
    print(batch_idx)
    input1 = Variable(input1.cuda())
    input2 = Variable(input2.cuda())

    # Remap identity labels to consecutive indices starting from 0.
    a = labels.unique()
    for i in range(len(a)):
        for j in range(len(labels)):
            if labels[j] == a[i]:
                labels[j] = i
    # print(labels)
    data_time.update(time.time() - end)
    out = torch.cat((input1, input2), 0)

    # t-SNE is run directly on the concatenated input tensors here.
    tsne = manifold.TSNE(n_components=2, init='pca', random_state=0)
    X_tsne = tsne.fit_transform(out.detach().cpu().numpy())
    plot_embedding(X_tsne, labels, z)
    plt.savefig(osp.join('save_tsne', 'tsne_{}.jpg'.format(batch_idx)))
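For reference, a hedged sketch of the variant the issue seems to expect, i.e., running t-SNE on features extracted by the trained model rather than on raw image tensors (the feature-extraction step is only described in a comment; the repo's actual forward signature may differ):

from sklearn import manifold

# feats: an (N, D) feature matrix produced by the trained network for the same
# batch; labels and z are built exactly as in the snippet above.
tsne = manifold.TSNE(n_components=2, init='pca', random_state=0)
feats_2d = tsne.fit_transform(feats.detach().cpu().numpy())
plot_embedding(feats_2d, labels, z)   # plot_embedding is the repo helper used above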

image size is not correct?

Does the dataset need to be pre-processed when the image size is not correct?

Original Traceback (most recent call last):
  File "/home/zhaoxin/anaconda3/envs/python39/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zhaoxin/anaconda3/envs/python39/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhaoxin/anaconda3/envs/python39/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhaoxin/projects/LLCM-main/DEEN/data_loader.py", line 28, in __getitem__
    img1 = self.transform(img1)
  File "/home/zhaoxin/anaconda3/envs/python39/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 95, in __call__
    img = t(img)
  File "/home/zhaoxin/anaconda3/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhaoxin/anaconda3/envs/python39/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 688, in forward
    i, j, h, w = self.get_params(img, self.size)
  File "/home/zhaoxin/anaconda3/envs/python39/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 647, in get_params
    raise ValueError(f"Required crop size {(th, tw)} is larger than input image size {(h, w)}")
ValueError: Required crop size (384, 144) is larger than input image size (276, 148)
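For reference, a hedged sketch of resizing images to at least the expected (384, 144) crop size before random cropping with torchvision (whether the official data pipeline is supposed to handle this, or the images should instead be pre-resized on disk, is not confirmed here):

import torchvision.transforms as T

# Resize first so that RandomCrop((384, 144)) always fits; the extra 10-pixel
# margin is an arbitrary choice for illustration.
transform = T.Compose([
    T.Resize((384 + 10, 144 + 10)),
    T.RandomCrop((384, 144)),
    T.ToTensor(),
])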

regdb test result

Based on the author's provided train_regdb.bash script, I trained the model and used the following command for testing:

python test.py --tvsearch True --gpu 0 --dataset regdb

However, the results significantly differ from those reported in the author's paper. Can you explain why this discrepancy might occur?

extract.py

Thank you very much for your work. When running extract.py with the already-trained weights, the following error occurred, but it did not appear when running test.py. The command was python extract.py --dataset sysu --method agw --gpu 0; the weight file is sysu_agw_p4_n8_lr_0.1_seed_0_best.t
(error screenshot)

Dataset

Is the dataset folder LLCM/test_nir/cam3 empty?

About using extract.py

Hello, thank you very much for your work. When using the Visualization feature you provide, I got an error while running extract.py; it is actually the same as the now-closed issue #18.
The person who asked that issue offered me a workaround, but I think it would have some side effects. As for the operation you mentioned, "torch.cat() the features of the three branches output by DEEN", how should the code be modified to do that? I don't quite understand, and I would appreciate your help. Thanks~
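One possible reading of the quoted suggestion (an assumption about the intended change, not the author's confirmed code): since the DEE module stacks the original and two generated embeddings along the batch dimension, the three branches can be split apart again and concatenated along the feature dimension so that each image keeps a single feature vector:

import torch

# feat: (3 * B, D) features in which the DEE branches are stacked on dim 0.
f0, f1, f2 = torch.chunk(feat, 3, dim=0)
feat_cat = torch.cat((f0, f1, f2), dim=1)   # (B, 3 * D), one row per image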
