Rethinking Person Re-Identification via Semantic-Based Pretraining

Suncheng Xiang
Shanghai Jiao Tong University


arXiv paper: arxiv.org/pdf/2110.05074

Model Zoo, Usage Instructions and API docs: VTBR

VTBR uses transformers to learn visual representations from textual annotations; an overview of the framework is illustrated in the paper. Specifically, we jointly train a CNN-based network and a transformer-based network from scratch on image-caption pairs for the task of image captioning, then transfer the learned residual network to downstream Re-ID tasks. In general, our method seeks a common vision-language feature space with discriminative learning constraints for better practical deployment. VTBR matches or outperforms models that use ImageNet for pretraining, whether supervised or unsupervised, despite using up to 1.4x fewer images.

VTBR-model

Get the pretrained ResNet-50 visual backbone from our best-performing VirTex model in one line, without any installation!

Model Preparation

import torch

# That's it, this one line only requires PyTorch.
model = torch.hub.load("kdexd/virtex", "resnet50", pretrained=True)

The pretrained models in our model zoo have changed from v1.0 onwards.

Training the base model

python scripts/pretrain_virtex.py \
    --config configs/_base_bicaptioning_R_50_L1_H1024.yaml \
    --num-gpus-per-machine 8 \
    --cpu-workers 4 \
    --serialization-dir /tmp/VIRTEX_R_50_L1_H1024
    # Default: --checkpoint-every 2000 --log-every 20

After training completes, we use the learned model as the backbone for the downstream Re-ID task. Further details of the downstream training procedure are available in the open-source Re-ID codebase reid-strong-baseline.
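
The transfer step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the pretraining checkpoint stores its weights under a `model` key and that the visual backbone's parameters share the prefix `visual.cnn.`. Both names are assumptions and should be checked against the keys of the actual checkpoint. The helper `extract_backbone_state` is hypothetical.

```python
from typing import Any, Dict, Mapping


def extract_backbone_state(
    checkpoint: Mapping[str, Any],
    prefix: str = "visual.cnn.",  # assumed prefix of the backbone parameters
    model_key: str = "model",     # assumed key that holds the model weights
) -> Dict[str, Any]:
    """Return only the backbone weights, with `prefix` stripped from each key."""
    # If `model_key` is absent, treat the checkpoint itself as a flat state dict.
    state = checkpoint.get(model_key, checkpoint)
    backbone = {
        key[len(prefix):]: value
        for key, value in state.items()
        if key.startswith(prefix)
    }
    if not backbone:
        # Fail loudly rather than silently loading nothing downstream.
        raise KeyError(f"no parameters found with prefix {prefix!r}")
    return backbone
```

With PyTorch, the checkpoint would typically be read with `torch.load(path, map_location="cpu")`, and the returned dictionary passed to the Re-ID backbone's `load_state_dict(..., strict=False)`. Non-strict loading is suggested here because the downstream model usually has a different classification head.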

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 62301315 and by the Startup Fund for Young Faculty at SJTU (SFYF at SJTU) under Grant No. 23X010501967. If you have further questions or suggestions, please feel free to contact us ([email protected]).

If you find this code useful in your research, please consider citing:

@article{xiang2021rethinking,
  title={Rethinking Person Re-Identification via Semantic-Based Pretraining},
  author={Xiang, Suncheng and Gao, Jingsheng and Zhang, Zirui and Guan, Mengyuan and Yan, Binjie and Liu, Ting and Qian, Dahong and Fu, Yuzhuo},
  journal={arXiv preprint arXiv:2110.05074},
  year={2021}
}
