
u2mot's Introduction

u2mot

This repo is the official implementation of Uncertainty-aware Unsupervised Multi-Object Tracking

Abstract

Without manually annotated identities, unsupervised multi-object trackers struggle to learn reliable feature embeddings. As a result, the similarity-based inter-frame association stage also becomes error-prone, giving rise to an uncertainty problem. The uncertainty accumulated frame by frame prevents trackers from learning feature embeddings that remain consistent over time. Recent work adopts self-supervised techniques to avoid this uncertainty problem, but these fail to capture temporal relations, so the inter-frame uncertainty persists. This paper argues that although the uncertainty problem is inevitable, the uncertainty itself can be leveraged to improve the learned consistency in turn. Specifically, an uncertainty-based metric is developed to verify and rectify risky associations, and the resulting accurate pseudo-tracklets boost the learning of feature consistency. Moreover, accurate tracklets can incorporate temporal information into spatial transformations: a tracklet-guided augmentation strategy is proposed to simulate tracklet motion, with a hierarchical uncertainty-based sampling mechanism for hard-sample mining. The resulting unsupervised MOT framework, named U2MOT, proves effective on the MOT-Challenge and VisDrone-MOT benchmarks, achieving state-of-the-art performance among published supervised and unsupervised trackers.

Installation

Step 1. Install u2mot (verified with PyTorch 1.8.1).

git clone https://github.com/alibaba/u2mot.git
cd u2mot
python -m pip install -r requirements.txt
python setup.py develop

Step 2. Install pycocotools.

python -m pip install cython
python -m pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

Step 3. Install other dependencies.

python -m pip install cython_bbox
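
Optionally, you can run a quick sanity check to confirm that the environment is set up correctly (a minimal sketch; the printed versions depend on your machine, and only PyTorch 1.8.1 has been verified by the authors):

# Optional sanity check of the installation.
import torch
import yolox  # made importable by `python setup.py develop`

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("yolox package location:", yolox.__file__)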

Data preparation

Download MOT17, MOT20, CrowdHuman, Cityperson, ETHZ, VisDrone-MOT, and BDD100K-MOT (optional), and put them under <u2mot_HOME>/datasets in the following structure:

datasets
   |——————MOT17/images
   |        └——————train
   |        └——————test
   └——————MOT20/images
   |        └——————train
   |        └——————test
   └——————CrowdHuman
   |         └——————images
   |         └——————annotation_train.odgt
   |         └——————annotation_val.odgt
   └——————Cityscapes
   |        └——————images
   |        └——————labels_with_ids
   └——————ETHZ
   |        └——————eth01
   |        └——————...
   |        └——————eth07
   └——————VisDrone-MOT
   |        └——————VisDrone2019-MOT-train
   |        └——————VisDrone2019-MOT-val
   |        └——————VisDrone2019-MOT-test-dev
   └——————BDD100K-MOT (optional)
            └——————images
            └——————labels

Then, you need to convert the datasets to COCO format; the results will be saved in datasets/<dataset>/annotations:

python tools/data/convert_mot17_to_coco.py
python tools/data/convert_mot20_to_coco.py
python tools/data/convert_crowdhuman_to_coco.py
python tools/data/convert_cityperson_to_coco.py
python tools/data/convert_ethz_to_coco.py
python tools/data/convert_visdrone_to_coco.py
python tools/data/convert_bdd100k_to_coco.py
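
For reference, the converted annotations follow the standard COCO detection layout, extended with tracking-related fields. The sketch below is a hand-written illustration only; field names such as frame_id, video_id, and track_id are assumptions based on common MOT-to-COCO converters, so check the conversion scripts for the exact schema.

# Illustrative structure of a converted annotation file (not produced verbatim by the scripts).
coco_example = {
    "images": [
        {"id": 1, "file_name": "MOT17-02-FRCNN/img1/000001.jpg",
         "height": 1080, "width": 1920,
         "frame_id": 1, "video_id": 1},          # assumed tracking-specific fields
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [912.0, 484.0, 97.0, 109.0],     # [x, y, w, h]
         "area": 97.0 * 109.0, "iscrowd": 0,
         "track_id": 1},                          # assumed identity field
    ],
    "categories": [{"id": 1, "name": "pedestrian"}],
}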

Before mixing different datasets, you need to follow the operations in tools/data/mix_data_xxx.py to create data folders and soft-links. Finally, you can mix the training data for the MOT-Challenge benchmarks (no extra training data is needed for the VisDrone-MOT and BDD100K-MOT benchmarks):

python tools/data/mix_data_test_mot17.py
python tools/data/mix_data_test_mot20.py

Model zoo

The pre-trained weights are provided below:

Benchmark   Split   Weights   HOTA   MOTA   IDF1   IDs
MOT17       test    link      64.5   80.9   78.8   1590
MOT20       test    link      62.7   76.7   76.5   1552
VisDrone    test    link      -      55.9   70.0   1282
BDD100K     val     link      58.7   62.9   68.9   16191
BDD100K     test    link      58.3   63.0   69.2   29985

Since we adopt several tracking tricks from BoT-SORT, some of the results are slightly different from the performance reported in the paper.

For better results, you may carefully tune the tracking parameters of each sequence, including the detection score threshold, matching threshold, etc.
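
As a purely hypothetical illustration, per-sequence parameters could be kept in a small lookup table and passed to the tracker at inference time (the names and values below are made up for the example):

# Hypothetical per-sequence tracking parameters (illustration only).
SEQ_PARAMS = {
    "MOT17-01-FRCNN": {"det_thresh": 0.6, "match_thresh": 0.8},
    "MOT17-03-FRCNN": {"det_thresh": 0.5, "match_thresh": 0.9},
}
DEFAULT_PARAMS = {"det_thresh": 0.6, "match_thresh": 0.8}

def params_for(seq_name):
    """Return tuned parameters for a sequence, falling back to defaults."""
    return SEQ_PARAMS.get(seq_name, DEFAULT_PARAMS)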

Training

The COCO pretrained YOLOX model can be downloaded from their model zoo. After downloading the pretrained models, you can put them under <u2mot_HOME>/pretrained.

  • Train ablation model (MOT17 half train)
python tools/train.py -f exps/example/u2mot/yolox_x_ablation_u2mot17.py -d 2 -b 12 --fp16 -o -c pretrained/yolox_x.pth.tar
  • Train MOT17 test model (MOT17 train, CrowdHuman, Cityperson and ETHZ)
python tools/train.py -f exps/example/u2mot/yolox_x_mix_u2mot17.py -d 2 -b 12 --fp16 -o -c pretrained/yolox_x.pth.tar

For MOT20, VisDrone-MOT, and BDD100K-MOT, you need to clip the bounding boxes so they stay inside the image. Specifically, set _clip=True in yolox/data/data_augment.py:line177, yolox/data/datasets/mosaicdetection.py:line86/358, and yolox/utils/boxes.py:line149.
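
Conceptually, the clipping just constrains box coordinates to the image bounds; a minimal NumPy sketch of the idea (not the repository code) is shown below:

import numpy as np

def clip_boxes(boxes, img_h, img_w):
    """Clip [x1, y1, x2, y2] boxes so they stay inside the image (illustrative)."""
    boxes = np.asarray(boxes, dtype=np.float32).copy()
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, img_w - 1)  # x1, x2
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, img_h - 1)  # y1, y2
    return boxes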

  • Train MOT20 test model (MOT20 train, CrowdHuman)
python tools/train.py -f exps/example/u2mot/yolox_x_mix_u2mot20.py -d 2 -b 12 --fp16 -o -c pretrained/yolox_x.pth.tar
  • Train VisDrone-MOT test model
python tools/train.py -f exps/example/u2mot/yolox_x_u2mot_visdrone.py -d 2 -b 12 --fp16 -o -c pretrained/yolox_x.pth.tar
  • Train BDD100K-MOT test model
python tools/train.py -f exps/example/u2mot/yolox_x_u2mot_bdd100k.py -d 2 -b 12 --fp16 -o -c pretrained/yolox_x.pth.tar
  • Train with frozen detector

To alleviate the joint optimization problem between the detection head and the ReID head, you can first train the detector (following ByteTrack) and then train only the ReID head:

python tools/train.py -f exps/example/u2mot/your_exp_file.py -d 2 -b 12 --fp16 -o -c pretrained/your_detector.pth.tar --freeze
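
Conceptually, the --freeze option keeps the detector weights fixed so that only the ReID branch receives gradients. The PyTorch sketch below illustrates the idea under the assumption that ReID parameters can be identified by name; it is not the exact implementation.

import torch

def freeze_detector(model):
    """Disable gradients for everything except the ReID head (illustrative only)."""
    for name, param in model.named_parameters():
        # 'reid' is an assumed naming convention; adapt it to the real module names.
        param.requires_grad = "reid" in name.lower()
    # Only optimize the parameters that still require gradients.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)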
  • Train on custom dataset

First, you need to prepare your dataset in COCO format; you can refer to MOT-to-COCO or VisDrone-to-COCO. Second, you need to create an Exp file for your dataset; you can refer to the MOT17 training Exp file. Don't forget to modify get_data_loader() and get_eval_loader() in your Exp file. Third, modify the img_path2seq(), check_period(), and get_frame_cnt() functions in yolox/data/datasets/mot.py so that they parse your image paths and video info (see the sketch after the training command below). Finally, you can train u2mot on your dataset by running:

python tools/train.py -f exps/example/u2mot/your_exp_file.py -d 2 -b 12 --fp16 -o -c pretrained/yolox_x.pth.tar
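
As a concrete illustration of the path-parsing hooks mentioned above, the sketch below shows what img_path2seq(), check_period(), and get_frame_cnt() might look like for a MOT17-style directory layout. It is a hedged example only; the real functions in yolox/data/datasets/mot.py may have different signatures and semantics.

import glob
import os

def img_path2seq(img_path):
    """Map '.../MOT17-02-FRCNN/img1/000001.jpg' to its sequence name (illustrative)."""
    return os.path.basename(os.path.dirname(os.path.dirname(img_path)))

def check_period(img_path, interval=1):
    """Decide whether this frame is sampled for tracklet-level supervision (illustrative)."""
    frame_id = int(os.path.splitext(os.path.basename(img_path))[0])
    return frame_id % interval == 0

def get_frame_cnt(img_path):
    """Count how many frames the sequence of this image contains (illustrative)."""
    return len(glob.glob(os.path.join(os.path.dirname(img_path), "*.jpg")))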

Tracking

  • Evaluation on MOT17 half val

Performance on MOT17 half val is evaluated with the official TrackEval (the configured code has been provided at <u2mot_HOME>/TrackEval).

First, run u2mot to get the tracking results, which will be saved in pretrained/u2mot/track_results:

python tools/track.py datasets/MOT17/images/train --benchmark MOT17-val -f exps/example/u2mot/yolox_x_ablation_u2mot17.py -c pretrained/u2mot/ablation.pth.tar --device 0 --fp16 --fuse

To leverage UTL in the inference stage, just add the --use-uncertainty flag:

python tools/track.py datasets/MOT17/images/train --benchmark MOT17-val -f exps/example/u2mot/yolox_x_ablation_u2mot17.py -c pretrained/u2mot/ablation.pth.tar --device 0 --fp16 --fuse --use-uncertainty

Then, run TrackEval to evaluate the tracking performance:

cd ./TrackEval
python scripts/run_mot_challenge.py --BENCHMARK MOT17 --SPLIT_TO_EVAL train --TRACKERS_TO_EVAL ../../../../../YOLOX_outputs/yolox_x_ablation_u2mot17 --TRACKER_SUB_FOLDER track_res --OUTPUT_SUB_FOLDER track_eval --METRICS HOTA CLEAR Identity --USE_PARALLEL False --NUM_PARALLEL_CORES 1 --SEQMAP_FILE data/gt/mot_challenge/seqmaps/MOT17-train.txt --GT_LOC_FORMAT '{gt_folder}/{seq}/gt/gt.half_val.txt' --PRINT_CONFIG False --PRINT_ONLY_COMBINED True --DISPLAY_LESS_PROGRESS True
  • Test on MOT17

Run u2mot, and the results will be saved in YOLOX_outputs/yolox_x_mix_u2mot17/track_res:

python tools/track.py datasets/MOT17/images/test --benchmark MOT17 -f exps/example/u2mot/yolox_x_mix_u2mot17.py -c pretrained/u2mot/mot17.pth.tar --device 0 --fp16 --fuse --cmc-method file --cmc-file-dir MOTChallenge
python tools/interpolation.py --txt_path YOLOX_outputs/yolox_x_mix_u2mot17/track_res --save_path YOLOX_outputs/yolox_x_mix_u2mot17/track_res_dti

Submit the txt files under track_res_dti to the MOTChallenge website for evaluation.
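
For reference, the interpolation step fills short gaps in each tracklet by linearly interpolating boxes between the surrounding detections. The sketch below captures the basic idea with NumPy; the actual tools/interpolation.py script may apply extra constraints such as a maximum gap length.

import numpy as np

def interpolate_tracklet(frame_ids, boxes):
    """Linearly interpolate [x, y, w, h] boxes for missing frame ids (illustrative)."""
    frame_ids = np.asarray(frame_ids)
    boxes = np.asarray(boxes, dtype=np.float32)
    full_frames = np.arange(frame_ids.min(), frame_ids.max() + 1)
    full_boxes = np.stack(
        [np.interp(full_frames, frame_ids, boxes[:, k]) for k in range(4)], axis=1
    )
    return full_frames, full_boxes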

  • Test on MOT20

We use an input size of 1600 x 896 for MOT20-04 and MOT20-07, and 1920 x 736 for MOT20-06 and MOT20-08.

Run u2mot:

python tools/track.py datasets/MOT20/images/test --benchmark MOT20 -f exps/example/u2mot/yolox_x_mix_u2mot20.py -c pretrained/u2mot/mot20.pth.tar --device 0 --fp16 --fuse --cmc-method file --cmc-file-dir MOTChallenge
python tools/interpolation.py  --txt_path YOLOX_outputs/yolox_x_mix_u2mot20/track_res --save_path YOLOX_outputs/yolox_x_mix_u2mot20/track_res_dti

Submit the txt files under track_res_dti to the MOTChallenge website for evaluation.

  • Test on VisDrone-MOT

We use an input size of 1600 x 896 for the VisDrone-MOT benchmark. Following TrackFormer, the performance is evaluated with the default motmetrics.

Run u2mot, and the results will be saved at YOLOX_outputs/yolox_x_u2mot_visdrone/track_res:

python tools/track.py datasets/VisDrone-MOT/VisDrone2019-MOT-test-dev/sequences --benchmark VisDrone -f exps/example/u2mot/yolox_x_u2mot_visdrone.py -c pretrained/u2mot/visdrone.pth.tar --device 0 --fp16 --fuse --cmc-method file --cmc-file-dir VisDrone/test-dev

Evaluate the results:

python tools/utils/eval_visdrone.py

You will get the performance in terms of MOTA, IDF1, and ID switches.
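
Under the hood, a default motmetrics evaluation boils down to accumulating IoU-based matches per sequence. The minimal py-motmetrics sketch below assumes MOT-format ground-truth and prediction txt files with placeholder paths; the actual eval_visdrone.py script may differ in its details.

import motmetrics as mm

# Placeholder paths; one ground-truth and one prediction file per sequence.
gt = mm.io.loadtxt("gt.txt", fmt="mot15-2D", min_confidence=1)
pred = mm.io.loadtxt("pred.txt", fmt="mot15-2D")

acc = mm.utils.compare_to_groundtruth(gt, pred, "iou", distth=0.5)
mh = mm.metrics.create()
summary = mh.compute(acc, metrics=["mota", "idf1", "num_switches"], name="sequence")
print(mm.io.render_summary(summary, formatters=mh.formatters,
                           namemap=mm.io.motchallenge_metric_names))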

  • Test on BDD100K-MOT

Run u2mot, and the results will be saved at YOLOX_outputs/yolox_x_u2mot_bdd100k/track_res:

python tools/track.py datasets/BDD100K-MOT/images/test --benchmark BDD100K -f exps/example/u2mot/yolox_x_u2mot_bdd100k.py -c pretrained/u2mot/bdd100k.pth.tar --device 0 --fp16 --fuse --cmc-method file --cmc-file-dir BDD100K/{split}

Then, convert the result files into JSON format, zip those JSON files, and submit the archive to the official EvalAI website to get the tracking performance:

python tools/utils/convert_bdd.py
cd YOLOX_outputs/yolox_x_u2mot_bdd100k
zip -r -q bdd100k_pred.zip ./track_res_json

Be careful to evaluate on val and test splits separately.
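
For reference, each converted json file stores per-frame results in the BDD100K (Scalabel) style, roughly as sketched below. This is a hand-written illustration with placeholder values; check tools/utils/convert_bdd.py for the exact fields it writes.

# Illustrative BDD100K-MOT result entry for a single frame (placeholder values).
frame_result = {
    "videoName": "b1c66a42-6f7d68ca",
    "name": "b1c66a42-6f7d68ca-0000001.jpg",
    "frameIndex": 0,
    "labels": [
        {
            "id": "1",                      # track identity, kept consistent across frames
            "category": "pedestrian",
            "box2d": {"x1": 100.0, "y1": 200.0, "x2": 150.0, "y2": 320.0},
            "score": 0.92,
        }
    ],
}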

Citation

@inproceedings{liu2023u2mot,
  title={Uncertainty-aware Unsupervised Multi-Object Tracking},
  author={Liu, Kai and Jin, Sheng and Fu, Zhihang and Chen, Ze and Jiang, Rongxin and Ye, Jieping},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}

Acknowledgement

A large part of the code is borrowed from YOLOX, FairMOT, ByteTrack, ByteTrack_ReID, and BoT-SORT. Many thanks for their wonderful work.

u2mot's Issues

How to train on unlabeled data?

How can we train on unlabeled custom data? Your instructions mentioned that we have to convert the dataset to COCO annotations, but I don't have track ID labels for my task; I have the bounding boxes from Yolo.

Please let me know. Thanks.

Eval result on VisDrone test set when using VisDrone2018-MOT-toolkit

Hello, I really like your paper, but I have a question. Regarding the validation script on the VisDrone dataset, are you using the py-motmetrics library for validation instead of the official VisDrone MATLAB toolkit (https://github.com/VisDrone/VisDrone2018-MOT-toolkit)? I tested your model and obtained a MOTA of 55.9, but when using the MATLAB toolkit, the MOTA is only 50.2. It seems that there may be some differences in the validation process between the two libraries, and I haven't found the reason.

May I ask where is the appendix of the paper?

Many thanks for your outstanding work.

Could you release the appendix? In particular, I wonder how the ReID head is designed.

Also, it seems that I cannot find the code for ReID and the trajectory-guided augmentation.
