co-mot's Introduction

CO-MOT: Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object Tracking


This repository is an official implementation of CO-MOT.

TO DO

  1. add DINO backbone

Introduction

Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object Tracking.

Abstract. Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not yet surpassed non-end-to-end tracking-by-detection methods. One potential reason is the label assignment strategy used during training, which consistently binds tracked objects to tracking queries and assigns the few newborn objects to detection queries. With one-to-one bipartite matching, such an assignment yields unbalanced training, i.e., scarce positive samples for detection queries, especially in enclosed scenes where the majority of newborns appear at the beginning of the video. As a result, e2e-MOT is more prone to producing terminated tracks that are never renewed or re-initialized, compared to tracking-by-detection methods. To alleviate this problem, we present CO-MOT, a simple and effective method that improves e2e-MOT through a novel coopetition label assignment with a shadow concept. Specifically, we add tracked objects to the matching targets of detection queries when performing label assignment for training the intermediate decoders. For query initialization, we expand each query into a set of shadow counterparts with limited disturbance to itself. With extensive ablations, CO-MOT achieves superior performance without extra cost, e.g., 69.4% HOTA on DanceTrack and 52.8% TETA on BDD100K. Impressively, CO-MOT requires only 38% of the FLOPs of MOTRv2 to attain similar performance, resulting in 1.4× faster inference.
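
To make the coopetition label assignment (COLA) and the shadow-query idea concrete, here is a minimal sketch. It is not the repository's implementation; the function names are illustrative only, and g_size=3 mirrors the --g_size 3 flag used in the configs on this page.

```python
# A minimal sketch (not the repository's code) of the two ideas described in the abstract.
import torch

def expand_with_shadows(queries: torch.Tensor, g_size: int = 3, noise_scale: float = 1e-2) -> torch.Tensor:
    """Expand each query into a small group of 'shadow' copies with limited disturbance."""
    shadows = queries.repeat_interleave(g_size, dim=0)        # (N * g_size, C)
    return shadows + noise_scale * torch.randn_like(shadows)  # small perturbation per shadow

def coopetition_targets(tracked_ids: list, newborn_ids: list) -> list:
    """For intermediate decoder layers, detection queries are matched against
    tracked objects as well as newborns, instead of newborns only."""
    return list(tracked_ids) + list(newborn_ids)
```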

News

  • 2023.7.25 Released weights for BDD100K and MOT17.
  • 2023.6.28 Using our method, we achieved second place in the CVSports challenge at CVPR 2023 (69.54 HOTA on SoccerNet).
  • 2023.5.31 Our code has been merged into detrex.
  • 2023.5.24 We released our code and paper.

Main Results

DanceTrack

| HOTA | DetA | AssA | MOTA | IDF1 | URL   |
|------|------|------|------|------|-------|
| 69.9 | 82.1 | 58.9 | 91.2 | 71.9 | model |

BDD100K

| TETA | LocA | AssocA | ClsA | URL   |
|------|------|--------|------|-------|
| 52.8 | 38.7 | 56.2   | 63.6 | model |

MOT17

| HOTA | DetA | AssA | MOTA | IDF1 | URL   |
|------|------|------|------|------|-------|
| 60.1 | 59.5 | 60.6 | 72.6 | 72.7 | model |

Installation

The codebase is built on top of Deformable DETR and MOTR.

Requirements

  • Install pytorch using conda (optional)

    conda create -n comot python=3.7
    conda activate comot
    conda install pytorch=1.8.1 torchvision=0.9.1 cudatoolkit=10.2 -c pytorch
  • Other requirements

    pip install -r requirements.txt
  • Build MultiScaleDeformableAttention

    cd ./models/ops
    sh ./make.sh
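
After make.sh finishes, you can optionally check that the CUDA extension is importable. This is a sketch only; the module path is assumed from the Deformable DETR/MOTR layout this codebase builds on.

```python
# Hypothetical sanity check: confirm the compiled MultiScaleDeformableAttention
# extension can be imported and the attention module instantiated.
import torch
from models.ops.modules import MSDeformAttn  # path assumed from the Deformable DETR layout

attn = MSDeformAttn(d_model=256, n_levels=4, n_heads=8, n_points=4)
print("MSDeformAttn OK; CUDA available:", torch.cuda.is_available())
```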

Usage

Dataset preparation

  1. Please download DanceTrack and CrowdHuman and unzip them as follows:
/data/Dataset/mot
├── crowdhuman
│   ├── annotation_train.odgt
│   ├── annotation_trainval.odgt
│   ├── annotation_val.odgt
│   └── Images
├── DanceTrack
│   ├── test
│   ├── train
│   └── val

You may use the following command to generate the CrowdHuman trainval annotation:

cat annotation_train.odgt annotation_val.odgt > annotation_trainval.odgt
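
If you want to verify the merge, the .odgt files are JSON-lines (one record per image), so a quick check like the following sketch should work (the path assumes the layout shown above):

```python
# Count records in the merged CrowdHuman annotation file (one JSON object per line).
import json

path = "/data/Dataset/mot/crowdhuman/annotation_trainval.odgt"
with open(path) as f:
    records = [json.loads(line) for line in f if line.strip()]
print(f"{len(records)} annotated images; first record keys: {sorted(records[0])}")
```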

Training

You may download the COCO-pretrained weights of Deformable DETR (+ iterative bounding box refinement) and set the --pretrained argument to the path of the weights. Then train CO-MOT on 8 GPUs as follows:

./tools/train.sh configs/motrv2ch_uni5cost3ggoon.args
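
The --pretrained entry lives inside that .args config file; as an example, it might look like the line below (this path is the one a user reports in an issue further down this page, so substitute your own download location):

```
--pretrained /pretrained/r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint.pth
```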

Inference on DanceTrack Test Set

# run a simple inference on our pretrained weights
./tools/simple_inference.sh configs/motrv2ch_uni5cost3ggoon.args ./motrv2_dancetrack.pth

# Or evaluate an experiment run
# ./tools/eval.sh exps/motrv2/run1

# then zip the results
zip motrv2.zip tracker/ -r

Acknowledgements

co-mot's People

Contributors

fengxiuyaun


co-mot's Issues

Question about the training.

Hi Yan, thanks for your great effort in making your work public. I have a question after running your script to train the CO-MOT model on DanceTrack. In motr_co.py, because 5 images act as one unit in the loop, the flag is_last is always True, so the GQIM module is never included in the training stage. At the same time, judging from the matching strategy, the training seems to be a detection task rather than a tracking task. Is my understanding correct? (If so, how is the tracking task trained?)
Thank you.

How to get the summarized HOTA?

Hi Yan, when the evaluation is called during the training process, there is an error:

    return float(res_eval[0]['MotChallenge2DBox']['']['COMBINED_SEQ']['pedestrian']['summaries'][0]['HOTA'])
KeyError: 'summaries'

Is there any modification to the TrackEval files when calculating the final HOTA?
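
As an aside, a defensive version of that lookup (a sketch only, walking the key path shown in the traceback above) would fall back to None instead of raising a KeyError when TrackEval's output has no 'summaries' entry:

```python
def extract_hota(res_eval):
    """Walk the nested TrackEval result dict along the path from the traceback,
    returning None when any level (e.g. 'summaries') is missing."""
    node = res_eval[0]
    for key in ('MotChallenge2DBox', '', 'COMBINED_SEQ', 'pedestrian', 'summaries'):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return float(node[0]['HOTA']) if node else None
```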

About the baseline in Ablation Table 2(a).

Hi, your method has really impressive results in extensive benchmarks. I've read your paper and have some questions about the baseline you mentioned in Table 2(a) row (a).

I evaluated MOTR's final checkpoint on the DanceTrack val set and got about 52.0 HOTA, but your baseline reaches 63.8 HOTA, which is a huge improvement. I would like to know what technical improvements you made over MOTR for your baseline.
As far as I know, there are the following points:

  1. You changed the number of queries to 80.
  2. You added CrowdHuman for joint training.
  3. You changed the training scheduler.
  4. You used the anchor to generate the position embedding, as MOTRv2 does.

Besides the above, is there anything else I did not mention that is crucial for the final performance?

Looking forward to hearing from you, and thanks anyway.

About COLA

Thank you for your excellent work.

However, I am confused about which part of the code applies COLA to the decoder, because my reading of the code suggests a uniform application across all decoder layers. I apologize if I have overlooked any details in the documentation or code comments that might clarify this. If possible, could you kindly point out where COLA is integrated within the decoder architecture?

sample_length between images (e.g. CrowdHuman) and video sequences (e.g. DanceTrack)

Hello, and thanks for open-sourcing this work!
In my use case I have a lot of image data but very little video data, so I added my images the same way CrowdHuman data is added, using the add_crowd approach. I set the sample length for images to 2 to save training time and the sample length for videos to 10 to keep diversity. At inference time, however, I see a huge number of duplicate boxes (many boxes on the same person with nearly identical coordinates and scores but different IDs). Have you encountered anything similar? Where might the problem be?

Major changes to the code

Thank you for your excellent work! I wonder whether the biggest change relative to MOTR in your project code (i.e. COLA) is in the match_for_single_frame(self, outputs: dict) method. Have you made any changes to deformable_transformer_plus.py? I would appreciate an answer!

Compute resources

Dear author,

I saw in your paper that you used 8 V100-16G GPUs. Roughly how long did training on the MOT17 dataset take?

Thanks!!

Occupancy of the model

Hi,
I would like to know the memory footprint of the model once it is instantiated. I know that the weights file of the model trained on BDD100K is 303 MB.
Thank you very much in advance.

BDD100K release

Hi,
I am an Italian student at the University of Salerno writing a thesis on autonomous driving that focuses on transformer-based tracking techniques. While studying the state of the art I came across your paper. I would be very grateful if it were possible to have your model trained on BDD100K. How soon will it be published?
Thank you in advance.

NotImplementedError: invalid shape: torch.Size([8, 256])

Hi,
I'm trying to load your model with the pretrained weights provided for BDD100K. I have modified dataset_to_num_classes in motr_co.py in this way:
dataset_to_num_classes = { 'coco': 91, 'coco_panoptic': 250, 'e2e_mot': 1, 'e2e_bdd': 11, 'e2e_tao': 2000, 'e2e_bddcc': 100, 'e2e_dance': 1, 'e2e_joint': 1, 'e2e_static_mot': 1, 'e2e_all': 91, 'bdd100k_mot': 8 }

I have modified configs/motrv2ch_uni5cost3ggoon.args in this way:
--meta_arch motr_unincost --dataset_file bdd100k_mot --epoch 20 --with_box_refine --lr_drop 8 --lr 2e-4 --lr_backbone 2e-5 --pretrained /content/drive/MyDrive/CO-MOT-main/comot_bdd100k.pth --batch_size 1 --sample_mode random_interval --sample_interval 10 --sampler_lengths 5 --merger_dropout 0 --dropout 0 --random_drop 0.1 --fp_ratio 0.3 --query_interaction_layer GQIM --num_queries 60 --append_crowd --use_checkpoint --mot_path /content/drive/MyDrive/waymo_open_dataset/coco_waymo/validation --match_type gmatch --g_size 3 --output_dir /content/drive/MyDrive/waymo_open_dataset/coco_waymo/validation/CO-MOT

However, when I run:
/content/drive/MyDrive/CO-MOT-main/tools/simple_inference.sh /content/drive/MyDrive/CO-MOT-main/configs/motrv2ch_uni5cost3ggoon.args --resume /content/drive/MyDrive/CO-MOT-main/comot_bdd100k.pth

Where simple_inference.sh is the following:

```bash
#!/usr/bin/env bash
# ------------------------------------------------------------------------
# Copyright (c) 2022 megvii-research. All Rights Reserved.
# ------------------------------------------------------------------------

set -x
set -o pipefail

args=$(cat configs/motrv2.args)
args=$(cat $1)
python3 /content/drive/MyDrive/CO-MOT-main/submit_dance.py ${args} --exp_name tracker --resume /content/drive/MyDrive/CO-MOT-main/comot_bdd100k.pth
```

I obtain the following error:
+ python3 /content/drive/MyDrive/CO-MOT-main/submit_dance.py --meta_arch motr_unincost --dataset_file bdd100k_mot --epoch 20 --with_box_refine --lr_drop 8 --lr 2e-4 --lr_backbone 2e-5 --pretrained /content/drive/MyDrive/CO-MOT-main/comot_bdd100k.pth --batch_size 1 --sample_mode random_interval --sample_interval 10 --sampler_lengths 5 --merger_dropout 0 --dropout 0 --random_drop 0.1 --fp_ratio 0.3 --query_interaction_layer GQIM --num_queries 60 --append_crowd --use_checkpoint --mot_path /content/drive/MyDrive/waymo_open_dataset/coco_waymo/validation --match_type gmatch --g_size 3 --output_dir /content/drive/MyDrive/waymo_open_dataset/coco_waymo/validation/CO-MOT --exp_name tracker --resume /content/drive/MyDrive/CO-MOT-main/comot_bdd100k.pth Namespace(lr=0.0002, lr_backbone_names=['backbone.0'], lr_backbone=2e-05, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_linear_proj_mult=0.1, batch_size=1, weight_decay=0.0001, epochs=20, lr_drop=8, save_period=50, lr_drop_epochs=None, clip_max_norm=0.1, meta_arch='motr_unincost', sgd=False, with_box_refine=True, two_stage=False, accurate_ratio=False, dn_labelbook_size=91, dec_pred_class_embed_share=True, dec_pred_bbox_embed_share=True, fix_refpoints_hw=-1, two_stage_class_embed_share=False, two_stage_bbox_embed_share=False, use_dn=True, dn_number=100, dn_box_noise_scale=0.4, dn_label_noise_ratio=0.5, num_select=300, nms_iou_threshold=-1, frozen_weights=None, backbone='resnet50', enable_fpn=False, dilation=False, position_embedding='sine', position_embedding_scale=6.283185307179586, num_feature_levels=4, pe_temperatureH=20, pe_temperatureW=20, return_interm_indices=[0, 1, 2, 3], backbone_freeze_keywords=None, trans_mode='DeformableTransformer', enc_layers=6, dec_layers=6, dim_feedforward=1024, hidden_dim=256, dropout=0.0, nheads=8, num_queries=60, dec_n_points=4, enc_n_points=4, decoder_cross_self=False, sigmoid_attn=False, crop=False, cj=False, extra_track_attn=False, loss_normalizer=False, max_size=1333, val_width=800, filter_ignore=False, append_crowd=True, decoder_layer_noise=False, dln_xy_noise=0.2, dln_hw_noise=0.2, use_detached_boxes_dec_out=False, unic_layers=0, pre_norm=False, query_dim=4, transformer_activation='relu', num_patterns=0, use_deformable_box_attn=False, box_attn_type='roi_align', add_channel_attention=False, add_pos_value=False, random_refpoints_xy=False, two_stage_type='standard', two_stage_pat_embed=0, two_stage_add_query_num=0, two_stage_learn_wh=False, two_stage_keep_all_tokens=False, dec_layer_number=None, decoder_sa_type='sa', decoder_module_seq=['sa', 'ca', 'ffn'], embed_init_tgt=True, no_interm_box_loss=False, interm_loss_coef=1.0, masks=False, aux_loss=True, match_type='gmatch', mix_match=False, set_cost_class=2, set_cost_bbox=5, set_cost_giou=2, match_unstable_error=True, mask_loss_coef=1, dice_loss_coef=1, cls_loss_coef=2, bbox_loss_coef=5, giou_loss_coef=2, focal_alpha=0.25, dataset_file='bdd100k_mot', gt_file_train=None, gt_file_val=None, coco_path='/data/workspace/detectron2/datasets/coco/', coco_panoptic_path=None, remove_difficult=False, output_dir='/content/drive/MyDrive/waymo_open_dataset/coco_waymo/validation/CO-MOT', device='cuda', seed=42, resume='/content/drive/MyDrive/CO-MOT-main/comot_bdd100k.pth', start_epoch=0, eval=False, vis=False, num_workers=2, pretrained='/content/drive/MyDrive/CO-MOT-main/comot_bdd100k.pth', cache_mode=False, mot_path='/content/drive/MyDrive/waymo_open_dataset/coco_waymo/validation', det_db='', input_video='figs/demo.mp4', 
data_txt_path_train='./datasets/data_path/detmot17.train', data_txt_path_val='./datasets/data_path/detmot17.train', img_path='data/valid/JPEGImages/', query_interaction_layer='GQIM', sample_mode='random_interval', sample_interval=10, random_drop=0.1, fp_ratio=0.3, merger_dropout=0.0, update_query_pos=False, sampler_steps=None, sampler_lengths=[5], exp_name='tracker', memory_bank_score_thresh=0.0, memory_bank_len=4, memory_bank_type=None, memory_bank_with_self_attn=False, use_checkpoint=True, query_denoise=0.0, g_size=3, score_threshold=0.5, update_score_threshold=0.5, miss_tolerance=20, not_valid=True) /usr/local/lib/python3.10/dist-packages/torchvision/__init__.py <module 'torchvision' from '/usr/local/lib/python3.10/dist-packages/torchvision/__init__.py'> /usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. warnings.warn( /usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or Nonefor 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passingweights=None. warnings.warn(msg) Training with Self-Cross Attention. loaded /content/drive/MyDrive/CO-MOT-main/comot_bdd100k.pth Skip loading parameter class_embed.0.weight, required shapetorch.Size([8, 256]), loaded shapetorch.Size([100, 256]). If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. load class_embed: class_embed.0.weight shape=torch.Size([100, 256]) Traceback (most recent call last): File "/content/drive/MyDrive/CO-MOT-main/submit_dance.py", line 318, in <module> detr = load_model(detr, args.resume) File "/content/drive/MyDrive/CO-MOT-main/util/tool.py", line 50, in load_model raise NotImplementedError('invalid shape: {}'.format(model_state_dict[k].shape)) NotImplementedError: invalid shape: torch.Size([8, 256])

Can you please help me?
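
One hedged way to narrow this down is to inspect which classification-head shapes the checkpoint actually stores before editing dataset_to_num_classes. A sketch, reusing the paths from the issue above:

```python
# Print the shapes of all class_embed parameters stored in the checkpoint, so the
# num_classes entry in dataset_to_num_classes can be chosen to match them.
import torch

ckpt = torch.load("/content/drive/MyDrive/CO-MOT-main/comot_bdd100k.pth", map_location="cpu")
state = ckpt.get("model", ckpt)  # checkpoints commonly nest weights under "model"
for name, tensor in state.items():
    if "class_embed" in name:
        print(name, tuple(tensor.shape))
```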

Pretrained weights / Batch_size

Hello,

Thank you very much for your work!

I had some questions:

  1. Are the released models trained from the pretrained weights of Deformable DETR (and thus on COCO), or only from a ResNet pretrained on ImageNet? It seems I don't get the same HOTA and TETA results for BDD100K in my setup.

  2. Does the code work with a batch_size > 1, as Deformable DETR does?

Thanks again for your time!

Testing on Waymo

Hi, how can I run inference with CO-MOT on the Waymo dataset? The README.md only covers DanceTrack.

Difference from MOTRv3?

Hi, Bingfeng! Thanks for your great work.
After reading your paper, the motivation behind the label assignment seems consistent with MOTRv3. Are there any differences in implementation details from MOTRv3 that lead to the performance gap?

Out of memory when training dancetrack

Thanks for your amazing work!
I was training on DanceTrack with a Titan (24 GB) GPU but ran into an out-of-memory error. In the paper, you use A100-16G GPUs to train CO-MOT.
Could you give me some suggestions?

A question about training process.

Thank you for your excellent work!
I have a question about the training process in Figure 3.
Why do the matched IDs of different queries keep changing across decoder layers?
Thank you in advance.

Model performance

Dear author,

I recently studied your work and greatly admire your insights. However, in my own test (a scene similar to MOT17 but with fairly strong, irregular camera motion) I ran into the following problem: at the start of the video there are many boxes, but they become fewer and fewer as the video goes on (in some frames dozens of clearly visible targets get only a single box), and MOTA is very low. Training and testing MOTR with the same settings gives much better results (both MOTA and IDF1 are much higher). Could you advise what might cause this? In theory your improvements should bring a significant gain over MOTR, and your method also beats MOTR by a large margin on MOT17, so this problem has puzzled me for a long time. Looking forward to your reply!

Thank you sincerely for your help!!

Question about CUDA version

Hi, I see in your README.md Requirements that cudatoolkit=10.2 is needed. However, I have CUDA 12.0, and when I try to build MultiScaleDeformableAttention I get the following error: "RuntimeError: The detected CUDA version (12.0) mismatches the version that was used to compile PyTorch (10.2). Please make sure to use the same CUDA versions."
Is there any way to solve this? Do I need to downgrade my local CUDA to version 10.2 or can I do something else?
Thank you very much in advance.
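
As a quick diagnostic (a sketch only), you can print the CUDA version your PyTorch build was compiled against, which is the value the extension build compares with the local toolkit:

```python
# Show the torch build's CUDA version and whether a GPU is visible at runtime.
import torch
print("torch", torch.__version__, "built with CUDA", torch.version.cuda,
      "| CUDA available:", torch.cuda.is_available())
```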

requirements.txt

Thank you for sharing this excellent work! I'm eager to explore more.
Could you please share your requirements.txt for the environment setup? I can't find it in the repo.
Thanks for your time!

Training time

Hello,

I have recently been reproducing this paper. Using the config file you provide, I am training on DanceTrack with 8 T4 GPUs, and each epoch takes about 2 days. Is this normal?
The config file is as follows:
--meta_arch motr_unincost
--dataset_file e2e_dance
--epoch 20
--with_box_refine
--lr_drop 8
--lr 2e-4
--lr_backbone 2e-5
--pretrained /pretrained/r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint.pth
--batch_size 1
--sample_mode random_interval
--sample_interval 10
--sampler_lengths 5
--merger_dropout 0
--dropout 0
--random_drop 0.1
--fp_ratio 0.3
--query_interaction_layer GQIM
--num_queries 60
--append_crowd
--use_checkpoint
--mot_path /code/CO-MOT/data
--match_type gmatch
--g_size 3

Part of the training log is attached as a screenshot (截屏2024-04-07 21 01 11) in the original issue.
